[jira] [Commented] (SPARK-44280) Add convertJavaTimestampToTimestamp in JDBCDialect API
[ https://issues.apache.org/jira/browse/SPARK-44280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740393#comment-17740393 ]

Snoot.io commented on SPARK-44280:
----------------------------------

User 'mingkangli-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41843

> Add convertJavaTimestampToTimestamp in JDBCDialect API
> ------------------------------------------------------
>
> Key: SPARK-44280
> URL: https://issues.apache.org/jira/browse/SPARK-44280
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 3.5.0
> Reporter: Mingkang Li
> Priority: Major
>
> A new method, {{convertJavaTimestampToTimestamp}}, is introduced to the
> JDBCDialects API, giving JDBC dialects the ability to override the default
> Java timestamp conversion behavior. This is particularly beneficial for
> databases such as PostgreSQL, which use special values to represent
> positive- and negative-infinity timestamps.
> The pre-existing default conversion can overflow on these special values
> (i.e., the executor crashes when selecting a column that contains infinity
> timestamps in PostgreSQL). The new function mitigates such issues, enabling
> more versatile and robust timestamp conversions across JDBC-based connectors.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
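To illustrate the overflow this ticket describes, the sketch below mimics a default milliseconds-to-microseconds timestamp conversion; the `toMicros` helper and the `Long.MAX_VALUE` sentinel are our assumptions for illustration (the PostgreSQL JDBC driver surfaces 'infinity' as a sentinel near the edge of the `long` range), not Spark's actual code.

```java
import java.sql.Timestamp;

public class InfinityTimestampDemo {

    // Mimics a default Java timestamp conversion: milliseconds since the
    // epoch scaled to microseconds. Helper name is ours, not Spark's.
    static long toMicros(Timestamp t) {
        // Math.multiplyExact throws ArithmeticException on long overflow.
        return Math.multiplyExact(t.getTime(), 1000L);
    }

    public static void main(String[] args) {
        // Stand-in for PostgreSQL's 'infinity' timestamp sentinel.
        Timestamp positiveInfinity = new Timestamp(Long.MAX_VALUE);
        try {
            toMicros(positiveInfinity);
        } catch (ArithmeticException e) {
            // This is the kind of overflow that crashed the executor.
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

A dialect-level hook lets a database map such sentinels to a representable value instead of letting the multiplication overflow.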
[jira] [Commented] (SPARK-44317) Define the computing logic through PartitionEvaluator API and use it in ShuffledHashJoinExec
[ https://issues.apache.org/jira/browse/SPARK-44317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740392#comment-17740392 ]

Snoot.io commented on SPARK-44317:
----------------------------------

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/41875

> Define the computing logic through PartitionEvaluator API and use it in
> ShuffledHashJoinExec
> -----------------------------------------------------------------------
>
> Key: SPARK-44317
> URL: https://issues.apache.org/jira/browse/SPARK-44317
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Vinod KC
> Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in
> ShuffledHashJoinExec
[jira] [Created] (SPARK-44317) Define the computing logic through PartitionEvaluator API and use it in ShuffledHashJoinExec
Vinod KC created SPARK-44317:
-----------------------------

Summary: Define the computing logic through PartitionEvaluator API and use it in ShuffledHashJoinExec
Key: SPARK-44317
URL: https://issues.apache.org/jira/browse/SPARK-44317
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC

Define the computing logic through PartitionEvaluator API and use it in ShuffledHashJoinExec
[jira] [Commented] (SPARK-44268) Add tests to ensure error-classes.json and docs are in sync
[ https://issues.apache.org/jira/browse/SPARK-44268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740388#comment-17740388 ]

Snoot.io commented on SPARK-44268:
----------------------------------

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/41865

> Add tests to ensure error-classes.json and docs are in sync
> -----------------------------------------------------------
>
> Key: SPARK-44268
> URL: https://issues.apache.org/jira/browse/SPARK-44268
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Jia Fan
> Assignee: Jia Fan
> Priority: Major
> Fix For: 3.5.0
>
> We should add tests to ensure error-classes.json and the docs are in sync,
> so that both are always up to date before a PR is committed.
[jira] [Commented] (SPARK-44314) Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`
[ https://issues.apache.org/jira/browse/SPARK-44314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740386#comment-17740386 ]

Snoot.io commented on SPARK-44314:
----------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/41872

> Add a new checkstyle rule to prohibit the use of `@Test(expected =
> SomeException.class)`
> ------------------------------------------------------------------
>
> Key: SPARK-44314
> URL: https://issues.apache.org/jira/browse/SPARK-44314
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Minor
>
> [https://github.com/junit-team/junit4/wiki/Exception-testing]
>
> {code:java}
> The expected parameter should be used with care. The above test will pass if
> any code in the method throws IndexOutOfBoundsException. Using the method you
> also cannot test the value of the message in the exception, or the state of a
> domain object after the exception has been thrown. For these reasons, the
> previous approaches are recommended. {code}
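The JUnit wiki's point is that `@Test(expected = ...)` gives no handle on the thrown exception. The sketch below shows the idea with a minimal stand-in for JUnit's `assertThrows` (the helper here is ours, written so the example is self-contained; real tests would use `org.junit.Assert.assertThrows` from JUnit 4.13+):

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedExceptionDemo {

    // Minimal stand-in for JUnit's assertThrows: run the action, then return
    // the caught exception so its message and any post-throw state can still
    // be asserted on, which @Test(expected = ...) cannot do.
    static <T extends Throwable> T assertThrows(Class<T> type, Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t, t);
        }
        throw new AssertionError("expected " + type.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        IndexOutOfBoundsException e =
                assertThrows(IndexOutOfBoundsException.class, () -> list.get(0));
        // The exception object is available here for further assertions.
        System.out.println("caught: " + e.getClass().getSimpleName());
    }
}
```

Because the exception is scoped to one statement, the test no longer passes when an unrelated line in the method happens to throw the same type.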
[jira] [Created] (SPARK-44316) Upgrade Jersey to 2.40
BingKun Pan created SPARK-44316:
--------------------------------

Summary: Upgrade Jersey to 2.40
Key: SPARK-44316
URL: https://issues.apache.org/jira/browse/SPARK-44316
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.5.0
Reporter: BingKun Pan
[jira] [Commented] (SPARK-44252) Add error class for the case when loading state from DFS fails
[ https://issues.apache.org/jira/browse/SPARK-44252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740381#comment-17740381 ]

Hudson commented on SPARK-44252:
--------------------------------

User 'lucyyao-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41705

> Add error class for the case when loading state from DFS fails
> --------------------------------------------------------------
>
> Key: SPARK-44252
> URL: https://issues.apache.org/jira/browse/SPARK-44252
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 3.2.0
> Reporter: Lucy Yao
> Priority: Major
>
> This is part of [https://github.com/apache/spark/pull/41705].
> Wrap the exception thrown while loading state, to assign an error class
> properly. Assigning error classes lets us classify failures and determine
> which errors customers struggle with most.
> StateStoreProvider.getStore() & StateStoreProvider.getReadStore() are the
> entry points.
> This ticket also covers failedToReadDeltaFileError and
> failedToReadSnapshotFileError from
> [https://issues.apache.org/jira/browse/SPARK-36305].
[jira] [Created] (SPARK-44315) Move DefinedByConstructorParams to sql/api
Rui Wang created SPARK-44315:
-----------------------------

Summary: Move DefinedByConstructorParams to sql/api
Key: SPARK-44315
URL: https://issues.apache.org/jira/browse/SPARK-44315
Project: Spark
Issue Type: Sub-task
Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang
[jira] [Commented] (SPARK-44303) Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
[ https://issues.apache.org/jira/browse/SPARK-44303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740380#comment-17740380 ]

Mike K commented on SPARK-44303:
--------------------------------

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41863

> Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
> --------------------------------------------------------------
>
> Key: SPARK-44303
> URL: https://issues.apache.org/jira/browse/SPARK-44303
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Priority: Major
[jira] [Created] (SPARK-44314) Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`
Yang Jie created SPARK-44314:
-----------------------------

Summary: Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`
Key: SPARK-44314
URL: https://issues.apache.org/jira/browse/SPARK-44314
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.5.0
Reporter: Yang Jie

[https://github.com/junit-team/junit4/wiki/Exception-testing]

{code:java}
The expected parameter should be used with care. The above test will pass if
any code in the method throws IndexOutOfBoundsException. Using the method you
also cannot test the value of the message in the exception, or the state of a
domain object after the exception has been thrown. For these reasons, the
previous approaches are recommended. {code}
[jira] [Resolved] (SPARK-44313) Generated column expression validation fails if there is a char/varchar column anywhere in the schema
[ https://issues.apache.org/jira/browse/SPARK-44313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-44313.
------------------------------
Fix Version/s: 3.5.0
               3.4.2
Resolution: Fixed

Issue resolved by pull request 41868
[https://github.com/apache/spark/pull/41868]

> Generated column expression validation fails if there is a char/varchar
> column anywhere in the schema
> -----------------------------------------------------------------------
>
> Key: SPARK-44313
> URL: https://issues.apache.org/jira/browse/SPARK-44313
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Allison Portis
> Assignee: Allison Portis
> Priority: Major
> Fix For: 3.5.0, 3.4.2
>
> When validating generated column expressions, this call
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala#L123
> to checkAnalysis fails when there are char or varchar columns anywhere in
> the schema.
>
> For example, this query will fail
> {code:java}
> CREATE TABLE default.example (
>   name VARCHAR(64),
>   tstamp TIMESTAMP,
>   tstamp_date DATE GENERATED ALWAYS AS (CAST(tstamp as DATE))
> ){code}
[jira] [Assigned] (SPARK-44313) Generated column expression validation fails if there is a char/varchar column anywhere in the schema
[ https://issues.apache.org/jira/browse/SPARK-44313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-44313:
--------------------------------
Assignee: Allison Portis

> Generated column expression validation fails if there is a char/varchar
> column anywhere in the schema
> -----------------------------------------------------------------------
>
> Key: SPARK-44313
> URL: https://issues.apache.org/jira/browse/SPARK-44313
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Allison Portis
> Assignee: Allison Portis
> Priority: Major
>
> When validating generated column expressions, this call
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala#L123
> to checkAnalysis fails when there are char or varchar columns anywhere in
> the schema.
>
> For example, this query will fail
> {code:java}
> CREATE TABLE default.example (
>   name VARCHAR(64),
>   tstamp TIMESTAMP,
>   tstamp_date DATE GENERATED ALWAYS AS (CAST(tstamp as DATE))
> ){code}
[jira] [Commented] (SPARK-44215) Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks
[ https://issues.apache.org/jira/browse/SPARK-44215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740368#comment-17740368 ]

Mridul Muralidharan commented on SPARK-44215:
---------------------------------------------

Issue resolved by pull request 41762
https://github.com/apache/spark/pull/41762

> Client receives zero number of chunks in merge meta response which doesn't
> trigger fallback to unmerged blocks
> --------------------------------------------------------------------------
>
> Key: SPARK-44215
> URL: https://issues.apache.org/jira/browse/SPARK-44215
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Fix For: 3.3.3, 3.5.0, 3.4.2
>
> We still see instances of the server returning 0 {{numChunks}} in
> {{mergedMetaResponse}}, which causes the executor to fail with an
> {{ArithmeticException}}.
> {code}
> java.lang.ArithmeticException: / by zero
>   at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
>   at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
>   at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
>   at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> {code}
> Here the executor doesn't fall back to fetching un-merged blocks, and this
> also doesn't result in a {{FetchFailure}}, so the application fails.
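The division-by-zero above comes from dividing by a chunk count the server reported as zero. The guard below is an illustrative sketch only (names and shape are ours, not Spark's actual `PushBasedFetchHelper` code): reject a zero chunk count up front and signal the caller to fall back to fetching the original unmerged blocks.

```java
public class MergedMetaGuardDemo {

    // Illustrative sketch: any size derivation from a merged-meta response
    // must first reject a zero chunk count instead of dividing by it.
    static long approxChunkSize(long totalBytes, int numChunks) {
        if (numChunks <= 0) {
            // Signal the caller to fall back to unmerged blocks.
            throw new IllegalStateException(
                    "zero chunks in merged meta; fall back to unmerged blocks");
        }
        return totalBytes / numChunks; // safe: numChunks > 0
    }

    public static void main(String[] args) {
        try {
            approxChunkSize(1024L, 0); // the failure mode from the ticket
        } catch (IllegalStateException e) {
            System.out.println("fallback: " + e.getMessage());
        }
        System.out.println(approxChunkSize(1024L, 4));
    }
}
```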
[jira] [Updated] (SPARK-44215) Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks
[ https://issues.apache.org/jira/browse/SPARK-44215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mridul Muralidharan updated SPARK-44215:
----------------------------------------
Fix Version/s: 3.3.3

> Client receives zero number of chunks in merge meta response which doesn't
> trigger fallback to unmerged blocks
> --------------------------------------------------------------------------
>
> Key: SPARK-44215
> URL: https://issues.apache.org/jira/browse/SPARK-44215
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Fix For: 3.3.3, 3.5.0, 3.4.2
>
> We still see instances of the server returning 0 {{numChunks}} in
> {{mergedMetaResponse}}, which causes the executor to fail with an
> {{ArithmeticException}}.
> {code}
> java.lang.ArithmeticException: / by zero
>   at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
>   at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
>   at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
>   at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> {code}
> Here the executor doesn't fall back to fetching un-merged blocks, and this
> also doesn't result in a {{FetchFailure}}, so the application fails.
[jira] [Resolved] (SPARK-44154) Bitmap functions
[ https://issues.apache.org/jira/browse/SPARK-44154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-44154.
---------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41623
[https://github.com/apache/spark/pull/41623]

> Bitmap functions
> ----------------
>
> Key: SPARK-44154
> URL: https://issues.apache.org/jira/browse/SPARK-44154
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Entong Shen
> Priority: Major
> Fix For: 3.5.0
>
> Implemented bitmap functions. The functions are:
> * {{bitmap_bucket_number()}}: returns the bucket number for a given input number
> * {{bitmap_bit_position()}}: returns the bit position for a given input number
> * {{bitmap_count()}}: returns the number of set bits in an input bitmap
> * {{bitmap_construct_agg()}}: aggregation function that aggregates input bit positions and creates a bitmap
> * {{bitmap_or_agg()}}: aggregation function that performs a bitwise OR on all the input bitmaps
[jira] [Assigned] (SPARK-44310) The Connect Server startup log should display the hostname and port
[ https://issues.apache.org/jira/browse/SPARK-44310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44310:
------------------------------------
Assignee: BingKun Pan

> The Connect Server startup log should display the hostname and port
> -------------------------------------------------------------------
>
> Key: SPARK-44310
> URL: https://issues.apache.org/jira/browse/SPARK-44310
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
[jira] [Resolved] (SPARK-44310) The Connect Server startup log should display the hostname and port
[ https://issues.apache.org/jira/browse/SPARK-44310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44310.
----------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41862
[https://github.com/apache/spark/pull/41862]

> The Connect Server startup log should display the hostname and port
> -------------------------------------------------------------------
>
> Key: SPARK-44310
> URL: https://issues.apache.org/jira/browse/SPARK-44310
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
[jira] [Created] (SPARK-44313) Generated column expression validation fails if there is a char/varchar column anywhere in the schema
Allison Portis created SPARK-44313:
-----------------------------------

Summary: Generated column expression validation fails if there is a char/varchar column anywhere in the schema
Key: SPARK-44313
URL: https://issues.apache.org/jira/browse/SPARK-44313
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.1, 3.4.0
Reporter: Allison Portis

When validating generated column expressions, this call
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala#L123
to checkAnalysis fails when there are char or varchar columns anywhere in the schema.

For example, this query will fail

{code:java}
CREATE TABLE default.example (
  name VARCHAR(64),
  tstamp TIMESTAMP,
  tstamp_date DATE GENERATED ALWAYS AS (CAST(tstamp as DATE))
){code}
[jira] [Resolved] (SPARK-44281) Move QueryCompilation error that used by DataType to sql/api as DataTypeErrors
[ https://issues.apache.org/jira/browse/SPARK-44281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang resolved SPARK-44281.
------------------------------
Resolution: Fixed

> Move QueryCompilation error that used by DataType to sql/api as DataTypeErrors
> ------------------------------------------------------------------------------
>
> Key: SPARK-44281
> URL: https://issues.apache.org/jira/browse/SPARK-44281
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
[jira] [Created] (SPARK-44312) [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent
Robert Dillitz created SPARK-44312:
-----------------------------------

Summary: [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent
Key: SPARK-44312
URL: https://issues.apache.org/jira/browse/SPARK-44312
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.4.1
Reporter: Robert Dillitz

Allow prepending to the Spark Connect user agent via an environment variable: *SPARK_CONNECT_USER_AGENT*
[jira] [Resolved] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples fields names are different from server
[ https://issues.apache.org/jira/browse/SPARK-43416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-43416.
---------------------------------------
Fix Version/s: 3.5.0
Assignee: Niranjan Jayakar
Resolution: Fixed

> Fix the bug where the ProduceEncoder#tuples fields names are different from
> server
> ---------------------------------------------------------------------------
>
> Key: SPARK-43416
> URL: https://issues.apache.org/jira/browse/SPARK-43416
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Zhen Li
> Assignee: Niranjan Jayakar
> Priority: Major
> Fix For: 3.5.0
>
> The fields are named _1, _2, ... etc. However on the server side they could
> be nicely named in agg operations, such as key, value etc. Fix this if possible.
[jira] [Resolved] (SPARK-44282) Split of DataType parsing for Connect
[ https://issues.apache.org/jira/browse/SPARK-44282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-44282.
---------------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

> Split of DataType parsing for Connect
> -------------------------------------
>
> Key: SPARK-44282
> URL: https://issues.apache.org/jira/browse/SPARK-44282
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Major
> Fix For: 3.5.0
[jira] [Commented] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples fields names are different from server
[ https://issues.apache.org/jira/browse/SPARK-43416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740312#comment-17740312 ]

Zhen Li commented on SPARK-43416:
---------------------------------

[~hvanhovell] Yes. Fixed by https://github.com/apache/spark/pull/41846

> Fix the bug where the ProduceEncoder#tuples fields names are different from
> server
> ---------------------------------------------------------------------------
>
> Key: SPARK-43416
> URL: https://issues.apache.org/jira/browse/SPARK-43416
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Zhen Li
> Priority: Major
>
> The fields are named _1, _2, ... etc. However on the server side they could
> be nicely named in agg operations, such as key, value etc. Fix this if possible.
[jira] [Commented] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples fields names are different from server
[ https://issues.apache.org/jira/browse/SPARK-43416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740302#comment-17740302 ]

Herman van Hövell commented on SPARK-43416:
-------------------------------------------

[~zhenli] has this been fixed?

> Fix the bug where the ProduceEncoder#tuples fields names are different from
> server
> ---------------------------------------------------------------------------
>
> Key: SPARK-43416
> URL: https://issues.apache.org/jira/browse/SPARK-43416
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Zhen Li
> Priority: Major
>
> The fields are named _1, _2, ... etc. However on the server side they could
> be nicely named in agg operations, such as key, value etc. Fix this if possible.
[jira] [Resolved] (SPARK-44291) [CONNECT][SCALA] range query returns incorrect schema
[ https://issues.apache.org/jira/browse/SPARK-44291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-44291.
---------------------------------------
Fix Version/s: 3.5.0
Assignee: Niranjan Jayakar
Resolution: Fixed

> [CONNECT][SCALA] range query returns incorrect schema
> -----------------------------------------------------
>
> Key: SPARK-44291
> URL: https://issues.apache.org/jira/browse/SPARK-44291
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Niranjan Jayakar
> Assignee: Niranjan Jayakar
> Priority: Major
> Fix For: 3.5.0
>
> The following code on Spark Connect produces the following output
> Code:
>
> {code:java}
> val df = spark.range(3)
> df.show()
> df.printSchema(){code}
>
> Output:
> {code:java}
> +---+
> | id|
> +---+
> |  0|
> |  1|
> |  2|
> +---+
>
> root
>  |-- value: long (nullable = true) {code}
> The mismatch is that one shows the column as "id" while the other shows this
> as "value".
[jira] [Created] (SPARK-44311) UDF should support function taking value classes
Emil Ejbyfeldt created SPARK-44311:
-----------------------------------

Summary: UDF should support function taking value classes
Key: SPARK-44311
URL: https://issues.apache.org/jira/browse/SPARK-44311
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.1
Reporter: Emil Ejbyfeldt

Running the following code in Spark

```
final case class ValueClass(a: Int) extends AnyVal
final case class Wrapper(v: ValueClass)

val f = udf((a: ValueClass) => a.a > 0)
spark.createDataset(Seq(Wrapper(ValueClass(1)))).filter(f(col("v"))).show()
```

fails with

```
java.lang.ClassCastException: class org.apache.spark.sql.types.IntegerType$ cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.IntegerType$ and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$220(Analyzer.scala:3241)
  at scala.Option.map(Option.scala:242)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$219(Analyzer.scala:3239)
  at scala.collection.immutable.List.map(List.scala:246)
  at scala.collection.immutable.List.map(List.scala:79)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3237)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3234)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:566)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:566)
```
[jira] [Updated] (SPARK-42554) Spark Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42554:
---------------------------------
Fix Version/s: (was: 3.5.0)

> Spark Connect Scala Client
> --------------------------
>
> Key: SPARK-42554
> URL: https://issues.apache.org/jira/browse/SPARK-42554
> Project: Spark
> Issue Type: Epic
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Major
>
> This is the EPIC to track all the work for the Spark Connect Scala Client.
[jira] [Reopened] (SPARK-42554) Spark Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-42554:
----------------------------------

> Spark Connect Scala Client
> --------------------------
>
> Key: SPARK-42554
> URL: https://issues.apache.org/jira/browse/SPARK-42554
> Project: Spark
> Issue Type: Epic
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Major
> Fix For: 3.5.0
>
> This is the EPIC to track all the work for the Spark Connect Scala Client.
[jira] [Resolved] (SPARK-44193) Implement GRPC exceptions interception for conversion
[ https://issues.apache.org/jira/browse/SPARK-44193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44193.
----------------------------------
      Assignee: Yihong He
    Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/41743

> Implement GRPC exceptions interception for conversion
> -----------------------------------------------------
>
>                 Key: SPARK-44193
>                 URL: https://issues.apache.org/jira/browse/SPARK-44193
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Yihong He
>            Assignee: Yihong He
>            Priority: Major
>
[jira] [Created] (SPARK-44310) The Connect Server startup log should display the hostname and port
BingKun Pan created SPARK-44310:
-------------------------------

             Summary: The Connect Server startup log should display the hostname and port
                 Key: SPARK-44310
                 URL: https://issues.apache.org/jira/browse/SPARK-44310
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: BingKun Pan
[jira] [Commented] (SPARK-44299) Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
[ https://issues.apache.org/jira/browse/SPARK-44299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740126#comment-17740126 ]

ASF GitHub Bot commented on SPARK-44299:
----------------------------------------

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41858

> Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
> -------------------------------------------------------------
>
>                 Key: SPARK-44299
>                 URL: https://issues.apache.org/jira/browse/SPARK-44299
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: BingKun Pan
>            Priority: Minor
>
[jira] [Commented] (SPARK-44193) Implement GRPC exceptions interception for conversion
[ https://issues.apache.org/jira/browse/SPARK-44193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740123#comment-17740123 ]

ASF GitHub Bot commented on SPARK-44193:
----------------------------------------

User 'heyihong' has created a pull request for this issue:
https://github.com/apache/spark/pull/41743

> Implement GRPC exceptions interception for conversion
> -----------------------------------------------------
>
>                 Key: SPARK-44193
>                 URL: https://issues.apache.org/jira/browse/SPARK-44193
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Yihong He
>            Priority: Major
>
[jira] [Resolved] (SPARK-42554) Spark Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-42554.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41743
[https://github.com/apache/spark/pull/41743]

> Spark Connect Scala Client
> --------------------------
>
>                 Key: SPARK-42554
>                 URL: https://issues.apache.org/jira/browse/SPARK-42554
>             Project: Spark
>          Issue Type: Epic
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Herman van Hövell
>            Assignee: Herman van Hövell
>            Priority: Major
>             Fix For: 3.5.0
>
> This is the EPIC to track all the work for the Spark Connect Scala Client.
[jira] [Created] (SPARK-44309) Display Add/Remove Time of Executors on ExecutorsTab
Kent Yao created SPARK-44309:
-----------------------------

             Summary: Display Add/Remove Time of Executors on ExecutorsTab
                 Key: SPARK-44309
                 URL: https://issues.apache.org/jira/browse/SPARK-44309
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 3.5.0
            Reporter: Kent Yao
[jira] [Assigned] (SPARK-44294) HeapHistogram column shows unexpectedly w/ select-all-box
[ https://issues.apache.org/jira/browse/SPARK-44294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44294:
------------------------------------
    Assignee: Kent Yao

> HeapHistogram column shows unexpectedly w/ select-all-box
> ---------------------------------------------------------
>
>                 Key: SPARK-44294
>                 URL: https://issues.apache.org/jira/browse/SPARK-44294
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.5.0
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 3.5.0
>
[jira] [Resolved] (SPARK-44294) HeapHistogram column shows unexpectedly w/ select-all-box
[ https://issues.apache.org/jira/browse/SPARK-44294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44294.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41847
[https://github.com/apache/spark/pull/41847]

> HeapHistogram column shows unexpectedly w/ select-all-box
> ---------------------------------------------------------
>
>                 Key: SPARK-44294
>                 URL: https://issues.apache.org/jira/browse/SPARK-44294
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.5.0
>            Reporter: Kent Yao
>            Priority: Major
>             Fix For: 3.5.0
>
[jira] [Comment Edited] (SPARK-44305) Broadcast operation is not required when no parameters are specified
[ https://issues.apache.org/jira/browse/SPARK-44305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740018#comment-17740018 ]

7mming7 edited comment on SPARK-44305 at 7/5/23 7:35 AM:
---------------------------------------------------------
cc [~yuming] [~r...@databricks.com]

was (Author: 7mming7):
cc [~yuming]

> Broadcast operation is not required when no parameters are specified
> --------------------------------------------------------------------
>
>                 Key: SPARK-44305
>                 URL: https://issues.apache.org/jira/browse/SPARK-44305
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: 7mming7
>            Priority: Minor
>         Attachments: image-2023-07-05-11-51-41-708.png
>
> SPARK-14912 introduced the ability to broadcast data source parameters to read and write operations. However, even when the user does not specify any parameters, the broadcast is still performed, which has a significant performance impact. We should avoid broadcasting the full Hadoop configuration when the user has not specified any parameters.
>
> !image-2023-07-05-11-51-41-708.png!
[jira] [Commented] (SPARK-43438) Fix mismatched column list error on INSERT
[ https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740048#comment-17740048 ]

Max Gekk commented on SPARK-43438:
----------------------------------
Current behaviour on the recent OSS master:
{code:sql}
spark-sql (default)> CREATE TABLE tabtest(c1 INT, c2 INT);
spark-sql (default)> INSERT INTO tabtest SELECT 1;
spark-sql (default)> select * from tabtest;
1	NULL
spark-sql (default)> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
[INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS] Cannot write to `spark_catalog`.`default`.`tabtest`, the reason is too many data columns:
Table columns: `c1`.
Data columns: `1`, `2`, `3`.
{code}
[~srielau] Are you ok with such behaviour?

> Fix mismatched column list error on INSERT
> ------------------------------------------
>
>                 Key: SPARK-43438
>                 URL: https://issues.apache.org/jira/browse/SPARK-43438
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Serge Rielau
>            Priority: Major
>
> This error message is pretty bad, and common:
> "_LEGACY_ERROR_TEMP_1038" : {
>   "message" : [
>     "Cannot write to table due to mismatched user specified column size() and data column size()."
>   ]
> },
> It can perhaps be merged with this one - after giving it an ERROR_CLASS:
> "_LEGACY_ERROR_TEMP_1168" : {
>   "message" : [
>     " requires that the data to be inserted have the same number of columns as the target table: target table has  column(s) but the inserted data has  column(s), including  partition column(s) having constant value(s)."
>   ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and data column size(3).; line 1 pos 24
[jira] [Assigned] (SPARK-44277) Upgrade Avro to version 1.11.2
[ https://issues.apache.org/jira/browse/SPARK-44277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44277:
------------------------------------
    Assignee: Ismaël Mejía

> Upgrade Avro to version 1.11.2
> ------------------------------
>
>                 Key: SPARK-44277
>                 URL: https://issues.apache.org/jira/browse/SPARK-44277
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.4.1
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Minor
>
[jira] [Resolved] (SPARK-44277) Upgrade Avro to version 1.11.2
[ https://issues.apache.org/jira/browse/SPARK-44277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44277.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41830
[https://github.com/apache/spark/pull/41830]

> Upgrade Avro to version 1.11.2
> ------------------------------
>
>                 Key: SPARK-44277
>                 URL: https://issues.apache.org/jira/browse/SPARK-44277
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.4.1
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Minor
>             Fix For: 3.5.0
>
[jira] [Created] (SPARK-44308) Spark 3.0.1 functions.scala -> posexplode_outer API not flattening data
Chirag Sanghvi created SPARK-44308:
--------------------------------------

             Summary: Spark 3.0.1 functions.scala -> posexplode_outer API not flattening data
                 Key: SPARK-44308
                 URL: https://issues.apache.org/jira/browse/SPARK-44308
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 3.0.1
            Reporter: Chirag Sanghvi

The Spark 3.x functions.scala API posexplode_outer, used to flatten an array column, doesn't work as expected when the table is created with "collection.delim" set to a non-default value. This used to work as expected in Spark 2.4.5.

Use the below DDL to create a Hive table:

CREATE EXTERNAL TABLE `testnorm2`(
  `enquiryuid` string,
  `rulestriggered` array<string>)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'collection.delim'=';',
  'field.delim'='|',
  'line.delim'='\n',
  'serialization.format'='|')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

Then fill the table with array values. The below statements insert sample data:

INSERT INTO testnorm2 SELECT 'A', array('a','b');
INSERT INTO testnorm2 SELECT 'B', array('e','f','g','h');
INSERT INTO testnorm2 SELECT 'C', array();
INSERT INTO testnorm2 SELECT 'D', array('');
INSERT INTO testnorm2 SELECT 'E', array('','');
INSERT INTO testnorm2 SELECT 'F', array('','1','2');
INSERT INTO testnorm2 SELECT 'G', array(null);
INSERT INTO testnorm2 SELECT 'H', array(null,'');
INSERT INTO testnorm2 SELECT 'I', array(null,'4','5','6');
INSERT INTO testnorm2 SELECT 'G', array("");

Open a Spark shell (in Spark 3.0.1) and run the Scala statements below:

val df = spark.sql("select * from testnorm2");

df.show() gives this output in both cases (Spark 2.4 and Spark 3.0.1):

+----------+------------+
|enquiryuid|        data|
+----------+------------+
|         I| [, 4, 5, 6]|
|         F|    [, 1, 2]|
|         B|[e, f, g, h]|
|         A|      [a, b]|
|         H|        [, ]|
|         E|        [, ]|
|         G|        null|
|         G|          []|
|         D|          []|
|         C|          []|
+----------+------------+

val explodeDF = df.select($"id", posexplode_outer($"data"));

On doing this there is a difference in output between Spark 2.4 and Spark 3.0.1. On 2.4.x the output is:

+----------+----+----+
|enquiryuid| pos| col|
+----------+----+----+
|         I|   0|null|
|         I|   1|   4|
|         I|   2|   5|
|         I|   3|   6|
|         F|   0|    |
|         F|   1|   1|
|         F|   2|   2|
|         B|   0|   e|
|         B|   1|   f|
|         B|   2|   g|
|         B|   3|   h|
|         A|   0|   a|
|         A|   1|   b|
|         H|   0|null|
|         H|   1|    |
|         E|   0|    |
|         E|   1|    |
|         G|null|null|
|         G|null|null|
|         D|null|null|
+----------+----+----+

Whereas in 3.x the output is:

+----------+----+--------+
|enquiryuid| pos|     col|
+----------+----+--------+
|         I|   0|\N,4,5,6|
|         F|   0|    ,1,2|
|         1|   0|     a,b|
|         C|null|    null|
|         G|null|    null|
|         B|   0| e,f,g,h|
|         H|   0|     \N,|
|         G|null|    null|
|         E|   0|       ,|
|         D|null|    null|
+----------+----+--------+

The array in column 2 is not getting flattened in Spark 3.0.1, but in Spark 2.4.5 it gets flattened.
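For reference, the expected posexplode_outer semantics can be checked against an in-memory Dataset, independent of the Hive SerDe settings. This is a minimal sketch assuming a running Spark shell (so `spark` and its implicits are in scope); the data mirrors a few rows of the report above:

```scala
import org.apache.spark.sql.functions.{col, posexplode_outer}
import spark.implicits._

val sample = Seq(
  ("A", Seq("a", "b")),          // normal array: one row per element
  ("C", Seq.empty[String]),      // empty array: kept by the _outer variant
  ("G", null: Seq[String])       // null array: also kept, with null pos/col
).toDF("enquiryuid", "data")

// posexplode_outer emits one row per array element with its position;
// empty and null arrays produce a single row with null pos and col,
// matching the Spark 2.4.x output shown in the report.
sample.select(col("enquiryuid"), posexplode_outer(col("data"))).show()
```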