[jira] [Assigned] (SPARK-39167) Throw an exception w/ an error class for multiple rows from a subquery used as an expression
[ https://issues.apache.org/jira/browse/SPARK-39167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-39167:
--------------------------------

    Assignee: panbingkun

> Throw an exception w/ an error class for multiple rows from a subquery used as an expression
>
> Key: SPARK-39167
> URL: https://issues.apache.org/jira/browse/SPARK-39167
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: panbingkun
> Priority: Major
>
> Users can trigger an illegal state exception with the SQL statement:
> {code:sql}
> select (select a from (select 1 as a union all select 2 as a) t) as b
> {code}
> {code:java}
> Caused by: java.lang.IllegalStateException: more than one row returned by a subquery used as an expression:
> Subquery subquery#242, [id=#100]
> +- AdaptiveSparkPlan isFinalPlan=true
>    +- == Final Plan ==
>       Union
>       :- *(1) Project [1 AS a#240]
>       :  +- *(1) Scan OneRowRelation[]
>       +- *(2) Project [2 AS a#241]
>          +- *(2) Scan OneRowRelation[]
>    +- == Initial Plan ==
>       Union
>       :- Project [1 AS a#240]
>       :  +- Scan OneRowRelation[]
>       +- Project [2 AS a#241]
>          +- Scan OneRowRelation[]
>     at org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:83)
> {code}
> Such exceptions are not supposed to be visible to users. We need to introduce an error class (or re-use an existing one) and replace the IllegalStateException.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39167) Throw an exception w/ an error class for multiple rows from a subquery used as an expression
[ https://issues.apache.org/jira/browse/SPARK-39167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-39167.
------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 36580
[https://github.com/apache/spark/pull/36580]

> Throw an exception w/ an error class for multiple rows from a subquery used as an expression
>
> Key: SPARK-39167
> URL: https://issues.apache.org/jira/browse/SPARK-39167
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: panbingkun
> Priority: Major
> Fix For: 3.4.0
>
> Users can trigger an illegal state exception with the SQL statement:
> {code:sql}
> select (select a from (select 1 as a union all select 2 as a) t) as b
> {code}
> {code:java}
> Caused by: java.lang.IllegalStateException: more than one row returned by a subquery used as an expression:
> Subquery subquery#242, [id=#100]
> +- AdaptiveSparkPlan isFinalPlan=true
>    +- == Final Plan ==
>       Union
>       :- *(1) Project [1 AS a#240]
>       :  +- *(1) Scan OneRowRelation[]
>       +- *(2) Project [2 AS a#241]
>          +- *(2) Scan OneRowRelation[]
>    +- == Initial Plan ==
>       Union
>       :- Project [1 AS a#240]
>       :  +- Scan OneRowRelation[]
>       +- Project [2 AS a#241]
>          +- Scan OneRowRelation[]
>     at org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:83)
> {code}
> Such exceptions are not supposed to be visible to users. We need to introduce an error class (or re-use an existing one) and replace the IllegalStateException.
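The fix discussed above replaces an internal `IllegalStateException` with a user-facing error that carries an error class. A minimal Python sketch of the idea (the class, function, and error-class names here are hypothetical illustrations, not Spark's actual implementation, which lives in its error-classes framework):

```python
# Sketch: turn an internal "illegal state" failure into a user-facing,
# classed error when a scalar subquery yields more than one row.
# All names below are illustrative, not Spark API.
class ScalarSubqueryError(Exception):
    def __init__(self, error_class, num_rows):
        self.error_class = error_class
        super().__init__(
            f"[{error_class}] more than one row returned by a subquery "
            f"used as an expression (got {num_rows} rows)")

def update_scalar_subquery_result(rows):
    """Return the single value of a scalar subquery, or raise a classed error."""
    if len(rows) > 1:
        raise ScalarSubqueryError("SCALAR_SUBQUERY_TOO_MANY_ROWS", len(rows))
    return rows[0] if rows else None

# The UNION ALL example from the report yields two rows, so it must fail:
try:
    update_scalar_subquery_result([1, 2])
except ScalarSubqueryError as e:
    print(e.error_class)  # SCALAR_SUBQUERY_TOO_MANY_ROWS
```

The point of the error class is that the message becomes stable, documented, and searchable, instead of a bare internal exception leaking to users.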
[jira] [Commented] (SPARK-39245) Support Avro file scans with DEFAULT values
[ https://issues.apache.org/jira/browse/SPARK-39245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540335#comment-17540335 ]

Apache Spark commented on SPARK-39245:
--------------------------------------

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/36623

> Support Avro file scans with DEFAULT values
>
> Key: SPARK-39245
> URL: https://issues.apache.org/jira/browse/SPARK-39245
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Daniel
> Priority: Major
[jira] [Assigned] (SPARK-39245) Support Avro file scans with DEFAULT values
[ https://issues.apache.org/jira/browse/SPARK-39245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39245:
------------------------------------

    Assignee: Apache Spark

> Support Avro file scans with DEFAULT values
>
> Key: SPARK-39245
> URL: https://issues.apache.org/jira/browse/SPARK-39245
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Daniel
> Assignee: Apache Spark
> Priority: Major
[jira] [Assigned] (SPARK-39245) Support Avro file scans with DEFAULT values
[ https://issues.apache.org/jira/browse/SPARK-39245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39245:
------------------------------------

    Assignee: (was: Apache Spark)

> Support Avro file scans with DEFAULT values
>
> Key: SPARK-39245
> URL: https://issues.apache.org/jira/browse/SPARK-39245
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Daniel
> Priority: Major
[jira] [Created] (SPARK-39245) Support Avro file scans with DEFAULT values
Daniel created SPARK-39245:
---------------------------

    Summary: Support Avro file scans with DEFAULT values
        Key: SPARK-39245
        URL: https://issues.apache.org/jira/browse/SPARK-39245
    Project: Spark
 Issue Type: Sub-task
 Components: SQL
 Affects Versions: 3.4.0
   Reporter: Daniel
[jira] [Assigned] (SPARK-39244) Use `--no-echo` instead of `--slave` in R 4.0
[ https://issues.apache.org/jira/browse/SPARK-39244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39244:
------------------------------------

    Assignee: (was: Apache Spark)

> Use `--no-echo` instead of `--slave` in R 4.0
>
> Key: SPARK-39244
> URL: https://issues.apache.org/jira/browse/SPARK-39244
> Project: Spark
> Issue Type: Task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Priority: Major
[jira] [Assigned] (SPARK-39244) Use `--no-echo` instead of `--slave` in R 4.0
[ https://issues.apache.org/jira/browse/SPARK-39244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39244:
------------------------------------

    Assignee: Apache Spark

> Use `--no-echo` instead of `--slave` in R 4.0
>
> Key: SPARK-39244
> URL: https://issues.apache.org/jira/browse/SPARK-39244
> Project: Spark
> Issue Type: Task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-39244) Use `--no-echo` instead of `--slave` in R 4.0
[ https://issues.apache.org/jira/browse/SPARK-39244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540325#comment-17540325 ]

Apache Spark commented on SPARK-39244:
--------------------------------------

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36622

> Use `--no-echo` instead of `--slave` in R 4.0
>
> Key: SPARK-39244
> URL: https://issues.apache.org/jira/browse/SPARK-39244
> Project: Spark
> Issue Type: Task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Priority: Major
[jira] [Commented] (SPARK-39244) Use `--no-echo` instead of `--slave` in R 4.0
[ https://issues.apache.org/jira/browse/SPARK-39244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540326#comment-17540326 ]

Apache Spark commented on SPARK-39244:
--------------------------------------

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36622

> Use `--no-echo` instead of `--slave` in R 4.0
>
> Key: SPARK-39244
> URL: https://issues.apache.org/jira/browse/SPARK-39244
> Project: Spark
> Issue Type: Task
> Components: Project Infra
> Affects Versions: 3.4.0
> Reporter: William Hyun
> Priority: Major
[jira] [Created] (SPARK-39244) Use `--no-echo` instead of `--slave` in R 4.0
William Hyun created SPARK-39244:
---------------------------------

    Summary: Use `--no-echo` instead of `--slave` in R 4.0
        Key: SPARK-39244
        URL: https://issues.apache.org/jira/browse/SPARK-39244
    Project: Spark
 Issue Type: Task
 Components: Project Infra
 Affects Versions: 3.4.0
   Reporter: William Hyun
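The rename matters because R 4.0 introduced `--no-echo` as the replacement for the older `--slave` flag. A small sketch of how a build script might pick the flag from the installed R version (the helper name is ours, not part of Spark's build tooling):

```python
# Sketch: choose R's "quiet" flag from its version string.
# R 4.0 renamed --slave to --no-echo; older releases only know --slave.
def r_quiet_flag(r_version: str) -> str:
    major = int(r_version.split(".")[0])
    return "--no-echo" if major >= 4 else "--slave"

print(r_quiet_flag("4.0.2"))  # --no-echo
print(r_quiet_flag("3.6.3"))  # --slave
```

In practice the issue simply hard-switches Spark's infra scripts to `--no-echo`, since the supported R version is new enough.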
[jira] [Assigned] (SPARK-39243) Describe the rules of quoting elements in error messages
[ https://issues.apache.org/jira/browse/SPARK-39243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39243:
------------------------------------

    Assignee: (was: Apache Spark)

> Describe the rules of quoting elements in error messages
>
> Key: SPARK-39243
> URL: https://issues.apache.org/jira/browse/SPARK-39243
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Priority: Major
>
> Add a comment to QueryErrorsBase and describe the rules of quoting elements/parameters in error messages.
[jira] [Assigned] (SPARK-39243) Describe the rules of quoting elements in error messages
[ https://issues.apache.org/jira/browse/SPARK-39243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39243:
------------------------------------

    Assignee: Apache Spark

> Describe the rules of quoting elements in error messages
>
> Key: SPARK-39243
> URL: https://issues.apache.org/jira/browse/SPARK-39243
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Apache Spark
> Priority: Major
>
> Add a comment to QueryErrorsBase and describe the rules of quoting elements/parameters in error messages.
[jira] [Commented] (SPARK-39243) Describe the rules of quoting elements in error messages
[ https://issues.apache.org/jira/browse/SPARK-39243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540296#comment-17540296 ]

Apache Spark commented on SPARK-39243:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36621

> Describe the rules of quoting elements in error messages
>
> Key: SPARK-39243
> URL: https://issues.apache.org/jira/browse/SPARK-39243
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Priority: Major
>
> Add a comment to QueryErrorsBase and describe the rules of quoting elements/parameters in error messages.
[jira] [Created] (SPARK-39243) Describe the rules of quoting elements in error messages
Max Gekk created SPARK-39243:
-----------------------------

    Summary: Describe the rules of quoting elements in error messages
        Key: SPARK-39243
        URL: https://issues.apache.org/jira/browse/SPARK-39243
    Project: Spark
 Issue Type: Sub-task
 Components: SQL
 Affects Versions: 3.4.0
   Reporter: Max Gekk

Add a comment to QueryErrorsBase and describe the rules of quoting elements/parameters in error messages.
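Quoting conventions of this kind typically distinguish identifiers, literal values, and SQL types. The helpers below are an illustrative Python sketch in the spirit of QueryErrorsBase; the function names and the exact rules (backquoted identifiers, single-quoted string literals, upper-cased types in double quotes) are assumptions for illustration, not Spark's documented behavior:

```python
# Illustrative quoting helpers for error messages. Names and exact rules
# are assumptions; Spark's real rules live in QueryErrorsBase.
def to_sql_id(name: str) -> str:
    """Quote a (possibly dotted) identifier part by part with backquotes."""
    return ".".join(f"`{part}`" for part in name.split("."))

def to_sql_value(value) -> str:
    """Quote string literals with single quotes; leave numbers bare."""
    return f"'{value}'" if isinstance(value, str) else str(value)

def to_sql_type(type_name: str) -> str:
    """Render a SQL type upper-cased in double quotes."""
    return f'"{type_name.upper()}"'

print(to_sql_id("db.tbl.col"))  # `db`.`tbl`.`col`
print(to_sql_value("abc"))      # 'abc'
print(to_sql_type("int"))       # "INT"
```

Centralizing quoting like this keeps error messages consistent across the hundreds of error classes, which is exactly what the requested comment is meant to document.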
[jira] [Assigned] (SPARK-39213) Create ANY_VALUE aggregate function
[ https://issues.apache.org/jira/browse/SPARK-39213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-39213:
--------------------------------

    Assignee: Vitalii Li

> Create ANY_VALUE aggregate function
>
> Key: SPARK-39213
> URL: https://issues.apache.org/jira/browse/SPARK-39213
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Vitalii Li
> Assignee: Vitalii Li
> Priority: Major
>
> This is a feature request to add an {{ANY_VALUE}} aggregate function. This would consume input values and quickly return any arbitrary element.
[jira] [Resolved] (SPARK-39213) Create ANY_VALUE aggregate function
[ https://issues.apache.org/jira/browse/SPARK-39213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-39213.
------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 36584
[https://github.com/apache/spark/pull/36584]

> Create ANY_VALUE aggregate function
>
> Key: SPARK-39213
> URL: https://issues.apache.org/jira/browse/SPARK-39213
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Vitalii Li
> Assignee: Vitalii Li
> Priority: Major
> Fix For: 3.4.0
>
> This is a feature request to add an {{ANY_VALUE}} aggregate function. This would consume input values and quickly return any arbitrary element.
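The contract of `ANY_VALUE` is deliberately loose: per group, return some input value, any one. A pure-Python sketch of those semantics (the helper is ours; one legal implementation is simply "first value seen per key"):

```python
# Sketch of ANY_VALUE semantics: per group key, return an arbitrary input
# value. Taking the first value seen is one legal (deterministic) answer;
# a distributed engine may legitimately return a different one.
def any_value_by_key(rows):
    """rows: iterable of (key, value) pairs -> {key: some value from that key}."""
    out = {}
    for key, value in rows:
        out.setdefault(key, value)  # keep the first value seen per key
    return out

print(any_value_by_key([("a", 1), ("a", 2), ("b", 7)]))  # {'a': 1, 'b': 7}
```

The looseness is the point: because the engine may return any element, it can skip sorting and cross-partition coordination, which is why `ANY_VALUE` is cheap.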
[jira] [Assigned] (SPARK-39242) AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
[ https://issues.apache.org/jira/browse/SPARK-39242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39242:
------------------------------------

    Assignee: Apache Spark

> AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
>
> Key: SPARK-39242
> URL: https://issues.apache.org/jira/browse/SPARK-39242
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.2.1
> Reporter: Anish Shrigondekar
> Assignee: Apache Spark
> Priority: Major
>
> AwaitOffset does not wait correctly for at least the expected offset, and the RateStreamProvider test is flaky.
[jira] [Commented] (SPARK-39242) AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
[ https://issues.apache.org/jira/browse/SPARK-39242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540250#comment-17540250 ]

Apache Spark commented on SPARK-39242:
--------------------------------------

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/36620

> AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
>
> Key: SPARK-39242
> URL: https://issues.apache.org/jira/browse/SPARK-39242
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.2.1
> Reporter: Anish Shrigondekar
> Priority: Major
>
> AwaitOffset does not wait correctly for at least the expected offset, and the RateStreamProvider test is flaky.
[jira] [Commented] (SPARK-39242) AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
[ https://issues.apache.org/jira/browse/SPARK-39242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540252#comment-17540252 ]

Anish Shrigondekar commented on SPARK-39242:
--------------------------------------------

PR for the change submitted here: [https://github.com/apache/spark/pull/36620]

CC - [~kabhwan] - please take a look. Thanks

> AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
>
> Key: SPARK-39242
> URL: https://issues.apache.org/jira/browse/SPARK-39242
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.2.1
> Reporter: Anish Shrigondekar
> Priority: Major
>
> AwaitOffset does not wait correctly for at least the expected offset, and the RateStreamProvider test is flaky.
[jira] [Assigned] (SPARK-39242) AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
[ https://issues.apache.org/jira/browse/SPARK-39242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39242:
------------------------------------

    Assignee: (was: Apache Spark)

> AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
>
> Key: SPARK-39242
> URL: https://issues.apache.org/jira/browse/SPARK-39242
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.2.1
> Reporter: Anish Shrigondekar
> Priority: Major
>
> AwaitOffset does not wait correctly for at least the expected offset, and the RateStreamProvider test is flaky.
[jira] [Updated] (SPARK-39199) Implement pandas API missing parameters
[ https://issues.apache.org/jira/browse/SPARK-39199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-39199:
---------------------------------
    Description:
pandas API on Spark aims to make pandas code work on Spark clusters without any changes, so full API coverage has been one of our major goals. Currently, most pandas functions are implemented, but some of them have incomplete parameter support.

Some common parameters were missing (now resolved):
 * how to handle NAs
 * filtering by data type
 * controlling result length
 * reindexing the result

There are remaining missing parameters to implement (see the doc below).

See the design and the current status at [https://docs.google.com/document/d/1H6RXL6oc-v8qLJbwKl6OEqBjRuMZaXcTYmrZb9yNm5I/edit?usp=sharing].

  was:
pandas API on Spark aims to achieve full pandas API coverage. Currently, most pandas functions are supported in pandas API on Spark but with parameters missing.

There are some common parameters missing:
 - how to handle NAs: `skipna`, `dropna`
 - filter data types: `numeric_only`, `bool_only`
 - filter result length: `keep`
 - reindex result: `ignore_index`

They support common use cases and should be prioritized.

> Implement pandas API missing parameters
>
> Key: SPARK-39199
> URL: https://issues.apache.org/jira/browse/SPARK-39199
> Project: Spark
> Issue Type: Umbrella
> Components: Pandas API on Spark, PySpark
> Affects Versions: 3.3.0, 3.4.0, 3.3.1
> Reporter: Xinrong Meng
> Priority: Major
>
> pandas API on Spark aims to make pandas code work on Spark clusters without any changes, so full API coverage has been one of our major goals. Currently, most pandas functions are implemented, but some of them have incomplete parameter support.
> Some common parameters were missing (now resolved):
> * how to handle NAs
> * filtering by data type
> * controlling result length
> * reindexing the result
> There are remaining missing parameters to implement (see the doc below).
> See the design and the current status at [https://docs.google.com/document/d/1H6RXL6oc-v8qLJbwKl6OEqBjRuMZaXcTYmrZb9yNm5I/edit?usp=sharing].
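The parameters listed in the old description (`skipna`, `numeric_only`, `keep`, `ignore_index`) each have simple reference semantics. As one example, a pure-Python sketch of what `skipna` means for a mean (illustrative only, not pandas-on-Spark code):

```python
# Sketch of the `skipna` parameter's semantics for a mean:
# skipna=True drops NaNs before aggregating; skipna=False propagates them,
# mirroring pandas' behavior for reductions.
import math

def mean(values, skipna=True):
    vals = [v for v in values if not math.isnan(v)] if skipna else list(values)
    if not vals or any(math.isnan(v) for v in vals):
        return float("nan")
    return sum(vals) / len(vals)

print(mean([1.0, float("nan"), 3.0]))                # 2.0
print(mean([1.0, float("nan"), 3.0], skipna=False))  # nan
```

Implementing these parameters on Spark is harder than the sketch suggests because aggregations run distributed, but the user-visible contract is the same.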
[jira] [Commented] (SPARK-39242) AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
[ https://issues.apache.org/jira/browse/SPARK-39242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540231#comment-17540231 ]

Anish Shrigondekar commented on SPARK-39242:
--------------------------------------------

I have found the root cause for the issue and will submit the PR soon.

> AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
>
> Key: SPARK-39242
> URL: https://issues.apache.org/jira/browse/SPARK-39242
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.2.1
> Reporter: Anish Shrigondekar
> Priority: Major
>
> AwaitOffset does not wait correctly for at least the expected offset, and the RateStreamProvider test is flaky.
[jira] [Created] (SPARK-39242) AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
Anish Shrigondekar created SPARK-39242:
---------------------------------------

    Summary: AwaitOffset does not wait correctly for at least the expected offset and RateStreamProvider test is flaky
        Key: SPARK-39242
        URL: https://issues.apache.org/jira/browse/SPARK-39242
    Project: Spark
 Issue Type: Bug
 Components: Structured Streaming
 Affects Versions: 3.2.1
   Reporter: Anish Shrigondekar

AwaitOffset does not wait correctly for at least the expected offset, and the RateStreamProvider test is flaky.
[jira] [Resolved] (SPARK-39240) Source and binary releases using different tools to generate hashes for integrity
[ https://issues.apache.org/jira/browse/SPARK-39240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-39240.
----------------------------------
    Fix Version/s: 3.3.1
                   3.2.2
       Resolution: Fixed

Issue resolved by pull request 36619
[https://github.com/apache/spark/pull/36619]

> Source and binary releases using different tools to generate hashes for integrity
>
> Key: SPARK-39240
> URL: https://issues.apache.org/jira/browse/SPARK-39240
> Project: Spark
> Issue Type: Bug
> Components: Build, Project Infra
> Affects Versions: 3.2.1, 3.3.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.3.1, 3.2.2
>
> shasum for source
> gpg for binary
[jira] [Assigned] (SPARK-39240) Source and binary releases using different tools to generate hashes for integrity
[ https://issues.apache.org/jira/browse/SPARK-39240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen reassigned SPARK-39240:
------------------------------------

    Assignee: Kent Yao

> Source and binary releases using different tools to generate hashes for integrity
>
> Key: SPARK-39240
> URL: https://issues.apache.org/jira/browse/SPARK-39240
> Project: Spark
> Issue Type: Bug
> Components: Build, Project Infra
> Affects Versions: 3.2.1, 3.3.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
>
> shasum for source
> gpg for binary
[jira] [Updated] (SPARK-39240) Source and binary releases using different tools to generate hashes for integrity
[ https://issues.apache.org/jira/browse/SPARK-39240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen updated SPARK-39240:
---------------------------------
    Issue Type: Improvement (was: Bug)
      Priority: Trivial (was: Major)

> Source and binary releases using different tools to generate hashes for integrity
>
> Key: SPARK-39240
> URL: https://issues.apache.org/jira/browse/SPARK-39240
> Project: Spark
> Issue Type: Improvement
> Components: Build, Project Infra
> Affects Versions: 3.2.1, 3.3.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Trivial
> Fix For: 3.2.2, 3.3.1
>
> shasum for source
> gpg for binary
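The inconsistency ("shasum for source, gpg for binary") is about the rendering of the same SHA-512 digest: `shasum` prints lowercase hex followed by the filename, while `gpg --print-md` prints grouped uppercase hex. A Python sketch of the two renderings of one digest (the GPG-style grouping below is an approximation for illustration):

```python
# One SHA-512 digest, rendered two ways: shasum style (lowercase hex plus
# filename) and an approximation of `gpg --print-md` style (uppercase,
# space-grouped). The digest bytes are identical; only formatting differs.
import hashlib

data = b"spark-release.tgz contents"
digest = hashlib.sha512(data).hexdigest()

shasum_style = f"{digest}  spark-release.tgz"
gpg_style = " ".join(digest.upper()[i:i + 4] for i in range(0, len(digest), 4))

print(shasum_style)
print(gpg_style)
```

Because the formats differ, a user verifying a download with the "wrong" tool gets a mismatch even when the artifact is intact, which is why the release scripts were unified on one tool.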
[jira] [Updated] (SPARK-39241) Spark SQL 'Like' operator behaves incorrectly while filtering on a partitioned column after Spark 3.1
[ https://issues.apache.org/jira/browse/SPARK-39241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Gorbatsevich updated SPARK-39241:
----------------------------------------
    Description:
It seems that the introduction of "like any" in Spark 3.1 breaks the "like" behaviour when filtering on a partitioned column. Here is an example:

1. Create a test table:
{code:java}
scala> spark.sql(
     | """
     | CREATE EXTERNAL TABLE tmp(
     |   f1 STRING
     | )
     | PARTITIONED BY (dt STRING)
     | ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
     | LINES TERMINATED BY '\n'
     | STORED AS TEXTFILE
     | LOCATION 's3://vlg-data-us-east-1/tmp/tmp/';
     | """)
res2: org.apache.spark.sql.DataFrame = []{code}
2. Insert something there:
{code:java}
scala> spark.sql(
     | """
     | insert into table tmp partition(dt="2022051000") values("1")
     | """
     | )
res3: org.apache.spark.sql.DataFrame = [] {code}
3. Select using 'like':
{code:java}
scala> spark.sql(
     | """
     | select * from tmp
     | where dt like '202205100%'
     | """
     | ).show()
+---+---+
| f1| dt|
+---+---+
+---+---+
{code}
4. Select using 'like any':
{code:java}
scala> spark.sql(
     | """
     | select * from tmp
     | where dt like any ('202205100%')
     | """
     | ).show()
22/05/20 14:50:26 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
+---+----------+
| f1|        dt|
+---+----------+
|  1|2022051000|
+---+----------+
{code}
The expectation is that results 3 and 4 are identical; however, this is not the case, and result #3 is obviously wrong.

*Environment: EMR*
Release label: emr-6.5.0
Hadoop distribution: Amazon 3.2.1
Applications: *Spark 3.1.2*, Hive 3.1.2, Livy 0.7.1

  was: (same description; only whitespace differs)

> Spark SQL 'Like' operator behaves incorrectly while filtering on a partitioned column after Spark 3.1
>
> Key: SPARK-39241
> URL: https://issues.apache.org/jira/browse/SPARK-39241
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2
> Environment: *Environment: EMR*
> Release label: emr-6.5.0
> Hadoop distribution: Amazon 3.2.1
> Applications: *Spark 3.1.2*, Hive 3.1.2, Livy 0.7.1
> Reporter: Dmitry Gorbatsevich
> Priority: Major
>
> It seems that the introduction of "like any" in Spark 3.1 breaks the "like" behaviour when filtering on a partitioned column. Here is an example:
> 1. Create a test table:
> {code:java}
> scala> spark.sql(
>      | """
>      | CREATE EXTERNAL TABLE tmp(
>      |   f1 STRING
>      | )
>      | PARTITIONED BY (dt STRING)
>      | ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>      | LINES TERMINATED BY '\n'
>      | STORED AS TEXTFILE
>      | LOCATION 's3://vlg-data-us-east-1/tmp/tmp/';
>      | """)
> res2: org.apache.spark.sql.DataFrame = []{code}
> 2. Insert something there:
> {code:java}
> scala> spark.sql(
>      | """
>      | insert into table tmp partition(dt="2022051000") values("1")
>      | """
>      | )
> res3: org.apache.spark.sql.DataFrame = [] {code}
> 3. Select using 'like':
> {code:java
[jira] [Created] (SPARK-39241) Spark SQL 'Like' operator behaves incorrectly while filtering on a partitioned column after Spark 3.1
Dmitry Gorbatsevich created SPARK-39241: --- Summary: Spark SQL 'Like' operator behaves wrongly while filtering on partitioned column after Spark 3.1 Key: SPARK-39241 URL: https://issues.apache.org/jira/browse/SPARK-39241 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Environment: *Environment: EMR* Release label:emr-6.5.0 Hadoop distribution:Amazon 3.2.1 Applications:{*}Spark 3.1.2{*}, Hive 3.1.2, Livy 0.7.1 Reporter: Dmitry Gorbatsevich It seems like introduction of "like any" in spark 3.1 breaks "like" behaviour when filtering on partitioned column. Here is the example: 1. Create test table: {code:java} scala> spark.sql( | """ | CREATE EXTERNAL TABLE tmp( | f1 STRING | ) | PARTITIONED BY (dt STRING) | ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' | LINES TERMINATED BY '\n' | STORED AS TEXTFILE | LOCATION 's3://vlg-data-us-east-1/tmp/tmp/'; | """) res2: org.apache.spark.sql.DataFrame = []{code} 2. insert something there: {code:java} scala> spark.sql( | """ | insert into table tmp partition(dt="2022051000") values("1") | """ | ) res3: org.apache.spark.sql.DataFrame = [] {code} 3. Do select using 'like': {code:java} scala> spark.sql( | """ | select * from tmp | where dt like '202205100%' | """ | ).show() +---+---+ | f1| dt| +---+---+ +---+---+ {code} 4. Do select using 'like any': {code:java} scala> spark.sql( | """ | select * from tmp | where dt like any ('202205100%') | """ | ).show() 22/05/20 14:50:26 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist +---+--+ | f1| dt| +---+--+ | 1|2022051000| +---+--+ {code} Expectation is that results 3 and 4 are identical, however this is not the case and result #3 is obviously wrong. 
[jira] [Commented] (SPARK-39240) Source and binary releases using different tools to generate hashes for integrity
[ https://issues.apache.org/jira/browse/SPARK-39240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540017#comment-17540017 ] Apache Spark commented on SPARK-39240: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36619 > Source and binary releases using different tool to generates hashes for > integrity > - > > Key: SPARK-39240 > URL: https://issues.apache.org/jira/browse/SPARK-39240 > Project: Spark > Issue Type: Bug > Components: Build, Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Kent Yao >Priority: Major > > shasum for source > gpg for binary -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39240) Source and binary releases using different tools to generate hashes for integrity
[ https://issues.apache.org/jira/browse/SPARK-39240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39240: Assignee: (was: Apache Spark) > Source and binary releases using different tool to generates hashes for > integrity > - > > Key: SPARK-39240 > URL: https://issues.apache.org/jira/browse/SPARK-39240 > Project: Spark > Issue Type: Bug > Components: Build, Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Kent Yao >Priority: Major > > shasum for source > gpg for binary -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39240) Source and binary releases using different tools to generate hashes for integrity
[ https://issues.apache.org/jira/browse/SPARK-39240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39240: Assignee: Apache Spark > Source and binary releases using different tool to generates hashes for > integrity > - > > Key: SPARK-39240 > URL: https://issues.apache.org/jira/browse/SPARK-39240 > Project: Spark > Issue Type: Bug > Components: Build, Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > shasum for source > gpg for binary -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38687) Use error classes in the compilation errors of generators
[ https://issues.apache.org/jira/browse/SPARK-38687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540011#comment-17540011 ] Apache Spark commented on SPARK-38687: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/36617 > Use error classes in the compilation errors of generators > - > > Key: SPARK-38687 > URL: https://issues.apache.org/jira/browse/SPARK-38687 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryCompilationErrors: > * nestedGeneratorError > * moreThanOneGeneratorError > * generatorOutsideSelectError > * generatorNotExpectedError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
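The error-class pattern the ticket asks for — a stable, machine-readable error class plus structured message parameters instead of an ad-hoc exception message — can be sketched roughly as follows. This is a hypothetical Python illustration of the pattern only; Spark's actual implementation is the Scala `SparkThrowable` hierarchy, and the class names and templates below are invented for the example.

```python
# Hypothetical error-class registry: each stable identifier maps to a
# parameterized message template.
ERROR_CLASSES = {
    "GENERATOR_NESTED": "Generators are not supported when nested in expressions: {expression}",
    "GENERATOR_OUTSIDE_SELECT": "Generators are not supported outside the SELECT clause, found: {plan}",
}

class CompilationError(Exception):
    """Carries a stable error class and parameters, not just free-form text."""
    def __init__(self, error_class: str, **parameters):
        self.error_class = error_class
        self.parameters = parameters
        template = ERROR_CLASSES[error_class]
        super().__init__(f"[{error_class}] {template.format(**parameters)}")

try:
    raise CompilationError("GENERATOR_OUTSIDE_SELECT", plan="Sort")
except CompilationError as e:
    caught = (e.error_class, str(e))
```

The point of the migration is that tests (such as those in QueryCompilationErrorsSuite) can then assert on `error_class` and `parameters` rather than matching brittle message strings.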
[jira] [Commented] (SPARK-39237) Update the ANSI SQL mode documentation
[ https://issues.apache.org/jira/browse/SPARK-39237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540008#comment-17540008 ] Apache Spark commented on SPARK-39237: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/36618 > Update the ANSI SQL mode documentation > -- > > Key: SPARK-39237 > URL: https://issues.apache.org/jira/browse/SPARK-39237 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0, 3.2.2 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.3.0 > > > 1. Remove the Experimental notation in ANSI SQL compliance doc > 2. Update the description of `spark.sql.ansi.enabled`, since the ANSI > reversed keyword is disabled by default now -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39240) Source and binary releases using different tools to generate hashes for integrity
Kent Yao created SPARK-39240: Summary: Source and binary releases using different tools to generate hashes for integrity Key: SPARK-39240 URL: https://issues.apache.org/jira/browse/SPARK-39240 Project: Spark Issue Type: Bug Components: Build, Project Infra Affects Versions: 3.2.1, 3.3.0 Reporter: Kent Yao Currently, shasum is used to generate the integrity hashes for the source release, while gpg is used for the binary releases. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
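The practical consequence of the mismatch is output format: `shasum -a 512` emits a lowercase-hex line that `shasum -c` can verify directly, while `gpg --print-md` emits an uppercase, space-grouped rendering of the same digest that standard checksum tools cannot consume. A rough Python model of the two renderings (the artifact name is a placeholder, and the exact gpg grouping and line wrapping are approximated here):

```python
import hashlib

artifact_name = "spark-x.y.z-bin.tgz"  # placeholder name for illustration
data = b"release artifact bytes"

digest = hashlib.sha512(data).hexdigest()

# shasum-style line: lowercase hex, two spaces, filename (verifiable via `shasum -c`)
shasum_line = f"{digest}  {artifact_name}"

# gpg --print-md style (approximate): uppercase hex in space-separated groups
upper = digest.upper()
gpg_style = f"{artifact_name}: " + " ".join(upper[i:i + 8] for i in range(0, len(upper), 8))
```

Both lines encode the same SHA-512 digest, but only the first round-trips through common checksum verifiers, which is why generating both releases' hashes with one tool simplifies integrity checks.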
[jira] [Assigned] (SPARK-38687) Use error classes in the compilation errors of generators
[ https://issues.apache.org/jira/browse/SPARK-38687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38687: Assignee: (was: Apache Spark) > Use error classes in the compilation errors of generators > - > > Key: SPARK-38687 > URL: https://issues.apache.org/jira/browse/SPARK-38687 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryCompilationErrors: > * nestedGeneratorError > * moreThanOneGeneratorError > * generatorOutsideSelectError > * generatorNotExpectedError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38687) Use error classes in the compilation errors of generators
[ https://issues.apache.org/jira/browse/SPARK-38687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38687: Assignee: Apache Spark > Use error classes in the compilation errors of generators > - > > Key: SPARK-38687 > URL: https://issues.apache.org/jira/browse/SPARK-38687 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Migrate the following errors in QueryCompilationErrors: > * nestedGeneratorError > * moreThanOneGeneratorError > * generatorOutsideSelectError > * generatorNotExpectedError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39237) Update the ANSI SQL mode documentation
[ https://issues.apache.org/jira/browse/SPARK-39237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-39237. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36614 [https://github.com/apache/spark/pull/36614] > Update the ANSI SQL mode documentation > -- > > Key: SPARK-39237 > URL: https://issues.apache.org/jira/browse/SPARK-39237 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0, 3.2.2 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.3.0 > > > 1. Remove the Experimental notation in ANSI SQL compliance doc > 2. Update the description of `spark.sql.ansi.enabled`, since the ANSI > reversed keyword is disabled by default now -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39239) Parquet written by Spark in yarn mode cannot be read by Spark in local[2+] mode
[ https://issues.apache.org/jira/browse/SPARK-39239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kondziolka9ld updated SPARK-39239: -- Description: Hi, I came across a strange issue: data written by Spark in yarn mode cannot be read by Spark in local[2+] mode. By "cannot be read" I mean that the read operation hangs forever. Strangely enough, local[1] is able to read the same parquet data. Additionally, repartitioning the data before writing works around the problem. I attached a thread dump; the thread is in fact waiting on a latch. I am not sure whether this is a bug or some kind of misconfiguration or misunderstanding.
h4. Reproduction scenario:
h4. Writer console log:
{code:java}
user@host [] /tmp $ spark-shell --master yarn
[...]
scala> (1 to 1000).toDF.write.parquet("hdfs:///tmp/sample_1")
scala> (1 to 1000).toDF.repartition(42).write.parquet("hdfs:///tmp/sample_2")
{code}
h4. Reader console log:
{code:java}
user@host [] /tmp $ spark-shell --master local[2]
[...]
scala> spark.read.parquet("hdfs:///tmp/sample_2").count # data were repartitioned before write
res2: Long = 1000
scala> spark.read.parquet("hdfs:///tmp/sample_1").count # it will hang forever
[Stage 5:=>                                                    (1 + 0) / 2]
user@host [] /tmp $ spark-shell --master local[1]
[...]
scala> spark.read.parquet("hdfs:///tmp/sample_1").count
res0: Long = 1000
{code}
h4.
Thread dump of locked thread {code:java} "main" #1 prio=5 os_prio=0 tid=0x7f93b8054000 nid=0x6dce waiting on condition [0x7f93c0658000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xeb65eab8> (a scala.concurrent.impl.Promise$CompletionLatch) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187) at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:334) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:859) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2261) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) at org.apache.spark.rdd.RDD$$Lambda$2193/1084000875.apply(Unknown Source) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:390) at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:3006) at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:3005) at 
org.apache.spark.sql.Dataset$$Lambda$2847/937335652.apply(Unknown Source) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687) at org.apache.spark.sql.Dataset$$Lambda$2848/1831604445.apply(Unknown Source) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2853/2038636888.apply(Unknown Source) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2849/1622269832.apply(Unknown Source) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685) at org.apache.spark.sql.Dataset.count(Dataset.scala:3005) at $line19.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:24) at $line19.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:28)
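The thread dump above shows the driver's main thread parked in `ThreadUtils.awaitReady` on a `CompletionLatch`: it waits for the job to signal completion, and because the last task never finishes, the wait never returns. The shape of that hang can be modeled in plain Python (an analogy only, not Spark code): a caller blocks on a completion signal that the worker side never sets, and only a bounded wait turns the indefinite park into a diagnosable timeout.

```python
import threading

# A completion latch that the "task" side is supposed to set.
job_done = threading.Event()

def run_job(complete: bool) -> None:
    if complete:
        job_done.set()  # normal path: the last task signals completion

# Simulate the failure mode seen in the thread dump: the task never signals.
worker = threading.Thread(target=run_job, args=(False,))
worker.start()
worker.join()

# An unbounded job_done.wait() would park forever, exactly like the dump.
# A bounded wait returns False instead of hanging.
finished = job_done.wait(timeout=0.1)
```

This is why the shell shows the stage stuck at `(1 + 0) / 2`: one task's completion signal never arrives, and the driver has no timeout on the latch.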
[jira] [Updated] (SPARK-39239) Parquet written by Spark in yarn mode cannot be read by Spark in local[2+] mode
[ https://issues.apache.org/jira/browse/SPARK-39239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kondziolka9ld updated SPARK-39239: -- Attachment: threaddump_spark_shell > Parquet written by spark in yarn mode can not be read by spark in local[2+] > mode > > > Key: SPARK-39239 > URL: https://issues.apache.org/jira/browse/SPARK-39239 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: kondziolka9ld >Priority: Minor > Attachments: threaddump_spark_shell > > > Hi, > I came across a strange issue, namely data written by spark in yarn mode can > not be read by spark in local[2+] mode. By saying can not be read I mean that > read operations hangs forever. Strangely enough, local[1] is able to read > these parquet data. Additionally, repartition of data before writing is some > kind of workaround as well. I attached thread dump and in fact, thread waits > on latch. > I am not sure if it is a bug or some kind of misconfiguration or > misunderstanding. > > h4. Reproduction scenario: > h4. Writer console log: > {code:java} > user@host [] /tmp $ spark-shell --master yarn > [...] > scala> (1 to 1000).toDF.write.parquet("hdfs:///tmp/sample_1") > scala> (1 to > 1000).toDF.repartition(42).write.parquet("hdfs:///tmp/sample_2"){code} > h4. Reader console log: > {code:java} > user@host [] /tmp $ spark-shell --master local[2] > [...] > scala> spark.read.parquet("hdfs:///tmp/sample_2").count > res2: Long = 1000 > scala> spark.read.parquet("hdfs:///tmp/sample_1").count > [Stage 5:=> (1 + 0) / > 2] > user@host [] /tmp $ spark-shell --master local[1] > [...] > scala> spark.read.parquet("hdfs:///tmp/sample_1").count > res0: Long = 1000 > {code} > > h4. 
Thread dump of locked thread > {code:java} > "main" #1 prio=5 os_prio=0 tid=0x7f93b8054000 nid=0x6dce waiting on > condition [0x7f93c0658000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xeb65eab8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242) > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258) > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187) > at > org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:334) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:859) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2261) > at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) > at org.apache.spark.rdd.RDD$$Lambda$2193/1084000875.apply(Unknown > Source) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) > at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:390) > at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:3006) > at 
> org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:3005) > at org.apache.spark.sql.Dataset$$Lambda$2847/937335652.apply(Unknown > Source) > at > org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687) > at org.apache.spark.sql.Dataset$$Lambda$2848/1831604445.apply(Unknown > Source) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$$$Lambda$2853/2038636888.apply(Unknown > Source) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) >
[jira] [Created] (SPARK-39239) Parquet written by Spark in yarn mode cannot be read by Spark in local[2+] mode
kondziolka9ld created SPARK-39239: - Summary: Parquet written by Spark in yarn mode cannot be read by Spark in local[2+] mode Key: SPARK-39239 URL: https://issues.apache.org/jira/browse/SPARK-39239 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.2 Reporter: kondziolka9ld
Hi, I came across a strange issue: data written by Spark in yarn mode cannot be read by Spark in local[2+] mode. By "cannot be read" I mean that the read operation hangs forever. Strangely enough, local[1] is able to read the same parquet data. Additionally, repartitioning the data before writing works around the problem. I attached a thread dump; the thread is in fact waiting on a latch. I am not sure whether this is a bug or some kind of misconfiguration or misunderstanding.
h4. Reproduction scenario:
h4. Writer console log:
{code:java}
user@host [] /tmp $ spark-shell --master yarn
[...]
scala> (1 to 1000).toDF.write.parquet("hdfs:///tmp/sample_1")
scala> (1 to 1000).toDF.repartition(42).write.parquet("hdfs:///tmp/sample_2")
{code}
h4. Reader console log:
{code:java}
user@host [] /tmp $ spark-shell --master local[2]
[...]
scala> spark.read.parquet("hdfs:///tmp/sample_2").count
res2: Long = 1000
scala> spark.read.parquet("hdfs:///tmp/sample_1").count
[Stage 5:=>                                                    (1 + 0) / 2]
user@host [] /tmp $ spark-shell --master local[1]
[...]
scala> spark.read.parquet("hdfs:///tmp/sample_1").count
res0: Long = 1000
{code}
h4.
Thread dump of locked thread {code:java} "main" #1 prio=5 os_prio=0 tid=0x7f93b8054000 nid=0x6dce waiting on condition [0x7f93c0658000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xeb65eab8> (a scala.concurrent.impl.Promise$CompletionLatch) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187) at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:334) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:859) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2261) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) at org.apache.spark.rdd.RDD$$Lambda$2193/1084000875.apply(Unknown Source) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:390) at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:3006) at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:3005) at 
org.apache.spark.sql.Dataset$$Lambda$2847/937335652.apply(Unknown Source) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687) at org.apache.spark.sql.Dataset$$Lambda$2848/1831604445.apply(Unknown Source) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2853/2038636888.apply(Unknown Source) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.execution.SQLExecution$$$Lambda$2849/1622269832.apply(Unknown Source) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685) at org.apache.spark.sql.Dataset.co