[jira] [Resolved] (SPARK-36643) Add more information in ERROR log while SparkConf is modified when spark.sql.legacy.setCommandRejectsSparkCoreConfs is set
[ https://issues.apache.org/jira/browse/SPARK-36643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36643. --- Fix Version/s: 3.3.0 Assignee: Senthil Kumar Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/33894 > Add more information in ERROR log while SparkConf is modified when > spark.sql.legacy.setCommandRejectsSparkCoreConfs is set > -- > > Key: SPARK-36643 > URL: https://issues.apache.org/jira/browse/SPARK-36643 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2 >Reporter: Senthil Kumar >Assignee: Senthil Kumar >Priority: Minor > Fix For: 3.3.0 > > > Right now, spark.sql.legacy.setCommandRejectsSparkCoreConfs is set to true by default in Spark 3.* versions in order to prevent changing Spark core confs. But the current error message leaves users unsure whether Spark confs can be modified in Spark 3.* at all. > Current Error Message: > {code:java} > Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot > modify the value of a Spark config: spark.driver.host > at > org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:156) > at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:40){code} > > So adding a little more information (how to modify a Spark conf) to the ERROR log emitted when a SparkConf is modified while > spark.sql.legacy.setCommandRejectsSparkCoreConfs is 'true' would help avoid confusion. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
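The requested improvement is easiest to see in a sketch. The snippet below is hypothetical Python, not Spark's actual code (the real check lives in `RuntimeConfig.requireNonStaticConf`, in Scala); `CORE_CONFS` and `set_conf` are illustrative names. It shows the idea of extending the rejection message with guidance on how such confs can legitimately be set:

```python
# Hypothetical sketch of SPARK-36643's improved error message: when a SET
# command touches a Spark core conf, explain *how* it can be set instead of
# only rejecting it. CORE_CONFS is an illustrative subset, not Spark's list.
CORE_CONFS = {"spark.driver.host", "spark.driver.memory"}

def set_conf(key: str, value: str, session_confs: dict) -> None:
    if key in CORE_CONFS:
        raise ValueError(
            f"Cannot modify the value of a Spark config: {key}. "
            "Spark core configs can only be set at launch time, e.g. via "
            "SparkSession.builder.config(...) or spark-submit --conf."
        )
    session_confs[key] = value  # ordinary runtime confs are still mutable
```

The point of the JIRA is only the second sentence of the message: the rejection itself is unchanged.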
[jira] [Assigned] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec
[ https://issues.apache.org/jira/browse/SPARK-36666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-36666: - Assignee: Andy Grove > [SQL] Regression in AQEShuffleReadExec > -- > > Key: SPARK-36666 > URL: https://issues.apache.org/jira/browse/SPARK-36666 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Blocker > > I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark > 3.2 release candidate and there is a regression in AQEShuffleReadExec where > it now throws an exception if the shuffle's output partitioning does not > match a specific list of schemes. > The problem can be solved by returning UnknownPartitioning, as it already > does in some cases, rather than throwing an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
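The fix direction described in the report can be sketched as follows. This is Python for illustration only; the real change is inside Spark's Scala class `AQEShuffleReadExec`, and `KNOWN_SCHEMES` is an invented stand-in for the specific list of partitioning schemes mentioned above:

```python
# Illustrative sketch (not Spark code) of returning an "unknown" partitioning
# for unrecognized schemes instead of throwing, as the report suggests.
KNOWN_SCHEMES = {"hash", "range"}  # hypothetical subset for illustration

def output_partitioning(scheme: str, num_partitions: int) -> tuple:
    if scheme in KNOWN_SCHEMES:
        # Recognized scheme: preserve the shuffle's partitioning as-is.
        return (scheme, num_partitions)
    # Proposed fix: fall back to UnknownPartitioning rather than raising,
    # so plugin-provided partitionings (e.g. from the RAPIDS Accelerator)
    # still plan successfully.
    return ("unknown", num_partitions)
```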
[jira] [Updated] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec
[ https://issues.apache.org/jira/browse/SPARK-36666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36666: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Bug) > [SQL] Regression in AQEShuffleReadExec > -- > > Key: SPARK-36666 > URL: https://issues.apache.org/jira/browse/SPARK-36666 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Andy Grove >Priority: Blocker > > I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark > 3.2 release candidate and there is a regression in AQEShuffleReadExec where > it now throws an exception if the shuffle's output partitioning does not > match a specific list of schemes. > The problem can be solved by returning UnknownPartitioning, as it already > does in some cases, rather than throwing an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36653) Implement Series.__xor__
[ https://issues.apache.org/jira/browse/SPARK-36653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36653: Assignee: (was: Apache Spark) > Implement Series.__xor__ > > > Key: SPARK-36653 > URL: https://issues.apache.org/jira/browse/SPARK-36653 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36653) Implement Series.__xor__
[ https://issues.apache.org/jira/browse/SPARK-36653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409857#comment-17409857 ] Apache Spark commented on SPARK-36653: -- User 'dgd-contributor' has created a pull request for this issue: https://github.com/apache/spark/pull/33911 > Implement Series.__xor__ > > > Key: SPARK-36653 > URL: https://issues.apache.org/jira/browse/SPARK-36653 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36653) Implement Series.__xor__
[ https://issues.apache.org/jira/browse/SPARK-36653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36653: Assignee: Apache Spark > Implement Series.__xor__ > > > Key: SPARK-36653 > URL: https://issues.apache.org/jira/browse/SPARK-36653 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
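For context, the expected semantics of an element-wise xor with SQL-style null propagation can be sketched in plain Python. This is an illustration of the behavior `Series.__xor__` would follow, not the pandas-on-Spark implementation:

```python
# Sketch of element-wise xor with null (None) propagation, the usual
# three-valued-logic behavior for pandas-on-Spark boolean operators.
def xor_elem(a, b):
    if a is None or b is None:
        return None          # null propagates, as in SQL
    return bool(a) ^ bool(b)

def series_xor(left, right):
    """Element-wise xor of two equally-long sequences."""
    return [xor_elem(a, b) for a, b in zip(left, right)]
```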
[jira] [Commented] (SPARK-36667) Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
[ https://issues.apache.org/jira/browse/SPARK-36667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409779#comment-17409779 ] Jungtaek Lim commented on SPARK-36667: -- Will submit a PR soon. > Close resources properly in StateStoreSuite/RocksDBStateStoreSuite > -- > > Key: SPARK-36667 > URL: https://issues.apache.org/jira/browse/SPARK-36667 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > The StateStoreProvider instances created from "newStoreProvider" are NOT > automatically closed. > While this is trivial for HDFSBackedStateStoreProvider, for > RocksDBStateStoreProvider we also leak the RocksDB instance, which should have > been closed. Most tests in RocksDBStateStoreSuite initialize > RocksDBStateStoreProvider, meaning that 60+ RocksDB instances are not closed > in the suite. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36667) Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
Jungtaek Lim created SPARK-36667: Summary: Close resources properly in StateStoreSuite/RocksDBStateStoreSuite Key: SPARK-36667 URL: https://issues.apache.org/jira/browse/SPARK-36667 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.2.0 Reporter: Jungtaek Lim The StateStoreProvider instances created from "newStoreProvider" are NOT automatically closed. While this is trivial for HDFSBackedStateStoreProvider, for RocksDBStateStoreProvider we also leak the RocksDB instance, which should have been closed. Most tests in RocksDBStateStoreSuite initialize RocksDBStateStoreProvider, meaning that 60+ RocksDB instances are not closed in the suite. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
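One common way to guarantee this kind of cleanup in tests is a scoped helper that always closes the provider on exit. The sketch below is plain Python with invented names (the actual suites are Scala); it shows the pattern, not the real fix:

```python
import contextlib

# Toy stand-in for a state store provider holding a native resource.
class FakeStoreProvider:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

# Scoped factory: every provider a test obtains through this helper is
# closed when the block exits, even if the test body raises.
@contextlib.contextmanager
def new_store_provider():
    provider = FakeStoreProvider()
    try:
        yield provider
    finally:
        provider.close()
```

Scala tests achieve the same thing with a `tryWithResource`-style loan function wrapped around the test body.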
[jira] [Updated] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec
[ https://issues.apache.org/jira/browse/SPARK-36666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-36666: -- Priority: Blocker (was: Major) > [SQL] Regression in AQEShuffleReadExec > -- > > Key: SPARK-36666 > URL: https://issues.apache.org/jira/browse/SPARK-36666 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Andy Grove >Priority: Blocker > > I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark > 3.2 release candidate and there is a regression in AQEShuffleReadExec where > it now throws an exception if the shuffle's output partitioning does not > match a specific list of schemes. > The problem can be solved by returning UnknownPartitioning, as it already > does in some cases, rather than throwing an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec
[ https://issues.apache.org/jira/browse/SPARK-36666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36666: Assignee: Apache Spark > [SQL] Regression in AQEShuffleReadExec > -- > > Key: SPARK-36666 > URL: https://issues.apache.org/jira/browse/SPARK-36666 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Andy Grove >Assignee: Apache Spark >Priority: Major > > I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark > 3.2 release candidate and there is a regression in AQEShuffleReadExec where > it now throws an exception if the shuffle's output partitioning does not > match a specific list of schemes. > The problem can be solved by returning UnknownPartitioning, as it already > does in some cases, rather than throwing an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec
[ https://issues.apache.org/jira/browse/SPARK-36666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409761#comment-17409761 ] Apache Spark commented on SPARK-36666: -- User 'andygrove' has created a pull request for this issue: https://github.com/apache/spark/pull/33910 > [SQL] Regression in AQEShuffleReadExec > -- > > Key: SPARK-36666 > URL: https://issues.apache.org/jira/browse/SPARK-36666 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Andy Grove >Priority: Major > > I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark > 3.2 release candidate and there is a regression in AQEShuffleReadExec where > it now throws an exception if the shuffle's output partitioning does not > match a specific list of schemes. > The problem can be solved by returning UnknownPartitioning, as it already > does in some cases, rather than throwing an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec
[ https://issues.apache.org/jira/browse/SPARK-36666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36666: Assignee: (was: Apache Spark) > [SQL] Regression in AQEShuffleReadExec > -- > > Key: SPARK-36666 > URL: https://issues.apache.org/jira/browse/SPARK-36666 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Andy Grove >Priority: Major > > I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark > 3.2 release candidate and there is a regression in AQEShuffleReadExec where > it now throws an exception if the shuffle's output partitioning does not > match a specific list of schemes. > The problem can be solved by returning UnknownPartitioning, as it already > does in some cases, rather than throwing an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec
Andy Grove created SPARK-36666: -- Summary: [SQL] Regression in AQEShuffleReadExec Key: SPARK-36666 URL: https://issues.apache.org/jira/browse/SPARK-36666 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Andy Grove I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark 3.2 release candidate and there is a regression in AQEShuffleReadExec where it now throws an exception if the shuffle's output partitioning does not match a specific list of schemes. The problem can be solved by returning UnknownPartitioning, as it does in some cases, rather than throwing an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec
[ https://issues.apache.org/jira/browse/SPARK-36666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated SPARK-36666: --- Description: I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark 3.2 release candidate and there is a regression in AQEShuffleReadExec where it now throws an exception if the shuffle's output partitioning does not match a specific list of schemes. The problem can be solved by returning UnknownPartitioning, as it already does in some cases, rather than throwing an exception. was: I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark 3.2 release candidate and there is a regression in AQEShuffleReadExec where it now throws an exception if the shuffle's output partitioning does not match a specific list of schemes. The problem can be solved by returning UnknownPartitioning, as it does in some cases, rather than throwing an exception. > [SQL] Regression in AQEShuffleReadExec > -- > > Key: SPARK-36666 > URL: https://issues.apache.org/jira/browse/SPARK-36666 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Andy Grove >Priority: Major > > I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark > 3.2 release candidate and there is a regression in AQEShuffleReadExec where > it now throws an exception if the shuffle's output partitioning does not > match a specific list of schemes. > The problem can be solved by returning UnknownPartitioning, as it already > does in some cases, rather than throwing an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36665) Add more Not operator optimizations
[ https://issues.apache.org/jira/browse/SPARK-36665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409746#comment-17409746 ] Kazuyuki Tanimura commented on SPARK-36665: --- I am working on this > Add more Not operator optimizations > --- > > Key: SPARK-36665 > URL: https://issues.apache.org/jira/browse/SPARK-36665 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Kazuyuki Tanimura >Priority: Major > > {{BooleanSimplification should be able to do more simplifications for Not > operators by applying the following rules}} > # {{Not(null) == null}} > ## {{e.g. IsNull(Not(...)) can be IsNull(...)}} > # {{(Not(a) = b) == (a = Not(b))}} > ## {{e.g. Not(...) = true can be (...) = false}} > # {{(a != b) == (a = Not(b))}} > ## {{e.g. (...) != true can be (...) = false}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36665) Add more Not operator optimizations
Kazuyuki Tanimura created SPARK-36665: - Summary: Add more Not operator optimizations Key: SPARK-36665 URL: https://issues.apache.org/jira/browse/SPARK-36665 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.2, 3.2.0, 3.3.0 Reporter: Kazuyuki Tanimura {{BooleanSimplification should be able to do more simplifications for Not operators by applying the following rules}} # {{Not(null) == null}} ## {{e.g. IsNull(Not(...)) can be IsNull(...)}} # {{(Not(a) = b) == (a = Not(b))}} ## {{e.g. Not(...) = true can be (...) = false}} # {{(a != b) == (a = Not(b))}} ## {{e.g. (...) != true can be (...) = false}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
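The three rewrites listed above can be demonstrated with a toy simplifier. This is a Python sketch over tuple-encoded expressions; Spark's BooleanSimplification is a Catalyst rule written in Scala, and the encoding here is invented for illustration:

```python
# Toy rewriter for the three Not-related rules. Expressions are tuples:
# ("not", e), ("isnull", e), ("eq", e, literal), ("neq", e, literal).
def simplify(expr):
    op = expr[0]
    # Rule 1: Not(null) == null, so IsNull(Not(e)) simplifies to IsNull(e).
    if op == "isnull" and expr[1][0] == "not":
        return ("isnull", expr[1][1])
    # Rule 2: (Not(a) = b) == (a = Not(b)); e.g. Not(e) = true -> e = false.
    if op == "eq" and expr[1][0] == "not":
        return ("eq", expr[1][1], not expr[2])
    # Rule 3: (a != b) == (a = Not(b)); e.g. e != true -> e = false.
    if op == "neq":
        return ("eq", expr[1], not expr[2])
    return expr
```

Each rewrite removes a Not (or a !=), which in turn can unlock further simplifications and predicate pushdown.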
[jira] [Resolved] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0
[ https://issues.apache.org/jira/browse/SPARK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-36655. --- Fix Version/s: 3.3.0 Assignee: Xinrong Meng Resolution: Fixed Issue resolved by pull request 33901 https://github.com/apache/spark/pull/33901 > Add `versionadded` for API added in Spark 3.3.0 > --- > > Key: SPARK-36655 > URL: https://issues.apache.org/jira/browse/SPARK-36655 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36401) Implement Series.cov
[ https://issues.apache.org/jira/browse/SPARK-36401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-36401. --- Fix Version/s: 3.3.0 Assignee: dgd_contributor Resolution: Fixed Issue resolved by pull request 33752 https://github.com/apache/spark/pull/33752 > Implement Series.cov > > > Key: SPARK-36401 > URL: https://issues.apache.org/jira/browse/SPARK-36401 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: dgd_contributor >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409634#comment-17409634 ] Dongjoon Hyun commented on SPARK-36659: --- Although RC2 will fail, I set the fix version to 3.2.1 because the RC vote is still open. > Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.2.1 > > > spark.sql.execution.topKSortFallbackThreshold is currently an internal config > hidden from users, with Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if K is very big, there can be performance issues. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
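What the threshold controls can be illustrated with a sketch (Python, illustrative only, not Spark's planner code): below the threshold, an ORDER BY ... LIMIT K can be answered with a bounded top-K pass; at or above it, Spark falls back to a full sort followed by a limit, which is much more expensive when K is huge:

```python
import heapq

# Sketch of the planning decision behind topKSortFallbackThreshold:
# small K -> top-K without a full sort; K >= threshold -> full sort + limit.
def top_k(rows, k, threshold):
    if k < threshold:
        # Bounded-heap top-K: O(n log k), no full sort of the input.
        return heapq.nsmallest(k, rows)
    # Fallback path the config guards against for very large K:
    # O(n log n) full sort, then take the first k rows.
    return sorted(rows)[:k]
```

Both branches return the same rows; the config only trades off how they are computed, which is why exposing it to users matters for tuning.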
[jira] [Updated] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36659: -- Fix Version/s: (was: 3.3.0) 3.2.1 > Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.2.1 > > > spark.sql.execution.topKSortFallbackThreshold now is an internal config > hidden from users Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if the K is very big, there would be performance issues. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36659: -- Fix Version/s: (was: 3.2.0) 3.3.0 > Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.3.0 > > > spark.sql.execution.topKSortFallbackThreshold now is an internal config > hidden from users Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if the K is very big, there would be performance issues. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36659: -- Fix Version/s: (was: 3.3.0) 3.2.0 > Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.2.0 > > > spark.sql.execution.topKSortFallbackThreshold now is an internal config > hidden from users Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if the K is very big, there would be performance issues. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36639) SQL sequence function with interval returns unexpected error in latest versions
[ https://issues.apache.org/jira/browse/SPARK-36639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409620#comment-17409620 ] Apache Spark commented on SPARK-36639: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33909 > SQL sequence function with interval returns unexpected error in latest > versions > --- > > Key: SPARK-36639 > URL: https://issues.apache.org/jira/browse/SPARK-36639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2 >Reporter: Ignatiy Vdovichenko >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > For example, this returns > {color:#FF}java.lang.ArrayIndexOutOfBoundsException: 1 {color} > {code:java} > select sequence( > date_trunc('month', '2021-08-30'), > date_trunc('month', '2021-08-15'), > - interval 1 month){code} > Other similar cases are all OK: > {code:java} > select sequence( > date_trunc('month', '2021-07-15'), > date_trunc('month', '2021-08-30'), > interval 1 month) as x > , sequence( > date_trunc('month', '2021-08-30'), > date_trunc('month', '2021-07-15'), > - interval 1 month) as y > , sequence( > date_trunc('month', '2021-08-15'), > date_trunc('month', '2021-08-30'), > interval 1 month) as z{code} > In version 3.0.0 this works -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
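For reference, here is a small Python model of what `sequence` with a month interval should return, including the start == stop, negative-step case that triggered the exception (both dates truncate to 2021-08-01, so the expected result is a single-element array). This is an illustrative reimplementation, not Spark's code:

```python
from datetime import date

# Reference model of sequence(start, stop, interval N months) for
# month-granularity inputs. A zero step is rejected, and start == stop
# yields [start] regardless of the step's sign.
def month_seq(start: date, stop: date, step_months: int):
    if step_months == 0:
        raise ValueError("step must be non-zero")
    out, cur = [], start
    while (cur <= stop) if step_months > 0 else (cur >= stop):
        out.append(cur)
        # Advance by step_months using months-since-year-0 arithmetic.
        total = cur.year * 12 + (cur.month - 1) + step_months
        cur = date(total // 12, total % 12 + 1, cur.day)
    return out
```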
[jira] [Created] (SPARK-36664) Log time spent waiting for cluster resources
Holden Karau created SPARK-36664: Summary: Log time spent waiting for cluster resources Key: SPARK-36664 URL: https://issues.apache.org/jira/browse/SPARK-36664 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.0 Reporter: Holden Karau To provide better visibility into why jobs might be running slow it would be useful to log when we are waiting for resources and how long we are waiting for resources so if there is an underlying cluster issue the user can be aware. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
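The proposal could look roughly like the following sketch. This is a hypothetical Python helper, not Spark's API; the function name, polling approach, and log format are all assumptions used to illustrate the idea of surfacing resource-wait time:

```python
import logging
import time

# Hypothetical helper: block until the cluster reports resources ready,
# then log how long the job spent waiting so slow clusters are visible.
def wait_for_resources(ready, poll_s=0.01, logger=logging.getLogger("scheduler")):
    start = time.monotonic()
    while not ready():
        time.sleep(poll_s)
    waited = time.monotonic() - start
    logger.info("Waited %.3f s for cluster resources", waited)
    return waited
```

In Spark itself this would more likely be woven into the existing scheduler/executor-allocation logging rather than a standalone polling loop.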
[jira] [Commented] (SPARK-36622) spark.history.kerberos.principal doesn't take value _HOST
[ https://issues.apache.org/jira/browse/SPARK-36622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409617#comment-17409617 ] pralabhkumar commented on SPARK-36622: -- [~thejdeep] It's better to have _HOST; it's been common practice for HiveServer and similar projects. [~tgraves] Agreed. Please let me know if you are OK with it, and I can create the PR. > spark.history.kerberos.principal doesn't take value _HOST > - > > Key: SPARK-36622 > URL: https://issues.apache.org/jira/browse/SPARK-36622 > Project: Spark > Issue Type: Improvement > Components: Deploy, Security, Spark Core >Affects Versions: 3.0.1, 3.1.2 >Reporter: pralabhkumar >Priority: Minor > > spark.history.kerberos.principal doesn't understand the value _HOST. > It fails with: failure to login for principal : spark/_HOST@realm . > It would be helpful to accept the _HOST value from the config file and replace it with > the current hostname (similar to what Hive does). This would also help to run SHS > on multiple machines without hardcoding the principal hostname. > spark.history.kerberos.principal > > This requires a minor change in HistoryServer.scala in the initSecurity method. > > Please let me know if this request makes sense, and I'll create the PR. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
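The requested _HOST handling, modeled on what Hadoop's SecurityUtil.getServerPrincipal does for Hive and other services, can be sketched as follows (illustrative Python; the actual change would be in HistoryServer.scala, and `expand_principal` is an invented name):

```python
import socket

# Expand the _HOST token in a Kerberos principal to this machine's
# fully-qualified hostname, so one config works across SHS hosts.
def expand_principal(principal, hostname=None):
    host = (hostname or socket.getfqdn()).lower()
    return principal.replace("_HOST", host)
```

With this, `spark.history.kerberos.principal=spark/_HOST@REALM` would resolve correctly on every machine the History Server runs on.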
[jira] [Commented] (SPARK-36639) SQL sequence function with interval returns unexpected error in latest versions
[ https://issues.apache.org/jira/browse/SPARK-36639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409539#comment-17409539 ] Kousuke Saruta commented on SPARK-36639: Issue resolved in https://github.com/apache/spark/pull/33895 for 3.1 and 3.2. > SQL sequence function with interval returns unexpected error in latest > versions > --- > > Key: SPARK-36639 > URL: https://issues.apache.org/jira/browse/SPARK-36639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2 >Reporter: Ignatiy Vdovichenko >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > For example this returns > {color:#FF}java.lang.ArrayIndexOutOfBoundsException: 1 {color} > {code:java} > select sequence( > date_trunc('month', '2021-08-30'), > date_trunc('month', '2021-08-15'), > - interval 1 month){code} > Another cases like - all ok > {code:java} > select sequence( > date_trunc('month', '2021-07-15'), > date_trunc('month', '2021-08-30'), > interval 1 month) as x > , sequence( > date_trunc('month', '2021-08-30'), > date_trunc('month', '2021-07-15'), > - interval 1 month) as y > , sequence( > date_trunc('month', '2021-08-15'), > date_trunc('month', '2021-08-30'), > interval 1 month) as z{code} > In version 3.0.0 this works -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36639) SQL sequence function with interval returns unexpected error in latest versions
[ https://issues.apache.org/jira/browse/SPARK-36639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36639. Assignee: Kousuke Saruta Resolution: Fixed > SQL sequence function with interval returns unexpected error in latest > versions > --- > > Key: SPARK-36639 > URL: https://issues.apache.org/jira/browse/SPARK-36639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2 >Reporter: Ignatiy Vdovichenko >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > For example this returns > {color:#FF}java.lang.ArrayIndexOutOfBoundsException: 1 {color} > {code:java} > select sequence( > date_trunc('month', '2021-08-30'), > date_trunc('month', '2021-08-15'), > - interval 1 month){code} > Another cases like - all ok > {code:java} > select sequence( > date_trunc('month', '2021-07-15'), > date_trunc('month', '2021-08-30'), > interval 1 month) as x > , sequence( > date_trunc('month', '2021-08-30'), > date_trunc('month', '2021-07-15'), > - interval 1 month) as y > , sequence( > date_trunc('month', '2021-08-15'), > date_trunc('month', '2021-08-30'), > interval 1 month) as z{code} > In version 3.0.0 this works -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36639) SQL sequence function with interval returns unexpected error in latest versions
[ https://issues.apache.org/jira/browse/SPARK-36639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36639: --- Fix Version/s: 3.1.3 3.2.0 > SQL sequence function with interval returns unexpected error in latest > versions > --- > > Key: SPARK-36639 > URL: https://issues.apache.org/jira/browse/SPARK-36639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2 >Reporter: Ignatiy Vdovichenko >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > For example, this returns > {color:#FF}java.lang.ArrayIndexOutOfBoundsException: 1 {color} > {code:java} > select sequence( > date_trunc('month', '2021-08-30'), > date_trunc('month', '2021-08-15'), > - interval 1 month){code} > Other cases like these are all OK > {code:java} > select sequence( > date_trunc('month', '2021-07-15'), > date_trunc('month', '2021-08-30'), > interval 1 month) as x > , sequence( > date_trunc('month', '2021-08-30'), > date_trunc('month', '2021-07-15'), > - interval 1 month) as y > , sequence( > date_trunc('month', '2021-08-15'), > date_trunc('month', '2021-08-30'), > interval 1 month) as z{code} > In version 3.0.0 this works. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
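The failing boundary case in SPARK-36639 is easier to see in isolation. Below is a minimal Python sketch of inclusive month-stepped sequence semantics (the function name and the simplified month arithmetic are illustrative, not Spark's implementation): when both bounds truncate to the same month and the step is -1 month, the expected result is a one-element sequence rather than an ArrayIndexOutOfBoundsException.

```python
from datetime import date

def month_seq(start: date, stop: date, step: int) -> list[date]:
    """Inclusive sequence of dates advancing by `step` months.

    Assumes the day-of-month stays valid in every visited month, which
    holds for date_trunc('month', ...) inputs (day is always 1).
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    out, cur = [], start
    while (cur <= stop) if step > 0 else (cur >= stop):
        out.append(cur)
        months = cur.month - 1 + step
        cur = date(cur.year + months // 12, months % 12 + 1, cur.day)
    return out

# The failing case: both bounds truncate to 2021-08-01 and the step is
# -1 month, so the result should be a single-element sequence.
print(month_seq(date(2021, 8, 1), date(2021, 8, 1), -1))
```
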
[jira] [Commented] (SPARK-36663) When the existing field name is a number, an error will be reported when reading the orc file
[ https://issues.apache.org/jira/browse/SPARK-36663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409492#comment-17409492 ] mcdull_zhang commented on SPARK-36663: -- cc [~hyukjin.kwon] [~cloud_fan] > When the existing field name is a number, an error will be reported when > reading the orc file > - > > Key: SPARK-36663 > URL: https://issues.apache.org/jira/browse/SPARK-36663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2 >Reporter: mcdull_zhang >Priority: Critical > Attachments: image-2021-09-03-20-56-28-846.png > > > You can use the following methods to reproduce the problem: > {quote}val path = "file:///tmp/test_orc" > spark.range(1).withColumnRenamed("id", "100").repartition(1).write.orc(path) > spark.read.orc(path) > {quote} > The error message is like this: > {quote}org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '100' expecting {'ADD', 'AFTER' > == SQL == > struct<100:bigint> > ---^^^ > {quote} > The error is actually issued by this line of code: > {quote}CatalystSqlParser.parseDataType("100:bigint") > {quote} > > The specific background is that spark calls the above code in the process of > converting the schema of the orc file into the catalyst schema. 
> {quote}// code in OrcUtils > private def toCatalystSchema(schema: TypeDescription): StructType = > { > CharVarcharUtils.replaceCharVarcharWithStringInSchema(CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]) > }{quote} > There are two solutions I currently think of: > # Modify the syntax analysis of SparkSQL to identify this kind of schema > # The TypeDescription.toString method should add the quote symbol to the > numeric column name, because the following syntax is supported: > {quote}CatalystSqlParser.parseDataType("`100`:bigint") > {quote} > But currently TypeDescription does not support changing the UNQUOTED_NAMES > variable; should we first submit a PR to the ORC project to support the > configuration of this variable? > !image-2021-09-03-20-56-28-846.png! > > What do Spark members think about this issue? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36663) When the existing field name is a number, an error will be reported when reading the orc file
[ https://issues.apache.org/jira/browse/SPARK-36663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mcdull_zhang updated SPARK-36663: - Description: You can use the following methods to reproduce the problem: {quote}val path = "file:///tmp/test_orc" spark.range(1).withColumnRenamed("id", "100").repartition(1).write.orc(path) spark.read.orc(path) {quote} The error message is like this: {quote}org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '100' expecting {'ADD', 'AFTER' == SQL == struct<100:bigint> ---^^^ {quote} The error is actually issued by this line of code: {quote}CatalystSqlParser.parseDataType("100:bigint") {quote} The specific background is that spark calls the above code in the process of converting the schema of the orc file into the catalyst schema. {quote}// code in OrcUtils private def toCatalystSchema(schema: TypeDescription): StructType = { CharVarcharUtils.replaceCharVarcharWithStringInSchema(CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]) }{quote} There are two solutions I currently think of: # Modify the syntax analysis of SparkSQL to identify this kind of schema # The TypeDescription.toString method should add the quote symbol to the numeric column name, because the following syntax is supported: {quote}CatalystSqlParser.parseDataType("`100`:bigint") {quote} But currently TypeDescription does not support changing the UNQUOTED_NAMES variable; should we first submit a PR to the ORC project to support the configuration of this variable? !image-2021-09-03-20-56-28-846.png! What do Spark members think about this issue? 
was: You can use the following methods to reproduce the problem: {quote}val path = "file:///tmp/test_orc" spark.range(1).withColumnRenamed("id", "100").repartition(1).write.orc(path) spark.read.orc(path) {quote} The error message is like this: {quote}org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '100' expecting {'ADD', 'AFTER' == SQL == struct<100:bigint> ---^^^ {quote} The error is actually issued by this line of code: {quote}CatalystSqlParser.parseDataType("100:bigint") {quote} The specific background is that spark calls the above code in the process of converting the schema of the orc file into the catalyst schema. {quote}// code in OrcUtils private def toCatalystSchema(schema: TypeDescription): StructType = { CharVarcharUtils.replaceCharVarcharWithStringInSchema(CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]) }{quote} There are two solutions I currently think of: # Modify the syntax analysis of SparkSQL to identify this kind of schema # The TypeDescription.toString method should add the quote symbol to the numeric column name, because the following syntax is supported: {quote}CatalystSqlParser.parseDataType("`100`:bigint"){quote} But currently TypeDescription does not support changing the UNQUOTED_NAMES variable, should we first submit a pr to the orc project to support the configuration of this variable。 !image-2021-09-03-20-53-35-626.png! How do spark members think about this issue? 
> When the existing field name is a number, an error will be reported when > reading the orc file > - > > Key: SPARK-36663 > URL: https://issues.apache.org/jira/browse/SPARK-36663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2 >Reporter: mcdull_zhang >Priority: Critical > Attachments: image-2021-09-03-20-56-28-846.png > > > You can use the following methods to reproduce the problem: > {quote}val path = "file:///tmp/test_orc" > spark.range(1).withColumnRenamed("id", "100").repartition(1).write.orc(path) > spark.read.orc(path) > {quote} > The error message is like this: > {quote}org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '100' expecting {'ADD', 'AFTER' > == SQL == > struct<100:bigint> > ---^^^ > {quote} > The error is actually issued by this line of code: > {quote}CatalystSqlParser.parseDataType("100:bigint") > {quote} > > The specific background is that spark calls the above code in the process of > converting the schema of the orc file into the catalyst schema. > {quote}// code in OrcUtils > private def toCatalystSchema(schema: TypeDescription): StructType = > { > CharVarcharUtils.replaceCharVarcharWithStringInSchema(CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]) > }{quote} > There are two solutions I currently think of: > # Modify the syntax analysis of SparkSQL to identify this kind of schema > # The TypeDescription.toString method should add the quote symbol to the > numeric column name, because the following syntax is supported:
[jira] [Updated] (SPARK-36663) When the existing field name is a number, an error will be reported when reading the orc file
[ https://issues.apache.org/jira/browse/SPARK-36663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mcdull_zhang updated SPARK-36663: - Attachment: image-2021-09-03-20-56-28-846.png > When the existing field name is a number, an error will be reported when > reading the orc file > - > > Key: SPARK-36663 > URL: https://issues.apache.org/jira/browse/SPARK-36663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2 >Reporter: mcdull_zhang >Priority: Critical > Attachments: image-2021-09-03-20-56-28-846.png > > > You can use the following methods to reproduce the problem: > {quote}val path = "file:///tmp/test_orc" > spark.range(1).withColumnRenamed("id", "100").repartition(1).write.orc(path) > spark.read.orc(path) > {quote} > The error message is like this: > {quote}org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '100' expecting {'ADD', 'AFTER' > == SQL == > struct<100:bigint> > ---^^^ > {quote} > The error is actually issued by this line of code: > {quote}CatalystSqlParser.parseDataType("100:bigint") > {quote} > > The specific background is that spark calls the above code in the process of > converting the schema of the orc file into the catalyst schema. 
> {quote}// code in OrcUtils > private def toCatalystSchema(schema: TypeDescription): StructType = { > > CharVarcharUtils.replaceCharVarcharWithStringInSchema(CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]) > }{quote} > There are two solutions I currently think of: > # Modify the syntax analysis of SparkSQL to identify this kind of schema > # The TypeDescription.toString method should add the quote symbol to the > numeric column name, because the following syntax is supported: > {quote}CatalystSqlParser.parseDataType("`100`:bigint"){quote} > But currently TypeDescription does not support changing the UNQUOTED_NAMES > variable; should we first submit a PR to the ORC project to support the > configuration of this variable? > !image-2021-09-03-20-53-35-626.png! > > What do Spark members think about this issue? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36663) When the existing field name is a number, an error will be reported when reading the orc file
mcdull_zhang created SPARK-36663: Summary: When the existing field name is a number, an error will be reported when reading the orc file Key: SPARK-36663 URL: https://issues.apache.org/jira/browse/SPARK-36663 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2, 3.0.3 Reporter: mcdull_zhang You can use the following methods to reproduce the problem: {quote}val path = "file:///tmp/test_orc" spark.range(1).withColumnRenamed("id", "100").repartition(1).write.orc(path) spark.read.orc(path) {quote} The error message is like this: {quote}org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '100' expecting {'ADD', 'AFTER' == SQL == struct<100:bigint> ---^^^ {quote} The error is actually issued by this line of code: {quote}CatalystSqlParser.parseDataType("100:bigint") {quote} The specific background is that spark calls the above code in the process of converting the schema of the orc file into the catalyst schema. {quote}// code in OrcUtils private def toCatalystSchema(schema: TypeDescription): StructType = { CharVarcharUtils.replaceCharVarcharWithStringInSchema(CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]) }{quote} There are two solutions I currently think of: # Modify the syntax analysis of SparkSQL to identify this kind of schema # The TypeDescription.toString method should add the quote symbol to the numeric column name, because the following syntax is supported: {quote}CatalystSqlParser.parseDataType("`100`:bigint"){quote} But currently TypeDescription does not support changing the UNQUOTED_NAMES variable; should we first submit a PR to the ORC project to support the configuration of this variable? !image-2021-09-03-20-53-35-626.png! What do Spark members think about this issue? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
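Solution 2 above (quoting numeric column names before the schema string reaches the parser) can be sketched in a few lines of Python. The helper name and the identifier pattern below are illustrative, not ORC's actual TypeDescription logic:

```python
import re

def quote_field(name: str) -> str:
    # Backtick-quote any name that is not a plain identifier, so that
    # "100" becomes "`100`" and the schema string parses the same way
    # parseDataType("`100`:bigint") does.
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        return name
    return "`" + name + "`"

fields = {"id": "bigint", "100": "bigint"}
schema = "struct<" + ",".join(f"{quote_field(n)}:{t}" for n, t in fields.items()) + ">"
print(schema)  # struct<id:bigint,`100`:bigint>
```
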
[jira] [Resolved] (SPARK-36609) Add `errors` argument for `ps.to_numeric`.
[ https://issues.apache.org/jira/browse/SPARK-36609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36609. -- Fix Version/s: 3.3.0 Assignee: Haejoon Lee Resolution: Fixed Fixed in https://github.com/apache/spark/pull/33882 > Add `errors` argument for `ps.to_numeric`. > -- > > Key: SPARK-36609 > URL: https://issues.apache.org/jira/browse/SPARK-36609 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > > To match the behavior of pandas, we should support the `errors` argument for > the `ps.to_numeric` API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
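For reference, two of the pandas `errors` modes that `ps.to_numeric` is being aligned with can be sketched in plain Python. This is an illustrative stand-in, not the PySpark implementation: `errors="raise"` propagates the conversion error, while `errors="coerce"` turns unparsable input into NaN.

```python
import math

def to_numeric(values, errors="raise"):
    # Minimal sketch of the pandas-style `errors` contract
    # (function name and signature are illustrative):
    #   "raise"  -> invalid input raises
    #   "coerce" -> invalid input becomes NaN
    out = []
    for v in values:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            if errors == "coerce":
                out.append(math.nan)
            else:
                raise
    return out

print(to_numeric(["1", "2", "x"], errors="coerce"))  # [1.0, 2.0, nan]
```
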
[jira] [Resolved] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-36659. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33904 [https://github.com/apache/spark/pull/33904] > Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.3.0 > > > spark.sql.execution.topKSortFallbackThreshold is currently an internal config > hidden from users, with Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if K is very large, there can be performance issues. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
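The rationale behind a top-K fallback threshold can be illustrated with a small Python sketch. The names and the runtime branching are illustrative only (Spark makes this choice during physical planning, not per call): a heap-based top-K is cheap when K is small, while a full sort is the safer strategy once K grows past the threshold.

```python
import heapq

def top_k(rows, k, fallback_threshold):
    # Small k: keep only k elements via a heap (ORDER BY ... LIMIT k).
    # Huge k: fall back to a full sort, mirroring the config's intent.
    if k <= fallback_threshold:
        return heapq.nsmallest(k, rows)
    return sorted(rows)[:k]

print(top_k([5, 1, 4, 2], k=2, fallback_threshold=100))  # [1, 2]
```
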
[jira] [Assigned] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-36659: Assignee: Kent Yao > Promote spark.sql.execution.topKSortFallbackThreshold to user-faced config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > > spark.sql.execution.topKSortFallbackThreshold is currently an internal config > hidden from users, with Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if K is very large, there can be performance issues. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36661) Support TimestampNTZ in Py4J
[ https://issues.apache.org/jira/browse/SPARK-36661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36661: Assignee: (was: Apache Spark) > Support TimestampNTZ in Py4J > > > Key: SPARK-36661 > URL: https://issues.apache.org/jira/browse/SPARK-36661 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36661) Support TimestampNTZ in Py4J
[ https://issues.apache.org/jira/browse/SPARK-36661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409447#comment-17409447 ] Apache Spark commented on SPARK-36661: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/33877 > Support TimestampNTZ in Py4J > > > Key: SPARK-36661 > URL: https://issues.apache.org/jira/browse/SPARK-36661 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36661) Support TimestampNTZ in Py4J
[ https://issues.apache.org/jira/browse/SPARK-36661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36661: Assignee: Apache Spark > Support TimestampNTZ in Py4J > > > Key: SPARK-36661 > URL: https://issues.apache.org/jira/browse/SPARK-36661 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26208) Empty dataframe does not roundtrip for csv with header
[ https://issues.apache.org/jira/browse/SPARK-26208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409427#comment-17409427 ] Ranga Reddy commented on SPARK-26208: - cc [~hyukjin.kwon] > Empty dataframe does not roundtrip for csv with header > -- > > Key: SPARK-26208 > URL: https://issues.apache.org/jira/browse/SPARK-26208 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 > Environment: master branch, > commit 034ae305c33b1990b3c1a284044002874c343b4d, > date: Sun Nov 18 16:02:15 2018 +0800 >Reporter: koert kuipers >Assignee: Koert Kuipers >Priority: Minor > Fix For: 3.0.0 > > > When we write an empty part file for CSV with header=true, we fail to write the > header, so the result cannot be read back in. > With header=true, a part file with zero rows should still have a header -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
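The expected behavior can be illustrated with Python's csv module as a stand-in for Spark's CSV writer: with header=true, the header row should be emitted even when there are zero data rows, so an empty file still round-trips its schema.

```python
import csv
import io

# Write a CSV with a header but no data rows.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "age"])
writer.writeheader()  # header written despite zero rows

# Reading it back recovers the column names, i.e. the schema survives.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows)  # [['name', 'age']]
```
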
[jira] [Assigned] (SPARK-36662) special timestamps values support for path filters - modifiedBefore/modifiedAfter
[ https://issues.apache.org/jira/browse/SPARK-36662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36662: Assignee: Apache Spark > special timestamps values support for path filters - > modifiedBefore/modifiedAfter > - > > Key: SPARK-36662 > URL: https://issues.apache.org/jira/browse/SPARK-36662 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > support today, now, tomorrow, etc in path filter modifiedBefore/modifiedAfter -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36662) special timestamps values support for path filters - modifiedBefore/modifiedAfter
[ https://issues.apache.org/jira/browse/SPARK-36662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409356#comment-17409356 ] Apache Spark commented on SPARK-36662: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/33908 > special timestamps values support for path filters - > modifiedBefore/modifiedAfter > - > > Key: SPARK-36662 > URL: https://issues.apache.org/jira/browse/SPARK-36662 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Kent Yao >Priority: Major > > support today, now, tomorrow, etc in path filter modifiedBefore/modifiedAfter -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36662) special timestamps values support for path filters - modifiedBefore/modifiedAfter
[ https://issues.apache.org/jira/browse/SPARK-36662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36662: Assignee: (was: Apache Spark) > special timestamps values support for path filters - > modifiedBefore/modifiedAfter > - > > Key: SPARK-36662 > URL: https://issues.apache.org/jira/browse/SPARK-36662 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Kent Yao >Priority: Major > > support today, now, tomorrow, etc in path filter modifiedBefore/modifiedAfter -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36644) Push down boolean column filter
[ https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-36644. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33898 [https://github.com/apache/spark/pull/33898] > Push down boolean column filter > --- > > Key: SPARK-36644 > URL: https://issues.apache.org/jira/browse/SPARK-36644 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.3.0 > > > The following query does not push down the filter > ``` > SELECT * FROM t WHERE boolean_field > ``` > although the following query pushes down the filter as expected. > ``` > SELECT * FROM t WHERE boolean_field = true > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
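One way to picture the fix is as a normalization step that rewrites a bare boolean column reference into the equivalent equality predicate before pushdown matching. The sketch below uses a hypothetical tuple encoding for predicates, not Spark's actual Filter classes:

```python
def normalize_predicate(pred):
    # A bare string stands in for a bare boolean column reference,
    # e.g. WHERE boolean_field. Rewriting it as an explicit equality
    # (WHERE boolean_field = true) lets it match the same pushdown
    # patterns as the equality form. Encoding is illustrative.
    if isinstance(pred, str):
        return ("EqualTo", pred, True)
    return pred

print(normalize_predicate("boolean_field"))  # ('EqualTo', 'boolean_field', True)
```
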
[jira] [Assigned] (SPARK-36644) Push down boolean column filter
[ https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-36644: --- Assignee: Kazuyuki Tanimura > Push down boolean column filter > --- > > Key: SPARK-36644 > URL: https://issues.apache.org/jira/browse/SPARK-36644 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > > The following query does not push down the filter > ``` > SELECT * FROM t WHERE boolean_field > ``` > although the following query pushes down the filter as expected. > ``` > SELECT * FROM t WHERE boolean_field = true > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36662) special timestamps values support for path filters - modifiedBefore/modifiedAfter
Kent Yao created SPARK-36662: Summary: special timestamps values support for path filters - modifiedBefore/modifiedAfter Key: SPARK-36662 URL: https://issues.apache.org/jira/browse/SPARK-36662 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Kent Yao support today, now, tomorrow, etc in path filter modifiedBefore/modifiedAfter -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
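A resolver for such special values might look like the following Python sketch. The function and the exact set of accepted strings are assumptions based on the ticket text (mirroring the special date/timestamp literals Spark SQL already accepts), not Spark's implementation:

```python
from datetime import datetime, timedelta

def resolve_special(value, now=None):
    # Hypothetical helper: map special strings to concrete timestamps,
    # pass anything else through unchanged.
    now = now or datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    specials = {
        "now": now,
        "today": midnight,
        "yesterday": midnight - timedelta(days=1),
        "tomorrow": midnight + timedelta(days=1),
    }
    return specials.get(value.lower(), value)

print(resolve_special("tomorrow", now=datetime(2021, 9, 3, 12, 30)))
```
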
[jira] [Assigned] (SPARK-36610) Add `thousands` argument to `ps.read_csv`.
[ https://issues.apache.org/jira/browse/SPARK-36610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36610: Assignee: Apache Spark > Add `thousands` argument to `ps.read_csv`. > -- > > Key: SPARK-36610 > URL: https://issues.apache.org/jira/browse/SPARK-36610 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > When reading a CSV file in pandas, pandas automatically detects the thousands > separator if the `thousands` argument is specified. > {code:java} > >>> pd.read_csv(path, sep=";") > name agejob money > 0 Jorge 30 Developer 1,000,000 > 1Bob 32 Developer100 > >>> pd.read_csv(path, sep=";", thousands=",") > name agejobmoney > 0 Jorge 30 Developer 100 > 1Bob 32 Developer 100{code} > However, pandas-on-Spark doesn't support it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36610) Add `thousands` argument to `ps.read_csv`.
[ https://issues.apache.org/jira/browse/SPARK-36610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36610: Assignee: (was: Apache Spark) > Add `thousands` argument to `ps.read_csv`. > -- > > Key: SPARK-36610 > URL: https://issues.apache.org/jira/browse/SPARK-36610 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > When reading a CSV file in pandas, pandas automatically detects the thousands > separator if the `thousands` argument is specified. > {code:java} > >>> pd.read_csv(path, sep=";") > name agejob money > 0 Jorge 30 Developer 1,000,000 > 1Bob 32 Developer100 > >>> pd.read_csv(path, sep=";", thousands=",") > name agejobmoney > 0 Jorge 30 Developer 100 > 1Bob 32 Developer 100{code} > However, pandas-on-Spark doesn't support it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36610) Add `thousands` argument to `ps.read_csv`.
[ https://issues.apache.org/jira/browse/SPARK-36610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409328#comment-17409328 ] Apache Spark commented on SPARK-36610: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/33907 > Add `thousands` argument to `ps.read_csv`. > -- > > Key: SPARK-36610 > URL: https://issues.apache.org/jira/browse/SPARK-36610 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > When reading a CSV file in pandas, pandas automatically detects the thousands > separator if the `thousands` argument is specified. > {code:java} > >>> pd.read_csv(path, sep=";") > name agejob money > 0 Jorge 30 Developer 1,000,000 > 1Bob 32 Developer100 > >>> pd.read_csv(path, sep=";", thousands=",") > name agejobmoney > 0 Jorge 30 Developer 100 > 1Bob 32 Developer 100{code} > However, pandas-on-Spark doesn't support it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
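What a `thousands` argument does can be shown without pandas: strip the grouping character from numeric-looking fields before conversion. The sketch below is a simplified stand-in for pandas' `read_csv(..., thousands=",")`, with the helper name and regex being assumptions for illustration:

```python
import csv
import io
import re

def parse_with_thousands(text, sep=";", thousands=","):
    # Parse a delimited text, converting fields that look like grouped
    # integers (e.g. "1,000,000") to int; leave everything else as-is.
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, data = rows[0], rows[1:]
    grouped = re.compile(r"^\d{1,3}(" + re.escape(thousands) + r"\d{3})*$")
    def convert(field):
        return int(field.replace(thousands, "")) if grouped.match(field) else field
    return header, [[convert(f) for f in row] for row in data]

header, data = parse_with_thousands("name;money\nJorge;1,000,000\nBob;100\n")
print(data)  # [['Jorge', 1000000], ['Bob', 100]]
```
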