[jira] [Commented] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")
[ https://issues.apache.org/jira/browse/SPARK-35905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369813#comment-17369813 ] Apache Spark commented on SPARK-35905: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33092 > Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using > literal map should not fail") > -- > > Key: SPARK-35905 > URL: https://issues.apache.org/jira/browse/SPARK-35905 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using > literal map should not fail")
[jira] [Commented] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")
[ https://issues.apache.org/jira/browse/SPARK-35905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369812#comment-17369812 ] Apache Spark commented on SPARK-35905: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33092 > Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using > literal map should not fail") > -- > > Key: SPARK-35905 > URL: https://issues.apache.org/jira/browse/SPARK-35905 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using > literal map should not fail")
[jira] [Assigned] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")
[ https://issues.apache.org/jira/browse/SPARK-35905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35905: Assignee: (was: Apache Spark) > Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using > literal map should not fail") > -- > > Key: SPARK-35905 > URL: https://issues.apache.org/jira/browse/SPARK-35905 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using > literal map should not fail")
[jira] [Assigned] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")
[ https://issues.apache.org/jira/browse/SPARK-35905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35905: Assignee: Apache Spark > Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using > literal map should not fail") > -- > > Key: SPARK-35905 > URL: https://issues.apache.org/jira/browse/SPARK-35905 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using > literal map should not fail")
[jira] [Created] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")
angerszhu created SPARK-35905: - Summary: Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail") Key: SPARK-35905 URL: https://issues.apache.org/jira/browse/SPARK-35905 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0, 3.0.2, 2.4.8 Reporter: angerszhu Fix For: 3.2.0, 3.1.3, 3.0.4 Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")
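For context, withTable is the SQLTestUtils helper that drops the named tables after the enclosed block finishes, even when an assertion fails, so a crashing test cannot leak tables into later suites. A minimal sketch of the pattern this ticket applies, assuming a suite that mixes in SQLTestUtils; the table name and query below are illustrative, not the exact test body:

```scala
test("SPARK-33338: GROUP BY using literal map should not fail") {
  withTable("t") {  // guarantees DROP TABLE t after the block, pass or fail
    sql("CREATE TABLE t USING PARQUET AS SELECT 'k1' AS k")
    checkAnswer(
      sql("SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k]"),
      Row("v1"))
  }
}
```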
[jira] [Resolved] (SPARK-35879) Fix performance regression caused by collectFetchRequests
[ https://issues.apache.org/jira/browse/SPARK-35879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-35879. -- Fix Version/s: 3.1.3 3.2.0 Resolution: Fixed Issue resolved by pull request 33063 [https://github.com/apache/spark/pull/33063] > Fix performance regression caused by collectFetchRequests > - > > Key: SPARK-35879 > URL: https://issues.apache.org/jira/browse/SPARK-35879 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > {code:java} > ```sql > SET spark.sql.adaptive.enabled=true; > SET spark.sql.shuffle.partitions=3000; > SELECT /*+ REPARTITION */ 1 as pid, id from range(1, 100, 1, 500); > SELECT /*+ REPARTITION(pid, id) */ 1 as pid, id from range(1, 100, 1, > 500); > ```{code} > {code:java} > ```log > 21/06/23 13:54:22 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: > 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: > 2147483647 > 21/06/23 13:54:38 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 2314708 at BlockManagerId(2, 10.1.3.114, 36423, None) with 86 blocks > 21/06/23 13:54:59 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 2636612 at BlockManagerId(3, 10.1.3.115, 34293, None) with 87 blocks > 21/06/23 13:55:18 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 2508706 at BlockManagerId(4, 10.1.3.116, 41869, None) with 90 blocks > 21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 2350854 at BlockManagerId(5, 10.1.3.117, 45787, None) with 85 blocks > 21/06/23 13:55:34 INFO ShuffleBlockFetcherIterator: Getting 438 (11.8 MiB) > non-empty blocks including 90 (2.5 MiB) local and 0 (0.0 B) host-local and > 348 (9.4 MiB) remote blocks > 21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 87 > blocks (2.5 MiB) from 10.1.3.115:34293 > 21/06/23 13:55:34 INFO TransportClientFactory: Successfully created > connection to /10.1.3.115:34293 after 1 ms (0 ms spent in bootstraps) > 21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 90 > blocks (2.4 MiB) from 10.1.3.116:41869 > 21/06/23 13:55:34 INFO TransportClientFactory: Successfully created > connection to /10.1.3.116:41869 after 2 ms (0 ms spent in bootstraps) > 21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 85 > blocks (2.2 MiB) from 10.1.3.117:45787 > ```{code} > {code:java} > ```log > 21/06/23 14:00:45 INFO MapOutputTracker: Broadcast outputstatuses size = > 411, actual size = 828997 > 21/06/23 14:00:45 INFO MapOutputTrackerWorker: Got the map output locations > 21/06/23 14:00:45 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: > 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: > 2147483647 > 21/06/23 14:00:55 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 1894389 at BlockManagerId(2, 10.1.3.114, 36423, None) with 99 blocks > 21/06/23 14:01:04 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 1919993 at BlockManagerId(3, 10.1.3.115, 34293, None) with 100 blocks > 21/06/23 14:01:14 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 1977186 at BlockManagerId(5, 10.1.3.117, 45787, None) with 103 blocks > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 1938336 at BlockManagerId(4, 10.1.3.116, 41869, None) with 101 blocks > 21/06/23 14:01:23 INFO ShuffleBlockFetcherIterator: Getting 500 (9.1 
MiB) > non-empty blocks including 97 (1820.3 KiB) local and 0 (0.0 B) host-local and > 403 (7.4 MiB) remote blocks > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 101 > blocks (1892.9 KiB) from 10.1.3.116:41869 > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 103 > blocks (1930.8 KiB) from 10.1.3.117:45787 > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 99 > blocks (1850.0 KiB) from 10.1.3.114:36423 > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 100 > blocks (1875.0 KiB) from 10.1.3.115:34293 > 21/06/23 14:01:23 INFO ShuffleBlockFetcherIterator: Started 4 remote fetches > in 37889 ms > ```{code}
[jira] [Assigned] (SPARK-35879) Fix performance regression caused by collectFetchRequests
[ https://issues.apache.org/jira/browse/SPARK-35879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-35879: Assignee: Kent Yao > Fix performance regression caused by collectFetchRequests > - > > Key: SPARK-35879 > URL: https://issues.apache.org/jira/browse/SPARK-35879 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > {code:java} > ```sql > SET spark.sql.adaptive.enabled=true; > SET spark.sql.shuffle.partitions=3000; > SELECT /*+ REPARTITION */ 1 as pid, id from range(1, 100, 1, 500); > SELECT /*+ REPARTITION(pid, id) */ 1 as pid, id from range(1, 100, 1, > 500); > ```{code} > {code:java} > ```log > 21/06/23 13:54:22 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: > 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: > 2147483647 > 21/06/23 13:54:38 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 2314708 at BlockManagerId(2, 10.1.3.114, 36423, None) with 86 blocks > 21/06/23 13:54:59 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 2636612 at BlockManagerId(3, 10.1.3.115, 34293, None) with 87 blocks > 21/06/23 13:55:18 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 2508706 at BlockManagerId(4, 10.1.3.116, 41869, None) with 90 blocks > 21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 2350854 at BlockManagerId(5, 10.1.3.117, 45787, None) with 85 blocks > 21/06/23 13:55:34 INFO ShuffleBlockFetcherIterator: Getting 438 (11.8 MiB) > non-empty blocks including 90 (2.5 MiB) local and 0 (0.0 B) host-local and > 348 (9.4 MiB) remote blocks > 21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 87 > blocks (2.5 MiB) from 10.1.3.115:34293 > 21/06/23 13:55:34 INFO TransportClientFactory: Successfully created > connection to /10.1.3.115:34293 after 1 ms (0 ms spent in bootstraps) > 21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 90 > blocks (2.4 MiB) from 10.1.3.116:41869 > 21/06/23 13:55:34 INFO TransportClientFactory: Successfully created > connection to /10.1.3.116:41869 after 2 ms (0 ms spent in bootstraps) > 21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 85 > blocks (2.2 MiB) from 10.1.3.117:45787 > ```{code} > {code:java} > ```log > 21/06/23 14:00:45 INFO MapOutputTracker: Broadcast outputstatuses size = > 411, actual size = 828997 > 21/06/23 14:00:45 INFO MapOutputTrackerWorker: Got the map output locations > 21/06/23 14:00:45 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: > 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: > 2147483647 > 21/06/23 14:00:55 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 1894389 at BlockManagerId(2, 10.1.3.114, 36423, None) with 99 blocks > 21/06/23 14:01:04 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 1919993 at BlockManagerId(3, 10.1.3.115, 34293, None) with 100 blocks > 21/06/23 14:01:14 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 1977186 at BlockManagerId(5, 10.1.3.117, 45787, None) with 103 blocks > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Creating fetch request > of 1938336 at BlockManagerId(4, 10.1.3.116, 41869, None) with 101 blocks > 21/06/23 14:01:23 INFO ShuffleBlockFetcherIterator: Getting 500 (9.1 MiB) > non-empty blocks including 97 (1820.3 KiB) local and 0 (0.0 B) host-local and > 403 (7.4 MiB) remote blocks > 21/06/23 14:01:23 
DEBUG ShuffleBlockFetcherIterator: Sending request for 101 > blocks (1892.9 KiB) from 10.1.3.116:41869 > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 103 > blocks (1930.8 KiB) from 10.1.3.117:45787 > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 99 > blocks (1850.0 KiB) from 10.1.3.114:36423 > 21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 100 > blocks (1875.0 KiB) from 10.1.3.115:34293 > 21/06/23 14:01:23 INFO ShuffleBlockFetcherIterator: Started 4 remote fetches > in 37889 ms > ```{code}
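The tell-tale symptom in the report is the gap of tens of seconds between consecutive "Creating fetch request" lines, which only appear at DEBUG level. To reproduce the observation, one would raise the logger level for the iterator class, for example in conf/log4j.properties (a sketch; Spark 3.1/3.2 ship log4j 1.x, so the properties syntax below applies):

```properties
# Emit the per-request DEBUG lines quoted in this report
log4j.logger.org.apache.spark.storage.ShuffleBlockFetcherIterator=DEBUG
```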
[jira] [Assigned] (SPARK-35904) Collapse above RebalancePartitions
[ https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35904: Assignee: (was: Apache Spark) > Collapse above RebalancePartitions > -- > > Key: SPARK-35904 > URL: https://issues.apache.org/jira/browse/SPARK-35904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Make RebalancePartitions extends RepartitionOperation.
[jira] [Assigned] (SPARK-35904) Collapse above RebalancePartitions
[ https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35904: Assignee: Apache Spark > Collapse above RebalancePartitions > -- > > Key: SPARK-35904 > URL: https://issues.apache.org/jira/browse/SPARK-35904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > Make RebalancePartitions extends RepartitionOperation.
[jira] [Commented] (SPARK-35904) Collapse above RebalancePartitions
[ https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369796#comment-17369796 ] Apache Spark commented on SPARK-35904: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/33099 > Collapse above RebalancePartitions > -- > > Key: SPARK-35904 > URL: https://issues.apache.org/jira/browse/SPARK-35904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Make RebalancePartitions extends RepartitionOperation.
[jira] [Updated] (SPARK-35904) Collapse above RebalancePartitions
[ https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-35904: Description: Make RebalancePartitions extends RepartitionOperation. > Collapse above RebalancePartitions > -- > > Key: SPARK-35904 > URL: https://issues.apache.org/jira/browse/SPARK-35904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Make RebalancePartitions extends RepartitionOperation.
[jira] [Created] (SPARK-35904) Collapse above RebalancePartitions
Yuming Wang created SPARK-35904: --- Summary: Collapse above RebalancePartitions Key: SPARK-35904 URL: https://issues.apache.org/jira/browse/SPARK-35904 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Yuming Wang
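For background: RepartitionOperation is the node type the optimizer's CollapseRepartition rule matches on, so making RebalancePartitions extend it lets the rule eliminate a rebalance that sits directly under another repartition. A hedged illustration of the intended effect (the hint name follows the SQL REBALANCE hint; the collapsed plan shape is indicative, not taken from the PR):

```scala
val df = spark.range(0, 1000).toDF("id")
// Two adjacent shuffle-introducing operations; after this change the optimizer
// can collapse them and keep only the outer repartition(10).
df.hint("rebalance").repartition(10).explain(true)
```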
[jira] [Assigned] (SPARK-35903) Parameterize `master` in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-35903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35903: Assignee: (was: Apache Spark) > Parameterize `master` in TPCDSQueryBenchmark > - > > Key: SPARK-35903 > URL: https://issues.apache.org/jira/browse/SPARK-35903 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Assigned] (SPARK-35903) Parameterize `master` in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-35903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35903: Assignee: Apache Spark > Parameterize `master` in TPCDSQueryBenchmark > - > > Key: SPARK-35903 > URL: https://issues.apache.org/jira/browse/SPARK-35903 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-35903) Parameterize `master` in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-35903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369786#comment-17369786 ] Apache Spark commented on SPARK-35903: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33098 > Parameterize `master` in TPCDSQueryBenchmark > - > > Key: SPARK-35903 > URL: https://issues.apache.org/jira/browse/SPARK-35903 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Updated] (SPARK-35884) EXPLAIN FORMATTED for AQE
[ https://issues.apache.org/jira/browse/SPARK-35884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35884: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Bug) > EXPLAIN FORMATTED for AQE > - > > Key: SPARK-35884 > URL: https://issues.apache.org/jira/browse/SPARK-35884 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.2.0 > >
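For reference, this is the command the sub-task makes AQE-aware; the query below is illustrative only:

```scala
spark.conf.set("spark.sql.adaptive.enabled", "true")
// EXPLAIN FORMATTED prints the plan tree plus a numbered per-operator detail
// section; under AQE the executed plan can differ from the initial one.
spark.sql("EXPLAIN FORMATTED SELECT id % 10 AS k, count(*) FROM range(1000) GROUP BY id % 10")
  .show(truncate = false)
```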
[jira] [Created] (SPARK-35903) Parameterize `master` in TPCDSQueryBenchmark
Dongjoon Hyun created SPARK-35903: - Summary: Parameterize `master` in TPCDSQueryBenchmark Key: SPARK-35903 URL: https://issues.apache.org/jira/browse/SPARK-35903 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.2.0 Reporter: Dongjoon Hyun
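A hedged sketch of what parameterizing the master usually means for these benchmarks: read it from a system property and fall back to the previously hard-coded value. The property name below is an assumption for illustration, not necessarily the one the PR uses:

```scala
import org.apache.spark.SparkConf

// Hypothetical property name; the default preserves the old hard-coded master.
val master = System.getProperty("spark.sql.test.master", "local[1]")
val conf = new SparkConf().setMaster(master).setAppName("TPCDSQueryBenchmark")
```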
[jira] [Commented] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed
[ https://issues.apache.org/jira/browse/SPARK-35902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369783#comment-17369783 ] Dongjoon Hyun commented on SPARK-35902: --- Thank you for reporting, [~yghu]. Go for it! > spark.driver.log.dfsDir with hdfs scheme failed > --- > > Key: SPARK-35902 > URL: https://issues.apache.org/jira/browse/SPARK-35902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0, 3.1.1, 3.1.2 > Environment: Spark 3.1.1 Hadoop 3.1.1 >Reporter: YuanGuanhu >Priority: Major > > when I set the spark.driver.log.dfsDir value to an hdfs scheme path, it throws an > exception: > spark.driver.log.persistToDfs.enabled = true > spark.driver.log.dfsDir = hdfs://hacluster/spark2xdriverlogs1 > > 2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs > | org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94) > java.lang.IllegalArgumentException: Pathname > /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from > /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a > valid DFS filename. > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252) > at > org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375) > at > org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410) > at > org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528) > at > org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118) > at > org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.<init>(DriverLogger.scala:104) > at > org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72) > at > org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688) > at > org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688) > at org.apache.spark.SparkContext.<init>(SparkContext.scala:640) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106) > at $line3.$read$$iw$$iw.(:15) > at $line3.$read$$iw.(:42) > at $line3.$read.(:44) > at $line3.$read$.(:48) > at $line3.$read$.() > at $line3.$eval$.$print$lzycompute(:7) > at $line3.$eval$.$print(:6) > at $line3.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745) > at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021) > at
scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574) > at > scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41) > at > scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37) > at > scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) > at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570) > at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224) > at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) > at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224) > at > org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99) > at
[jira] [Updated] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed
[ https://issues.apache.org/jira/browse/SPARK-35902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuanGuanhu updated SPARK-35902: --- Description: when I set the spark.driver.log.dfsDir value to an hdfs scheme path, it throws an exception: spark.driver.log.persistToDfs.enabled = true spark.driver.log.dfsDir = hdfs://hacluster/spark2xdriverlogs1 2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs | org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94) java.lang.IllegalArgumentException: Pathname /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410) at org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528) at org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118) at org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.<init>(DriverLogger.scala:104) at org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72) at org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688) at org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688) at scala.Option.foreach(Option.scala:407) at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688) at org.apache.spark.SparkContext.<init>(SparkContext.scala:640) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814) at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941) at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106) at $line3.$read$$iw$$iw.(:15) at $line3.$read$$iw.(:42) at $line3.$read.(:44) at $line3.$read$.(:48) at $line3.$read$.() at $line3.$eval$.$print$lzycompute(:7) at $line3.$eval$.$print(:6) at $line3.$eval.$print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745) at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021) at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574) at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41) at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37) at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600) at
scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570) at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224) at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224) at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99) at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:83) at org.apache.spark.repl.SparkILoop.$anonfun$process$4(SparkILoop.scala:165) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.tools.nsc.interpreter.ILoop.$anonfun$mumly$1(ILoop.scala:168) at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:165) at org.apache.spark.repl.SparkILoop.loopPostInit$1(SparkILoop.scala:153) at org.apache.spark.repl.SparkILoop.$anonfun$process$10(SparkILoop.scala:221) at org.apache.spark.repl.SparkILoop.withSuppressedSettings$1(SparkILoop.scala:189) at
[jira] [Commented] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed
[ https://issues.apache.org/jira/browse/SPARK-35902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369780#comment-17369780 ] YuanGuanhu commented on SPARK-35902: I'd like to work on this. > spark.driver.log.dfsDir with hdfs scheme failed > --- > > Key: SPARK-35902 > URL: https://issues.apache.org/jira/browse/SPARK-35902 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0, 3.1.1, 3.1.2 > Environment: Spark 3.1.1 Hadoop 3.1.1 >Reporter: YuanGuanhu >Priority: Major > > when I set the spark.driver.log.dfsDir value to an hdfs scheme path, it throws an > exception: > spark.driver.log.persistToDfs.enabled = true > spark.driver.log.dfsDir = hdfs://hacluster/sparkdriverlogs > > 2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs > | org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94) > java.lang.IllegalArgumentException: Pathname > /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from > /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a > valid DFS filename. > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252) > at > org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375) > at > org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410) > at > org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528) > at > org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118) > at > org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.<init>(DriverLogger.scala:104) > at > org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72) > at > org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688) > at > org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688) > at org.apache.spark.SparkContext.<init>(SparkContext.scala:640) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106) > at $line3.$read$$iw$$iw.(:15) > at $line3.$read$$iw.(:42) > at $line3.$read.(:44) > at $line3.$read$.(:48) > at $line3.$read$.() > at $line3.$eval$.$print$lzycompute(:7) > at $line3.$eval$.$print(:6) > at $line3.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745) > at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021) > at
scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574) > at > scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41) > at > scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37) > at > scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) > at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570) > at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224) > at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) > at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224) > at > org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99) > at
[jira] [Updated] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed
[ https://issues.apache.org/jira/browse/SPARK-35902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuanGuanhu updated SPARK-35902: --- Description: when I set the spark.driver.log.dfsDir value to an hdfs scheme path, it throws an exception: spark.driver.log.persistToDfs.enabled = true spark.driver.log.dfsDir = hdfs://hacluster/spark2xdriverlogs 2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs | org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94) java.lang.IllegalArgumentException: Pathname /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410) at org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528) at org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118) at org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.<init>(DriverLogger.scala:104) at org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72) at org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688) at org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688) at scala.Option.foreach(Option.scala:407) at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688) at org.apache.spark.SparkContext.<init>(SparkContext.scala:640) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814) at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941) at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106) at $line3.$read$$iw$$iw.(:15) at $line3.$read$$iw.(:42) at $line3.$read.(:44) at $line3.$read$.(:48) at $line3.$read$.() at $line3.$eval$.$print$lzycompute(:7) at $line3.$eval$.$print(:6) at $line3.$eval.$print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745) at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021) at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574) at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41) at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37) at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600) at
scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570) at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224) at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224) at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99) at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:83) at org.apache.spark.repl.SparkILoop.$anonfun$process$4(SparkILoop.scala:165) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.tools.nsc.interpreter.ILoop.$anonfun$mumly$1(ILoop.scala:168) at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:165) at org.apache.spark.repl.SparkILoop.loopPostInit$1(SparkILoop.scala:153) at org.apache.spark.repl.SparkILoop.$anonfun$process$10(SparkILoop.scala:221) at org.apache.spark.repl.SparkILoop.withSuppressedSettings$1(SparkILoop.scala:189) at
[jira] [Created] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed
YuanGuanhu created SPARK-35902: -- Summary: spark.driver.log.dfsDir with hdfs scheme failed Key: SPARK-35902 URL: https://issues.apache.org/jira/browse/SPARK-35902 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.2, 3.1.1, 3.1.0 Environment: Spark 3.1.1 Hadoop 3.1.1 Reporter: YuanGuanhu when I set the spark.driver.log.dfsDir value to an hdfs scheme path, it throws an exception: spark.driver.log.persistToDfs.enabled = true spark.driver.log.dfsDir = hdfs://hacluster/sparkdriverlogs 2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs | org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94) java.lang.IllegalArgumentException: Pathname /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410) at org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528) at org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118) at org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.<init>(DriverLogger.scala:104) at org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72) at org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688) at org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688) at scala.Option.foreach(Option.scala:407) at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688) at org.apache.spark.SparkContext.<init>(SparkContext.scala:640) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814) at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941) at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106) at $line3.$read$$iw$$iw.(:15) at $line3.$read$$iw.(:42) at $line3.$read.(:44) at $line3.$read$.(:48) at $line3.$read$.() at $line3.$eval$.$print$lzycompute(:7) at $line3.$eval$.$print(:6) at $line3.$eval.$print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745) at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021) at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574) at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41) at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37) at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) at
scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570) at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224) at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224) at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99) at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:83) at org.apache.spark.repl.SparkILoop.$anonfun$process$4(SparkILoop.scala:165) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.tools.nsc.interpreter.ILoop.$anonfun$mumly$1(ILoop.scala:168) at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:165) at
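The giveaway in the stack trace is the mangled pathname /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1: the scheme-qualified URI was resolved as a relative local name against the working directory. A hedged sketch of the scheme-aware handling the fix needs, not the actual patch:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new Configuration()
// Value of spark.driver.log.dfsDir, which may carry a scheme such as hdfs://.
val dfsDir = new Path("hdfs://hacluster/sparkdriverlogs")
// Derive the FileSystem from the path itself; FileSystem.get(hadoopConf)
// returns the default FS and mis-handles a scheme-qualified string.
val fs: FileSystem = dfsDir.getFileSystem(hadoopConf)
fs.mkdirs(dfsDir)
```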
[jira] [Resolved] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
[ https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-35466. --- Fix Version/s: 3.2.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 33094 https://github.com/apache/spark/pull/33094 > Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops. > - > > Key: SPARK-35466 > URL: https://issues.apache.org/jira/browse/SPARK-35466 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.2.0 > >
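For context, this ticket series tightens mypy strictness one module at a time. A minimal standalone example of enabling the check for one package in mypy configuration; the section shown is illustrative, and exactly how Spark's python/mypy.ini arranges its per-module overrides is an implementation detail of the PR:

```ini
[mypy-pyspark.pandas.data_type_ops.*]
disallow_untyped_defs = True
```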
[jira] [Assigned] (SPARK-35901) Refine type hints in pyspark.pandas.window
[ https://issues.apache.org/jira/browse/SPARK-35901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35901: Assignee: (was: Apache Spark) > Refine type hints in pyspark.pandas.window > -- > > Key: SPARK-35901 > URL: https://issues.apache.org/jira/browse/SPARK-35901 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > We can use more strict type hints for functions in {{pyspark.pandas.window}} > using the generic way.
[jira] [Assigned] (SPARK-35901) Refine type hints in pyspark.pandas.window
[ https://issues.apache.org/jira/browse/SPARK-35901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35901: Assignee: Apache Spark > Refine type hints in pyspark.pandas.window > -- > > Key: SPARK-35901 > URL: https://issues.apache.org/jira/browse/SPARK-35901 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > We can use more strict type hints for functions in {{pyspark.pandas.window}} > using the generic way.
[jira] [Commented] (SPARK-35901) Refine type hints in pyspark.pandas.window
[ https://issues.apache.org/jira/browse/SPARK-35901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369776#comment-17369776 ] Apache Spark commented on SPARK-35901: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33097 > Refine type hints in pyspark.pandas.window > -- > > Key: SPARK-35901 > URL: https://issues.apache.org/jira/browse/SPARK-35901 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > We can use more strict type hints for functions in {{pyspark.pandas.window}} > using the generic way.
[jira] [Assigned] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions
[ https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35899: - Assignee: Anton Okolnychyi > Add a utility to convert connector expressions to Catalyst expressions > -- > > Key: SPARK-35899 > URL: https://issues.apache.org/jira/browse/SPARK-35899 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > > There are more and more places that require converting a v2 connector > expression to an internal Catalyst expression. We need to build a utility > method to avoid having the same logic in a lot of places. > See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].
[jira] [Resolved] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions
[ https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35899. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33096 [https://github.com/apache/spark/pull/33096] > Add a utility to convert connector expressions to Catalyst expressions > -- > > Key: SPARK-35899 > URL: https://issues.apache.org/jira/browse/SPARK-35899 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.2.0 > > > There are more and more places that require converting a v2 connector > expression to an internal Catalyst expression. We need to build a utility > method to avoid having the same logic in a lot of places. > See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].
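A hedged sketch of the kind of utility described; the helper's actual name, location, and the set of expression shapes it covers in the PR may differ:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression => CatalystExpr}
import org.apache.spark.sql.connector.expressions.{Expression => V2Expr, FieldReference}

// Map a DSv2 expression onto a Catalyst one, delegating column resolution to
// the caller; unsupported shapes are rejected explicitly.
def toCatalyst(expr: V2Expr, resolve: Seq[String] => CatalystExpr): CatalystExpr =
  expr match {
    case ref: FieldReference => resolve(ref.fieldNames.toSeq)
    case other => throw new IllegalArgumentException(s"Cannot convert: $other")
  }
```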
[jira] [Resolved] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
[ https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-35894. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33085 [https://github.com/apache/spark/pull/33085] > Introduce new style enforce to not import scala.collection.Seq/IndexedSeq > - > > Key: SPARK-35894 > URL: https://issues.apache.org/jira/browse/SPARK-35894 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.2.0 > > > Due to the changes in Scala 2.13, importing scala.collection.Seq or > scala.collection.IndexedSeq can cause subtle issues. > (It doesn't cause an issue on Scala 2.12, so it's highly likely we wouldn't > notice the problem until we see a compilation failure in Scala 2.13.) > Please refer to the page below for the details of the changes around Seq. > https://docs.scala-lang.org/overviews/core/collections-migration-213.html > It would be nice if we could prevent this case.
[jira] [Assigned] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
[ https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-35894: Assignee: Jungtaek Lim > Introduce new style enforce to not import scala.collection.Seq/IndexedSeq > - > > Key: SPARK-35894 > URL: https://issues.apache.org/jira/browse/SPARK-35894 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > Due to the changes in Scala 2.13, importing scala.collection.Seq or > scala.collection.IndexedSeq can cause subtle issues. > (It doesn't cause an issue on Scala 2.12, so it's highly likely we wouldn't > notice the problem until we see a compilation failure in Scala 2.13.) > Please refer to the page below for the details of the changes around Seq. > https://docs.scala-lang.org/overviews/core/collections-migration-213.html > It would be nice if we could prevent this case.
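A small illustration of the hazard the new style rule guards against (a sketch):

```scala
// On Scala 2.13 the default Seq is scala.collection.immutable.Seq; this import
// shadows it with the general (possibly mutable) variant, so the same source
// type-checks differently on 2.12 and 2.13.
import scala.collection.Seq

def first(xs: Seq[Int]): Int = xs.head
// first(scala.collection.mutable.ArrayBuffer(1, 2)) compiles with this import
// on 2.13, but would be rejected if the default immutable Seq were in scope.
```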
[jira] [Created] (SPARK-35901) Refine type hints in pyspark.pandas.window
Takuya Ueshin created SPARK-35901: - Summary: Refine type hints in pyspark.pandas.window Key: SPARK-35901 URL: https://issues.apache.org/jira/browse/SPARK-35901 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin We can use more strict type hints for functions in {{pyspark.pandas.window}} using the generic way.
[jira] [Assigned] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions
[ https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35899: Assignee: Apache Spark > Add a utility to convert connector expressions to Catalyst expressions > -- > > Key: SPARK-35899 > URL: https://issues.apache.org/jira/browse/SPARK-35899 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > There are more and more places that require converting a v2 connector > expression to an internal Catalyst expression. We need to build a utility > method to avoid having the same logic in a lot of places. > See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].
[jira] [Commented] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions
[ https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369760#comment-17369760 ] Apache Spark commented on SPARK-35899: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/33096 > Add a utility to convert connector expressions to Catalyst expressions > -- > > Key: SPARK-35899 > URL: https://issues.apache.org/jira/browse/SPARK-35899 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Priority: Major > > There are more and more places that require converting a v2 connector > expression to an internal Catalyst expression. We need to build a utility > method to avoid having the same logic in a lot of places. > See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].
[jira] [Assigned] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions
[ https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35899: Assignee: (was: Apache Spark) > Add a utility to convert connector expressions to Catalyst expressions > -- > > Key: SPARK-35899 > URL: https://issues.apache.org/jira/browse/SPARK-35899 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Priority: Major > > There are more and more places that require converting a v2 connector > expression to an internal Catalyst expression. We need to build a utility > method to avoid having the same logic in a lot of places. > See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].
[jira] [Commented] (SPARK-35339) Improve unit tests for data-type-based basic operations
[ https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369753#comment-17369753 ] Apache Spark commented on SPARK-35339: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/33095 > Improve unit tests for data-type-based basic operations > --- > > Key: SPARK-35339 > URL: https://issues.apache.org/jira/browse/SPARK-35339 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Unit tests for arithmetic operations are scattered in the codebase: > * pyspark/pandas/tests/test_ops_on_diff_frames.py > * pyspark/pandas/tests/test_dataframe.py > * pyspark/pandas/tests/test_series.py > * (Upcoming) pyspark/pandas/tests/data_type_ops/ > We wanted to consolidate them. > The code would be cleaner and easier to maintain. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35339) Improve unit tests for data-type-based basic operations
[ https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35339: Assignee: (was: Apache Spark) > Improve unit tests for data-type-based basic operations > --- > > Key: SPARK-35339 > URL: https://issues.apache.org/jira/browse/SPARK-35339 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Unit tests for arithmetic operations are scattered in the codebase: > * pyspark/pandas/tests/test_ops_on_diff_frames.py > * pyspark/pandas/tests/test_dataframe.py > * pyspark/pandas/tests/test_series.py > * (Upcoming) pyspark/pandas/tests/data_type_ops/ > We wanted to consolidate them. > The code would be cleaner and easier to maintain. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35339) Improve unit tests for data-type-based basic operations
[ https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35339: Assignee: Apache Spark > Improve unit tests for data-type-based basic operations > --- > > Key: SPARK-35339 > URL: https://issues.apache.org/jira/browse/SPARK-35339 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Unit tests for arithmetic operations are scattered in the codebase: > * pyspark/pandas/tests/test_ops_on_diff_frames.py > * pyspark/pandas/tests/test_dataframe.py > * pyspark/pandas/tests/test_series.py > * (Upcoming) pyspark/pandas/tests/data_type_ops/ > We wanted to consolidate them. > The code would be cleaner and easier to maintain. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35339) Improve unit tests for data-type-based basic operations
[ https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35339: - Summary: Improve unit tests for data-type-based basic operations (was: Consolidate unit tests for arithmetic operations) > Improve unit tests for data-type-based basic operations > --- > > Key: SPARK-35339 > URL: https://issues.apache.org/jira/browse/SPARK-35339 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Unit tests for arithmetic operations are scattered in the codebase: > * pyspark/pandas/tests/test_ops_on_diff_frames.py > * pyspark/pandas/tests/test_dataframe.py > * pyspark/pandas/tests/test_series.py > * (Upcoming) pyspark/pandas/tests/data_type_ops/ > We wanted to consolidate them. > The code would be cleaner and easier to maintain. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
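As a hedged sketch of the consolidated layout: each data type gets its own suite under pyspark/pandas/tests/data_type_ops/, sharing fixtures through a common base class. The class and test names below are illustrative, not the actual ones in the repository:

{code:python}
import unittest

import pyspark.pandas as ps


class OpsTestBase(unittest.TestCase):
    """Shared fixtures for data-type-based operation tests (illustrative)."""

    def setUp(self) -> None:
        self.psser = ps.Series([1, 2, 3])


class NumOpsTest(OpsTestBase):
    """Numeric operations tested in one place instead of three scattered files."""

    def test_add(self) -> None:
        result = (self.psser + 1).to_pandas()
        self.assertEqual(result.tolist(), [2, 3, 4])


if __name__ == "__main__":
    unittest.main()
{code}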
[jira] [Assigned] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
[ https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35466: Assignee: Apache Spark > Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops. > - > > Key: SPARK-35466 > URL: https://issues.apache.org/jira/browse/SPARK-35466 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
[ https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35466: Assignee: (was: Apache Spark) > Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops. > - > > Key: SPARK-35466 > URL: https://issues.apache.org/jira/browse/SPARK-35466 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
[ https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369654#comment-17369654 ] Apache Spark commented on SPARK-35466: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33094 > Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops. > - > > Key: SPARK-35466 > URL: https://issues.apache.org/jira/browse/SPARK-35466 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
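For context, {{disallow_untyped_defs}} makes mypy reject any function definition without complete annotations. A small illustration of what the check enforces (the config stanza follows mypy's per-module override syntax; the function is a hypothetical example, not code from data_type_ops):

{code:python}
# Per-module override in mypy.ini (illustrative):
#
#   [mypy-pyspark.pandas.data_type_ops.*]
#   disallow_untyped_defs = True
#
# Under that flag, an unannotated def is rejected:
#
#   def pretty_name(op):  # error: Function is missing a type annotation
#       return op.name
#
# and must be fully annotated instead:
def pretty_name(op_name: str) -> str:
    return op_name.strip().lower()
{code}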
[jira] [Created] (SPARK-35900) Consider whether we need to broadcast dynamic filtering subquery results for v2 tables
Anton Okolnychyi created SPARK-35900: Summary: Consider whether we need to broadcast dynamic filtering subquery results for v2 tables Key: SPARK-35900 URL: https://issues.apache.org/jira/browse/SPARK-35900 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Anton Okolnychyi We don't necessarily have to broadcast the results of dynamic filtering subqueries for v2 tables. See [here|https://github.com/apache/spark/pull/32921#discussion_r653212341]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress
[ https://issues.apache.org/jira/browse/SPARK-35896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369627#comment-17369627 ] Apache Spark commented on SPARK-35896: -- User 'vkorukanti' has created a pull request for this issue: https://github.com/apache/spark/pull/33091 > [SS] Include more granular metrics for stateful operators in > StreamingQueryProgress > --- > > Key: SPARK-35896 > URL: https://issues.apache.org/jira/browse/SPARK-35896 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Priority: Major > > Currently the streaming progress is missing a few important stateful operator > metrics in {{StateOperatorProgress}}. Each stateful operator consists of > multiple steps. Ex: {{flatMapGroupsWithState}} has two major steps: 1) > processing the input and 2) timeout processing to remove entries from the > state which have expired. The main motivation is to track the time taken by > each individual step (such as timeout processing, watermark processing, etc.) > and how much data is processed, in order to pinpoint bottlenecks and reason > about why some microbatches are slow compared to others in the same job. > Below are the final metrics common to all stateful operators (the ones in > _*bold-italic*_ are newly proposed). These metrics are in > {{StateOperatorProgress}} which is part of {{StreamingQueryProgress}}. > * _*operatorName*_ - State operator name. Can help us identify > operator-specific slowness and state store usage patterns. Ex. "dedupe" (derived using > {{StateStoreWriter.shortName}}) > * _numRowsTotal_ - number of rows in the state store across all tasks in a > stage where the operator has executed. > * _numRowsUpdated_ - number of rows added to or updated in the store > * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state > store rows across all tasks in a stage where the operator has executed. > * _*numRowsRemoved*_ - number of rows deleted from the state store as part of > the state cleanup mechanism across all tasks in a stage where the operator > has executed. This number helps measure the state store deletions and their impact > on checkpoint commit and other latencies. > * _*allRemovalsTimeMs*_ - time taken to remove the rows from the state store > as part of state cleanup (this also includes iterating through the entire state > store to find the rows to delete) across all tasks in a stage where the operator > has executed. If we see jobs spending significant time here, it may justify a > better layout in the state store so that only the required rows are read, rather > than the entire state store as happens currently. > * _*commitTimeMs*_ - time taken to commit the state store changes to > external storage for checkpointing. This is cumulative across all tasks in a > stage where this operator has executed. > * _*numShufflePartitions*_ - number of shuffle partitions this state > operator is part of. Currently, metrics like times are aggregated across > all tasks in a stage where the operator has executed. Having the number of > shuffle partitions (which corresponds to the number of tasks) helps us find the > average per-task contribution to the metric. > * _*numStateStores*_ - number of state stores in the operator across all > tasks in the stage. Some stateful operators have more than one state store > (eg. stream-stream join). Tracking this number helps us find correlations > between state store instances and microbatch latency.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
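For orientation, these metrics surface to users through each element of {{stateOperators}} in a query's progress. A hedged PySpark sketch of reading them follows; the starred field names above are the proposed additions, not a released API, so only long-standing fields like numRowsTotal exist in current versions:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A toy stateful query: a streaming aggregation over the built-in rate source.
query = (
    spark.readStream.format("rate").load()
    .groupBy("value").count()
    .writeStream.format("memory").queryName("counts")
    .outputMode("complete")
    .start()
)

progress = query.lastProgress  # a dict in PySpark; None before the first batch
if progress:
    for op in progress["stateOperators"]:
        # numRowsTotal/numRowsUpdated exist today; operatorName, commitTimeMs,
        # numShufflePartitions, etc. are the proposed additions.
        print(op.get("operatorName"), op.get("numRowsTotal"), op.get("commitTimeMs"))
{code}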
[jira] [Assigned] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35897: Assignee: (was: Apache Spark) > Support user defined initial state with flatMapGroupsWithState in Structured > Streaming > -- > > Key: SPARK-35897 > URL: https://issues.apache.org/jira/browse/SPARK-35897 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Rahul Shivu Mahadev >Priority: Major > Fix For: 3.2.0 > > > Structured Streaming supports arbitrary stateful processing using > mapGroupsWithState and flatMapGroupsWithState operators. The state is created > by processing the data that comes in with every batch. This API improvement > will allow users to specify an initial state which is applied at the time of > executing the first batch. > > h2. Proposed new APIs (Scala) > > > def mapGroupsWithState[S: Encoder, U: Encoder]( > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] > > def flatMapGroupsWithState[S: Encoder, U: Encoder]( > outputMode: OutputMode, > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => Iterator[U]): Dataset[U] > > h2. Proposed new APIs (Java) > > def mapGroupsWithState[S, U]( > func: MapGroupsWithStateFunction[K, V, S, U], > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > def flatMapGroupsWithState[S, U]( > func: FlatMapGroupsWithStateFunction[K, V, S, U], > outputMode: OutputMode, > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > > > h2. Example Usage > > val initialState: Dataset[(String, RunningCount)] = Seq( > ("a", new RunningCount(1)), > ("b", new RunningCount(1)) > ).toDS() > > val inputData = MemoryStream[String] > val result = > inputData.toDS() > .groupByKey(x => x) > .mapGroupsWithState(timeoutConf, initialState)(stateFunc) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35897: Assignee: Apache Spark > Support user defined initial state with flatMapGroupsWithState in Structured > Streaming > -- > > Key: SPARK-35897 > URL: https://issues.apache.org/jira/browse/SPARK-35897 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Rahul Shivu Mahadev >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > Structured Streaming supports arbitrary stateful processing using > mapGroupsWithState and flatMapGroupsWithState operators. The state is created > by processing the data that comes in with every batch. This API improvement > will allow users to specify an initial state which is applied at the time of > executing the first batch. > > h2. Proposed new APIs (Scala) > > > def mapGroupsWithState[S: Encoder, U: Encoder]( > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] > > def flatMapGroupsWithState[S: Encoder, U: Encoder]( > outputMode: OutputMode, > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => Iterator[U]): Dataset[U] > > h2. Proposed new APIs (Java) > > def mapGroupsWithState[S, U]( > func: MapGroupsWithStateFunction[K, V, S, U], > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > def flatMapGroupsWithState[S, U]( > func: FlatMapGroupsWithStateFunction[K, V, S, U], > outputMode: OutputMode, > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > > > h2. Example Usage > > val initialState: Dataset[(String, RunningCount)] = Seq( > ("a", new RunningCount(1)), > ("b", new RunningCount(1)) > ).toDS() > > val inputData = MemoryStream[String] > val result = > inputData.toDS() > .groupByKey(x => x) > .mapGroupsWithState(timeoutConf, initialState)(stateFunc) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369625#comment-17369625 ] Apache Spark commented on SPARK-35897: -- User 'rahulsmahadev' has created a pull request for this issue: https://github.com/apache/spark/pull/33093 > Support user defined initial state with flatMapGroupsWithState in Structured > Streaming > -- > > Key: SPARK-35897 > URL: https://issues.apache.org/jira/browse/SPARK-35897 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Rahul Shivu Mahadev >Priority: Major > Fix For: 3.2.0 > > > Structured Streaming supports arbitrary stateful processing using > mapGroupsWithState and flatMapGroupsWithState operators. The state is created > by processing the data that comes in with every batch. This API improvement > will allow users to specify an initial state which is applied at the time of > executing the first batch. > > h2. Proposed new APIs (Scala) > > > def mapGroupsWithState[S: Encoder, U: Encoder]( > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] > > def flatMapGroupsWithState[S: Encoder, U: Encoder]( > outputMode: OutputMode, > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)])( > func: (K, Iterator[V], GroupState[S]) => Iterator[U]): Dataset[U] > > h2. Proposed new APIs (Java) > > def mapGroupsWithState[S, U]( > func: MapGroupsWithStateFunction[K, V, S, U], > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > def flatMapGroupsWithState[S, U]( > func: FlatMapGroupsWithStateFunction[K, V, S, U], > outputMode: OutputMode, > stateEncoder: Encoder[S], > outputEncoder: Encoder[U], > timeoutConf: GroupStateTimeout, > initialState: Dataset[(K, S)]): Dataset[U] > > > h2. Example Usage > > val initialState: Dataset[(String, RunningCount)] = Seq( > ("a", new RunningCount(1)), > ("b", new RunningCount(1)) > ).toDS() > > val inputData = MemoryStream[String] > val result = > inputData.toDS() > .groupByKey(x => x) > .mapGroupsWithState(timeoutConf, initialState)(stateFunc) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33338) GROUP BY using literal map should not fail
[ https://issues.apache.org/jira/browse/SPARK-33338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369624#comment-17369624 ] Apache Spark commented on SPARK-33338: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33092 > GROUP BY using literal map should not fail > -- > > Key: SPARK-33338 > URL: https://issues.apache.org/jira/browse/SPARK-33338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.1, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > Apache Spark 2.x ~ 3.0.1 raises `RuntimeException` for the following queries. > *SQL* > {code} > CREATE TABLE t USING ORC AS SELECT map('k1', 'v1') m, 'k1' k > SELECT map('k1', 'v1')[k] FROM t GROUP BY 1 > SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k] > SELECT map('k1', 'v1')[k] a FROM t GROUP BY a > {code} > *ERROR* > {code} > Caused by: java.lang.RuntimeException: Couldn't find k#3 in [keys: [k1], > values: [v1][k#3]#6] > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:85) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:79) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) > {code} > This is a regression from Apache Spark 1.6.x. > {code} > scala> sc.version > res1: String = 1.6.3 > scala> sqlContext.sql("SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', > 'v1')[k]").show > +---+ > |_c0| > +---+ > | v1| > +---+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress
[ https://issues.apache.org/jira/browse/SPARK-35896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35896: Assignee: (was: Apache Spark) > [SS] Include more granular metrics for stateful operators in > StreamingQueryProgress > --- > > Key: SPARK-35896 > URL: https://issues.apache.org/jira/browse/SPARK-35896 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Priority: Major > > Currently the streaming progress is missing a few important stateful operator > metrics in {{StateOperatorProgress}}. Each stateful operator consists of > multiple steps. Ex: {{flatMapGroupsWithState}} has two major steps: 1) > processing the input and 2) timeout processing to remove entries from the > state which have expired. The main motivation is to track the time taken by > each individual step (such as timeout processing, watermark processing, etc.) > and how much data is processed, in order to pinpoint bottlenecks and reason > about why some microbatches are slow compared to others in the same job. > Below are the final metrics common to all stateful operators (the ones in > _*bold-italic*_ are newly proposed). These metrics are in > {{StateOperatorProgress}} which is part of {{StreamingQueryProgress}}. > * _*operatorName*_ - State operator name. Can help us identify > operator-specific slowness and state store usage patterns. Ex. "dedupe" (derived using > {{StateStoreWriter.shortName}}) > * _numRowsTotal_ - number of rows in the state store across all tasks in a > stage where the operator has executed. > * _numRowsUpdated_ - number of rows added to or updated in the store > * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state > store rows across all tasks in a stage where the operator has executed. > * _*numRowsRemoved*_ - number of rows deleted from the state store as part of > the state cleanup mechanism across all tasks in a stage where the operator > has executed. This number helps measure the state store deletions and their impact > on checkpoint commit and other latencies. > * _*allRemovalsTimeMs*_ - time taken to remove the rows from the state store > as part of state cleanup (this also includes iterating through the entire state > store to find the rows to delete) across all tasks in a stage where the operator > has executed. If we see jobs spending significant time here, it may justify a > better layout in the state store so that only the required rows are read, rather > than the entire state store as happens currently. > * _*commitTimeMs*_ - time taken to commit the state store changes to > external storage for checkpointing. This is cumulative across all tasks in a > stage where this operator has executed. > * _*numShufflePartitions*_ - number of shuffle partitions this state > operator is part of. Currently, metrics like times are aggregated across > all tasks in a stage where the operator has executed. Having the number of > shuffle partitions (which corresponds to the number of tasks) helps us find the > average per-task contribution to the metric. > * _*numStateStores*_ - number of state stores in the operator across all > tasks in the stage. Some stateful operators have more than one state store > (eg. stream-stream join). Tracking this number helps us find correlations > between state store instances and microbatch latency.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress
[ https://issues.apache.org/jira/browse/SPARK-35896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35896: Assignee: Apache Spark > [SS] Include more granular metrics for stateful operators in > StreamingQueryProgress > --- > > Key: SPARK-35896 > URL: https://issues.apache.org/jira/browse/SPARK-35896 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Assignee: Apache Spark >Priority: Major > > Currently the streaming progress is missing a few important stateful operator > metrics in {{StateOperatorProgress}}. Each stateful operator consists of > multiple steps. Ex: {{flatMapGroupsWithState}} has two major steps: 1) > processing the input and 2) timeout processing to remove entries from the > state which have expired. The main motivation is to track the time taken by > each individual step (such as timeout processing, watermark processing, etc.) > and how much data is processed, in order to pinpoint bottlenecks and reason > about why some microbatches are slow compared to others in the same job. > Below are the final metrics common to all stateful operators (the ones in > _*bold-italic*_ are newly proposed). These metrics are in > {{StateOperatorProgress}} which is part of {{StreamingQueryProgress}}. > * _*operatorName*_ - State operator name. Can help us identify > operator-specific slowness and state store usage patterns. Ex. "dedupe" (derived using > {{StateStoreWriter.shortName}}) > * _numRowsTotal_ - number of rows in the state store across all tasks in a > stage where the operator has executed. > * _numRowsUpdated_ - number of rows added to or updated in the store > * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state > store rows across all tasks in a stage where the operator has executed. > * _*numRowsRemoved*_ - number of rows deleted from the state store as part of > the state cleanup mechanism across all tasks in a stage where the operator > has executed. This number helps measure the state store deletions and their impact > on checkpoint commit and other latencies. > * _*allRemovalsTimeMs*_ - time taken to remove the rows from the state store > as part of state cleanup (this also includes iterating through the entire state > store to find the rows to delete) across all tasks in a stage where the operator > has executed. If we see jobs spending significant time here, it may justify a > better layout in the state store so that only the required rows are read, rather > than the entire state store as happens currently. > * _*commitTimeMs*_ - time taken to commit the state store changes to > external storage for checkpointing. This is cumulative across all tasks in a > stage where this operator has executed. > * _*numShufflePartitions*_ - number of shuffle partitions this state > operator is part of. Currently, metrics like times are aggregated across > all tasks in a stage where the operator has executed. Having the number of > shuffle partitions (which corresponds to the number of tasks) helps us find the > average per-task contribution to the metric. > * _*numStateStores*_ - number of state stores in the operator across all > tasks in the stage. Some stateful operators have more than one state store > (eg. stream-stream join). Tracking this number helps us find correlations > between state store instances and microbatch latency.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions
Anton Okolnychyi created SPARK-35899: Summary: Add a utility to convert connector expressions to Catalyst expressions Key: SPARK-35899 URL: https://issues.apache.org/jira/browse/SPARK-35899 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Anton Okolnychyi There are more and more places that require converting a v2 connector expression to an internal Catalyst expression. We need to build a utility method to avoid having the same logic in a lot of places. See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN
[ https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369565#comment-17369565 ] Apache Spark commented on SPARK-35672: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/33090 > Spark fails to launch executors with very large user classpath lists on YARN > > > Key: SPARK-35672 > URL: https://issues.apache.org/jira/browse/SPARK-35672 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 3.1.2 > Environment: Linux RHEL7 > Spark 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.2.0 > > > When running Spark on YARN, the {{user-class-path}} argument to > {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to > executor processes. The argument is specified once for each JAR, and the URIs > are fully-qualified, so the paths can be quite long. With large user JAR > lists (say 1000+), this can result in system-level argument length limits > being exceeded, typically manifesting as the error message: > {code} > /bin/bash: Argument list too long > {code} > A [Google > search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22] > indicates that this is not a theoretical problem and afflicts real users, > including ours. This issue was originally observed on Spark 2.3, but has been > confirmed to exist in the master branch as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN
[ https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369564#comment-17369564 ] Erik Krogen commented on SPARK-35672: - #32810 went into master. Put up #33090 for branch-3.1 > Spark fails to launch executors with very large user classpath lists on YARN > > > Key: SPARK-35672 > URL: https://issues.apache.org/jira/browse/SPARK-35672 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 3.1.2 > Environment: Linux RHEL7 > Spark 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.2.0 > > > When running Spark on YARN, the {{user-class-path}} argument to > {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to > executor processes. The argument is specified once for each JAR, and the URIs > are fully-qualified, so the paths can be quite long. With large user JAR > lists (say 1000+), this can result in system-level argument length limits > being exceeded, typically manifesting as the error message: > {code} > /bin/bash: Argument list too long > {code} > A [Google > search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22] > indicates that this is not a theoretical problem and afflicts real users, > including ours. This issue was originally observed on Spark 2.3, but has been > confirmed to exist in the master branch as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
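To see why the limit bites, a rough back-of-the-envelope sketch: each JAR contributes a {{--user-class-path}} flag plus a fully-qualified URI, and the total argv size is what the kernel checks at exec time. The URI scheme and paths below are made up for illustration:

{code:python}
# Rough estimate of the executor launch command's argv size with 2000 user JARs.
jar_uris = [
    f"hdfs://namenode:8020/apps/myapp/libs/dependency-{i}-1.0.0.jar"
    for i in range(2000)
]
argv = [tok for uri in jar_uris for tok in ("--user-class-path", uri)]

# Each argument costs its length plus a terminating NUL byte.
total_bytes = sum(len(tok) + 1 for tok in argv)
print(f"{len(argv)} arguments, ~{total_bytes / 1024:.0f} KiB of argv")
# Limits vary by system: besides the overall ARG_MAX, Linux also caps a single
# argument string at 128 KiB (MAX_ARG_STRLEN), which is the one typically hit
# when the whole command is passed through as one bash -c string.
{code}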
[jira] [Created] (SPARK-35898) Converting arrays with RowToColumnConverter triggers assertion
Tom van Bussel created SPARK-35898: -- Summary: Converting arrays with RowToColumnConverter triggers assertion Key: SPARK-35898 URL: https://issues.apache.org/jira/browse/SPARK-35898 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Tom van Bussel When trying to convert a row that contains an array to a ColumnVector with RowToColumnConverter the following error is thrown: {code:java} java.lang.AssertionError at org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putArray(OffHeapColumnVector.java:560) at org.apache.spark.sql.execution.vectorized.WritableColumnVector.appendArray(WritableColumnVector.java:622) at org.apache.spark.sql.execution.RowToColumnConverter$ArrayConverter.append(Columnar.scala:353) at org.apache.spark.sql.execution.RowToColumnConverter$BasicNullableTypeConverter.append(Columnar.scala:241) at org.apache.spark.sql.execution.RowToColumnConverter.convert(Columnar.scala:221) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35898) Converting arrays with RowToColumnConverter triggers assertion
[ https://issues.apache.org/jira/browse/SPARK-35898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369553#comment-17369553 ] Tom van Bussel commented on SPARK-35898: I will open a PR with the fix. > Converting arrays with RowToColumnConverter triggers assertion > -- > > Key: SPARK-35898 > URL: https://issues.apache.org/jira/browse/SPARK-35898 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Tom van Bussel >Priority: Major > > When trying to convert a row that contains an array to a ColumnVector with > RowToColumnConverter the following error is thrown: > {code:java} > java.lang.AssertionError at > org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putArray(OffHeapColumnVector.java:560) > at > org.apache.spark.sql.execution.vectorized.WritableColumnVector.appendArray(WritableColumnVector.java:622) > at > org.apache.spark.sql.execution.RowToColumnConverter$ArrayConverter.append(Columnar.scala:353) > at > org.apache.spark.sql.execution.RowToColumnConverter$BasicNullableTypeConverter.append(Columnar.scala:241) > at > org.apache.spark.sql.execution.RowToColumnConverter.convert(Columnar.scala:221) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming
Rahul Shivu Mahadev created SPARK-35897: --- Summary: Support user defined initial state with flatMapGroupsWithState in Structured Streaming Key: SPARK-35897 URL: https://issues.apache.org/jira/browse/SPARK-35897 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.1.2 Reporter: Rahul Shivu Mahadev Fix For: 3.2.0 Structured Streaming supports arbitrary stateful processing using mapGroupsWithState and flatMapGroupsWithState operators. The state is created by processing the data that comes in with every batch. This API improvement will allow users to specify an initial state which is applied at the time of executing the first batch. h2. Proposed new APIs (Scala) def mapGroupsWithState[S: Encoder, U: Encoder]( timeoutConf: GroupStateTimeout, initialState: Dataset[(K, S)])( func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] def flatMapGroupsWithState[S: Encoder, U: Encoder]( outputMode: OutputMode, timeoutConf: GroupStateTimeout, initialState: Dataset[(K, S)])( func: (K, Iterator[V], GroupState[S]) => Iterator[U]): Dataset[U] h2. Proposed new APIs (Java) def mapGroupsWithState[S, U]( func: MapGroupsWithStateFunction[K, V, S, U], stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout, initialState: Dataset[(K, S)]): Dataset[U] def flatMapGroupsWithState[S, U]( func: FlatMapGroupsWithStateFunction[K, V, S, U], outputMode: OutputMode, stateEncoder: Encoder[S], outputEncoder: Encoder[U], timeoutConf: GroupStateTimeout, initialState: Dataset[(K, S)]): Dataset[U] h2. Example Usage val initialState: Dataset[(String, RunningCount)] = Seq( ("a", new RunningCount(1)), ("b", new RunningCount(1)) ).toDS() val inputData = MemoryStream[String] val result = inputData.toDS() .groupByKey(x => x) .mapGroupsWithState(timeoutConf, initialState)(stateFunc) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35863) Upgrade Ivy to 2.5.0
[ https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35863. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33088 [https://github.com/apache/spark/pull/33088] > Upgrade Ivy to 2.5.0 > > > Key: SPARK-35863 > URL: https://issues.apache.org/jira/browse/SPARK-35863 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.1.2 >Reporter: Adam Binford >Assignee: Adam Binford >Priority: Minor > Fix For: 3.2.0 > > > Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and > features can be found here: > [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html] > Most notably, the newly added ivy.maven.lookup.sources and > ivy.maven.lookup.javadoc configs can significantly speed up module resolution > time if they are turned off, especially behind a proxy. These could arguably > be turned off by default, because when submitting jobs you probably don't > care about the sources or javadoc jars. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
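A hedged sketch of how a job could opt out of the two lookups today, by pointing {{spark.jars.ivySettings}} at a custom settings file. The resolver layout, temp path, and example package are illustrative, and whether the two properties take effect depends on Ivy 2.5.0 actually being on the classpath:

{code:python}
from pyspark.sql import SparkSession

# Minimal Ivy settings that disable the 2.5.0 sources/javadoc lookups.
ivy_settings = """\
<ivysettings>
  <property name="ivy.maven.lookup.sources" value="false"/>
  <property name="ivy.maven.lookup.javadoc" value="false"/>
  <settings defaultResolver="central"/>
  <resolvers>
    <ibiblio name="central" m2compatible="true"/>
  </resolvers>
</ivysettings>
"""
with open("/tmp/ivysettings.xml", "w") as f:
    f.write(ivy_settings)

spark = (
    SparkSession.builder
    .config("spark.jars.ivySettings", "/tmp/ivysettings.xml")
    .config("spark.jars.packages", "org.apache.commons:commons-text:1.9")
    .getOrCreate()
)
{code}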
[jira] [Assigned] (SPARK-35863) Upgrade Ivy to 2.5.0
[ https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35863: - Assignee: Adam Binford > Upgrade Ivy to 2.5.0 > > > Key: SPARK-35863 > URL: https://issues.apache.org/jira/browse/SPARK-35863 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.1.2 >Reporter: Adam Binford >Assignee: Adam Binford >Priority: Minor > > Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and > features can be found here: > [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html] > Most notably, the newly added ivy.maven.lookup.sources and > ivy.maven.lookup.javadoc configs can significantly speed up module resolution > time if they are turned off, especially behind a proxy. These could arguably > be turned off by default, because when submitting jobs you probably don't > care about the sources or javadoc jars. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress
Venki Korukanti created SPARK-35896: --- Summary: [SS] Include more granular metrics for stateful operators in StreamingQueryProgress Key: SPARK-35896 URL: https://issues.apache.org/jira/browse/SPARK-35896 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.1.2 Reporter: Venki Korukanti Currently the streaming progress is missing a few important stateful operator metrics in {{StateOperatorProgress}}. Each stateful operator consists of multiple steps. Ex: {{flatMapGroupsWithState}} has two major steps: 1) processing the input and 2) timeout processing to remove entries from the state which have expired. The main motivation is to track the time taken by each individual step (such as timeout processing, watermark processing, etc.) and how much data is processed, in order to pinpoint bottlenecks and reason about why some microbatches are slow compared to others in the same job. Below are the final metrics common to all stateful operators (the ones in _*bold-italic*_ are newly proposed). These metrics are in {{StateOperatorProgress}} which is part of {{StreamingQueryProgress}}. * _*operatorName*_ - State operator name. Can help us identify operator-specific slowness and state store usage patterns. Ex. "dedupe" (derived using {{StateStoreWriter.shortName}}) * _numRowsTotal_ - number of rows in the state store across all tasks in a stage where the operator has executed. * _numRowsUpdated_ - number of rows added to or updated in the store * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state store rows across all tasks in a stage where the operator has executed. * _*numRowsRemoved*_ - number of rows deleted from the state store as part of the state cleanup mechanism across all tasks in a stage where the operator has executed. This number helps measure the state store deletions and their impact on checkpoint commit and other latencies. * _*allRemovalsTimeMs*_ - time taken to remove the rows from the state store as part of state cleanup (this also includes iterating through the entire state store to find the rows to delete) across all tasks in a stage where the operator has executed. If we see jobs spending significant time here, it may justify a better layout in the state store so that only the required rows are read, rather than the entire state store as happens currently. * _*commitTimeMs*_ - time taken to commit the state store changes to external storage for checkpointing. This is cumulative across all tasks in a stage where this operator has executed. * _*numShufflePartitions*_ - number of shuffle partitions this state operator is part of. Currently, metrics like times are aggregated across all tasks in a stage where the operator has executed. Having the number of shuffle partitions (which corresponds to the number of tasks) helps us find the average per-task contribution to the metric. * _*numStateStores*_ - number of state stores in the operator across all tasks in the stage. Some stateful operators have more than one state store (eg. stream-stream join). Tracking this number helps us find correlations between state store instances and microbatch latency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35892) numPartitions does not work when saving the RDD to a database
[ https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369487#comment-17369487 ] Apache Spark commented on SPARK-35892: -- User 'zengruios' has created a pull request for this issue: https://github.com/apache/spark/pull/33089 > numPartitions does not work when saving the RDD to a database > -- > > Key: SPARK-35892 > URL: https://issues.apache.org/jira/browse/SPARK-35892 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: zengrui >Priority: Major > > When using SQL to insert data from Spark into a database: suppose the original RDD's > partition count is 10, and I set numPartitions to 20 in SQL (because I > need more parallelism to insert data into the database), but it does not work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
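For reference, the DataFrame-level JDBC writer shows the same behavior: on writes, numPartitions is only an upper bound (Spark coalesces down to it but never repartitions up). A hedged PySpark sketch follows; the connection details are made up:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(0, 1000).repartition(10)  # the "original RDD" with 10 partitions

(df.write.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/testdb")  # hypothetical endpoint
    .option("dbtable", "public.numbers")
    .option("user", "test")
    .option("password", "test")
    .option("numPartitions", "20")  # upper bound only: 10 partitions stay 10
    .mode("append")
    .save())

# To actually get 20 concurrent writers, repartition explicitly before writing:
# df.repartition(20).write...
{code}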
[jira] [Assigned] (SPARK-35892) numPartitions does not work when saving the RDD to a database
[ https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35892: Assignee: (was: Apache Spark) > numPartitions does not work when saving the RDD to a database > -- > > Key: SPARK-35892 > URL: https://issues.apache.org/jira/browse/SPARK-35892 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: zengrui >Priority: Major > > When using SQL to insert data from Spark into a database: suppose the original RDD's > partition count is 10, and I set numPartitions to 20 in SQL (because I > need more parallelism to insert data into the database), but it does not work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35892) numPartitions does not work when saving the RDD to a database
[ https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35892: Assignee: Apache Spark > numPartitions does not work when saving the RDD to a database > -- > > Key: SPARK-35892 > URL: https://issues.apache.org/jira/browse/SPARK-35892 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: zengrui >Assignee: Apache Spark >Priority: Major > > When using SQL to insert data from Spark into a database: suppose the original RDD's > partition count is 10, and I set numPartitions to 20 in SQL (because I > need more parallelism to insert data into the database), but it does not work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369486#comment-17369486 ] Dongjoon Hyun commented on SPARK-34927: --- No problem~ I was just curious if we can have this in Apache Spark 3.2.0. :) > Support TPCDSQueryBenchmark in Benchmarks > - > > Key: SPARK-34927 > URL: https://issues.apache.org/jira/browse/SPARK-34927 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should > add support for it. See also > https://github.com/apache/spark/pull/32015#issuecomment-89046 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN
[ https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-35672. --- Fix Version/s: 3.2.0 Assignee: Erik Krogen Resolution: Fixed > Spark fails to launch executors with very large user classpath lists on YARN > > > Key: SPARK-35672 > URL: https://issues.apache.org/jira/browse/SPARK-35672 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 3.1.2 > Environment: Linux RHEL7 > Spark 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.2.0 > > > When running Spark on YARN, the {{user-class-path}} argument to > {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to > executor processes. The argument is specified once for each JAR, and the URIs > are fully-qualified, so the paths can be quite long. With large user JAR > lists (say 1000+), this can result in system-level argument length limits > being exceeded, typically manifesting as the error message: > {code} > /bin/bash: Argument list too long > {code} > A [Google > search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22] > indicates that this is not a theoretical problem and afflicts real users, > including ours. This issue was originally observed on Spark 2.3, but has been > confirmed to exist in the master branch as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35893) No Unit Test case for MySQLDialect.getCatalystType
[ https://issues.apache.org/jira/browse/SPARK-35893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35893: Assignee: (was: Apache Spark) > No Unit Test case for MySQLDialect.getCatalystType > --- > > Key: SPARK-35893 > URL: https://issues.apache.org/jira/browse/SPARK-35893 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.1, 3.1.2 >Reporter: zengrui >Priority: Minor > > No Unit Test case for MySQLDialect.getCatalystType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35863) Upgrade Ivy to 2.5.0
[ https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369473#comment-17369473 ] Apache Spark commented on SPARK-35863: -- User 'Kimahriman' has created a pull request for this issue: https://github.com/apache/spark/pull/33088 > Upgrade Ivy to 2.5.0 > > > Key: SPARK-35863 > URL: https://issues.apache.org/jira/browse/SPARK-35863 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.1.2 >Reporter: Adam Binford >Priority: Minor > > Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and > features can be found here: > [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html] > Most notably, the newly added ivy.maven.lookup.sources and > ivy.maven.lookup.javadoc configs can significantly speed up module resolution > time if they are turned off, especially behind a proxy. These could arguably > be turned off by default, because when submitting jobs you probably don't > care about the sources or javadoc jars. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35863) Upgrade Ivy to 2.5.0
[ https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35863: Assignee: Apache Spark > Upgrade Ivy to 2.5.0 > > > Key: SPARK-35863 > URL: https://issues.apache.org/jira/browse/SPARK-35863 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.1.2 >Reporter: Adam Binford >Assignee: Apache Spark >Priority: Minor > > Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and > features can be found here: > [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html] > Most notably, the newly added ivy.maven.lookup.sources and > ivy.maven.lookup.javadoc configs can significantly speed up module resolution > time if they are turned off, especially behind a proxy. These could arguably > be turned off by default, because when submitting jobs you probably don't > care about the sources or javadoc jars. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35893) No Unit Test case for MySQLDialect.getCatalystType
[ https://issues.apache.org/jira/browse/SPARK-35893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369474#comment-17369474 ] Apache Spark commented on SPARK-35893: -- User 'zengruios' has created a pull request for this issue: https://github.com/apache/spark/pull/33087 > No Unit Test case for MySQLDialect.getCatalystType > --- > > Key: SPARK-35893 > URL: https://issues.apache.org/jira/browse/SPARK-35893 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.1, 3.1.2 >Reporter: zengrui >Priority: Minor > > No Unit Test case for MySQLDialect.getCatalystType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35863) Upgrade Ivy to 2.5.0
[ https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35863: Assignee: (was: Apache Spark) > Upgrade Ivy to 2.5.0 > > > Key: SPARK-35863 > URL: https://issues.apache.org/jira/browse/SPARK-35863 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.1.2 >Reporter: Adam Binford >Priority: Minor > > Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and > features can be found here: > [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html] > Most notably, the addition of the ivy.maven.lookup.sources and > ivy.maven.lookup.javadoc configs can significantly speed up module resolution > time if these are turned off, especially behind a proxy. These could arguably > be turned off by default, because when submitting jobs you probably don't > care about the sources or javadoc jars. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35893) No Unit Test case for MySQLDialect.getCatalystType
[ https://issues.apache.org/jira/browse/SPARK-35893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35893: Assignee: Apache Spark > No Unit Test case for MySQLDialect.getCatalystType > --- > > Key: SPARK-35893 > URL: https://issues.apache.org/jira/browse/SPARK-35893 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.1, 3.1.2 >Reporter: zengrui >Assignee: Apache Spark >Priority: Minor > > No Unit Test case for MySQLDialect.getCatalystType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35878) add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null
[ https://issues.apache.org/jira/browse/SPARK-35878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35878: - Assignee: Steve Loughran > add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null > --- > > Key: SPARK-35878 > URL: https://issues.apache.org/jira/browse/SPARK-35878 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > > People working with S3A and hadoop 3.3.1 outside of EC2, and without the AWS > CLI setup, are likely to hit HADOOP-17771. > It should be straightforward to fix up the config similarly to > SPARK-35868... this will be backwards compatible (harmless) and forwards > compatible (it's the recommended workaround). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35878) add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null
[ https://issues.apache.org/jira/browse/SPARK-35878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35878. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33064 [https://github.com/apache/spark/pull/33064] > add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null > --- > > Key: SPARK-35878 > URL: https://issues.apache.org/jira/browse/SPARK-35878 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Fix For: 3.2.0 > > > People working with S3A and hadoop 3.3.1 outside of EC2, and without the AWS > CLI setup, are likely to hit HADOOP-17771. > It should be straightforward to fix up the config similarly to > SPARK-35868... this will be backwards compatible (harmless) and forwards > compatible (it's the recommended workaround). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
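The workaround amounts to pinning the default endpoint yourself when no region is configured. A minimal sketch, assuming the standard spark.hadoop.* passthrough and the central S3 endpoint:
{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch of the HADOOP-17771 workaround: pin fs.s3a.endpoint to the
// central S3 endpoint when neither it nor fs.s3a.endpoint.region is set.
val spark = SparkSession.builder()
  .appName("s3a-endpoint-workaround")
  .config("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com")
  .getOrCreate()
{code}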
[jira] [Assigned] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ
[ https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35895: Assignee: Gengliang Wang (was: Apache Spark) > Support subtracting Intervals from TimestampWithoutTZ > - > > Key: SPARK-35895 > URL: https://issues.apache.org/jira/browse/SPARK-35895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Support and test the following operations: > TimestampWithoutTZ - Calendar interval > TimestampWithoutTZ - Year-Month interval > TimestampWithoutTZ - Daytime interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ
[ https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369436#comment-17369436 ] Apache Spark commented on SPARK-35895: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33086 > Support subtracting Intervals from TimestampWithoutTZ > - > > Key: SPARK-35895 > URL: https://issues.apache.org/jira/browse/SPARK-35895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Support and test the following operations: > TimestampWithoutTZ - Calendar interval > TimestampWithoutTZ - Year-Month interval > TimestampWithoutTZ - Daytime interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
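For a sense of what these operations look like once supported, a hedged sketch follows; the TIMESTAMP_NTZ literal syntax is assumed from the broader timestamp-without-time-zone work in 3.2, and `spark` is a spark-shell session.
{code:scala}
// Hedged sketch of the three subtraction shapes; the TIMESTAMP_NTZ literal
// syntax is an assumption from the Spark 3.2 timestamp-without-time-zone effort.
spark.sql("SELECT TIMESTAMP_NTZ'2021-06-27 10:00:00' - INTERVAL 1 MONTH").show()             // calendar interval
spark.sql("SELECT TIMESTAMP_NTZ'2021-06-27 10:00:00' - INTERVAL '1-2' YEAR TO MONTH").show() // year-month interval
spark.sql("SELECT TIMESTAMP_NTZ'2021-06-27 10:00:00' - INTERVAL '1 02:03:04' DAY TO SECOND").show() // day-time interval
{code}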
[jira] [Assigned] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ
[ https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35895: Assignee: Apache Spark (was: Gengliang Wang) > Support subtracting Intervals from TimestampWithoutTZ > - > > Key: SPARK-35895 > URL: https://issues.apache.org/jira/browse/SPARK-35895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Support and test the following operations: > TimestampWithoutTZ - Calendar interval > TimestampWithoutTZ - Year-Month interval > TimestampWithoutTZ - Daytime interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35894) Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq
[ https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35894: Assignee: (was: Apache Spark) > Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq > - > > Key: SPARK-35894 > URL: https://issues.apache.org/jira/browse/SPARK-35894 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > Due to the changes in Scala 2.13, importing scala.collection.Seq or > scala.collection.IndexedSeq can cause subtle issues. > (They cause no issue on Scala 2.12, so it's highly likely the problem > wouldn't surface until we see a compilation failure in Scala 2.13.) > Please refer to the page below for the details of the changes around Seq. > https://docs.scala-lang.org/overviews/core/collections-migration-213.html > It would be nice if we could prevent such cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
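The hazard is easier to see in code. A minimal sketch, assuming a mutable buffer assigned to a Seq-typed value; the commented line compiles on 2.12 but not on 2.13:
{code:scala}
import scala.collection.mutable.ArrayBuffer

// On Scala 2.13, scala.Seq aliases scala.collection.immutable.Seq, while
// scala.collection.Seq still admits mutable implementations. Importing the
// latter silently changes which type an unqualified Seq refers to.
val buf = ArrayBuffer(1, 2, 3)
val ok: scala.collection.Seq[Int] = buf // compiles on both 2.12 and 2.13
// val bad: Seq[Int] = buf              // 2.12: compiles; 2.13: type error
{code}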
[jira] [Commented] (SPARK-35894) Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq
[ https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369435#comment-17369435 ] Apache Spark commented on SPARK-35894: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/33085 > Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq > - > > Key: SPARK-35894 > URL: https://issues.apache.org/jira/browse/SPARK-35894 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > Due to the changes in Scala 2.13, importing scala.collection.Seq or > scala.collection.IndexedSeq can cause subtle issues. > (They cause no issue on Scala 2.12, so it's highly likely the problem > wouldn't surface until we see a compilation failure in Scala 2.13.) > Please refer to the page below for the details of the changes around Seq. > https://docs.scala-lang.org/overviews/core/collections-migration-213.html > It would be nice if we could prevent such cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ
[ https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35895: --- Summary: Support subtracting Intervals from TimestampWithoutTZ (was: Support subtract Intervals from TimestampWithoutTZ) > Support subtracting Intervals from TimestampWithoutTZ > - > > Key: SPARK-35895 > URL: https://issues.apache.org/jira/browse/SPARK-35895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Support and test the following operations: > TimestampWithoutTZ - Calendar interval > TimestampWithoutTZ - Year-Month interval > TimestampWithoutTZ - Daytime interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35894) Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq
[ https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35894: Assignee: Apache Spark > Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq > - > > Key: SPARK-35894 > URL: https://issues.apache.org/jira/browse/SPARK-35894 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > Due to the changes in Scala 2.13, importing scala.collection.Seq or > scala.collection.IndexedSeq can cause subtle issues. > (They cause no issue on Scala 2.12, so it's highly likely the problem > wouldn't surface until we see a compilation failure in Scala 2.13.) > Please refer to the page below for the details of the changes around Seq. > https://docs.scala-lang.org/overviews/core/collections-migration-213.html > It would be nice if we could prevent such cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35895) Support subtract Intervals from TimestampWithoutTZ
Gengliang Wang created SPARK-35895: -- Summary: Support subtract Intervals from TimestampWithoutTZ Key: SPARK-35895 URL: https://issues.apache.org/jira/browse/SPARK-35895 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Support and test the following operations: TimestampWithoutTZ - Calendar interval TimestampWithoutTZ - Year-Month interval TimestampWithoutTZ - Daytime interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35894) Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq
[ https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-35894: - Summary: Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq (was: Introduce new style enforcement to not import scala.collection.Seq) > Introduce new style enforcement to not import scala.collection.Seq/IndexedSeq > - > > Key: SPARK-35894 > URL: https://issues.apache.org/jira/browse/SPARK-35894 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > Due to the changes in Scala 2.13, importing scala.collection.Seq or > scala.collection.IndexedSeq can cause subtle issues. > (They cause no issue on Scala 2.12, so it's highly likely the problem > wouldn't surface until we see a compilation failure in Scala 2.13.) > Please refer to the page below for the details of the changes around Seq. > https://docs.scala-lang.org/overviews/core/collections-migration-213.html > It would be nice if we could prevent such cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35894) Introduce new style enforcement to not import scala.collection.Seq
Jungtaek Lim created SPARK-35894: Summary: Introduce new style enforcement to not import scala.collection.Seq Key: SPARK-35894 URL: https://issues.apache.org/jira/browse/SPARK-35894 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.0 Reporter: Jungtaek Lim Due to the changes in Scala 2.13, importing scala.collection.Seq or scala.collection.IndexedSeq can cause subtle issues. (They cause no issue on Scala 2.12, so it's highly likely the problem wouldn't surface until we see a compilation failure in Scala 2.13.) Please refer to the page below for the details of the changes around Seq. https://docs.scala-lang.org/overviews/core/collections-migration-213.html It would be nice if we could prevent such cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35889) Support adding TimestampWithoutTZ with Interval types
[ https://issues.apache.org/jira/browse/SPARK-35889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35889. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33076 [https://github.com/apache/spark/pull/33076] > Support adding TimestampWithoutTZ with Interval types > - > > Key: SPARK-35889 > URL: https://issues.apache.org/jira/browse/SPARK-35889 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Support the following operations: > * TimestampWithoutTZ + Calendar interval > * TimestampWithoutTZ + Year-Month interval > * TimestampWithoutTZ + Daytime interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
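A minimal sketch of the resolved behavior, under the same TIMESTAMP_NTZ literal assumption as the subtraction example above:
{code:scala}
// Hedged sketch: adding interval types to a timestamp without time zone.
spark.sql("SELECT TIMESTAMP_NTZ'2021-06-27 10:00:00' + INTERVAL '3' MONTH").show()
spark.sql("SELECT TIMESTAMP_NTZ'2021-06-27 10:00:00' + INTERVAL '2 12:00:00' DAY TO SECOND").show()
{code}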
[jira] [Commented] (SPARK-35628) RocksDBFileManager - load checkpoint from DFS
[ https://issues.apache.org/jira/browse/SPARK-35628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369423#comment-17369423 ] Apache Spark commented on SPARK-35628: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/33084 > RocksDBFileManager - load checkpoint from DFS > - > > Key: SPARK-35628 > URL: https://issues.apache.org/jira/browse/SPARK-35628 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.2.0 > > > The implementation for the load path of the checkpoint data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35893) No Unit Test case for MySQLDialect.getCatalystType
zengrui created SPARK-35893: --- Summary: No Unit Test case for MySQLDialect.getCatalystType Key: SPARK-35893 URL: https://issues.apache.org/jira/browse/SPARK-35893 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.2, 3.1.1 Reporter: zengrui No Unit Test case for MySQLDialect.getCatalystType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35886) Codegen issue for decimal type
[ https://issues.apache.org/jira/browse/SPARK-35886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369363#comment-17369363 ] Apache Spark commented on SPARK-35886: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33082 > Codegen issue for decimal type > -- > > Key: SPARK-35886 > URL: https://issues.apache.org/jira/browse/SPARK-35886 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:scala} > spark.sql( > """ > |CREATE TABLE t1 ( > | c1 DECIMAL(18,6), > | c2 DECIMAL(18,6), > | c3 DECIMAL(18,6)) > |USING parquet; > |""".stripMargin) > spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show > {code} > {noformat} > 20:23:36.272 ERROR > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to > compile: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not > an rvalue > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue > at > org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675) > at > org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
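One common way to confirm that a failure like this is codegen-specific is to rerun the reproduction above with whole-stage codegen disabled; this is a diagnostic step, not a fix.
{code:scala}
// Diagnostic sketch: if the query succeeds with whole-stage codegen off, the
// bug is isolated to the generated-code path.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show()
{code}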
[jira] [Assigned] (SPARK-35886) Codegen issue for decimal type
[ https://issues.apache.org/jira/browse/SPARK-35886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35886: Assignee: (was: Apache Spark) > Codegen issue for decimal type > -- > > Key: SPARK-35886 > URL: https://issues.apache.org/jira/browse/SPARK-35886 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:scala} > spark.sql( > """ > |CREATE TABLE t1 ( > | c1 DECIMAL(18,6), > | c2 DECIMAL(18,6), > | c3 DECIMAL(18,6)) > |USING parquet; > |""".stripMargin) > spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show > {code} > {noformat} > 20:23:36.272 ERROR > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to > compile: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not > an rvalue > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue > at > org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675) > at > org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35886) Codegen issue for decimal type
[ https://issues.apache.org/jira/browse/SPARK-35886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369362#comment-17369362 ] Apache Spark commented on SPARK-35886: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33082 > Codegen issue for decimal type > -- > > Key: SPARK-35886 > URL: https://issues.apache.org/jira/browse/SPARK-35886 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:scala} > spark.sql( > """ > |CREATE TABLE t1 ( > | c1 DECIMAL(18,6), > | c2 DECIMAL(18,6), > | c3 DECIMAL(18,6)) > |USING parquet; > |""".stripMargin) > spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show > {code} > {noformat} > 20:23:36.272 ERROR > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to > compile: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not > an rvalue > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue > at > org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675) > at > org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35886) Codegen issue for decimal type
[ https://issues.apache.org/jira/browse/SPARK-35886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35886: Assignee: Apache Spark > Codegen issue for decimal type > -- > > Key: SPARK-35886 > URL: https://issues.apache.org/jira/browse/SPARK-35886 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > How to reproduce this issue: > {code:scala} > spark.sql( > """ > |CREATE TABLE t1 ( > | c1 DECIMAL(18,6), > | c2 DECIMAL(18,6), > | c3 DECIMAL(18,6)) > |USING parquet; > |""".stripMargin) > spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show > {code} > {noformat} > 20:23:36.272 ERROR > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to > compile: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not > an rvalue > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue > at > org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675) > at > org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35628) RocksDBFileManager - load checkpoint from DFS
[ https://issues.apache.org/jira/browse/SPARK-35628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-35628: Assignee: Yuanjian Li > RocksDBFileManager - load checkpoint from DFS > - > > Key: SPARK-35628 > URL: https://issues.apache.org/jira/browse/SPARK-35628 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > > The implementation for the load path of the checkpoint data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35628) RocksDBFileManager - load checkpoint from DFS
[ https://issues.apache.org/jira/browse/SPARK-35628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-35628. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32767 [https://github.com/apache/spark/pull/32767] > RocksDBFileManager - load checkpoint from DFS > - > > Key: SPARK-35628 > URL: https://issues.apache.org/jira/browse/SPARK-35628 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.2.0 > > > The implementation for the load path of the checkpoint data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34893) Support native session window
[ https://issues.apache.org/jira/browse/SPARK-34893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34893: Assignee: Apache Spark > Support native session window > - > > Key: SPARK-34893 > URL: https://issues.apache.org/jira/browse/SPARK-34893 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > This issue tracks the effort to support native session windows in both batch > and streaming queries. > This issue is the finalization of SPARK-10816 leveraging SPARK-34888, > SPARK-34889, SPARK-35861, SPARK-34891, SPARK-34892. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34893) Support native session window
[ https://issues.apache.org/jira/browse/SPARK-34893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34893: Assignee: (was: Apache Spark) > Support native session window > - > > Key: SPARK-34893 > URL: https://issues.apache.org/jira/browse/SPARK-34893 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > This issue tracks the effort to support native session windows in both batch > and streaming queries. > This issue is the finalization of SPARK-10816 leveraging SPARK-34888, > SPARK-34889, SPARK-35861, SPARK-34891, SPARK-34892. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34893) Support native session window
[ https://issues.apache.org/jira/browse/SPARK-34893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369340#comment-17369340 ] Apache Spark commented on SPARK-34893: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/33081 > Support native session window > - > > Key: SPARK-34893 > URL: https://issues.apache.org/jira/browse/SPARK-34893 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Jungtaek Lim >Priority: Major > > This issue tracks the effort to support native session windows in both batch > and streaming queries. > This issue is the finalization of SPARK-10816 leveraging SPARK-34888, > SPARK-34889, SPARK-35861, SPARK-34891, SPARK-34892. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35892) numPartitions does not work when saving the RDD to a database
[ https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zengrui updated SPARK-35892: Description: When using SQL in Spark to insert data into a database, suppose the original RDD has 10 partitions; setting numPartitions to 20 in SQL (because more parallelism is needed to insert the data into the database) does not work. was: The original RDD has 10 partitions; setting numPartitions to 20 in SQL (because more parallelism is needed to insert the data into the database) does not work. > numPartitions does not work when saving the RDD to a database > -- > > Key: SPARK-35892 > URL: https://issues.apache.org/jira/browse/SPARK-35892 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: zengrui >Priority: Major > > When using SQL in Spark to insert data into a database, suppose the original > RDD has 10 partitions; setting numPartitions to 20 in SQL (because more > parallelism is needed to insert the data into the database) does not work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
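A plausible explanation, worth hedging: for JDBC writes, the numPartitions option is documented as an upper bound that Spark enforces by coalescing down, not a target it repartitions up to, so raising it above the source's 10 partitions is a no-op. Repartitioning explicitly before the write is one way to get the extra parallelism; the connection URL, table name, and credentials below are placeholders.
{code:scala}
import java.util.Properties
import org.apache.spark.sql.SparkSession

// Sketch under the assumption above: numPartitions only caps JDBC write
// parallelism, so repartition() is what actually raises it.
val spark = SparkSession.builder().appName("jdbc-write-parallelism").getOrCreate()
val df = spark.range(0, 1000000, 1, numPartitions = 10).toDF("id")

val props = new Properties()
props.setProperty("user", "spark")
props.setProperty("password", "secret")

df.repartition(20)               // 10 -> 20 partitions, i.e. 20 concurrent inserts
  .write
  .option("numPartitions", "20") // a ceiling, not a target
  .jdbc("jdbc:mysql://host:3306/db", "target_table", props)
{code}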
[jira] [Commented] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric
[ https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369323#comment-17369323 ] Apache Spark commented on SPARK-35778: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/33080 > Check multiply/divide of year-month intervals of any fields by numeric > -- > > Key: SPARK-35778 > URL: https://issues.apache.org/jira/browse/SPARK-35778 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: PengLei >Priority: Major > Fix For: 3.2.0 > > > Write tests that check multiply/divide of the following intervals by numeric: > # INTERVAL YEAR > # INTERVAL YEAR TO MONTH > # INTERVAL MONTH -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35728) Check multiply/divide of day-time intervals of any fields by numeric
[ https://issues.apache.org/jira/browse/SPARK-35728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369321#comment-17369321 ] Apache Spark commented on SPARK-35728: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/33080 > Check multiply/divide of day-time intervals of any fields by numeric > > > Key: SPARK-35728 > URL: https://issues.apache.org/jira/browse/SPARK-35728 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: PengLei >Priority: Major > Fix For: 3.2.0 > > > Write tests that check multiply/divide of the following intervals by numeric: > # INTERVAL DAY > # INTERVAL DAY TO HOUR > # INTERVAL DAY TO MINUTE > # INTERVAL HOUR > # INTERVAL HOUR TO MINUTE > # INTERVAL HOUR TO SECOND > # INTERVAL MINUTE > # INTERVAL MINUTE TO SECOND > # INTERVAL SECOND -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
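For illustration, the kind of expressions these tests exercise, assuming a spark-shell session:
{code:scala}
// Hedged sketch of the cases under test: ANSI intervals multiplied or divided
// by numerics.
spark.sql("SELECT INTERVAL '1-6' YEAR TO MONTH * 2").show()         // 3 years
spark.sql("SELECT INTERVAL '10 04:00:00' DAY TO SECOND / 2").show() // 5 days 2 hours
{code}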
[jira] [Created] (SPARK-35892) numPartitions does not work when saving the RDD to a database
zengrui created SPARK-35892: --- Summary: numPartitions does not work when saving the RDD to a database Key: SPARK-35892 URL: https://issues.apache.org/jira/browse/SPARK-35892 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: zengrui The original RDD has 10 partitions; setting numPartitions to 20 in SQL (because more parallelism is needed to insert the data into the database) does not work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org