[jira] [Commented] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369813#comment-17369813
 ] 

Apache Spark commented on SPARK-35905:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33092

> Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
> literal map should not fail")
> --
>
> Key: SPARK-35905
> URL: https://issues.apache.org/jira/browse/SPARK-35905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
> literal map should not fail")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369812#comment-17369812
 ] 

Apache Spark commented on SPARK-35905:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33092

> Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
> literal map should not fail")
> --
>
> Key: SPARK-35905
> URL: https://issues.apache.org/jira/browse/SPARK-35905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
> literal map should not fail")






[jira] [Assigned] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35905:


Assignee: (was: Apache Spark)

> Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
> literal map should not fail")
> --
>
> Key: SPARK-35905
> URL: https://issues.apache.org/jira/browse/SPARK-35905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
> literal map should not fail")






[jira] [Assigned] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35905:


Assignee: Apache Spark

> Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
> literal map should not fail")
> --
>
> Key: SPARK-35905
> URL: https://issues.apache.org/jira/browse/SPARK-35905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
> literal map should not fail")






[jira] [Created] (SPARK-35905) Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using literal map should not fail")

2021-06-25 Thread angerszhu (Jira)
angerszhu created SPARK-35905:
-

 Summary: Adding withTable in SQLQuerySuite about 
test("SPARK-33338: GROUP BY using literal map should not fail")
 Key: SPARK-35905
 URL: https://issues.apache.org/jira/browse/SPARK-35905
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0, 3.0.2, 2.4.8
Reporter: angerszhu
 Fix For: 3.2.0, 3.1.3, 3.0.4


Adding withTable in SQLQuerySuite about test("SPARK-33338: GROUP BY using 
literal map should not fail")
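
For context, a minimal sketch of the shape of the proposed change, assuming the standard withTable helper from Spark's SQL test utilities (it drops the listed tables after the body runs, so a leftover table from a previous run cannot make the test fail); the exact statements in the existing test may differ:

{code:scala}
test("SPARK-33338: GROUP BY using literal map should not fail") {
  withTable("t") {
    // withTable drops table `t` even if an assertion below fails.
    sql("CREATE TABLE t USING PARQUET AS SELECT map('k1', 'v1') AS m")
    checkAnswer(
      sql("SELECT map('k1', 'v1')['k1'] FROM t GROUP BY map('k1', 'v1')['k1']"),
      Row("v1"))
  }
}
{code}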






[jira] [Resolved] (SPARK-35879) Fix performance regression caused by collectFetchRequests

2021-06-25 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-35879.
--
Fix Version/s: 3.1.3
   3.2.0
   Resolution: Fixed

Issue resolved by pull request 33063
[https://github.com/apache/spark/pull/33063]

> Fix performance regression caused by collectFetchRequests
> -
>
> Key: SPARK-35879
> URL: https://issues.apache.org/jira/browse/SPARK-35879
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.2.0, 3.1.3
>
>
> {code:sql}
>  SET spark.sql.adaptive.enabled=true;
>  SET spark.sql.shuffle.partitions=3000;
>  SELECT /*+ REPARTITION */ 1 as pid, id from range(1, 100, 1, 500);
>  SELECT /*+ REPARTITION(pid, id) */ 1 as pid, id from range(1, 100, 1, 
> 500);
> {code}
> {code}
>  21/06/23 13:54:22 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: 
> 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: 
> 2147483647
>  21/06/23 13:54:38 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 2314708 at BlockManagerId(2, 10.1.3.114, 36423, None) with 86 blocks
>  21/06/23 13:54:59 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 2636612 at BlockManagerId(3, 10.1.3.115, 34293, None) with 87 blocks
>  21/06/23 13:55:18 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 2508706 at BlockManagerId(4, 10.1.3.116, 41869, None) with 90 blocks
>  21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 2350854 at BlockManagerId(5, 10.1.3.117, 45787, None) with 85 blocks
>  21/06/23 13:55:34 INFO ShuffleBlockFetcherIterator: Getting 438 (11.8 MiB) 
> non-empty blocks including 90 (2.5 MiB) local and 0 (0.0 B) host-local and 
> 348 (9.4 MiB) remote blocks
>  21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 87 
> blocks (2.5 MiB) from 10.1.3.115:34293
>  21/06/23 13:55:34 INFO TransportClientFactory: Successfully created 
> connection to /10.1.3.115:34293 after 1 ms (0 ms spent in bootstraps)
>  21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 90 
> blocks (2.4 MiB) from 10.1.3.116:41869
>  21/06/23 13:55:34 INFO TransportClientFactory: Successfully created 
> connection to /10.1.3.116:41869 after 2 ms (0 ms spent in bootstraps)
>  21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 85 
> blocks (2.2 MiB) from 10.1.3.117:45787
> {code}
> {code}
>  21/06/23 14:00:45 INFO MapOutputTracker: Broadcast outputstatuses size = 
> 411, actual size = 828997
>  21/06/23 14:00:45 INFO MapOutputTrackerWorker: Got the map output locations
>  21/06/23 14:00:45 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: 
> 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: 
> 2147483647
>  21/06/23 14:00:55 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 1894389 at BlockManagerId(2, 10.1.3.114, 36423, None) with 99 blocks
>  21/06/23 14:01:04 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 1919993 at BlockManagerId(3, 10.1.3.115, 34293, None) with 100 blocks
>  21/06/23 14:01:14 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 1977186 at BlockManagerId(5, 10.1.3.117, 45787, None) with 103 blocks
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 1938336 at BlockManagerId(4, 10.1.3.116, 41869, None) with 101 blocks
>  21/06/23 14:01:23 INFO ShuffleBlockFetcherIterator: Getting 500 (9.1 MiB) 
> non-empty blocks including 97 (1820.3 KiB) local and 0 (0.0 B) host-local and 
> 403 (7.4 MiB) remote blocks
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 101 
> blocks (1892.9 KiB) from 10.1.3.116:41869
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 103 
> blocks (1930.8 KiB) from 10.1.3.117:45787
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 99 
> blocks (1850.0 KiB) from 10.1.3.114:36423
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 100 
> blocks (1875.0 KiB) from 10.1.3.115:34293
>  21/06/23 14:01:23 INFO ShuffleBlockFetcherIterator: Started 4 remote fetches 
> in 37889 ms
> {code}
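
For readers decoding the log above: the sketch below illustrates, with simplified hypothetical types rather than Spark's actual ShuffleBlockFetcherIterator code, what "creating fetch requests" means here. Blocks destined for one remote address are grouped into requests of roughly targetRemoteRequestSize bytes; the regression reported in this issue is about how slowly that grouping ran, visible in the 15-20 second gaps between the "Creating fetch request" lines.

{code:scala}
// Illustrative sketch only; types and names are simplified stand-ins.
final case class Block(id: String, size: Long)
final case class FetchRequest(blocks: Seq[Block]) {
  def bytes: Long = blocks.map(_.size).sum
}

// Group the blocks for a single remote address into requests that each carry
// at least `targetRemoteRequestSize` bytes (except possibly the last one).
def collectRequests(blocks: Seq[Block], targetRemoteRequestSize: Long): Seq[FetchRequest] = {
  val requests = Seq.newBuilder[FetchRequest]
  var current = Vector.empty[Block]
  var currentBytes = 0L
  blocks.foreach { block =>
    current :+= block
    currentBytes += block.size
    if (currentBytes >= targetRemoteRequestSize) {
      requests += FetchRequest(current)
      current = Vector.empty
      currentBytes = 0L
    }
  }
  if (current.nonEmpty) requests += FetchRequest(current)
  requests.result()
}
{code}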






[jira] [Assigned] (SPARK-35879) Fix performance regression caused by collectFetchRequests

2021-06-25 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-35879:


Assignee: Kent Yao

> Fix performance regression caused by collectFetchRequests
> -
>
> Key: SPARK-35879
> URL: https://issues.apache.org/jira/browse/SPARK-35879
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> {code:sql}
>  SET spark.sql.adaptive.enabled=true;
>  SET spark.sql.shuffle.partitions=3000;
>  SELECT /*+ REPARTITION */ 1 as pid, id from range(1, 100, 1, 500);
>  SELECT /*+ REPARTITION(pid, id) */ 1 as pid, id from range(1, 100, 1, 
> 500);
> {code}
> {code}
>  21/06/23 13:54:22 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: 
> 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: 
> 2147483647
>  21/06/23 13:54:38 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 2314708 at BlockManagerId(2, 10.1.3.114, 36423, None) with 86 blocks
>  21/06/23 13:54:59 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 2636612 at BlockManagerId(3, 10.1.3.115, 34293, None) with 87 blocks
>  21/06/23 13:55:18 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 2508706 at BlockManagerId(4, 10.1.3.116, 41869, None) with 90 blocks
>  21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 2350854 at BlockManagerId(5, 10.1.3.117, 45787, None) with 85 blocks
>  21/06/23 13:55:34 INFO ShuffleBlockFetcherIterator: Getting 438 (11.8 MiB) 
> non-empty blocks including 90 (2.5 MiB) local and 0 (0.0 B) host-local and 
> 348 (9.4 MiB) remote blocks
>  21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 87 
> blocks (2.5 MiB) from 10.1.3.115:34293
>  21/06/23 13:55:34 INFO TransportClientFactory: Successfully created 
> connection to /10.1.3.115:34293 after 1 ms (0 ms spent in bootstraps)
>  21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 90 
> blocks (2.4 MiB) from 10.1.3.116:41869
>  21/06/23 13:55:34 INFO TransportClientFactory: Successfully created 
> connection to /10.1.3.116:41869 after 2 ms (0 ms spent in bootstraps)
>  21/06/23 13:55:34 DEBUG ShuffleBlockFetcherIterator: Sending request for 85 
> blocks (2.2 MiB) from 10.1.3.117:45787
> {code}
> {code}
>  21/06/23 14:00:45 INFO MapOutputTracker: Broadcast outputstatuses size = 
> 411, actual size = 828997
>  21/06/23 14:00:45 INFO MapOutputTrackerWorker: Got the map output locations
>  21/06/23 14:00:45 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: 
> 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: 
> 2147483647
>  21/06/23 14:00:55 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 1894389 at BlockManagerId(2, 10.1.3.114, 36423, None) with 99 blocks
>  21/06/23 14:01:04 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 1919993 at BlockManagerId(3, 10.1.3.115, 34293, None) with 100 blocks
>  21/06/23 14:01:14 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 1977186 at BlockManagerId(5, 10.1.3.117, 45787, None) with 103 blocks
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Creating fetch request 
> of 1938336 at BlockManagerId(4, 10.1.3.116, 41869, None) with 101 blocks
>  21/06/23 14:01:23 INFO ShuffleBlockFetcherIterator: Getting 500 (9.1 MiB) 
> non-empty blocks including 97 (1820.3 KiB) local and 0 (0.0 B) host-local and 
> 403 (7.4 MiB) remote blocks
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 101 
> blocks (1892.9 KiB) from 10.1.3.116:41869
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 103 
> blocks (1930.8 KiB) from 10.1.3.117:45787
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 99 
> blocks (1850.0 KiB) from 10.1.3.114:36423
>  21/06/23 14:01:23 DEBUG ShuffleBlockFetcherIterator: Sending request for 100 
> blocks (1875.0 KiB) from 10.1.3.115:34293
>  21/06/23 14:01:23 INFO ShuffleBlockFetcherIterator: Started 4 remote fetches 
> in 37889 ms
> {code}






[jira] [Assigned] (SPARK-35904) Collapse above RebalancePartitions

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35904:


Assignee: (was: Apache Spark)

> Collapse above RebalancePartitions
> --
>
> Key: SPARK-35904
> URL: https://issues.apache.org/jira/browse/SPARK-35904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Make RebalancePartitions extends RepartitionOperation.
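
To make the intent concrete, here is an illustrative sketch with simplified stand-in types (not Spark's actual logical plan or optimizer classes): once RebalancePartitions shares the RepartitionOperation trait with the other repartition nodes, a collapse rule that matches on that trait can also remove a redundant repartition-like node sitting directly above a rebalance.

{code:scala}
// Simplified, hypothetical model of the plan nodes involved.
trait LogicalPlan { def child: LogicalPlan }
case object Leaf extends LogicalPlan { def child: LogicalPlan = this } // placeholder leaf
trait RepartitionOperation extends LogicalPlan
case class Repartition(numPartitions: Int, child: LogicalPlan) extends RepartitionOperation
case class RebalancePartitions(child: LogicalPlan) extends RepartitionOperation

// An upper repartition over any repartition-like node makes the lower one
// redundant, so the lower node can be removed.
def collapse(plan: LogicalPlan): LogicalPlan = plan match {
  case upper: Repartition if upper.child.isInstanceOf[RepartitionOperation] =>
    Repartition(upper.numPartitions, upper.child.child)
  case other => other
}

// collapse(Repartition(10, RebalancePartitions(Leaf))) returns Repartition(10, Leaf)
{code}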






[jira] [Assigned] (SPARK-35904) Collapse above RebalancePartitions

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35904:


Assignee: Apache Spark

> Collapse above RebalancePartitions
> --
>
> Key: SPARK-35904
> URL: https://issues.apache.org/jira/browse/SPARK-35904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Make RebalancePartitions extends RepartitionOperation.






[jira] [Commented] (SPARK-35904) Collapse above RebalancePartitions

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369796#comment-17369796
 ] 

Apache Spark commented on SPARK-35904:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/33099

> Collapse above RebalancePartitions
> --
>
> Key: SPARK-35904
> URL: https://issues.apache.org/jira/browse/SPARK-35904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Make RebalancePartitions extends RepartitionOperation.






[jira] [Updated] (SPARK-35904) Collapse above RebalancePartitions

2021-06-25 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-35904:

Description: Make RebalancePartitions extends RepartitionOperation.

> Collapse above RebalancePartitions
> --
>
> Key: SPARK-35904
> URL: https://issues.apache.org/jira/browse/SPARK-35904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Make RebalancePartitions extends RepartitionOperation.






[jira] [Created] (SPARK-35904) Collapse above RebalancePartitions

2021-06-25 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-35904:
---

 Summary: Collapse above RebalancePartitions
 Key: SPARK-35904
 URL: https://issues.apache.org/jira/browse/SPARK-35904
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Yuming Wang









[jira] [Assigned] (SPARK-35903) Parameterize `master` in TPCDSQueryBenchmark

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35903:


Assignee: (was: Apache Spark)

> Parameterize `master` in TPCDSQueryBenchmark 
> -
>
> Key: SPARK-35903
> URL: https://issues.apache.org/jira/browse/SPARK-35903
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-35903) Parameterize `master` in TPCDSQueryBenchmark

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35903:


Assignee: Apache Spark

> Parameterize `master` in TPCDSQueryBenchmark 
> -
>
> Key: SPARK-35903
> URL: https://issues.apache.org/jira/browse/SPARK-35903
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-35903) Parameterize `master` in TPCDSQueryBenchmark

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369786#comment-17369786
 ] 

Apache Spark commented on SPARK-35903:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33098

> Parameterize `master` in TPCDSQueryBenchmark 
> -
>
> Key: SPARK-35903
> URL: https://issues.apache.org/jira/browse/SPARK-35903
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-35884) EXPLAIN FORMATTED for AQE

2021-06-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35884:
--
Parent: SPARK-33828
Issue Type: Sub-task  (was: Bug)

> EXPLAIN FORMATTED for AQE
> -
>
> Key: SPARK-35884
> URL: https://issues.apache.org/jira/browse/SPARK-35884
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Created] (SPARK-35903) Parameterize `master` in TPCDSQueryBenchmark

2021-06-25 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-35903:
-

 Summary: Parameterize `master` in TPCDSQueryBenchmark 
 Key: SPARK-35903
 URL: https://issues.apache.org/jira/browse/SPARK-35903
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed

2021-06-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369783#comment-17369783
 ] 

Dongjoon Hyun commented on SPARK-35902:
---

Thank you for reporting, [~yghu]. Go for it!

> spark.driver.log.dfsDir with hdfs scheme failed
> ---
>
> Key: SPARK-35902
> URL: https://issues.apache.org/jira/browse/SPARK-35902
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
> Environment: Spark 3.1.1, Hadoop 3.1.1
>Reporter: YuanGuanhu
>Priority: Major
>
> When I set spark.driver.log.dfsDir to a path with an hdfs:// scheme, it throws an 
> exception:
> spark.driver.log.persistToDfs.enabled = true
> spark.driver.log.dfsDir = hdfs://hacluster/spark2xdriverlogs1
>  
> 2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs 
> | org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94)
>  java.lang.IllegalArgumentException: Pathname 
> /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from 
> /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a 
> valid DFS filename.
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372)
>  at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364)
>  at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410)
>  at 
> org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528)
>  at 
> org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118)
>  at 
> org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.(DriverLogger.scala:104)
>  at 
> org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72)
>  at 
> org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688)
>  at 
> org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688)
>  at scala.Option.foreach(Option.scala:407)
>  at 
> org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688)
>  at org.apache.spark.SparkContext.(SparkContext.scala:640)
>  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814)
>  at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947)
>  at scala.Option.getOrElse(Option.scala:189)
>  at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941)
>  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
>  at $line3.$read$$iw$$iw.(:15)
>  at $line3.$read$$iw.(:42)
>  at $line3.$read.(:44)
>  at $line3.$read$.(:48)
>  at $line3.$read$.()
>  at $line3.$eval$.$print$lzycompute(:7)
>  at $line3.$eval$.$print(:6)
>  at $line3.$eval.$print()
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
>  at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
>  at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
>  at 
> scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
>  at 
> scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
>  at 
> scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
>  at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
>  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
>  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
>  at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224)
>  at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
>  at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224)
>  at 
> org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83)
>  at scala.collection.immutable.List.foreach(List.scala:392)
>  at 
> org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99)
>  at 

[jira] [Updated] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed

2021-06-25 Thread YuanGuanhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuanGuanhu updated SPARK-35902:
---
Description: 
When I set spark.driver.log.dfsDir to a path with an hdfs:// scheme, it throws an 
exception:

spark.driver.log.persistToDfs.enabled = true

spark.driver.log.dfsDir = hdfs://hacluster/spark2xdriverlogs1

 

2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs | 
org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94)
 java.lang.IllegalArgumentException: Pathname 
/opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from 
/opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a valid 
DFS filename.
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372)
 at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364)
 at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410)
 at 
org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528)
 at 
org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118)
 at 
org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.(DriverLogger.scala:104)
 at org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72)
 at 
org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688)
 at 
org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688)
 at scala.Option.foreach(Option.scala:407)
 at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688)
 at org.apache.spark.SparkContext.(SparkContext.scala:640)
 at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814)
 at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947)
 at scala.Option.getOrElse(Option.scala:189)
 at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941)
 at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
 at $line3.$read$$iw$$iw.(:15)
 at $line3.$read$$iw.(:42)
 at $line3.$read.(:44)
 at $line3.$read$.(:48)
 at $line3.$read$.()
 at $line3.$eval$.$print$lzycompute(:7)
 at $line3.$eval$.$print(:6)
 at $line3.$eval.$print()
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
 at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
 at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
 at 
scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
 at 
scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
 at 
scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
 at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
 at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
 at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
 at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224)
 at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
 at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224)
 at 
org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at 
org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99)
 at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:83)
 at org.apache.spark.repl.SparkILoop.$anonfun$process$4(SparkILoop.scala:165)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at scala.tools.nsc.interpreter.ILoop.$anonfun$mumly$1(ILoop.scala:168)
 at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
 at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:165)
 at org.apache.spark.repl.SparkILoop.loopPostInit$1(SparkILoop.scala:153)
 at org.apache.spark.repl.SparkILoop.$anonfun$process$10(SparkILoop.scala:221)
 at 
org.apache.spark.repl.SparkILoop.withSuppressedSettings$1(SparkILoop.scala:189)
 at 

[jira] [Commented] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed

2021-06-25 Thread YuanGuanhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369780#comment-17369780
 ] 

YuanGuanhu commented on SPARK-35902:


I'd like to work on this.

> spark.driver.log.dfsDir with hdfs scheme failed
> ---
>
> Key: SPARK-35902
> URL: https://issues.apache.org/jira/browse/SPARK-35902
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
> Environment: Spark 3.1.1, Hadoop 3.1.1
>Reporter: YuanGuanhu
>Priority: Major
>
> When I set spark.driver.log.dfsDir to a path with an hdfs:// scheme, it throws an 
> exception:
> spark.driver.log.persistToDfs.enabled = true
> spark.driver.log.dfsDir = hdfs://hacluster/sparkdriverlogs
>  
> 2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs 
> | org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94)
> java.lang.IllegalArgumentException: Pathname 
> /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from 
> /opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a 
> valid DFS filename.
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372)
>  at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364)
>  at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410)
>  at 
> org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528)
>  at 
> org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118)
>  at 
> org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.(DriverLogger.scala:104)
>  at 
> org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72)
>  at 
> org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688)
>  at 
> org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688)
>  at scala.Option.foreach(Option.scala:407)
>  at 
> org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688)
>  at org.apache.spark.SparkContext.(SparkContext.scala:640)
>  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814)
>  at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947)
>  at scala.Option.getOrElse(Option.scala:189)
>  at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941)
>  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
>  at $line3.$read$$iw$$iw.(:15)
>  at $line3.$read$$iw.(:42)
>  at $line3.$read.(:44)
>  at $line3.$read$.(:48)
>  at $line3.$read$.()
>  at $line3.$eval$.$print$lzycompute(:7)
>  at $line3.$eval$.$print(:6)
>  at $line3.$eval.$print()
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
>  at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
>  at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
>  at 
> scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
>  at 
> scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
>  at 
> scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
>  at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
>  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
>  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
>  at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224)
>  at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
>  at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224)
>  at 
> org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83)
>  at scala.collection.immutable.List.foreach(List.scala:392)
>  at 
> org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99)
>  at 

[jira] [Updated] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed

2021-06-25 Thread YuanGuanhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuanGuanhu updated SPARK-35902:
---
Description: 
When I set spark.driver.log.dfsDir to a path with an hdfs:// scheme, it throws an 
exception:

spark.driver.log.persistToDfs.enabled = true

spark.driver.log.dfsDir = hdfs://hacluster/spark2xdriverlogs

 

2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs | 
org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94)
 java.lang.IllegalArgumentException: Pathname 
/opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from 
/opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a valid 
DFS filename.
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372)
 at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364)
 at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410)
 at 
org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528)
 at 
org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118)
 at 
org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.(DriverLogger.scala:104)
 at org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72)
 at 
org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688)
 at 
org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688)
 at scala.Option.foreach(Option.scala:407)
 at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688)
 at org.apache.spark.SparkContext.(SparkContext.scala:640)
 at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814)
 at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947)
 at scala.Option.getOrElse(Option.scala:189)
 at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941)
 at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
 at $line3.$read$$iw$$iw.(:15)
 at $line3.$read$$iw.(:42)
 at $line3.$read.(:44)
 at $line3.$read$.(:48)
 at $line3.$read$.()
 at $line3.$eval$.$print$lzycompute(:7)
 at $line3.$eval$.$print(:6)
 at $line3.$eval.$print()
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
 at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
 at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
 at 
scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
 at 
scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
 at 
scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
 at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
 at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
 at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
 at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224)
 at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
 at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224)
 at 
org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at 
org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99)
 at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:83)
 at org.apache.spark.repl.SparkILoop.$anonfun$process$4(SparkILoop.scala:165)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at scala.tools.nsc.interpreter.ILoop.$anonfun$mumly$1(ILoop.scala:168)
 at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
 at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:165)
 at org.apache.spark.repl.SparkILoop.loopPostInit$1(SparkILoop.scala:153)
 at org.apache.spark.repl.SparkILoop.$anonfun$process$10(SparkILoop.scala:221)
 at 
org.apache.spark.repl.SparkILoop.withSuppressedSettings$1(SparkILoop.scala:189)
 at 

[jira] [Created] (SPARK-35902) spark.driver.log.dfsDir with hdfs scheme failed

2021-06-25 Thread YuanGuanhu (Jira)
YuanGuanhu created SPARK-35902:
--

 Summary: spark.driver.log.dfsDir with hdfs scheme failed
 Key: SPARK-35902
 URL: https://issues.apache.org/jira/browse/SPARK-35902
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2, 3.1.1, 3.1.0
 Environment: Spark 3.1.1, Hadoop 3.1.1
Reporter: YuanGuanhu


When I set spark.driver.log.dfsDir to a path with an hdfs:// scheme, it throws an 
exception:

spark.driver.log.persistToDfs.enabled = true

spark.driver.log.dfsDir = hdfs://hacluster/sparkdriverlogs

 

2021-06-25 14:56:45,786 | ERROR | main | Could not persist driver logs to dfs | 
org.apache.spark.util.logging.DriverLogger.logError(Logging.scala:94)
java.lang.IllegalArgumentException: Pathname 
/opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 from 
/opt/client811/Spark2x/spark/hdfs:/hacluster/spark2xdriverlogs1 is not a valid 
DFS filename.
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:252)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1375)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1372)
 at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1389)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1364)
 at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2410)
 at 
org.apache.spark.deploy.SparkHadoopUtil$.createFile(SparkHadoopUtil.scala:528)
 at 
org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.init(DriverLogger.scala:118)
 at 
org.apache.spark.util.logging.DriverLogger$DfsAsyncWriter.(DriverLogger.scala:104)
 at org.apache.spark.util.logging.DriverLogger.startSync(DriverLogger.scala:72)
 at 
org.apache.spark.SparkContext.$anonfun$postApplicationStart$1(SparkContext.scala:2688)
 at 
org.apache.spark.SparkContext.$anonfun$postApplicationStart$1$adapted(SparkContext.scala:2688)
 at scala.Option.foreach(Option.scala:407)
 at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:2688)
 at org.apache.spark.SparkContext.(SparkContext.scala:640)
 at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2814)
 at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:947)
 at scala.Option.getOrElse(Option.scala:189)
 at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:941)
 at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
 at $line3.$read$$iw$$iw.(:15)
 at $line3.$read$$iw.(:42)
 at $line3.$read.(:44)
 at $line3.$read$.(:48)
 at $line3.$read$.()
 at $line3.$eval$.$print$lzycompute(:7)
 at $line3.$eval$.$print(:6)
 at $line3.$eval.$print()
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
 at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
 at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
 at 
scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
 at 
scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
 at 
scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
 at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
 at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
 at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
 at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:224)
 at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
 at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:224)
 at 
org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$2(SparkILoop.scala:83)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at 
org.apache.spark.repl.SparkILoop.$anonfun$initializeSpark$1(SparkILoop.scala:83)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:99)
 at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:83)
 at org.apache.spark.repl.SparkILoop.$anonfun$process$4(SparkILoop.scala:165)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at scala.tools.nsc.interpreter.ILoop.$anonfun$mumly$1(ILoop.scala:168)
 at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
 at scala.tools.nsc.interpreter.ILoop.mumly(ILoop.scala:165)
 at 
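
As a toy illustration of the symptom only (this is not the actual DriverLogger code path; the directory is simply the one from the report): joining a fully qualified hdfs:// URI onto a local working directory as if it were a relative path yields exactly the kind of local-looking "hdfs:/..." string that HDFS then rejects as an invalid DFS filename.

{code:scala}
import java.io.File

// Hypothetical reproduction of the mangled path seen in the stack trace above.
val dfsDir = "hdfs://hacluster/sparkdriverlogs"    // configured value from the report
val workingDir = "/opt/client811/Spark2x/spark"    // local client dir from the report
val mangled = new File(workingDir, dfsDir).getPath
// On a Unix-like system this prints
//   /opt/client811/Spark2x/spark/hdfs:/hacluster/sparkdriverlogs
// i.e. the scheme's "//" is collapsed and the whole URI is treated as a
// relative file name, which DistributedFileSystem rejects.
println(mangled)
{code}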

[jira] [Resolved] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.

2021-06-25 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-35466.
---
Fix Version/s: 3.2.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 33094
https://github.com/apache/spark/pull/33094

> Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
> -
>
> Key: SPARK-35466
> URL: https://issues.apache.org/jira/browse/SPARK-35466
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Assigned] (SPARK-35901) Refine type hints in pyspark.pandas.window

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35901:


Assignee: (was: Apache Spark)

> Refine type hints in pyspark.pandas.window
> --
>
> Key: SPARK-35901
> URL: https://issues.apache.org/jira/browse/SPARK-35901
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> We can use stricter type hints for functions in {{pyspark.pandas.window}} 
> by using generics.






[jira] [Assigned] (SPARK-35901) Refine type hints in pyspark.pandas.window

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35901:


Assignee: Apache Spark

> Refine type hints in pyspark.pandas.window
> --
>
> Key: SPARK-35901
> URL: https://issues.apache.org/jira/browse/SPARK-35901
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> We can use stricter type hints for functions in {{pyspark.pandas.window}} 
> by using generics.






[jira] [Commented] (SPARK-35901) Refine type hints in pyspark.pandas.window

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369776#comment-17369776
 ] 

Apache Spark commented on SPARK-35901:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33097

> Refine type hints in pyspark.pandas.window
> --
>
> Key: SPARK-35901
> URL: https://issues.apache.org/jira/browse/SPARK-35901
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> We can use stricter type hints for functions in {{pyspark.pandas.window}} 
> by using generics.






[jira] [Assigned] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions

2021-06-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35899:
-

Assignee: Anton Okolnychyi

> Add a utility to convert connector expressions to Catalyst expressions
> --
>
> Key: SPARK-35899
> URL: https://issues.apache.org/jira/browse/SPARK-35899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
>
> There are more and more places that require converting a v2 connector 
> expression to an internal Catalyst expression. We need to build a utility 
> method to avoid having the same logic in a lot of places.
> See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].
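
A simplified sketch of the idea, using hypothetical stand-in types rather than the real connector and Catalyst classes: a single shared utility owns the translation, so individual call sites no longer reimplement it.

{code:scala}
// Stand-in types for illustration only.
sealed trait V2Expression
case class V2FieldReference(name: String) extends V2Expression
case class V2Literal(value: Any) extends V2Expression

sealed trait CatalystExpression
case class AttributeReference(name: String) extends CatalystExpression
case class Literal(value: Any) extends CatalystExpression

object V2ExpressionUtil {
  // One place that maps connector expressions onto Catalyst expressions.
  def toCatalyst(expr: V2Expression): CatalystExpression = expr match {
    case V2FieldReference(name) => AttributeReference(name)
    case V2Literal(value)       => Literal(value)
  }
}
{code}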






[jira] [Resolved] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions

2021-06-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35899.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33096
[https://github.com/apache/spark/pull/33096]

> Add a utility to convert connector expressions to Catalyst expressions
> --
>
> Key: SPARK-35899
> URL: https://issues.apache.org/jira/browse/SPARK-35899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.2.0
>
>
> There are more and more places that require converting a v2 connector 
> expression to an internal Catalyst expression. We need to build a utility 
> method to avoid having the same logic in a lot of places.
> See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].






[jira] [Resolved] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-35894.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33085
[https://github.com/apache/spark/pull/33085]

> Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
> -
>
> Key: SPARK-35894
> URL: https://issues.apache.org/jira/browse/SPARK-35894
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.2.0
>
>
> Due to the changes in Scala 2.13, importing scala.collection.Seq or 
> scala.collection.IndexedSeq can cause subtle issues.
> (It causes no issue on Scala 2.12, so we would most likely not notice the 
> problem until we hit a compilation failure on Scala 2.13.)
> Please refer to the page below for the details of the changes around Seq:
> https://docs.scala-lang.org/overviews/core/collections-migration-213.html
> It would be nice if we could prevent this case.
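
A small example of the kind of breakage this style check is meant to catch (illustrative; it compiles on Scala 2.12 but not on Scala 2.13):

{code:scala}
import scala.collection.mutable.ArrayBuffer

// scala.Seq is scala.collection.Seq on 2.12, but scala.collection.immutable.Seq on 2.13.
def lengths(xs: Seq[String]): Seq[Int] = xs.map(_.length)

val buf: scala.collection.Seq[String] = ArrayBuffer("a", "bc")

// Compiles on Scala 2.12, fails on Scala 2.13: an ArrayBuffer (a mutable
// collection.Seq) is no longer accepted where scala.Seq (immutable) is expected.
lengths(buf)
{code}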






[jira] [Assigned] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-35894:


Assignee: Jungtaek Lim

> Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
> -
>
> Key: SPARK-35894
> URL: https://issues.apache.org/jira/browse/SPARK-35894
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> Due to the changes in Scala 2.13, importing scala.collection.Seq or 
> scala.collection.IndexedSeq can cause subtle issues.
> (It causes no issue on Scala 2.12, so we would most likely not notice the 
> problem until we hit a compilation failure on Scala 2.13.)
> Please refer to the page below for the details of the changes around Seq:
> https://docs.scala-lang.org/overviews/core/collections-migration-213.html
> It would be nice if we could prevent this case.






[jira] [Created] (SPARK-35901) Refine type hints in pyspark.pandas.window

2021-06-25 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-35901:
-

 Summary: Refine type hints in pyspark.pandas.window
 Key: SPARK-35901
 URL: https://issues.apache.org/jira/browse/SPARK-35901
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin


We can use stricter type hints for functions in {{pyspark.pandas.window}} 
by using generics.






[jira] [Assigned] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35899:


Assignee: Apache Spark

> Add a utility to convert connector expressions to Catalyst expressions
> --
>
> Key: SPARK-35899
> URL: https://issues.apache.org/jira/browse/SPARK-35899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Apache Spark
>Priority: Major
>
> There are more and more places that require converting a v2 connector 
> expression to an internal Catalyst expression. We need to build a utility 
> method to avoid having the same logic in a lot of places.
> See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].
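One possible shape for such a utility, shown here only as a hedged sketch: the object name {{ConnectorExpressionUtils}}, the method name, and the restriction to top-level field references are assumptions for illustration, not Spark's actual implementation.

{code:scala}
// Hedged sketch only: resolve a v2 connector expression against the relation
// output. Names and scope (only FieldReference is handled) are assumptions.
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression => CatalystExpression}
import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference}

object ConnectorExpressionUtils {
  def toCatalyst(expr: V2Expression, output: Seq[Attribute]): Option[CatalystExpression] =
    expr match {
      case ref: FieldReference if ref.fieldNames.length == 1 =>
        // Case-sensitive match against top-level columns; nested fields and
        // other expression types would need additional handling.
        output.find(_.name == ref.fieldNames.head)
      case _ => None
    }
}
{code}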



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369760#comment-17369760
 ] 

Apache Spark commented on SPARK-35899:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/33096

> Add a utility to convert connector expressions to Catalyst expressions
> --
>
> Key: SPARK-35899
> URL: https://issues.apache.org/jira/browse/SPARK-35899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> There are more and more places that require converting a v2 connector 
> expression to an internal Catalyst expression. We need to build a utility 
> method to avoid having the same logic in a lot of places.
> See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35899:


Assignee: (was: Apache Spark)

> Add a utility to convert connector expressions to Catalyst expressions
> --
>
> Key: SPARK-35899
> URL: https://issues.apache.org/jira/browse/SPARK-35899
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> There are more and more places that require converting a v2 connector 
> expression to an internal Catalyst expression. We need to build a utility 
> method to avoid having the same logic in a lot of places.
> See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35339) Improve unit tests for data-type-based basic operations

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369753#comment-17369753
 ] 

Apache Spark commented on SPARK-35339:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/33095

> Improve unit tests for data-type-based basic operations
> ---
>
> Key: SPARK-35339
> URL: https://issues.apache.org/jira/browse/SPARK-35339
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Unit tests for arithmetic operations are scattered in the codebase: 
>  * pyspark/pandas/tests/test_ops_on_diff_frames.py
>  * pyspark/pandas/tests/test_dataframe.py
>  * pyspark/pandas/tests/test_series.py
>  * (Upcoming) pyspark/pandas/tests/data_type_ops/
> We wanted to consolidate them.
> The code would be cleaner and easier to maintain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35339) Improve unit tests for data-type-based basic operations

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35339:


Assignee: (was: Apache Spark)

> Improve unit tests for data-type-based basic operations
> ---
>
> Key: SPARK-35339
> URL: https://issues.apache.org/jira/browse/SPARK-35339
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Unit tests for arithmetic operations are scattered in the codebase: 
>  * pyspark/pandas/tests/test_ops_on_diff_frames.py
>  * pyspark/pandas/tests/test_dataframe.py
>  * pyspark/pandas/tests/test_series.py
>  * (Upcoming) pyspark/pandas/tests/data_type_ops/
> We wanted to consolidate them.
> The code would be cleaner and easier to maintain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35339) Improve unit tests for data-type-based basic operations

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35339:


Assignee: Apache Spark

> Improve unit tests for data-type-based basic operations
> ---
>
> Key: SPARK-35339
> URL: https://issues.apache.org/jira/browse/SPARK-35339
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Unit tests for arithmetic operations are scattered in the codebase: 
>  * pyspark/pandas/tests/test_ops_on_diff_frames.py
>  * pyspark/pandas/tests/test_dataframe.py
>  * pyspark/pandas/tests/test_series.py
>  * (Upcoming) pyspark/pandas/tests/data_type_ops/
> We wanted to consolidate them.
> The code would be cleaner and easier to maintain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35339) Improve unit tests for data-type-based basic operations

2021-06-25 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35339:
-
Summary: Improve unit tests for data-type-based basic operations  (was: 
Consolidate unit tests for arithmetic operations)

> Improve unit tests for data-type-based basic operations
> ---
>
> Key: SPARK-35339
> URL: https://issues.apache.org/jira/browse/SPARK-35339
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Unit tests for arithmetic operations are scattered in the codebase: 
>  * pyspark/pandas/tests/test_ops_on_diff_frames.py
>  * pyspark/pandas/tests/test_dataframe.py
>  * pyspark/pandas/tests/test_series.py
>  * (Upcoming) pyspark/pandas/tests/data_type_ops/
> We wanted to consolidate them.
> The code would be cleaner and easier to maintain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35466:


Assignee: Apache Spark

> Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
> -
>
> Key: SPARK-35466
> URL: https://issues.apache.org/jira/browse/SPARK-35466
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35466:


Assignee: (was: Apache Spark)

> Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
> -
>
> Key: SPARK-35466
> URL: https://issues.apache.org/jira/browse/SPARK-35466
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35466) Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369654#comment-17369654
 ] 

Apache Spark commented on SPARK-35466:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33094

> Enable disallow_untyped_defs mypy check for pyspark.pandas.data_type_ops.
> -
>
> Key: SPARK-35466
> URL: https://issues.apache.org/jira/browse/SPARK-35466
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35900) Consider whether we need to broadcast dynamic filtering subquery results for v2 tables

2021-06-25 Thread Anton Okolnychyi (Jira)
Anton Okolnychyi created SPARK-35900:


 Summary: Consider whether we need to broadcast dynamic filtering 
subquery results for v2 tables
 Key: SPARK-35900
 URL: https://issues.apache.org/jira/browse/SPARK-35900
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Anton Okolnychyi


We don't necessarily have to broadcast the results of dynamic filtering 
subqueries for v2 tables.

See [here|https://github.com/apache/spark/pull/32921#discussion_r653212341].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369627#comment-17369627
 ] 

Apache Spark commented on SPARK-35896:
--

User 'vkorukanti' has created a pull request for this issue:
https://github.com/apache/spark/pull/33091

> [SS] Include more granular metrics for stateful operators in 
> StreamingQueryProgress
> ---
>
> Key: SPARK-35896
> URL: https://issues.apache.org/jira/browse/SPARK-35896
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Venki Korukanti
>Priority: Major
>
> Currently the streaming progress is missing a few important stateful operator 
> metrics in {{StateOperatorProgress}}. Each stateful operator consists of 
> multiple steps. For example, {{flatMapGroupsWithState}} has two major steps: 1) 
> processing the input and 2) timeout processing to remove entries from the 
> state which have expired. The main motivation is to track down the time each 
> individual step took (such as timeout processing, watermark processing, etc.) 
> and how much data was processed, to pinpoint bottlenecks and to reason about 
> why some microbatches are slow compared to others in the same job.
> Below are the final metrics common to all stateful operators (the ones in 
> _*bold-italic*_ are proposed as new). These metrics are in 
> {{StateOperatorProgress}}, which is part of {{StreamingQueryProgress}}.
>  * _*operatorName*_ - State operator name. Can help us identify any 
> operator-specific slowness and state store usage patterns. Ex. "dedupe" 
> (derived using {{StateStoreWriter.shortName}})
>  * _numRowsTotal_ - number of rows in the state store across all tasks in a 
> stage where the operator has executed.
>  * _numRowsUpdated_ - number of rows added to or updated in the store.
>  * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state 
> store rows across all tasks in a stage where the operator has executed.
>  * _*numRowsRemoved*_ - number of rows deleted from the state store as part of 
> the state cleanup mechanism across all tasks in a stage where the operator 
> has executed. This number helps measure state store deletions and their impact 
> on checkpoint commit and other latencies.
>  * _*allRemovalsTimeMs*_ - time taken to remove rows from the state store as 
> part of state cleanup (this also includes iterating through the entire state 
> store to find which rows to delete) across all tasks in a stage where the 
> operator has executed. If we see jobs spending significant time here, it may 
> justify a better layout in the state store so that only the required rows are 
> read, rather than the entire state store as is done currently.
>  * _*commitTimeMs*_ - time taken to commit the state store changes to 
> external storage for checkpointing. This is cumulative across all tasks in a 
> stage where this operator has executed.
>  * _*numShufflePartitions*_ - number of shuffle partitions this state 
> operator is part of. Currently, metrics like times are aggregated across 
> all tasks in a stage where the operator has executed. Having the number of 
> shuffle partitions (which corresponds to the number of tasks) helps us find 
> the average task contribution to the metric.
>  * _*numStateStores*_ - number of state stores in the operator across all 
> tasks in the stage. Some stateful operators have more than one state store 
> (e.g. stream-stream join). Tracking this number helps us find correlations 
> between state store instances and microbatch latency.
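For context, the fields that already exist today can be read off a running query; a minimal Scala sketch is below. The proposed fields are mentioned only in a comment, since they do not exist yet and their final names may differ.

{code:scala}
// Minimal sketch of reading StateOperatorProgress from a running query.
// Only fields that exist today are accessed; proposed fields such as
// operatorName or numRowsRemoved would be read the same way once added.
import org.apache.spark.sql.streaming.StreamingQuery

def logStateMetrics(query: StreamingQuery): Unit = {
  Option(query.lastProgress).foreach { progress =>
    progress.stateOperators.foreach { op =>
      println(s"numRowsTotal=${op.numRowsTotal}, numRowsUpdated=${op.numRowsUpdated}")
    }
  }
}
{code}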



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369626#comment-17369626
 ] 

Apache Spark commented on SPARK-35896:
--

User 'vkorukanti' has created a pull request for this issue:
https://github.com/apache/spark/pull/33091

> [SS] Include more granular metrics for stateful operators in 
> StreamingQueryProgress
> ---
>
> Key: SPARK-35896
> URL: https://issues.apache.org/jira/browse/SPARK-35896
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Venki Korukanti
>Priority: Major
>
> Currently the streaming progress is missing a few important stateful operator 
> metrics in {{StateOperatorProgress}}. Each stateful operator consists of 
> multiple steps. For example, {{flatMapGroupsWithState}} has two major steps: 1) 
> processing the input and 2) timeout processing to remove entries from the 
> state which have expired. The main motivation is to track down the time each 
> individual step took (such as timeout processing, watermark processing, etc.) 
> and how much data was processed, to pinpoint bottlenecks and to reason about 
> why some microbatches are slow compared to others in the same job.
> Below are the final metrics common to all stateful operators (the ones in 
> _*bold-italic*_ are proposed as new). These metrics are in 
> {{StateOperatorProgress}}, which is part of {{StreamingQueryProgress}}.
>  * _*operatorName*_ - State operator name. Can help us identify any 
> operator-specific slowness and state store usage patterns. Ex. "dedupe" 
> (derived using {{StateStoreWriter.shortName}})
>  * _numRowsTotal_ - number of rows in the state store across all tasks in a 
> stage where the operator has executed.
>  * _numRowsUpdated_ - number of rows added to or updated in the store.
>  * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state 
> store rows across all tasks in a stage where the operator has executed.
>  * _*numRowsRemoved*_ - number of rows deleted from the state store as part of 
> the state cleanup mechanism across all tasks in a stage where the operator 
> has executed. This number helps measure state store deletions and their impact 
> on checkpoint commit and other latencies.
>  * _*allRemovalsTimeMs*_ - time taken to remove rows from the state store as 
> part of state cleanup (this also includes iterating through the entire state 
> store to find which rows to delete) across all tasks in a stage where the 
> operator has executed. If we see jobs spending significant time here, it may 
> justify a better layout in the state store so that only the required rows are 
> read, rather than the entire state store as is done currently.
>  * _*commitTimeMs*_ - time taken to commit the state store changes to 
> external storage for checkpointing. This is cumulative across all tasks in a 
> stage where this operator has executed.
>  * _*numShufflePartitions*_ - number of shuffle partitions this state 
> operator is part of. Currently, metrics like times are aggregated across 
> all tasks in a stage where the operator has executed. Having the number of 
> shuffle partitions (which corresponds to the number of tasks) helps us find 
> the average task contribution to the metric.
>  * _*numStateStores*_ - number of state stores in the operator across all 
> tasks in the stage. Some stateful operators have more than one state store 
> (e.g. stream-stream join). Tracking this number helps us find correlations 
> between state store instances and microbatch latency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35897:


Assignee: (was: Apache Spark)

> Support user defined initial state with flatMapGroupsWithState in Structured 
> Streaming
> --
>
> Key: SPARK-35897
> URL: https://issues.apache.org/jira/browse/SPARK-35897
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Rahul Shivu Mahadev
>Priority: Major
> Fix For: 3.2.0
>
>
> Structured Streaming supports arbitrary stateful processing using 
> mapGroupsWithState and flatMapGroupsWithState operators. The state is created 
> by processing the data that comes in with every batch. This API improvement 
> will allow users to specify an initial state which is applied at the time of 
> executing the first batch.
>  
> h2. Proposed new APIs (Scala)
>  
>  
>   def mapGroupsWithState[S: Encoder, U: Encoder](
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] 
>  
>   def flatMapGroupsWithState[S: Encoder, U: Encoder](
>   outputMode: OutputMode,
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => Iterator[U])
>  
> h2.    Proposed new APIs (Java)
>   
> def mapGroupsWithState[S, U](
>   func: MapGroupsWithStateFunction[K, V, S, U],
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
> def flatMapGroupsWithState[S, U](
>   func: FlatMapGroupsWithStateFunction[K, V, S, U],
>   outputMode: OutputMode,
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
>  
>    
> h2. Example Usage
>    
> val initialState: Dataset[(String, RunningCount)] = Seq(
>   ("a", new RunningCount(1)),
>  ("b", new RunningCount(1))
> ).toDS()
>  
> val inputData = MemoryStream[String]
> val result =
>   inputData.toDS()
> .groupByKey(x => x)
> .mapGroupsWithState(initialState, timeoutConf)(stateFunc)
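The example usage above leaves {{RunningCount}}, {{stateFunc}} and {{timeoutConf}} undefined. A minimal sketch of those supporting pieces follows; only the names are taken from the ticket, the bodies are illustrative assumptions.

{code:scala}
// Hedged sketch of the helpers the example usage refers to.
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class RunningCount(count: Long)

val timeoutConf: GroupStateTimeout = GroupStateTimeout.NoTimeout

def stateFunc(
    key: String,
    values: Iterator[String],
    state: GroupState[RunningCount]): (String, Long) = {
  // Start from any existing (or initial) state and add this batch's rows.
  val updated = RunningCount(state.getOption.map(_.count).getOrElse(0L) + values.size)
  state.update(updated)   // persist the running count for this key
  (key, updated.count)    // emit the key with its updated count
}
{code}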



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35897:


Assignee: Apache Spark

> Support user defined initial state with flatMapGroupsWithState in Structured 
> Streaming
> --
>
> Key: SPARK-35897
> URL: https://issues.apache.org/jira/browse/SPARK-35897
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Rahul Shivu Mahadev
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> Structured Streaming supports arbitrary stateful processing using 
> mapGroupsWithState and flatMapGroupsWithState operators. The state is created 
> by processing the data that comes in with every batch. This API improvement 
> will allow users to specify an initial state which is applied at the time of 
> executing the first batch.
>  
> h2. Proposed new APIs (Scala)
>  
>  
>   def mapGroupsWithState[S: Encoder, U: Encoder](
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] 
>  
>   def flatMapGroupsWithState[S: Encoder, U: Encoder](
>   outputMode: OutputMode,
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => Iterator[U])
>  
> h2.    Proposed new APIs (Java)
>   
> def mapGroupsWithState[S, U](
>   func: MapGroupsWithStateFunction[K, V, S, U],
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
> def flatMapGroupsWithState[S, U](
>   func: FlatMapGroupsWithStateFunction[K, V, S, U],
>   outputMode: OutputMode,
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
>  
>    
> h2. Example Usage
>    
> val initialState: Dataset[(String, RunningCount)] = Seq(
>   ("a", new RunningCount(1)),
>  ("b", new RunningCount(1))
> ).toDS()
>  
> val inputData = MemoryStream[String]
> val result =
>   inputData.toDS()
> .groupByKey(x => x)
> .mapGroupsWithState(initialState, timeoutConf)(stateFunc)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369625#comment-17369625
 ] 

Apache Spark commented on SPARK-35897:
--

User 'rahulsmahadev' has created a pull request for this issue:
https://github.com/apache/spark/pull/33093

> Support user defined initial state with flatMapGroupsWithState in Structured 
> Streaming
> --
>
> Key: SPARK-35897
> URL: https://issues.apache.org/jira/browse/SPARK-35897
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Rahul Shivu Mahadev
>Priority: Major
> Fix For: 3.2.0
>
>
> Structured Streaming supports arbitrary stateful processing using 
> mapGroupsWithState and flatMapGroupsWithState operators. The state is created 
> by processing the data that comes in with every batch. This API improvement 
> will allow users to specify an initial state which is applied at the time of 
> executing the first batch.
>  
> h2. Proposed new APIs (Scala)
>  
>  
>   def mapGroupsWithState[S: Encoder, U: Encoder](
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] 
>  
>   def flatMapGroupsWithState[S: Encoder, U: Encoder](
>   outputMode: OutputMode,
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)])(
>   func: (K, Iterator[V], GroupState[S]) => Iterator[U])
>  
> h2.    Proposed new APIs (Java)
>   
> def mapGroupsWithState[S, U](
>   func: MapGroupsWithStateFunction[K, V, S, U],
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
> def flatMapGroupsWithState[S, U](
>   func: FlatMapGroupsWithStateFunction[K, V, S, U],
>   outputMode: OutputMode,
>   stateEncoder: Encoder[S],
>   outputEncoder: Encoder[U],
>   timeoutConf: GroupStateTimeout,
>   initialState: Dataset[(K, S)]): Dataset[U]
>  
>    
> h2. Example Usage
>    
> val initialState: Dataset[(String, RunningCount)] = Seq(
>   ("a", new RunningCount(1)),
>  ("b", new RunningCount(1))
> ).toDS()
>  
> val inputData = MemoryStream[String]
> val result =
>   inputData.toDS()
> .groupByKey(x => x)
> .mapGroupsWithState(initialState, timeoutConf)(stateFunc)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33338) GROUP BY using literal map should not fail

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369624#comment-17369624
 ] 

Apache Spark commented on SPARK-33338:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33092

> GROUP BY using literal map should not fail
> --
>
> Key: SPARK-33338
> URL: https://issues.apache.org/jira/browse/SPARK-33338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.1, 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.4.8, 3.0.2, 3.1.0
>
>
> Apache Spark 2.x ~ 3.0.1 raises a {{RuntimeException}} for the following queries.
> *SQL*
> {code}
> CREATE TABLE t USING ORC AS SELECT map('k1', 'v1') m, 'k1' k
> SELECT map('k1', 'v1')[k] FROM t GROUP BY 1
> SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k]
> SELECT map('k1', 'v1')[k] a FROM t GROUP BY a
> {code}
> *ERROR*
> {code}
> Caused by: java.lang.RuntimeException: Couldn't find k#3 in [keys: [k1], 
> values: [v1][k#3]#6]
>   at scala.sys.package$.error(package.scala:27)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:85)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:79)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> {code}
> This is a regression from Apache Spark 1.6.x.
> {code}
> scala> sc.version
> res1: String = 1.6.3
> scala> sqlContext.sql("SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 
> 'v1')[k]").show
> +---+
> |_c0|
> +---+
> | v1|
> +---+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35896:


Assignee: (was: Apache Spark)

> [SS] Include more granular metrics for stateful operators in 
> StreamingQueryProgress
> ---
>
> Key: SPARK-35896
> URL: https://issues.apache.org/jira/browse/SPARK-35896
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Venki Korukanti
>Priority: Major
>
> Currently the streaming progress is missing a few important stateful operator 
> metrics in {{StateOperatorProgress}}. Each stateful operator consists of 
> multiple steps. For example, {{flatMapGroupsWithState}} has two major steps: 1) 
> processing the input and 2) timeout processing to remove entries from the 
> state which have expired. The main motivation is to track down the time each 
> individual step took (such as timeout processing, watermark processing, etc.) 
> and how much data was processed, to pinpoint bottlenecks and to reason about 
> why some microbatches are slow compared to others in the same job.
> Below are the final metrics common to all stateful operators (the ones in 
> _*bold-italic*_ are proposed as new). These metrics are in 
> {{StateOperatorProgress}}, which is part of {{StreamingQueryProgress}}.
>  * _*operatorName*_ - State operator name. Can help us identify any 
> operator-specific slowness and state store usage patterns. Ex. "dedupe" 
> (derived using {{StateStoreWriter.shortName}})
>  * _numRowsTotal_ - number of rows in the state store across all tasks in a 
> stage where the operator has executed.
>  * _numRowsUpdated_ - number of rows added to or updated in the store.
>  * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state 
> store rows across all tasks in a stage where the operator has executed.
>  * _*numRowsRemoved*_ - number of rows deleted from the state store as part of 
> the state cleanup mechanism across all tasks in a stage where the operator 
> has executed. This number helps measure state store deletions and their impact 
> on checkpoint commit and other latencies.
>  * _*allRemovalsTimeMs*_ - time taken to remove rows from the state store as 
> part of state cleanup (this also includes iterating through the entire state 
> store to find which rows to delete) across all tasks in a stage where the 
> operator has executed. If we see jobs spending significant time here, it may 
> justify a better layout in the state store so that only the required rows are 
> read, rather than the entire state store as is done currently.
>  * _*commitTimeMs*_ - time taken to commit the state store changes to 
> external storage for checkpointing. This is cumulative across all tasks in a 
> stage where this operator has executed.
>  * _*numShufflePartitions*_ - number of shuffle partitions this state 
> operator is part of. Currently, metrics like times are aggregated across 
> all tasks in a stage where the operator has executed. Having the number of 
> shuffle partitions (which corresponds to the number of tasks) helps us find 
> the average task contribution to the metric.
>  * _*numStateStores*_ - number of state stores in the operator across all 
> tasks in the stage. Some stateful operators have more than one state store 
> (e.g. stream-stream join). Tracking this number helps us find correlations 
> between state store instances and microbatch latency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35896:


Assignee: Apache Spark

> [SS] Include more granular metrics for stateful operators in 
> StreamingQueryProgress
> ---
>
> Key: SPARK-35896
> URL: https://issues.apache.org/jira/browse/SPARK-35896
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Venki Korukanti
>Assignee: Apache Spark
>Priority: Major
>
> Currently the streaming progress is missing a few important stateful operator 
> metrics in {{StateOperatorProgress}}. Each stateful operator consists of 
> multiple steps. For example, {{flatMapGroupsWithState}} has two major steps: 1) 
> processing the input and 2) timeout processing to remove entries from the 
> state which have expired. The main motivation is to track down the time each 
> individual step took (such as timeout processing, watermark processing, etc.) 
> and how much data was processed, to pinpoint bottlenecks and to reason about 
> why some microbatches are slow compared to others in the same job.
> Below are the final metrics common to all stateful operators (the ones in 
> _*bold-italic*_ are proposed as new). These metrics are in 
> {{StateOperatorProgress}}, which is part of {{StreamingQueryProgress}}.
>  * _*operatorName*_ - State operator name. Can help us identify any 
> operator-specific slowness and state store usage patterns. Ex. "dedupe" 
> (derived using {{StateStoreWriter.shortName}})
>  * _numRowsTotal_ - number of rows in the state store across all tasks in a 
> stage where the operator has executed.
>  * _numRowsUpdated_ - number of rows added to or updated in the store.
>  * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state 
> store rows across all tasks in a stage where the operator has executed.
>  * _*numRowsRemoved*_ - number of rows deleted from the state store as part of 
> the state cleanup mechanism across all tasks in a stage where the operator 
> has executed. This number helps measure state store deletions and their impact 
> on checkpoint commit and other latencies.
>  * _*allRemovalsTimeMs*_ - time taken to remove rows from the state store as 
> part of state cleanup (this also includes iterating through the entire state 
> store to find which rows to delete) across all tasks in a stage where the 
> operator has executed. If we see jobs spending significant time here, it may 
> justify a better layout in the state store so that only the required rows are 
> read, rather than the entire state store as is done currently.
>  * _*commitTimeMs*_ - time taken to commit the state store changes to 
> external storage for checkpointing. This is cumulative across all tasks in a 
> stage where this operator has executed.
>  * _*numShufflePartitions*_ - number of shuffle partitions this state 
> operator is part of. Currently, metrics like times are aggregated across 
> all tasks in a stage where the operator has executed. Having the number of 
> shuffle partitions (which corresponds to the number of tasks) helps us find 
> the average task contribution to the metric.
>  * _*numStateStores*_ - number of state stores in the operator across all 
> tasks in the stage. Some stateful operators have more than one state store 
> (e.g. stream-stream join). Tracking this number helps us find correlations 
> between state store instances and microbatch latency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35899) Add a utility to convert connector expressions to Catalyst expressions

2021-06-25 Thread Anton Okolnychyi (Jira)
Anton Okolnychyi created SPARK-35899:


 Summary: Add a utility to convert connector expressions to 
Catalyst expressions
 Key: SPARK-35899
 URL: https://issues.apache.org/jira/browse/SPARK-35899
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Anton Okolnychyi


There are more and more places that require converting a v2 connector 
expression to an internal Catalyst expression. We need to build a utility 
method to avoid having the same logic in a lot of places.

See [here|https://github.com/apache/spark/pull/32921#discussion_r653129793].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369565#comment-17369565
 ] 

Apache Spark commented on SPARK-35672:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/33090

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.2.0
>
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-06-25 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369564#comment-17369564
 ] 

Erik Krogen commented on SPARK-35672:
-

#32810 went into master.

Put up #33090 for branch-3.1

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.2.0
>
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35898) Converting arrays with RowToColumnConverter triggers assertion

2021-06-25 Thread Tom van Bussel (Jira)
Tom van Bussel created SPARK-35898:
--

 Summary: Converting arrays with RowToColumnConverter triggers 
assertion
 Key: SPARK-35898
 URL: https://issues.apache.org/jira/browse/SPARK-35898
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: Tom van Bussel


When trying to convert a row that contains an array to a ColumnVector with 
RowToColumnConverter the following error is thrown:
{code:java}
java.lang.AssertionError at 
org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putArray(OffHeapColumnVector.java:560)
 at 
org.apache.spark.sql.execution.vectorized.WritableColumnVector.appendArray(WritableColumnVector.java:622)
 at 
org.apache.spark.sql.execution.RowToColumnConverter$ArrayConverter.append(Columnar.scala:353)
 at 
org.apache.spark.sql.execution.RowToColumnConverter$BasicNullableTypeConverter.append(Columnar.scala:241)
 at 
org.apache.spark.sql.execution.RowToColumnConverter.convert(Columnar.scala:221)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35898) Converting arrays with RowToColumnConverter triggers assertion

2021-06-25 Thread Tom van Bussel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369553#comment-17369553
 ] 

Tom van Bussel commented on SPARK-35898:


I will open a PR with the fix.

> Converting arrays with RowToColumnConverter triggers assertion
> --
>
> Key: SPARK-35898
> URL: https://issues.apache.org/jira/browse/SPARK-35898
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Tom van Bussel
>Priority: Major
>
> When trying to convert a row that contains an array to a ColumnVector with 
> RowToColumnConverter the following error is thrown:
> {code:java}
> java.lang.AssertionError at 
> org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putArray(OffHeapColumnVector.java:560)
>  at 
> org.apache.spark.sql.execution.vectorized.WritableColumnVector.appendArray(WritableColumnVector.java:622)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter$ArrayConverter.append(Columnar.scala:353)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter$BasicNullableTypeConverter.append(Columnar.scala:241)
>  at 
> org.apache.spark.sql.execution.RowToColumnConverter.convert(Columnar.scala:221)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35897) Support user defined initial state with flatMapGroupsWithState in Structured Streaming

2021-06-25 Thread Rahul Shivu Mahadev (Jira)
Rahul Shivu Mahadev created SPARK-35897:
---

 Summary: Support user defined initial state with 
flatMapGroupsWithState in Structured Streaming
 Key: SPARK-35897
 URL: https://issues.apache.org/jira/browse/SPARK-35897
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.1.2
Reporter: Rahul Shivu Mahadev
 Fix For: 3.2.0


Structured Streaming supports arbitrary stateful processing using 
mapGroupsWithState and flatMapGroupsWithState operators. The state is created by 
processing the data that comes in with every batch. This API improvement will 
allow users to specify an initial state which is applied at the time of 
executing the first batch.

h2. Proposed new APIs (Scala)

  def mapGroupsWithState[S: Encoder, U: Encoder](
      timeoutConf: GroupStateTimeout,
      initialState: Dataset[(K, S)])(
      func: (K, Iterator[V], GroupState[S]) => U): Dataset[U]

  def flatMapGroupsWithState[S: Encoder, U: Encoder](
      outputMode: OutputMode,
      timeoutConf: GroupStateTimeout,
      initialState: Dataset[(K, S)])(
      func: (K, Iterator[V], GroupState[S]) => Iterator[U])

h2. Proposed new APIs (Java)

  def mapGroupsWithState[S, U](
      func: MapGroupsWithStateFunction[K, V, S, U],
      stateEncoder: Encoder[S],
      outputEncoder: Encoder[U],
      timeoutConf: GroupStateTimeout,
      initialState: Dataset[(K, S)]): Dataset[U]

  def flatMapGroupsWithState[S, U](
      func: FlatMapGroupsWithStateFunction[K, V, S, U],
      outputMode: OutputMode,
      stateEncoder: Encoder[S],
      outputEncoder: Encoder[U],
      timeoutConf: GroupStateTimeout,
      initialState: Dataset[(K, S)]): Dataset[U]

h2. Example Usage

  val initialState: Dataset[(String, RunningCount)] = Seq(
    ("a", new RunningCount(1)),
    ("b", new RunningCount(1))
  ).toDS()

  val inputData = MemoryStream[String]
  val result =
    inputData.toDS()
      .groupByKey(x => x)
      .mapGroupsWithState(initialState, timeoutConf)(stateFunc)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35863) Upgrade Ivy to 2.5.0

2021-06-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35863.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33088
[https://github.com/apache/spark/pull/33088]

> Upgrade Ivy to 2.5.0
> 
>
> Key: SPARK-35863
> URL: https://issues.apache.org/jira/browse/SPARK-35863
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.2
>Reporter: Adam Binford
>Assignee: Adam Binford
>Priority: Minor
> Fix For: 3.2.0
>
>
> Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and 
> features can be found here: 
> [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html]
> Most notably, the addition of the ivy.maven.lookup.sources and 
> ivy.maven.lookup.javadoc configs can significantly speed up module resolution 
> time if these are turned off, especially behind a proxy. These could arguably 
> be turned off by default, because when submitting jobs you probably don't 
> care about the sources or javadoc jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35863) Upgrade Ivy to 2.5.0

2021-06-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35863:
-

Assignee: Adam Binford

> Upgrade Ivy to 2.5.0
> 
>
> Key: SPARK-35863
> URL: https://issues.apache.org/jira/browse/SPARK-35863
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.2
>Reporter: Adam Binford
>Assignee: Adam Binford
>Priority: Minor
>
> Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and 
> features can be found here: 
> [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html]
> Most notably, the addition of the ivy.maven.lookup.sources and 
> ivy.maven.lookup.javadoc configs can significantly speed up module resolution 
> time if these are turned off, especially behind a proxy. These could arguably 
> be turned off by default, because when submitting jobs you probably don't 
> care about the sources or javadoc jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35896) [SS] Include more granular metrics for stateful operators in StreamingQueryProgress

2021-06-25 Thread Venki Korukanti (Jira)
Venki Korukanti created SPARK-35896:
---

 Summary: [SS] Include more granular metrics for stateful operators 
in StreamingQueryProgress
 Key: SPARK-35896
 URL: https://issues.apache.org/jira/browse/SPARK-35896
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.1.2
Reporter: Venki Korukanti


Currently the streaming progress is missing a few important stateful operator 
metrics in {{StateOperatorProgress}}. Each stateful operator consists of 
multiple steps. For example, {{flatMapGroupsWithState}} has two major steps: 1) 
processing the input and 2) timeout processing to remove entries from the state 
which have expired. The main motivation is to track down the time each 
individual step took (such as timeout processing, watermark processing, etc.) 
and how much data was processed, to pinpoint bottlenecks and to reason about why 
some microbatches are slow compared to others in the same job.

Below are the final metrics common to all stateful operators (the ones in 
_*bold-italic*_ are proposed as new). These metrics are in 
{{StateOperatorProgress}}, which is part of {{StreamingQueryProgress}}.
 * _*operatorName*_ - State operator name. Can help us identify any 
operator-specific slowness and state store usage patterns. Ex. "dedupe" (derived 
using {{StateStoreWriter.shortName}})
 * _numRowsTotal_ - number of rows in the state store across all tasks in a 
stage where the operator has executed.
 * _numRowsUpdated_ - number of rows added to or updated in the store.
 * _*allUpdatesTimeMs*_ - time taken to add new rows or update existing state 
store rows across all tasks in a stage where the operator has executed.
 * _*numRowsRemoved*_ - number of rows deleted from the state store as part of 
the state cleanup mechanism across all tasks in a stage where the operator has 
executed. This number helps measure state store deletions and their impact on 
checkpoint commit and other latencies.
 * _*allRemovalsTimeMs*_ - time taken to remove rows from the state store as 
part of state cleanup (this also includes iterating through the entire state 
store to find which rows to delete) across all tasks in a stage where the 
operator has executed. If we see jobs spending significant time here, it may 
justify a better layout in the state store so that only the required rows are 
read, rather than the entire state store as is done currently.
 * _*commitTimeMs*_ - time taken to commit the state store changes to external 
storage for checkpointing. This is cumulative across all tasks in a stage where 
this operator has executed.
 * _*numShufflePartitions*_ - number of shuffle partitions this state operator 
is part of. Currently, metrics like times are aggregated across all tasks in a 
stage where the operator has executed. Having the number of shuffle partitions 
(which corresponds to the number of tasks) helps us find the average task 
contribution to the metric.
 * _*numStateStores*_ - number of state stores in the operator across all tasks 
in the stage. Some stateful operators have more than one state store (e.g. 
stream-stream join). Tracking this number helps us find correlations between 
state store instances and microbatch latency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35892) numPartitions does not work when saves the RDD to database

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369487#comment-17369487
 ] 

Apache Spark commented on SPARK-35892:
--

User 'zengruios' has created a pull request for this issue:
https://github.com/apache/spark/pull/33089

> numPartitions does not work when saves the RDD to database
> --
>
> Key: SPARK-35892
> URL: https://issues.apache.org/jira/browse/SPARK-35892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: zengrui
>Priority: Major
>
> When using SQL to insert data from Spark into a database, suppose the original 
> RDD has 10 partitions. When I set numPartitions to 20 in SQL (because I need 
> more parallelism to insert the data into the database), it does not work.
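For reference, one way to get more write parallelism today is to repartition explicitly before the JDBC write, since the write-side numPartitions option only caps (coalesces) the number of partitions. A minimal Scala sketch, assuming an existing DataFrame and placeholder connection options:

{code:scala}
// Minimal sketch; URL, table and credentials are placeholders.
import org.apache.spark.sql.DataFrame

def writeWithMoreParallelism(df: DataFrame): Unit = {
  df.repartition(20)                               // raise write parallelism explicitly
    .write
    .format("jdbc")
    .option("url", "jdbc:mysql://host:3306/db")    // placeholder
    .option("dbtable", "target_table")             // placeholder
    .option("user", "user")                        // placeholder
    .option("password", "secret")                  // placeholder
    .mode("append")
    .save()
}
{code}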



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35892) numPartitions does not work when saves the RDD to database

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369490#comment-17369490
 ] 

Apache Spark commented on SPARK-35892:
--

User 'zengruios' has created a pull request for this issue:
https://github.com/apache/spark/pull/33089

> numPartitions does not work when saves the RDD to database
> --
>
> Key: SPARK-35892
> URL: https://issues.apache.org/jira/browse/SPARK-35892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: zengrui
>Priority: Major
>
> When using SQL to insert data from Spark into a database, suppose the original 
> RDD has 10 partitions. When I set numPartitions to 20 in SQL (because I need 
> more parallelism to insert the data into the database), it does not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35892) numPartitions does not work when saves the RDD to database

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35892:


Assignee: (was: Apache Spark)

> numPartitions does not work when saves the RDD to database
> --
>
> Key: SPARK-35892
> URL: https://issues.apache.org/jira/browse/SPARK-35892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: zengrui
>Priority: Major
>
> When using SQL to insert data from Spark into a database, suppose the original 
> RDD has 10 partitions. When I set numPartitions to 20 in SQL (because I need 
> more parallelism to insert the data into the database), it does not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35892) numPartitions does not work when saves the RDD to database

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35892:


Assignee: Apache Spark

> numPartitions does not work when saves the RDD to database
> --
>
> Key: SPARK-35892
> URL: https://issues.apache.org/jira/browse/SPARK-35892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: zengrui
>Assignee: Apache Spark
>Priority: Major
>
> When using SQL to insert data from Spark into a database, suppose the original 
> RDD has 10 partitions. When I set numPartitions to 20 in SQL (because I need 
> more parallelism to insert the data into the database), it does not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks

2021-06-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369486#comment-17369486
 ] 

Dongjoon Hyun commented on SPARK-34927:
---

No problem~ I was just curious if we can have this at Apache Spark 3.2.0. :)

> Support TPCDSQueryBenchmark in Benchmarks
> -
>
> Key: SPARK-34927
> URL: https://issues.apache.org/jira/browse/SPARK-34927
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should 
> add support for it. See also 
> https://github.com/apache/spark/pull/32015#issuecomment-89046



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-06-25 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-35672.
---
Fix Version/s: 3.2.0
 Assignee: Erik Krogen
   Resolution: Fixed

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.2.0
>
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35893) No Unit Test case for MySQLDialect.getCatalystType

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35893:


Assignee: (was: Apache Spark)

> No Unit Test case for MySQLDialect.getCatalystType 
> ---
>
> Key: SPARK-35893
> URL: https://issues.apache.org/jira/browse/SPARK-35893
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.1, 3.1.2
>Reporter: zengrui
>Priority: Minor
>
> No Unit Test case for MySQLDialect.getCatalystType. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35863) Upgrade Ivy to 2.5.0

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369473#comment-17369473
 ] 

Apache Spark commented on SPARK-35863:
--

User 'Kimahriman' has created a pull request for this issue:
https://github.com/apache/spark/pull/33088

> Upgrade Ivy to 2.5.0
> 
>
> Key: SPARK-35863
> URL: https://issues.apache.org/jira/browse/SPARK-35863
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.2
>Reporter: Adam Binford
>Priority: Minor
>
> Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and 
> features can be found here: 
> [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html]
> Most notably, the newly added ivy.maven.lookup.sources and 
> ivy.maven.lookup.javadoc configs can significantly speed up module resolution 
> time if these are turned off, especially behind a proxy. These could arguably 
> be turned off by default, because when submitting jobs you probably don't 
> care about the sources or javadoc jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35863) Upgrade Ivy to 2.5.0

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35863:


Assignee: Apache Spark

> Upgrade Ivy to 2.5.0
> 
>
> Key: SPARK-35863
> URL: https://issues.apache.org/jira/browse/SPARK-35863
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.2
>Reporter: Adam Binford
>Assignee: Apache Spark
>Priority: Minor
>
> Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and 
> features can be found here: 
> [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html]
> Most notably, the newly added ivy.maven.lookup.sources and 
> ivy.maven.lookup.javadoc configs can significantly speed up module resolution 
> time if these are turned off, especially behind a proxy. These could arguably 
> be turned off by default, because when submitting jobs you probably don't 
> care about the sources or javadoc jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35893) No Unit Test case for MySQLDialect.getCatalystType

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369474#comment-17369474
 ] 

Apache Spark commented on SPARK-35893:
--

User 'zengruios' has created a pull request for this issue:
https://github.com/apache/spark/pull/33087

> No Unit Test case for MySQLDialect.getCatalystType 
> ---
>
> Key: SPARK-35893
> URL: https://issues.apache.org/jira/browse/SPARK-35893
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.1, 3.1.2
>Reporter: zengrui
>Priority: Minor
>
> No Unit Test case for MySQLDialect.getCatalystType. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35863) Upgrade Ivy to 2.5.0

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35863:


Assignee: (was: Apache Spark)

> Upgrade Ivy to 2.5.0
> 
>
> Key: SPARK-35863
> URL: https://issues.apache.org/jira/browse/SPARK-35863
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.2
>Reporter: Adam Binford
>Priority: Minor
>
> Apache Ivy 2.5.0 was released nearly two years ago. The new bug fixes and 
> features can be found here: 
> [https://ant.apache.org/ivy/history/latest-milestone/release-notes.html]
> Most notably, the newly added ivy.maven.lookup.sources and 
> ivy.maven.lookup.javadoc configs can significantly speed up module resolution 
> time if these are turned off, especially behind a proxy. These could arguably 
> be turned off by default, because when submitting jobs you probably don't 
> care about the sources or javadoc jars.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35893) No Unit Test case for MySQLDialect.getCatalystType

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35893:


Assignee: Apache Spark

> No Unit Test case for MySQLDialect.getCatalystType 
> ---
>
> Key: SPARK-35893
> URL: https://issues.apache.org/jira/browse/SPARK-35893
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.1, 3.1.2
>Reporter: zengrui
>Assignee: Apache Spark
>Priority: Minor
>
> No Unit Test case for MySQLDialect.getCatalystType. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35878) add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null

2021-06-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35878:
-

Assignee: Steve Loughran

> add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null
> ---
>
> Key: SPARK-35878
> URL: https://issues.apache.org/jira/browse/SPARK-35878
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> People working with S3A and hadoop 3.3.1 outside of EC2, and without the AWS 
> CLI setup, are likely to hit: HADOOP-17771
> It should be straightforward to fix up the config similar to 
> SPARK-35868...this will be backwards (harmless) and forwards compatible (it's 
> the recommended workaround)
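
For illustration, a minimal sketch of the workaround under discussion (the 
bucket path is hypothetical; the actual fix in Spark may set the value 
differently):

{code:scala}
// Pin the S3A endpoint explicitly when neither fs.s3a.endpoint nor
// fs.s3a.endpoint.region is configured, as recommended for HADOOP-17771.
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3.amazonaws.com")

// Hypothetical read, just to exercise the S3A client with the pinned endpoint.
spark.read.text("s3a://some-bucket/some/path/").show()
{code}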



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35878) add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null

2021-06-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35878.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33064
[https://github.com/apache/spark/pull/33064]

> add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null
> ---
>
> Key: SPARK-35878
> URL: https://issues.apache.org/jira/browse/SPARK-35878
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.2.0
>
>
> People working with S3A and hadoop 3.3.1 outside of EC2, and without the AWS 
> CLI setup, are likely to hit: HADOOP-17771
> It should be straightforward to fix up the config similar to 
> SPARK-35868...this will be backwards (harmless) and forwards compatible (it's 
> the recommended workaround)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35895:


Assignee: Gengliang Wang  (was: Apache Spark)

> Support subtracting Intervals from TimestampWithoutTZ
> -
>
> Key: SPARK-35895
> URL: https://issues.apache.org/jira/browse/SPARK-35895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support and test the following operations:
> TimestampWithoutTZ - Calendar interval
> TimestampWithoutTZ - Year-Month interval
> TimestampWithoutTZ - Daytime interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369436#comment-17369436
 ] 

Apache Spark commented on SPARK-35895:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33086

> Support subtracting Intervals from TimestampWithoutTZ
> -
>
> Key: SPARK-35895
> URL: https://issues.apache.org/jira/browse/SPARK-35895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support and test the following operations:
> TimestampWithoutTZ - Calendar interval
> TimestampWithoutTZ - Year-Month interval
> TimestampWithoutTZ - Daytime interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35895:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support subtracting Intervals from TimestampWithoutTZ
> -
>
> Key: SPARK-35895
> URL: https://issues.apache.org/jira/browse/SPARK-35895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Support and test the following operations:
> TimestampWithoutTZ - Calendar interval
> TimestampWithoutTZ - Year-Month interval
> TimestampWithoutTZ - Daytime interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35895:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support subtracting Intervals from TimestampWithoutTZ
> -
>
> Key: SPARK-35895
> URL: https://issues.apache.org/jira/browse/SPARK-35895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Support and test the following operations:
> TimestampWithoutTZ - Calendar interval
> TimestampWithoutTZ - Year-Month interval
> TimestampWithoutTZ - Daytime interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35894:


Assignee: (was: Apache Spark)

> Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
> -
>
> Key: SPARK-35894
> URL: https://issues.apache.org/jira/browse/SPARK-35894
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Due to the changes in Scala 2.13, importing scala.collection.Seq or 
> scala.collection.IndexedSeq can cause subtle issues.
> (It does not cause issues on Scala 2.12, so the problem is likely to go 
> unnoticed until we see a compilation failure on Scala 2.13.)
> Please refer to the page below for details of the changes around Seq.
> https://docs.scala-lang.org/overviews/core/collections-migration-213.html
> It would be nice if we could prevent this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369435#comment-17369435
 ] 

Apache Spark commented on SPARK-35894:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/33085

> Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
> -
>
> Key: SPARK-35894
> URL: https://issues.apache.org/jira/browse/SPARK-35894
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Due to the changes in Scala 2.13, importing scala.collection.Seq or 
> scala.collection.IndexedSeq can cause subtle issues.
> (It does not cause issues on Scala 2.12, so the problem is likely to go 
> unnoticed until we see a compilation failure on Scala 2.13.)
> Please refer to the page below for details of the changes around Seq.
> https://docs.scala-lang.org/overviews/core/collections-migration-213.html
> It would be nice if we could prevent this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35895) Support subtracting Intervals from TimestampWithoutTZ

2021-06-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35895:
---
Summary: Support subtracting Intervals from TimestampWithoutTZ  (was: 
Support subtract Intervals from TimestampWithoutTZ)

> Support subtracting Intervals from TimestampWithoutTZ
> -
>
> Key: SPARK-35895
> URL: https://issues.apache.org/jira/browse/SPARK-35895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support and test the following operations:
> TimestampWithoutTZ - Calendar interval
> TimestampWithoutTZ - Year-Month interval
> TimestampWithoutTZ - Daytime interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35894:


Assignee: Apache Spark

> Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
> -
>
> Key: SPARK-35894
> URL: https://issues.apache.org/jira/browse/SPARK-35894
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> Due to the changes in Scala 2.13, importing scala.collection.Seq or 
> scala.collection.IndexedSeq can cause subtle issues.
> (It does not cause issues on Scala 2.12, so the problem is likely to go 
> unnoticed until we see a compilation failure on Scala 2.13.)
> Please refer to the page below for details of the changes around Seq.
> https://docs.scala-lang.org/overviews/core/collections-migration-213.html
> It would be nice if we could prevent this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35895) Support subtract Intervals from TimestampWithoutTZ

2021-06-25 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35895:
--

 Summary: Support subtract Intervals from TimestampWithoutTZ
 Key: SPARK-35895
 URL: https://issues.apache.org/jira/browse/SPARK-35895
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Support and test the following operations:
TimestampWithoutTZ - Calendar interval
TimestampWithoutTZ - Year-Month interval
TimestampWithoutTZ - Daytime interval
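
For illustration, a rough sketch of the kind of queries this sub-task covers, 
assuming the timestamp_ntz literal syntax together with ANSI interval literals 
(the exact surface syntax was still evolving at the time):

{code:scala}
// TimestampWithoutTZ minus a year-month interval (illustrative values).
spark.sql("SELECT timestamp_ntz'2021-06-25 12:00:00' - INTERVAL '1-2' YEAR TO MONTH").show()

// TimestampWithoutTZ minus a day-time interval.
spark.sql("SELECT timestamp_ntz'2021-06-25 12:00:00' - INTERVAL '10 04:30:00' DAY TO SECOND").show()
{code}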



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-35894:
-
Summary: Introduce new style enforce to not import 
scala.collection.Seq/IndexedSeq  (was: Introduce new style enforce to not 
import scala.collection.Seq)

> Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
> -
>
> Key: SPARK-35894
> URL: https://issues.apache.org/jira/browse/SPARK-35894
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Due to the changes in Scala 2.13, importing scala.collection.Seq or 
> scala.collection.IndexedSeq can cause subtle issues.
> (It does not cause issues on Scala 2.12, so the problem is likely to go 
> unnoticed until we see a compilation failure on Scala 2.13.)
> Please refer to the page below for details of the changes around Seq.
> https://docs.scala-lang.org/overviews/core/collections-migration-213.html
> It would be nice if we could prevent this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35894) Introduce new style enforce to not import scala.collection.Seq

2021-06-25 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-35894:


 Summary: Introduce new style enforce to not import 
scala.collection.Seq
 Key: SPARK-35894
 URL: https://issues.apache.org/jira/browse/SPARK-35894
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: Jungtaek Lim


Due to the changes in Scala 2.13, importing scala.collection.Seq or 
scala.collection.IndexedSeq can cause subtle issues.
(It does not cause issues on Scala 2.12, so the problem is likely to go 
unnoticed until we see a compilation failure on Scala 2.13.)

Please refer to the page below for details of the changes around Seq.
https://docs.scala-lang.org/overviews/core/collections-migration-213.html

It would be nice if we could prevent this case.
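
For illustration, a small stand-alone sketch (hypothetical code, not from the 
Spark codebase) of the pitfall such a style rule would catch:

{code:scala}
// With this import, `Seq` in this file means scala.collection.Seq rather than
// the default scala.Seq alias, which differs between Scala 2.12 and 2.13.
import scala.collection.Seq

import scala.collection.mutable.ArrayBuffer

object SeqImportPitfall {
  // Accepts any collection.Seq, mutable or immutable.
  def describe(xs: Seq[Int]): String = s"${xs.length} elements"

  // Declared against the default Seq alias: collection.Seq on 2.12,
  // but immutable.Seq on 2.13.
  def describeDefault(xs: scala.Seq[Int]): String = s"${xs.length} elements"

  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer(1, 2, 3)
    println(describe(buf))           // compiles on both 2.12 and 2.13
    // println(describeDefault(buf)) // compiles on 2.12, fails on 2.13,
                                     // which is the breakage the rule prevents
  }
}
{code}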



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35889) Support adding TimestampWithoutTZ with Interval types

2021-06-25 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35889.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33076
[https://github.com/apache/spark/pull/33076]

> Support adding TimestampWithoutTZ with Interval types
> -
>
> Key: SPARK-35889
> URL: https://issues.apache.org/jira/browse/SPARK-35889
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Support the following operations:
> * TimestampWithoutTZ + Calendar interval
> * TimestampWithoutTZ + Year-Month interval
> * TimestampWithoutTZ + Daytime interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35628) RocksDBFileManager - load checkpoint from DFS

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369423#comment-17369423
 ] 

Apache Spark commented on SPARK-35628:
--

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/33084

> RocksDBFileManager - load checkpoint from DFS
> -
>
> Key: SPARK-35628
> URL: https://issues.apache.org/jira/browse/SPARK-35628
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
> Fix For: 3.2.0
>
>
> The implementation for the load path of the checkpoint data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35893) No Unit Test case for MySQLDialect.getCatalystType

2021-06-25 Thread zengrui (Jira)
zengrui created SPARK-35893:
---

 Summary: No Unit Test case for MySQLDialect.getCatalystType 
 Key: SPARK-35893
 URL: https://issues.apache.org/jira/browse/SPARK-35893
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 3.1.2, 3.1.1
Reporter: zengrui


No Unit Test case for MySQLDialect.getCatalystType. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35886) Codegen issue for decimal type

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369363#comment-17369363
 ] 

Apache Spark commented on SPARK-35886:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33082

> Codegen issue for decimal type
> --
>
> Key: SPARK-35886
> URL: https://issues.apache.org/jira/browse/SPARK-35886
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:scala}
> spark.sql(
>   """
> |CREATE TABLE t1 (
> |  c1 DECIMAL(18,6),
> |  c2 DECIMAL(18,6),
> |  c3 DECIMAL(18,6))
> |USING parquet;
> |""".stripMargin)
> spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show
> {code}
> {noformat}
> 20:23:36.272 ERROR 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not 
> an rvalue
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue
>   at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675)
>   at 
> org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35886) Codegen issue for decimal type

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35886:


Assignee: (was: Apache Spark)

> Codegen issue for decimal type
> --
>
> Key: SPARK-35886
> URL: https://issues.apache.org/jira/browse/SPARK-35886
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:scala}
> spark.sql(
>   """
> |CREATE TABLE t1 (
> |  c1 DECIMAL(18,6),
> |  c2 DECIMAL(18,6),
> |  c3 DECIMAL(18,6))
> |USING parquet;
> |""".stripMargin)
> spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show
> {code}
> {noformat}
> 20:23:36.272 ERROR 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not 
> an rvalue
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue
>   at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675)
>   at 
> org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35886) Codegen issue for decimal type

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369362#comment-17369362
 ] 

Apache Spark commented on SPARK-35886:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33082

> Codegen issue for decimal type
> --
>
> Key: SPARK-35886
> URL: https://issues.apache.org/jira/browse/SPARK-35886
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:scala}
> spark.sql(
>   """
> |CREATE TABLE t1 (
> |  c1 DECIMAL(18,6),
> |  c2 DECIMAL(18,6),
> |  c3 DECIMAL(18,6))
> |USING parquet;
> |""".stripMargin)
> spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show
> {code}
> {noformat}
> 20:23:36.272 ERROR 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not 
> an rvalue
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue
>   at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675)
>   at 
> org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35886) Codegen issue for decimal type

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35886:


Assignee: Apache Spark

> Codegen issue for decimal type
> --
>
> Key: SPARK-35886
> URL: https://issues.apache.org/jira/browse/SPARK-35886
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> How to reproduce this issue:
> {code:scala}
> spark.sql(
>   """
> |CREATE TABLE t1 (
> |  c1 DECIMAL(18,6),
> |  c2 DECIMAL(18,6),
> |  c3 DECIMAL(18,6))
> |USING parquet;
> |""".stripMargin)
> spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show
> {code}
> {noformat}
> 20:23:36.272 ERROR 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not 
> an rvalue
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue
>   at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675)
>   at 
> org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35628) RocksDBFileManager - load checkpoint from DFS

2021-06-25 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-35628:


Assignee: Yuanjian Li

> RocksDBFileManager - load checkpoint from DFS
> -
>
> Key: SPARK-35628
> URL: https://issues.apache.org/jira/browse/SPARK-35628
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
>
> The implementation for the load path of the checkpoint data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35628) RocksDBFileManager - load checkpoint from DFS

2021-06-25 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-35628.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32767
[https://github.com/apache/spark/pull/32767]

> RocksDBFileManager - load checkpoint from DFS
> -
>
> Key: SPARK-35628
> URL: https://issues.apache.org/jira/browse/SPARK-35628
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
> Fix For: 3.2.0
>
>
> The implementation for the load path of the checkpoint data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34893) Support native session window

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34893:


Assignee: Apache Spark

> Support native session window
> -
>
> Key: SPARK-34893
> URL: https://issues.apache.org/jira/browse/SPARK-34893
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> This issue tracks effort on supporting native session window, on both batch 
> query and streaming query.
> This issue is the finalization of SPARK-10816 leveraging SPARK-34888, 
> SPARK-34889, SPARK-35861, SPARK-34891, SPARK-34892.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34893) Support native session window

2021-06-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34893:


Assignee: (was: Apache Spark)

> Support native session window
> -
>
> Key: SPARK-34893
> URL: https://issues.apache.org/jira/browse/SPARK-34893
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> This issue tracks effort on supporting native session window, on both batch 
> query and streaming query.
> This issue is the finalization of SPARK-10816 leveraging SPARK-34888, 
> SPARK-34889, SPARK-35861, SPARK-34891, SPARK-34892.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34893) Support native session window

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369340#comment-17369340
 ] 

Apache Spark commented on SPARK-34893:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/33081

> Support native session window
> -
>
> Key: SPARK-34893
> URL: https://issues.apache.org/jira/browse/SPARK-34893
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> This issue tracks effort on supporting native session window, on both batch 
> query and streaming query.
> This issue is the finalization of SPARK-10816 leveraging SPARK-34888, 
> SPARK-34889, SPARK-35861, SPARK-34891, SPARK-34892.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35892) numPartitions does not work when saves the RDD to database

2021-06-25 Thread zengrui (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengrui updated SPARK-35892:

Description: 
When using SQL to insert data from Spark into a database, suppose the original 
RDD has 10 partitions. Setting numPartitions to 20 in SQL (because more 
parallelism is needed to insert the data) does not work.


  was:
The original RDD has 10 partitions, and if I set numPartitions to 20 in SQL 
(because I need more parallelism to insert data into the database), it does not 
work.




> numPartitions does not work when saves the RDD to database
> --
>
> Key: SPARK-35892
> URL: https://issues.apache.org/jira/browse/SPARK-35892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: zengrui
>Priority: Major
>
> When using SQL to insert data from Spark into a database, suppose the original 
> RDD has 10 partitions. Setting numPartitions to 20 in SQL (because more 
> parallelism is needed to insert the data) does not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35778) Check multiply/divide of year-month intervals of any fields by numeric

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369323#comment-17369323
 ] 

Apache Spark commented on SPARK-35778:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/33080

> Check multiply/divide of year-month intervals of any fields by numeric
> --
>
> Key: SPARK-35778
> URL: https://issues.apache.org/jira/browse/SPARK-35778
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: PengLei
>Priority: Major
> Fix For: 3.2.0
>
>
> Write tests that check multiplication/division of the following intervals by numerics:
> # INTERVAL YEAR
> # INTERVAL YEAR TO MONTH
> # INTERVAL MONTH
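
For illustration, a small sketch of the kind of checks involved, using ANSI 
year-month interval literals (values and expected results are illustrative):

{code:scala}
// Year-month intervals multiplied/divided by numerics.
spark.sql("SELECT INTERVAL '2' YEAR * 3").show()             // 6 years
spark.sql("SELECT INTERVAL '1-6' YEAR TO MONTH * 2").show()  // 3 years
spark.sql("SELECT INTERVAL '10' MONTH / 2").show()           // 5 months
{code}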



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35728) Check multiply/divide of day-time intervals of any fields by numeric

2021-06-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369321#comment-17369321
 ] 

Apache Spark commented on SPARK-35728:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/33080

> Check multiply/divide of day-time intervals of any fields by numeric
> 
>
> Key: SPARK-35728
> URL: https://issues.apache.org/jira/browse/SPARK-35728
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: PengLei
>Priority: Major
> Fix For: 3.2.0
>
>
> Write tests that check multiplication/division of the following intervals by numerics:
> # INTERVAL DAY
> # INTERVAL DAY TO HOUR
> # INTERVAL DAY TO MINUTE
> # INTERVAL HOUR
> # INTERVAL HOUR TO MINUTE
> # INTERVAL HOUR TO SECOND
> # INTERVAL MINUTE
> # INTERVAL MINUTE TO SECOND
> # INTERVAL SECOND
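
For illustration, a small sketch using ANSI day-time interval literals (values 
and expected results are illustrative):

{code:scala}
// Day-time intervals multiplied/divided by numerics.
spark.sql("SELECT INTERVAL '12' HOUR * 2").show()                   // 1 day
spark.sql("SELECT INTERVAL '1 00:00:00' DAY TO SECOND / 4").show()  // 6 hours
spark.sql("SELECT INTERVAL '90' MINUTE * 1.5").show()               // 2 hours 15 minutes
{code}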



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35892) numPartitions does not work when saves the RDD to database

2021-06-25 Thread zengrui (Jira)
zengrui created SPARK-35892:
---

 Summary: numPartitions does not work when saves the RDD to database
 Key: SPARK-35892
 URL: https://issues.apache.org/jira/browse/SPARK-35892
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: zengrui


The original RDD has 10 partitions, and if I set numPartitions to 20 in SQL 
(because I need more parallelism to insert data into the database), it does not 
work.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


