[jira] [Updated] (SPARK-34727) Difference in results of casting float to timestamp
[ https://issues.apache.org/jira/browse/SPARK-34727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34727:
Fix Version/s: 3.1.2

> Difference in results of casting float to timestamp
> ---
>
> Key: SPARK-34727
> URL: https://issues.apache.org/jira/browse/SPARK-34727
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> The code below demonstrates the issue:
> {code:sql}
> spark-sql> CREATE TEMP VIEW v1 AS SELECT 16777215.0f AS f;
> spark-sql> SELECT * FROM v1;
> 1.6777215E7
> spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1;
> 1970-07-14 07:20:15
> spark-sql> CACHE TABLE v1;
> spark-sql> SELECT * FROM v1;
> 1.6777215E7
> spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1;
> 1970-07-14 07:20:14.951424
> {code}
> The result from the cached view, *1970-07-14 07:20:14.951424*, differs from the un-cached view's *1970-07-14 07:20:15*.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
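The ticket does not spell out the root cause, but the two timestamps are exactly what you get if the seconds-to-microseconds conversion is rounded to single precision on one path and kept in double precision on the other. A speculative illustration of that effect (not Spark's actual code path):

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

seconds = 16777215.0  # 2**24 - 1: still exactly representable as a float32

# Conversion carried out in double precision: exact.
micros_double = int(seconds * 1_000_000)
print(micros_double)  # 16777215000000  -> 1970-07-14 07:20:15

# Same conversion rounded through float32: the product needs ~44 mantissa
# bits, but float32 has 24, so it snaps to the nearest representable value.
micros_single = int(to_f32(seconds * 1_000_000))
print(micros_single)  # 16777214951424  -> 1970-07-14 07:20:14.951424
```

The single-precision result, 16777214951424 microseconds, corresponds precisely to the cached view's `1970-07-14 07:20:14.951424`, which makes the precision explanation plausible.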
[jira] [Assigned] (SPARK-34755) Support the utils for transform number format
[ https://issues.apache.org/jira/browse/SPARK-34755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34755:
Assignee: (was: Apache Spark)

> Support the utils for transform number format
> -
>
> Key: SPARK-34755
> URL: https://issues.apache.org/jira/browse/SPARK-34755
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: jiaan.geng
> Priority: Major
>
> Data Type Formatting Functions: `to_number` and `to_char` are very useful.
> We create this ticket to implement the utils for transforming number formats.
[jira] [Commented] (SPARK-34755) Support the utils for transform number format
[ https://issues.apache.org/jira/browse/SPARK-34755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302198#comment-17302198 ] Apache Spark commented on SPARK-34755:
User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/31847

> Support the utils for transform number format
> -
>
> Key: SPARK-34755
> URL: https://issues.apache.org/jira/browse/SPARK-34755
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: jiaan.geng
> Priority: Major
>
> Data Type Formatting Functions: `to_number` and `to_char` are very useful.
> We create this ticket to implement the utils for transforming number formats.
[jira] [Assigned] (SPARK-34755) Support the utils for transform number format
[ https://issues.apache.org/jira/browse/SPARK-34755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34755:
Assignee: Apache Spark

> Support the utils for transform number format
> -
>
> Key: SPARK-34755
> URL: https://issues.apache.org/jira/browse/SPARK-34755
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: jiaan.geng
> Assignee: Apache Spark
> Priority: Major
>
> Data Type Formatting Functions: `to_number` and `to_char` are very useful.
> We create this ticket to implement the utils for transforming number formats.
[jira] [Created] (SPARK-34755) Support the utils for transform number format
jiaan.geng created SPARK-34755:
Summary: Support the utils for transform number format
Key: SPARK-34755
URL: https://issues.apache.org/jira/browse/SPARK-34755
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.2.0
Reporter: jiaan.geng

Data Type Formatting Functions: `to_number` and `to_char` are very useful. We create this ticket to implement the utils for transforming number formats.
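For readers unfamiliar with these functions: `to_char` formats a number according to a format mask, and `to_number` parses one back, in the style of PostgreSQL/Oracle formatting functions. A rough Python sketch of the intended behavior, with deliberately simplified hypothetical semantics (digits, one group separator, one decimal point), not Spark's eventual implementation:

```python
def to_char(value: float, fmt: str) -> str:
    """Format a number per a simplified '9,999.99'-style mask."""
    # Number of '9's after the decimal point decides the precision.
    decimals = len(fmt.split(".")[1]) if "." in fmt else 0
    out = f"{value:,.{decimals}f}"
    # Only keep thousands separators if the mask asks for them.
    return out if "," in fmt else out.replace(",", "")

def to_number(text: str) -> float:
    """Inverse direction: strip group separators and parse."""
    return float(text.replace(",", ""))

print(to_char(12345.678, "99,999.99"))  # 12,345.68
print(to_number("12,345.68"))           # 12345.68
```

Real format masks support far more (currency symbols, fill modes, sign placement); this only shows the round-trip idea the ticket is about.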
[jira] [Updated] (SPARK-34754) sparksql 'add jar' not support hdfs ha mode in k8s
[ https://issues.apache.org/jira/browse/SPARK-34754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lithiumlee-_- updated SPARK-34754:
Description:
Submitting an app to K8S, the driver is already running but executors fail with "java.net.UnknownHostException: xx" when starting. The UDF jar URI uses the HDFS HA (nameservice) style, yet the exception stack shows "...*createNonHAProxy*...".

hql:
{code:java}
add jar hdfs://xx/test.jar;
create temporary function test_udf as 'com.xxx.xxx';
create table test.test_udf as select test_udf('1') name_1;
{code}

exception:
{code:java}
TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.30.89.44, executor 1): java.lang.IllegalArgumentException: java.net.UnknownHostException: xx
 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439)
 at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321)
 at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:696)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:636)
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
 at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
 at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:816)
 at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:808)
 at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
 at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
 at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
 at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:808)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: xx
 ... 28 more
{code}
[jira] [Created] (SPARK-34754) sparksql 'add jar' not support hdfs ha mode in k8s
lithiumlee-_- created SPARK-34754:
Summary: sparksql 'add jar' not support hdfs ha mode in k8s
Key: SPARK-34754
URL: https://issues.apache.org/jira/browse/SPARK-34754
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.4.7
Reporter: lithiumlee-_-

The driver is already running, but executors fail with "java.net.UnknownHostException: xx" when starting. The UDF jar URI uses the HDFS HA (nameservice) style, yet the exception stack shows "...*createNonHAProxy*...".

hql:
{code:java}
add jar hdfs://xx/test.jar;
create temporary function test_udf as 'com.xxx.xxx';
create table test.test_udf as select test_udf('1') name_1;
{code}

exception:
{code:java}
TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.30.89.44, executor 1): java.lang.IllegalArgumentException: java.net.UnknownHostException: xx
 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439)
 at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321)
 at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:696)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:636)
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
 at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
 at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:816)
 at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:808)
 at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
 at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
 at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
 at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:808)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: xx
 ... 28 more
{code}
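The stack (createNonHAProxy followed by UnknownHostException on the logical name "xx") suggests the executor pods lack the HDFS HA client configuration, so the nameservice is treated as a plain DNS hostname. A toy model of that decision, purely illustrative (Hadoop's real logic lives in NameNodeProxies/HAUtil, and the conf shape here is invented):

```python
def resolve_namenode(authority: str, hadoop_conf: dict) -> str:
    """Toy model of how an HDFS client picks an HA vs. non-HA proxy.

    If the URI authority matches a configured nameservice, an HA failover
    proxy is built; otherwise the authority is assumed to be a real
    host[:port], which fails with UnknownHostException when it is actually
    a logical nameservice name like 'xx'.
    """
    nameservices = hadoop_conf.get("dfs.nameservices", "").split(",")
    if authority in nameservices:
        return f"HA failover proxy over dfs.ha.namenodes.{authority}"
    return f"non-HA proxy (DNS lookup of '{authority}')"

# Driver pod: carries the HA config, so 'xx' resolves to an HA proxy.
driver_conf = {"dfs.nameservices": "xx"}
# Executor pod missing hdfs-site.xml: 'xx' goes down the non-HA path.
executor_conf = {}

print(resolve_namenode("xx", driver_conf))
print(resolve_namenode("xx", executor_conf))
```

If this reading is right, the behavioral difference between driver and executors comes down to which pods receive the HA client settings, which is why the failure only appears when executors fetch the added jar.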
[jira] [Assigned] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34752:
Assignee: Erik Krogen

> Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
> -
>
> Key: SPARK-34752
> URL: https://issues.apache.org/jira/browse/SPARK-34752
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.1
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
>
> Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.4.36, and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade, to 9.4.37.
>
> Find more at:
> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223
> https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523
[jira] [Resolved] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34752.
Fix Version/s: 3.1.2, 3.2.0
Resolution: Fixed

Issue resolved by pull request 31846 [https://github.com/apache/spark/pull/31846]

> Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
> -
>
> Key: SPARK-34752
> URL: https://issues.apache.org/jira/browse/SPARK-34752
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.1
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.4.36, and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade, to 9.4.37.
>
> Find more at:
> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223
> https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523
[jira] [Reopened] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reopened SPARK-21449:

> Hive client's SessionState was not closed properly in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: bulk-closed
>
> Close the SessionState to clean up `hive.downloaded.resources.dir` and other session state.
[jira] [Updated] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-21449:
Labels: (was: bulk-closed)

> Hive client's SessionState was not closed properly in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.2.0
>
> Close the SessionState to clean up `hive.downloaded.resources.dir` and other session state.
[jira] [Resolved] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-21449.
Fix Version/s: 3.2.0
Resolution: Fixed

Issue resolved by pull request 31833 https://github.com/apache/spark/pull/31833

> Hive client's SessionState was not closed properly in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: bulk-closed
> Fix For: 3.2.0
>
> Close the SessionState to clean up `hive.downloaded.resources.dir` and other session state.
[jira] [Updated] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23745:
Labels: (was: bulk-closed)

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
[jira] [Resolved] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-23745.
Fix Version/s: 3.2.0
Assignee: Kent Yao
Resolution: Fixed

Issue resolved by pull request 31833 https://github.com/apache/spark/pull/31833

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Assignee: Kent Yao
> Priority: Major
> Labels: bulk-closed
> Fix For: 3.2.0
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
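Both this ticket and SPARK-21449 converge on the same pattern: tie the lifetime of the downloaded-resources directory to the service's shutdown, so closing the session also deletes its on-disk state. The pattern can be sketched generically in Python (this is an analogy only, not the Scala change in the PR):

```python
import atexit
import shutil
import tempfile

class ToyThriftServer:
    """Minimal stand-in for a server that owns a scratch directory."""

    def start(self) -> None:
        # Analogue of hive.downloaded.resources.dir: a per-session scratch dir.
        self.resources_dir = tempfile.mkdtemp(prefix="hive-resources-")
        # Register cleanup so the directory is removed even on abnormal exit.
        atexit.register(self.stop)

    def stop(self) -> None:
        # Stopping the server must also delete its on-disk state; otherwise
        # directories accumulate across restarts, as the ticket describes.
        shutil.rmtree(self.resources_dir, ignore_errors=True)

server = ToyThriftServer()
server.start()
server.stop()
```

The `atexit` hook mirrors the role of Spark's shutdown-hook manager: even if `stop()` is never called explicitly, the scratch directory still gets cleaned up at process exit.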
[jira] [Commented] (SPARK-34694) Improve Spark SQL Source Filter to allow pushdown of filters span multiple columns
[ https://issues.apache.org/jira/browse/SPARK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302171#comment-17302171 ] Hyukjin Kwon commented on SPARK-34694:
Oh, okay. It rather proposes handling column references. The concern is probably the type handling of the pushed predicate in the source, but it sounds like a valid issue.

> Improve Spark SQL Source Filter to allow pushdown of filters span multiple columns
> --
>
> Key: SPARK-34694
> URL: https://issues.apache.org/jira/browse/SPARK-34694
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1
> Reporter: Chen Zou
> Priority: Minor
>
> The current org.apache.spark.sql.sources.Filter abstract class only allows pushdown of filters on a single column, or sums of products of such single-column filters.
> Filters over multiple columns cannot be pushed down through this Filter subclass to the source, e.g. from the TPC-H benchmark on the lineitem table:
> (l_commitdate#11 < l_receiptdate#12)
> (l_shipdate#10 < l_commitdate#11)
>
> The current design probably originates from the fact that columnar sources have a hard time supporting these cross-column filters. But with batching implemented in columnar sources, they can still support them.
> This issue tries to open up discussion on a more general Filter interface that allows pushing down cross-column filters.
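The proposal essentially amounts to letting the right-hand side of a pushed-down comparison be a column reference rather than only a literal. A hypothetical sketch of such a generalized filter (names invented for illustration; this is not Spark's `sources.Filter` API):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class ColumnRef:
    name: str

# Today the right-hand side is effectively a literal; the ask is to
# also allow a ColumnRef here.
Operand = Union["ColumnRef", int, str]

@dataclass(frozen=True)
class LessThan:
    left: ColumnRef
    right: Operand

    def evaluate(self, row: dict) -> bool:
        # Resolve the RHS against the row when it is a column reference.
        rhs = row[self.right.name] if isinstance(self.right, ColumnRef) else self.right
        return row[self.left.name] < rhs

# TPC-H lineitem example from the ticket: l_commitdate < l_receiptdate
f = LessThan(ColumnRef("l_commitdate"), ColumnRef("l_receiptdate"))
print(f.evaluate({"l_commitdate": "1996-02-12", "l_receiptdate": "1996-02-28"}))  # True
```

The type-handling concern raised in the comment shows up even in this toy: once the RHS can be another column, the source has to reconcile the types of both sides itself, instead of trusting a literal already coerced by the engine.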
[jira] [Reopened] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reopened SPARK-23745:

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Priority: Major
> Labels: bulk-closed
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
[jira] [Assigned] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23745:
Assignee: (was: Apache Spark)

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Priority: Major
> Labels: bulk-closed
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
[jira] [Assigned] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23745:
Assignee: Apache Spark

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Assignee: Apache Spark
> Priority: Major
> Labels: bulk-closed
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
[jira] [Assigned] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-21449:
Assignee: Kent Yao

> Hive client's SessionState was not closed properly in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: bulk-closed
>
> Close the SessionState to clean up `hive.downloaded.resources.dir` and other session state.
[jira] [Updated] (SPARK-34753) Deadlock in executor RPC shutdown hook
[ https://issues.apache.org/jira/browse/SPARK-34753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Patterson updated SPARK-34753:
Attachment: sb-dylanw-spark-0ec26858-b72ed278375bf3a9-exec-38.log

> Deadlock in executor RPC shutdown hook
> --
>
> Key: SPARK-34753
> URL: https://issues.apache.org/jira/browse/SPARK-34753
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.1
> Environment: Not sure this is relevant but let me know and I can append
> Reporter: Dylan Patterson
> Priority: Major
> Attachments: sb-dylanw-spark-0ec26858-b72ed278375bf3a9-exec-38.log
>
> Ran into an issue where executors initiate the shutdown sequence and System.exit is called, but the Java process never dies, leaving orphaned containers in Kubernetes. Tracked it down to a deadlock in the RPC shutdown. See the thread dump:
> {code:java}
> "Thread-2" #26 prio=5 os_prio=0 tid=0x7f6410231800 nid=0x2a2 waiting on condition [0x7f63c3bf1000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0xc05a47b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>  at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
>  at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)
>  at org.apache.spark.rpc.netty.MessageLoop.stop(MessageLoop.scala:60)
>  at org.apache.spark.rpc.netty.Dispatcher.$anonfun$stop$1(Dispatcher.scala:190)
>  at org.apache.spark.rpc.netty.Dispatcher.$anonfun$stop$1$adapted(Dispatcher.scala:187)
>  at org.apache.spark.rpc.netty.Dispatcher$$Lambda$214/337533935.apply(Unknown Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:941)
>  at scala.collection.Iterator.foreach$(Iterator.scala:941)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>  at org.apache.spark.rpc.netty.Dispatcher.stop(Dispatcher.scala:187)
>  at org.apache.spark.rpc.netty.NettyRpcEnv.cleanup(NettyRpcEnv.scala:324)
>  at org.apache.spark.rpc.netty.NettyRpcEnv.shutdown(NettyRpcEnv.scala:302)
>  at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
>  at org.apache.spark.executor.Executor.stop(Executor.scala:292)
>  at org.apache.spark.executor.Executor.$anonfun$new$2(Executor.scala:74)
>  at org.apache.spark.executor.Executor$$Lambda$317/1046854795.apply$mcV$sp(Unknown Source)
>  at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>  at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>  at org.apache.spark.util.SparkShutdownHookManager$$Lambda$2192/1832515374.apply$mcV$sp(Unknown Source)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
>  at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>  at org.apache.spark.util.SparkShutdownHookManager$$Lambda$2191/952019066.apply$mcV$sp(Unknown Source)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at scala.util.Try$.apply(Try.scala:213)
>  at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>  at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>  at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> {code}
For additional commands, e-mail: issues-h...@spark.apache.org
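The hang pattern in the trace above — a shutdown hook blocked in `awaitTermination` on a thread pool whose message-loop threads never finish — can be sketched in plain Python. This is an illustrative analogue only, not Spark code; all names (`message_loop`, `shutdown_hook`) are invented for the sketch:

```python
import threading

release = threading.Event()

def message_loop():
    # Stands in for a MessageLoop worker that is parked and never exits,
    # so the pool it belongs to can never terminate.
    release.wait()

worker = threading.Thread(target=message_loop, name="dispatcher-event-loop")
worker.start()

def shutdown_hook(timeout=0.5):
    # Analogue of Dispatcher.stop() awaiting pool termination: wait for
    # the loop thread to exit, but give up after a timeout instead of
    # hanging forever the way the reported shutdown hook does.
    worker.join(timeout)
    return not worker.is_alive()

stopped = shutdown_hook()   # times out: the worker is still parked
release.set()               # unblock the worker so the sketch can exit
worker.join()
```

The timed `join` is the point of the sketch: without a timeout (as in the real hook, which waits indefinitely in `awaitTermination`), the hook would block forever and the process would never die.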
[jira] [Commented] (SPARK-34753) Deadlock in executor RPC shutdown hook
[ https://issues.apache.org/jira/browse/SPARK-34753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302090#comment-17302090 ] Dylan Patterson commented on SPARK-34753: - Aside from fixing the underlying issue it might be worth adding some sort of killswitch timeout for the containers since this causes resource leaks. > Deadlock in executor RPC shutdown hook > -- > > Key: SPARK-34753 > URL: https://issues.apache.org/jira/browse/SPARK-34753 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1 > Environment: Not sure this is relevant but let me know and I can > append >Reporter: Dylan Patterson >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - 
[jira] [Created] (SPARK-34753) Deadlock in executor RPC shutdown hook
Dylan Patterson created SPARK-34753: --- Summary: Deadlock in executor RPC shutdown hook Key: SPARK-34753 URL: https://issues.apache.org/jira/browse/SPARK-34753 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.0.1 Environment: Not sure this is relevant but let me know and I can append Reporter: Dylan Patterson Ran into an issue where executors initiate shutdown sequence, System.exit is called but java process never dies leaving orphaned containers in kubernetes. Tracked it down to a deadlock in the RPC shutdown. See thread dump -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nivas Umapathy updated SPARK-34751: --- Description: I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file attached with this ticket. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3') }} and so on. Now if i call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()}} {{n}} {{t=". Please use alias to rename it.;'}} When i read about it in some blogs, there was suggestion to re-read the same parquet with new schema applied. So i did {{df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} and it works, but all the data in the dataframe are null. The same works for String datatypes was: I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3') }} and so on. Now if i call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n}} {{t=". 
Please use alias to rename it.;'}} When i read about it in some blogs, there was suggestion to re-read the same parquet with new schema applied. So i did {{df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} and it works, but all the data in the dataframe are null. The same works for String datatypes > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Fix For: 2.4.8 > > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. 
The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
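The `AnalysisException` quoted above lists the characters Spark's Parquet writer rejects in field names: `" ,;{}()\n\t="`. One way to sidestep the rename-after-read trap described in the report is to sanitize the names up front. This is a pure-Python sketch of that idea; `sanitize_column` and `INVALID_PARQUET_CHARS` are names invented here, not Spark APIs:

```python
# Characters rejected in Parquet field names, taken from the
# AnalysisException text quoted in the report above.
INVALID_PARQUET_CHARS = ' ,;{}()\n\t='

def sanitize_column(name: str, replacement: str = '_') -> str:
    """Replace every character a Parquet field name may not contain."""
    return ''.join(replacement if c in INVALID_PARQUET_CHARS else c
                   for c in name)

# The columns from the report above:
renamed = [sanitize_column(c) for c in ['COL 1', 'COL,2', 'COL;3']]
```

In PySpark this could plausibly be applied in one shot as `df.toDF(*[sanitize_column(c) for c in df.columns])` before any action touches the original attribute names, though whether that avoids the null-data symptom on this particular file is untested here.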
[jira] [Commented] (SPARK-34729) Faster execution for broadcast nested loop join (left semi/anti with no condition)
[ https://issues.apache.org/jira/browse/SPARK-34729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302043#comment-17302043 ] Apache Spark commented on SPARK-34729: -- User 'c21' has created a pull request for this issue: https://github.com/apache/spark/pull/31845 > Faster execution for broadcast nested loop join (left semi/anti with no > condition) > -- > > Key: SPARK-34729 > URL: https://issues.apache.org/jira/browse/SPARK-34729 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.2.0 > > > For `BroadcastNestedLoopJoinExec` left semi and left anti join without > condition. If we broadcast left side. Currently we check whether every row > from broadcast side has a match or not by iterating broadcast side a lot of > time - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala#L256-L275] > . This is unnecessary, as there's no condition, and we only need to check > whether stream side is empty or not. Create this Jira to add the > optimization. This can boost the affected query execution performance a lot. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
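The optimization described in the ticket can be modeled in plain Python (this is a toy model of the join semantics, not Spark's implementation): with no join condition, a broadcast-side row of a left semi join "matches" iff the stream side is non-empty, so the per-row scan of the other side collapses to a single emptiness check.

```python
def left_semi_naive(build, stream):
    # Current behaviour per the ticket: scan the other side once per
    # build row, even though there is no condition to evaluate.
    return [row for row in build if any(True for _ in stream)]

def left_anti_naive(build, stream):
    return [row for row in build if not any(True for _ in stream)]

def left_semi_fast(build, stream):
    # Proposed behaviour: only emptiness of the stream side matters.
    return list(build) if stream else []

def left_anti_fast(build, stream):
    return [] if stream else list(build)
```

Both versions agree for every input; the fast ones do O(1) work per join instead of O(build × stream), which is the boost the ticket claims.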
[jira] [Commented] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302028#comment-17302028 ] Apache Spark commented on SPARK-34752: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/31846 > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.4.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.4.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34752: Assignee: (was: Apache Spark) > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.4.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.4.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34752: Assignee: Apache Spark > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Assignee: Apache Spark >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.4.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.4.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-34752: Description: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.4.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.4.37. Find more at: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 was: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. Find more at: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.4.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.4.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-34752: Summary: Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 (was: Upgrade Jetty to 9.3.37 to fix CVE-2020-27223) > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.3.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.3.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34752) Upgrade Jetty to 9.3.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-34752: Description: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. Find more at: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 was: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. Find more at https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 / https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 > Upgrade Jetty to 9.3.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.3.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.3.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34752) Upgrade Jetty to 9.3.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-34752: Description: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. Find more at https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 / https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 was:Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. > Upgrade Jetty to 9.3.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.3.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.3.37. > > Find more at https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 / > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34752) Upgrade Jetty to 9.3.37 to fix CVE-2020-27223
Erik Krogen created SPARK-34752: --- Summary: Upgrade Jetty to 9.3.37 to fix CVE-2020-27223 Key: SPARK-34752 URL: https://issues.apache.org/jira/browse/SPARK-34752 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.1.1 Reporter: Erik Krogen Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34738) Upgrade Minikube and kubernetes cluster version on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-34738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Knapp reassigned SPARK-34738: --- Assignee: Shane Knapp > Upgrade Minikube and kubernetes cluster version on Jenkins > -- > > Key: SPARK-34738 > URL: https://issues.apache.org/jira/browse/SPARK-34738 > Project: Spark > Issue Type: Task > Components: jenkins, Kubernetes >Affects Versions: 3.2.0 >Reporter: Attila Zsolt Piros >Assignee: Shane Knapp >Priority: Major > > [~shaneknapp] as we discussed [on the mailing > list|http://apache-spark-developers-list.1001551.n3.nabble.com/minikube-and-kubernetes-cluster-versions-for-integration-testing-td30856.html] > Minikube can be upgraded to the latest (v1.18.1) and kubernetes version > should be v1.17.3 (`minikube config set kubernetes-version v1.17.3`). > [Here|https://github.com/apache/spark/pull/31829] is my PR which uses a new > method to configure the kubernetes client. Thanks in advance to use it for > testing on the Jenkins after the Minikube version is updated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34646) TreeNode bind issue for duplicate column name.
[ https://issues.apache.org/jira/browse/SPARK-34646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301924#comment-17301924 ] loc nguyen commented on SPARK-34646: I am a little confused by your response. What information are you looking for? > TreeNode bind issue for duplicate column name. > -- > > Key: SPARK-34646 > URL: https://issues.apache.org/jira/browse/SPARK-34646 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.4.3 > Environment: Spark 2.4.3, Scala 2.11.8, Hadoop 3.2.1 >Reporter: loc nguyen >Priority: Major > Labels: spark > > I received a Spark {{TreeNodeException}} when executing a union of two data > frames. The error occurs when I assign the union result to a DataFrame that > will be returned by a function; assigning it to a DataFrame that is not > returned works. I have examined the schema of every data frame involved: > the PT_Id column is duplicated, and the duplicate causes the attribute > lookup to fail. 
> > > {{21/03/04 19:58:28 ERROR Executor: Exception in task 2.0 in stage 2281.0 > (TID 5557) > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: PT_ID#140575 at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:79) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:78) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:261) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:261) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261) > at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:245) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:78) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:45) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:40) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1190) > at 
org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:403) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$lzycompute(BroadcastNestedLoopJoinExec.scala:87) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition(BroadcastNestedLoopJoinExec.scala:85) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2$$anonfun$apply$3.apply(BroadcastNestedLoopJoinExec.scala:191) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2$$anonfun$apply$3.apply(BroadcastNestedLoopJoinExec.scala:191) > at > scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:46) > at scala.collection.mutable.ArrayOps$ofRef.exists(ArrayOps.scala:186) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2.apply(BroadcastNestedLoopJoinExec.scala:191) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2.apply(BroadcastNestedLoopJoinExec.scala:190) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:464) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at org.apache.spark.scheduler.Shuffle
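A toy sketch of why a duplicated column can break binding: a lookup that expects exactly one candidate fails when the input schema carries the same name twice. This is illustrative only — Spark's real `bindReference` resolves attributes by expression id, not by name — and `bind_attribute` is a name invented here:

```python
def bind_attribute(name, schema):
    """Bind an attribute to its ordinal in the input schema, raising
    when the name is missing or ambiguous; loosely mirrors the failed
    PT_ID lookup in the trace above."""
    positions = [i for i, col in enumerate(schema) if col == name]
    if len(positions) != 1:
        raise ValueError(
            f"Binding attribute {name}: found {len(positions)} candidates")
    return positions[0]

bind_attribute('PT_ID', ['PT_ID', 'VALUE'])  # unambiguous: ordinal 0
# A union whose schema carries PT_ID twice makes the lookup ambiguous:
# bind_attribute('PT_ID', ['PT_ID', 'VALUE', 'PT_ID'])  # raises ValueError
```

Deduplicating or aliasing the column before the union (so each attribute is unique) is the usual way to avoid this class of failure.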
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nivas Umapathy updated SPARK-34751: --- Description: I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3')}} and so on. Now if I call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} When I read about it in some blogs, there was a suggestion to re-read the same parquet with the new schema applied. So I did {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} and it works, but all the data in the dataframe are null. The same works for String datatypes was: I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3')}} and so on. Now if I call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} When I read about it in some blogs, there was a suggestion to re-read the same parquet with the new schema applied. So I did {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} and it works, but all the data in the dataframe are null. The same works for Strings > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Fix For: 2.4.8 > > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file [Invalid Header > Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3')}} > and so on. > Now if I call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} > > When I read about it in some blogs, there was a suggestion to re-read the same > parquet with the new schema applied. So I did > {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
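[Editor's note on the report above] The all-null result has a plausible mechanism: the Parquet reader resolves the fields of a user-supplied schema against the file footer by name, so a schema whose fields were already sanitized matches nothing in the file. The following is a plain-Python sketch of that mechanism, not PySpark or Spark internals; `read_with_schema` is a hypothetical helper used only for illustration.

```python
# Plain-Python sketch (assumption: this models, not reproduces, Spark's
# Parquet reader, which resolves requested schema fields by name).

# Column data as stored in the Parquet footer, under the invalid names.
stored = {"COL 1": [1.5, 2.5], "COL,2": [3.5, 4.5]}

def read_with_schema(columns, schema_names):
    """Resolve each requested field by name; a miss reads back as all-null."""
    n_rows = len(next(iter(columns.values())))
    return {name: columns.get(name, [None] * n_rows) for name in schema_names}

# Applying the sanitized schema: no stored column matches, so every value
# comes back null -- the behaviour reported in the issue.
cleaned = read_with_schema(stored, ["COL_1", "COL_2"])

# Reading under the original names and renaming afterwards keeps the data.
raw = read_with_schema(stored, ["COL 1", "COL,2"])
renamed = {"COL_1": raw["COL 1"], "COL_2": raw["COL,2"]}
print(cleaned)
print(renamed)
```

Under this model, the schema must keep the on-disk names for the scan and any renaming has to happen after the read.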
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nivas Umapathy updated SPARK-34751: --- Attachment: invalid_columns_double.parquet > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Fix For: 2.4.8 > > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file [Invalid Header > Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename > it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}{{}} > > and it works, but all the data in the dataframe are null. The same works for > Strings > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
Nivas Umapathy created SPARK-34751: -- Summary: Parquet with invalid chars on column name reads double as null when a clean schema is applied Key: SPARK-34751 URL: https://issues.apache.org/jira/browse/SPARK-34751 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.4.3 Environment: Pyspark 2.4.3 AWS Glue Dev Endpoint EMR Reporter: Nivas Umapathy Fix For: 2.4.8 Attachments: invalid_columns_double.parquet I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3') }} and so on. Now if i call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} When i read about it in some blogs, there was suggestion to re-read the same parquet with new schema applied. So i did {{df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}{{}} and it works, but all the data in the dataframe are null. The same works for Strings -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34750) Parquet with invalid chars on column name reads double as null when a clean schema is applied
Nivas Umapathy created SPARK-34750: -- Summary: Parquet with invalid chars on column name reads double as null when a clean schema is applied Key: SPARK-34750 URL: https://issues.apache.org/jira/browse/SPARK-34750 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.4.3 Environment: Pyspark 2.4.3 AWS Glue Dev Endpoint EMR Reporter: Nivas Umapathy Fix For: 2.4.8 I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3') }} and so on. Now if i call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} When i read about it in some blogs, there was suggestion to re-read the same parquet with new schema applied. So i did {{df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}{{}} and it works, but all the data in the dataframe are null. The same works for Strings -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34749) Simplify CreateNamedStruct
[ https://issues.apache.org/jira/browse/SPARK-34749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34749: Assignee: (was: Apache Spark) > Simplify CreateNamedStruct > -- > > Key: SPARK-34749 > URL: https://issues.apache.org/jira/browse/SPARK-34749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34749) Simplify CreateNamedStruct
[ https://issues.apache.org/jira/browse/SPARK-34749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34749: Assignee: Apache Spark > Simplify CreateNamedStruct > -- > > Key: SPARK-34749 > URL: https://issues.apache.org/jira/browse/SPARK-34749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34749) Simplify CreateNamedStruct
[ https://issues.apache.org/jira/browse/SPARK-34749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301827#comment-17301827 ] Apache Spark commented on SPARK-34749: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/31843 > Simplify CreateNamedStruct > -- > > Key: SPARK-34749 > URL: https://issues.apache.org/jira/browse/SPARK-34749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34749) Simplify CreateNamedStruct
Wenchen Fan created SPARK-34749: --- Summary: Simplify CreateNamedStruct Key: SPARK-34749 URL: https://issues.apache.org/jira/browse/SPARK-34749 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34731) ConcurrentModificationException in EventLoggingListener when redacting properties
[ https://issues.apache.org/jira/browse/SPARK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-34731: -- Affects Version/s: 3.1.1 > ConcurrentModificationException in EventLoggingListener when redacting > properties > - > > Key: SPARK-34731 > URL: https://issues.apache.org/jira/browse/SPARK-34731 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.1.1 >Reporter: Bruce Robbins >Priority: Major > > Reproduction: > The key elements of reproduction are enabling event logging, setting > spark.executor.cores, and some bad luck: > {noformat} > $ bin/spark-shell --conf spark.ui.showConsoleProgress=false \ > --conf spark.executor.cores=1 --driver-memory 4g --conf \ > "spark.ui.showConsoleProgress=false" \ > --conf spark.eventLog.enabled=true \ > --conf spark.eventLog.dir=/tmp/spark-events > ... > scala> (0 to 500).foreach { i => > | val df = spark.range(0, 2).toDF("a") > | df.filter("a > 12").count > | } > 21/03/12 18:16:44 ERROR AsyncEventQueue: Listener EventLoggingListener threw > an exception > java.util.ConcurrentModificationException > at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:424) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:420) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.mutable.MapLike.toSeq(MapLike.scala:75) > at scala.collection.mutable.MapLike.toSeq$(MapLike.scala:72) > at scala.collection.mutable.AbstractMap.toSeq(Map.scala:82) > at > 
org.apache.spark.scheduler.EventLoggingListener.redactProperties(EventLoggingListener.scala:290) > at > org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:162) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) > at > scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1379) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) > {noformat} > Analysis from quick reading of the code: > DAGScheduler posts a JobSubmitted event containing a clone of a properties > object > [here|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L834]. > This event is handled > [here|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2394]. 
> DAGScheduler#handleJobSubmitted stores the properties object in a [Job > object|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1154], > which in turn is [saved in the jobIdToActiveJob > map|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1163]. > DAGScheduler#handleJobSubmitted posts a SparkListenerJobStart event > [here|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1169] > with a reference to
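[Editor's note on the analysis above] The failure boils down to one thread iterating a `Properties` object while another thread mutates it. A minimal single-threaded Python analogue follows; it is illustrative only, not Spark code, using the fact that Python dicts fail fast during iteration much like `java.util.Hashtable` throws `ConcurrentModificationException`.

```python
# Analogue of the race (assumption: illustrative only; Python raises
# RuntimeError where java.util.Hashtable would throw
# ConcurrentModificationException).

props = {"spark.job.description": "q1", "spark.jobGroup.id": "g1"}

def redact_unsafe(live):
    out = {}
    for k, v in live.items():
        # Simulate the scheduler thread writing to the same object while
        # the listener is still iterating it.
        live["spark.extra"] = "x"
        out[k] = v
    return out

try:
    redact_unsafe(dict(props))
    raised = False
except RuntimeError:  # "dictionary changed size during iteration"
    raised = True

def redact_safe(live):
    # Fix direction: snapshot the properties before iterating, so writes
    # that land afterwards cannot invalidate the iteration.
    out = {}
    for k, v in list(live.items()):
        live["spark.extra"] = "x"  # same concurrent write, now harmless
        out[k] = v
    return out

safe = redact_safe(dict(props))
print(raised, sorted(safe))
```

The snapshot-before-iterate shape is the general remedy whenever a shared mutable map is handed to an asynchronous listener.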
[jira] [Assigned] (SPARK-34748) Create a rule of the analysis logic for streaming write
[ https://issues.apache.org/jira/browse/SPARK-34748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34748: Assignee: (was: Apache Spark) > Create a rule of the analysis logic for streaming write > --- > > Key: SPARK-34748 > URL: https://issues.apache.org/jira/browse/SPARK-34748 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Priority: Major > > Currently, the analysis logic for streaming write is mixed in > StreamingQueryManager. If we create a specific analyzer rule and separated > logical plans, it should be helpful for further extension. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34748) Create a rule of the analysis logic for streaming write
[ https://issues.apache.org/jira/browse/SPARK-34748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301792#comment-17301792 ] Apache Spark commented on SPARK-34748: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/31842 > Create a rule of the analysis logic for streaming write > --- > > Key: SPARK-34748 > URL: https://issues.apache.org/jira/browse/SPARK-34748 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Priority: Major > > Currently, the analysis logic for streaming write is mixed in > StreamingQueryManager. If we create a specific analyzer rule and separated > logical plans, it should be helpful for further extension. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34748) Create a rule of the analysis logic for streaming write
[ https://issues.apache.org/jira/browse/SPARK-34748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34748: Assignee: Apache Spark > Create a rule of the analysis logic for streaming write > --- > > Key: SPARK-34748 > URL: https://issues.apache.org/jira/browse/SPARK-34748 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Apache Spark >Priority: Major > > Currently, the analysis logic for streaming write is mixed in > StreamingQueryManager. If we create a specific analyzer rule and separated > logical plans, it should be helpful for further extension. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34747) Add virtual operators to the built-in function document.
[ https://issues.apache.org/jira/browse/SPARK-34747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301791#comment-17301791 ] Apache Spark commented on SPARK-34747: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31841 > Add virtual operators to the built-in function document. > > > Key: SPARK-34747 > URL: https://issues.apache.org/jira/browse/SPARK-34747 > Project: Spark > Issue Type: Bug > Components: docs, SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show > built-in operators including the following virtual operators. > * != > * <> > * between > * case > * || > But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34747) Add virtual operators to the built-in function document.
[ https://issues.apache.org/jira/browse/SPARK-34747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34747: Assignee: Kousuke Saruta (was: Apache Spark) > Add virtual operators to the built-in function document. > > > Key: SPARK-34747 > URL: https://issues.apache.org/jira/browse/SPARK-34747 > Project: Spark > Issue Type: Bug > Components: docs, SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show > built-in operators including the following virtual operators. > * != > * <> > * between > * case > * || > But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34747) Add virtual operators to the built-in function document.
[ https://issues.apache.org/jira/browse/SPARK-34747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34747: Assignee: Apache Spark (was: Kousuke Saruta) > Add virtual operators to the built-in function document. > > > Key: SPARK-34747 > URL: https://issues.apache.org/jira/browse/SPARK-34747 > Project: Spark > Issue Type: Bug > Components: docs, SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show > built-in operators including the following virtual operators. > * != > * <> > * between > * case > * || > But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34747) Add virtual operators to the built-in function document.
[ https://issues.apache.org/jira/browse/SPARK-34747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301789#comment-17301789 ] Apache Spark commented on SPARK-34747: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31841 > Add virtual operators to the built-in function document. > > > Key: SPARK-34747 > URL: https://issues.apache.org/jira/browse/SPARK-34747 > Project: Spark > Issue Type: Bug > Components: docs, SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show > built-in operators including the following virtual operators. > * != > * <> > * between > * case > * || > But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34748) Create a rule of the analysis logic for streaming write
Yuanjian Li created SPARK-34748: --- Summary: Create a rule of the analysis logic for streaming write Key: SPARK-34748 URL: https://issues.apache.org/jira/browse/SPARK-34748 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.2.0 Reporter: Yuanjian Li Currently, the analysis logic for streaming write is mixed in StreamingQueryManager. If we create a specific analyzer rule and separated logical plans, it should be helpful for further extension. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34747) Add virtual operators to the built-in function document.
Kousuke Saruta created SPARK-34747: -- Summary: Add virtual operators to the built-in function document. Key: SPARK-34747 URL: https://issues.apache.org/jira/browse/SPARK-34747 Project: Spark Issue Type: Bug Components: docs, SQL Affects Versions: 3.1.1, 3.0.2, 2.4.7, 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show built-in operators including the following virtual operators. * != * <> * between * case * || But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34746) Spark dependencies require scala 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-34746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301733#comment-17301733 ] Peter Kaiser commented on SPARK-34746: -- works when enforcing Scala 2.12.10 in the gradle build file: {code:java} implementation ('org.scala-lang:scala-library:2.12.10') { force = true }{code} > Spark dependencies require scala 2.12.12 > > > Key: SPARK-34746 > URL: https://issues.apache.org/jira/browse/SPARK-34746 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Peter Kaiser >Priority: Critical > > In our application we're creating a spark session programmatically. The > application is built using gradle. > After upgrading spark to 3.1.1 it no longer works, due to incompatible > classes on driver and executor (namely: > scala.lang.collections.immutable.WrappedArray.ofRef). > Turns out this was caused by different scala versions on driver vs. executor. > While spark still comes with Scala 2.12.10, some of its dependencies in the > gradle build require Scala 2.12.12: > {noformat} > Cannot find a version of 'org.scala-lang:scala-library' that satisfies the > version constraints: > Dependency path '...' --> '...' --> 'org.scala-lang:scala-library:{strictly > 2.12.10}' > Dependency path '...' --> 'org.apache.spark:spark-core_2.12:3.1.1' --> > 'org.json4s:json4s-jackson_2.12:3.7.0-M5' --> > 'org.scala-lang:scala-library:2.12.12' {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
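[Editor's note on the report above] The conflict can be sketched with a toy resolver; this is deliberately simplified and not Gradle's actual algorithm. A `strictly` pin is incompatible with any other requested version, and forcing a version (as in the workaround comment) resolves the conflict by discarding the transitive requirement.

```python
# Toy dependency resolver (assumption: simplified semantics; Gradle's real
# conflict resolution is richer than this sketch).

def resolve(requested, strict=None, force=False):
    if strict is not None:
        conflicts = [v for v in requested if v != strict]
        if conflicts and not force:
            # Mirrors the reported "Cannot find a version ... that
            # satisfies the version constraints" failure.
            raise ValueError(f"cannot satisfy strictly {strict} against {conflicts}")
        return strict
    # Without a pin, the highest requested version wins.
    return max(requested, key=lambda v: tuple(map(int, v.split("."))))

requested = ["2.12.10", "2.12.12"]  # Spark's scala-library vs json4s's

print(resolve(requested))                                # highest wins
print(resolve(requested, strict="2.12.10", force=True))  # forced pin wins

try:
    resolve(requested, strict="2.12.10")  # the reported failure mode
except ValueError as e:
    print(e)
```

Note that "highest wins" is exactly what produces mismatched Scala minor versions between a driver built this way and executors shipping 2.12.10, hence the `WrappedArray$ofRef` incompatibility described above.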
[jira] [Created] (SPARK-34746) Spark dependencies require scala 2.12.12
Peter Kaiser created SPARK-34746: Summary: Spark dependencies require scala 2.12.12 Key: SPARK-34746 URL: https://issues.apache.org/jira/browse/SPARK-34746 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.1 Reporter: Peter Kaiser In our application we're creating a spark session programmatically. The application is built using gradle. After upgrading spark to 3.1.1 it no longer works, due to incompatible classes on driver and executor (namely: scala.lang.collections.immutable.WrappedArray.ofRef). Turns out this was caused by different scala versions on driver vs. executor. While spark still comes with Scala 2.12.10, some of its dependencies in the gradle build require Scala 2.12.12: {noformat} Cannot find a version of 'org.scala-lang:scala-library' that satisfies the version constraints: Dependency path '...' --> '...' --> 'org.scala-lang:scala-library:{strictly 2.12.10}' Dependency path '...' --> 'org.apache.spark:spark-core_2.12:3.1.1' --> 'org.json4s:json4s-jackson_2.12:3.7.0-M5' --> 'org.scala-lang:scala-library:2.12.12' {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34694) Improve Spark SQL Source Filter to allow pushdown of filters span multiple columns
[ https://issues.apache.org/jira/browse/SPARK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301701#comment-17301701 ] Chen Zou edited comment on SPARK-34694 at 3/15/21, 3:26 PM: Hi Hyukjin, I think the design you described would work. But the current org.apache.spark.sql.sources.Filter isn't built under the assumption that the 'value' parameter could be a column reference. e.g. the findReferences member function does not consider value being a column references. {code:scala} protected def findReferences(value: Any): Array[String] = value match { case f: Filter => f.references case _ => Array.empty } {code} And this is probably why org.apache.spark.sql.execution.datasources.v2.PushDownUtils would not push the cross-column filters down to data sources. The end result is that cross-column filters don't get pushed down, from stderr of a spark job doing TPC-H Q12: 21/03/10 16:56:16.266 INFO V2ScanRelationPushDown: Pushing operators to lineitem@[file:///blah/blah/lineitem] Pushed Filters: Or(EqualTo(l_shipmode,MAIL),EqualTo(l_shipmode,SHIP)), GreaterThanOrEqual(l_receiptdate,1994-01-01), LessThan(l_receiptdate,1995-01-01) Post-Scan Filters: (l_commitdate#11 < l_receiptdate#12),(l_shipdate#10 < l_commitdate#11) Output: l_orderkey#0, l_shipdate#10, l_commitdate#11, l_receiptdate#12, l_shipmode#14 Regards, Chen was (Author: zinechant): Hi Hyukjin, I think the design you described would work. But the current org.apache.spark.sql.sources.Filter isn't built under the assumption that the 'value' parameter could be a column reference. e.g. the findReferences member function does not consider value being a column references. {code:scala} protected def findReferences(value: Any): Array[String] = value match { case f: Filter => f.references case _ => Array.empty } {code} And this is probably why org.apache.spark.sql.execution.datasources.v2.PushDownUtils would not push the cross-column filters down to data sources. 
The end result is that cross-column filters don't get pushed down, from stderr of a spark job doing TPC-H Q12: 21/03/10 16:56:16.266 INFO V2ScanRelationPushDown: Pushing operators to lineitem@[file:///home/colouser51/udpstorage/tpch/tbl_s1e1/lineitem] Pushed Filters: Or(EqualTo(l_shipmode,MAIL),EqualTo(l_shipmode,SHIP)), GreaterThanOrEqual(l_receiptdate,1994-01-01), LessThan(l_receiptdate,1995-01-01) Post-Scan Filters: (l_commitdate#11 < l_receiptdate#12),(l_shipdate#10 < l_commitdate#11) Output: l_orderkey#0, l_shipdate#10, l_commitdate#11, l_receiptdate#12, l_shipmode#14 Regards, Chen > Improve Spark SQL Source Filter to allow pushdown of filters span multiple > columns > -- > > Key: SPARK-34694 > URL: https://issues.apache.org/jira/browse/SPARK-34694 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1 >Reporter: Chen Zou >Priority: Minor > > The current org.apache.spark.sql.sources.Filter abstract class only allows > pushdown of filters on single column or sum of products of multiple such > single-column filters. > Filters on multiple columns cannot be pushed down through this Filter > subclass to source, e.g. from TPC-H benchmark on lineitem table: > (l_commitdate#11 < l_receiptdate#12) > (l_shipdate#10 < l_commitdate#11) > > The current design probably originates from the point that columnar source > has a hard time supporting these cross-column filters. But with batching > implemented in columnar sources, they can still support cross-column filters. > This issue tries to open up discussion on a more general Filter interface to > allow pushing down cross-column filters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
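[Editor's note on the discussion above] To make the gap concrete, here is a small hypothetical sketch of the extension under discussion: letting a filter's `value` be a column reference, with `findReferences`-style logic reporting both sides. `ColumnRef` and `LessThan` here are illustrative stand-ins, not Spark's actual `org.apache.spark.sql.sources` classes.

```python
# Hypothetical filter AST (assumption: illustrative stand-ins for Spark's
# sources.Filter hierarchy, not the real classes).
from dataclasses import dataclass
from typing import Any, List

@dataclass
class ColumnRef:
    """Marks a value that refers to another column, per the proposal."""
    name: str

@dataclass
class LessThan:
    attribute: str
    value: Any  # a literal today; a literal *or* ColumnRef under the proposal

    @property
    def references(self) -> List[str]:
        # findReferences extended so a column-valued `value` is reported too,
        # letting pushdown logic see that this filter needs both columns.
        refs = [self.attribute]
        if isinstance(self.value, ColumnRef):
            refs.append(self.value.name)
        return refs

single = LessThan("l_receiptdate", "1995-01-01")              # pushed today
cross = LessThan("l_commitdate", ColumnRef("l_receiptdate"))  # TPC-H Q12 case
print(single.references)
print(cross.references)
```

With `references` complete on both sides, a `PushDownUtils`-like planner could at least decide per source whether a cross-column predicate is pushable instead of unconditionally keeping it as a post-scan filter.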
[jira] [Commented] (SPARK-34745) Unify overflow exception error message of integral types
[ https://issues.apache.org/jira/browse/SPARK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301693#comment-17301693 ] Apache Spark commented on SPARK-34745: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/31840
> Unify overflow exception error message of integral types
>
> Key: SPARK-34745
> URL: https://issues.apache.org/jira/browse/SPARK-34745
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
>
> Currently, the overflow exception error messages of the integral types are
> inconsistent.
> For the Byte/Short types, the message is "... caused overflow".
> For Int/Long, the message is "int/long overflow", since Spark calls the
> "*Exact" (e.g. addExact, negateExact) methods from java.lang.Math.
> We should unify the error messages by changing the Byte/Short message to
> "tinyint/smallint overflow".
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
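The Int/Long wording comes straight from the JDK's checked-arithmetic helpers, which Spark calls for those types. A small sketch showing the messages the ticket wants to align with:

```scala
// java.lang.Math's *Exact methods throw ArithmeticException with messages
// "integer overflow" / "long overflow"; Spark surfaces these for Int/Long.
object OverflowDemo extends App {
  try Math.addExact(Int.MaxValue, 1)
  catch { case e: ArithmeticException => println(e.getMessage) } // integer overflow

  try Math.multiplyExact(Long.MaxValue, 2L)
  catch { case e: ArithmeticException => println(e.getMessage) } // long overflow
}
```

Byte/Short arithmetic has no such helpers in java.lang.Math, which is why Spark's hand-written "... caused overflow" message diverges and needs to be unified by hand.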
[jira] [Assigned] (SPARK-34745) Unify overflow exception error message of integral types
[ https://issues.apache.org/jira/browse/SPARK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34745: Assignee: Apache Spark (was: Gengliang Wang)
> Unify overflow exception error message of integral types
> Key: SPARK-34745
> URL: https://issues.apache.org/jira/browse/SPARK-34745
[jira] [Assigned] (SPARK-34745) Unify overflow exception error message of integral types
[ https://issues.apache.org/jira/browse/SPARK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34745: Assignee: Gengliang Wang (was: Apache Spark)
> Unify overflow exception error message of integral types
> Key: SPARK-34745
> URL: https://issues.apache.org/jira/browse/SPARK-34745
[jira] [Updated] (SPARK-34745) Unify overflow exception error message of integral types
[ https://issues.apache.org/jira/browse/SPARK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-34745: --- Description:
Currently, the overflow exception error messages of the integral types are inconsistent. For the Byte/Short types, the message is "... caused overflow". For Int/Long, the message is "int/long overflow", since Spark calls the "*Exact" (e.g. addExact, negateExact) methods from java.lang.Math. We should unify the error messages by changing the Byte/Short message to "tinyint/smallint overflow".
was:
Currently, the overflow exception error messages of the integral types are inconsistent. For the Byte/Short types, the message is "... caused overflow". For Int/Long, the message is "int/long overflow", since Spark calls the "exact*" methods from java.lang.Math. We should unify the error messages by changing the Byte/Short message to "tinyint/smallint overflow".
> Unify overflow exception error message of integral types
> Key: SPARK-34745
> URL: https://issues.apache.org/jira/browse/SPARK-34745
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34744) Improve error message for casting cause overflow error
[ https://issues.apache.org/jira/browse/SPARK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301687#comment-17301687 ] Xingchao, Zhang commented on SPARK-34744: - I will try to fix it > Improve error message for casting cause overflow error > -- > > Key: SPARK-34744 > URL: https://issues.apache.org/jira/browse/SPARK-34744 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > For example: > {code:sql} > set spark.sql.ansi.enabled=true; > select tinyint(128) * tinyint(2); > {code} > Error message: > {noformat} > Casting 128 to scala.Byte$ causes overflow > {noformat} > Expected: > {noformat} > Casting 128 to tinyint causes overflow > {noformat} > We should use DataType's catalogString. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34745) Unify overflow exception error message of integral types
Gengliang Wang created SPARK-34745: -- Summary: Unify overflow exception error message of integral types Key: SPARK-34745 URL: https://issues.apache.org/jira/browse/SPARK-34745 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang
Currently, the overflow exception error messages of the integral types are inconsistent. For the Byte/Short types, the message is "... caused overflow". For Int/Long, the message is "int/long overflow", since Spark calls the "exact*" methods from java.lang.Math. We should unify the error messages by changing the Byte/Short message to "tinyint/smallint overflow".
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34744) Improve error message for casting cause overflow error
[ https://issues.apache.org/jira/browse/SPARK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-34744: Description: For example: {code:sql} set spark.sql.ansi.enabled=true; select tinyint(128) * tinyint(2); {code} Error message: {noformat} Casting 128 to scala.Byte$ causes overflow {noformat} Expected: {noformat} Casting 128 to tinyint causes overflow {noformat} We should use DataType's catalogString. was: For example: {code:sql} set spark.sql.ansi.enabled=true; select tinyint(128) * tinyint(2); {code} Error message: {noformat} Casting 128 to scala.Byte$ causes overflow {noformat} Expected: {noformat} Casting 128 to tinyint causes overflow {noformat} We can update [castingCauseOverflowError|https://github.com/apache/spark/blob/5b2ad59f64a9bb065b49acb2e73a6b246a3d8c64/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L64-L66] to: {code:scala} def castingCauseOverflowError(t: Any, targetType: DataType): ArithmeticException = { new ArithmeticException(s"Casting $t to ${targetType.catalogString} causes overflow") } {code} > Improve error message for casting cause overflow error > -- > > Key: SPARK-34744 > URL: https://issues.apache.org/jira/browse/SPARK-34744 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > For example: > {code:sql} > set spark.sql.ansi.enabled=true; > select tinyint(128) * tinyint(2); > {code} > Error message: > {noformat} > Casting 128 to scala.Byte$ causes overflow > {noformat} > Expected: > {noformat} > Casting 128 to tinyint causes overflow > {noformat} > We should use DataType's catalogString. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34744) Improve error message for casting cause overflow error
Yuming Wang created SPARK-34744:
-----------------------------------

             Summary: Improve error message for casting cause overflow error
                 Key: SPARK-34744
                 URL: https://issues.apache.org/jira/browse/SPARK-34744
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Yuming Wang


For example:
{code:sql}
set spark.sql.ansi.enabled=true;
select tinyint(128) * tinyint(2);
{code}
Error message:
{noformat}
Casting 128 to scala.Byte$ causes overflow
{noformat}
Expected:
{noformat}
Casting 128 to tinyint causes overflow
{noformat}
We can update [castingCauseOverflowError|https://github.com/apache/spark/blob/5b2ad59f64a9bb065b49acb2e73a6b246a3d8c64/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L64-L66] to:
{code:scala}
def castingCauseOverflowError(t: Any, targetType: DataType): ArithmeticException = {
  new ArithmeticException(s"Casting $t to ${targetType.catalogString} causes overflow")
}
{code}
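The fix amounts to formatting the error with the SQL-facing type name instead of the JVM class name. An illustrative Java sketch of that idea; the CATALOG_NAMES map here is a hypothetical stand-in for Spark's DataType.catalogString, not a real Spark API:

```java
import java.util.Map;

public class CastErrorDemo {
    // Hypothetical stand-in for DataType.catalogString: JVM box type -> SQL name.
    static final Map<Class<?>, String> CATALOG_NAMES = Map.of(
        Byte.class, "tinyint",
        Short.class, "smallint",
        Integer.class, "int",
        Long.class, "bigint");

    // Build the message the way the fixed castingCauseOverflowError does:
    // name the SQL type, not the runtime class (e.g. scala.Byte$).
    static String castingCauseOverflowError(Object t, Class<?> targetType) {
        return "Casting " + t + " to " + CATALOG_NAMES.get(targetType) + " causes overflow";
    }

    public static void main(String[] args) {
        System.out.println(castingCauseOverflowError(128, Byte.class));
    }
}
```

With this, the example from the ticket produces "Casting 128 to tinyint causes overflow" rather than leaking the Scala companion-object name.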
[jira] [Commented] (SPARK-34087) a memory leak occurs when we clone the spark session
[ https://issues.apache.org/jira/browse/SPARK-34087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301665#comment-17301665 ]

Apache Spark commented on SPARK-34087:
--------------------------------------

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/31839

> a memory leak occurs when we clone the spark session
> ----------------------------------------------------
>
>                 Key: SPARK-34087
>                 URL: https://issues.apache.org/jira/browse/SPARK-34087
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Fu Chen
>            Priority: Major
>         Attachments: 1610451044690.jpg
>
> In Spark 3.0.1, a memory leak occurs when we repeatedly clone the Spark session, because each clone adds a new ExecutionListenerBus instance to the AsyncEventQueue.
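The leak pattern can be modeled in a few lines of Java. This is a simplified stand-in, not Spark's actual ExecutionListenerBus or AsyncEventQueue: every clone registers a listener on a shared, process-wide bus, and nothing ever deregisters it, so listeners accumulate even though only one session is live.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerLeakDemo {
    // Stand-in for the shared AsyncEventQueue: listener list outlives sessions.
    static final List<Object> sharedBus = new ArrayList<>();

    static class Session {
        Session() {
            // Like ExecutionListenerBus: each new session registers a listener.
            sharedBus.add(new Object());
        }
        Session cloneSession() {
            // Cloning constructs a fresh session, registering yet another listener.
            return new Session();
        }
    }

    public static void main(String[] args) {
        Session s = new Session();
        for (int i = 0; i < 1000; i++) {
            s = s.cloneSession(); // old sessions are dropped, listeners are not
        }
        // 1001 listeners are still reachable from the bus.
        System.out.println(sharedBus.size());
    }
}
```

The bus keeps a strong reference to every listener, so the old sessions' listeners (and whatever they reference) can never be garbage-collected; the fix is to deregister on session teardown or use weak references.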
[jira] [Resolved] (SPARK-34739) Add an year-month interval to a timestamp
[ https://issues.apache.org/jira/browse/SPARK-34739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-34739.
------------------------------
    Resolution: Done

> Add an year-month interval to a timestamp
> -----------------------------------------
>
>                 Key: SPARK-34739
>                 URL: https://issues.apache.org/jira/browse/SPARK-34739
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.2.0
>
> Support adding YearMonthIntervalType values to TIMESTAMP values.
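The intended semantics of year-month interval addition can be illustrated with java.time (an analogy for the calendar arithmetic involved, not Spark's implementation): adding months shifts the month field and clamps the day-of-month to the target month's length.

```java
import java.time.LocalDateTime;
import java.time.Period;

public class YearMonthIntervalDemo {
    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2021, 1, 31, 12, 0);
        // A year-month interval carries only years and months, no days/seconds.
        Period oneMonth = Period.ofMonths(1);
        // Jan 31 + 1 month clamps to the last day of February.
        System.out.println(ts.plus(oneMonth)); // 2021-02-28T12:00
    }
}
```

This clamping behavior is why year-month intervals are a distinct type from day-time intervals: the result depends on the calendar, not on a fixed number of seconds.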
[jira] [Resolved] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-34743.
-----------------------------------
    Fix Version/s: 3.0.3
                   3.1.2
                   2.4.8
                   3.2.0
       Resolution: Fixed

Issue resolved by pull request 31837
[https://github.com/apache/spark/pull/31837]

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.2.0, 2.4.8, 3.1.2, 3.0.3
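The root cause is standard JVM behavior, shown here with java.util.Arrays: equals compares the inner arrays by reference (arrays inherit Object's identity-based equals), so two structurally equal `array of array` values compare unequal unless deepEquals is used to recurse into the nesting.

```java
import java.util.Arrays;

public class DeepEqualsDemo {
    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{1, 2}, {3, 4}};

        // Arrays.equals treats the inner int[] elements as plain Objects,
        // so distinct-but-equal inner arrays compare false.
        System.out.println(Arrays.equals(a, b));     // false

        // Arrays.deepEquals recurses into nested arrays element-by-element.
        System.out.println(Arrays.deepEquals(a, b)); // true
    }
}
```

A test suite asserting round-trip equality of nested arrays with the shallow comparison can therefore pass or fail for the wrong reasons, which is what the ticket fixes.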
[jira] [Assigned] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-34743:
-------------------------------------
    Assignee: Dongjoon Hyun

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
[jira] [Resolved] (SPARK-34639) always remove unnecessary Alias in Analyzer.resolveExpression
[ https://issues.apache.org/jira/browse/SPARK-34639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34639.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31758
[https://github.com/apache/spark/pull/31758]

> always remove unnecessary Alias in Analyzer.resolveExpression
> -------------------------------------------------------------
>
>                 Key: SPARK-34639
>                 URL: https://issues.apache.org/jira/browse/SPARK-34639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 3.2.0
[jira] [Assigned] (SPARK-34639) always remove unnecessary Alias in Analyzer.resolveExpression
[ https://issues.apache.org/jira/browse/SPARK-34639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-34639:
-----------------------------------
    Assignee: Wenchen Fan

> always remove unnecessary Alias in Analyzer.resolveExpression
> -------------------------------------------------------------
>
>                 Key: SPARK-34639
>                 URL: https://issues.apache.org/jira/browse/SPARK-34639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
[jira] [Assigned] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34743:
------------------------------------
    Assignee: (was: Apache Spark)

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Commented] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301488#comment-17301488 ]

Apache Spark commented on SPARK-34743:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31837

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Assigned] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34743:
------------------------------------
    Assignee: Apache Spark

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301487#comment-17301487 ]

Apache Spark commented on SPARK-34743:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31837

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Created] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
Dongjoon Hyun created SPARK-34743:
-------------------------------------

             Summary: ExpressionEncoderSuite should use deepEquals when we expect `array of array`
                 Key: SPARK-34743
                 URL: https://issues.apache.org/jira/browse/SPARK-34743
             Project: Spark
          Issue Type: Bug
          Components: SQL, Tests
    Affects Versions: 3.1.1, 3.0.2, 2.4.7, 2.3.4, 2.2.3, 2.1.3, 2.0.2, 1.6.3
            Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
[ https://issues.apache.org/jira/browse/SPARK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301476#comment-17301476 ]

Apache Spark commented on SPARK-34742:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/31836

> ANSI mode: Abs throws exception if input is out of range
> --------------------------------------------------------
>
>                 Key: SPARK-34742
>                 URL: https://issues.apache.org/jira/browse/SPARK-34742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
> {code:java}
> SELECT abs(${Int.MinValue});
> SELECT abs(${Long.MinValue});
> {code}
[jira] [Assigned] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
[ https://issues.apache.org/jira/browse/SPARK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34742:
------------------------------------
    Assignee: Gengliang Wang  (was: Apache Spark)

> ANSI mode: Abs throws exception if input is out of range
> --------------------------------------------------------
>
>                 Key: SPARK-34742
>                 URL: https://issues.apache.org/jira/browse/SPARK-34742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
> {code:java}
> SELECT abs(${Int.MinValue});
> SELECT abs(${Long.MinValue});
> {code}
[jira] [Commented] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
[ https://issues.apache.org/jira/browse/SPARK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301477#comment-17301477 ]

Apache Spark commented on SPARK-34742:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/31836

> ANSI mode: Abs throws exception if input is out of range
> --------------------------------------------------------
>
>                 Key: SPARK-34742
>                 URL: https://issues.apache.org/jira/browse/SPARK-34742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
> {code:java}
> SELECT abs(${Int.MinValue});
> SELECT abs(${Long.MinValue});
> {code}
[jira] [Assigned] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
[ https://issues.apache.org/jira/browse/SPARK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34742:
------------------------------------
    Assignee: Apache Spark  (was: Gengliang Wang)

> ANSI mode: Abs throws exception if input is out of range
> --------------------------------------------------------
>
>                 Key: SPARK-34742
>                 URL: https://issues.apache.org/jira/browse/SPARK-34742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
> {code:java}
> SELECT abs(${Int.MinValue});
> SELECT abs(${Long.MinValue});
> {code}
[jira] [Created] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
Gengliang Wang created SPARK-34742:
--------------------------------------

             Summary: ANSI mode: Abs throws exception if input is out of range
                 Key: SPARK-34742
                 URL: https://issues.apache.org/jira/browse/SPARK-34742
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Gengliang Wang
            Assignee: Gengliang Wang


For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
{code:java}
SELECT abs(${Int.MinValue});
SELECT abs(${Long.MinValue});
{code}
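The underlying hazard is plain two's-complement behavior, reproducible in Java: Math.abs(Integer.MIN_VALUE) silently returns Integer.MIN_VALUE itself (a negative number), because +2147483648 is not representable as an int. A checked negation surfaces the overflow that ANSI mode should report.

```java
public class AbsOverflowDemo {
    public static void main(String[] args) {
        // Math.abs overflows silently: |Int.MinValue| is not an int.
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE); // true

        // A checked negation raises the overflow instead of hiding it.
        try {
            Math.negateExact(Integer.MIN_VALUE);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same asymmetry exists for Long.MIN_VALUE, which is why both SELECT statements in the ticket must raise exceptions under ANSI mode rather than return a negative "absolute value".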