[jira] [Updated] (SPARK-34727) Difference in results of casting float to timestamp
[ https://issues.apache.org/jira/browse/SPARK-34727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34727:
Fix Version/s: 3.1.2

> Difference in results of casting float to timestamp
> ---
>
> Key: SPARK-34727
> URL: https://issues.apache.org/jira/browse/SPARK-34727
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> The code below demonstrates the issue:
> {code:sql}
> spark-sql> CREATE TEMP VIEW v1 AS SELECT 16777215.0f AS f;
> spark-sql> SELECT * FROM v1;
> 1.6777215E7
> spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1;
> 1970-07-14 07:20:15
> spark-sql> CACHE TABLE v1;
> spark-sql> SELECT * FROM v1;
> 1.6777215E7
> spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1;
> 1970-07-14 07:20:14.951424
> {code}
> The result from the cached view, *1970-07-14 07:20:14.951424*, differs from the un-cached view's *1970-07-14 07:20:15*.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
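The ticket does not spell out the root cause, but the two timestamps are exactly what you get if the seconds-to-microseconds conversion is rounded to single precision on one path and kept in double precision on the other. A speculative illustration of that effect (not Spark's actual code path):

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

seconds = 16777215.0  # 2**24 - 1: still exactly representable as a float32

# Conversion carried out in double precision: exact.
micros_double = int(seconds * 1_000_000)
print(micros_double)  # 16777215000000  -> 1970-07-14 07:20:15

# Same conversion rounded through float32: the product needs ~44 mantissa
# bits, but float32 has 24, so it snaps to the nearest representable value.
micros_single = int(to_f32(seconds * 1_000_000))
print(micros_single)  # 16777214951424  -> 1970-07-14 07:20:14.951424
```

The single-precision result, 16777214951424 microseconds, corresponds precisely to the cached view's `1970-07-14 07:20:14.951424`, which makes the precision explanation plausible.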
[jira] [Assigned] (SPARK-34755) Support the utils for transform number format
[ https://issues.apache.org/jira/browse/SPARK-34755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34755:
Assignee: (was: Apache Spark)

> Support the utils for transform number format
> -
>
> Key: SPARK-34755
> URL: https://issues.apache.org/jira/browse/SPARK-34755
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: jiaan.geng
> Priority: Major
>
> Data Type Formatting Functions: `to_number` and `to_char` are very useful.
> We create this ticket to implement the utils for transforming number formats.
[jira] [Commented] (SPARK-34755) Support the utils for transform number format
[ https://issues.apache.org/jira/browse/SPARK-34755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302198#comment-17302198 ] Apache Spark commented on SPARK-34755:
User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/31847

> Support the utils for transform number format
> -
>
> Key: SPARK-34755
> URL: https://issues.apache.org/jira/browse/SPARK-34755
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: jiaan.geng
> Priority: Major
>
> Data Type Formatting Functions: `to_number` and `to_char` are very useful.
> We create this ticket to implement the utils for transforming number formats.
[jira] [Assigned] (SPARK-34755) Support the utils for transform number format
[ https://issues.apache.org/jira/browse/SPARK-34755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34755:
Assignee: Apache Spark

> Support the utils for transform number format
> -
>
> Key: SPARK-34755
> URL: https://issues.apache.org/jira/browse/SPARK-34755
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: jiaan.geng
> Assignee: Apache Spark
> Priority: Major
>
> Data Type Formatting Functions: `to_number` and `to_char` are very useful.
> We create this ticket to implement the utils for transforming number formats.
[jira] [Created] (SPARK-34755) Support the utils for transform number format
jiaan.geng created SPARK-34755:
Summary: Support the utils for transform number format
Key: SPARK-34755
URL: https://issues.apache.org/jira/browse/SPARK-34755
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.2.0
Reporter: jiaan.geng

Data Type Formatting Functions: `to_number` and `to_char` are very useful. We create this ticket to implement the utils for transforming number formats.
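For readers unfamiliar with these functions: `to_char` formats a number according to a format mask, and `to_number` parses one back, in the style of PostgreSQL/Oracle formatting functions. A rough Python sketch of the intended behavior, with deliberately simplified hypothetical semantics (digits, one group separator, one decimal point), not Spark's eventual implementation:

```python
def to_char(value: float, fmt: str) -> str:
    """Format a number per a simplified '9,999.99'-style mask."""
    # Number of '9's after the decimal point decides the precision.
    decimals = len(fmt.split(".")[1]) if "." in fmt else 0
    out = f"{value:,.{decimals}f}"
    # Only keep thousands separators if the mask asks for them.
    return out if "," in fmt else out.replace(",", "")

def to_number(text: str) -> float:
    """Inverse direction: strip group separators and parse."""
    return float(text.replace(",", ""))

print(to_char(12345.678, "99,999.99"))  # 12,345.68
print(to_number("12,345.68"))           # 12345.68
```

Real format masks support far more (currency symbols, fill modes, sign placement); this only shows the round-trip idea the ticket is about.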
[jira] [Updated] (SPARK-34754) sparksql 'add jar' not support hdfs ha mode in k8s
[ https://issues.apache.org/jira/browse/SPARK-34754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lithiumlee-_- updated SPARK-34754:
Description:
Submitting an app to K8S, the driver is already running but executors fail with "java.net.UnknownHostException: xx" when starting. The UDF jar URI uses the HDFS HA (nameservice) style, yet the exception stack shows "...*createNonHAProxy*...".

hql:
{code:java}
add jar hdfs://xx/test.jar;
create temporary function test_udf as 'com.xxx.xxx';
create table test.test_udf as select test_udf('1') name_1;
{code}

exception:
{code:java}
TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.30.89.44, executor 1): java.lang.IllegalArgumentException: java.net.UnknownHostException: xx
 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439)
 at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321)
 at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:696)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:636)
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
 at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
 at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:816)
 at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:808)
 at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
 at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
 at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
 at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:808)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: xx
 ... 28 more
{code}
[jira] [Created] (SPARK-34754) sparksql 'add jar' not support hdfs ha mode in k8s
lithiumlee-_- created SPARK-34754:
Summary: sparksql 'add jar' not support hdfs ha mode in k8s
Key: SPARK-34754
URL: https://issues.apache.org/jira/browse/SPARK-34754
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.4.7
Reporter: lithiumlee-_-

The driver is already running, but executors fail with "java.net.UnknownHostException: xx" when starting. The UDF jar URI uses the HDFS HA (nameservice) style, yet the exception stack shows "...*createNonHAProxy*...".

hql:
{code:java}
add jar hdfs://xx/test.jar;
create temporary function test_udf as 'com.xxx.xxx';
create table test.test_udf as select test_udf('1') name_1;
{code}

exception:
{code:java}
TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.30.89.44, executor 1): java.lang.IllegalArgumentException: java.net.UnknownHostException: xx
 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439)
 at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321)
 at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:696)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:636)
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
 at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
 at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:816)
 at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:808)
 at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
 at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
 at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
 at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
 at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:808)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: xx
 ... 28 more
{code}
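The stack (createNonHAProxy followed by UnknownHostException on the logical name "xx") suggests the executor pods lack the HDFS HA client configuration, so the nameservice is treated as a plain DNS hostname. A toy model of that decision, purely illustrative (Hadoop's real logic lives in NameNodeProxies/HAUtil, and the conf shape here is invented):

```python
def resolve_namenode(authority: str, hadoop_conf: dict) -> str:
    """Toy model of how an HDFS client picks an HA vs. non-HA proxy.

    If the URI authority matches a configured nameservice, an HA failover
    proxy is built; otherwise the authority is assumed to be a real
    host[:port], which fails with UnknownHostException when it is actually
    a logical nameservice name like 'xx'.
    """
    nameservices = hadoop_conf.get("dfs.nameservices", "").split(",")
    if authority in nameservices:
        return f"HA failover proxy over dfs.ha.namenodes.{authority}"
    return f"non-HA proxy (DNS lookup of '{authority}')"

# Driver pod: carries the HA config, so 'xx' resolves to an HA proxy.
driver_conf = {"dfs.nameservices": "xx"}
# Executor pod missing hdfs-site.xml: 'xx' goes down the non-HA path.
executor_conf = {}

print(resolve_namenode("xx", driver_conf))
print(resolve_namenode("xx", executor_conf))
```

If this reading is right, the behavioral difference between driver and executors comes down to which pods receive the HA client settings, which is why the failure only appears when executors fetch the added jar.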
[jira] [Assigned] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34752:
Assignee: Erik Krogen

> Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
> -
>
> Key: SPARK-34752
> URL: https://issues.apache.org/jira/browse/SPARK-34752
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.1
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
>
> Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.4.36, and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade, to 9.4.37.
>
> Find more at:
> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223
> https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523
[jira] [Resolved] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34752.
Fix Version/s: 3.1.2, 3.2.0
Resolution: Fixed

Issue resolved by pull request 31846 [https://github.com/apache/spark/pull/31846]

> Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
> -
>
> Key: SPARK-34752
> URL: https://issues.apache.org/jira/browse/SPARK-34752
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.1
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.4.36, and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade, to 9.4.37.
>
> Find more at:
> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223
> https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523
[jira] [Reopened] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reopened SPARK-21449:

> Hive client's SessionState was not closed properly in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: bulk-closed
>
> Close the SessionState to clean up `hive.downloaded.resources.dir` and other session state.
[jira] [Updated] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-21449:
Labels: (was: bulk-closed)

> Hive client's SessionState was not closed properly in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.2.0
>
> Close the SessionState to clean up `hive.downloaded.resources.dir` and other session state.
[jira] [Resolved] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-21449.
Fix Version/s: 3.2.0
Resolution: Fixed

Issue resolved by pull request 31833 https://github.com/apache/spark/pull/31833

> Hive client's SessionState was not closed properly in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: bulk-closed
> Fix For: 3.2.0
>
> Close the SessionState to clean up `hive.downloaded.resources.dir` and other session state.
[jira] [Updated] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23745:
Labels: (was: bulk-closed)

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
[jira] [Resolved] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-23745.
Fix Version/s: 3.2.0
Assignee: Kent Yao
Resolution: Fixed

Issue resolved by pull request 31833 https://github.com/apache/spark/pull/31833

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Assignee: Kent Yao
> Priority: Major
> Labels: bulk-closed
> Fix For: 3.2.0
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
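Both this ticket and SPARK-21449 converge on the same pattern: tie the lifetime of the downloaded-resources directory to the service's shutdown, so closing the session also deletes its on-disk state. The pattern can be sketched generically in Python (this is an analogy only, not the Scala change in the PR):

```python
import atexit
import shutil
import tempfile

class ToyThriftServer:
    """Minimal stand-in for a server that owns a scratch directory."""

    def start(self) -> None:
        # Analogue of hive.downloaded.resources.dir: a per-session scratch dir.
        self.resources_dir = tempfile.mkdtemp(prefix="hive-resources-")
        # Register cleanup so the directory is removed even on abnormal exit.
        atexit.register(self.stop)

    def stop(self) -> None:
        # Stopping the server must also delete its on-disk state; otherwise
        # directories accumulate across restarts, as the ticket describes.
        shutil.rmtree(self.resources_dir, ignore_errors=True)

server = ToyThriftServer()
server.start()
server.stop()
```

The `atexit` hook mirrors the role of Spark's shutdown-hook manager: even if `stop()` is never called explicitly, the scratch directory still gets cleaned up at process exit.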
[jira] [Commented] (SPARK-34694) Improve Spark SQL Source Filter to allow pushdown of filters span multiple columns
[ https://issues.apache.org/jira/browse/SPARK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302171#comment-17302171 ] Hyukjin Kwon commented on SPARK-34694:
Oh, okay. It rather proposes handling column references. The concern is probably the type handling of the pushed predicate in the source, but it sounds like a valid issue.

> Improve Spark SQL Source Filter to allow pushdown of filters span multiple columns
> --
>
> Key: SPARK-34694
> URL: https://issues.apache.org/jira/browse/SPARK-34694
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1
> Reporter: Chen Zou
> Priority: Minor
>
> The current org.apache.spark.sql.sources.Filter abstract class only allows pushdown of filters on a single column, or sums of products of such single-column filters.
> Filters over multiple columns cannot be pushed down through this Filter subclass to the source, e.g. from the TPC-H benchmark on the lineitem table:
> (l_commitdate#11 < l_receiptdate#12)
> (l_shipdate#10 < l_commitdate#11)
>
> The current design probably originates from the fact that columnar sources have a hard time supporting these cross-column filters. But with batching implemented in columnar sources, they can still support them.
> This issue tries to open up discussion on a more general Filter interface that allows pushing down cross-column filters.
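The proposal essentially amounts to letting the right-hand side of a pushed-down comparison be a column reference rather than only a literal. A hypothetical sketch of such a generalized filter (names invented for illustration; this is not Spark's `sources.Filter` API):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class ColumnRef:
    name: str

# Today the right-hand side is effectively a literal; the ask is to
# also allow a ColumnRef here.
Operand = Union["ColumnRef", int, str]

@dataclass(frozen=True)
class LessThan:
    left: ColumnRef
    right: Operand

    def evaluate(self, row: dict) -> bool:
        # Resolve the RHS against the row when it is a column reference.
        rhs = row[self.right.name] if isinstance(self.right, ColumnRef) else self.right
        return row[self.left.name] < rhs

# TPC-H lineitem example from the ticket: l_commitdate < l_receiptdate
f = LessThan(ColumnRef("l_commitdate"), ColumnRef("l_receiptdate"))
print(f.evaluate({"l_commitdate": "1996-02-12", "l_receiptdate": "1996-02-28"}))  # True
```

The type-handling concern raised in the comment shows up even in this toy: once the RHS can be another column, the source has to reconcile the types of both sides itself, instead of trusting a literal already coerced by the engine.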
[jira] [Reopened] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reopened SPARK-23745:

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Priority: Major
> Labels: bulk-closed
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
[jira] [Assigned] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23745:
Assignee: (was: Apache Spark)

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Priority: Major
> Labels: bulk-closed
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
[jira] [Assigned] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23745:
Assignee: Apache Spark

> Remove the directories of the “hive.downloaded.resources.dir” when
> HiveThriftServer2 stopped
> 
>
> Key: SPARK-23745
> URL: https://issues.apache.org/jira/browse/SPARK-23745
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: linux
> Reporter: zuotingbing
> Assignee: Apache Spark
> Priority: Major
> Labels: bulk-closed
>
> Attachments: 2018-03-20_164832.png
>
> !2018-03-20_164832.png!
> When HiveThriftServer2 starts, it creates directories under `hive.downloaded.resources.dir`, but it does not remove them when it stops, so these directories can accumulate.
[jira] [Assigned] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-21449:
Assignee: Kent Yao

> Hive client's SessionState was not closed properly in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: bulk-closed
>
> Close the SessionState to clean up `hive.downloaded.resources.dir` and other session state.
[jira] [Updated] (SPARK-34753) Deadlock in executor RPC shutdown hook
[ https://issues.apache.org/jira/browse/SPARK-34753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Patterson updated SPARK-34753:
Attachment: sb-dylanw-spark-0ec26858-b72ed278375bf3a9-exec-38.log

> Deadlock in executor RPC shutdown hook
> --
>
> Key: SPARK-34753
> URL: https://issues.apache.org/jira/browse/SPARK-34753
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.1
> Environment: Not sure this is relevant but let me know and I can append
> Reporter: Dylan Patterson
> Priority: Major
> Attachments: sb-dylanw-spark-0ec26858-b72ed278375bf3a9-exec-38.log
>
> Ran into an issue where executors initiate the shutdown sequence and System.exit is called, but the Java process never dies, leaving orphaned containers in Kubernetes. Tracked it down to a deadlock in the RPC shutdown. See the thread dump:
> {code:java}
> "Thread-2" #26 prio=5 os_prio=0 tid=0x7f6410231800 nid=0x2a2 waiting on condition [0x7f63c3bf1000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0xc05a47b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>  at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
>  at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)
>  at org.apache.spark.rpc.netty.MessageLoop.stop(MessageLoop.scala:60)
>  at org.apache.spark.rpc.netty.Dispatcher.$anonfun$stop$1(Dispatcher.scala:190)
>  at org.apache.spark.rpc.netty.Dispatcher.$anonfun$stop$1$adapted(Dispatcher.scala:187)
>  at org.apache.spark.rpc.netty.Dispatcher$$Lambda$214/337533935.apply(Unknown Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:941)
>  at scala.collection.Iterator.foreach$(Iterator.scala:941)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>  at org.apache.spark.rpc.netty.Dispatcher.stop(Dispatcher.scala:187)
>  at org.apache.spark.rpc.netty.NettyRpcEnv.cleanup(NettyRpcEnv.scala:324)
>  at org.apache.spark.rpc.netty.NettyRpcEnv.shutdown(NettyRpcEnv.scala:302)
>  at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
>  at org.apache.spark.executor.Executor.stop(Executor.scala:292)
>  at org.apache.spark.executor.Executor.$anonfun$new$2(Executor.scala:74)
>  at org.apache.spark.executor.Executor$$Lambda$317/1046854795.apply$mcV$sp(Unknown Source)
>  at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>  at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>  at org.apache.spark.util.SparkShutdownHookManager$$Lambda$2192/1832515374.apply$mcV$sp(Unknown Source)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
>  at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>  at org.apache.spark.util.SparkShutdownHookManager$$Lambda$2191/952019066.apply$mcV$sp(Unknown Source)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at scala.util.Try$.apply(Try.scala:213)
>  at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>  at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>  at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> {code}
For additional commands, e-mail: issues-h...@spark.apache.org
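The hang pattern in the trace above — a shutdown hook blocked in `awaitTermination` on a thread pool whose message-loop threads never finish — can be sketched in plain Python. This is an illustrative analogue only, not Spark code; all names (`message_loop`, `shutdown_hook`) are invented for the sketch:

```python
import threading

release = threading.Event()

def message_loop():
    # Stands in for a MessageLoop worker that is parked and never exits,
    # so the pool it belongs to can never terminate.
    release.wait()

worker = threading.Thread(target=message_loop, name="dispatcher-event-loop")
worker.start()

def shutdown_hook(timeout=0.5):
    # Analogue of Dispatcher.stop() awaiting pool termination: wait for
    # the loop thread to exit, but give up after a timeout instead of
    # hanging forever the way the reported shutdown hook does.
    worker.join(timeout)
    return not worker.is_alive()

stopped = shutdown_hook()   # times out: the worker is still parked
release.set()               # unblock the worker so the sketch can exit
worker.join()
```

The timed `join` is the point of the sketch: without a timeout (as in the real hook, which waits indefinitely in `awaitTermination`), the hook would block forever and the process would never die.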
[jira] [Commented] (SPARK-34753) Deadlock in executor RPC shutdown hook
[ https://issues.apache.org/jira/browse/SPARK-34753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302090#comment-17302090 ] Dylan Patterson commented on SPARK-34753: - Aside from fixing the underlying issue it might be worth adding some sort of killswitch timeout for the containers since this causes resource leaks. > Deadlock in executor RPC shutdown hook > -- > > Key: SPARK-34753 > URL: https://issues.apache.org/jira/browse/SPARK-34753 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1 > Environment: Not sure this is relevant but let me know and I can > append >Reporter: Dylan Patterson >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - 
[jira] [Created] (SPARK-34753) Deadlock in executor RPC shutdown hook
Dylan Patterson created SPARK-34753: --- Summary: Deadlock in executor RPC shutdown hook Key: SPARK-34753 URL: https://issues.apache.org/jira/browse/SPARK-34753 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.0.1 Environment: Not sure this is relevant but let me know and I can append Reporter: Dylan Patterson Ran into an issue where executors initiate shutdown sequence, System.exit is called but java process never dies leaving orphaned containers in kubernetes. Tracked it down to a deadlock in the RPC shutdown. See thread dump -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nivas Umapathy updated SPARK-34751: --- Description: I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file attached with this ticket. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3') }} and so on. Now if i call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()}} {{n}} {{t=". Please use alias to rename it.;'}} When i read about it in some blogs, there was suggestion to re-read the same parquet with new schema applied. So i did {{df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} and it works, but all the data in the dataframe are null. The same works for String datatypes was: I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3') }} and so on. Now if i call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n}} {{t=". 
Please use alias to rename it.;'}} When i read about it in some blogs, there was suggestion to re-read the same parquet with new schema applied. So i did {{df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} and it works, but all the data in the dataframe are null. The same works for String datatypes > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Fix For: 2.4.8 > > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. 
The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
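The `AnalysisException` quoted above lists the characters Spark's Parquet writer rejects in field names: `" ,;{}()\n\t="`. One way to sidestep the rename-after-read trap described in the report is to sanitize the names up front. This is a pure-Python sketch of that idea; `sanitize_column` and `INVALID_PARQUET_CHARS` are names invented here, not Spark APIs:

```python
# Characters rejected in Parquet field names, taken from the
# AnalysisException text quoted in the report above.
INVALID_PARQUET_CHARS = ' ,;{}()\n\t='

def sanitize_column(name: str, replacement: str = '_') -> str:
    """Replace every character a Parquet field name may not contain."""
    return ''.join(replacement if c in INVALID_PARQUET_CHARS else c
                   for c in name)

# The columns from the report above:
renamed = [sanitize_column(c) for c in ['COL 1', 'COL,2', 'COL;3']]
```

In PySpark this could plausibly be applied in one shot as `df.toDF(*[sanitize_column(c) for c in df.columns])` before any action touches the original attribute names, though whether that avoids the null-data symptom on this particular file is untested here.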
[jira] [Commented] (SPARK-34729) Faster execution for broadcast nested loop join (left semi/anti with no condition)
[ https://issues.apache.org/jira/browse/SPARK-34729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302043#comment-17302043 ] Apache Spark commented on SPARK-34729: -- User 'c21' has created a pull request for this issue: https://github.com/apache/spark/pull/31845 > Faster execution for broadcast nested loop join (left semi/anti with no > condition) > -- > > Key: SPARK-34729 > URL: https://issues.apache.org/jira/browse/SPARK-34729 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.2.0 > > > For `BroadcastNestedLoopJoinExec` left semi and left anti join without > condition. If we broadcast left side. Currently we check whether every row > from broadcast side has a match or not by iterating broadcast side a lot of > time - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala#L256-L275] > . This is unnecessary, as there's no condition, and we only need to check > whether stream side is empty or not. Create this Jira to add the > optimization. This can boost the affected query execution performance a lot. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
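The optimization described in the ticket can be modeled in plain Python (this is a toy model of the join semantics, not Spark's implementation): with no join condition, a broadcast-side row of a left semi join "matches" iff the stream side is non-empty, so the per-row scan of the other side collapses to a single emptiness check.

```python
def left_semi_naive(build, stream):
    # Current behaviour per the ticket: scan the other side once per
    # build row, even though there is no condition to evaluate.
    return [row for row in build if any(True for _ in stream)]

def left_anti_naive(build, stream):
    return [row for row in build if not any(True for _ in stream)]

def left_semi_fast(build, stream):
    # Proposed behaviour: only emptiness of the stream side matters.
    return list(build) if stream else []

def left_anti_fast(build, stream):
    return [] if stream else list(build)
```

Both versions agree for every input; the fast ones do O(1) work per join instead of O(build × stream), which is the boost the ticket claims.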
[jira] [Commented] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302028#comment-17302028 ] Apache Spark commented on SPARK-34752: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/31846 > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.4.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.4.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34752: Assignee: (was: Apache Spark) > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.4.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.4.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34752: Assignee: Apache Spark > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Assignee: Apache Spark >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.4.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.4.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-34752: Description: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.4.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.4.37. Find more at: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 was: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. Find more at: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.4.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.4.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34752) Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-34752: Summary: Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 (was: Upgrade Jetty to 9.3.37 to fix CVE-2020-27223) > Upgrade Jetty to 9.4.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.3.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.3.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34752) Upgrade Jetty to 9.3.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-34752: Description: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. Find more at: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 was: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. Find more at https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 / https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 > Upgrade Jetty to 9.3.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.3.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.3.37. > > Find more at: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34752) Upgrade Jetty to 9.3.37 to fix CVE-2020-27223
[ https://issues.apache.org/jira/browse/SPARK-34752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-34752: Description: Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. Find more at https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 / https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 was:Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. > Upgrade Jetty to 9.3.37 to fix CVE-2020-27223 > - > > Key: SPARK-34752 > URL: https://issues.apache.org/jira/browse/SPARK-34752 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Priority: Major > > Another day, another Jetty CVE :) Our internal build tools are complaining > about Spark's dependency on Jetty 9.3.36 and I found it is because there is > another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time > for another upgrade to 9.3.37. > > Find more at https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27223 / > https://www.sourceclear.com/vulnerability-database/security/denial-of-servicedos/java/sid-29523 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34752) Upgrade Jetty to 9.3.37 to fix CVE-2020-27223
Erik Krogen created SPARK-34752: --- Summary: Upgrade Jetty to 9.3.37 to fix CVE-2020-27223 Key: SPARK-34752 URL: https://issues.apache.org/jira/browse/SPARK-34752 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.1.1 Reporter: Erik Krogen Another day, another Jetty CVE :) Our internal build tools are complaining about Spark's dependency on Jetty 9.3.36 and I found it is because there is another Jetty CVE on the version we recently upgraded to in SPARK-34449. Time for another upgrade to 9.3.37. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34738) Upgrade Minikube and kubernetes cluster version on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-34738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Knapp reassigned SPARK-34738: --- Assignee: Shane Knapp > Upgrade Minikube and kubernetes cluster version on Jenkins > -- > > Key: SPARK-34738 > URL: https://issues.apache.org/jira/browse/SPARK-34738 > Project: Spark > Issue Type: Task > Components: jenkins, Kubernetes >Affects Versions: 3.2.0 >Reporter: Attila Zsolt Piros >Assignee: Shane Knapp >Priority: Major > > [~shaneknapp] as we discussed [on the mailing > list|http://apache-spark-developers-list.1001551.n3.nabble.com/minikube-and-kubernetes-cluster-versions-for-integration-testing-td30856.html] > Minikube can be upgraded to the latest (v1.18.1) and kubernetes version > should be v1.17.3 (`minikube config set kubernetes-version v1.17.3`). > [Here|https://github.com/apache/spark/pull/31829] is my PR which uses a new > method to configure the kubernetes client. Thanks in advance to use it for > testing on the Jenkins after the Minikube version is updated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34646) TreeNode bind issue for duplicate column name.
[ https://issues.apache.org/jira/browse/SPARK-34646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301924#comment-17301924 ] loc nguyen commented on SPARK-34646: I am a little confused by your response. What information are you looking for? > TreeNode bind issue for duplicate column name. > -- > > Key: SPARK-34646 > URL: https://issues.apache.org/jira/browse/SPARK-34646 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.4.3 > Environment: Spark 2.4.3, Scala 2.11.8, Hadoop 3.2.1 >Reporter: loc nguyen >Priority: Major > Labels: spark > > I received a Spark {{TreeNodeException}} when executing a union of two data > frames. The error occurs when I assign the union result to a DataFrame that > will be returned by a function; assigning it to a DataFrame that is not > returned works. I have examined the schema of every data frame involved: > the PT_Id column is duplicated, and the duplicate causes the attribute > lookup to fail. 
> > > {{21/03/04 19:58:28 ERROR Executor: Exception in task 2.0 in stage 2281.0 > (TID 5557) > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: PT_ID#140575 at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:79) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:78) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:261) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:261) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261) > at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:245) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:78) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:45) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:40) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1190) > at 
org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:403) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$lzycompute(BroadcastNestedLoopJoinExec.scala:87) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition(BroadcastNestedLoopJoinExec.scala:85) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2$$anonfun$apply$3.apply(BroadcastNestedLoopJoinExec.scala:191) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2$$anonfun$apply$3.apply(BroadcastNestedLoopJoinExec.scala:191) > at > scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:46) > at scala.collection.mutable.ArrayOps$ofRef.exists(ArrayOps.scala:186) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2.apply(BroadcastNestedLoopJoinExec.scala:191) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$4$$anonfun$apply$2.apply(BroadcastNestedLoopJoinExec.scala:190) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:464) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at org.apache.spark.scheduler.Shuffle
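A toy sketch of why a duplicated column can break binding: a lookup that expects exactly one candidate fails when the input schema carries the same name twice. This is illustrative only — Spark's real `bindReference` resolves attributes by expression id, not by name — and `bind_attribute` is a name invented here:

```python
def bind_attribute(name, schema):
    """Bind an attribute to its ordinal in the input schema, raising
    when the name is missing or ambiguous; loosely mirrors the failed
    PT_ID lookup in the trace above."""
    positions = [i for i, col in enumerate(schema) if col == name]
    if len(positions) != 1:
        raise ValueError(
            f"Binding attribute {name}: found {len(positions)} candidates")
    return positions[0]

bind_attribute('PT_ID', ['PT_ID', 'VALUE'])  # unambiguous: ordinal 0
# A union whose schema carries PT_ID twice makes the lookup ambiguous:
# bind_attribute('PT_ID', ['PT_ID', 'VALUE', 'PT_ID'])  # raises ValueError
```

Deduplicating or aliasing the column before the union (so each attribute is unique) is the usual way to avoid this class of failure.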
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nivas Umapathy updated SPARK-34751: --- Description: I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3')}} and so on. Now if I call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} When I read about it in some blogs, there was a suggestion to re-read the same parquet with the new schema applied. So I did {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} and it works, but all the data in the dataframe are null. The same works for String datatypes was: I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3')}} and so on. Now if I call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} When I read about it in some blogs, there was a suggestion to re-read the same parquet with the new schema applied. So I did {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} and it works, but all the data in the dataframe are null. The same works for Strings > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Fix For: 2.4.8 > > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file [Invalid Header > Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3')}} > and so on. > Now if I call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} > > When I read about it in some blogs, there was a suggestion to re-read the same > parquet with the new schema applied. So I did > {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
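[Editor's note on the report above] The all-null result has a plausible mechanism: the Parquet reader resolves the fields of a user-supplied schema against the file footer by name, so a schema whose fields were already sanitized matches nothing in the file. The following is a plain-Python sketch of that mechanism, not PySpark or Spark internals; `read_with_schema` is a hypothetical helper used only for illustration.

```python
# Plain-Python sketch (assumption: this models, not reproduces, Spark's
# Parquet reader, which resolves requested schema fields by name).

# Column data as stored in the Parquet footer, under the invalid names.
stored = {"COL 1": [1.5, 2.5], "COL,2": [3.5, 4.5]}

def read_with_schema(columns, schema_names):
    """Resolve each requested field by name; a miss reads back as all-null."""
    n_rows = len(next(iter(columns.values())))
    return {name: columns.get(name, [None] * n_rows) for name in schema_names}

# Applying the sanitized schema: no stored column matches, so every value
# comes back null -- the behaviour reported in the issue.
cleaned = read_with_schema(stored, ["COL_1", "COL_2"])

# Reading under the original names and renaming afterwards keeps the data.
raw = read_with_schema(stored, ["COL 1", "COL,2"])
renamed = {"COL_1": raw["COL 1"], "COL_2": raw["COL,2"]}
print(cleaned)
print(renamed)
```

Under this model, the schema must keep the on-disk names for the scan and any renaming has to happen after the read.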
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nivas Umapathy updated SPARK-34751: --- Attachment: invalid_columns_double.parquet > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Fix For: 2.4.8 > > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file [Invalid Header > Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename > it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}{{}} > > and it works, but all the data in the dataframe are null. The same works for > Strings > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
Nivas Umapathy created SPARK-34751: -- Summary: Parquet with invalid chars on column name reads double as null when a clean schema is applied Key: SPARK-34751 URL: https://issues.apache.org/jira/browse/SPARK-34751 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.4.3 Environment: Pyspark 2.4.3 AWS Glue Dev Endpoint EMR Reporter: Nivas Umapathy Fix For: 2.4.8 Attachments: invalid_columns_double.parquet I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3') }} and so on. Now if i call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} When i read about it in some blogs, there was suggestion to re-read the same parquet with new schema applied. So i did {{df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}{{}} and it works, but all the data in the dataframe are null. The same works for Strings -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34750) Parquet with invalid chars on column name reads double as null when a clean schema is applied
Nivas Umapathy created SPARK-34750: -- Summary: Parquet with invalid chars on column name reads double as null when a clean schema is applied Key: SPARK-34750 URL: https://issues.apache.org/jira/browse/SPARK-34750 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.4.3 Environment: Pyspark 2.4.3 AWS Glue Dev Endpoint EMR Reporter: Nivas Umapathy Fix For: 2.4.8 I have a parquet file that has data with invalid column names on it. [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the file [Invalid Header Parquet|https://drive.google.com/file/d/101WNWXnPwhjocSMVjkhn5jo85Ri_NydP/view?usp=sharing]. I tried to load this file with {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} {{df = df.withColumnRenamed('COL 1', 'COL_1')}} {{df = df.withColumnRenamed('COL,2', 'COL_2')}} {{df = df.withColumnRenamed('COL;3', 'COL_3') }} and so on. Now if i call {{df.show()}} it throws this exception that is still pointing to the old column name. {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'}} When i read about it in some blogs, there was suggestion to re-read the same parquet with new schema applied. So i did {{df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}{{}} and it works, but all the data in the dataframe are null. The same works for Strings -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34749) Simplify CreateNamedStruct
[ https://issues.apache.org/jira/browse/SPARK-34749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34749: Assignee: (was: Apache Spark) > Simplify CreateNamedStruct > -- > > Key: SPARK-34749 > URL: https://issues.apache.org/jira/browse/SPARK-34749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34749) Simplify CreateNamedStruct
[ https://issues.apache.org/jira/browse/SPARK-34749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34749: Assignee: Apache Spark > Simplify CreateNamedStruct > -- > > Key: SPARK-34749 > URL: https://issues.apache.org/jira/browse/SPARK-34749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34749) Simplify CreateNamedStruct
[ https://issues.apache.org/jira/browse/SPARK-34749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301827#comment-17301827 ] Apache Spark commented on SPARK-34749: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/31843 > Simplify CreateNamedStruct > -- > > Key: SPARK-34749 > URL: https://issues.apache.org/jira/browse/SPARK-34749 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34749) Simplify CreateNamedStruct
Wenchen Fan created SPARK-34749: --- Summary: Simplify CreateNamedStruct Key: SPARK-34749 URL: https://issues.apache.org/jira/browse/SPARK-34749 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34731) ConcurrentModificationException in EventLoggingListener when redacting properties
[ https://issues.apache.org/jira/browse/SPARK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-34731: -- Affects Version/s: 3.1.1 > ConcurrentModificationException in EventLoggingListener when redacting > properties > - > > Key: SPARK-34731 > URL: https://issues.apache.org/jira/browse/SPARK-34731 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.1.1 >Reporter: Bruce Robbins >Priority: Major > > Reproduction: > The key elements of reproduction are enabling event logging, setting > spark.executor.cores, and some bad luck: > {noformat} > $ bin/spark-shell --conf spark.ui.showConsoleProgress=false \ > --conf spark.executor.cores=1 --driver-memory 4g --conf \ > "spark.ui.showConsoleProgress=false" \ > --conf spark.eventLog.enabled=true \ > --conf spark.eventLog.dir=/tmp/spark-events > ... > scala> (0 to 500).foreach { i => > | val df = spark.range(0, 2).toDF("a") > | df.filter("a > 12").count > | } > 21/03/12 18:16:44 ERROR AsyncEventQueue: Listener EventLoggingListener threw > an exception > java.util.ConcurrentModificationException > at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:424) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:420) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.mutable.MapLike.toSeq(MapLike.scala:75) > at scala.collection.mutable.MapLike.toSeq$(MapLike.scala:72) > at scala.collection.mutable.AbstractMap.toSeq(Map.scala:82) > at > 
org.apache.spark.scheduler.EventLoggingListener.redactProperties(EventLoggingListener.scala:290) > at > org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:162) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) > at > scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1379) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) > {noformat} > Analysis from quick reading of the code: > DAGScheduler posts a JobSubmitted event containing a clone of a properties > object > [here|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L834]. > This event is handled > [here|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2394]. 
> DAGScheduler#handleJobSubmitted stores the properties object in a [Job > object|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1154], > which in turn is [saved in the jobIdToActiveJob > map|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1163]. > DAGScheduler#handleJobSubmitted posts a SparkListenerJobStart event > [here|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1169] > with a reference to
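[Editor's note on the analysis above] The failure boils down to one thread iterating a `Properties` object while another thread mutates it. A minimal single-threaded Python analogue follows; it is illustrative only, not Spark code, using the fact that Python dicts fail fast during iteration much like `java.util.Hashtable` throws `ConcurrentModificationException`.

```python
# Analogue of the race (assumption: illustrative only; Python raises
# RuntimeError where java.util.Hashtable would throw
# ConcurrentModificationException).

props = {"spark.job.description": "q1", "spark.jobGroup.id": "g1"}

def redact_unsafe(live):
    out = {}
    for k, v in live.items():
        # Simulate the scheduler thread writing to the same object while
        # the listener is still iterating it.
        live["spark.extra"] = "x"
        out[k] = v
    return out

try:
    redact_unsafe(dict(props))
    raised = False
except RuntimeError:  # "dictionary changed size during iteration"
    raised = True

def redact_safe(live):
    # Fix direction: snapshot the properties before iterating, so writes
    # that land afterwards cannot invalidate the iteration.
    out = {}
    for k, v in list(live.items()):
        live["spark.extra"] = "x"  # same concurrent write, now harmless
        out[k] = v
    return out

safe = redact_safe(dict(props))
print(raised, sorted(safe))
```

The snapshot-before-iterate shape is the general remedy whenever a shared mutable map is handed to an asynchronous listener.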
[jira] [Assigned] (SPARK-34748) Create a rule of the analysis logic for streaming write
[ https://issues.apache.org/jira/browse/SPARK-34748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34748: Assignee: (was: Apache Spark) > Create a rule of the analysis logic for streaming write > --- > > Key: SPARK-34748 > URL: https://issues.apache.org/jira/browse/SPARK-34748 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Priority: Major > > Currently, the analysis logic for streaming write is mixed in > StreamingQueryManager. If we create a specific analyzer rule and separated > logical plans, it should be helpful for further extension. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34748) Create a rule of the analysis logic for streaming write
[ https://issues.apache.org/jira/browse/SPARK-34748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301792#comment-17301792 ] Apache Spark commented on SPARK-34748: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/31842 > Create a rule of the analysis logic for streaming write > --- > > Key: SPARK-34748 > URL: https://issues.apache.org/jira/browse/SPARK-34748 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Priority: Major > > Currently, the analysis logic for streaming write is mixed in > StreamingQueryManager. If we create a specific analyzer rule and separated > logical plans, it should be helpful for further extension. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34748) Create a rule of the analysis logic for streaming write
[ https://issues.apache.org/jira/browse/SPARK-34748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34748: Assignee: Apache Spark > Create a rule of the analysis logic for streaming write > --- > > Key: SPARK-34748 > URL: https://issues.apache.org/jira/browse/SPARK-34748 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Apache Spark >Priority: Major > > Currently, the analysis logic for streaming write is mixed in > StreamingQueryManager. If we create a specific analyzer rule and separated > logical plans, it should be helpful for further extension. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34747) Add virtual operators to the built-in function document.
[ https://issues.apache.org/jira/browse/SPARK-34747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301791#comment-17301791 ] Apache Spark commented on SPARK-34747: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31841 > Add virtual operators to the built-in function document. > > > Key: SPARK-34747 > URL: https://issues.apache.org/jira/browse/SPARK-34747 > Project: Spark > Issue Type: Bug > Components: docs, SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show > built-in operators including the following virtual operators. > * != > * <> > * between > * case > * || > But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34747) Add virtual operators to the built-in function document.
[ https://issues.apache.org/jira/browse/SPARK-34747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34747: Assignee: Kousuke Saruta (was: Apache Spark) > Add virtual operators to the built-in function document. > > > Key: SPARK-34747 > URL: https://issues.apache.org/jira/browse/SPARK-34747 > Project: Spark > Issue Type: Bug > Components: docs, SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show > built-in operators including the following virtual operators. > * != > * <> > * between > * case > * || > But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34747) Add virtual operators to the built-in function document.
[ https://issues.apache.org/jira/browse/SPARK-34747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34747: Assignee: Apache Spark (was: Kousuke Saruta) > Add virtual operators to the built-in function document. > > > Key: SPARK-34747 > URL: https://issues.apache.org/jira/browse/SPARK-34747 > Project: Spark > Issue Type: Bug > Components: docs, SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show > built-in operators including the following virtual operators. > * != > * <> > * between > * case > * || > But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34747) Add virtual operators to the built-in function document.
[ https://issues.apache.org/jira/browse/SPARK-34747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301789#comment-17301789 ] Apache Spark commented on SPARK-34747: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31841 > Add virtual operators to the built-in function document. > > > Key: SPARK-34747 > URL: https://issues.apache.org/jira/browse/SPARK-34747 > Project: Spark > Issue Type: Bug > Components: docs, SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show > built-in operators including the following virtual operators. > * != > * <> > * between > * case > * || > But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34748) Create a rule of the analysis logic for streaming write
Yuanjian Li created SPARK-34748: --- Summary: Create a rule of the analysis logic for streaming write Key: SPARK-34748 URL: https://issues.apache.org/jira/browse/SPARK-34748 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.2.0 Reporter: Yuanjian Li Currently, the analysis logic for streaming write is mixed in StreamingQueryManager. If we create a specific analyzer rule and separated logical plans, it should be helpful for further extension. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34747) Add virtual operators to the built-in function document.
Kousuke Saruta created SPARK-34747: -- Summary: Add virtual operators to the built-in function document. Key: SPARK-34747 URL: https://issues.apache.org/jira/browse/SPARK-34747 Project: Spark Issue Type: Bug Components: docs, SQL Affects Versions: 3.1.1, 3.0.2, 2.4.7, 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta After SPARK-34697, DESCRIBE FUNCTION and SHOW FUNCTIONS can describe/show built-in operators including the following virtual operators. * != * <> * between * case * || But they are still absent from the built-in functions document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34746) Spark dependencies require scala 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-34746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301733#comment-17301733 ] Peter Kaiser commented on SPARK-34746: -- works when enforcing Scala 2.12.10 in the gradle build file: {code:java} implementation ('org.scala-lang:scala-library:2.12.10') { force = true }{code} > Spark dependencies require scala 2.12.12 > > > Key: SPARK-34746 > URL: https://issues.apache.org/jira/browse/SPARK-34746 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1 >Reporter: Peter Kaiser >Priority: Critical > > In our application we're creating a spark session programmatically. The > application is built using gradle. > After upgrading spark to 3.1.1 it no longer works, due to incompatible > classes on driver and executor (namely: > scala.lang.collections.immutable.WrappedArray.ofRef). > Turns out this was caused by different scala versions on driver vs. executor. > While spark still comes with Scala 2.12.10, some of its dependencies in the > gradle build require Scala 2.12.12: > {noformat} > Cannot find a version of 'org.scala-lang:scala-library' that satisfies the > version constraints: > Dependency path '...' --> '...' --> 'org.scala-lang:scala-library:{strictly > 2.12.10}' > Dependency path '...' --> 'org.apache.spark:spark-core_2.12:3.1.1' --> > 'org.json4s:json4s-jackson_2.12:3.7.0-M5' --> > 'org.scala-lang:scala-library:2.12.12' {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
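[Editor's note on the report above] The conflict can be sketched with a toy resolver; this is deliberately simplified and not Gradle's actual algorithm. A `strictly` pin is incompatible with any other requested version, and forcing a version (as in the workaround comment) resolves the conflict by discarding the transitive requirement.

```python
# Toy dependency resolver (assumption: simplified semantics; Gradle's real
# conflict resolution is richer than this sketch).

def resolve(requested, strict=None, force=False):
    if strict is not None:
        conflicts = [v for v in requested if v != strict]
        if conflicts and not force:
            # Mirrors the reported "Cannot find a version ... that
            # satisfies the version constraints" failure.
            raise ValueError(f"cannot satisfy strictly {strict} against {conflicts}")
        return strict
    # Without a pin, the highest requested version wins.
    return max(requested, key=lambda v: tuple(map(int, v.split("."))))

requested = ["2.12.10", "2.12.12"]  # Spark's scala-library vs json4s's

print(resolve(requested))                                # highest wins
print(resolve(requested, strict="2.12.10", force=True))  # forced pin wins

try:
    resolve(requested, strict="2.12.10")  # the reported failure mode
except ValueError as e:
    print(e)
```

Note that "highest wins" is exactly what produces mismatched Scala minor versions between a driver built this way and executors shipping 2.12.10, hence the `WrappedArray$ofRef` incompatibility described above.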
[jira] [Created] (SPARK-34746) Spark dependencies require scala 2.12.12
Peter Kaiser created SPARK-34746: Summary: Spark dependencies require scala 2.12.12 Key: SPARK-34746 URL: https://issues.apache.org/jira/browse/SPARK-34746 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.1 Reporter: Peter Kaiser In our application we're creating a spark session programmatically. The application is built using gradle. After upgrading spark to 3.1.1 it no longer works, due to incompatible classes on driver and executor (namely: scala.lang.collections.immutable.WrappedArray.ofRef). Turns out this was caused by different scala versions on driver vs. executor. While spark still comes with Scala 2.12.10, some of its dependencies in the gradle build require Scala 2.12.12: {noformat} Cannot find a version of 'org.scala-lang:scala-library' that satisfies the version constraints: Dependency path '...' --> '...' --> 'org.scala-lang:scala-library:{strictly 2.12.10}' Dependency path '...' --> 'org.apache.spark:spark-core_2.12:3.1.1' --> 'org.json4s:json4s-jackson_2.12:3.7.0-M5' --> 'org.scala-lang:scala-library:2.12.12' {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34694) Improve Spark SQL Source Filter to allow pushdown of filters span multiple columns
[ https://issues.apache.org/jira/browse/SPARK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301701#comment-17301701 ] Chen Zou edited comment on SPARK-34694 at 3/15/21, 3:26 PM: Hi Hyukjin, I think the design you described would work. But the current org.apache.spark.sql.sources.Filter isn't built under the assumption that the 'value' parameter could be a column reference. e.g. the findReferences member function does not consider value being a column references. {code:scala} protected def findReferences(value: Any): Array[String] = value match { case f: Filter => f.references case _ => Array.empty } {code} And this is probably why org.apache.spark.sql.execution.datasources.v2.PushDownUtils would not push the cross-column filters down to data sources. The end result is that cross-column filters don't get pushed down, from stderr of a spark job doing TPC-H Q12: 21/03/10 16:56:16.266 INFO V2ScanRelationPushDown: Pushing operators to lineitem@[file:///blah/blah/lineitem] Pushed Filters: Or(EqualTo(l_shipmode,MAIL),EqualTo(l_shipmode,SHIP)), GreaterThanOrEqual(l_receiptdate,1994-01-01), LessThan(l_receiptdate,1995-01-01) Post-Scan Filters: (l_commitdate#11 < l_receiptdate#12),(l_shipdate#10 < l_commitdate#11) Output: l_orderkey#0, l_shipdate#10, l_commitdate#11, l_receiptdate#12, l_shipmode#14 Regards, Chen was (Author: zinechant): Hi Hyukjin, I think the design you described would work. But the current org.apache.spark.sql.sources.Filter isn't built under the assumption that the 'value' parameter could be a column reference. e.g. the findReferences member function does not consider value being a column references. {code:scala} protected def findReferences(value: Any): Array[String] = value match { case f: Filter => f.references case _ => Array.empty } {code} And this is probably why org.apache.spark.sql.execution.datasources.v2.PushDownUtils would not push the cross-column filters down to data sources. 
The end result is that cross-column filters don't get pushed down, from stderr of a spark job doing TPC-H Q12: 21/03/10 16:56:16.266 INFO V2ScanRelationPushDown: Pushing operators to lineitem@[file:///home/colouser51/udpstorage/tpch/tbl_s1e1/lineitem] Pushed Filters: Or(EqualTo(l_shipmode,MAIL),EqualTo(l_shipmode,SHIP)), GreaterThanOrEqual(l_receiptdate,1994-01-01), LessThan(l_receiptdate,1995-01-01) Post-Scan Filters: (l_commitdate#11 < l_receiptdate#12),(l_shipdate#10 < l_commitdate#11) Output: l_orderkey#0, l_shipdate#10, l_commitdate#11, l_receiptdate#12, l_shipmode#14 Regards, Chen > Improve Spark SQL Source Filter to allow pushdown of filters span multiple > columns > -- > > Key: SPARK-34694 > URL: https://issues.apache.org/jira/browse/SPARK-34694 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1 >Reporter: Chen Zou >Priority: Minor > > The current org.apache.spark.sql.sources.Filter abstract class only allows > pushdown of filters on single column or sum of products of multiple such > single-column filters. > Filters on multiple columns cannot be pushed down through this Filter > subclass to source, e.g. from TPC-H benchmark on lineitem table: > (l_commitdate#11 < l_receiptdate#12) > (l_shipdate#10 < l_commitdate#11) > > The current design probably originates from the point that columnar source > has a hard time supporting these cross-column filters. But with batching > implemented in columnar sources, they can still support cross-column filters. > This issue tries to open up discussion on a more general Filter interface to > allow pushing down cross-column filters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
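[Editor's note on the discussion above] To make the gap concrete, here is a small hypothetical sketch of the extension under discussion: letting a filter's `value` be a column reference, with `findReferences`-style logic reporting both sides. `ColumnRef` and `LessThan` here are illustrative stand-ins, not Spark's actual `org.apache.spark.sql.sources` classes.

```python
# Hypothetical filter AST (assumption: illustrative stand-ins for Spark's
# sources.Filter hierarchy, not the real classes).
from dataclasses import dataclass
from typing import Any, List

@dataclass
class ColumnRef:
    """Marks a value that refers to another column, per the proposal."""
    name: str

@dataclass
class LessThan:
    attribute: str
    value: Any  # a literal today; a literal *or* ColumnRef under the proposal

    @property
    def references(self) -> List[str]:
        # findReferences extended so a column-valued `value` is reported too,
        # letting pushdown logic see that this filter needs both columns.
        refs = [self.attribute]
        if isinstance(self.value, ColumnRef):
            refs.append(self.value.name)
        return refs

single = LessThan("l_receiptdate", "1995-01-01")              # pushed today
cross = LessThan("l_commitdate", ColumnRef("l_receiptdate"))  # TPC-H Q12 case
print(single.references)
print(cross.references)
```

With `references` complete on both sides, a `PushDownUtils`-like planner could at least decide per source whether a cross-column predicate is pushable instead of unconditionally keeping it as a post-scan filter.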
[jira] [Commented] (SPARK-34745) Unify overflow exception error message of integral types
[ https://issues.apache.org/jira/browse/SPARK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301693#comment-17301693 ] Apache Spark commented on SPARK-34745: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/31840
> Unify overflow exception error message of integral types
>
> Key: SPARK-34745
> URL: https://issues.apache.org/jira/browse/SPARK-34745
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
>
> Currently, the overflow exception error messages of the integral types are
> inconsistent.
> For the Byte/Short types, the message is "... caused overflow".
> For Int/Long, the message is "int/long overflow", since Spark calls the
> "*Exact" (e.g. addExact, negateExact) methods from java.lang.Math.
> We should unify the error messages by changing the Byte/Short message to
> "tinyint/smallint overflow".
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
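The Int/Long wording comes straight from the JDK's checked-arithmetic helpers, which Spark calls for those types. A small sketch showing the messages the ticket wants to align with:

```scala
// java.lang.Math's *Exact methods throw ArithmeticException with messages
// "integer overflow" / "long overflow"; Spark surfaces these for Int/Long.
object OverflowDemo extends App {
  try Math.addExact(Int.MaxValue, 1)
  catch { case e: ArithmeticException => println(e.getMessage) } // integer overflow

  try Math.multiplyExact(Long.MaxValue, 2L)
  catch { case e: ArithmeticException => println(e.getMessage) } // long overflow
}
```

Byte/Short arithmetic has no such helpers in java.lang.Math, which is why Spark's hand-written "... caused overflow" message diverges and needs to be unified by hand.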
[jira] [Assigned] (SPARK-34745) Unify overflow exception error message of integral types
[ https://issues.apache.org/jira/browse/SPARK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34745: Assignee: Apache Spark (was: Gengliang Wang)
> Unify overflow exception error message of integral types
> Key: SPARK-34745
> URL: https://issues.apache.org/jira/browse/SPARK-34745
[jira] [Assigned] (SPARK-34745) Unify overflow exception error message of integral types
[ https://issues.apache.org/jira/browse/SPARK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34745: Assignee: Gengliang Wang (was: Apache Spark)
> Unify overflow exception error message of integral types
> Key: SPARK-34745
> URL: https://issues.apache.org/jira/browse/SPARK-34745
[jira] [Updated] (SPARK-34745) Unify overflow exception error message of integral types
[ https://issues.apache.org/jira/browse/SPARK-34745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-34745: --- Description:
Currently, the overflow exception error messages of the integral types are inconsistent. For the Byte/Short types, the message is "... caused overflow". For Int/Long, the message is "int/long overflow", since Spark calls the "*Exact" (e.g. addExact, negateExact) methods from java.lang.Math. We should unify the error messages by changing the Byte/Short message to "tinyint/smallint overflow".
was:
Currently, the overflow exception error messages of the integral types are inconsistent. For the Byte/Short types, the message is "... caused overflow". For Int/Long, the message is "int/long overflow", since Spark calls the "exact*" methods from java.lang.Math. We should unify the error messages by changing the Byte/Short message to "tinyint/smallint overflow".
> Unify overflow exception error message of integral types
> Key: SPARK-34745
> URL: https://issues.apache.org/jira/browse/SPARK-34745
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34744) Improve error message for casting cause overflow error
[ https://issues.apache.org/jira/browse/SPARK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301687#comment-17301687 ] Xingchao, Zhang commented on SPARK-34744: - I will try to fix it > Improve error message for casting cause overflow error > -- > > Key: SPARK-34744 > URL: https://issues.apache.org/jira/browse/SPARK-34744 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > For example: > {code:sql} > set spark.sql.ansi.enabled=true; > select tinyint(128) * tinyint(2); > {code} > Error message: > {noformat} > Casting 128 to scala.Byte$ causes overflow > {noformat} > Expected: > {noformat} > Casting 128 to tinyint causes overflow > {noformat} > We should use DataType's catalogString. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34745) Unify overflow exception error message of integral types
Gengliang Wang created SPARK-34745: -- Summary: Unify overflow exception error message of integral types Key: SPARK-34745 URL: https://issues.apache.org/jira/browse/SPARK-34745 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang
Currently, the overflow exception error messages of the integral types are inconsistent. For the Byte/Short types, the message is "... caused overflow". For Int/Long, the message is "int/long overflow", since Spark calls the "exact*" methods from java.lang.Math. We should unify the error messages by changing the Byte/Short message to "tinyint/smallint overflow".
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34744) Improve error message for casting cause overflow error
[ https://issues.apache.org/jira/browse/SPARK-34744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-34744: Description: For example: {code:sql} set spark.sql.ansi.enabled=true; select tinyint(128) * tinyint(2); {code} Error message: {noformat} Casting 128 to scala.Byte$ causes overflow {noformat} Expected: {noformat} Casting 128 to tinyint causes overflow {noformat} We should use DataType's catalogString. was: For example: {code:sql} set spark.sql.ansi.enabled=true; select tinyint(128) * tinyint(2); {code} Error message: {noformat} Casting 128 to scala.Byte$ causes overflow {noformat} Expected: {noformat} Casting 128 to tinyint causes overflow {noformat} We can update [castingCauseOverflowError|https://github.com/apache/spark/blob/5b2ad59f64a9bb065b49acb2e73a6b246a3d8c64/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L64-L66] to: {code:scala} def castingCauseOverflowError(t: Any, targetType: DataType): ArithmeticException = { new ArithmeticException(s"Casting $t to ${targetType.catalogString} causes overflow") } {code} > Improve error message for casting cause overflow error > -- > > Key: SPARK-34744 > URL: https://issues.apache.org/jira/browse/SPARK-34744 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > For example: > {code:sql} > set spark.sql.ansi.enabled=true; > select tinyint(128) * tinyint(2); > {code} > Error message: > {noformat} > Casting 128 to scala.Byte$ causes overflow > {noformat} > Expected: > {noformat} > Casting 128 to tinyint causes overflow > {noformat} > We should use DataType's catalogString. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34744) Improve error message for casting cause overflow error
Yuming Wang created SPARK-34744:
-----------------------------------

             Summary: Improve error message for casting cause overflow error
                 Key: SPARK-34744
                 URL: https://issues.apache.org/jira/browse/SPARK-34744
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Yuming Wang


For example:
{code:sql}
set spark.sql.ansi.enabled=true;
select tinyint(128) * tinyint(2);
{code}
Error message:
{noformat}
Casting 128 to scala.Byte$ causes overflow
{noformat}
Expected:
{noformat}
Casting 128 to tinyint causes overflow
{noformat}
We can update [castingCauseOverflowError|https://github.com/apache/spark/blob/5b2ad59f64a9bb065b49acb2e73a6b246a3d8c64/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L64-L66] to:
{code:scala}
def castingCauseOverflowError(t: Any, targetType: DataType): ArithmeticException = {
  new ArithmeticException(s"Casting $t to ${targetType.catalogString} causes overflow")
}
{code}
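The fix amounts to formatting the error with the SQL-facing type name instead of the JVM class name. An illustrative Java sketch of that idea; the CATALOG_NAMES map here is a hypothetical stand-in for Spark's DataType.catalogString, not a real Spark API:

```java
import java.util.Map;

public class CastErrorDemo {
    // Hypothetical stand-in for DataType.catalogString: JVM box type -> SQL name.
    static final Map<Class<?>, String> CATALOG_NAMES = Map.of(
        Byte.class, "tinyint",
        Short.class, "smallint",
        Integer.class, "int",
        Long.class, "bigint");

    // Build the message the way the fixed castingCauseOverflowError does:
    // name the SQL type, not the runtime class (e.g. scala.Byte$).
    static String castingCauseOverflowError(Object t, Class<?> targetType) {
        return "Casting " + t + " to " + CATALOG_NAMES.get(targetType) + " causes overflow";
    }

    public static void main(String[] args) {
        System.out.println(castingCauseOverflowError(128, Byte.class));
    }
}
```

With this, the example from the ticket produces "Casting 128 to tinyint causes overflow" rather than leaking the Scala companion-object name.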
[jira] [Commented] (SPARK-34087) a memory leak occurs when we clone the spark session
[ https://issues.apache.org/jira/browse/SPARK-34087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301665#comment-17301665 ]

Apache Spark commented on SPARK-34087:
--------------------------------------

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/31839

> a memory leak occurs when we clone the spark session
> ----------------------------------------------------
>
>                 Key: SPARK-34087
>                 URL: https://issues.apache.org/jira/browse/SPARK-34087
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Fu Chen
>            Priority: Major
>         Attachments: 1610451044690.jpg
>
> In Spark 3.0.1, a memory leak occurs when we repeatedly clone the Spark session, because each clone adds a new ExecutionListenerBus instance to the AsyncEventQueue.
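The leak pattern can be modeled in a few lines of Java. This is a simplified stand-in, not Spark's actual ExecutionListenerBus or AsyncEventQueue: every clone registers a listener on a shared, process-wide bus, and nothing ever deregisters it, so listeners accumulate even though only one session is live.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerLeakDemo {
    // Stand-in for the shared AsyncEventQueue: listener list outlives sessions.
    static final List<Object> sharedBus = new ArrayList<>();

    static class Session {
        Session() {
            // Like ExecutionListenerBus: each new session registers a listener.
            sharedBus.add(new Object());
        }
        Session cloneSession() {
            // Cloning constructs a fresh session, registering yet another listener.
            return new Session();
        }
    }

    public static void main(String[] args) {
        Session s = new Session();
        for (int i = 0; i < 1000; i++) {
            s = s.cloneSession(); // old sessions are dropped, listeners are not
        }
        // 1001 listeners are still reachable from the bus.
        System.out.println(sharedBus.size());
    }
}
```

The bus keeps a strong reference to every listener, so the old sessions' listeners (and whatever they reference) can never be garbage-collected; the fix is to deregister on session teardown or use weak references.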
[jira] [Resolved] (SPARK-34739) Add an year-month interval to a timestamp
[ https://issues.apache.org/jira/browse/SPARK-34739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-34739.
------------------------------
    Resolution: Done

> Add an year-month interval to a timestamp
> -----------------------------------------
>
>                 Key: SPARK-34739
>                 URL: https://issues.apache.org/jira/browse/SPARK-34739
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.2.0
>
> Support adding YearMonthIntervalType values to TIMESTAMP values.
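The intended semantics of year-month interval addition can be illustrated with java.time (an analogy for the calendar arithmetic involved, not Spark's implementation): adding months shifts the month field and clamps the day-of-month to the target month's length.

```java
import java.time.LocalDateTime;
import java.time.Period;

public class YearMonthIntervalDemo {
    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2021, 1, 31, 12, 0);
        // A year-month interval carries only years and months, no days/seconds.
        Period oneMonth = Period.ofMonths(1);
        // Jan 31 + 1 month clamps to the last day of February.
        System.out.println(ts.plus(oneMonth)); // 2021-02-28T12:00
    }
}
```

This clamping behavior is why year-month intervals are a distinct type from day-time intervals: the result depends on the calendar, not on a fixed number of seconds.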
[jira] [Resolved] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-34743.
-----------------------------------
    Fix Version/s: 3.0.3
                   3.1.2
                   2.4.8
                   3.2.0
       Resolution: Fixed

Issue resolved by pull request 31837
[https://github.com/apache/spark/pull/31837]

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.2.0, 2.4.8, 3.1.2, 3.0.3
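The root cause is standard JVM behavior, shown here with java.util.Arrays: equals compares the inner arrays by reference (arrays inherit Object's identity-based equals), so two structurally equal `array of array` values compare unequal unless deepEquals is used to recurse into the nesting.

```java
import java.util.Arrays;

public class DeepEqualsDemo {
    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{1, 2}, {3, 4}};

        // Arrays.equals treats the inner int[] elements as plain Objects,
        // so distinct-but-equal inner arrays compare false.
        System.out.println(Arrays.equals(a, b));     // false

        // Arrays.deepEquals recurses into nested arrays element-by-element.
        System.out.println(Arrays.deepEquals(a, b)); // true
    }
}
```

A test suite asserting round-trip equality of nested arrays with the shallow comparison can therefore pass or fail for the wrong reasons, which is what the ticket fixes.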
[jira] [Assigned] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-34743:
-------------------------------------
    Assignee: Dongjoon Hyun

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
[jira] [Resolved] (SPARK-34639) always remove unnecessary Alias in Analyzer.resolveExpression
[ https://issues.apache.org/jira/browse/SPARK-34639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34639.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31758
[https://github.com/apache/spark/pull/31758]

> always remove unnecessary Alias in Analyzer.resolveExpression
> -------------------------------------------------------------
>
>                 Key: SPARK-34639
>                 URL: https://issues.apache.org/jira/browse/SPARK-34639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 3.2.0
[jira] [Assigned] (SPARK-34639) always remove unnecessary Alias in Analyzer.resolveExpression
[ https://issues.apache.org/jira/browse/SPARK-34639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-34639:
-----------------------------------
    Assignee: Wenchen Fan

> always remove unnecessary Alias in Analyzer.resolveExpression
> -------------------------------------------------------------
>
>                 Key: SPARK-34639
>                 URL: https://issues.apache.org/jira/browse/SPARK-34639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
[jira] [Assigned] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34743:
------------------------------------
    Assignee: (was: Apache Spark)

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Commented] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301488#comment-17301488 ]

Apache Spark commented on SPARK-34743:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31837

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Assigned] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34743:
------------------------------------
    Assignee: Apache Spark

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[ https://issues.apache.org/jira/browse/SPARK-34743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301487#comment-17301487 ]

Apache Spark commented on SPARK-34743:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31837

> ExpressionEncoderSuite should use deepEquals when we expect `array of array`
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-34743
>                 URL: https://issues.apache.org/jira/browse/SPARK-34743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.2, 3.1.1
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Created] (SPARK-34743) ExpressionEncoderSuite should use deepEquals when we expect `array of array`
Dongjoon Hyun created SPARK-34743:
-------------------------------------

             Summary: ExpressionEncoderSuite should use deepEquals when we expect `array of array`
                 Key: SPARK-34743
                 URL: https://issues.apache.org/jira/browse/SPARK-34743
             Project: Spark
          Issue Type: Bug
          Components: SQL, Tests
    Affects Versions: 3.1.1, 3.0.2, 2.4.7, 2.3.4, 2.2.3, 2.1.3, 2.0.2, 1.6.3
            Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
[ https://issues.apache.org/jira/browse/SPARK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301476#comment-17301476 ]

Apache Spark commented on SPARK-34742:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/31836

> ANSI mode: Abs throws exception if input is out of range
> --------------------------------------------------------
>
>                 Key: SPARK-34742
>                 URL: https://issues.apache.org/jira/browse/SPARK-34742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
> {code:java}
> SELECT abs(${Int.MinValue});
> SELECT abs(${Long.MinValue});
> {code}
[jira] [Assigned] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
[ https://issues.apache.org/jira/browse/SPARK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34742:
------------------------------------
    Assignee: Gengliang Wang  (was: Apache Spark)

> ANSI mode: Abs throws exception if input is out of range
> --------------------------------------------------------
>
>                 Key: SPARK-34742
>                 URL: https://issues.apache.org/jira/browse/SPARK-34742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
> {code:java}
> SELECT abs(${Int.MinValue});
> SELECT abs(${Long.MinValue});
> {code}
[jira] [Commented] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
[ https://issues.apache.org/jira/browse/SPARK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301477#comment-17301477 ]

Apache Spark commented on SPARK-34742:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/31836

> ANSI mode: Abs throws exception if input is out of range
> --------------------------------------------------------
>
>                 Key: SPARK-34742
>                 URL: https://issues.apache.org/jira/browse/SPARK-34742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
> {code:java}
> SELECT abs(${Int.MinValue});
> SELECT abs(${Long.MinValue});
> {code}
[jira] [Assigned] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
[ https://issues.apache.org/jira/browse/SPARK-34742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34742:
------------------------------------
    Assignee: Apache Spark  (was: Gengliang Wang)

> ANSI mode: Abs throws exception if input is out of range
> --------------------------------------------------------
>
>                 Key: SPARK-34742
>                 URL: https://issues.apache.org/jira/browse/SPARK-34742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Gengliang Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
> {code:java}
> SELECT abs(${Int.MinValue});
> SELECT abs(${Long.MinValue});
> {code}
[jira] [Created] (SPARK-34742) ANSI mode: Abs throws exception if input is out of range
Gengliang Wang created SPARK-34742:
--------------------------------------

             Summary: ANSI mode: Abs throws exception if input is out of range
                 Key: SPARK-34742
                 URL: https://issues.apache.org/jira/browse/SPARK-34742
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Gengliang Wang
            Assignee: Gengliang Wang


For the following cases, ABS should throw exceptions in ANSI mode, since the result is out of the range of the result data type.
{code:java}
SELECT abs(${Int.MinValue});
SELECT abs(${Long.MinValue});
{code}
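The underlying hazard is plain two's-complement behavior, reproducible in Java: Math.abs(Integer.MIN_VALUE) silently returns Integer.MIN_VALUE itself (a negative number), because +2147483648 is not representable as an int. A checked negation surfaces the overflow that ANSI mode should report.

```java
public class AbsOverflowDemo {
    public static void main(String[] args) {
        // Math.abs overflows silently: |Int.MinValue| is not an int.
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE); // true

        // A checked negation raises the overflow instead of hiding it.
        try {
            Math.negateExact(Integer.MIN_VALUE);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same asymmetry exists for Long.MIN_VALUE, which is why both SELECT statements in the ticket must raise exceptions under ANSI mode rather than return a negative "absolute value".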