[jira] [Created] (SPARK-34770) InMemoryCatalog.tableExists should not fail if database doesn't exist
Wenchen Fan created SPARK-34770: --- Summary: InMemoryCatalog.tableExists should not fail if database doesn't exist Key: SPARK-34770 URL: https://issues.apache.org/jira/browse/SPARK-34770 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
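The report above is terse, so here is a toy sketch of the intended behavior. This is a hypothetical model, not Spark's actual `InMemoryCatalog` (whose internal state and method signatures differ); it only illustrates that `tableExists` should report `false` for a missing database rather than throw.

```scala
import scala.collection.mutable

// Toy catalog: databases map to sets of table names.
object ToyCatalog {
  private val catalog = mutable.Map[String, mutable.Set[String]]()

  def createDatabase(db: String): Unit = {
    catalog.getOrElseUpdate(db, mutable.Set.empty[String])
    ()
  }

  def createTable(db: String, table: String): Unit =
    catalog(db) += table // assumes the database exists

  // A naive `catalog(db).contains(table)` would throw
  // NoSuchElementException when the database is missing.
  // Using Option keeps the lookup total: missing db => false.
  def tableExists(db: String, table: String): Boolean =
    catalog.get(db).exists(_.contains(table))
}
```

The key point is the `catalog.get(db).exists(...)` lookup: a missing database short-circuits to `false` instead of failing.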
[jira] [Assigned] (SPARK-34768) Respect the default input buffer size in Univocity
[ https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34768: Assignee: Apache Spark > Respect the default input buffer size in Univocity > -- > > Key: SPARK-34768 > URL: https://issues.apache.org/jira/browse/SPARK-34768 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > Currently Univocity 2.9.1 is affected by a bug such as > https://github.com/uniVocity/univocity-parsers/issues/449. > While this is a bug in Univocity, a contributing factor is that we don't > respect Univocity's default value, which exposes Spark to code paths that > Univocity's tests do not cover. > We should respect Univocity's default input buffer value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
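The idea behind the fix can be sketched as follows. This is a hypothetical model: `ParserSettings`, `fromUserOptions`, and the default value are invented names for illustration; the real code lives in Spark's CSV option handling and Univocity's own parser settings.

```scala
// Toy sketch: only override the parser's buffer size when the user
// explicitly asks for one; otherwise fall back to the library default,
// so Spark exercises the same code paths the library tests.
case class ParserSettings(inputBufferSize: Int)

object ParserSettings {
  // Stand-in for whatever default the parsing library ships with.
  val LibraryDefault: Int = 1024 * 1024

  def fromUserOptions(options: Map[String, String]): ParserSettings = {
    val size = options.get("inputBufferSize").map(_.toInt).getOrElse(LibraryDefault)
    ParserSettings(size)
  }
}
```

The contrast is with hard-coding a Spark-specific size that silently shadows the library default for every user.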
[jira] [Assigned] (SPARK-34769) AnsiTypeCoercion: return narrowest convertible type among TypeCollection
[ https://issues.apache.org/jira/browse/SPARK-34769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34769: Assignee: Gengliang Wang (was: Apache Spark) > AnsiTypeCoercion: return narrowest convertible type among TypeCollection > > > Key: SPARK-34769 > URL: https://issues.apache.org/jira/browse/SPARK-34769 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Currently, when implicitly casting a data type to a `TypeCollection`, Spark > returns the first convertible data type in the `TypeCollection`. > In ANSI mode, we can make the behavior more reasonable by returning the > narrowest convertible data type in the `TypeCollection`. > In detail, we first find all the expected types we can implicitly cast to: > 1. if there are no convertible data types, return None; > 2. if there is only one convertible data type, cast the input to it; > 3. otherwise, if there are multiple convertible data types, find the narrowest > common data type among them. If there is no such narrowest common data type, > return None. > Note that if the narrowest common type is Float type and the convertible > types contain Double type, simply return Double type as the narrowest common > type to avoid potential precision loss when converting an integral type to > Float type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
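The three selection steps described above can be sketched with a toy type lattice. This is not Spark's `AnsiTypeCoercion`: a linear widening chain of type names stands in for the real type system, so the "no narrowest common type" branch never triggers here; only the step structure and the Float/Double special case are modeled.

```scala
// Toy model of the narrowest-convertible-type selection.
object NarrowestType {
  // Wider types sit at larger indices; a value can be implicitly
  // cast to any type at the same or a wider position.
  val widening = Seq("Byte", "Short", "Int", "Long", "Float", "Double")

  def canCast(from: String, to: String): Boolean =
    widening.indexOf(from) <= widening.indexOf(to)

  def findNarrowest(input: String, collection: Seq[String]): Option[String] = {
    val convertible = collection.filter(canCast(input, _))
    convertible match {
      case Seq()  => None      // 1. nothing convertible
      case Seq(t) => Some(t)   // 2. exactly one candidate: take it
      case many   =>           // 3. multiple candidates: take the narrowest,
        val narrowest = many.minBy(t => widening.indexOf(t))
        // but prefer Double over Float to avoid precision loss
        // when converting integral inputs.
        if (narrowest == "Float" && many.contains("Double")) Some("Double")
        else Some(narrowest)
    }
  }
}
```

For example, casting `Int` against `TypeCollection(Long, Double)` picks `Long` (the narrowest), while `TypeCollection(Float, Double)` picks `Double` via the special case.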
[jira] [Assigned] (SPARK-34768) Respect the default input buffer size in Univocity
[ https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34768: Assignee: (was: Apache Spark) > Respect the default input buffer size in Univocity > -- > > Key: SPARK-34768 > URL: https://issues.apache.org/jira/browse/SPARK-34768 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Hyukjin Kwon >Priority: Major > > Currently Univocity 2.9.1 is affected by a bug such as > https://github.com/uniVocity/univocity-parsers/issues/449. > While this is a bug in Univocity, a contributing factor is that we don't > respect Univocity's default value, which exposes Spark to code paths that > Univocity's tests do not cover. > We should respect Univocity's default input buffer value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34769) AnsiTypeCoercion: return narrowest convertible type among TypeCollection
[ https://issues.apache.org/jira/browse/SPARK-34769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34769: Assignee: Apache Spark (was: Gengliang Wang) > AnsiTypeCoercion: return narrowest convertible type among TypeCollection > > > Key: SPARK-34769 > URL: https://issues.apache.org/jira/browse/SPARK-34769 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Currently, when implicitly casting a data type to a `TypeCollection`, Spark > returns the first convertible data type in the `TypeCollection`. > In ANSI mode, we can make the behavior more reasonable by returning the > narrowest convertible data type in the `TypeCollection`. > In detail, we first find all the expected types we can implicitly cast to: > 1. if there are no convertible data types, return None; > 2. if there is only one convertible data type, cast the input to it; > 3. otherwise, if there are multiple convertible data types, find the narrowest > common data type among them. If there is no such narrowest common data type, > return None. > Note that if the narrowest common type is Float type and the convertible > types contain Double type, simply return Double type as the narrowest common > type to avoid potential precision loss when converting an integral type to > Float type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34769) AnsiTypeCoercion: return narrowest convertible type among TypeCollection
[ https://issues.apache.org/jira/browse/SPARK-34769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303095#comment-17303095 ] Apache Spark commented on SPARK-34769: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/31859 > AnsiTypeCoercion: return narrowest convertible type among TypeCollection > > > Key: SPARK-34769 > URL: https://issues.apache.org/jira/browse/SPARK-34769 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Currently, when implicitly casting a data type to a `TypeCollection`, Spark > returns the first convertible data type in the `TypeCollection`. > In ANSI mode, we can make the behavior more reasonable by returning the > narrowest convertible data type in the `TypeCollection`. > In detail, we first find all the expected types we can implicitly cast to: > 1. if there are no convertible data types, return None; > 2. if there is only one convertible data type, cast the input to it; > 3. otherwise, if there are multiple convertible data types, find the narrowest > common data type among them. If there is no such narrowest common data type, > return None. > Note that if the narrowest common type is Float type and the convertible > types contain Double type, simply return Double type as the narrowest common > type to avoid potential precision loss when converting an integral type to > Float type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34768) Respect the default input buffer size in Univocity
[ https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303094#comment-17303094 ] Apache Spark commented on SPARK-34768: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/31858 > Respect the default input buffer size in Univocity > -- > > Key: SPARK-34768 > URL: https://issues.apache.org/jira/browse/SPARK-34768 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Hyukjin Kwon >Priority: Major > > Currently Univocity 2.9.1 is affected by a bug such as > https://github.com/uniVocity/univocity-parsers/issues/449. > While this is a bug in Univocity, a contributing factor is that we don't > respect Univocity's default value, which exposes Spark to code paths that > Univocity's tests do not cover. > We should respect Univocity's default input buffer value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34769) AnsiTypeCoercion: return narrowest convertible type among TypeCollection
Gengliang Wang created SPARK-34769: -- Summary: AnsiTypeCoercion: return narrowest convertible type among TypeCollection Key: SPARK-34769 URL: https://issues.apache.org/jira/browse/SPARK-34769 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Currently, when implicitly casting a data type to a `TypeCollection`, Spark returns the first convertible data type in the `TypeCollection`. In ANSI mode, we can make the behavior more reasonable by returning the narrowest convertible data type in the `TypeCollection`. In detail, we first find all the expected types we can implicitly cast to: 1. if there are no convertible data types, return None; 2. if there is only one convertible data type, cast the input to it; 3. otherwise, if there are multiple convertible data types, find the narrowest common data type among them. If there is no such narrowest common data type, return None. Note that if the narrowest common type is Float type and the convertible types contain Double type, simply return Double type as the narrowest common type to avoid potential precision loss when converting an integral type to Float type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34768) Respect the default input buffer size in Univocity
[ https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34768: - Issue Type: Bug (was: Improvement) > Respect the default input buffer size in Univocity > -- > > Key: SPARK-34768 > URL: https://issues.apache.org/jira/browse/SPARK-34768 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Hyukjin Kwon >Priority: Major > > Currently Univocity 2.9.1 is affected by a bug such as > https://github.com/uniVocity/univocity-parsers/issues/449. > While this is a bug in Univocity, a contributing factor is that we don't > respect Univocity's default value, which exposes Spark to code paths that > Univocity's tests do not cover. > We should respect Univocity's default input buffer value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34768) Respect the default input buffer size in Univocity
[ https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34768: - Affects Version/s: 3.0.2 > Respect the default input buffer size in Univocity > -- > > Key: SPARK-34768 > URL: https://issues.apache.org/jira/browse/SPARK-34768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Hyukjin Kwon >Priority: Major > > Currently Univocity 2.9.1 is affected by a bug such as > https://github.com/uniVocity/univocity-parsers/issues/449. > While this is a bug in Univocity, a contributing factor is that we don't > respect Univocity's default value, which exposes Spark to code paths that > Univocity's tests do not cover. > We should respect Univocity's default input buffer value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34768) Respect the default input buffer size in Univocity
Hyukjin Kwon created SPARK-34768: Summary: Respect the default input buffer size in Univocity Key: SPARK-34768 URL: https://issues.apache.org/jira/browse/SPARK-34768 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.1 Reporter: Hyukjin Kwon Currently Univocity 2.9.1 is affected by a bug such as https://github.com/uniVocity/univocity-parsers/issues/449. While this is a bug in Univocity, a contributing factor is that we don't respect Univocity's default value, which exposes Spark to code paths that Univocity's tests do not cover. We should respect Univocity's default input buffer value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-10697) Lift Calculation in Association Rule mining
[ https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303086#comment-17303086 ] Yashwanth Kumar edited comment on SPARK-10697 at 3/17/21, 4:36 AM: --- Glad that the change I proposed 5 years back got resolved. Sorry, my Apache account got disabled; I've got a new one now. Looking forward to contributing. was (Author: yashkumards): Glad that the change I proposed 5 years back got resolved. Sorry, my Apache account got disabled; I've got a new one now. > Lift Calculation in Association Rule mining > --- > > Key: SPARK-10697 > URL: https://issues.apache.org/jira/browse/SPARK-10697 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Yashwanth Kumar >Assignee: Marco Gaido >Priority: Minor > Fix For: 2.4.0 > > > Lift is to be calculated for Association rule mining in > AssociationRules.scala under FPM. > Lift is a measure of the performance of an association rule. > Adding lift will help compare model efficiency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10697) Lift Calculation in Association Rule mining
[ https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303086#comment-17303086 ] Yashwanth Kumar commented on SPARK-10697: - Glad that the change I proposed 5 years back got resolved. Sorry, my Apache account got disabled; I've got a new one now. > Lift Calculation in Association Rule mining > --- > > Key: SPARK-10697 > URL: https://issues.apache.org/jira/browse/SPARK-10697 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Yashwanth Kumar >Assignee: Marco Gaido >Priority: Minor > Fix For: 2.4.0 > > > Lift is to be calculated for Association rule mining in > AssociationRules.scala under FPM. > Lift is a measure of the performance of an association rule. > Adding lift will help compare model efficiency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
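The lift measure mentioned above can be sketched as a toy computation. This is not the MLlib `AssociationRules` API; the object and method names are hypothetical, and it uses the standard definition lift(X => Y) = confidence(X => Y) / support(Y).

```scala
// Toy lift computation over a list of transactions (item sets).
object Lift {
  def lift(transactions: Seq[Set[String]],
           antecedent: Set[String],
           consequent: Set[String]): Double = {
    val n = transactions.size.toDouble
    // Fraction of transactions containing all the given items.
    def support(items: Set[String]): Double =
      transactions.count(t => items.subsetOf(t)) / n
    val confidence = support(antecedent ++ consequent) / support(antecedent)
    confidence / support(consequent)
  }
}
```

A lift above 1 means the antecedent and consequent co-occur more often than independence would predict, which is why it is useful for comparing rules.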
[jira] [Resolved] (SPARK-34767) Spark Streaming reading Kafka data: how to resolve warnings and the program hanging in IDEA
[ https://issues.apache.org/jira/browse/SPARK-34767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34767. -- Resolution: Incomplete > Spark Streaming reading Kafka data: how to resolve warnings and the program hanging in IDEA > > > Key: SPARK-34767 > URL: https://issues.apache.org/jira/browse/SPARK-34767 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.1.1 >Reporter: MrYang >Priority: Blocker > > 2021-03-17 11:16:56,711 WARN --- [ main] > org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding > enable.auto.commit to false for executor > 2021-03-17 11:16:56,714 WARN --- [ main] > org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding > auto.offset.reset to none for executor > 2021-03-17 11:16:56,714 WARN --- [ main] > org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding > executor group.id to spark-executor-recommender > 2021-03-17 11:16:56,715 WARN --- [ main] > org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding > receive.buffer.bytes to 65536 see KAFKA-3135 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34767) Spark Streaming reading Kafka data: how to resolve warnings and the program hanging in IDEA
[ https://issues.apache.org/jira/browse/SPARK-34767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303082#comment-17303082 ] Hyukjin Kwon commented on SPARK-34767: -- 1. Please use English to communicate with other maintainers; many people don't speak Chinese. 2. Spark 2.1.x is EOL. > Spark Streaming reading Kafka data: how to resolve warnings and the program hanging in IDEA > > > Key: SPARK-34767 > URL: https://issues.apache.org/jira/browse/SPARK-34767 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.1.1 >Reporter: MrYang >Priority: Blocker > > 2021-03-17 11:16:56,711 WARN --- [ main] > org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding > enable.auto.commit to false for executor > 2021-03-17 11:16:56,714 WARN --- [ main] > org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding > auto.offset.reset to none for executor > 2021-03-17 11:16:56,714 WARN --- [ main] > org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding > executor group.id to spark-executor-recommender > 2021-03-17 11:16:56,715 WARN --- [ main] > org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding > receive.buffer.bytes to 65536 see KAFKA-3135 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag
[ https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34504. - Fix Version/s: 3.1.2 3.2.0 Resolution: Fixed Issue resolved by pull request 31853 [https://github.com/apache/spark/pull/31853] > Avoid unnecessary view resolving and remove the `performCheck` flag > --- > > Key: SPARK-34504 > URL: https://issues.apache.org/jira/browse/SPARK-34504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Linhong Liu >Priority: Major > Fix For: 3.2.0, 3.1.2 > > > In SPARK-34490, I added a `performCheck` flag to skip the analysis check when > resolving views, because some view resolution is unnecessary. We can avoid > these unnecessary view resolutions and remove the `performCheck` flag. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag
[ https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34504: --- Assignee: Wenchen Fan > Avoid unnecessary view resolving and remove the `performCheck` flag > --- > > Key: SPARK-34504 > URL: https://issues.apache.org/jira/browse/SPARK-34504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Linhong Liu >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.2.0, 3.1.2 > > > In SPARK-34490, I added a `performCheck` flag to skip the analysis check when > resolving views, because some view resolution is unnecessary. We can avoid > these unnecessary view resolutions and remove the `performCheck` flag. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34767) Spark Streaming reading Kafka data: how to resolve warnings and the program hanging in IDEA
MrYang created SPARK-34767: -- Summary: Spark Streaming reading Kafka data: how to resolve warnings and the program hanging in IDEA Key: SPARK-34767 URL: https://issues.apache.org/jira/browse/SPARK-34767 Project: Spark Issue Type: Bug Components: DStreams Affects Versions: 2.1.1 Reporter: MrYang 2021-03-17 11:16:56,711 WARN --- [ main] org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding enable.auto.commit to false for executor 2021-03-17 11:16:56,714 WARN --- [ main] org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding auto.offset.reset to none for executor 2021-03-17 11:16:56,714 WARN --- [ main] org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding executor group.id to spark-executor-recommender 2021-03-17 11:16:56,715 WARN --- [ main] org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding receive.buffer.bytes to 65536 see KAFKA-3135 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28220) join foldable condition not pushed down when parent filter is totally pushed down
[ https://issues.apache.org/jira/browse/SPARK-28220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303058#comment-17303058 ] Apache Spark commented on SPARK-28220: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/31857 > join foldable condition not pushed down when parent filter is totally pushed > down > - > > Key: SPARK-28220 > URL: https://issues.apache.org/jira/browse/SPARK-28220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 3.0.0 >Reporter: liupengcheng >Priority: Major > > We encountered an issue where join conditions were not pushed down when > running a Spark app on Spark 2.3. After carefully looking into the code and > debugging, we found a bug in the rule `PushPredicateThroughJoin`: > It tries to push the parent filter down through the join; however, when the > parent filter is wholly pushed down through the join, the join becomes the > top node, and the `transform` method then skips applying the rule to the join.
> > Suppose we have two tables: table1 and table2: > table1: (a: string, b: string, c: string) > table2: (d: string) > SQL: > > {code:java} > select * from table1 left join (select d, 'w1' as r from table2) on a = d and > r = 'w2' where b = 2{code} > > Let's focus on the following optimizer rules: > PushPredicateThroughJoin > FoldablePropagation > BooleanSimplification > PruneFilters > > In the above case, on the first iteration of these rules: > PushPredicateThroughJoin -> > {code:java} > select * from table1 where b=2 left join (select d, 'w1' as r from table2) on > a = d and r = 'w2' > {code} > FoldablePropagation -> > {code:java} > select * from table1 where b=2 left join (select d, 'w1' as r from table2) on > a = d and 'w1' = 'w2'{code} > BooleanSimplification -> > {code:java} > select * from table1 where b=2 left join (select d, 'w1' as r from table2) on > false{code} > PruneFilters -> no effect > > After several iterations of these rules, the join condition will still never > be pushed to the right-hand side of the left join. Thus, in some cases (e.g. a > large right table), the `BroadcastNestedLoopJoin` may be slow or OOM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
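The "transform skips the join" behavior described above can be illustrated with a toy model. This is not Catalyst's actual `TreeNode` API, only a sketch under the assumption that a top-down transform applies the rule to a node once and then recurses into the children of the result, so a node the rule itself creates at that position is never re-matched in the same pass.

```scala
// Toy plan tree with a top-down, apply-once-per-position transform.
sealed trait Plan {
  def transformDown(rule: PartialFunction[Plan, Plan]): Plan = {
    // Apply the rule at this position (at most once), then recurse
    // into the children of whatever the rule produced.
    val applied = if (rule.isDefinedAt(this)) rule(this) else this
    applied match {
      case Filter(cond, child) => Filter(cond, child.transformDown(rule))
      case Join(l, r, cond)    => Join(l.transformDown(rule), r.transformDown(rule), cond)
      case leaf                => leaf
    }
  }
}
case class Scan(name: String) extends Plan
case class Filter(cond: String, child: Plan) extends Plan
case class Join(left: Plan, right: Plan, cond: String) extends Plan

// A rule that pushes a whole Filter below a Join's left side. Once it
// fires, the new Join becomes the top node -- and the rule is not
// re-applied to it in this pass, mirroring the reported behavior.
val pushDown: PartialFunction[Plan, Plan] = {
  case Filter(cond, Join(l, r, jc)) => Join(Filter(cond, l), r, jc)
}
```

After one pass, the filter has moved below the join, but the join's own condition (the one containing the foldable predicate) sits untouched at the top.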
[jira] [Commented] (SPARK-28220) join foldable condition not pushed down when parent filter is totally pushed down
[ https://issues.apache.org/jira/browse/SPARK-28220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303057#comment-17303057 ] Apache Spark commented on SPARK-28220: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/31857 > join foldable condition not pushed down when parent filter is totally pushed > down > - > > Key: SPARK-28220 > URL: https://issues.apache.org/jira/browse/SPARK-28220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 3.0.0 >Reporter: liupengcheng >Priority: Major > > We encountered an issue where join conditions were not pushed down when > running a Spark app on Spark 2.3. After carefully looking into the code and > debugging, we found a bug in the rule `PushPredicateThroughJoin`: > It tries to push the parent filter down through the join; however, when the > parent filter is wholly pushed down through the join, the join becomes the > top node, and the `transform` method then skips applying the rule to the join.
> > Suppose we have two tables: table1 and table2: > table1: (a: string, b: string, c: string) > table2: (d: string) > SQL: > > {code:java} > select * from table1 left join (select d, 'w1' as r from table2) on a = d and > r = 'w2' where b = 2{code} > > Let's focus on the following optimizer rules: > PushPredicateThroughJoin > FoldablePropagation > BooleanSimplification > PruneFilters > > In the above case, on the first iteration of these rules: > PushPredicateThroughJoin -> > {code:java} > select * from table1 where b=2 left join (select d, 'w1' as r from table2) on > a = d and r = 'w2' > {code} > FoldablePropagation -> > {code:java} > select * from table1 where b=2 left join (select d, 'w1' as r from table2) on > a = d and 'w1' = 'w2'{code} > BooleanSimplification -> > {code:java} > select * from table1 where b=2 left join (select d, 'w1' as r from table2) on > false{code} > PruneFilters -> no effect > > After several iterations of these rules, the join condition will still never > be pushed to the right-hand side of the left join. Thus, in some cases (e.g. a > large right table), the `BroadcastNestedLoopJoin` may be slow or OOM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34762) Many PR's Scala 2.13 build action failed
[ https://issues.apache.org/jira/browse/SPARK-34762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303053#comment-17303053 ] Yang Jie commented on SPARK-34762: -- [~dongjoon] It seems that the problem still exists, such as https://github.com/apache/spark/pull/31856, [https://github.com/apache/spark/pull/31855] and https://github.com/apache/spark/pull/31854 > Many PR's Scala 2.13 build action failed > > > Key: SPARK-34762 > URL: https://issues.apache.org/jira/browse/SPARK-34762 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Major > > PR with Scala 2.13 build failure includes > * [https://github.com/apache/spark/pull/31849] > * [https://github.com/apache/spark/pull/31848] > * [https://github.com/apache/spark/pull/31844] > * [https://github.com/apache/spark/pull/31843] > * https://github.com/apache/spark/pull/31841 > {code:java} > [error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1: > error: package org.apache.commons.cli does not exist > 1278[error] import org.apache.commons.cli.GnuParser; > 1279[error] ^ > 1280[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: > error: cannot find symbol > 1281[error] private final Options options = new Options(); > 1282[error] ^ symbol: class Options > 1283[error] location: class ServerOptionsProcessor > 1284[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1: > error: package org.apache.commons.cli does not exist > 1285[error] private org.apache.commons.cli.CommandLine commandLine; > 1286[error] ^ > 1287[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1: > error: cannot find symbol > 1288[error] HelpOptionExecutor(String 
serverName, Options options) { > 1289[error] ^ symbol: class > Options > 1290[error] location: class HelpOptionExecutor > 1291[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: > error: cannot find symbol > 1292[error] private final Options options = new Options(); > 1293[error] ^ symbol: class Options > 1294[error] location: class ServerOptionsProcessor > 1295[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1: > error: cannot find symbol > 1296[error] options.addOption(OptionBuilder > 1297[error] ^ symbol: variable OptionBuilder > 1298[error] location: class ServerOptionsProcessor > 1299[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1: > error: cannot find symbol > 1300[error] options.addOption(new Option("H", "help", false, "Print > help information")); > 1301[error] ^ symbol: class Option > 1302[error] location: class ServerOptionsProcessor > 1303[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1: > error: cannot find symbol > 1304[error] commandLine = new GnuParser().parse(options, argv); > 1305[error] ^ symbol: class GnuParser > 1306[error] location: class ServerOptionsProcessor > 1307[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1: > error: cannot find symbol > 1308[error] } catch (ParseException e) { > 1309[error]^ symbol: class ParseException > 1310[error] location: class ServerOptionsProcessor > 1311[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1: > error: cannot find symbol > 1312[error] new HelpFormatter().printHelp(serverName, options); > 1313[error] ^ symbol: class HelpFormatter 
> 1314[error] location: class HelpOptionExecutor > 1315[error] Note: Some input files use or override a deprecated API. > 1316[error] Note: Recompile with -Xlint:deprecation for details. > 1317[error] 16 errors > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional
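The errors above all come from `org.apache.commons.cli` symbols not resolving in the hive-thriftserver module under the Scala 2.13 profile. A plausible remedy (a sketch, not the actual fix merged for this ticket; Spark itself builds with Maven, and the version shown is illustrative) is to declare the dependency explicitly rather than relying on a transitive path:

```scala
// Hypothetical sbt sketch: make commons-cli a direct dependency of the
// module so the Scala 2.13 build no longer depends on a transitive path.
libraryDependencies += "commons-cli" % "commons-cli" % "1.4"
```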
[jira] [Commented] (SPARK-34766) Do not capture maven config for views
[ https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303022#comment-17303022 ] Apache Spark commented on SPARK-34766: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/31856 > Do not capture maven config for views > - > > Key: SPARK-34766 > URL: https://issues.apache.org/jira/browse/SPARK-34766 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > Due to bad network conditions, we often use a third-party Maven repo to run > tests, e.g., > {code:java} > build/sbt "test:testOnly *SQLQueryTestSuite" > -Dspark.sql.maven.additionalRemoteRepositories=x > {code} > > It fails with an error message like: > ``` > [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds) > [info] show-tblproperties.sql > [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][ > [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" > Result did not match for query #6 > [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464) > ``` > It's not necessary to capture the Maven config in the view since it's a > session-level config. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
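The proposed behavior can be sketched as a filter over the configs captured into view properties. This is a toy model, not Spark's actual view-creation code: the object name and the explicit denylist are hypothetical (a real implementation would consult the config's registration metadata rather than hard-code names).

```scala
// Toy sketch: when storing session SQL configs into view properties,
// skip configs that are session-level only, such as the Maven
// repository setting that caused the test failure above.
object ViewConfigCapture {
  // Hypothetical denylist for illustration.
  val sessionOnlyConfigs: Set[String] =
    Set("spark.sql.maven.additionalRemoteRepositories")

  def capture(sessionConfs: Map[String, String]): Map[String, String] =
    sessionConfs.filterNot { case (k, _) => sessionOnlyConfigs.contains(k) }
}
```

With this, `SHOW TBLPROPERTIES` on a view would no longer echo the test environment's Maven repository back as a view property.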
[jira] [Assigned] (SPARK-34766) Do not capture maven config for views
[ https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34766: Assignee: Apache Spark > Do not capture maven config for views > - > > Key: SPARK-34766 > URL: https://issues.apache.org/jira/browse/SPARK-34766 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Assignee: Apache Spark >Priority: Minor > > Due to the bad network, we always use the thirdparty maven repo to run test. > e.g., > {code:java} > build/sbt "test:testOnly *SQLQueryTestSuite" > -Dspark.sql.maven.additionalRemoteRepositories=x > {code} > > It's failed with such error msg > ``` > [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds) > [info] show-tblproperties.sql > [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][ > [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" > Result did not match for query #6 > [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464) > ``` > It's not necessary to capture the maven config to view since it's a session > level config. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34766) Do not capture maven config for views
[ https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34766: Assignee: (was: Apache Spark) > Do not capture maven config for views > - > > Key: SPARK-34766 > URL: https://issues.apache.org/jira/browse/SPARK-34766 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > Due to the bad network, we always use the thirdparty maven repo to run test. > e.g., > {code:java} > build/sbt "test:testOnly *SQLQueryTestSuite" > -Dspark.sql.maven.additionalRemoteRepositories=x > {code} > > It's failed with such error msg > ``` > [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds) > [info] show-tblproperties.sql > [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][ > [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" > Result did not match for query #6 > [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464) > ``` > It's not necessary to capture the maven config to view since it's a session > level config. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34766) Do not capture maven config for views
[ https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-34766: Description: Due to an unreliable network, we often use a third-party Maven repo to run tests, e.g., {code:java} build/sbt "test:testOnly *SQLQueryTestSuite" -Dspark.sql.maven.additionalRemoteRepositories=x {code} This fails with the following error message: ``` [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds) [info] show-tblproperties.sql [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][ [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" Result did not match for query #6 [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464) ``` It's not necessary to capture the Maven config in views, since it's a session-level config. > Do not capture maven config for views > - > > Key: SPARK-34766 > URL: https://issues.apache.org/jira/browse/SPARK-34766 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > Due to an unreliable network, we often use a third-party Maven repo to run tests, > e.g., > {code:java} > build/sbt "test:testOnly *SQLQueryTestSuite" > -Dspark.sql.maven.additionalRemoteRepositories=x > {code} > > This fails with the following error message: > ``` > [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds) > [info] show-tblproperties.sql > [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][ > [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" > Result did not match for query #6 > [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464) > ``` > It's not necessary to capture the Maven config in views, since it's a session-level > config. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34766) Do not capture maven config for views
[ https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-34766: Environment: (was: Due to the bad network, we always use the thirdparty maven repo to run test. e.g., {code:java} build/sbt "test:testOnly *SQLQueryTestSuite" -Dspark.sql.maven.additionalRemoteRepositories=x {code} It's failed with such error msg ``` [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds) [info] show-tblproperties.sql [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][ [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" Result did not match for query #6 [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464) ``` It's not necessary to capture the maven config to view since it's a session level config. ) > Do not capture maven config for views > - > > Key: SPARK-34766 > URL: https://issues.apache.org/jira/browse/SPARK-34766 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34766) Do not capture maven config for views
ulysses you created SPARK-34766: --- Summary: Do not capture maven config for views Key: SPARK-34766 URL: https://issues.apache.org/jira/browse/SPARK-34766 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Environment: Due to an unreliable network, we often use a third-party Maven repo to run tests, e.g., {code:java} build/sbt "test:testOnly *SQLQueryTestSuite" -Dspark.sql.maven.additionalRemoteRepositories=x {code} This fails with the following error message: ``` [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds) [info] show-tblproperties.sql [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][ [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" Result did not match for query #6 [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464) ``` It's not necessary to capture the Maven config in views, since it's a session-level config. Reporter: ulysses you -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
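[Editor's note] To illustrate the fix the ticket asks for: when Spark creates a view it snapshots SQL configs into the view's properties so the view re-resolves consistently later, and the proposal is to exclude session-level settings such as the Maven repo config from that snapshot. A minimal, hypothetical Python sketch of such a filter (the names `EXCLUDED_PREFIXES` and `configs_to_capture` are illustrative, not Spark's actual internals, which are in Scala):

```python
# Hypothetical sketch of excluding session-level configs from view capture.
# Spark's real implementation lives in Scala; all names here are illustrative.
EXCLUDED_PREFIXES = ("spark.sql.maven.",)

def configs_to_capture(session_confs):
    """Return only the configs worth storing in a view's properties."""
    return {
        key: value
        for key, value in session_confs.items()
        if not any(key.startswith(prefix) for prefix in EXCLUDED_PREFIXES)
    }
```

With a filter like this, `spark.sql.maven.additionalRemoteRepositories` would no longer leak into the `SHOW TBLPROPERTIES` output quoted in the ticket.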
[jira] [Updated] (SPARK-34765) Linear Models standardization optimization
[ https://issues.apache.org/jira/browse/SPARK-34765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34765: - Description: The existing implementation of standardization in linear models does *NOT* center the vectors by subtracting the means, in order to keep the dataset sparse. However, this causes features with small variance to be scaled to large values, and underlying solvers like LBFGS cannot handle this case efficiently; see SPARK-34448 for details. If the internal vectors are centered (as in other well-known implementations, e.g. GLMNET/scikit-learn), convergence is much better: in the case from SPARK-34448, the number of iterations to convergence drops from 93 to 6, and the final solution is much closer to the one from GLMNET. Luckily, we found a new way to 'virtually' center the vectors without densifying the dataset, iff: 1, fitIntercept is true; 2, there is no penalty on the intercept (this seems to always be true in existing impls); 3, there are no bounds on the intercept. We will also need to check whether this new method works as expected in all other linear models (i.e., mlor/svc/lir/aft, etc.) and introduce it into those models where possible. was: Existing impl of standardization in linear models do NOT center the vectors by removing the means, for the purpose of keep the dataset sparsity. However, this will cause feature values with small var be scaled to large values, and underlying solver like LBFGS can not efficiently handle this case. see SPARK-34448 for details. If internal vectors are centers (like other famous impl, i.e. GLMNET/Scikit-Learn), the convergence ratio will be better. In the case in SPARK-34448, the number of iteration to convergence will be reduced from 93 to 6. Moreover, the final solution is much more close to the one in GLMNET. 
luckily, we find a new way to 'virtually' center the vectors without densifying the dataset, iff: 1, fitIntercept is true; 2, no penalty on the intercept, it seem this is always true in existing impls; 3, no bounds on the intercept; We will also need to check whether this new methods work in all other linear models (i.e, mlor/svc/lir/aft, etc.) as we expected , and introduce it into those model if possible. > Linear Models standardization optimization > -- > > Key: SPARK-34765 > URL: https://issues.apache.org/jira/browse/SPARK-34765 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: zhengruifeng >Priority: Major > > The existing implementation of standardization in linear models does *NOT* center the > vectors by subtracting the means, in order to keep the dataset sparse. > However, this causes features with small variance to be scaled to large > values, and underlying solvers like LBFGS cannot handle this case > efficiently; see SPARK-34448 for details. > If the internal vectors are centered (as in other well-known implementations, e.g. > GLMNET/scikit-learn), convergence is much better: in the case from > SPARK-34448, the number of iterations to convergence drops from 93 > to 6, and the final solution is much closer to the one from GLMNET. > Luckily, we found a new way to 'virtually' center the vectors without > densifying the dataset, iff: > 1, fitIntercept is true; > 2, there is no penalty on the intercept (this seems to always be true in existing > impls); > 3, there are no bounds on the intercept. > > We will also need to check whether this new method works as expected in all other > linear models (i.e., mlor/svc/lir/aft, etc.) and introduce it into > those models where possible. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
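[Editor's note] The scaling problem described in the ticket is easy to demonstrate numerically: a feature with a large mean but tiny variance, scaled by its standard deviation without centering, explodes in magnitude, while centering first keeps it well-conditioned. A small standalone Python illustration (not Spark code):

```python
import statistics as st

# A feature with a large mean (~1000) but tiny variance.
x = [1000.01, 1000.02, 1000.00, 999.99, 999.98]
mu, sd = st.mean(x), st.pstdev(x)

# Scaling without centering (what the current impl does to preserve sparsity):
# values blow up to ~70,000, which solvers like LBFGS handle poorly.
scaled_only = [v / sd for v in x]

# Centering then scaling (GLMNET / scikit-learn style): values stay near zero.
centered_scaled = [(v - mu) / sd for v in x]
```

This is exactly the conditioning issue SPARK-34448 attributes the 93-vs-6 iteration gap to.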
[jira] [Updated] (SPARK-34765) Linear Models standardization optimization
[ https://issues.apache.org/jira/browse/SPARK-34765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-34765: - Issue Type: Umbrella (was: Improvement) > Linear Models standardization optimization > -- > > Key: SPARK-34765 > URL: https://issues.apache.org/jira/browse/SPARK-34765 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: zhengruifeng >Priority: Major > > Existing impl of standardization in linear models do NOT center the vectors > by removing the means, for the purpose of keep the dataset sparsity. > However, this will cause feature values with small var be scaled to large > values, and underlying solver like LBFGS can not efficiently handle this > case. see SPARK-34448 for details. > If internal vectors are centers (like other famous impl, i.e. > GLMNET/Scikit-Learn), the convergence ratio will be better. In the case in > SPARK-34448, the number of iteration to convergence will be reduced from 93 > to 6. Moreover, the final solution is much more close to the one in GLMNET. > luckily, we find a new way to 'virtually' center the vectors without > densifying the dataset, iff: > 1, fitIntercept is true; > 2, no penalty on the intercept, it seem this is always true in existing impls; > 3, no bounds on the intercept; > > We will also need to check whether this new methods work in all other linear > models (i.e, mlor/svc/lir/aft, etc.) as we expected , and introduce it into > those model if possible. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34714) collect_list(struct()) fails when used with GROUP BY
[ https://issues.apache.org/jira/browse/SPARK-34714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-34714. -- Fix Version/s: 3.1.2 Resolution: Fixed > collect_list(struct()) fails when used with GROUP BY > > > Key: SPARK-34714 > URL: https://issues.apache.org/jira/browse/SPARK-34714 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Priority: Major > Fix For: 3.1.2 > > > The following is failing in DBR8.0 / Spark 3.1.1, but works in earlier DBR > and Spark versions: > {quote}with step_1 as ( > select 'E' as name, named_struct('subfield', 1) as field_1 > ) > select name, collect_list(struct(field_1.subfield)) > from step_1 > group by 1 > {quote} > Fails with the following error message: > {quote}AnalysisException: cannot resolve > 'struct(step_1.`field_1`.`subfield`)' due to data type mismatch: Only > foldable string expressions are allowed to appear at odd position, got: > NamePlaceholder > {quote} > If you modify the query in any of the following ways then it still works:: > * if you remove the field "name" and the "group by 1" part of the query > * if you remove the "struct()" from within the collect_list() > * if you use "named_struct()" instead of "struct()" within the collect_list() > Similarly collect_set() is broken and possibly more related functions, but I > haven't done thorough testing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34714) collect_list(struct()) fails when used with GROUP BY
[ https://issues.apache.org/jira/browse/SPARK-34714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303009#comment-17303009 ] Takeshi Yamamuro commented on SPARK-34714: -- Ah, I've checked the latest branch-3.1 again and I found the issue goes away. So, this issue will be resolved in v3.1.2. > collect_list(struct()) fails when used with GROUP BY > > > Key: SPARK-34714 > URL: https://issues.apache.org/jira/browse/SPARK-34714 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Priority: Major > > The following is failing in DBR8.0 / Spark 3.1.1, but works in earlier DBR > and Spark versions: > {quote}with step_1 as ( > select 'E' as name, named_struct('subfield', 1) as field_1 > ) > select name, collect_list(struct(field_1.subfield)) > from step_1 > group by 1 > {quote} > Fails with the following error message: > {quote}AnalysisException: cannot resolve > 'struct(step_1.`field_1`.`subfield`)' due to data type mismatch: Only > foldable string expressions are allowed to appear at odd position, got: > NamePlaceholder > {quote} > If you modify the query in any of the following ways then it still works:: > * if you remove the field "name" and the "group by 1" part of the query > * if you remove the "struct()" from within the collect_list() > * if you use "named_struct()" instead of "struct()" within the collect_list() > Similarly collect_set() is broken and possibly more related functions, but I > haven't done thorough testing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
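[Editor's note] For reference, the failing query from the ticket and the reporter's `named_struct()` workaround side by side (the workaround variant is a sketch of the reported fix, not independently verified):

```sql
-- Fails on Spark 3.1.1 with "Only foldable string expressions are allowed
-- to appear at odd position, got: NamePlaceholder":
WITH step_1 AS (
  SELECT 'E' AS name, named_struct('subfield', 1) AS field_1
)
SELECT name, collect_list(struct(field_1.subfield))
FROM step_1
GROUP BY 1;

-- Reported workaround: spell out the field name with named_struct():
WITH step_1 AS (
  SELECT 'E' AS name, named_struct('subfield', 1) AS field_1
)
SELECT name, collect_list(named_struct('subfield', field_1.subfield))
FROM step_1
GROUP BY 1;
```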
[jira] [Created] (SPARK-34765) Linear Models standardization optimization
zhengruifeng created SPARK-34765: Summary: Linear Models standardization optimization Key: SPARK-34765 URL: https://issues.apache.org/jira/browse/SPARK-34765 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.1.1, 3.2.0 Reporter: zhengruifeng The existing implementation of standardization in linear models does NOT center the vectors by subtracting the means, in order to keep the dataset sparse. However, this causes features with small variance to be scaled to large values, and underlying solvers like LBFGS cannot handle this case efficiently; see SPARK-34448 for details. If the internal vectors are centered (as in other well-known implementations, e.g. GLMNET/scikit-learn), convergence is much better: in the case from SPARK-34448, the number of iterations to convergence drops from 93 to 6, and the final solution is much closer to the one from GLMNET. Luckily, we found a new way to 'virtually' center the vectors without densifying the dataset, iff: 1, fitIntercept is true; 2, there is no penalty on the intercept (this seems to always be true in existing impls); 3, there are no bounds on the intercept. We will also need to check whether this new method works as expected in all other linear models (i.e., mlor/svc/lir/aft, etc.) and introduce it into those models where possible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303001#comment-17303001 ] Nivas Umapathy edited comment on SPARK-34751 at 3/17/21, 12:53 AM: --- the schema is extracted from the same file, before materializing the data df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet') ^ by schema I meant this. The file was written out from pandas dataframe. It was written out using this {{}} {{import pandas as pd}} {{df = pd.DataFrame(}} {{data = {}} {{"COL 1": [87.0921538,41.26487033,1.731626487,99.02779887,80.750347,62.89799664,84.27772144,12.78995399,58.13994625,54.51677768],}} {{"COL,2": [28.431596,13.50141322,28.60878912,50.59429628,53.77345338,3.278319754,89.88524435,57.29173215,34.75955608,22.50907852],}} {{"COL;3": [48.12359525,2.751809433,64.45305108,40.97279762,46.3506431,68.57561523,67.52866381,18.70752371,44.86086801,8.42884315],}} {{"COL{4": [25.23141131,65.20640894,56.83503264,21.46097087,59.22963758,99.55784318,8.02616508,75.29924438,3.911268106,90.1820556],}} {{"COL}5": [37.00662369,82.24478025,27.89576774,9.549598639,46.92239754,10.48954042,81.71312268,49.991685,43.78556399,79.00133828],}} {{"COL(6": [71.21354798,82.33860851,12.88393027,23.47301417,76.36836392,18.43024893,51.48770487,93.20889954,72.66516434,18.07311939],}} {{"COL)7": [68.00032082,39.91265109,83.47701751,42.71072597,33.54784094,94.63751895,3.364241739,0.792257736,78.63395232,70.8626348],}} {{"COL\n8": [77.80604836,61.08923308,70.70871195,99.33277829,79.77837072,56.28812485,34.03977847,13.40720489,87.71281052,64.80060217],}} {{"COL=9": [60.00505851,46.51367893,87.1346726,7.202332939,49.50378799,56.70949031,99.39792697,52.08074715,18.25891755,67.88110289],}} {{"COL\t10": [78.60259718,96.87558507,20.04134901,80.46408956,69.97610739,42.96954652,22.45733464,32.00411095,52.83023296,87.48870904],}} {{}, columns = ["COL 
1","COL,2","COL;3","COL\\{4","COL}5","COL(6","COL)7","COL\n8","COL=9","COL\t10"])}} {{df.to_parquet('invalid_columns_double.parquet')}} Here is a link to my databricks notebook to reproduce this [https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html] was (Author: toocoolblue2000): the schema is extracted from the same file, before materializing the data df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet') ^ by schema I meant this. The file was written out from pandas dataframe. It was written out using this {{}} {{import pandas as pd}} {{df = pd.DataFrame(}} {{data = {}} {{"COL 1": [87.0921538,41.26487033,1.731626487,99.02779887,80.750347,62.89799664,84.27772144,12.78995399,58.13994625,54.51677768],}} {{"COL,2": [28.431596,13.50141322,28.60878912,50.59429628,53.77345338,3.278319754,89.88524435,57.29173215,34.75955608,22.50907852],}} {{"COL;3": [48.12359525,2.751809433,64.45305108,40.97279762,46.3506431,68.57561523,67.52866381,18.70752371,44.86086801,8.42884315],}} {{"COL{4": [25.23141131,65.20640894,56.83503264,21.46097087,59.22963758,99.55784318,8.02616508,75.29924438,3.911268106,90.1820556],}} {{"COL}5": [37.00662369,82.24478025,27.89576774,9.549598639,46.92239754,10.48954042,81.71312268,49.991685,43.78556399,79.00133828],}} {{"COL(6": [71.21354798,82.33860851,12.88393027,23.47301417,76.36836392,18.43024893,51.48770487,93.20889954,72.66516434,18.07311939],}} {{"COL)7": [68.00032082,39.91265109,83.47701751,42.71072597,33.54784094,94.63751895,3.364241739,0.792257736,78.63395232,70.8626348],}} {{"COL\n8": [77.80604836,61.08923308,70.70871195,99.33277829,79.77837072,56.28812485,34.03977847,13.40720489,87.71281052,64.80060217],}} {{"COL=9": [60.00505851,46.51367893,87.1346726,7.202332939,49.50378799,56.70949031,99.39792697,52.08074715,18.25891755,67.88110289],}} {{"COL\t10": 
[78.60259718,96.87558507,20.04134901,80.46408956,69.97610739,42.96954652,22.45733464,32.00411095,52.83023296,87.48870904],}} {{}, columns = ["COL 1","COL,2","COL;3","COL\{4","COL}5","COL(6","COL)7","COL\n8","COL=9","COL\t10"])}} {{df.to_parquet('invalid_columns_double.parquet')}} Here is a link to my databricks notebook [https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html] > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >
[jira] [Comment Edited] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303001#comment-17303001 ] Nivas Umapathy edited comment on SPARK-34751 at 3/17/21, 12:52 AM: --- the schema is extracted from the same file, before materializing the data df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet') ^ by schema I meant this. The file was written out from pandas dataframe. It was written out using this {{}} {{import pandas as pd}} {{df = pd.DataFrame(}} {{data = {}} {{"COL 1": [87.0921538,41.26487033,1.731626487,99.02779887,80.750347,62.89799664,84.27772144,12.78995399,58.13994625,54.51677768],}} {{"COL,2": [28.431596,13.50141322,28.60878912,50.59429628,53.77345338,3.278319754,89.88524435,57.29173215,34.75955608,22.50907852],}} {{"COL;3": [48.12359525,2.751809433,64.45305108,40.97279762,46.3506431,68.57561523,67.52866381,18.70752371,44.86086801,8.42884315],}} {{"COL{4": [25.23141131,65.20640894,56.83503264,21.46097087,59.22963758,99.55784318,8.02616508,75.29924438,3.911268106,90.1820556],}} {{"COL}5": [37.00662369,82.24478025,27.89576774,9.549598639,46.92239754,10.48954042,81.71312268,49.991685,43.78556399,79.00133828],}} {{"COL(6": [71.21354798,82.33860851,12.88393027,23.47301417,76.36836392,18.43024893,51.48770487,93.20889954,72.66516434,18.07311939],}} {{"COL)7": [68.00032082,39.91265109,83.47701751,42.71072597,33.54784094,94.63751895,3.364241739,0.792257736,78.63395232,70.8626348],}} {{"COL\n8": [77.80604836,61.08923308,70.70871195,99.33277829,79.77837072,56.28812485,34.03977847,13.40720489,87.71281052,64.80060217],}} {{"COL=9": [60.00505851,46.51367893,87.1346726,7.202332939,49.50378799,56.70949031,99.39792697,52.08074715,18.25891755,67.88110289],}} {{"COL\t10": [78.60259718,96.87558507,20.04134901,80.46408956,69.97610739,42.96954652,22.45733464,32.00411095,52.83023296,87.48870904],}} {{}, columns = ["COL 
1","COL,2","COL;3","COL\{4","COL}5","COL(6","COL)7","COL\n8","COL=9","COL\t10"])}} {{df.to_parquet('invalid_columns_double.parquet')}} Here is a link to my databricks notebook [https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html] was (Author: toocoolblue2000): the schema is extracted from the same file, before materializing the data df = glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet') ^ by schema I meant this. The file was written out from pandas dataframe. Here is a link to my databricks notebook https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3, 3.1.1 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". 
Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303001#comment-17303001 ] Nivas Umapathy commented on SPARK-34751: the schema is extracted from the same file, before materializing the data {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} ^ by schema I meant this. The file was written out from a pandas dataframe. Here is a link to my databricks notebook https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3, 3.1.1 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. 
So i did > {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
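[Editor's note] The renames in the report can be generalized: the quoted `AnalysisException` lists the characters Spark's Parquet path rejects in attribute names (space, `,`, `;`, `{`, `}`, `(`, `)`, newline, tab, `=`). A small hypothetical helper (not part of Spark or Glue) that performs the same sanitization the reporter did by hand:

```python
import re

# Characters rejected in Parquet attribute names, per the AnalysisException
# quoted above: space , ; { } ( ) newline tab =
INVALID_PARQUET_CHARS = r"[ ,;{}()\n\t=]"

def sanitize(name: str) -> str:
    """Replace each invalid character with an underscore (hypothetical helper)."""
    return re.sub(INVALID_PARQUET_CHARS, "_", name)
```

Applied as `df.toDF(*[sanitize(c) for c in df.columns])`, this replaces the chain of `withColumnRenamed` calls; note it does not address the null-doubles symptom itself, which appears only when re-reading the file with the cleaned schema.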
[jira] [Commented] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302988#comment-17302988 ] Takeshi Yamamuro commented on SPARK-34751: -- Could you describe in more detail how to reproduce the issue? What's the schema of the parquet file, and how did you write it? > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3, 3.1.1 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-34751: - Affects Version/s: 3.1.1 > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3, 3.1.1 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302672#comment-17302672 ] Nivas Umapathy edited comment on SPARK-34751 at 3/16/21, 7:56 PM: -- I ran it on 3.1.1 and it still has the same problem. All double column values are null was (Author: toocoolblue2000): I ran it on 3.1.1 and it still has the same problem. All column values are null > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. 
The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25935) Prevent null rows from JSON parser
[ https://issues.apache.org/jira/browse/SPARK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302796#comment-17302796 ] Dongjoon Hyun commented on SPARK-25935: --- Please let me know if the previous status, `Resolution = Won't Fix` and `Fix Version = 3.0.0`, was correct. > Prevent null rows from JSON parser > -- > > Key: SPARK-25935 > URL: https://issues.apache.org/jira/browse/SPARK-25935 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Max Gekk >Priority: Minor > > Currently, JSON parser can produce nulls if it cannot detect any valid JSON > token on the root level, see > https://github.com/apache/spark/blob/4d6704db4d490bd1830ed3c757525f41058523e0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L402 > . As a consequence of that, the from_json() function can produce null in the > PERMISSIVE mode. To prevent that, need to throw an exception which should > treat as a bad record and handled according specified mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25935) Prevent null rows from JSON parser
[ https://issues.apache.org/jira/browse/SPARK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302792#comment-17302792 ] Dongjoon Hyun edited comment on SPARK-25935 at 3/16/21, 6:38 PM: - I removed the `Fix Version = 3.0.0` from this issue because this was reverted and resolved as `Won't Fix`. was (Author: dongjoon): I removed the `Fix Version = 3.0.0` from this issue because this is reverted and resolved as `Won't Fix`. > Prevent null rows from JSON parser > -- > > Key: SPARK-25935 > URL: https://issues.apache.org/jira/browse/SPARK-25935 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Max Gekk >Priority: Minor > > Currently, JSON parser can produce nulls if it cannot detect any valid JSON > token on the root level, see > https://github.com/apache/spark/blob/4d6704db4d490bd1830ed3c757525f41058523e0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L402 > . As a consequence of that, the from_json() function can produce null in the > PERMISSIVE mode. To prevent that, need to throw an exception which should > treat as a bad record and handled according specified mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25935) Prevent null rows from JSON parser
[ https://issues.apache.org/jira/browse/SPARK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302792#comment-17302792 ] Dongjoon Hyun commented on SPARK-25935: --- I removed the `Fix Version = 3.0.0` from this issue because this is reverted and resolved as `Won't Fix`. > Prevent null rows from JSON parser > -- > > Key: SPARK-25935 > URL: https://issues.apache.org/jira/browse/SPARK-25935 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Max Gekk >Priority: Minor > > Currently, JSON parser can produce nulls if it cannot detect any valid JSON > token on the root level, see > https://github.com/apache/spark/blob/4d6704db4d490bd1830ed3c757525f41058523e0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L402 > . As a consequence of that, the from_json() function can produce null in the > PERMISSIVE mode. To prevent that, need to throw an exception which should > treat as a bad record and handled according specified mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25935) Prevent null rows from JSON parser
[ https://issues.apache.org/jira/browse/SPARK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25935: -- Fix Version/s: (was: 3.0.0) > Prevent null rows from JSON parser > -- > > Key: SPARK-25935 > URL: https://issues.apache.org/jira/browse/SPARK-25935 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Max Gekk >Priority: Minor > > Currently, JSON parser can produce nulls if it cannot detect any valid JSON > token on the root level, see > https://github.com/apache/spark/blob/4d6704db4d490bd1830ed3c757525f41058523e0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L402 > . As a consequence of that, the from_json() function can produce null in the > PERMISSIVE mode. To prevent that, need to throw an exception which should > treat as a bad record and handled according specified mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
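[Editor's note] The behavior SPARK-25935 describes — and the change it proposed — can be sketched in plain Python using the standard `json` module. This is only an analogy for Spark's Jackson-based parser, not its actual code: today an unparseable root token yields a null row; the proposal was to raise instead, so the record is handled as a bad record according to the configured parse mode.

```python
import json

def parse_permissive(line):
    """Current-behavior sketch: silently yield None (a null row)
    when the line holds no valid JSON root token."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None

def parse_strict(line):
    """Proposed-behavior sketch: raise, so the caller can treat the
    line as a bad record according to the configured parse mode."""
    try:
        return json.loads(line)
    except json.JSONDecodeError as e:
        raise ValueError(f'bad record: {line!r}') from e

print(parse_permissive('???'))    # None -- the null row the issue objects to
print(parse_strict('{"a": 1}'))   # {'a': 1}
```

Note the issue was ultimately resolved as Won't Fix, so the permissive variant reflects Spark's shipped behavior.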
[jira] [Commented] (SPARK-34738) Upgrade Minikube and kubernetes cluster version on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-34738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302777#comment-17302777 ] Shane Knapp commented on SPARK-34738: - i'll be doing this next tuesday (3/23) and teaching one of my sysadmins to help out. > Upgrade Minikube and kubernetes cluster version on Jenkins > -- > > Key: SPARK-34738 > URL: https://issues.apache.org/jira/browse/SPARK-34738 > Project: Spark > Issue Type: Task > Components: jenkins, Kubernetes >Affects Versions: 3.2.0 >Reporter: Attila Zsolt Piros >Assignee: Shane Knapp >Priority: Major > > [~shaneknapp] as we discussed [on the mailing > list|http://apache-spark-developers-list.1001551.n3.nabble.com/minikube-and-kubernetes-cluster-versions-for-integration-testing-td30856.html] > Minikube can be upgraded to the latest (v1.18.1) and kubernetes version > should be v1.17.3 (`minikube config set kubernetes-version v1.17.3`). > [Here|https://github.com/apache/spark/pull/31829] is my PR which uses a new > method to configure the kubernetes client. Thanks in advance to use it for > testing on the Jenkins after the Minikube version is updated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34764) Propagate reason for executor loss to the UI
Holden Karau created SPARK-34764: Summary: Propagate reason for executor loss to the UI Key: SPARK-34764 URL: https://issues.apache.org/jira/browse/SPARK-34764 Project: Spark Issue Type: Improvement Components: Kubernetes, Spark Core Affects Versions: 3.2.0 Reporter: Holden Karau When the external cluster manager terminates an executor we should propagate this information to the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34761) Add a day-time interval to a timestamp
[ https://issues.apache.org/jira/browse/SPARK-34761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302731#comment-17302731 ] Apache Spark commented on SPARK-34761: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31855 > Add a day-time interval to a timestamp > -- > > Key: SPARK-34761 > URL: https://issues.apache.org/jira/browse/SPARK-34761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Support adding of DayTimeIntervalType values to TIMESTAMP values. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34761) Add a day-time interval to a timestamp
[ https://issues.apache.org/jira/browse/SPARK-34761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34761: Assignee: Max Gekk (was: Apache Spark) > Add a day-time interval to a timestamp > -- > > Key: SPARK-34761 > URL: https://issues.apache.org/jira/browse/SPARK-34761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Support adding of DayTimeIntervalType values to TIMESTAMP values. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34761) Add a day-time interval to a timestamp
[ https://issues.apache.org/jira/browse/SPARK-34761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34761: Assignee: Apache Spark (was: Max Gekk) > Add a day-time interval to a timestamp > -- > > Key: SPARK-34761 > URL: https://issues.apache.org/jira/browse/SPARK-34761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > Support adding of DayTimeIntervalType values to TIMESTAMP values. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
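[Editor's note] The semantics SPARK-34761 adds — TIMESTAMP plus a DayTimeIntervalType value — have a close analogue in Python's `datetime`/`timedelta` arithmetic; a day-time interval covers days down to (micro)seconds, which is exactly `timedelta`'s range of units. This is an illustrative analogy, not Spark code:

```python
from datetime import datetime, timedelta

ts = datetime(2021, 3, 16, 12, 0, 0)
interval = timedelta(days=1, hours=2, minutes=30)  # a day-time interval

print(ts + interval)  # 2021-03-17 14:30:00
```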
[jira] [Commented] (SPARK-34762) Many PR's Scala 2.13 build action failed
[ https://issues.apache.org/jira/browse/SPARK-34762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302729#comment-17302729 ] Dongjoon Hyun commented on SPARK-34762: --- Thank you for reporting, [~LuciferYang] . There was GitHub Action outage and BinTray outage yesterday. I guess it's recovered now. Do you see the failures still? > Many PR's Scala 2.13 build action failed > > > Key: SPARK-34762 > URL: https://issues.apache.org/jira/browse/SPARK-34762 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Major > > PR with Scala 2.13 build failure includes > * [https://github.com/apache/spark/pull/31849] > * [https://github.com/apache/spark/pull/31848] > * [https://github.com/apache/spark/pull/31844] > * [https://github.com/apache/spark/pull/31843] > * https://github.com/apache/spark/pull/31841 > {code:java} > [error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1: > error: package org.apache.commons.cli does not exist > 1278[error] import org.apache.commons.cli.GnuParser; > 1279[error] ^ > 1280[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: > error: cannot find symbol > 1281[error] private final Options options = new Options(); > 1282[error] ^ symbol: class Options > 1283[error] location: class ServerOptionsProcessor > 1284[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1: > error: package org.apache.commons.cli does not exist > 1285[error] private org.apache.commons.cli.CommandLine commandLine; > 1286[error] ^ > 1287[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1: > error: cannot find symbol > 1288[error] HelpOptionExecutor(String serverName, Options options) { > 
1289[error] ^ symbol: class > Options > 1290[error] location: class HelpOptionExecutor > 1291[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: > error: cannot find symbol > 1292[error] private final Options options = new Options(); > 1293[error] ^ symbol: class Options > 1294[error] location: class ServerOptionsProcessor > 1295[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1: > error: cannot find symbol > 1296[error] options.addOption(OptionBuilder > 1297[error] ^ symbol: variable OptionBuilder > 1298[error] location: class ServerOptionsProcessor > 1299[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1: > error: cannot find symbol > 1300[error] options.addOption(new Option("H", "help", false, "Print > help information")); > 1301[error] ^ symbol: class Option > 1302[error] location: class ServerOptionsProcessor > 1303[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1: > error: cannot find symbol > 1304[error] commandLine = new GnuParser().parse(options, argv); > 1305[error] ^ symbol: class GnuParser > 1306[error] location: class ServerOptionsProcessor > 1307[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1: > error: cannot find symbol > 1308[error] } catch (ParseException e) { > 1309[error]^ symbol: class ParseException > 1310[error] location: class ServerOptionsProcessor > 1311[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1: > error: cannot find symbol > 1312[error] new HelpFormatter().printHelp(serverName, options); > 1313[error] ^ symbol: class HelpFormatter > 1314[error] location: class 
HelpOptionExecutor > 1315[error] Note: Some input files use or override a deprecated API. > 1316[error] Note: Recompile with -Xlint:deprecation for details. > 1317[error] 16 errors > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail:
[jira] [Updated] (SPARK-33428) conv UDF returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-33428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33428: -- Fix Version/s: (was: 3.2.0) > conv UDF returns incorrect value > > > Key: SPARK-33428 > URL: https://issues.apache.org/jira/browse/SPARK-33428 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {noformat} > spark-sql> select java_method('scala.math.BigInt', 'apply', > 'c8dcdfb41711fc9a1f17928001d7fd61', 16); > 266992441711411603393340504520074460513 > spark-sql> select conv('c8dcdfb41711fc9a1f17928001d7fd61', 16, 10); > 18446744073709551615 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-33428) conv UDF returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-33428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-33428: --- Assignee: (was: angerszhu) > conv UDF returns incorrect value > > > Key: SPARK-33428 > URL: https://issues.apache.org/jira/browse/SPARK-33428 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > How to reproduce this issue: > {noformat} > spark-sql> select java_method('scala.math.BigInt', 'apply', > 'c8dcdfb41711fc9a1f17928001d7fd61', 16); > 266992441711411603393340504520074460513 > spark-sql> select conv('c8dcdfb41711fc9a1f17928001d7fd61', 16, 10); > 18446744073709551615 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33428) conv UDF returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-33428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302694#comment-17302694 ] Dongjoon Hyun commented on SPARK-33428: --- The commit is reverted. > conv UDF returns incorrect value > > > Key: SPARK-33428 > URL: https://issues.apache.org/jira/browse/SPARK-33428 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {noformat} > spark-sql> select java_method('scala.math.BigInt', 'apply', > 'c8dcdfb41711fc9a1f17928001d7fd61', 16); > 266992441711411603393340504520074460513 > spark-sql> select conv('c8dcdfb41711fc9a1f17928001d7fd61', 16, 10); > 18446744073709551615 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
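[Editor's note] The wrong answer in the reproduction above is telling: 18446744073709551615 is exactly 2^64 - 1, i.e. the unsigned 64-bit maximum, which points at conv() overflowing/saturating a 64-bit accumulator. Python's arbitrary-precision integers show what the conversion should produce (matching the `scala.math.BigInt` call in the ticket):

```python
hex_str = 'c8dcdfb41711fc9a1f17928001d7fd61'

# Arbitrary-precision conversion, like scala.math.BigInt.apply(s, 16):
true_value = int(hex_str, 16)
print(true_value)       # a 39-digit number, the BigInt result in the ticket

# conv()'s result is exactly the unsigned 64-bit maximum:
print(2**64 - 1)        # 18446744073709551615
print(true_value > 2**64 - 1)  # True -- the input does not fit in 64 bits
```

A 32-hex-digit string encodes up to 128 bits, so any correct conv() would need wider-than-64-bit arithmetic (or an explicit overflow error) for inputs like this one.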
[jira] [Commented] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302672#comment-17302672 ] Nivas Umapathy commented on SPARK-34751: I ran it on 3.1.1 and it still has the same problem. All column values are null > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.
[ https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302622#comment-17302622 ] Apache Spark commented on SPARK-34763: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31854 > col(), $"" and df("name") should handle quoted column names properly. > --- > > Key: SPARK-34763 > URL: https://issues.apache.org/jira/browse/SPARK-34763 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Quoted column names like `a``b.c` cannot be represented with col(), $"" > and df("") because they don't handle such column names properly. > For example, if we have a following DataFrame. > {code} > val df1 = spark.sql("SELECT 'col1' AS `a``b.c`") > {code} > For the DataFrame, this query is successfully executed. > {code} > scala> df1.selectExpr("`a``b.c`").show > +-+ > |a`b.c| > +-+ > | col1| > +-+ > {code} > But the following query will fail because df1("`a``b.c`") throws an exception. > {code} > scala> df1.select(df1("`a``b.c`")).show > org.apache.spark.sql.AnalysisException: syntax error in attribute name: > `a``b.c`; > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152) > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1274) > at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241) > ... 49 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.
[ https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302621#comment-17302621 ] Apache Spark commented on SPARK-34763: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31854 > col(), $"" and df("name") should handle quoted column names properly. > --- > > Key: SPARK-34763 > URL: https://issues.apache.org/jira/browse/SPARK-34763 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Quoted column names like `a``b.c` cannot be represented with col(), $"" > and df("") because they don't handle such column names properly. > For example, if we have a following DataFrame. > {code} > val df1 = spark.sql("SELECT 'col1' AS `a``b.c`") > {code} > For the DataFrame, this query is successfully executed. > {code} > scala> df1.selectExpr("`a``b.c`").show > +-+ > |a`b.c| > +-+ > | col1| > +-+ > {code} > But the following query will fail because df1("`a``b.c`") throws an exception. > {code} > scala> df1.select(df1("`a``b.c`")).show > org.apache.spark.sql.AnalysisException: syntax error in attribute name: > `a``b.c`; > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152) > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1274) > at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241) > ... 49 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.
[ https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34763: Assignee: Kousuke Saruta (was: Apache Spark) > col(), $"" and df("name") should handle quoted column names properly. > --- > > Key: SPARK-34763 > URL: https://issues.apache.org/jira/browse/SPARK-34763 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Quoted column names like `a``b.c` cannot be represented with col(), $"" > and df("") because they don't handle such column names properly. > For example, if we have a following DataFrame. > {code} > val df1 = spark.sql("SELECT 'col1' AS `a``b.c`") > {code} > For the DataFrame, this query is successfully executed. > {code} > scala> df1.selectExpr("`a``b.c`").show > +-+ > |a`b.c| > +-+ > | col1| > +-+ > {code} > But the following query will fail because df1("`a``b.c`") throws an exception. > {code} > scala> df1.select(df1("`a``b.c`")).show > org.apache.spark.sql.AnalysisException: syntax error in attribute name: > `a``b.c`; > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152) > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1274) > at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241) > ... 49 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.
[ https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34763: Assignee: Apache Spark (was: Kousuke Saruta) > col(), $"" and df("name") should handle quoted column names properly. > --- > > Key: SPARK-34763 > URL: https://issues.apache.org/jira/browse/SPARK-34763 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > Quoted column names like `a``b.c` cannot be represented with col(), $"" > and df("") because they don't handle such column names properly. > For example, if we have a following DataFrame. > {code} > val df1 = spark.sql("SELECT 'col1' AS `a``b.c`") > {code} > For the DataFrame, this query is successfully executed. > {code} > scala> df1.selectExpr("`a``b.c`").show > +-+ > |a`b.c| > +-+ > | col1| > +-+ > {code} > But the following query will fail because df1("`a``b.c`") throws an exception. > {code} > scala> df1.select(df1("`a``b.c`")).show > org.apache.spark.sql.AnalysisException: syntax error in attribute name: > `a``b.c`; > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152) > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1274) > at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241) > ... 49 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.
[ https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-34763: --- Summary: col(), $"" and df("name") should handle quoted column names properly. (was: col(), $"" and df("name") should handle quoted column name properly.) > col(), $"" and df("name") should handle quoted column names properly. > --- > > Key: SPARK-34763 > URL: https://issues.apache.org/jira/browse/SPARK-34763 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Quoted column names like `a``b.c` cannot be represented with col(), $"" > and df("") because they don't handle such column names properly. > For example, if we have a following DataFrame. > {code} > val df1 = spark.sql("SELECT 'col1' AS `a``b.c`") > {code} > For the DataFrame, this query is successfully executed. > {code} > scala> df1.selectExpr("`a``b.c`").show > +-+ > |a`b.c| > +-+ > | col1| > +-+ > {code} > But the following query will fail because df1("`a``b.c`") throws an exception. > {code} > scala> df1.select(df1("`a``b.c`")).show > org.apache.spark.sql.AnalysisException: syntax error in attribute name: > `a``b.c`; > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152) > at > org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1274) > at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241) > ... 49 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34763) col(), $"" and df("name") should handle quoted column name properly.
Kousuke Saruta created SPARK-34763: -- Summary: col(), $"" and df("name") should handle quoted column name properly. Key: SPARK-34763 URL: https://issues.apache.org/jira/browse/SPARK-34763 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1, 3.0.2, 2.4.7, 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Quoted column names like `a``b.c` cannot be represented with col(), $"" and df("") because they don't handle such column names properly. For example, if we have a following DataFrame. {code} val df1 = spark.sql("SELECT 'col1' AS `a``b.c`") {code} For the DataFrame, this query is successfully executed. {code} scala> df1.selectExpr("`a``b.c`").show +-+ |a`b.c| +-+ | col1| +-+ {code} But the following query will fail because df1("`a``b.c`") throws an exception. {code} scala> df1.select(df1("`a``b.c`")).show org.apache.spark.sql.AnalysisException: syntax error in attribute name: `a``b.c`; at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152) at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121) at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221) at org.apache.spark.sql.Dataset.col(Dataset.scala:1274) at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241) ... 49 elided {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
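[Editor's note] The quoting rule that df("...") fails to honor in this ticket is the usual SQL-style one: a backtick inside a quoted identifier is escaped by doubling it, so `` `a``b.c` `` names the column ``a`b.c``. The sketch below is a plain-Python illustration of that decoding rule, not Spark's actual `parseAttributeName` implementation:

```python
def unquote_identifier(quoted: str) -> str:
    """Strip surrounding backticks and collapse doubled backticks:
    `a``b.c` -> a`b.c. Raise on malformed input, mirroring the
    'syntax error in attribute name' AnalysisException."""
    if not (quoted.startswith('`') and quoted.endswith('`') and len(quoted) >= 2):
        raise ValueError(f'not a quoted identifier: {quoted!r}')
    inner = quoted[1:-1]
    # After removing doubled backticks, any leftover lone backtick
    # is unescaped and therefore a syntax error.
    if inner.replace('``', '').count('`'):
        raise ValueError(f'syntax error in attribute name: {quoted}')
    return inner.replace('``', '`')

print(unquote_identifier('`a``b.c`'))  # a`b.c
```

With this rule applied, col(), $"" and df("name") can resolve the column that selectExpr already handles in the ticket's example.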
[jira] [Commented] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag
[ https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302584#comment-17302584 ] Apache Spark commented on SPARK-34504: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/31853 > Avoid unnecessary view resolving and remove the `performCheck` flag > --- > > Key: SPARK-34504 > URL: https://issues.apache.org/jira/browse/SPARK-34504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Linhong Liu >Priority: Major > > In SPARK-34490, I added a `performCheck` flag to skip the analysis check when > resolving views, because some view resolution is unnecessary. We can avoid this > unnecessary view resolution and then remove the `performCheck` flag. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag
[ https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34504: Assignee: (was: Apache Spark) > Avoid unnecessary view resolving and remove the `performCheck` flag > --- > > Key: SPARK-34504 > URL: https://issues.apache.org/jira/browse/SPARK-34504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Linhong Liu >Priority: Major > > In SPARK-34490, I added a `performCheck` flag to skip the analysis check when > resolving views, because some view resolution is unnecessary. We can avoid this > unnecessary view resolution and then remove the `performCheck` flag. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag
[ https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34504: Assignee: Apache Spark > Avoid unnecessary view resolving and remove the `performCheck` flag > --- > > Key: SPARK-34504 > URL: https://issues.apache.org/jira/browse/SPARK-34504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Linhong Liu >Assignee: Apache Spark >Priority: Major > > In SPARK-34490, I added a `performCheck` flag to skip the analysis check when > resolving views, because some view resolution is unnecessary. We can avoid this > unnecessary view resolution and then remove the `performCheck` flag. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34762) Many PR's Scala 2.13 build action failed
[ https://issues.apache.org/jira/browse/SPARK-34762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-34762: - Description: PR with Scala 2.13 build failure includes * [https://github.com/apache/spark/pull/31849] * [https://github.com/apache/spark/pull/31848] * [https://github.com/apache/spark/pull/31844] * [https://github.com/apache/spark/pull/31843] * https://github.com/apache/spark/pull/31841 {code:java} [error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1: error: package org.apache.commons.cli does not exist 1278[error] import org.apache.commons.cli.GnuParser; 1279[error] ^ 1280[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: error: cannot find symbol 1281[error] private final Options options = new Options(); 1282[error] ^ symbol: class Options 1283[error] location: class ServerOptionsProcessor 1284[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1: error: package org.apache.commons.cli does not exist 1285[error] private org.apache.commons.cli.CommandLine commandLine; 1286[error] ^ 1287[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1: error: cannot find symbol 1288[error] HelpOptionExecutor(String serverName, Options options) { 1289[error] ^ symbol: class Options 1290[error] location: class HelpOptionExecutor 1291[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: error: cannot find symbol 1292[error] private final Options options = new Options(); 1293[error] ^ symbol: class Options 1294[error] location: class ServerOptionsProcessor 1295[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1: error: cannot find symbol 1296[error] options.addOption(OptionBuilder 1297[error] ^ symbol: variable OptionBuilder 1298[error] location: class ServerOptionsProcessor 1299[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1: error: cannot find symbol 1300[error] options.addOption(new Option("H", "help", false, "Print help information")); 1301[error] ^ symbol: class Option 1302[error] location: class ServerOptionsProcessor 1303[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1: error: cannot find symbol 1304[error] commandLine = new GnuParser().parse(options, argv); 1305[error] ^ symbol: class GnuParser 1306[error] location: class ServerOptionsProcessor 1307[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1: error: cannot find symbol 1308[error] } catch (ParseException e) { 1309[error]^ symbol: class ParseException 1310[error] location: class ServerOptionsProcessor 1311[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1: error: cannot find symbol 1312[error] new HelpFormatter().printHelp(serverName, options); 1313[error] ^ symbol: class HelpFormatter 1314[error] location: class HelpOptionExecutor 1315[error] Note: Some input files use or override a deprecated API. 1316[error] Note: Recompile with -Xlint:deprecation for details. 
1317[error] 16 errors {code} was: PR with Scala 2.13 build failure includes * [https://github.com/apache/spark/pull/31849] * [https://github.com/apache/spark/pull/31848] * [https://github.com/apache/spark/pull/31844] * [https://github.com/apache/spark/pull/31843] {code:java} [error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1: error: package org.apache.commons.cli does not exist 1278[error] import org.apache.commons.cli.GnuParser; 1279[error] ^ 1280[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: error: cannot find symbol 1281[error] private final Options options = new Options(); 1282[error] ^ symbol: class Options 1283[error] location: class ServerOptionsProcessor 1284[error]
[jira] [Resolved] (SPARK-34680) Spark hangs when out of diskspace
[ https://issues.apache.org/jira/browse/SPARK-34680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-34680. -- Resolution: Not A Problem > Spark hangs when out of diskspace > - > > Key: SPARK-34680 > URL: https://issues.apache.org/jira/browse/SPARK-34680 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.1 > Environment: Running Spark and Pyspark 3.1.1. with Hadoop 3.2.2 and > Koalas 1.6.0. > Some environment variables: > |Java Home|/usr/lib/jvm/java-11-openjdk-11.0.3.7-0.el7_6.x86_64| > |Java Version|11.0.3 (Oracle Corporation)| > |Scala Version|version 2.12.10| >Reporter: Laurens >Priority: Major > > Parsing a workflow using Koalas, I noticed a stage is hanging for 8 hours > already. I checked the logs and the last output is: > {code:java} > 21/03/09 13:50:31 ERROR TaskMemoryManager: error while calling spill() on > org.apache.spark.shuffle.sort.ShuffleExternalSorter@4127a515 > java.io.IOException: No space left on device > at java.base/java.io.FileOutputStream.writeBytes(Native Method) > at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) > at > org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59) > at > java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81) > at > java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127) > at > net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:223) > at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:176) > at > org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:260) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:218) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.spill(ShuffleExternalSorter.java:276) > at > 
org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:208) > at > org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:289) > at > org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:116) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:385) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:409) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:249) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:178) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) > at org.apache.spark.scheduler.Task.run(Task.scala:131) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Suppressed: java.io.IOException: No space left on device > at java.base/java.io.FileOutputStream.writeBytes(Native Method) > at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) > at > org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59) > at > java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81) > at > java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142) > at 
net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:243) > at > org.apache.spark.serializer.DummySerializerInstance$1.flush(DummySerializerInstance.java:50) > at > org.apache.spark.storage.DiskBlockObjectWriter.commitAndGet(DiskBlockObjectWriter.scala:173) > at > org.apache.spark.storage.DiskBlockObjectWriter.$anonfun$close$1(DiskBlockObjectWriter.scala:156) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439) > at > org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:158) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:226) > ... 18 more > Suppressed: java.io.IOException: No space left on device > at java.base/java.io.FileOutputStream.writeBytes(Native Method) > at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) > at >
[jira] [Commented] (SPARK-34680) Spark hangs when out of diskspace
[ https://issues.apache.org/jira/browse/SPARK-34680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302578#comment-17302578 ] Takeshi Yamamuro commented on SPARK-34680: -- Not enough information to reproduce the issue, so I'll close this. > Spark hangs when out of diskspace > - > > Key: SPARK-34680 > URL: https://issues.apache.org/jira/browse/SPARK-34680 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.1 > Environment: Running Spark and Pyspark 3.1.1. with Hadoop 3.2.2 and > Koalas 1.6.0. > Some environment variables: > |Java Home|/usr/lib/jvm/java-11-openjdk-11.0.3.7-0.el7_6.x86_64| > |Java Version|11.0.3 (Oracle Corporation)| > |Scala Version|version 2.12.10| >Reporter: Laurens >Priority: Major > > Parsing a workflow using Koalas, I noticed a stage is hanging for 8 hours > already. I checked the logs and the last output is: > {code:java} > 21/03/09 13:50:31 ERROR TaskMemoryManager: error while calling spill() on > org.apache.spark.shuffle.sort.ShuffleExternalSorter@4127a515 > java.io.IOException: No space left on device > at java.base/java.io.FileOutputStream.writeBytes(Native Method) > at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) > at > org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59) > at > java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81) > at > java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127) > at > net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:223) > at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:176) > at > org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:260) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:218) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.spill(ShuffleExternalSorter.java:276) > at > 
org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:208) > at > org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:289) > at > org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:116) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:385) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:409) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:249) > at > org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:178) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) > at org.apache.spark.scheduler.Task.run(Task.scala:131) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Suppressed: java.io.IOException: No space left on device > at java.base/java.io.FileOutputStream.writeBytes(Native Method) > at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) > at > org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59) > at > java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81) > at > java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142) > at 
net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:243) > at > org.apache.spark.serializer.DummySerializerInstance$1.flush(DummySerializerInstance.java:50) > at > org.apache.spark.storage.DiskBlockObjectWriter.commitAndGet(DiskBlockObjectWriter.scala:173) > at > org.apache.spark.storage.DiskBlockObjectWriter.$anonfun$close$1(DiskBlockObjectWriter.scala:156) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439) > at > org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:158) > at > org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:226) > ... 18 more > Suppressed: java.io.IOException: No space left on device > at java.base/java.io.FileOutputStream.writeBytes(Native Method) > at
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-34751: - Target Version/s: (was: 2.4.3) > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Fix For: 2.4.8 > > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-34751: - Fix Version/s: (was: 2.4.8) > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied
[ https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302573#comment-17302573 ] Takeshi Yamamuro commented on SPARK-34751: -- Could you try newer Spark, e.g., 2.4.7, 3.0.2, or 3.1.1? > Parquet with invalid chars on column name reads double as null when a clean > schema is applied > - > > Key: SPARK-34751 > URL: https://issues.apache.org/jira/browse/SPARK-34751 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.3 > Environment: Pyspark 2.4.3 > AWS Glue Dev Endpoint EMR >Reporter: Nivas Umapathy >Priority: Major > Fix For: 2.4.8 > > Attachments: invalid_columns_double.parquet > > > I have a parquet file that has data with invalid column names on it. > [#Reference](https://issues.apache.org/jira/browse/SPARK-27442) Here is the > file attached with this ticket. > I tried to load this file with > {{df = glue_context.read.parquet('invalid_columns_double.parquet')}} > {{df = df.withColumnRenamed('COL 1', 'COL_1')}} > {{df = df.withColumnRenamed('COL,2', 'COL_2')}} > {{df = df.withColumnRenamed('COL;3', 'COL_3') }} > and so on. > Now if i call > {{df.show()}} > it throws this exception that is still pointing to the old column name. > {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains > invalid character(s) among " ,;{}()}} > {{n}} > {{t=". Please use alias to rename it.;'}} > > When i read about it in some blogs, there was suggestion to re-read the same > parquet with new schema applied. So i did > {{df = > glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}} > > and it works, but all the data in the dataframe are null. The same works for > String datatypes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
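A common workaround for the issue above is to sanitize the column names (either before the data is written to Parquet, or immediately after reading) rather than renaming one column at a time. The helper below is a minimal illustrative sketch, not Spark code; the function names are hypothetical, and the character set mirrors the ones listed in the AnalysisException (space, comma, semicolon, braces, parentheses, newline, tab, equals):

```python
import re

# Characters rejected by Spark's Parquet attribute-name check,
# as reported in the AnalysisException message.
_INVALID = re.compile(r"[ ,;{}()\n\t=]")

def sanitize_column_name(name: str, replacement: str = "_") -> str:
    """Replace every character that Parquet column names may not contain."""
    return _INVALID.sub(replacement, name)

def sanitize_all(names):
    """Sanitize a whole list of column names, e.g. for df.toDF(*names)."""
    return [sanitize_column_name(n) for n in names]
```

With PySpark this could be applied in one step as df.toDF(*sanitize_all(df.columns)) instead of chained withColumnRenamed calls, though whether that avoids the null-read problem reported here would need to be verified on the affected version.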
[jira] [Commented] (SPARK-34714) collect_list(struct()) fails when used with GROUP BY
[ https://issues.apache.org/jira/browse/SPARK-34714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302570#comment-17302570 ] Takeshi Yamamuro commented on SPARK-34714: -- I've checked that branch-3.1 still has this issue (NOTE: the current master does not). > collect_list(struct()) fails when used with GROUP BY > > > Key: SPARK-34714 > URL: https://issues.apache.org/jira/browse/SPARK-34714 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Priority: Major > > The following is failing in DBR8.0 / Spark 3.1.1, but works in earlier DBR > and Spark versions: > {quote}with step_1 as ( > select 'E' as name, named_struct('subfield', 1) as field_1 > ) > select name, collect_list(struct(field_1.subfield)) > from step_1 > group by 1 > {quote} > Fails with the following error message: > {quote}AnalysisException: cannot resolve > 'struct(step_1.`field_1`.`subfield`)' due to data type mismatch: Only > foldable string expressions are allowed to appear at odd position, got: > NamePlaceholder > {quote} > If you modify the query in any of the following ways then it still works:: > * if you remove the field "name" and the "group by 1" part of the query > * if you remove the "struct()" from within the collect_list() > * if you use "named_struct()" instead of "struct()" within the collect_list() > Similarly collect_set() is broken and possibly more related functions, but I > haven't done thorough testing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
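For reference, the semantics the failing query is expected to have (group rows by name, collect one struct per row into a list) can be stated in plain Python. This is an illustration of the intended result only, not Spark code, and it flattens field_1.subfield to a single field for brevity:

```python
from collections import defaultdict

def collect_list_struct(rows, group_key, struct_fields):
    """Emulate GROUP BY group_key with collect_list(struct(*struct_fields)).

    rows is a list of dicts; each collected "struct" is a dict holding
    only the requested fields, in input order (collect_list keeps duplicates).
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append({f: row[f] for f in struct_fields})
    return dict(grouped)

# Mirrors: select name, collect_list(struct(field_1.subfield)) ... group by 1
rows = [{"name": "E", "subfield": 1}]
result = collect_list_struct(rows, "name", ["subfield"])
```

The expected answer for the reported query is one group, E, holding a single-element list of structs, which is what the named_struct() workaround mentioned above produces.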
[jira] [Updated] (SPARK-34694) Improve Spark SQL Source Filter to allow pushdown of filters span multiple columns
[ https://issues.apache.org/jira/browse/SPARK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-34694: - Component/s: (was: Spark Core) SQL > Improve Spark SQL Source Filter to allow pushdown of filters span multiple > columns > -- > > Key: SPARK-34694 > URL: https://issues.apache.org/jira/browse/SPARK-34694 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1 >Reporter: Chen Zou >Priority: Minor > > The current org.apache.spark.sql.sources.Filter abstract class only allows > pushdown of filters on single column or sum of products of multiple such > single-column filters. > Filters on multiple columns cannot be pushed down through this Filter > subclass to source, e.g. from TPC-H benchmark on lineitem table: > (l_commitdate#11 < l_receiptdate#12) > (l_shipdate#10 < l_commitdate#11) > > The current design probably originates from the point that columnar source > has a hard time supporting these cross-column filters. But with batching > implemented in columnar sources, they can still support cross-column filters. > This issue tries to open up discussion on a more general Filter interface to > allow pushing down cross-column filters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
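To illustrate what the proposal above is asking for: a cross-column filter references two columns of the same row, so the source must read both columns to evaluate it, unlike the single-column filters the current Filter class can express. The sketch below is a hypothetical shape for such a filter (the class and method names are invented for illustration and are not part of Spark's sources API):

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple

@dataclass(frozen=True)
class ColumnLessThan:
    """A pushed-down filter spanning two columns: row[left] < row[right]."""
    left: str
    right: str

    def references(self) -> Tuple[str, ...]:
        # Both columns must be materialized for the source to evaluate this.
        return (self.left, self.right)

    def evaluate(self, row: Mapping[str, Any]) -> bool:
        return row[self.left] < row[self.right]

# e.g. the TPC-H lineitem predicate: l_commitdate < l_receiptdate
f = ColumnLessThan("l_commitdate", "l_receiptdate")
kept = [r for r in [
    {"l_commitdate": "1994-01-10", "l_receiptdate": "1994-01-20"},
    {"l_commitdate": "1994-02-15", "l_receiptdate": "1994-02-01"},
] if f.evaluate(r)]
```

A batching columnar source could evaluate such a filter per row batch after decoding the referenced columns, which is the capability the issue argues makes the wider interface feasible.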
[jira] [Updated] (SPARK-34661) Replaces `OriginalType` with `LogicalTypeAnnotation` in VectorizedColumnReader
[ https://issues.apache.org/jira/browse/SPARK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-34661: - Description: {{OriginalType}} and {{DecimalMetadata}} have been marked as {{@Deprecated}} in recent Parquet code. Apache Parquet suggests replacing {{OriginalType}} with {{LogicalTypeAnnotation}} and {{DecimalMetadata}} with {{DecimalLogicalTypeAnnotation}}. The files to be changed are as follows: * VectorizedColumnReader.java * ParquetFilters.scala * ParquetReadSupport.scala * ParquetRowConverter.scala * ParquetSchemaConverter.scala was: `OriginalType` has been marked as '@Deprecated', Apache Parquet suggests to use LogicalTypeAnnotation to represent logical types instead. This JIRA is used to track the cleanup of `OriginalType` usages in VectorizedColumnReader > Replaces `OriginalType` with `LogicalTypeAnnotation` in VectorizedColumnReader > -- > > Key: SPARK-34661 > URL: https://issues.apache.org/jira/browse/SPARK-34661 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > > {{OriginalType}} and {{DecimalMetadata}} have been marked as {{@Deprecated}} > in recent Parquet code. > Apache Parquet suggests replacing {{OriginalType}} with > {{LogicalTypeAnnotation}} and {{DecimalMetadata}} with > {{DecimalLogicalTypeAnnotation}}. > The files to be changed are as follows: > * VectorizedColumnReader.java > * ParquetFilters.scala > * ParquetReadSupport.scala > * ParquetRowConverter.scala > * ParquetSchemaConverter.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34762) Many PR's Scala 2.13 build action failed
[ https://issues.apache.org/jira/browse/SPARK-34762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302544#comment-17302544 ] Yang Jie commented on SPARK-34762: -- Maven compilation does not seem to fail. cc [~dongjoon] [~srowen] [~hyukjin.kwon] > Many PR's Scala 2.13 build action failed > > > Key: SPARK-34762 > URL: https://issues.apache.org/jira/browse/SPARK-34762 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Major > > PR with Scala 2.13 build failure includes > * [https://github.com/apache/spark/pull/31849] > * [https://github.com/apache/spark/pull/31848] > * [https://github.com/apache/spark/pull/31844] > * [https://github.com/apache/spark/pull/31843] > {code:java} > [error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1: > error: package org.apache.commons.cli does not exist > 1278[error] import org.apache.commons.cli.GnuParser; > 1279[error] ^ > 1280[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: > error: cannot find symbol > 1281[error] private final Options options = new Options(); > 1282[error] ^ symbol: class Options > 1283[error] location: class ServerOptionsProcessor > 1284[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1: > error: package org.apache.commons.cli does not exist > 1285[error] private org.apache.commons.cli.CommandLine commandLine; > 1286[error] ^ > 1287[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1: > error: cannot find symbol > 1288[error] HelpOptionExecutor(String serverName, Options options) { > 1289[error] ^ symbol: class > Options > 1290[error] location: class HelpOptionExecutor > 1291[error] > 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: > error: cannot find symbol > 1292[error] private final Options options = new Options(); > 1293[error] ^ symbol: class Options > 1294[error] location: class ServerOptionsProcessor > 1295[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1: > error: cannot find symbol > 1296[error] options.addOption(OptionBuilder > 1297[error] ^ symbol: variable OptionBuilder > 1298[error] location: class ServerOptionsProcessor > 1299[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1: > error: cannot find symbol > 1300[error] options.addOption(new Option("H", "help", false, "Print > help information")); > 1301[error] ^ symbol: class Option > 1302[error] location: class ServerOptionsProcessor > 1303[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1: > error: cannot find symbol > 1304[error] commandLine = new GnuParser().parse(options, argv); > 1305[error] ^ symbol: class GnuParser > 1306[error] location: class ServerOptionsProcessor > 1307[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1: > error: cannot find symbol > 1308[error] } catch (ParseException e) { > 1309[error]^ symbol: class ParseException > 1310[error] location: class ServerOptionsProcessor > 1311[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1: > error: cannot find symbol > 1312[error] new HelpFormatter().printHelp(serverName, options); > 1313[error] ^ symbol: class HelpFormatter > 1314[error] location: class HelpOptionExecutor > 1315[error] Note: Some input files use or override a deprecated API. 
> 1316[error] Note: Recompile with -Xlint:deprecation for details. > 1317[error] 16 errors > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34762) Many PR's Scala 2.13 build action failed
Yang Jie created SPARK-34762: --- Summary: Many PR's Scala 2.13 build action failed Key: SPARK-34762 URL: https://issues.apache.org/jira/browse/SPARK-34762 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.0 Reporter: Yang Jie PRs with Scala 2.13 build failures include: * [https://github.com/apache/spark/pull/31849] * [https://github.com/apache/spark/pull/31848] * [https://github.com/apache/spark/pull/31844] * [https://github.com/apache/spark/pull/31843] {code:java}
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1: error: package org.apache.commons.cli does not exist
[error] import org.apache.commons.cli.GnuParser;
[error] ^
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: error: cannot find symbol
[error] private final Options options = new Options();
[error] ^ symbol: class Options
[error] location: class ServerOptionsProcessor
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1: error: package org.apache.commons.cli does not exist
[error] private org.apache.commons.cli.CommandLine commandLine;
[error] ^
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1: error: cannot find symbol
[error] HelpOptionExecutor(String serverName, Options options) {
[error] ^ symbol: class Options
[error] location: class HelpOptionExecutor
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1: error: cannot find symbol
[error] private final Options options = new Options();
[error] ^ symbol: class Options
[error] location: class ServerOptionsProcessor
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1: error: cannot find symbol
[error] options.addOption(OptionBuilder
[error] ^ symbol: variable OptionBuilder
[error] location: class ServerOptionsProcessor
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1: error: cannot find symbol
[error] options.addOption(new Option("H", "help", false, "Print help information"));
[error] ^ symbol: class Option
[error] location: class ServerOptionsProcessor
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1: error: cannot find symbol
[error] commandLine = new GnuParser().parse(options, argv);
[error] ^ symbol: class GnuParser
[error] location: class ServerOptionsProcessor
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1: error: cannot find symbol
[error] } catch (ParseException e) {
[error] ^ symbol: class ParseException
[error] location: class ServerOptionsProcessor
[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1: error: cannot find symbol
[error] new HelpFormatter().printHelp(serverName, options);
[error] ^ symbol: class HelpFormatter
[error] location: class HelpOptionExecutor
[error] Note: Some input files use or override a deprecated API.
[error] Note: Recompile with -Xlint:deprecation for details.
[error] 16 errors
{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
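The compile errors above all point at missing `org.apache.commons.cli` classes (`Options`, `GnuParser`, `HelpFormatter`) on the hive-thriftserver compile classpath in the Scala 2.13 build. As a hedged sketch only — not the actual change merged for SPARK-34762, and with an illustrative version number — explicitly declaring the dependency in the module POM is the kind of fix that would typically make these symbols resolve:

```xml
<!-- Hypothetical sketch: declare commons-cli explicitly for the
     hive-thriftserver module so Options/GnuParser/HelpFormatter resolve.
     The version below is illustrative, not taken from Spark's POMs. -->
<dependency>
  <groupId>commons-cli</groupId>
  <artifactId>commons-cli</artifactId>
  <version>1.4</version>
</dependency>
```

Under Scala 2.12 the classes were presumably reaching the classpath transitively, which is why only the 2.13 profile broke.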
[jira] [Commented] (SPARK-34759) run JavaSparkSQLExample failed with Exception.
[ https://issues.apache.org/jira/browse/SPARK-34759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302502#comment-17302502 ] Apache Spark commented on SPARK-34759: -- User 'zengruios' has created a pull request for this issue: https://github.com/apache/spark/pull/31852 > run JavaSparkSQLExample failed with Exception. > -- > > Key: SPARK-34759 > URL: https://issues.apache.org/jira/browse/SPARK-34759 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Priority: Minor > > run JavaSparkSQLExample failed with Exception. > The Exception is thrown in function runDatasetCreationExample when executing > ‘spark.read().json(path).as(personEncoder)’. > The exception is 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to > int.' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34759) run JavaSparkSQLExample failed with Exception.
[ https://issues.apache.org/jira/browse/SPARK-34759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34759: Assignee: (was: Apache Spark) > run JavaSparkSQLExample failed with Exception. > -- > > Key: SPARK-34759 > URL: https://issues.apache.org/jira/browse/SPARK-34759 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Priority: Minor > > run JavaSparkSQLExample failed with Exception. > The Exception is thrown in function runDatasetCreationExample when executing > ‘spark.read().json(path).as(personEncoder)’. > The exception is 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to > int.' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34759) run JavaSparkSQLExample failed with Exception.
[ https://issues.apache.org/jira/browse/SPARK-34759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302501#comment-17302501 ] Apache Spark commented on SPARK-34759: -- User 'zengruios' has created a pull request for this issue: https://github.com/apache/spark/pull/31852 > run JavaSparkSQLExample failed with Exception. > -- > > Key: SPARK-34759 > URL: https://issues.apache.org/jira/browse/SPARK-34759 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Priority: Minor > > run JavaSparkSQLExample failed with Exception. > The Exception is thrown in function runDatasetCreationExample when executing > ‘spark.read().json(path).as(personEncoder)’. > The exception is 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to > int.' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34759) run JavaSparkSQLExample failed with Exception.
[ https://issues.apache.org/jira/browse/SPARK-34759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34759: Assignee: Apache Spark > run JavaSparkSQLExample failed with Exception. > -- > > Key: SPARK-34759 > URL: https://issues.apache.org/jira/browse/SPARK-34759 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Assignee: Apache Spark >Priority: Minor > > run JavaSparkSQLExample failed with Exception. > The Exception is thrown in function runDatasetCreationExample when executing > ‘spark.read().json(path).as(personEncoder)’. > The exception is 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to > int.' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
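The "Cannot up cast `age` from bigint to int" failure above comes from Spark refusing to silently narrow a JSON-inferred 64-bit column into the example bean's `int` field. The snippet below is a plain-Java illustration of why that narrowing is lossy; it deliberately does not use Spark, and the `Person` bean here is a hypothetical stand-in for the example's class, with `Math.toIntExact` playing the role of the analyzer's up-cast check:

```java
public class UpcastDemo {
    // Hypothetical bean mirroring the example's Person: Spark infers JSON
    // integers as bigint (Java long), so the field should be long, not int.
    static class Person {
        long age; // declaring this as int is what triggers the AnalysisException
    }

    public static void main(String[] args) {
        long inferred = 3_000_000_000L; // a perfectly valid bigint value

        // Narrowing to int would corrupt the value; Math.toIntExact refuses,
        // much like Spark's analyzer refuses the bigint -> int up cast.
        boolean overflows;
        try {
            Math.toIntExact(inferred);
            overflows = false;
        } catch (ArithmeticException e) {
            overflows = true;
        }
        System.out.println(overflows); // true: the narrowing cast is unsafe
    }
}
```

Declaring the bean field as `long` (as the inferred schema implies) sidesteps the exception without any cast at all.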
[jira] [Commented] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().
[ https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302497#comment-17302497 ] Apache Spark commented on SPARK-34760: -- User 'zengruios' has created a pull request for this issue: https://github.com/apache/spark/pull/31851 > run JavaSQLDataSourceExample failed with Exception in > runBasicDataSourceExample(). > -- > > Key: SPARK-34760 > URL: https://issues.apache.org/jira/browse/SPARK-34760 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Priority: Minor > > run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). > Executing > 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' > throws Exception: 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: partition column favorite_color is > not defined in table people_partitioned_bucketed, defined table columns are: > age, name;' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().
[ https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34760: Assignee: Apache Spark > run JavaSQLDataSourceExample failed with Exception in > runBasicDataSourceExample(). > -- > > Key: SPARK-34760 > URL: https://issues.apache.org/jira/browse/SPARK-34760 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Assignee: Apache Spark >Priority: Minor > > run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). > Executing > 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' > throws Exception: 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: partition column favorite_color is > not defined in table people_partitioned_bucketed, defined table columns are: > age, name;' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().
[ https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302496#comment-17302496 ] Apache Spark commented on SPARK-34760: -- User 'zengruios' has created a pull request for this issue: https://github.com/apache/spark/pull/31851 > run JavaSQLDataSourceExample failed with Exception in > runBasicDataSourceExample(). > -- > > Key: SPARK-34760 > URL: https://issues.apache.org/jira/browse/SPARK-34760 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Priority: Minor > > run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). > Executing > 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' > throws Exception: 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: partition column favorite_color is > not defined in table people_partitioned_bucketed, defined table columns are: > age, name;' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().
[ https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34760: Assignee: (was: Apache Spark) > run JavaSQLDataSourceExample failed with Exception in > runBasicDataSourceExample(). > -- > > Key: SPARK-34760 > URL: https://issues.apache.org/jira/browse/SPARK-34760 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Priority: Minor > > run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). > Executing > 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' > throws Exception: 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: partition column favorite_color is > not defined in table people_partitioned_bucketed, defined table columns are: > age, name;' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().
[ https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zengrui updated SPARK-34760: Description: run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). Executing 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' throws Exception: 'Exception in thread "main" org.apache.spark.sql.AnalysisException: partition column favorite_color is not defined in table people_partitioned_bucketed, defined table columns are: age, name;' was: run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). Executing 'peopleDF.write().partitionBy("age").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' throws Exception: 'Exception in thread "main" org.apache.spark.sql.AnalysisException: partition column favorite_color is not defined in table people_partitioned_bucketed, defined table columns are: age, name;' > run JavaSQLDataSourceExample failed with Exception in > runBasicDataSourceExample(). > -- > > Key: SPARK-34760 > URL: https://issues.apache.org/jira/browse/SPARK-34760 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Priority: Minor > > run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). > Executing > 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' > throws Exception: 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: partition column favorite_color is > not defined in table people_partitioned_bucketed, defined table columns are: > age, name;' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().
[ https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zengrui updated SPARK-34760: Summary: run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample(). (was: run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample().) > run JavaSQLDataSourceExample failed with Exception in > runBasicDataSourceExample(). > -- > > Key: SPARK-34760 > URL: https://issues.apache.org/jira/browse/SPARK-34760 > Project: Spark > Issue Type: Bug > Components: Examples >Affects Versions: 3.0.1, 3.1.1 >Reporter: zengrui >Priority: Minor > > run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). > Executing > 'peopleDF.write().partitionBy("age").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' > throws Exception: 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: partition column favorite_color is > not defined in table people_partitioned_bucketed, defined table columns are: > age, name;' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302439#comment-17302439 ] Apache Spark commented on SPARK-21449: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/31850 > Hive client's SessionState was not closed properly in HiveExternalCatalog > -- > > Key: SPARK-21449 > URL: https://issues.apache.org/jira/browse/SPARK-21449 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.2.0 > > > close the sessionstate to clear `hive.downloaded.resources.dir` and else. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302438#comment-17302438 ] Apache Spark commented on SPARK-21449: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/31850 > Hive client's SessionState was not closed properly in HiveExternalCatalog > -- > > Key: SPARK-21449 > URL: https://issues.apache.org/jira/browse/SPARK-21449 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.2.0 > > > close the sessionstate to clear `hive.downloaded.resources.dir` and else. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34761) Add a day-time interval to a timestamp
Max Gekk created SPARK-34761: Summary: Add a day-time interval to a timestamp Key: SPARK-34761 URL: https://issues.apache.org/jira/browse/SPARK-34761 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 3.2.0 Support adding of YearMonthIntervalType values to TIMESTAMP values. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34761) Add a day-time interval to a timestamp
[ https://issues.apache.org/jira/browse/SPARK-34761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-34761: - Description: Support adding of DayTimeIntervalType values to TIMESTAMP values. (was: Support adding of YearMonthIntervalType values to TIMESTAMP values.) > Add a day-time interval to a timestamp > -- > > Key: SPARK-34761 > URL: https://issues.apache.org/jira/browse/SPARK-34761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Support adding of DayTimeIntervalType values to TIMESTAMP values. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34754) sparksql 'add jar' not support hdfs ha mode in k8s
[ https://issues.apache.org/jira/browse/SPARK-34754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lithiumlee-_- updated SPARK-34754: -- Description: After submitting the app to K8S, the executors hit the exception "java.net.UnknownHostException: xx". The UDF jar URI uses the HDFS HA style, but the exception stack shows "...*createNonHAProxy*..." hql: {code:java} // code placeholder add jar hdfs://xx/test.jar; create temporary function test_udf as 'com.xxx.xxx'; create table test.test_udf as select test_udf('1') name_1; {code} exception: {code:java} // code placeholder TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.30.89.44, executor 1): java.lang.IllegalArgumentException: java.net.UnknownHostException: xx at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:696) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:636) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390) at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866) at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721) at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496) at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:816) at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:808) at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:808) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.UnknownHostException: xx ... 28 more {code} was: Submit app to K8S, the driver already running but meet exception "java.net.UnknownHostException: xx" when starting executors. The udf jar uri using ha style, but the exception stack is "...*createNonHAProxy*..." 
hql: {code:java} // code placeholder add jar hdfs://xx/test.jar; create temporary function test_udf as 'com.xxx.xxx'; create table test.test_udf as select test_udf('1') name_1; {code} exception: {code:java} // code placeholder TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.30.89.44, executor 1): java.lang.IllegalArgumentException: java.net.UnknownHostException: xx at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:696) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:636) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390) at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866) at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721) at
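That `createNonHAProxy` frame under `java.net.UnknownHostException: xx` means the executor resolved `xx` as a plain NameNode hostname, i.e. the HA client settings never reached the executor's Hadoop configuration. As a hedged sketch, the standard Hadoop HA client properties that must be visible to every executor for a logical nameservice look like the following; the nameservice `xx` and the NameNode hosts below are placeholders, not values from this report:

```xml
<!-- Sketch of standard HDFS HA client configuration (hdfs-site.xml).
     The nameservice "xx" and namenode hosts are placeholders. -->
<property><name>dfs.nameservices</name><value>xx</value></property>
<property><name>dfs.ha.namenodes.xx</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.xx.nn1</name><value>namenode1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.xx.nn2</name><value>namenode2.example.com:8020</value></property>
<property>
  <name>dfs.client.failover.proxy.provider.xx</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

If `dfs.client.failover.proxy.provider.xx` is absent, the HDFS client falls back to the non-HA code path seen in the stack, which is consistent with the reported behavior.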
[jira] [Created] (SPARK-34760) run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample().
zengrui created SPARK-34760: --- Summary: run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). Key: SPARK-34760 URL: https://issues.apache.org/jira/browse/SPARK-34760 Project: Spark Issue Type: Bug Components: Examples Affects Versions: 3.1.1, 3.0.1 Reporter: zengrui run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample(). Executing 'peopleDF.write().partitionBy("age").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");' throws Exception: 'Exception in thread "main" org.apache.spark.sql.AnalysisException: partition column favorite_color is not defined in table people_partitioned_bucketed, defined table columns are: age, name;' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies
[ https://issues.apache.org/jira/browse/SPARK-34757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34757: Assignee: (was: Apache Spark) > Spark submit should ignore cache for SNAPSHOT dependencies > -- > > Key: SPARK-34757 > URL: https://issues.apache.org/jira/browse/SPARK-34757 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Priority: Major > > When spark-submit is executed with --packages, it will not download the > dependency jars when they are available in cache (e.g. ivy cache), even when > the dependencies are SNAPSHOTs. > This might block developers who work on external modules in Spark (e.g. > spark-avro), since they need to remove the cache manually every time when > they update the code during developments (which generates SNAPSHOT jars). > Without knowing this, they could be blocked wondering why their code changes > are not reflected in spark-submit executions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies
[ https://issues.apache.org/jira/browse/SPARK-34757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302373#comment-17302373 ] Apache Spark commented on SPARK-34757: -- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/31849 > Spark submit should ignore cache for SNAPSHOT dependencies > -- > > Key: SPARK-34757 > URL: https://issues.apache.org/jira/browse/SPARK-34757 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Priority: Major > > When spark-submit is executed with --packages, it will not download the > dependency jars when they are available in cache (e.g. ivy cache), even when > the dependencies are SNAPSHOTs. > This might block developers who work on external modules in Spark (e.g. > spark-avro), since they need to remove the cache manually every time when > they update the code during developments (which generates SNAPSHOT jars). > Without knowing this, they could be blocked wondering why their code changes > are not reflected in spark-submit executions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies
[ https://issues.apache.org/jira/browse/SPARK-34757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34757: Assignee: Apache Spark > Spark submit should ignore cache for SNAPSHOT dependencies > -- > > Key: SPARK-34757 > URL: https://issues.apache.org/jira/browse/SPARK-34757 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Assignee: Apache Spark >Priority: Major > > When spark-submit is executed with --packages, it will not download the > dependency jars when they are available in cache (e.g. ivy cache), even when > the dependencies are SNAPSHOTs. > This might block developers who work on external modules in Spark (e.g. > spark-avro), since they need to remove the cache manually every time when > they update the code during developments (which generates SNAPSHOT jars). > Without knowing this, they could be blocked wondering why their code changes > are not reflected in spark-submit executions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies
[ https://issues.apache.org/jira/browse/SPARK-34757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34757: Assignee: Apache Spark > Spark submit should ignore cache for SNAPSHOT dependencies > -- > > Key: SPARK-34757 > URL: https://issues.apache.org/jira/browse/SPARK-34757 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Assignee: Apache Spark >Priority: Major > > When spark-submit is executed with --packages, it will not download the > dependency jars when they are available in cache (e.g. ivy cache), even when > the dependencies are SNAPSHOTs. > This might block developers who work on external modules in Spark (e.g. > spark-avro), since they need to remove the cache manually every time when > they update the code during developments (which generates SNAPSHOT jars). > Without knowing this, they could be blocked wondering why their code changes > are not reflected in spark-submit executions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34759) run JavaSparkSQLExample failed with Exception.
zengrui created SPARK-34759: --- Summary: run JavaSparkSQLExample failed with Exception. Key: SPARK-34759 URL: https://issues.apache.org/jira/browse/SPARK-34759 Project: Spark Issue Type: Bug Components: Examples Affects Versions: 3.1.1, 3.0.1 Reporter: zengrui run JavaSparkSQLExample failed with Exception. The Exception is thrown in function runDatasetCreationExample when executing ‘spark.read().json(path).as(personEncoder)’. The exception is 'Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to int.' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction
[ https://issues.apache.org/jira/browse/SPARK-34758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302354#comment-17302354 ] Apache Spark commented on SPARK-34758: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/31844 > Simplify Analyzer.resolveLiteralFunction > > > Key: SPARK-34758 > URL: https://issues.apache.org/jira/browse/SPARK-34758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Assigned] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction
[ https://issues.apache.org/jira/browse/SPARK-34758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34758: Assignee: Apache Spark > Simplify Analyzer.resolveLiteralFunction > > > Key: SPARK-34758 > URL: https://issues.apache.org/jira/browse/SPARK-34758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction
[ https://issues.apache.org/jira/browse/SPARK-34758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34758: Assignee: (was: Apache Spark) > Simplify Analyzer.resolveLiteralFunction > > > Key: SPARK-34758 > URL: https://issues.apache.org/jira/browse/SPARK-34758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Created] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction
Wenchen Fan created SPARK-34758: --- Summary: Simplify Analyzer.resolveLiteralFunction Key: SPARK-34758 URL: https://issues.apache.org/jira/browse/SPARK-34758 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan
[jira] [Created] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies
Bo Zhang created SPARK-34757: --- Summary: Spark submit should ignore cache for SNAPSHOT dependencies Key: SPARK-34757 URL: https://issues.apache.org/jira/browse/SPARK-34757 Project: Spark Issue Type: Bug Components: Deploy, Spark Core Affects Versions: 3.1.1 Reporter: Bo Zhang When spark-submit is executed with --packages, it will not download the dependency jars when they are available in the cache (e.g. the ivy cache), even when the dependencies are SNAPSHOTs. This might block developers who work on external modules in Spark (e.g. spark-avro), since they need to remove the cache manually every time they update the code during development (which generates SNAPSHOT jars). Without knowing this, they could be left wondering why their code changes are not reflected in spark-submit executions.
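The behavior requested above can be sketched as a small cache policy: a locally cached artifact may be reused only for release versions, because SNAPSHOT versions are mutable and must be re-fetched. This is a hypothetical illustration, not Spark's actual ivy resolution code; the class and method names are assumptions:

```java
// Hypothetical sketch of the requested policy (not Spark's actual
// dependency-resolution code): reuse the cached jar only for release
// versions; SNAPSHOT versions can change upstream, so re-download them.
public class SnapshotCachePolicy {
    static boolean shouldRedownload(String version, boolean inCache) {
        return !inCache || version.endsWith("-SNAPSHOT");
    }

    public static void main(String[] args) {
        System.out.println(shouldRedownload("3.2.0-SNAPSHOT", true)); // re-fetch despite cache hit
        System.out.println(shouldRedownload("3.1.1", true));          // cached release is reusable
        System.out.println(shouldRedownload("3.1.1", false));         // cache miss always downloads
    }
}
```

With such a check in place, a developer rebuilding spark-avro SNAPSHOT jars would not need to clear the ivy cache by hand between spark-submit runs.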