[jira] [Created] (SPARK-34770) InMemoryCatalog.tableExists should not fail if database doesn't exist

2021-03-16 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-34770:
---

 Summary: InMemoryCatalog.tableExists should not fail if database 
doesn't exist
 Key: SPARK-34770
 URL: https://issues.apache.org/jira/browse/SPARK-34770
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan
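As a hedged sketch of the requested behavior (illustrative only, not the actual InMemoryCatalog code), tableExists should treat a missing database as "table not found" instead of throwing:

{code:scala}
// Hypothetical in-memory state; the real InMemoryCatalog keeps richer metadata.
def tableExists(catalog: Map[String, Set[String]], db: String, table: String): Boolean =
  catalog.get(db).exists(_.contains(table))  // missing db => false, no exception

val state = Map("default" -> Set("t1"))
assert(tableExists(state, "default", "t1"))
assert(!tableExists(state, "no_such_db", "t1"))  // should not fail
{code}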






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34768) Respect the default input buffer size in Univocity

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34768:


Assignee: Apache Spark

> Respect the default input buffer size in Univocity
> --
>
> Key: SPARK-34768
> URL: https://issues.apache.org/jira/browse/SPARK-34768
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Currently Univocity 2.9.1 has a bug such as 
> https://github.com/uniVocity/univocity-parsers/issues/449.
> While this is a bug, another factor is that we don't respect Univocity's 
> default value, which exposes Spark to configurations that Univocity's tests do not cover.
> We should respect Univocity's default input buffer value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34769) AnsiTypeCoercion: return narrowest convertible type among TypeCollection

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34769:


Assignee: Gengliang Wang  (was: Apache Spark)

> AnsiTypeCoercion: return narrowest convertible type among TypeCollection
> 
>
> Key: SPARK-34769
> URL: https://issues.apache.org/jira/browse/SPARK-34769
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Currently, when implicitly casting a data type to a `TypeCollection`, Spark 
> returns the first convertible data type in the `TypeCollection`.
> In ANSI mode, we can make the behavior more reasonable by returning the 
> narrowest convertible data type in the `TypeCollection`.
> In detail, we first try to find all the expected types we can implicitly cast to:
> 1. if there is no convertible data type, return None;
> 2. if there is only one convertible data type, cast the input to it;
> 3. otherwise, if there are multiple convertible data types, find the narrowest 
> common data type among them. If there is no such narrowest common data type, 
> return None.
> Note that if the narrowest common type is Float and the convertible types 
> contain Double, simply return Double as the narrowest common type, to avoid 
> potential precision loss when converting an integral type to Float.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34768) Respect the default input buffer size in Univocity

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34768:


Assignee: (was: Apache Spark)

> Respect the default input buffer size in Univocity
> --
>
> Key: SPARK-34768
> URL: https://issues.apache.org/jira/browse/SPARK-34768
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently Univocity 2.9.1 has a bug such as 
> https://github.com/uniVocity/univocity-parsers/issues/449.
> While this is a bug, another factor is that we don't respect Univocity's 
> default value, which exposes Spark to configurations that Univocity's tests do not cover.
> We should respect Univocity's default input buffer value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34769) AnsiTypeCoercion: return narrowest convertible type among TypeCollection

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34769:


Assignee: Apache Spark  (was: Gengliang Wang)

> AnsiTypeCoercion: return narrowest convertible type among TypeCollection
> 
>
> Key: SPARK-34769
> URL: https://issues.apache.org/jira/browse/SPARK-34769
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Currently, when implicitly casting a data type to a `TypeCollection`, Spark 
> returns the first convertible data type in the `TypeCollection`.
> In ANSI mode, we can make the behavior more reasonable by returning the 
> narrowest convertible data type in the `TypeCollection`.
> In detail, we first try to find all the expected types we can implicitly cast to:
> 1. if there is no convertible data type, return None;
> 2. if there is only one convertible data type, cast the input to it;
> 3. otherwise, if there are multiple convertible data types, find the narrowest 
> common data type among them. If there is no such narrowest common data type, 
> return None.
> Note that if the narrowest common type is Float and the convertible types 
> contain Double, simply return Double as the narrowest common type, to avoid 
> potential precision loss when converting an integral type to Float.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34769) AnsiTypeCoercion: return narrowest convertible type among TypeCollection

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303095#comment-17303095
 ] 

Apache Spark commented on SPARK-34769:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/31859

> AnsiTypeCoercion: return narrowest convertible type among TypeCollection
> 
>
> Key: SPARK-34769
> URL: https://issues.apache.org/jira/browse/SPARK-34769
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Currently, when implicitly casting a data type to a `TypeCollection`, Spark 
> returns the first convertible data type in the `TypeCollection`.
> In ANSI mode, we can make the behavior more reasonable by returning the 
> narrowest convertible data type in the `TypeCollection`.
> In detail, we first try to find all the expected types we can implicitly cast to:
> 1. if there is no convertible data type, return None;
> 2. if there is only one convertible data type, cast the input to it;
> 3. otherwise, if there are multiple convertible data types, find the narrowest 
> common data type among them. If there is no such narrowest common data type, 
> return None.
> Note that if the narrowest common type is Float and the convertible types 
> contain Double, simply return Double as the narrowest common type, to avoid 
> potential precision loss when converting an integral type to Float.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34768) Respect the default input buffer size in Univocity

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303094#comment-17303094
 ] 

Apache Spark commented on SPARK-34768:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31858

> Respect the default input buffer size in Univocity
> --
>
> Key: SPARK-34768
> URL: https://issues.apache.org/jira/browse/SPARK-34768
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently Univocity 2.9.1 has a bug such as 
> https://github.com/uniVocity/univocity-parsers/issues/449.
> While this is a bug, another factor is that we don't respect Univocity's 
> default value, which exposes Spark to configurations that Univocity's tests do not cover.
> We should respect Univocity's default input buffer value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34769) AnsiTypeCoercion: return narrowest convertible type among TypeCollection

2021-03-16 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-34769:
--

 Summary: AnsiTypeCoercion: return narrowest convertible type among 
TypeCollection
 Key: SPARK-34769
 URL: https://issues.apache.org/jira/browse/SPARK-34769
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Currently, when implicitly casting a data type to a `TypeCollection`, Spark 
returns the first convertible data type in the `TypeCollection`.
In ANSI mode, we can make the behavior more reasonable by returning the 
narrowest convertible data type in the `TypeCollection`.

In detail, we first try to find all the expected types we can implicitly cast to:
1. if there is no convertible data type, return None;
2. if there is only one convertible data type, cast the input to it;
3. otherwise, if there are multiple convertible data types, find the narrowest 
common data type among them. If there is no such narrowest common data type, 
return None.

Note that if the narrowest common type is Float and the convertible types 
contain Double, simply return Double as the narrowest common type, to avoid 
potential precision loss when converting an integral type to Float.
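
As a rough sketch of this selection logic (not the actual AnsiTypeCoercion rule; `canCast` and `narrowestCommon` below are hypothetical stand-ins for Spark's implicit-cast check and tightest-common-type lookup):

{code:scala}
import org.apache.spark.sql.types._

def narrowestConvertible(
    in: DataType,
    expected: Seq[DataType],
    canCast: (DataType, DataType) => Boolean,
    narrowestCommon: Seq[DataType] => Option[DataType]): Option[DataType] = {
  expected.filter(canCast(in, _)) match {
    case Seq()       => None          // 1. nothing convertible
    case Seq(single) => Some(single)  // 2. exactly one candidate: use it
    case many =>                      // 3. several candidates: pick the narrowest
      narrowestCommon(many).map {
        // prefer Double over Float to avoid precision loss for integral inputs
        case FloatType if many.contains(DoubleType) => DoubleType
        case other => other
      }
  }
}
{code}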



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34768) Respect the default input buffer size in Univocity

2021-03-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34768:
-
Issue Type: Bug  (was: Improvement)

> Respect the default input buffer size in Univocity
> --
>
> Key: SPARK-34768
> URL: https://issues.apache.org/jira/browse/SPARK-34768
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently Univocity 2.9.1 has a bug such as 
> https://github.com/uniVocity/univocity-parsers/issues/449.
> While this is a bug, another factor is that we don't respect Univocity's 
> default value, which exposes Spark to configurations that Univocity's tests do not cover.
> We should respect Univocity's default input buffer value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34768) Respect the default input buffer size in Univocity

2021-03-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34768:
-
Affects Version/s: 3.0.2

> Respect the default input buffer size in Univocity
> --
>
> Key: SPARK-34768
> URL: https://issues.apache.org/jira/browse/SPARK-34768
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently Univocity 2.9.1 has a bug such as 
> https://github.com/uniVocity/univocity-parsers/issues/449.
> While this is a bug, another factor is that we don't respect Univocity's 
> default value, which exposes Spark to configurations that Univocity's tests do not cover.
> We should respect Univocity's default input buffer value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34768) Respect the default input buffer size in Univocity

2021-03-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-34768:


 Summary: Respect the default input buffer size in Univocity
 Key: SPARK-34768
 URL: https://issues.apache.org/jira/browse/SPARK-34768
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.1
Reporter: Hyukjin Kwon


Currently Univocity 2.9.1 has a bug such as 
https://github.com/uniVocity/univocity-parsers/issues/449.

While this is a bug, another factor is that we don't respect Univocity's 
default value, which exposes Spark to configurations that Univocity's tests do not cover.

We should respect Univocity's default input buffer value.
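
For illustration only (a hedged sketch, not the actual Spark-side change): the idea is to stop overriding Univocity's own default unless a size is explicitly configured. setInputBufferSize is the univocity-parsers setter; the option handling around it is hypothetical.

{code:scala}
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

// Hypothetical user-facing option; not the actual CSVOptions code.
val userBufferSize: Option[Int] = None  // e.g. Some(4096) if explicitly configured

val settings = new CsvParserSettings()
// Only override when explicitly configured; otherwise keep the library default,
// which is the value Univocity itself is tested against.
userBufferSize.foreach(size => settings.setInputBufferSize(size))

val parser = new CsvParser(settings)
{code}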



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10697) Lift Calculation in Association Rule mining

2021-03-16 Thread Yashwanth Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303086#comment-17303086
 ] 

Yashwanth Kumar edited comment on SPARK-10697 at 3/17/21, 4:36 AM:
---

Glad that the change I proposed 5 years back got resolved. Sorry, my Apache 
account got disabled; I got a new one now. Looking forward to contributing.


was (Author: yashkumards):
Glad that the change I proposed 5 years back got resolved. Sorry, my Apache 
account got disabled; I got a new one now.

> Lift Calculation in Association Rule mining
> ---
>
> Key: SPARK-10697
> URL: https://issues.apache.org/jira/browse/SPARK-10697
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Yashwanth Kumar
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.0
>
>
> Lift is to be calculated for association rule mining in 
> AssociationRules.scala under FPM.
> Lift is a measure of the performance of an association rule.
> Adding lift will help compare model efficiency.
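
For reference, a generic illustration of how lift is defined (not the AssociationRules.scala implementation; the numbers are hypothetical):

{code:scala}
// lift(X => Y) = confidence(X => Y) / support(Y) = P(Y | X) / P(Y)
val supportY     = 0.4  // hypothetical P(Y)
val confidenceXY = 0.8  // hypothetical P(Y | X)
val lift = confidenceXY / supportY  // 2.0: X makes Y twice as likely as the baseline
{code}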



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10697) Lift Calculation in Association Rule mining

2021-03-16 Thread Yashwanth Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303086#comment-17303086
 ] 

Yashwanth Kumar commented on SPARK-10697:
-

Glad that the change I proposed 5 years back got resolved. Sorry, my Apache 
account got disabled; I got a new one now.

> Lift Calculation in Association Rule mining
> ---
>
> Key: SPARK-10697
> URL: https://issues.apache.org/jira/browse/SPARK-10697
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Yashwanth Kumar
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.0
>
>
> Lift is to be calculated for association rule mining in 
> AssociationRules.scala under FPM.
> Lift is a measure of the performance of an association rule.
> Adding lift will help compare model efficiency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34767) Spark Streaming reading Kafka data: warnings appear and the program hangs when run in IDEA (how to resolve)

2021-03-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-34767.
--
Resolution: Incomplete

> Spark Streaming reading Kafka data: warnings appear and the program hangs when run in IDEA (how to resolve)
> 
>
> Key: SPARK-34767
> URL: https://issues.apache.org/jira/browse/SPARK-34767
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
>Reporter: MrYang
>Priority: Blocker
>
> 2021-03-17 11:16:56,711 WARN --- [ main] 
> org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
> enable.auto.commit to false for executor
> 2021-03-17 11:16:56,714 WARN --- [ main] 
> org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
> auto.offset.reset to none for executor
> 2021-03-17 11:16:56,714 WARN --- [ main] 
> org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
> executor group.id to spark-executor-recommender
> 2021-03-17 11:16:56,715 WARN --- [ main] 
> org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
> receive.buffer.bytes to 65536 see KAFKA-3135



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34767) Spark Streaming reading Kafka data: warnings appear and the program hangs when run in IDEA (how to resolve)

2021-03-16 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303082#comment-17303082
 ] 

Hyukjin Kwon commented on SPARK-34767:
--

1. Please use English to communicate with other maintainers. Many people don't speak 
Chinese.
2. Spark 2.1.x is EOL.


> Spark Streaming reading Kafka data: warnings appear and the program hangs when run in IDEA (how to resolve)
> 
>
> Key: SPARK-34767
> URL: https://issues.apache.org/jira/browse/SPARK-34767
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
>Reporter: MrYang
>Priority: Blocker
>
> 2021-03-17 11:16:56,711 WARN --- [ main] 
> org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
> enable.auto.commit to false for executor
> 2021-03-17 11:16:56,714 WARN --- [ main] 
> org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
> auto.offset.reset to none for executor
> 2021-03-17 11:16:56,714 WARN --- [ main] 
> org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
> executor group.id to spark-executor-recommender
> 2021-03-17 11:16:56,715 WARN --- [ main] 
> org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
> receive.buffer.bytes to 65536 see KAFKA-3135



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag

2021-03-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34504.
-
Fix Version/s: 3.1.2
   3.2.0
   Resolution: Fixed

Issue resolved by pull request 31853
[https://github.com/apache/spark/pull/31853]

> Avoid unnecessary view resolving and remove the `performCheck` flag
> ---
>
> Key: SPARK-34504
> URL: https://issues.apache.org/jira/browse/SPARK-34504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Linhong Liu
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> In SPARK-34490, I added a `performCheck` flag to skip the analysis check when 
> resolving views. This was because some view resolution is unnecessary. So we 
> can avoid this unnecessary view resolution and remove the `performCheck` 
> flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag

2021-03-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34504:
---

Assignee: Wenchen Fan

> Avoid unnecessary view resolving and remove the `performCheck` flag
> ---
>
> Key: SPARK-34504
> URL: https://issues.apache.org/jira/browse/SPARK-34504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Linhong Liu
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> In SPARK-34490, I added a `performCheck` flag to skip the analysis check when 
> resolving views. This was because some view resolution is unnecessary. So we 
> can avoid this unnecessary view resolution and remove the `performCheck` 
> flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34767) Spark Streaming reading Kafka data: warnings appear and the program hangs when run in IDEA (how to resolve)

2021-03-16 Thread MrYang (Jira)
MrYang created SPARK-34767:
--

 Summary: Spark Streaming reading Kafka data: warnings appear and the program hangs when run in IDEA (how to resolve)
 Key: SPARK-34767
 URL: https://issues.apache.org/jira/browse/SPARK-34767
 Project: Spark
  Issue Type: Bug
  Components: DStreams
Affects Versions: 2.1.1
Reporter: MrYang


2021-03-17 11:16:56,711 WARN --- [ main] 
org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
enable.auto.commit to false for executor
2021-03-17 11:16:56,714 WARN --- [ main] 
org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
auto.offset.reset to none for executor
2021-03-17 11:16:56,714 WARN --- [ main] 
org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding executor 
group.id to spark-executor-recommender
2021-03-17 11:16:56,715 WARN --- [ main] 
org.apache.spark.streaming.kafka010.KafkaUtils (line: 66) : overriding 
receive.buffer.bytes to 65536 see KAFKA-3135



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28220) join foldable condition not pushed down when parent filter is totally pushed down

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303058#comment-17303058
 ] 

Apache Spark commented on SPARK-28220:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31857

> join foldable condition not pushed down when parent filter is totally pushed 
> down
> -
>
> Key: SPARK-28220
> URL: https://issues.apache.org/jira/browse/SPARK-28220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2, 3.0.0
>Reporter: liupengcheng
>Priority: Major
>
> We encountered an issue where join conditions were not pushed down when 
> running a Spark app on Spark 2.3. After carefully looking into the code and 
> debugging, we found that it's because there is a bug in the rule 
> `PushPredicateThroughJoin`:
> It tries to push the parent filter down through the join; however, when the 
> parent filter is wholly pushed down through the join, the join becomes 
> the top node, and then the `transform` method skips applying the rule to the 
> join. 
>  
> Suppose we have two tables: table1 and table2:
> table1: (a: string, b: string, c: string)
> table2: (d: string)
> sql as:
>  
> {code:java}
> select * from table1 left join (select d, 'w1' as r from table2) on a = d and 
> r = 'w2' where b = 2{code}
>  
> let's focus on the following optimizer rules:
> PushPredicateThroughJoin
> FoldablePropagation
> BooleanSimplification
> PruneFilters
>  
> In the above case, on the first iteration of these rules:
> PushPredicateThroughJoin -> 
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> a = d and r = 'w2'
> {code}
> FoldablePropagation ->
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> a = d and 'w1' = 'w2'{code}
> BooleanSimplification ->
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> false{code}
> PruneFilters -> no effect
>  
> After several iterations of these rules, the join condition will still never 
> be pushed to the right-hand side of the left join. Thus, in some cases 
> (e.g. a large right table), the `BroadcastNestedLoopJoin` may be slow or OOM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28220) join foldable condition not pushed down when parent filter is totally pushed down

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303057#comment-17303057
 ] 

Apache Spark commented on SPARK-28220:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31857

> join foldable condition not pushed down when parent filter is totally pushed 
> down
> -
>
> Key: SPARK-28220
> URL: https://issues.apache.org/jira/browse/SPARK-28220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2, 3.0.0
>Reporter: liupengcheng
>Priority: Major
>
> We encountered an issue where join conditions were not pushed down when 
> running a Spark app on Spark 2.3. After carefully looking into the code and 
> debugging, we found that it's because there is a bug in the rule 
> `PushPredicateThroughJoin`:
> It tries to push the parent filter down through the join; however, when the 
> parent filter is wholly pushed down through the join, the join becomes 
> the top node, and then the `transform` method skips applying the rule to the 
> join. 
>  
> Suppose we have two tables: table1 and table2:
> table1: (a: string, b: string, c: string)
> table2: (d: string)
> sql as:
>  
> {code:java}
> select * from table1 left join (select d, 'w1' as r from table2) on a = d and 
> r = 'w2' where b = 2{code}
>  
> let's focus on the following optimizer rules:
> PushPredicateThroughJoin
> FoldablePropagation
> BooleanSimplification
> PruneFilters
>  
> In the above case, on the first iteration of these rules:
> PushPredicateThroughJoin -> 
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> a = d and r = 'w2'
> {code}
> FoldablePropagation ->
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> a = d and 'w1' = 'w2'{code}
> BooleanSimplification ->
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> false{code}
> PruneFilters -> no effect
>  
> After several iterations of these rules, the join condition will still never 
> be pushed to the right-hand side of the left join. Thus, in some cases 
> (e.g. a large right table), the `BroadcastNestedLoopJoin` may be slow or OOM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34762) Many PR's Scala 2.13 build action failed

2021-03-16 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303053#comment-17303053
 ] 

Yang Jie commented on SPARK-34762:
--

[~dongjoon]  It seems that the problem still exists, such as 
https://github.com/apache/spark/pull/31856, 
[https://github.com/apache/spark/pull/31855]  and 
https://github.com/apache/spark/pull/31854

> Many PR's Scala 2.13 build action failed
> 
>
> Key: SPARK-34762
> URL: https://issues.apache.org/jira/browse/SPARK-34762
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Major
>
> PRs with Scala 2.13 build failures include:
>  * [https://github.com/apache/spark/pull/31849]
>  * [https://github.com/apache/spark/pull/31848]
>  * [https://github.com/apache/spark/pull/31844]
>  * [https://github.com/apache/spark/pull/31843]
>  * https://github.com/apache/spark/pull/31841
> {code:java}
> [error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1:
>   error: package org.apache.commons.cli does not exist
> 1278[error] import org.apache.commons.cli.GnuParser;
> 1279[error]  ^
> 1280[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
>   error: cannot find symbol
> 1281[error] private final Options options = new Options();
> 1282[error]   ^  symbol:   class Options
> 1283[error]   location: class ServerOptionsProcessor
> 1284[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1:
>   error: package org.apache.commons.cli does not exist
> 1285[error] private org.apache.commons.cli.CommandLine commandLine;
> 1286[error]   ^
> 1287[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1:
>   error: cannot find symbol
> 1288[error] HelpOptionExecutor(String serverName, Options options) {
> 1289[error]   ^  symbol:   class 
> Options
> 1290[error]   location: class HelpOptionExecutor
> 1291[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
>   error: cannot find symbol
> 1292[error] private final Options options = new Options();
> 1293[error] ^  symbol:   class Options
> 1294[error]   location: class ServerOptionsProcessor
> 1295[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1:
>   error: cannot find symbol
> 1296[error]   options.addOption(OptionBuilder
> 1297[error] ^  symbol:   variable OptionBuilder
> 1298[error]   location: class ServerOptionsProcessor
> 1299[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1:
>   error: cannot find symbol
> 1300[error]   options.addOption(new Option("H", "help", false, "Print 
> help information"));
> 1301[error] ^  symbol:   class Option
> 1302[error]   location: class ServerOptionsProcessor
> 1303[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1:
>   error: cannot find symbol
> 1304[error] commandLine = new GnuParser().parse(options, argv);
> 1305[error]   ^  symbol:   class GnuParser
> 1306[error]   location: class ServerOptionsProcessor
> 1307[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1:
>   error: cannot find symbol
> 1308[error]   } catch (ParseException e) {
> 1309[error]^  symbol:   class ParseException
> 1310[error]   location: class ServerOptionsProcessor
> 1311[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1:
>   error: cannot find symbol
> 1312[error]   new HelpFormatter().printHelp(serverName, options);
> 1313[error]   ^  symbol:   class HelpFormatter
> 1314[error]   location: class HelpOptionExecutor
> 1315[error] Note: Some input files use or override a deprecated API.
> 1316[error] Note: Recompile with -Xlint:deprecation for details.
> 1317[error] 16 errors
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional 

[jira] [Commented] (SPARK-34766) Do not capture maven config for views

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303022#comment-17303022
 ] 

Apache Spark commented on SPARK-34766:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/31856

> Do not capture maven config for views
> -
>
> Key: SPARK-34766
> URL: https://issues.apache.org/jira/browse/SPARK-34766
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> Due to a bad network, we always use a third-party Maven repo to run tests, 
> e.g.,
> {code:java}
> build/sbt "test:testOnly *SQLQueryTestSuite" 
> -Dspark.sql.maven.additionalRemoteRepositories=x
> {code}
>  
> It fails with an error message like:
> ```
> [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds)
> [info] show-tblproperties.sql
> [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][
> [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" 
> Result did not match for query #6
> [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464)
> ```
> It's not necessary to capture the Maven config in the view, since it's a 
> session-level config.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34766) Do not capture maven config for views

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34766:


Assignee: Apache Spark

> Do not capture maven config for views
> -
>
> Key: SPARK-34766
> URL: https://issues.apache.org/jira/browse/SPARK-34766
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>
> Due to a bad network, we always use a third-party Maven repo to run tests, 
> e.g.,
> {code:java}
> build/sbt "test:testOnly *SQLQueryTestSuite" 
> -Dspark.sql.maven.additionalRemoteRepositories=x
> {code}
>  
> It fails with an error message like:
> ```
> [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds)
> [info] show-tblproperties.sql
> [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][
> [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" 
> Result did not match for query #6
> [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464)
> ```
> It's not necessary to capture the Maven config in the view, since it's a 
> session-level config.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34766) Do not capture maven config for views

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34766:


Assignee: (was: Apache Spark)

> Do not capture maven config for views
> -
>
> Key: SPARK-34766
> URL: https://issues.apache.org/jira/browse/SPARK-34766
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> Due to a bad network, we always use a third-party Maven repo to run tests, 
> e.g.,
> {code:java}
> build/sbt "test:testOnly *SQLQueryTestSuite" 
> -Dspark.sql.maven.additionalRemoteRepositories=x
> {code}
>  
> It fails with an error message like:
> ```
> [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds)
> [info] show-tblproperties.sql
> [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][
> [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" 
> Result did not match for query #6
> [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464)
> ```
> It's not necessary to capture the Maven config in the view, since it's a 
> session-level config.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34766) Do not capture maven config for views

2021-03-16 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-34766:

Description: 
Due to a bad network, we always use a third-party Maven repo to run tests, 
e.g.,
{code:java}
build/sbt "test:testOnly *SQLQueryTestSuite" 
-Dspark.sql.maven.additionalRemoteRepositories=x
{code}
 
It fails with an error message like:
```
[info] - show-tblproperties.sql *** FAILED *** (128 milliseconds)
[info] show-tblproperties.sql
[info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][
[info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" 
Result did not match for query #6
[info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464)
```

It's not necessary to capture the Maven config in the view, since it's a 
session-level config.

 

> Do not capture maven config for views
> -
>
> Key: SPARK-34766
> URL: https://issues.apache.org/jira/browse/SPARK-34766
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> Due to a bad network, we always use a third-party Maven repo to run tests, 
> e.g.,
> {code:java}
> build/sbt "test:testOnly *SQLQueryTestSuite" 
> -Dspark.sql.maven.additionalRemoteRepositories=x
> {code}
>  
> It fails with an error message like:
> ```
> [info] - show-tblproperties.sql *** FAILED *** (128 milliseconds)
> [info] show-tblproperties.sql
> [info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][
> [info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" 
> Result did not match for query #6
> [info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464)
> ```
> It's not necessary to capture the Maven config in the view, since it's a 
> session-level config.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34766) Do not capture maven config for views

2021-03-16 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-34766:

Environment: (was: Due to a bad network, we always use a third-party 
Maven repo to run tests, e.g.,
{code:java}
build/sbt "test:testOnly *SQLQueryTestSuite" 
-Dspark.sql.maven.additionalRemoteRepositories=x
{code}
 
It fails with an error message like:
```
[info] - show-tblproperties.sql *** FAILED *** (128 milliseconds)
[info] show-tblproperties.sql
[info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][
[info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" 
Result did not match for query #6
[info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464)
```

It's not necessary to capture the Maven config in the view, since it's a 
session-level config.

 )

> Do not capture maven config for views
> -
>
> Key: SPARK-34766
> URL: https://issues.apache.org/jira/browse/SPARK-34766
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34766) Do not capture maven config for views

2021-03-16 Thread ulysses you (Jira)
ulysses you created SPARK-34766:
---

 Summary: Do not capture maven config for views
 Key: SPARK-34766
 URL: https://issues.apache.org/jira/browse/SPARK-34766
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
 Environment: Due to a bad network, we always use a third-party 
Maven repo to run tests, e.g.,
{code:java}
build/sbt "test:testOnly *SQLQueryTestSuite" 
-Dspark.sql.maven.additionalRemoteRepositories=x
{code}
 
It fails with an error message like:
```
[info] - show-tblproperties.sql *** FAILED *** (128 milliseconds)
[info] show-tblproperties.sql
[info] Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][
[info] view.sqlConfig.spark.sql.maven.additionalRemoteRepositories x]" 
Result did not match for query #6
[info] SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464)
```

It's not necessary to capture the Maven config in the view, since it's a 
session-level config.

 
Reporter: ulysses you






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34765) Linear Models standardization optimization

2021-03-16 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng updated SPARK-34765:
-
Description: 
The existing implementation of standardization in linear models does *NOT* center the 
vectors by removing the means, in order to keep the dataset sparse.

However, this causes feature values with small variance to be scaled to large 
values, and underlying solvers like LBFGS cannot handle this case efficiently; 
see SPARK-34448 for details.

If the internal vectors are centered (like other well-known implementations, e.g. 
GLMNET/Scikit-Learn), the convergence rate will be better. In the case in 
SPARK-34448, the number of iterations to convergence is reduced from 93 to 
6. Moreover, the final solution is much closer to the one from GLMNET.

Luckily, we found a new way to 'virtually' center the vectors without densifying 
the dataset, iff:

1. fitIntercept is true;
2. there is no penalty on the intercept (it seems this is always true in existing implementations);
3. there are no bounds on the intercept.

We will also need to check whether this new method works as expected in all other 
linear models (i.e. mlor/svc/lir/aft, etc.), and introduce it into 
those models if possible.

 

  was:
The existing implementation of standardization in linear models does NOT center the vectors by 
removing the means, in order to keep the dataset sparse.

However, this causes feature values with small variance to be scaled to large 
values, and underlying solvers like LBFGS cannot handle this case efficiently; 
see SPARK-34448 for details.

If the internal vectors are centered (like other well-known implementations, e.g. 
GLMNET/Scikit-Learn), the convergence rate will be better. In the case in 
SPARK-34448, the number of iterations to convergence is reduced from 93 to 
6. Moreover, the final solution is much closer to the one from GLMNET.

Luckily, we found a new way to 'virtually' center the vectors without densifying 
the dataset, iff:

1. fitIntercept is true;
2. there is no penalty on the intercept (it seems this is always true in existing implementations);
3. there are no bounds on the intercept.

We will also need to check whether this new method works as expected in all other 
linear models (i.e. mlor/svc/lir/aft, etc.), and introduce it into 
those models if possible.

 


> Linear Models standardization optimization
> --
>
> Key: SPARK-34765
> URL: https://issues.apache.org/jira/browse/SPARK-34765
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: zhengruifeng
>Priority: Major
>
> The existing implementation of standardization in linear models does *NOT* center the 
> vectors by removing the means, in order to keep the dataset sparse.
> However, this causes feature values with small variance to be scaled to large 
> values, and underlying solvers like LBFGS cannot handle this 
> case efficiently; see SPARK-34448 for details.
> If the internal vectors are centered (like other well-known implementations, e.g. 
> GLMNET/Scikit-Learn), the convergence rate will be better. In the case in 
> SPARK-34448, the number of iterations to convergence is reduced from 93 
> to 6. Moreover, the final solution is much closer to the one from GLMNET.
> Luckily, we found a new way to 'virtually' center the vectors without 
> densifying the dataset, iff:
> 1. fitIntercept is true;
> 2. there is no penalty on the intercept (it seems this is always true in existing 
> implementations);
> 3. there are no bounds on the intercept.
> We will also need to check whether this new method works as expected in all other 
> linear models (i.e. mlor/svc/lir/aft, etc.), and introduce it into 
> those models if possible.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34765) Linear Models standardization optimization

2021-03-16 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng updated SPARK-34765:
-
Issue Type: Umbrella  (was: Improvement)

> Linear Models standardization optimization
> --
>
> Key: SPARK-34765
> URL: https://issues.apache.org/jira/browse/SPARK-34765
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: zhengruifeng
>Priority: Major
>
> The existing implementation of standardization in linear models does NOT center the vectors 
> by removing the means, in order to keep the dataset sparse.
> However, this causes feature values with small variance to be scaled to large 
> values, and underlying solvers like LBFGS cannot handle this 
> case efficiently; see SPARK-34448 for details.
> If the internal vectors are centered (like other well-known implementations, e.g. 
> GLMNET/Scikit-Learn), the convergence rate will be better. In the case in 
> SPARK-34448, the number of iterations to convergence is reduced from 93 
> to 6. Moreover, the final solution is much closer to the one from GLMNET.
> Luckily, we found a new way to 'virtually' center the vectors without 
> densifying the dataset, iff:
> 1. fitIntercept is true;
> 2. there is no penalty on the intercept (it seems this is always true in existing implementations);
> 3. there are no bounds on the intercept.
> We will also need to check whether this new method works as expected in all other 
> linear models (i.e. mlor/svc/lir/aft, etc.), and introduce it into 
> those models if possible.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34714) collect_list(struct()) fails when used with GROUP BY

2021-03-16 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-34714.
--
Fix Version/s: 3.1.2
   Resolution: Fixed

> collect_list(struct()) fails when used with GROUP BY
> 
>
> Key: SPARK-34714
> URL: https://issues.apache.org/jira/browse/SPARK-34714
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
> Environment: Databricks Runtime 8.0
>Reporter: Lauri Koobas
>Priority: Major
> Fix For: 3.1.2
>
>
> The following is failing in DBR8.0 / Spark 3.1.1, but works in earlier DBR 
> and Spark versions:
> {quote}with step_1 as (
>     select 'E' as name, named_struct('subfield', 1) as field_1
> )
> select name, collect_list(struct(field_1.subfield))
> from step_1
> group by 1
> {quote}
> Fails with the following error message:
> {quote}AnalysisException: cannot resolve 
> 'struct(step_1.`field_1`.`subfield`)' due to data type mismatch: Only 
> foldable string expressions are allowed to appear at odd position, got: 
> NamePlaceholder
> {quote}
> If you modify the query in any of the following ways, then it still works:
>  * if you remove the field "name" and the "group by 1" part of the query
>  * if you remove the "struct()" from within the collect_list()
>  * if you use "named_struct()" instead of "struct()" within the collect_list()
> Similarly collect_set() is broken and possibly more related functions, but I 
> haven't done thorough testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34714) collect_list(struct()) fails when used with GROUP BY

2021-03-16 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303009#comment-17303009
 ] 

Takeshi Yamamuro commented on SPARK-34714:
--

Ah, I've checked the latest branch-3.1 again and found that the issue goes away. 
So, this issue will be resolved in v3.1.2.

> collect_list(struct()) fails when used with GROUP BY
> 
>
> Key: SPARK-34714
> URL: https://issues.apache.org/jira/browse/SPARK-34714
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
> Environment: Databricks Runtime 8.0
>Reporter: Lauri Koobas
>Priority: Major
>
> The following is failing in DBR8.0 / Spark 3.1.1, but works in earlier DBR 
> and Spark versions:
> {quote}with step_1 as (
>     select 'E' as name, named_struct('subfield', 1) as field_1
> )
> select name, collect_list(struct(field_1.subfield))
> from step_1
> group by 1
> {quote}
> Fails with the following error message:
> {quote}AnalysisException: cannot resolve 
> 'struct(step_1.`field_1`.`subfield`)' due to data type mismatch: Only 
> foldable string expressions are allowed to appear at odd position, got: 
> NamePlaceholder
> {quote}
> If you modify the query in any of the following ways, then it still works:
>  * if you remove the field "name" and the "group by 1" part of the query
>  * if you remove the "struct()" from within the collect_list()
>  * if you use "named_struct()" instead of "struct()" within the collect_list()
> Similarly collect_set() is broken and possibly more related functions, but I 
> haven't done thorough testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34765) Linear Models standardization optimization

2021-03-16 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-34765:


 Summary: Linear Models standardization optimization
 Key: SPARK-34765
 URL: https://issues.apache.org/jira/browse/SPARK-34765
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 3.1.1, 3.2.0
Reporter: zhengruifeng


The existing implementation of standardization in linear models does NOT center the vectors by 
removing the means, in order to keep the dataset sparse.

However, this causes feature values with small variance to be scaled to large 
values, and underlying solvers like LBFGS cannot handle this case efficiently; 
see SPARK-34448 for details.

If the internal vectors are centered (like other well-known implementations, e.g. 
GLMNET/Scikit-Learn), the convergence rate will be better. In the case in 
SPARK-34448, the number of iterations to convergence is reduced from 93 to 
6. Moreover, the final solution is much closer to the one from GLMNET.

Luckily, we found a new way to 'virtually' center the vectors without densifying 
the dataset, iff:

1. fitIntercept is true;
2. there is no penalty on the intercept (it seems this is always true in existing implementations);
3. there are no bounds on the intercept.

We will also need to check whether this new method works as expected in all other 
linear models (i.e. mlor/svc/lir/aft, etc.), and introduce it into 
those models if possible.
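
To make the 'virtual' centering concrete, here is a hedged sketch (a plain dot product stands in for the model's margin; this is not the actual ML optimizer code):

{code:scala}
import org.apache.spark.ml.linalg.{Vector, Vectors}

// With an intercept (and no penalty/bounds on it), centering every feature by its
// mean only shifts the intercept:  w.(x - mu) + b == w.x + (b - w.mu),
// so sparse vectors never need to be densified.
def dot(a: Vector, b: Vector): Double =
  a.toArray.zip(b.toArray).map { case (x, y) => x * y }.sum

val w  = Vectors.dense(0.5, -1.0, 2.0)            // coefficients
val mu = Vectors.dense(3.0, 1.0, 0.0)             // feature means
val x  = Vectors.sparse(3, Array(2), Array(4.0))  // stays sparse

val explicitCentering = dot(w, Vectors.dense(x.toArray.zip(mu.toArray).map { case (xi, mi) => xi - mi }))
val virtualCentering  = dot(w, x) - dot(w, mu)    // same margin, no densification
assert(math.abs(explicitCentering - virtualCentering) < 1e-12)
{code}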

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Nivas Umapathy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303001#comment-17303001
 ] 

Nivas Umapathy edited comment on SPARK-34751 at 3/17/21, 12:53 AM:
---

the schema is extracted from the same file, before materializing the data

df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')

By "schema" I meant the df.schema argument above. The file was written out from a pandas dataframe.

It was written out using this


{{import pandas as pd}}

{{df = pd.DataFrame(}}
 {{data = {}}
 {{"COL 1": 
[87.0921538,41.26487033,1.731626487,99.02779887,80.750347,62.89799664,84.27772144,12.78995399,58.13994625,54.51677768],}}
 {{"COL,2": 
[28.431596,13.50141322,28.60878912,50.59429628,53.77345338,3.278319754,89.88524435,57.29173215,34.75955608,22.50907852],}}
 {{"COL;3": 
[48.12359525,2.751809433,64.45305108,40.97279762,46.3506431,68.57561523,67.52866381,18.70752371,44.86086801,8.42884315],}}
 {{"COL{4": 
[25.23141131,65.20640894,56.83503264,21.46097087,59.22963758,99.55784318,8.02616508,75.29924438,3.911268106,90.1820556],}}
 {{"COL}5": 
[37.00662369,82.24478025,27.89576774,9.549598639,46.92239754,10.48954042,81.71312268,49.991685,43.78556399,79.00133828],}}
 {{"COL(6": 
[71.21354798,82.33860851,12.88393027,23.47301417,76.36836392,18.43024893,51.48770487,93.20889954,72.66516434,18.07311939],}}
 {{"COL)7": 
[68.00032082,39.91265109,83.47701751,42.71072597,33.54784094,94.63751895,3.364241739,0.792257736,78.63395232,70.8626348],}}
 {{"COL\n8": 
[77.80604836,61.08923308,70.70871195,99.33277829,79.77837072,56.28812485,34.03977847,13.40720489,87.71281052,64.80060217],}}
 {{"COL=9": 
[60.00505851,46.51367893,87.1346726,7.202332939,49.50378799,56.70949031,99.39792697,52.08074715,18.25891755,67.88110289],}}
 {{"COL\t10": 
[78.60259718,96.87558507,20.04134901,80.46408956,69.97610739,42.96954652,22.45733464,32.00411095,52.83023296,87.48870904],}}
 {{}, columns = ["COL 
1","COL,2","COL;3","COL\\{4","COL}5","COL(6","COL)7","COL\n8","COL=9","COL\t10"])}}

{{df.to_parquet('invalid_columns_double.parquet')}}

 

Here is a link to my databricks notebook to reproduce this

[https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html]


was (Author: toocoolblue2000):
the schema is extracted from the same file, before materializing the data

df = glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')

By "schema" I meant the df.schema argument above. The file was written out from a pandas dataframe.

It was written out using this


{{import pandas as pd}}

{{df = pd.DataFrame(}}
{{data = {}}
{{"COL 1": 
[87.0921538,41.26487033,1.731626487,99.02779887,80.750347,62.89799664,84.27772144,12.78995399,58.13994625,54.51677768],}}
{{"COL,2": 
[28.431596,13.50141322,28.60878912,50.59429628,53.77345338,3.278319754,89.88524435,57.29173215,34.75955608,22.50907852],}}
{{"COL;3": 
[48.12359525,2.751809433,64.45305108,40.97279762,46.3506431,68.57561523,67.52866381,18.70752371,44.86086801,8.42884315],}}
{{"COL{4": 
[25.23141131,65.20640894,56.83503264,21.46097087,59.22963758,99.55784318,8.02616508,75.29924438,3.911268106,90.1820556],}}
{{"COL}5": 
[37.00662369,82.24478025,27.89576774,9.549598639,46.92239754,10.48954042,81.71312268,49.991685,43.78556399,79.00133828],}}
{{"COL(6": 
[71.21354798,82.33860851,12.88393027,23.47301417,76.36836392,18.43024893,51.48770487,93.20889954,72.66516434,18.07311939],}}
{{"COL)7": 
[68.00032082,39.91265109,83.47701751,42.71072597,33.54784094,94.63751895,3.364241739,0.792257736,78.63395232,70.8626348],}}
{{"COL\n8": 
[77.80604836,61.08923308,70.70871195,99.33277829,79.77837072,56.28812485,34.03977847,13.40720489,87.71281052,64.80060217],}}
{{"COL=9": 
[60.00505851,46.51367893,87.1346726,7.202332939,49.50378799,56.70949031,99.39792697,52.08074715,18.25891755,67.88110289],}}
{{"COL\t10": 
[78.60259718,96.87558507,20.04134901,80.46408956,69.97610739,42.96954652,22.45733464,32.00411095,52.83023296,87.48870904],}}
{{}, columns = ["COL 
1","COL,2","COL;3","COL\{4","COL}5","COL(6","COL)7","COL\n8","COL=9","COL\t10"])}}

{{df.to_parquet('invalid_columns_double.parquet')}}

 

Here is a link to my databricks notebook

[https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html]

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>

[jira] [Comment Edited] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Nivas Umapathy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303001#comment-17303001
 ] 

Nivas Umapathy edited comment on SPARK-34751 at 3/17/21, 12:52 AM:
---

The schema is extracted from the same file, before materializing the data:

df = 
glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')

                       ^

By "schema" I meant this. The file was written out from a pandas dataframe.

It was written out using this:

{code:python}
import pandas as pd

df = pd.DataFrame(
    data = {
        "COL 1": [87.0921538,41.26487033,1.731626487,99.02779887,80.750347,62.89799664,84.27772144,12.78995399,58.13994625,54.51677768],
        "COL,2": [28.431596,13.50141322,28.60878912,50.59429628,53.77345338,3.278319754,89.88524435,57.29173215,34.75955608,22.50907852],
        "COL;3": [48.12359525,2.751809433,64.45305108,40.97279762,46.3506431,68.57561523,67.52866381,18.70752371,44.86086801,8.42884315],
        "COL{4": [25.23141131,65.20640894,56.83503264,21.46097087,59.22963758,99.55784318,8.02616508,75.29924438,3.911268106,90.1820556],
        "COL}5": [37.00662369,82.24478025,27.89576774,9.549598639,46.92239754,10.48954042,81.71312268,49.991685,43.78556399,79.00133828],
        "COL(6": [71.21354798,82.33860851,12.88393027,23.47301417,76.36836392,18.43024893,51.48770487,93.20889954,72.66516434,18.07311939],
        "COL)7": [68.00032082,39.91265109,83.47701751,42.71072597,33.54784094,94.63751895,3.364241739,0.792257736,78.63395232,70.8626348],
        "COL\n8": [77.80604836,61.08923308,70.70871195,99.33277829,79.77837072,56.28812485,34.03977847,13.40720489,87.71281052,64.80060217],
        "COL=9": [60.00505851,46.51367893,87.1346726,7.202332939,49.50378799,56.70949031,99.39792697,52.08074715,18.25891755,67.88110289],
        "COL\t10": [78.60259718,96.87558507,20.04134901,80.46408956,69.97610739,42.96954652,22.45733464,32.00411095,52.83023296,87.48870904],
    },
    columns = ["COL 1","COL,2","COL;3","COL{4","COL}5","COL(6","COL)7","COL\n8","COL=9","COL\t10"])

df.to_parquet('invalid_columns_double.parquet')
{code}

 

Here is a link to my databricks notebook

[https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html]


was (Author: toocoolblue2000):
The schema is extracted from the same file, before materializing the data:

df = 
glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')

                       ^

By "schema" I meant this. The file was written out from a pandas dataframe.

 

Here is a link to my databricks notebook

https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3, 3.1.1
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe are null. The same approach 
> works for String datatypes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Nivas Umapathy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303001#comment-17303001
 ] 

Nivas Umapathy commented on SPARK-34751:


The schema is extracted from the same file, before materializing the data:

df = 
glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')

                       ^

By "schema" I meant this. The file was written out from a pandas dataframe.

 

Here is a link to my databricks notebook

https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5940072345564347/3863439224328194/623184285031795/latest.html

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3, 3.1.1
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe are null. The same approach 
> works for String datatypes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302988#comment-17302988
 ] 

Takeshi Yamamuro commented on SPARK-34751:
--

Could you describe more details to reproduce the issue? What is the schema of the 
parquet file, how did you write the parquet file, and so on?

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3, 3.1.1
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe are null. The same approach 
> works for String datatypes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-34751:
-
Affects Version/s: 3.1.1

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3, 3.1.1
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe are null. The same approach 
> works for String datatypes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Nivas Umapathy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302672#comment-17302672
 ] 

Nivas Umapathy edited comment on SPARK-34751 at 3/16/21, 7:56 PM:
--

I ran it on 3.1.1 and it still has the same problem. All double column values 
are null


was (Author: toocoolblue2000):
I ran it on 3.1.1 and it still has the same problem. All column values are null

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe are null. The same approach 
> works for String datatypes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25935) Prevent null rows from JSON parser

2021-03-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302796#comment-17302796
 ] 

Dongjoon Hyun commented on SPARK-25935:
---

Please let me know if the previous status, `Resolution = Won't Fix` and `Fix 
Version = 3.0.0`, was correct.

> Prevent null rows from JSON parser
> --
>
> Key: SPARK-25935
> URL: https://issues.apache.org/jira/browse/SPARK-25935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Max Gekk
>Priority: Minor
>
> Currently, JSON parser can produce nulls if it cannot detect any valid JSON 
> token on the root level, see 
> https://github.com/apache/spark/blob/4d6704db4d490bd1830ed3c757525f41058523e0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L402
>  . As a consequence of that, the from_json() function can produce null in the 
> PERMISSIVE mode. To prevent that, we need to throw an exception, which should be 
> treated as a bad record and handled according to the specified mode.
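As a rough illustration of the behavior described above (a sketch only, assuming an existing SparkSession named {{spark}}; exact results depend on the Spark version):

{code:python}
from pyspark.sql.functions import from_json

# A whitespace-only string has no valid JSON token at the root level.
df = spark.createDataFrame([(" ",)], ["value"])

# In PERMISSIVE mode (the default), the parser may yield a plain null struct
# here instead of a row that is flagged and handled as a bad record.
df.select(from_json(df.value, "a INT, b STRING").alias("parsed")).show()
{code}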



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25935) Prevent null rows from JSON parser

2021-03-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302792#comment-17302792
 ] 

Dongjoon Hyun edited comment on SPARK-25935 at 3/16/21, 6:38 PM:
-

I removed the `Fix Version = 3.0.0` from this issue because this was reverted 
and resolved as `Won't Fix`.


was (Author: dongjoon):
I removed the `Fix Version = 3.0.0` from this issue because this is reverted 
and resolved as `Won't Fix`.

> Prevent null rows from JSON parser
> --
>
> Key: SPARK-25935
> URL: https://issues.apache.org/jira/browse/SPARK-25935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Max Gekk
>Priority: Minor
>
> Currently, JSON parser can produce nulls if it cannot detect any valid JSON 
> token on the root level, see 
> https://github.com/apache/spark/blob/4d6704db4d490bd1830ed3c757525f41058523e0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L402
>  . As a consequence of that, the from_json() function can produce null in the 
> PERMISSIVE mode. To prevent that, we need to throw an exception, which should be 
> treated as a bad record and handled according to the specified mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25935) Prevent null rows from JSON parser

2021-03-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302792#comment-17302792
 ] 

Dongjoon Hyun commented on SPARK-25935:
---

I removed the `Fix Version = 3.0.0` from this issue because this is reverted 
and resolved as `Won't Fix`.

> Prevent null rows from JSON parser
> --
>
> Key: SPARK-25935
> URL: https://issues.apache.org/jira/browse/SPARK-25935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Max Gekk
>Priority: Minor
>
> Currently, JSON parser can produce nulls if it cannot detect any valid JSON 
> token on the root level, see 
> https://github.com/apache/spark/blob/4d6704db4d490bd1830ed3c757525f41058523e0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L402
>  . As a consequence of that, the from_json() function can produce null in the 
> PERMISSIVE mode. To prevent that, we need to throw an exception, which should be 
> treated as a bad record and handled according to the specified mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25935) Prevent null rows from JSON parser

2021-03-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25935:
--
Fix Version/s: (was: 3.0.0)

> Prevent null rows from JSON parser
> --
>
> Key: SPARK-25935
> URL: https://issues.apache.org/jira/browse/SPARK-25935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Max Gekk
>Priority: Minor
>
> Currently, JSON parser can produce nulls if it cannot detect any valid JSON 
> token on the root level, see 
> https://github.com/apache/spark/blob/4d6704db4d490bd1830ed3c757525f41058523e0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L402
>  . As a consequence of that, the from_json() function can produce null in the 
> PERMISSIVE mode. To prevent that, we need to throw an exception, which should be 
> treated as a bad record and handled according to the specified mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34738) Upgrade Minikube and kubernetes cluster version on Jenkins

2021-03-16 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302777#comment-17302777
 ] 

Shane Knapp commented on SPARK-34738:
-

I'll be doing this next Tuesday (3/23) and teaching one of my sysadmins to help 
out.

> Upgrade Minikube and kubernetes cluster version on Jenkins
> --
>
> Key: SPARK-34738
> URL: https://issues.apache.org/jira/browse/SPARK-34738
> Project: Spark
>  Issue Type: Task
>  Components: jenkins, Kubernetes
>Affects Versions: 3.2.0
>Reporter: Attila Zsolt Piros
>Assignee: Shane Knapp
>Priority: Major
>
> [~shaneknapp] as we discussed [on the mailing 
> list|http://apache-spark-developers-list.1001551.n3.nabble.com/minikube-and-kubernetes-cluster-versions-for-integration-testing-td30856.html]
>  Minikube can be upgraded to the latest (v1.18.1) and kubernetes version 
> should be v1.17.3 (`minikube config set kubernetes-version v1.17.3`).
> [Here|https://github.com/apache/spark/pull/31829] is my PR which uses a new 
> method to configure the kubernetes client. Thanks in advance for using it for 
> testing on Jenkins after the Minikube version is updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34764) Propagate reason for executor loss to the UI

2021-03-16 Thread Holden Karau (Jira)
Holden Karau created SPARK-34764:


 Summary: Propagate reason for executor loss to the UI
 Key: SPARK-34764
 URL: https://issues.apache.org/jira/browse/SPARK-34764
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes, Spark Core
Affects Versions: 3.2.0
Reporter: Holden Karau


When the external cluster manager terminates an executor we should propagate 
this information to the UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34761) Add a day-time interval to a timestamp

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302731#comment-17302731
 ] 

Apache Spark commented on SPARK-34761:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31855

> Add a day-time interval to a timestamp
> --
>
> Key: SPARK-34761
> URL: https://issues.apache.org/jira/browse/SPARK-34761
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Support adding of DayTimeIntervalType values to TIMESTAMP values.
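A sketch of what this could look like once supported (the ANSI interval literal syntax below is an assumption for illustration, not taken from the ticket; it assumes an existing SparkSession named {{spark}}):

{code:python}
# Hypothetical example: add a day-time interval (1 day, 02:03:04) to a timestamp.
spark.sql(
    "SELECT timestamp'2021-03-16 00:00:00' + INTERVAL '1 02:03:04' DAY TO SECOND AS ts"
).show(truncate=False)
# Expected result: 2021-03-17 02:03:04
{code}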



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34761) Add a day-time interval to a timestamp

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34761:


Assignee: Max Gekk  (was: Apache Spark)

> Add a day-time interval to a timestamp
> --
>
> Key: SPARK-34761
> URL: https://issues.apache.org/jira/browse/SPARK-34761
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Support adding of DayTimeIntervalType values to TIMESTAMP values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34761) Add a day-time interval to a timestamp

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34761:


Assignee: Apache Spark  (was: Max Gekk)

> Add a day-time interval to a timestamp
> --
>
> Key: SPARK-34761
> URL: https://issues.apache.org/jira/browse/SPARK-34761
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> Support adding of DayTimeIntervalType values to TIMESTAMP values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34762) Many PR's Scala 2.13 build action failed

2021-03-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302729#comment-17302729
 ] 

Dongjoon Hyun commented on SPARK-34762:
---

Thank you for reporting, [~LuciferYang]. There was a GitHub Actions outage and a 
Bintray outage yesterday. I guess it's recovered now. Do you still see the 
failures?

> Many PR's Scala 2.13 build action failed
> 
>
> Key: SPARK-34762
> URL: https://issues.apache.org/jira/browse/SPARK-34762
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Major
>
> PRs with Scala 2.13 build failures include 
>  * [https://github.com/apache/spark/pull/31849]
>  * [https://github.com/apache/spark/pull/31848]
>  * [https://github.com/apache/spark/pull/31844]
>  * [https://github.com/apache/spark/pull/31843]
>  * https://github.com/apache/spark/pull/31841
> {code:java}
> [error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1:
>   error: package org.apache.commons.cli does not exist
> 1278[error] import org.apache.commons.cli.GnuParser;
> 1279[error]  ^
> 1280[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
>   error: cannot find symbol
> 1281[error] private final Options options = new Options();
> 1282[error]   ^  symbol:   class Options
> 1283[error]   location: class ServerOptionsProcessor
> 1284[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1:
>   error: package org.apache.commons.cli does not exist
> 1285[error] private org.apache.commons.cli.CommandLine commandLine;
> 1286[error]   ^
> 1287[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1:
>   error: cannot find symbol
> 1288[error] HelpOptionExecutor(String serverName, Options options) {
> 1289[error]   ^  symbol:   class 
> Options
> 1290[error]   location: class HelpOptionExecutor
> 1291[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
>   error: cannot find symbol
> 1292[error] private final Options options = new Options();
> 1293[error] ^  symbol:   class Options
> 1294[error]   location: class ServerOptionsProcessor
> 1295[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1:
>   error: cannot find symbol
> 1296[error]   options.addOption(OptionBuilder
> 1297[error] ^  symbol:   variable OptionBuilder
> 1298[error]   location: class ServerOptionsProcessor
> 1299[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1:
>   error: cannot find symbol
> 1300[error]   options.addOption(new Option("H", "help", false, "Print 
> help information"));
> 1301[error] ^  symbol:   class Option
> 1302[error]   location: class ServerOptionsProcessor
> 1303[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1:
>   error: cannot find symbol
> 1304[error] commandLine = new GnuParser().parse(options, argv);
> 1305[error]   ^  symbol:   class GnuParser
> 1306[error]   location: class ServerOptionsProcessor
> 1307[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1:
>   error: cannot find symbol
> 1308[error]   } catch (ParseException e) {
> 1309[error]^  symbol:   class ParseException
> 1310[error]   location: class ServerOptionsProcessor
> 1311[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1:
>   error: cannot find symbol
> 1312[error]   new HelpFormatter().printHelp(serverName, options);
> 1313[error]   ^  symbol:   class HelpFormatter
> 1314[error]   location: class HelpOptionExecutor
> 1315[error] Note: Some input files use or override a deprecated API.
> 1316[error] Note: Recompile with -Xlint:deprecation for details.
> 1317[error] 16 errors
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[jira] [Updated] (SPARK-33428) conv UDF returns incorrect value

2021-03-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33428:
--
Fix Version/s: (was: 3.2.0)

> conv UDF returns incorrect value
> 
>
> Key: SPARK-33428
> URL: https://issues.apache.org/jira/browse/SPARK-33428
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {noformat}
> spark-sql> select java_method('scala.math.BigInt', 'apply', 
> 'c8dcdfb41711fc9a1f17928001d7fd61', 16);
> 266992441711411603393340504520074460513
> spark-sql> select conv('c8dcdfb41711fc9a1f17928001d7fd61', 16, 10);
> 18446744073709551615
> {noformat}
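Note that 18446744073709551615 is exactly 2^64 - 1, so conv appears to clamp the result at the unsigned 64-bit maximum when the converted value overflows. A quick check in plain Python (a sketch, no Spark needed; the expected value comes from the java_method call above):

{code:python}
v = int('c8dcdfb41711fc9a1f17928001d7fd61', 16)
print(v)              # 266992441711411603393340504520074460513
print(v > 2**64 - 1)  # True: the value does not fit in 64 bits
print(2**64 - 1)      # 18446744073709551615, the value conv returns
{code}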



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-33428) conv UDF returns incorrect value

2021-03-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-33428:
---
  Assignee: (was: angerszhu)

> conv UDF returns incorrect value
> 
>
> Key: SPARK-33428
> URL: https://issues.apache.org/jira/browse/SPARK-33428
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> How to reproduce this issue:
> {noformat}
> spark-sql> select java_method('scala.math.BigInt', 'apply', 
> 'c8dcdfb41711fc9a1f17928001d7fd61', 16);
> 266992441711411603393340504520074460513
> spark-sql> select conv('c8dcdfb41711fc9a1f17928001d7fd61', 16, 10);
> 18446744073709551615
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33428) conv UDF returns incorrect value

2021-03-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302694#comment-17302694
 ] 

Dongjoon Hyun commented on SPARK-33428:
---

The commit is reverted.

> conv UDF returns incorrect value
> 
>
> Key: SPARK-33428
> URL: https://issues.apache.org/jira/browse/SPARK-33428
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {noformat}
> spark-sql> select java_method('scala.math.BigInt', 'apply', 
> 'c8dcdfb41711fc9a1f17928001d7fd61', 16);
> 266992441711411603393340504520074460513
> spark-sql> select conv('c8dcdfb41711fc9a1f17928001d7fd61', 16, 10);
> 18446744073709551615
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Nivas Umapathy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302672#comment-17302672
 ] 

Nivas Umapathy commented on SPARK-34751:


I ran it on 3.1.1 and it still has the same problem. All column values are null

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet('invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe are null. The same approach 
> works for String datatypes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302622#comment-17302622
 ] 

Apache Spark commented on SPARK-34763:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/31854

> col(), $"" and df("name") should handle quoted column names properly.
> ---
>
> Key: SPARK-34763
> URL: https://issues.apache.org/jira/browse/SPARK-34763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> Quoted column names like `a``b.c` cannot be represented with col(), $"" 
> and df("") because they don't handle such column names properly.
> For example, if we have the following DataFrame.
> {code}
> val df1 = spark.sql("SELECT 'col1' AS `a``b.c`")
> {code}
> For the DataFrame, this query is successfully executed.
> {code}
> scala> df1.selectExpr("`a``b.c`").show
> +-+
> |a`b.c|
> +-+
> | col1|
> +-+
> {code}
> But the following query will fail because df1("`a``b.c`") throws an exception.
> {code}
> scala> df1.select(df1("`a``b.c`")).show
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: 
> `a``b.c`;
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1274)
>   at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241)
>   ... 49 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302621#comment-17302621
 ] 

Apache Spark commented on SPARK-34763:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/31854

> col(), $"" and df("name") should handle quoted column names properly.
> ---
>
> Key: SPARK-34763
> URL: https://issues.apache.org/jira/browse/SPARK-34763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> Quoted column names like `a``b.c` cannot be represented with col(), $"" 
> and df("") because they don't handle such column names properly.
> For example, if we have the following DataFrame.
> {code}
> val df1 = spark.sql("SELECT 'col1' AS `a``b.c`")
> {code}
> For the DataFrame, this query is successfully executed.
> {code}
> scala> df1.selectExpr("`a``b.c`").show
> +-+
> |a`b.c|
> +-+
> | col1|
> +-+
> {code}
> But the following query will fail because df1("`a``b.c`") throws an exception.
> {code}
> scala> df1.select(df1("`a``b.c`")).show
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: 
> `a``b.c`;
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1274)
>   at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241)
>   ... 49 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34763:


Assignee: Kousuke Saruta  (was: Apache Spark)

> col(), $"" and df("name") should handle quoted column names properly.
> ---
>
> Key: SPARK-34763
> URL: https://issues.apache.org/jira/browse/SPARK-34763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> Quoted column names like `a``b.c` cannot be represented with col(), $"" 
> and df("") because they don't handle such column names properly.
> For example, if we have the following DataFrame.
> {code}
> val df1 = spark.sql("SELECT 'col1' AS `a``b.c`")
> {code}
> For the DataFrame, this query is successfully executed.
> {code}
> scala> df1.selectExpr("`a``b.c`").show
> +-+
> |a`b.c|
> +-+
> | col1|
> +-+
> {code}
> But the following query will fail because df1("`a``b.c`") throws an exception.
> {code}
> scala> df1.select(df1("`a``b.c`")).show
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: 
> `a``b.c`;
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1274)
>   at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241)
>   ... 49 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34763:


Assignee: Apache Spark  (was: Kousuke Saruta)

> col(), $"" and df("name") should handle quoted column names properly.
> ---
>
> Key: SPARK-34763
> URL: https://issues.apache.org/jira/browse/SPARK-34763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Major
>
> Quoted column names like `a``b.c` cannot be represented with col(), $"" 
> and df("") because they don't handle such column names properly.
> For example, if we have the following DataFrame.
> {code}
> val df1 = spark.sql("SELECT 'col1' AS `a``b.c`")
> {code}
> For the DataFrame, this query is successfully executed.
> {code}
> scala> df1.selectExpr("`a``b.c`").show
> +-+
> |a`b.c|
> +-+
> | col1|
> +-+
> {code}
> But the following query will fail because df1("`a``b.c`") throws an exception.
> {code}
> scala> df1.select(df1("`a``b.c`")).show
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: 
> `a``b.c`;
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1274)
>   at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241)
>   ... 49 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34763) col(), $"" and df("name") should handle quoted column names properly.

2021-03-16 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-34763:
---
Summary: col(), $"" and df("name") should handle quoted column names 
properly.  (was: col(), $"" and df("name") should handle quoted column 
name properly.)

> col(), $"" and df("name") should handle quoted column names properly.
> ---
>
> Key: SPARK-34763
> URL: https://issues.apache.org/jira/browse/SPARK-34763
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> Quoted column names like `a``b.c` cannot be represented with col(), $"" 
> and df("") because they don't handle such column names properly.
> For example, if we have the following DataFrame.
> {code}
> val df1 = spark.sql("SELECT 'col1' AS `a``b.c`")
> {code}
> For the DataFrame, this query is successfully executed.
> {code}
> scala> df1.selectExpr("`a``b.c`").show
> +-+
> |a`b.c|
> +-+
> | col1|
> +-+
> {code}
> But the following query will fail because df1("`a``b.c`") throws an exception.
> {code}
> scala> df1.select(df1("`a``b.c`")).show
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: 
> `a``b.c`;
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1274)
>   at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241)
>   ... 49 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34763) col(), $"" and df("name") should handle quoted column name properly.

2021-03-16 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-34763:
--

 Summary: col(), $"" and df("name") should handle quoted 
column name properly.
 Key: SPARK-34763
 URL: https://issues.apache.org/jira/browse/SPARK-34763
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1, 3.0.2, 2.4.7, 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


Quoted column names like `a``b.c` cannot be represented with col(), $"" 
and df("") because they don't handle such column names properly.

For example, if we have the following DataFrame.
{code}
val df1 = spark.sql("SELECT 'col1' AS `a``b.c`")
{code}

For the DataFrame, this query is successfully executed.
{code}
scala> df1.selectExpr("`a``b.c`").show
+-+
|a`b.c|
+-+
| col1|
+-+
{code}

But the following query will fail because df1("`a``b.c`") throws an exception.
{code}
scala> df1.select(df1("`a``b.c`")).show
org.apache.spark.sql.AnalysisException: syntax error in attribute name: 
`a``b.c`;
  at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:152)
  at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:162)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121)
  at org.apache.spark.sql.Dataset.resolve(Dataset.scala:221)
  at org.apache.spark.sql.Dataset.col(Dataset.scala:1274)
  at org.apache.spark.sql.Dataset.apply(Dataset.scala:1241)
  ... 49 elided
{code}
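A rough PySpark equivalent of the repro above (an assumption, not from the ticket: the Python {{df["..."]}} path is expected to go through the same Dataset.apply/col resolution and hit the same exception; assumes an existing SparkSession named {{spark}}):

{code:python}
df1 = spark.sql("SELECT 'col1' AS `a``b.c`")

df1.selectExpr("`a``b.c`").show()    # works, prints the a`b.c column

df1.select(df1["`a``b.c`"]).show()   # expected to raise AnalysisException:
                                     # syntax error in attribute name: `a``b.c`
{code}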



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302584#comment-17302584
 ] 

Apache Spark commented on SPARK-34504:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/31853

> Avoid unnecessary view resolving and remove the `performCheck` flag
> ---
>
> Key: SPARK-34504
> URL: https://issues.apache.org/jira/browse/SPARK-34504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Linhong Liu
>Priority: Major
>
> In SPARK-34490, I added a `performCheck` flag to skip the analysis check when 
> resolving views. This is because some view resolution is unnecessary, so we 
> can avoid these unnecessary view resolutions and remove the `performCheck` 
> flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302583#comment-17302583
 ] 

Apache Spark commented on SPARK-34504:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/31853

> Avoid unnecessary view resolving and remove the `performCheck` flag
> ---
>
> Key: SPARK-34504
> URL: https://issues.apache.org/jira/browse/SPARK-34504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Linhong Liu
>Priority: Major
>
> In SPARK-34490, I added a `performCheck` flag to skip the analysis check when 
> resolving views. This is because some view resolution is unnecessary, so we 
> can avoid these unnecessary view resolutions and remove the `performCheck` 
> flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34504:


Assignee: (was: Apache Spark)

> Avoid unnecessary view resolving and remove the `performCheck` flag
> ---
>
> Key: SPARK-34504
> URL: https://issues.apache.org/jira/browse/SPARK-34504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Linhong Liu
>Priority: Major
>
> In SPARK-34490, I added a `performCheck` flag to skip the analysis check when 
> resolving views. This is because some view resolution is unnecessary, so we 
> can avoid these unnecessary view resolutions and remove the `performCheck` 
> flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34504) Avoid unnecessary view resolving and remove the `performCheck` flag

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34504:


Assignee: Apache Spark

> Avoid unnecessary view resolving and remove the `performCheck` flag
> ---
>
> Key: SPARK-34504
> URL: https://issues.apache.org/jira/browse/SPARK-34504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Linhong Liu
>Assignee: Apache Spark
>Priority: Major
>
> In SPARK-34490, I added a `performCheck` flag to skip the analysis check when 
> resolving views. This is because some view resolution is unnecessary, so we 
> can avoid these unnecessary view resolutions and remove the `performCheck` 
> flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34762) Many PR's Scala 2.13 build action failed

2021-03-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-34762:
-
Description: 
PRs with Scala 2.13 build failures include 
 * [https://github.com/apache/spark/pull/31849]
 * [https://github.com/apache/spark/pull/31848]
 * [https://github.com/apache/spark/pull/31844]
 * [https://github.com/apache/spark/pull/31843]
 * https://github.com/apache/spark/pull/31841

{code:java}
[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1:
  error: package org.apache.commons.cli does not exist
1278[error] import org.apache.commons.cli.GnuParser;
1279[error]  ^
1280[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
  error: cannot find symbol
1281[error] private final Options options = new Options();
1282[error]   ^  symbol:   class Options
1283[error]   location: class ServerOptionsProcessor
1284[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1:
  error: package org.apache.commons.cli does not exist
1285[error] private org.apache.commons.cli.CommandLine commandLine;
1286[error]   ^
1287[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1:
  error: cannot find symbol
1288[error] HelpOptionExecutor(String serverName, Options options) {
1289[error]   ^  symbol:   class Options
1290[error]   location: class HelpOptionExecutor
1291[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
  error: cannot find symbol
1292[error] private final Options options = new Options();
1293[error] ^  symbol:   class Options
1294[error]   location: class ServerOptionsProcessor
1295[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1:
  error: cannot find symbol
1296[error]   options.addOption(OptionBuilder
1297[error] ^  symbol:   variable OptionBuilder
1298[error]   location: class ServerOptionsProcessor
1299[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1:
  error: cannot find symbol
1300[error]   options.addOption(new Option("H", "help", false, "Print help 
information"));
1301[error] ^  symbol:   class Option
1302[error]   location: class ServerOptionsProcessor
1303[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1:
  error: cannot find symbol
1304[error] commandLine = new GnuParser().parse(options, argv);
1305[error]   ^  symbol:   class GnuParser
1306[error]   location: class ServerOptionsProcessor
1307[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1:
  error: cannot find symbol
1308[error]   } catch (ParseException e) {
1309[error]^  symbol:   class ParseException
1310[error]   location: class ServerOptionsProcessor
1311[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1:
  error: cannot find symbol
1312[error]   new HelpFormatter().printHelp(serverName, options);
1313[error]   ^  symbol:   class HelpFormatter
1314[error]   location: class HelpOptionExecutor
1315[error] Note: Some input files use or override a deprecated API.
1316[error] Note: Recompile with -Xlint:deprecation for details.
1317[error] 16 errors
{code}
 

  was:
PRs with Scala 2.13 build failures include 
 * [https://github.com/apache/spark/pull/31849]
 * [https://github.com/apache/spark/pull/31848]
 * [https://github.com/apache/spark/pull/31844]
 * [https://github.com/apache/spark/pull/31843]

{code:java}
[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1:
  error: package org.apache.commons.cli does not exist
1278[error] import org.apache.commons.cli.GnuParser;
1279[error]  ^
1280[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
  error: cannot find symbol
1281[error] private final Options options = new Options();
1282[error]   ^  symbol:   class Options
1283[error]   location: class ServerOptionsProcessor
1284[error] 

[jira] [Resolved] (SPARK-34680) Spark hangs when out of diskspace

2021-03-16 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-34680.
--
Resolution: Not A Problem

> Spark hangs when out of diskspace
> -
>
> Key: SPARK-34680
> URL: https://issues.apache.org/jira/browse/SPARK-34680
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.1
> Environment: Running Spark and Pyspark 3.1.1. with Hadoop 3.2.2 and 
> Koalas 1.6.0.
> Some environment variables:
> |Java Home|/usr/lib/jvm/java-11-openjdk-11.0.3.7-0.el7_6.x86_64|
> |Java Version|11.0.3 (Oracle Corporation)|
> |Scala Version|version 2.12.10|
>Reporter: Laurens
>Priority: Major
>
> While running a workflow that uses Koalas, I noticed a stage had been hanging 
> for 8 hours. I checked the logs and the last output is:
> {code:java}
> 21/03/09 13:50:31 ERROR TaskMemoryManager: error while calling spill() on 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter@4127a515
> java.io.IOException: No space left on device
>  at java.base/java.io.FileOutputStream.writeBytes(Native Method)
>  at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
>  at 
> org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59)
>  at 
> java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
>  at 
> java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
>  at 
> net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:223)
>  at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:176)
>  at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:260)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:218)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.spill(ShuffleExternalSorter.java:276)
>  at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:208)
>  at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:289)
>  at 
> org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:116)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:385)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:409)
>  at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:249)
>  at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:178)
>  at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>  at org.apache.spark.scheduler.Task.run(Task.scala:131)
>  at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:834)
>  Suppressed: java.io.IOException: No space left on device
>  at java.base/java.io.FileOutputStream.writeBytes(Native Method)
>  at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
>  at 
> org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59)
>  at 
> java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
>  at 
> java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
>  at net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:243)
>  at 
> org.apache.spark.serializer.DummySerializerInstance$1.flush(DummySerializerInstance.java:50)
>  at 
> org.apache.spark.storage.DiskBlockObjectWriter.commitAndGet(DiskBlockObjectWriter.scala:173)
>  at 
> org.apache.spark.storage.DiskBlockObjectWriter.$anonfun$close$1(DiskBlockObjectWriter.scala:156)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>  at 
> org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:158)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:226)
>  ... 18 more
>  Suppressed: java.io.IOException: No space left on device
>  at java.base/java.io.FileOutputStream.writeBytes(Native Method)
>  at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
>  at 
> 

[jira] [Commented] (SPARK-34680) Spark hangs when out of diskspace

2021-03-16 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302578#comment-17302578
 ] 

Takeshi Yamamuro commented on SPARK-34680:
--

Not enough information to reproduce the issue, so I'll close this.

> Spark hangs when out of diskspace
> -
>
> Key: SPARK-34680
> URL: https://issues.apache.org/jira/browse/SPARK-34680
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.1
> Environment: Running Spark and PySpark 3.1.1 with Hadoop 3.2.2 and 
> Koalas 1.6.0.
> Some environment variables:
> |Java Home|/usr/lib/jvm/java-11-openjdk-11.0.3.7-0.el7_6.x86_64|
> |Java Version|11.0.3 (Oracle Corporation)|
> |Scala Version|version 2.12.10|
>Reporter: Laurens
>Priority: Major
>
> While running a workflow that uses Koalas, I noticed a stage had been hanging 
> for 8 hours. I checked the logs and the last output is:
> {code:java}
> 21/03/09 13:50:31 ERROR TaskMemoryManager: error while calling spill() on 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter@4127a515
> java.io.IOException: No space left on device
>  at java.base/java.io.FileOutputStream.writeBytes(Native Method)
>  at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
>  at 
> org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59)
>  at 
> java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
>  at 
> java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
>  at 
> net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:223)
>  at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:176)
>  at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:260)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:218)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.spill(ShuffleExternalSorter.java:276)
>  at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:208)
>  at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:289)
>  at 
> org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:116)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:385)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:409)
>  at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:249)
>  at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:178)
>  at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>  at org.apache.spark.scheduler.Task.run(Task.scala:131)
>  at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:834)
>  Suppressed: java.io.IOException: No space left on device
>  at java.base/java.io.FileOutputStream.writeBytes(Native Method)
>  at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
>  at 
> org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59)
>  at 
> java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
>  at 
> java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
>  at net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:243)
>  at 
> org.apache.spark.serializer.DummySerializerInstance$1.flush(DummySerializerInstance.java:50)
>  at 
> org.apache.spark.storage.DiskBlockObjectWriter.commitAndGet(DiskBlockObjectWriter.scala:173)
>  at 
> org.apache.spark.storage.DiskBlockObjectWriter.$anonfun$close$1(DiskBlockObjectWriter.scala:156)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>  at 
> org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:158)
>  at 
> org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:226)
>  ... 18 more
>  Suppressed: java.io.IOException: No space left on device
>  at java.base/java.io.FileOutputStream.writeBytes(Native Method)
>  at 

[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-34751:
-
Target Version/s:   (was: 2.4.3)

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Fix For: 2.4.8
>
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe is null. The same approach 
> works for String datatypes.
>  
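
For readers following along, a minimal Scala sketch of the workaround described above (this is not the reporter's Glue/PySpark code; it assumes the attached invalid_columns_double.parquet is available locally):

{code:scala}
import org.apache.spark.sql.SparkSession

// Local session purely for illustration.
val spark = SparkSession.builder().master("local[*]").appName("invalid-column-names").getOrCreate()

// Read the file and rename the offending columns, as in the report.
val renamed = spark.read.parquet("invalid_columns_double.parquet")
  .withColumnRenamed("COL 1", "COL_1")
  .withColumnRenamed("COL,2", "COL_2")
  .withColumnRenamed("COL;3", "COL_3")

// Re-read the same file with the cleaned schema applied up front.
// Per the report, on 2.4.3 this yields nulls for double columns but works for strings.
val reread = spark.read.schema(renamed.schema).parquet("invalid_columns_double.parquet")
reread.show()
{code}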



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-34751:
-
Fix Version/s: (was: 2.4.8)

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe is null. The same approach 
> works for String datatypes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34751) Parquet with invalid chars on column name reads double as null when a clean schema is applied

2021-03-16 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302573#comment-17302573
 ] 

Takeshi Yamamuro commented on SPARK-34751:
--

Could you try newer Spark, e.g., 2.4.7, 3.0.2, or 3.1.1?

> Parquet with invalid chars on column name reads double as null when a clean 
> schema is applied
> -
>
> Key: SPARK-34751
> URL: https://issues.apache.org/jira/browse/SPARK-34751
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.3
> Environment: Pyspark 2.4.3
> AWS Glue Dev Endpoint EMR
>Reporter: Nivas Umapathy
>Priority: Major
> Fix For: 2.4.8
>
> Attachments: invalid_columns_double.parquet
>
>
> I have a parquet file that has data with invalid column names on it. 
> [#Reference](https://issues.apache.org/jira/browse/SPARK-27442)  Here is the 
> file attached with this ticket.
> I tried to load this file with 
> {{df = glue_context.read.parquet('invalid_columns_double.parquet')}}
> {{df = df.withColumnRenamed('COL 1', 'COL_1')}}
> {{df = df.withColumnRenamed('COL,2', 'COL_2')}}
> {{df = df.withColumnRenamed('COL;3', 'COL_3') }}
> and so on.
> Now if I call
> {{df.show()}}
> it throws this exception, which still points to the old column name:
>  {{pyspark.sql.utils.AnalysisException: 'Attribute name "COL 1" contains 
> invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;'}}
>  
> When I read about it in some blogs, there was a suggestion to re-read the same 
> parquet with the new schema applied. So I did 
> {{df = 
> glue_context.read.schema(df.schema).parquet(}}{{'invalid_columns_double.parquet')}}
>  
> and it works, but all the data in the dataframe is null. The same approach 
> works for String datatypes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34714) collect_list(struct()) fails when used with GROUP BY

2021-03-16 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302570#comment-17302570
 ] 

Takeshi Yamamuro commented on SPARK-34714:
--

I've checked that branch-3.1 still has this issue (NOTE: the current master 
does not).

> collect_list(struct()) fails when used with GROUP BY
> 
>
> Key: SPARK-34714
> URL: https://issues.apache.org/jira/browse/SPARK-34714
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
> Environment: Databricks Runtime 8.0
>Reporter: Lauri Koobas
>Priority: Major
>
> The following is failing in DBR8.0 / Spark 3.1.1, but works in earlier DBR 
> and Spark versions:
> {quote}with step_1 as (
>     select 'E' as name, named_struct('subfield', 1) as field_1
> )
> select name, collect_list(struct(field_1.subfield))
> from step_1
> group by 1
> {quote}
> Fails with the following error message:
> {quote}AnalysisException: cannot resolve 
> 'struct(step_1.`field_1`.`subfield`)' due to data type mismatch: Only 
> foldable string expressions are allowed to appear at odd position, got: 
> NamePlaceholder
> {quote}
> If you modify the query in any of the following ways, then it still works:
>  * if you remove the field "name" and the "group by 1" part of the query
>  * if you remove the "struct()" from within the collect_list()
>  * if you use "named_struct()" instead of "struct()" within the collect_list()
> Similarly collect_set() is broken and possibly more related functions, but I 
> haven't done thorough testing.
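
For anyone reproducing this, the failing query and the named_struct workaround mentioned above can be driven from Scala roughly as below (a sketch only; spark is assumed to be an existing SparkSession):

{code:scala}
// Reported to fail on 3.1.1 with "Only foldable string expressions are allowed ...".
val failing =
  """with step_1 as (
    |    select 'E' as name, named_struct('subfield', 1) as field_1
    |)
    |select name, collect_list(struct(field_1.subfield))
    |from step_1
    |group by 1""".stripMargin
// spark.sql(failing).show()

// Workaround noted in the report: use named_struct() inside collect_list() instead of struct().
val workaround =
  """with step_1 as (
    |    select 'E' as name, named_struct('subfield', 1) as field_1
    |)
    |select name, collect_list(named_struct('subfield', field_1.subfield))
    |from step_1
    |group by 1""".stripMargin
spark.sql(workaround).show()
{code}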



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34694) Improve Spark SQL Source Filter to allow pushdown of filters span multiple columns

2021-03-16 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-34694:
-
Component/s: (was: Spark Core)
 SQL

> Improve Spark SQL Source Filter to allow pushdown of filters span multiple 
> columns
> --
>
> Key: SPARK-34694
> URL: https://issues.apache.org/jira/browse/SPARK-34694
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1
>Reporter: Chen Zou
>Priority: Minor
>
> The current org.apache.spark.sql.sources.Filter abstract class only allows 
> pushdown of filters on a single column, or sums of products of such 
> single-column filters.
> Filters that span multiple columns cannot be pushed down to the source through 
> this Filter subclass, e.g. these predicates from the TPC-H lineitem table:
> (l_commitdate#11 < l_receiptdate#12)
> (l_shipdate#10 < l_commitdate#11)
>  
> The current design probably stems from the fact that columnar sources have a 
> hard time supporting such cross-column filters. But with batching implemented 
> in columnar sources, they can still support them. This issue aims to open a 
> discussion on a more general Filter interface that allows pushing down 
> cross-column filters.
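
To make the limitation concrete, a small Scala sketch: today's source filters pair one attribute with a literal value, so a predicate comparing two attributes has no pushable form. The ColumnLessThanColumn shape below is purely hypothetical and is not part of Spark's API:

{code:scala}
import org.apache.spark.sql.sources.{Filter, LessThan}

// Expressible today: one attribute compared against a literal value.
val pushable: Filter = LessThan("l_shipdate", java.sql.Date.valueOf("1995-01-01"))

// Not expressible today: an attribute compared against another attribute.
// Hypothetical shape for discussion only; it does not exist in Spark.
case class ColumnLessThanColumn(left: String, right: String)
val notPushable = ColumnLessThanColumn("l_shipdate", "l_commitdate")
{code}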



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34661) Replaces `OriginalType` with `LogicalTypeAnnotation` in VectorizedColumnReader

2021-03-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-34661:
-
Description: 
{{OriginalType}} and {{DecimalMetadata}} have been marked as {{@Deprecated}} in 
newer Parquet code.

Apache Parquet suggests replacing {{OriginalType}} with 
{{LogicalTypeAnnotation}} and {{DecimalMetadata}} with 
{{DecimalLogicalTypeAnnotation}}.

The files to be changed are as follows:
 * VectorizedColumnReader.java
 * ParquetFilters.scala
 * ParquetReadSupport.scala
 * ParquetRowConverter.scala
 * ParquetSchemaConverter.scala

 


  was:
`OriginalType` has been marked as '@Deprecated', Apache Parquet suggests to use 
 LogicalTypeAnnotation to represent logical types instead.

This JIRA is used to track the cleanup of `OriginalType` usages in 
VectorizedColumnReader

 


> Replaces `OriginalType` with `LogicalTypeAnnotation` in VectorizedColumnReader
> --
>
> Key: SPARK-34661
> URL: https://issues.apache.org/jira/browse/SPARK-34661
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> {{OriginalType}} and {{DecimalMetadata}} have been marked as {{@Deprecated}} 
> in newer Parquet code.
> Apache Parquet suggests replacing {{OriginalType}} with 
> {{LogicalTypeAnnotation}} and {{DecimalMetadata}} with 
> {{DecimalLogicalTypeAnnotation}}.
> The files to be changed are as follows:
>  * VectorizedColumnReader.java
>  * ParquetFilters.scala
>  * ParquetReadSupport.scala
>  * ParquetRowConverter.scala
>  * ParquetSchemaConverter.scala
>  
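
For context, a hedged Scala sketch of the deprecated accessors versus their LogicalTypeAnnotation counterparts; the parquet-mr method names are as documented upstream, while the helper names here are illustrative:

{code:scala}
import org.apache.parquet.schema.{LogicalTypeAnnotation, OriginalType, PrimitiveType}

// Deprecated style: OriginalType (with DecimalMetadata carrying precision/scale).
def isDecimalOldStyle(t: PrimitiveType): Boolean =
  t.getOriginalType == OriginalType.DECIMAL

// Suggested replacement: LogicalTypeAnnotation and its typed subclasses.
def decimalPrecisionNewStyle(t: PrimitiveType): Option[Int] =
  t.getLogicalTypeAnnotation match {
    case d: LogicalTypeAnnotation.DecimalLogicalTypeAnnotation => Some(d.getPrecision)
    case _ => None
  }
{code}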



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34762) Many PR's Scala 2.13 build action failed

2021-03-16 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302544#comment-17302544
 ] 

Yang Jie commented on SPARK-34762:
--

Maven compilation does not seem to fail.

cc [~dongjoon] [~srowen] [~hyukjin.kwon] 

> Many PR's Scala 2.13 build action failed
> 
>
> Key: SPARK-34762
> URL: https://issues.apache.org/jira/browse/SPARK-34762
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Major
>
> PRs with Scala 2.13 build failures include 
>  * [https://github.com/apache/spark/pull/31849]
>  * [https://github.com/apache/spark/pull/31848]
>  * [https://github.com/apache/spark/pull/31844]
>  * [https://github.com/apache/spark/pull/31843]
> {code:java}
> [error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1:
>   error: package org.apache.commons.cli does not exist
> 1278[error] import org.apache.commons.cli.GnuParser;
> 1279[error]  ^
> 1280[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
>   error: cannot find symbol
> 1281[error] private final Options options = new Options();
> 1282[error]   ^  symbol:   class Options
> 1283[error]   location: class ServerOptionsProcessor
> 1284[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1:
>   error: package org.apache.commons.cli does not exist
> 1285[error] private org.apache.commons.cli.CommandLine commandLine;
> 1286[error]   ^
> 1287[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1:
>   error: cannot find symbol
> 1288[error] HelpOptionExecutor(String serverName, Options options) {
> 1289[error]   ^  symbol:   class 
> Options
> 1290[error]   location: class HelpOptionExecutor
> 1291[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
>   error: cannot find symbol
> 1292[error] private final Options options = new Options();
> 1293[error] ^  symbol:   class Options
> 1294[error]   location: class ServerOptionsProcessor
> 1295[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1:
>   error: cannot find symbol
> 1296[error]   options.addOption(OptionBuilder
> 1297[error] ^  symbol:   variable OptionBuilder
> 1298[error]   location: class ServerOptionsProcessor
> 1299[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1:
>   error: cannot find symbol
> 1300[error]   options.addOption(new Option("H", "help", false, "Print 
> help information"));
> 1301[error] ^  symbol:   class Option
> 1302[error]   location: class ServerOptionsProcessor
> 1303[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1:
>   error: cannot find symbol
> 1304[error] commandLine = new GnuParser().parse(options, argv);
> 1305[error]   ^  symbol:   class GnuParser
> 1306[error]   location: class ServerOptionsProcessor
> 1307[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1:
>   error: cannot find symbol
> 1308[error]   } catch (ParseException e) {
> 1309[error]^  symbol:   class ParseException
> 1310[error]   location: class ServerOptionsProcessor
> 1311[error] 
> /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1:
>   error: cannot find symbol
> 1312[error]   new HelpFormatter().printHelp(serverName, options);
> 1313[error]   ^  symbol:   class HelpFormatter
> 1314[error]   location: class HelpOptionExecutor
> 1315[error] Note: Some input files use or override a deprecated API.
> 1316[error] Note: Recompile with -Xlint:deprecation for details.
> 1317[error] 16 errors
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34762) Many PR's Scala 2.13 build action failed

2021-03-16 Thread Yang Jie (Jira)
Yang Jie created SPARK-34762:


 Summary: Many PR's Scala 2.13 build action failed
 Key: SPARK-34762
 URL: https://issues.apache.org/jira/browse/SPARK-34762
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: Yang Jie


PRs with Scala 2.13 build failures include 
 * [https://github.com/apache/spark/pull/31849]
 * [https://github.com/apache/spark/pull/31848]
 * [https://github.com/apache/spark/pull/31844]
 * [https://github.com/apache/spark/pull/31843]

{code:java}
[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:26:1:
  error: package org.apache.commons.cli does not exist
1278[error] import org.apache.commons.cli.GnuParser;
1279[error]  ^
1280[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
  error: cannot find symbol
1281[error] private final Options options = new Options();
1282[error]   ^  symbol:   class Options
1283[error]   location: class ServerOptionsProcessor
1284[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:177:1:
  error: package org.apache.commons.cli does not exist
1285[error] private org.apache.commons.cli.CommandLine commandLine;
1286[error]   ^
1287[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:255:1:
  error: cannot find symbol
1288[error] HelpOptionExecutor(String serverName, Options options) {
1289[error]   ^  symbol:   class Options
1290[error]   location: class HelpOptionExecutor
1291[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:176:1:
  error: cannot find symbol
1292[error] private final Options options = new Options();
1293[error] ^  symbol:   class Options
1294[error]   location: class ServerOptionsProcessor
1295[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:185:1:
  error: cannot find symbol
1296[error]   options.addOption(OptionBuilder
1297[error] ^  symbol:   variable OptionBuilder
1298[error]   location: class ServerOptionsProcessor
1299[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:192:1:
  error: cannot find symbol
1300[error]   options.addOption(new Option("H", "help", false, "Print help 
information"));
1301[error] ^  symbol:   class Option
1302[error]   location: class ServerOptionsProcessor
1303[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:197:1:
  error: cannot find symbol
1304[error] commandLine = new GnuParser().parse(options, argv);
1305[error]   ^  symbol:   class GnuParser
1306[error]   location: class ServerOptionsProcessor
1307[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:211:1:
  error: cannot find symbol
1308[error]   } catch (ParseException e) {
1309[error]^  symbol:   class ParseException
1310[error]   location: class ServerOptionsProcessor
1311[error] 
/home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:262:1:
  error: cannot find symbol
1312[error]   new HelpFormatter().printHelp(serverName, options);
1313[error]   ^  symbol:   class HelpFormatter
1314[error]   location: class HelpOptionExecutor
1315[error] Note: Some input files use or override a deprecated API.
1316[error] Note: Recompile with -Xlint:deprecation for details.
1317[error] 16 errors
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34759) run JavaSparkSQLExample failed with Exception.

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302502#comment-17302502
 ] 

Apache Spark commented on SPARK-34759:
--

User 'zengruios' has created a pull request for this issue:
https://github.com/apache/spark/pull/31852

> run JavaSparkSQLExample failed with Exception.
> --
>
> Key: SPARK-34759
> URL: https://issues.apache.org/jira/browse/SPARK-34759
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception.
> The exception is thrown in runDatasetCreationExample, when executing 
> ‘spark.read().json(path).as(personEncoder)’.
> The exception is 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to 
> int.'
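
A hedged Scala sketch of what is going on: JSON schema inference types integral fields as bigint, while the example's Person bean declares age as int, so the encoder refuses the narrowing cast. Casting before applying the encoder is one possible way around it (an assumption, not necessarily the fix in the linked PR); spark is assumed to be an existing SparkSession:

{code:scala}
import org.apache.spark.sql.functions.col
import spark.implicits._

case class Person(name: String, age: Int)

val path = "examples/src/main/resources/people.json"
// spark.read.json(path).as[Person] fails: age is inferred as bigint, Person.age is int.
val people = spark.read.json(path)
  .withColumn("age", col("age").cast("int"))
  .as[Person]
people.show()
{code}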



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34759) run JavaSparkSQLExample failed with Exception.

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34759:


Assignee: (was: Apache Spark)

> run JavaSparkSQLExample failed with Exception.
> --
>
> Key: SPARK-34759
> URL: https://issues.apache.org/jira/browse/SPARK-34759
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception.
> The exception is thrown in runDatasetCreationExample, when executing 
> ‘spark.read().json(path).as(personEncoder)’.
> The exception is 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to 
> int.'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34759) run JavaSparkSQLExample failed with Exception.

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302501#comment-17302501
 ] 

Apache Spark commented on SPARK-34759:
--

User 'zengruios' has created a pull request for this issue:
https://github.com/apache/spark/pull/31852

> run JavaSparkSQLExample failed with Exception.
> --
>
> Key: SPARK-34759
> URL: https://issues.apache.org/jira/browse/SPARK-34759
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception.
> The exception is thrown in runDatasetCreationExample, when executing 
> ‘spark.read().json(path).as(personEncoder)’.
> The exception is 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to 
> int.'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34759) run JavaSparkSQLExample failed with Exception.

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34759:


Assignee: Apache Spark

> run JavaSparkSQLExample failed with Exception.
> --
>
> Key: SPARK-34759
> URL: https://issues.apache.org/jira/browse/SPARK-34759
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Assignee: Apache Spark
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception.
> The exception is thrown in runDatasetCreationExample, when executing 
> ‘spark.read().json(path).as(personEncoder)’.
> The exception is 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to 
> int.'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302497#comment-17302497
 ] 

Apache Spark commented on SPARK-34760:
--

User 'zengruios' has created a pull request for this issue:
https://github.com/apache/spark/pull/31851

> run JavaSQLDataSourceExample failed with Exception in 
> runBasicDataSourceExample().
> --
>
> Key: SPARK-34760
> URL: https://issues.apache.org/jira/browse/SPARK-34760
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception in runBasicDataSourceExample().
> Executing 
> 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'
> throws Exception: 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: partition column favorite_color is 
> not defined in table people_partitioned_bucketed, defined table columns are: 
> age, name;'
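
For orientation, the people dataset in the examples only has the columns (age, name), so partitioning it by favorite_color cannot succeed. A hedged Scala sketch of one plausible correction (an assumption, not necessarily the change in the linked PR) is to partition the users dataset, which does carry that column:

{code:scala}
// Assumes a SparkSession named spark and the standard example data files.
val usersDF = spark.read.load("examples/src/main/resources/users.parquet")

// users.parquet contains name, favorite_color and favorite_numbers,
// so partitioning by favorite_color and bucketing by name is well defined here.
usersDF.write
  .partitionBy("favorite_color")
  .bucketBy(42, "name")
  .saveAsTable("users_partitioned_bucketed")
{code}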



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34760:


Assignee: Apache Spark

> run JavaSQLDataSourceExample failed with Exception in 
> runBasicDataSourceExample().
> --
>
> Key: SPARK-34760
> URL: https://issues.apache.org/jira/browse/SPARK-34760
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Assignee: Apache Spark
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception in runBasicDataSourceExample().
> Executing 
> 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'
> throws Exception: 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: partition column favorite_color is 
> not defined in table people_partitioned_bucketed, defined table columns are: 
> age, name;'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302496#comment-17302496
 ] 

Apache Spark commented on SPARK-34760:
--

User 'zengruios' has created a pull request for this issue:
https://github.com/apache/spark/pull/31851

> run JavaSQLDataSourceExample failed with Exception in 
> runBasicDataSourceExample().
> --
>
> Key: SPARK-34760
> URL: https://issues.apache.org/jira/browse/SPARK-34760
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception in runBasicDataSourceExample().
> Executing 
> 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'
> throws Exception: 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: partition column favorite_color is 
> not defined in table people_partitioned_bucketed, defined table columns are: 
> age, name;'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34760:


Assignee: (was: Apache Spark)

> run JavaSQLDataSourceExample failed with Exception in 
> runBasicDataSourceExample().
> --
>
> Key: SPARK-34760
> URL: https://issues.apache.org/jira/browse/SPARK-34760
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception in runBasicDataSourceExample().
> Executing 
> 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'
> throws Exception: 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: partition column favorite_color is 
> not defined in table people_partitioned_bucketed, defined table columns are: 
> age, name;'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().

2021-03-16 Thread zengrui (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengrui updated SPARK-34760:

Description: 
Running JavaSparkSQLExample fails with an exception in runBasicDataSourceExample().

Executing 
'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'

throws Exception: 'Exception in thread "main" 
org.apache.spark.sql.AnalysisException: partition column favorite_color is not 
defined in table people_partitioned_bucketed, defined table columns are: age, 
name;'

  was:
run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample().

when excecute 
'peopleDF.write().partitionBy("age").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'

throws Exception: 'Exception in thread "main" 
org.apache.spark.sql.AnalysisException: partition column favorite_color is not 
defined in table people_partitioned_bucketed, defined table columns are: age, 
name;'


> run JavaSQLDataSourceExample failed with Exception in 
> runBasicDataSourceExample().
> --
>
> Key: SPARK-34760
> URL: https://issues.apache.org/jira/browse/SPARK-34760
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception in runBasicDataSourceExample().
> Executing 
> 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'
> throws Exception: 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: partition column favorite_color is 
> not defined in table people_partitioned_bucketed, defined table columns are: 
> age, name;'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34760) run JavaSQLDataSourceExample failed with Exception in runBasicDataSourceExample().

2021-03-16 Thread zengrui (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengrui updated SPARK-34760:

Summary: run JavaSQLDataSourceExample failed with Exception in 
runBasicDataSourceExample().  (was: run JavaSparkSQLExample failed with 
Exception in runBasicDataSourceExample().)

> run JavaSQLDataSourceExample failed with Exception in 
> runBasicDataSourceExample().
> --
>
> Key: SPARK-34760
> URL: https://issues.apache.org/jira/browse/SPARK-34760
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 3.0.1, 3.1.1
>Reporter: zengrui
>Priority: Minor
>
> Running JavaSparkSQLExample fails with an exception in runBasicDataSourceExample().
> Executing 
> 'peopleDF.write().partitionBy("age").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'
> throws Exception: 'Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: partition column favorite_color is 
> not defined in table people_partitioned_bucketed, defined table columns are: 
> age, name;'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302439#comment-17302439
 ] 

Apache Spark commented on SPARK-21449:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/31850

> Hive client's SessionState was not closed properly  in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.2.0
>
>
> close the SessionState to clear `hive.downloaded.resources.dir` and the like.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21449) Hive client's SessionState was not closed properly in HiveExternalCatalog

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302438#comment-17302438
 ] 

Apache Spark commented on SPARK-21449:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/31850

> Hive client's SessionState was not closed properly  in HiveExternalCatalog
> --
>
> Key: SPARK-21449
> URL: https://issues.apache.org/jira/browse/SPARK-21449
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.2.0
>
>
> close the SessionState to clear `hive.downloaded.resources.dir` and the like.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34761) Add a day-time interval to a timestamp

2021-03-16 Thread Max Gekk (Jira)
Max Gekk created SPARK-34761:


 Summary: Add a day-time interval to a timestamp
 Key: SPARK-34761
 URL: https://issues.apache.org/jira/browse/SPARK-34761
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Max Gekk
Assignee: Max Gekk
 Fix For: 3.2.0


Support adding of YearMonthIntervalType values to TIMESTAMP values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34761) Add a day-time interval to a timestamp

2021-03-16 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-34761:
-
Description: Support adding of DayTimeIntervalType values to TIMESTAMP 
values.  (was: Support adding of YearMonthIntervalType values to TIMESTAMP 
values.)

> Add a day-time interval to a timestamp
> --
>
> Key: SPARK-34761
> URL: https://issues.apache.org/jira/browse/SPARK-34761
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Support adding of DayTimeIntervalType values to TIMESTAMP values.
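
For illustration, a hedged Scala sketch of the behaviour this sub-task targets; the expression is expected to work only once the change lands, and spark is assumed to be an existing SparkSession:

{code:scala}
// Adding a day-time interval (1 day, 02:03:04) to a timestamp literal.
spark.sql(
  "SELECT TIMESTAMP'2021-03-16 12:00:00' + INTERVAL '1 02:03:04' DAY TO SECOND AS ts"
).show(truncate = false)
{code}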



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34754) sparksql 'add jar' not support hdfs ha mode in k8s

2021-03-16 Thread lithiumlee-_- (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lithiumlee-_- updated SPARK-34754:
--
Description: 
Submitting the app to K8S, the executors hit the exception 
"java.net.UnknownHostException: xx". 

The UDF jar URI uses the HDFS HA style, but the exception stack shows 
"...*createNonHAProxy*..."

 

hql: 
{code:java}
// code placeholder

add jar hdfs://xx/test.jar;
create temporary function test_udf as 'com.xxx.xxx';

create table test.test_udf as 
select test_udf('1') name_1;
 {code}
 

 

exception:
{code:java}
// code placeholder
 TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.30.89.44, executor 1): 
java.lang.IllegalArgumentException: java.net.UnknownHostException: xx
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:696)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:636)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
at 
org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:816)
at 
org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:808)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at 
org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:808)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: xx
... 28 more

{code}
 

  was:
Submit app to K8S,  the driver already running  but meet exception  
"java.net.UnknownHostException: xx" when starting executors. 

The udf jar uri using ha style, but the exception stack is 
"...*createNonHAProxy*..."

 

hql: 
{code:java}
// code placeholder

add jar hdfs://xx/test.jar;
create temporary function test_udf as 'com.xxx.xxx';

create table test.test_udf as 
select test_udf('1') name_1;
 {code}
 

 

exception:
{code:java}
// code placeholder
 TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.30.89.44, executor 1): 
java.lang.IllegalArgumentException: java.net.UnknownHostException: xx
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:696)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:636)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721)
at 

[jira] [Created] (SPARK-34760) run JavaSparkSQLExample failed with Exception in runBasicDataSourceExample().

2021-03-16 Thread zengrui (Jira)
zengrui created SPARK-34760:
---

 Summary: run JavaSparkSQLExample failed with Exception in 
runBasicDataSourceExample().
 Key: SPARK-34760
 URL: https://issues.apache.org/jira/browse/SPARK-34760
 Project: Spark
  Issue Type: Bug
  Components: Examples
Affects Versions: 3.1.1, 3.0.1
Reporter: zengrui


Running JavaSparkSQLExample fails with an exception in runBasicDataSourceExample().

Executing 
'peopleDF.write().partitionBy("age").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'

throws Exception: 'Exception in thread "main" 
org.apache.spark.sql.AnalysisException: partition column favorite_color is not 
defined in table people_partitioned_bucketed, defined table columns are: age, 
name;'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34757:


Assignee: (was: Apache Spark)

> Spark submit should ignore cache for SNAPSHOT dependencies
> --
>
> Key: SPARK-34757
> URL: https://issues.apache.org/jira/browse/SPARK-34757
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 3.1.1
>Reporter: Bo Zhang
>Priority: Major
>
> When spark-submit is executed with --packages, it will not download the 
> dependency jars when they are available in cache (e.g. ivy cache), even when 
> the dependencies are SNAPSHOTs. 
> This might block developers who work on external modules in Spark (e.g. 
> spark-avro), since they need to remove the cache manually every time 
> they update the code during development (which generates SNAPSHOT jars). 
> Without knowing this, they could be blocked wondering why their code changes 
> are not reflected in spark-submit executions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302373#comment-17302373
 ] 

Apache Spark commented on SPARK-34757:
--

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/31849

> Spark submit should ignore cache for SNAPSHOT dependencies
> --
>
> Key: SPARK-34757
> URL: https://issues.apache.org/jira/browse/SPARK-34757
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 3.1.1
>Reporter: Bo Zhang
>Priority: Major
>
> When spark-submit is executed with --packages, it will not download the 
> dependency jars when they are available in cache (e.g. ivy cache), even when 
> the dependencies are SNAPSHOTs. 
> This might block developers who work on external modules in Spark (e.g. 
> spark-avro), since they need to remove the cache manually every time 
> they update the code during development (which generates SNAPSHOT jars). 
> Without knowing this, they could be blocked wondering why their code changes 
> are not reflected in spark-submit executions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34757:


Assignee: Apache Spark

> Spark submit should ignore cache for SNAPSHOT dependencies
> --
>
> Key: SPARK-34757
> URL: https://issues.apache.org/jira/browse/SPARK-34757
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 3.1.1
>Reporter: Bo Zhang
>Assignee: Apache Spark
>Priority: Major
>
> When spark-submit is executed with --packages, it will not download the 
> dependency jars when they are available in cache (e.g. ivy cache), even when 
> the dependencies are SNAPSHOTs. 
> This might block developers who work on external modules in Spark (e.g. 
> spark-avro), since they need to remove the cache manually every time 
> they update the code during development (which generates SNAPSHOT jars). 
> Without knowing this, they could be blocked wondering why their code changes 
> are not reflected in spark-submit executions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34759) run JavaSparkSQLExample failed with Exception.

2021-03-16 Thread zengrui (Jira)
zengrui created SPARK-34759:
---

 Summary: run JavaSparkSQLExample failed with Exception.
 Key: SPARK-34759
 URL: https://issues.apache.org/jira/browse/SPARK-34759
 Project: Spark
  Issue Type: Bug
  Components: Examples
Affects Versions: 3.1.1, 3.0.1
Reporter: zengrui


Running JavaSparkSQLExample fails with an exception.

The exception is thrown in the function runDatasetCreationExample, when executing 
'spark.read().json(path).as(personEncoder)'.

The exception is: 'Exception in thread "main" 
org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to 
int.'
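
For context, this error is typically a type-mapping issue rather than a problem 
with the JSON itself: spark.read().json() infers integer-valued JSON fields as 
bigint (LongType), and mapping that column onto an int field requires a narrowing 
cast, which the analyzer rejects. The Scala sketch below is only an illustrative 
reproduction (the original example is in Java; the class names and inline data 
here are assumptions), showing the failure and two ways around it.

import org.apache.spark.sql.SparkSession

object UpCastRepro {
  // JSON integers are inferred as bigint, so an Int field would need a narrowing cast.
  case class PersonInt(name: String, age: Int)
  // Declaring the field as Long matches the inferred bigint column directly.
  case class PersonLong(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("upcast-repro").master("local[*]").getOrCreate()
    import spark.implicits._

    // Inline data instead of a file path, to keep the sketch self-contained.
    val df = spark.read.json(Seq(
      """{"name":"Andy","age":30}""",
      """{"name":"Justin","age":19}"""
    ).toDS())

    // df.as[PersonInt]  // AnalysisException: Cannot up cast `age` from bigint to int
    val asLong = df.as[PersonLong]                                              // option 1: widen the field
    val asInt = df.selectExpr("name", "cast(age as int) as age").as[PersonInt]  // option 2: cast the column

    asLong.show()
    asInt.show()
    spark.stop()
  }
}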



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302354#comment-17302354
 ] 

Apache Spark commented on SPARK-34758:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/31844

> Simplify Analyzer.resolveLiteralFunction
> 
>
> Key: SPARK-34758
> URL: https://issues.apache.org/jira/browse/SPARK-34758
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34758:


Assignee: Apache Spark

> Simplify Analyzer.resolveLiteralFunction
> 
>
> Key: SPARK-34758
> URL: https://issues.apache.org/jira/browse/SPARK-34758
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction

2021-03-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302353#comment-17302353
 ] 

Apache Spark commented on SPARK-34758:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/31844

> Simplify Analyzer.resolveLiteralFunction
> 
>
> Key: SPARK-34758
> URL: https://issues.apache.org/jira/browse/SPARK-34758
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction

2021-03-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34758:


Assignee: (was: Apache Spark)

> Simplify Analyzer.resolveLiteralFunction
> 
>
> Key: SPARK-34758
> URL: https://issues.apache.org/jira/browse/SPARK-34758
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34758) Simplify Analyzer.resolveLiteralFunction

2021-03-16 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-34758:
---

 Summary: Simplify Analyzer.resolveLiteralFunction
 Key: SPARK-34758
 URL: https://issues.apache.org/jira/browse/SPARK-34758
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies

2021-03-16 Thread Bo Zhang (Jira)
Bo Zhang created SPARK-34757:


 Summary: Spark submit should ignore cache for SNAPSHOT dependencies
 Key: SPARK-34757
 URL: https://issues.apache.org/jira/browse/SPARK-34757
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Spark Core
Affects Versions: 3.1.1
Reporter: Bo Zhang


When spark-submit is executed with --packages, it does not download the 
dependency jars when they are already available in the cache (e.g. the Ivy 
cache), even when the dependencies are SNAPSHOTs. 

This might block developers who work on external modules in Spark (e.g. 
spark-avro), since they need to remove the cache manually every time they 
update the code during development (which generates SNAPSHOT jars). Without 
knowing this, they could be left wondering why their code changes are not 
reflected in spark-submit executions.
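
A possible manual workaround while this is open is to evict the cached SNAPSHOT 
artifacts before re-running spark-submit. The Scala sketch below is only an 
illustration and is not the fix from the pull request referenced above: the 
object name, the example coordinate, and the assumption that the local Ivy 
cache lives under ~/.ivy2/cache (the default when spark.jars.ivy is not set) 
are all mine.

// Hypothetical helper (illustrative only): delete the local Ivy cache entry for a
// "group:artifact:version" coordinate when the version is a SNAPSHOT, so the next
// spark-submit --packages run resolves it again instead of reusing the stale jar.
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator

object EvictSnapshotCache {
  // Default Ivy cache location; adjust if spark.jars.ivy points elsewhere.
  private val ivyCache: Path = Paths.get(sys.props("user.home"), ".ivy2", "cache")

  def evictIfSnapshot(coordinate: String): Unit = coordinate.split(":") match {
    case Array(group, artifact, version) if version.endsWith("-SNAPSHOT") =>
      val dir = ivyCache.resolve(group).resolve(artifact)
      if (Files.exists(dir)) {
        val paths = Files.walk(dir)
        try {
          // Reverse order so files are deleted before their parent directories.
          paths.sorted(Comparator.reverseOrder[Path]()).forEach((p: Path) => Files.delete(p))
        } finally paths.close()
        println(s"Evicted cached SNAPSHOT artifact: $coordinate")
      }
    case _ => // released versions are immutable; leave the cache untouched
  }
}

// Example usage (the coordinate is made up):
// EvictSnapshotCache.evictIfSnapshot("org.apache.spark:spark-avro_2.12:3.2.0-SNAPSHOT")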



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


