[jira] [Assigned] (SPARK-34557) Exclude Avro's transitive zstd-jni dependency

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34557:
-

Assignee: Dongjoon Hyun

> Exclude Avro's transitive zstd-jni dependency
> -
>
> Key: SPARK-34557
> URL: https://issues.apache.org/jira/browse/SPARK-34557
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>
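[Editor's note] For reference, excluding a transitive dependency is a one-liner in a build definition. A hedged sbt-style sketch only; Spark's actual change lives in its Maven poms, and the Avro version below is an assumption (the zstd-jni coordinates are the real ones):

{code:scala}
// Illustrative sbt sketch, not Spark's actual Maven change.
// The Avro version is assumed; "com.github.luben" % "zstd-jni" is the real artifact.
libraryDependencies += ("org.apache.avro" % "avro" % "1.10.1")
  .exclude("com.github.luben", "zstd-jni")
{code}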







[jira] [Resolved] (SPARK-34557) Exclude Avro's transitive zstd-jni dependency

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34557.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31670
[https://github.com/apache/spark/pull/31670]

> Exclude Avro's transitive zstd-jni dependency
> -
>
> Key: SPARK-34557
> URL: https://issues.apache.org/jira/browse/SPARK-34557
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.2.0
>
>







[jira] [Created] (SPARK-34559) Upgrade to ZSTD JNI 1.4.6

2021-02-27 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-34559:
-

 Summary: Upgrade to ZSTD JNI 1.4.6
 Key: SPARK-34559
 URL: https://issues.apache.org/jira/browse/SPARK-34559
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-34559) Upgrade to ZSTD JNI 1.4.6

2021-02-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292076#comment-17292076
 ] 

Apache Spark commented on SPARK-34559:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31674

> Upgrade to ZSTD JNI 1.4.6
> -
>
> Key: SPARK-34559
> URL: https://issues.apache.org/jira/browse/SPARK-34559
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-34559) Upgrade to ZSTD JNI 1.4.6

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34559:


Assignee: (was: Apache Spark)

> Upgrade to ZSTD JNI 1.4.6
> -
>
> Key: SPARK-34559
> URL: https://issues.apache.org/jira/browse/SPARK-34559
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-34559) Upgrade to ZSTD JNI 1.4.6

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34559:


Assignee: Apache Spark

> Upgrade to ZSTD JNI 1.4.6
> -
>
> Key: SPARK-34559
> URL: https://issues.apache.org/jira/browse/SPARK-34559
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Updated] (SPARK-34559) Upgrade to ZSTD JNI 1.4.8-6

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34559:
--
Summary: Upgrade to ZSTD JNI 1.4.8-6  (was: Upgrade to ZSTD JNI 1.4.6)

> Upgrade to ZSTD JNI 1.4.8-6
> ---
>
> Key: SPARK-34559
> URL: https://issues.apache.org/jira/browse/SPARK-34559
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-34447) Refactor the unified v1 and v2 command tests

2021-02-27 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34447:
---
Description: 
The ticket aims to gather potential improvements for the unified tests.
 1. Remove SharedSparkSession from *ParserSuite
 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite
 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be 
respected when a table name is altered"
 4. Reset default namespace in ShowTablesSuiteBase."change current catalog and 
namespace with USE statements" using spark.sessionState.catalogManager.reset()

  was:
The ticket aims to gather potential improvements for the unified tests.
 1. Remove SharedSparkSession from *ParserSuite
 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite
 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be 
respected when a table name is altered"


> Refactor the unified v1 and v2 command tests
> 
>
> Key: SPARK-34447
> URL: https://issues.apache.org/jira/browse/SPARK-34447
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> The ticket aims to gather potential improvements for the unified tests.
>  1. Remove SharedSparkSession from *ParserSuite
>  2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite
>  3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be 
> respected when a table name is altered"
>  4. Reset default namespace in ShowTablesSuiteBase."change current catalog 
> and namespace with USE statements" using 
> spark.sessionState.catalogManager.reset()
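[Editor's note] For item 4, a minimal sketch of what the reset could look like in a test body. The try/finally shape is an assumption, not quoted from the ticket; `catalogManager.reset()` is the call named above and is only visible from Spark's own sql packages:

{code:scala}
// Sketch of item 4: restore the default catalog/namespace after a test
// that changes them with USE statements.
try {
  sql("USE ns1")
  // ... assertions against the changed current namespace ...
} finally {
  spark.sessionState.catalogManager.reset()
}
{code}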






[jira] [Assigned] (SPARK-34559) Upgrade to ZSTD JNI 1.4.8-6

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34559:
-

Assignee: Dongjoon Hyun

> Upgrade to ZSTD JNI 1.4.8-6
> ---
>
> Key: SPARK-34559
> URL: https://issues.apache.org/jira/browse/SPARK-34559
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-34559) Upgrade to ZSTD JNI 1.4.8-6

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34559.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31674
[https://github.com/apache/spark/pull/31674]

> Upgrade to ZSTD JNI 1.4.8-6
> ---
>
> Key: SPARK-34559
> URL: https://issues.apache.org/jira/browse/SPARK-34559
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Created] (SPARK-34560) Cannot join datasets of SHOW TABLES

2021-02-27 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34560:
--

 Summary: Cannot join datasets of SHOW TABLES
 Key: SPARK-34560
 URL: https://issues.apache.org/jira/browse/SPARK-34560
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The following example illustrates the issue:

{code:scala}
scala> sql("CREATE NAMESPACE ns1")
res8: org.apache.spark.sql.DataFrame = []

scala> sql("CREATE NAMESPACE ns2")
res9: org.apache.spark.sql.DataFrame = []

scala> sql("CREATE TABLE ns1.tbl1 (c INT)")
res10: org.apache.spark.sql.DataFrame = []

scala> sql("CREATE TABLE ns2.tbl2 (c INT)")
res11: org.apache.spark.sql.DataFrame = []

scala> val show1 = sql("SHOW TABLES IN ns1")
show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
... 1 more field]

scala> val show2 = sql("SHOW TABLES IN ns2")
show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
... 1 more field]

scala> show1.show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|      ns1|     tbl1|      false|
+---------+---------+-----------+


scala> show2.show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|      ns2|     tbl2|      false|
+---------+---------+-----------+


scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. It's 
probably because you joined several Datasets together, and some of these 
Datasets are the same. This column points to one of the Datasets but Spark is 
unable to figure out which one. Please alias the Datasets with different names 
via `Dataset.as` before joining them, and specify the column using qualified 
name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set 
spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.
  at 
org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157)
{code}
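[Editor's note] As the exception text itself advises, aliasing both sides before the join avoids the ambiguity. A minimal sketch, assuming `spark.implicits._` is in scope as it is in the shell:

{code:scala}
// Workaround sketched from the exception's own advice: alias the datasets
// and qualify the column references.
val joined = show1.as("a").join(show2.as("b"))
  .where($"a.tableName" =!= $"b.tableName")
joined.show()
{code}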







[jira] [Assigned] (SPARK-34560) Cannot join datasets of SHOW TABLES

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34560:


Assignee: Apache Spark

> Cannot join datasets of SHOW TABLES
> ---
>
> Key: SPARK-34560
> URL: https://issues.apache.org/jira/browse/SPARK-34560
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The following example illustrates the issue:
> {code:scala}
> scala> sql("CREATE NAMESPACE ns1")
> res8: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE NAMESPACE ns2")
> res9: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns1.tbl1 (c INT)")
> res10: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns2.tbl2 (c INT)")
> res11: org.apache.spark.sql.DataFrame = []
> scala> val show1 = sql("SHOW TABLES IN ns1")
> show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
> ... 1 more field]
> scala> val show2 = sql("SHOW TABLES IN ns2")
> show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
> ... 1 more field]
> scala> show1.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns1|     tbl1|      false|
> +---------+---------+-----------+
> scala> show2.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns2|     tbl2|      false|
> +---------+---------+-----------+
> scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
> org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. 
> It's probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.
>   at 
> org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157)
> {code}






[jira] [Commented] (SPARK-34560) Cannot join datasets of SHOW TABLES

2021-02-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292128#comment-17292128
 ] 

Apache Spark commented on SPARK-34560:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31675

> Cannot join datasets of SHOW TABLES
> ---
>
> Key: SPARK-34560
> URL: https://issues.apache.org/jira/browse/SPARK-34560
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The following example illustrates the issue:
> {code:scala}
> scala> sql("CREATE NAMESPACE ns1")
> res8: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE NAMESPACE ns2")
> res9: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns1.tbl1 (c INT)")
> res10: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns2.tbl2 (c INT)")
> res11: org.apache.spark.sql.DataFrame = []
> scala> val show1 = sql("SHOW TABLES IN ns1")
> show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
> ... 1 more field]
> scala> val show2 = sql("SHOW TABLES IN ns2")
> show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
> ... 1 more field]
> scala> show1.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns1|     tbl1|      false|
> +---------+---------+-----------+
> scala> show2.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns2|     tbl2|      false|
> +---------+---------+-----------+
> scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
> org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. 
> It's probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.
>   at 
> org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157)
> {code}






[jira] [Commented] (SPARK-34560) Cannot join datasets of SHOW TABLES

2021-02-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292129#comment-17292129
 ] 

Apache Spark commented on SPARK-34560:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31675

> Cannot join datasets of SHOW TABLES
> ---
>
> Key: SPARK-34560
> URL: https://issues.apache.org/jira/browse/SPARK-34560
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The following example illustrates the issue:
> {code:scala}
> scala> sql("CREATE NAMESPACE ns1")
> res8: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE NAMESPACE ns2")
> res9: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns1.tbl1 (c INT)")
> res10: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns2.tbl2 (c INT)")
> res11: org.apache.spark.sql.DataFrame = []
> scala> val show1 = sql("SHOW TABLES IN ns1")
> show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
> ... 1 more field]
> scala> val show2 = sql("SHOW TABLES IN ns2")
> show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
> ... 1 more field]
> scala> show1.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns1|     tbl1|      false|
> +---------+---------+-----------+
> scala> show2.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns2|     tbl2|      false|
> +---------+---------+-----------+
> scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
> org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. 
> It's probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.
>   at 
> org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157)
> {code}






[jira] [Assigned] (SPARK-34560) Cannot join datasets of SHOW TABLES

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34560:


Assignee: (was: Apache Spark)

> Cannot join datasets of SHOW TABLES
> ---
>
> Key: SPARK-34560
> URL: https://issues.apache.org/jira/browse/SPARK-34560
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The following example illustrates the issue:
> {code:scala}
> scala> sql("CREATE NAMESPACE ns1")
> res8: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE NAMESPACE ns2")
> res9: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns1.tbl1 (c INT)")
> res10: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns2.tbl2 (c INT)")
> res11: org.apache.spark.sql.DataFrame = []
> scala> val show1 = sql("SHOW TABLES IN ns1")
> show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
> ... 1 more field]
> scala> val show2 = sql("SHOW TABLES IN ns2")
> show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string 
> ... 1 more field]
> scala> show1.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns1|     tbl1|      false|
> +---------+---------+-----------+
> scala> show2.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns2|     tbl2|      false|
> +---------+---------+-----------+
> scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
> org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. 
> It's probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.
>   at 
> org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157)
> {code}






[jira] [Updated] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0

2021-02-27 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-34392:
-
Fix Version/s: 3.1.2

> Invalid ID for offset-based ZoneId since Spark 3.0
> --
>
> Key: SPARK-34392
> URL: https://issues.apache.org/jira/browse/SPARK-34392
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Yuming Wang
>Assignee: karl wang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
>
> How to reproduce this issue:
> {code:sql}
> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> {code}
> Spark 2.4:
> {noformat}
> spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> 2020-02-07 08:00:00
> Time taken: 0.089 seconds, Fetched 1 row(s)
> {noformat}
> Spark 3.x:
> {noformat}
> spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select 
> to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")]
> java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00
>   at java.time.ZoneId.ofWithPrefix(ZoneId.java:437)
>   at java.time.ZoneId.of(ZoneId.java:407)
>   at java.time.ZoneId.of(ZoneId.java:359)
>   at java.time.ZoneId.of(ZoneId.java:315)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814)
> {noformat}
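[Editor's note] The failure comes down to java.time's stricter offset grammar. A minimal sketch of which forms parse, assumed from java.time's documented rules rather than from the Spark fix itself:

{code:scala}
// java.time rejects a single-digit hour combined with minutes ("GMT+8:00"),
// but accepts a two-digit hour or an hour-only offset.
java.time.ZoneId.of("GMT+08:00")  // parses
java.time.ZoneId.of("GMT+8")      // parses
java.time.ZoneId.of("GMT+8:00")   // throws java.time.DateTimeException
{code}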






[jira] [Assigned] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-02-27 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-34415:


Assignee: Phillip Henry

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Assignee: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
>
> Randomization can be a more effective technique than a grid search for 
> finding optimal hyperparameters, since minima/maxima can fall between the 
> grid lines and never be found. Randomization is not so restricted, although 
> the probability of finding minima/maxima depends on the number of 
> attempts.
> Alice Zheng has an accessible description on how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  
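[Editor's note] As a rough illustration of the idea, randomly sampled parameter points can be fed to the existing tuning machinery. A minimal sketch on the current ParamMap API; the sampling ranges are arbitrary and the linked PR proposes its own builder:

{code:scala}
import scala.util.Random
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.regression.LinearRegression

// Random search sketch: sample 10 random parameter points instead of a grid.
val lr  = new LinearRegression()
val rng = new Random(42)
val randomGrid: Array[ParamMap] = Array.fill(10) {
  ParamMap(
    lr.regParam -> math.pow(10, -4 + 4 * rng.nextDouble()),  // log-uniform in [1e-4, 1]
    lr.elasticNetParam -> rng.nextDouble()                   // uniform in [0, 1]
  )
}
// randomGrid can then be passed to CrossValidator.setEstimatorParamMaps(randomGrid).
{code}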






[jira] [Resolved] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-02-27 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-34415.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31535
[https://github.com/apache/spark/pull/31535]

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Assignee: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>
> Randomization can be a more effective technique than a grid search for 
> finding optimal hyperparameters, since minima/maxima can fall between the 
> grid lines and never be found. Randomization is not so restricted, although 
> the probability of finding minima/maxima depends on the number of 
> attempts.
> Alice Zheng has an accessible description on how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  






[jira] [Created] (SPARK-34561) Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`

2021-02-27 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34561:
--

 Summary: Cannot drop/add columns from/to a dataset of v2 `DESCRIBE 
TABLE`
 Key: SPARK-34561
 URL: https://issues.apache.org/jira/browse/SPARK-34561
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Dropping a column from a dataset of v2 `DESCRIBE TABLE` fails with:
{code:java}
Resolved attribute(s) col_name#102,data_type#103 missing from 
col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, 
data_type#103]. Attribute(s) with the same name appear in the operation: 
col_name,data_type. Please check if the right attribute(s) are used.;
!Project [col_name#102, data_type#103]
+- LocalRelation [col_name#29, data_type#30, comment#31]{code}
The code below demonstrates the issue:
{code:java}
val tbl = s"${catalogAndNamespace}tbl"
withTable(tbl) {
  sql(s"CREATE TABLE $tbl (c0 INT) USING $v2Format")
  val description = sql(s"DESCRIBE TABLE $tbl")
  val noComment = description.drop("comment")
}
{code}
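[Editor's note] A possible workaround, untested against this ticket: rebuild the DataFrame so the projection resolves against fresh attribute IDs. A hypothetical sketch only:

{code:scala}
// Hypothetical workaround (not verified for this bug): recreate the DataFrame
// from its RDD and schema, which assigns fresh attribute IDs, then drop.
val noComment = spark
  .createDataFrame(description.rdd, description.schema)
  .drop("comment")
{code}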






[jira] [Assigned] (SPARK-34561) Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34561:


Assignee: Apache Spark

> Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`
> 
>
> Key: SPARK-34561
> URL: https://issues.apache.org/jira/browse/SPARK-34561
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Dropping a column from a dataset of v2 `DESCRIBE TABLE` fails with:
> {code:java}
> Resolved attribute(s) col_name#102,data_type#103 missing from 
> col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, 
> data_type#103]. Attribute(s) with the same name appear in the operation: 
> col_name,data_type. Please check if the right attribute(s) are used.;
> !Project [col_name#102, data_type#103]
> +- LocalRelation [col_name#29, data_type#30, comment#31]{code}
> The code below demonstrates the issue:
> {code:java}
> val tbl = s"${catalogAndNamespace}tbl"
> withTable(tbl) {
>   sql(s"CREATE TABLE $tbl (c0 INT) USING $v2Format")
>   val description = sql(s"DESCRIBE TABLE $tbl")
>   val noComment = description.drop("comment")
> }
> {code}






[jira] [Assigned] (SPARK-34561) Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34561:


Assignee: (was: Apache Spark)

> Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`
> 
>
> Key: SPARK-34561
> URL: https://issues.apache.org/jira/browse/SPARK-34561
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Dropping a column from a dataset of v2 `DESCRIBE TABLE` fails with:
> {code:java}
> Resolved attribute(s) col_name#102,data_type#103 missing from 
> col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, 
> data_type#103]. Attribute(s) with the same name appear in the operation: 
> col_name,data_type. Please check if the right attribute(s) are used.;
> !Project [col_name#102, data_type#103]
> +- LocalRelation [col_name#29, data_type#30, comment#31]{code}
> The code below demonstrates the issue:
> {code:java}
> val tbl = s"${catalogAndNamespace}tbl"
> withTable(tbl) {
>   sql(s"CREATE TABLE $tbl (c0 INT) USING $v2Format")
>   val description = sql(s"DESCRIBE TABLE $tbl")
>   val noComment = description.drop("comment")
> }
> {code}






[jira] [Commented] (SPARK-34561) Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`

2021-02-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292192#comment-17292192
 ] 

Apache Spark commented on SPARK-34561:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31676

> Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`
> 
>
> Key: SPARK-34561
> URL: https://issues.apache.org/jira/browse/SPARK-34561
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Dropping a column from a dataset of v2 `DESCRIBE TABLE` fails with:
> {code:java}
> Resolved attribute(s) col_name#102,data_type#103 missing from 
> col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, 
> data_type#103]. Attribute(s) with the same name appear in the operation: 
> col_name,data_type. Please check if the right attribute(s) are used.;
> !Project [col_name#102, data_type#103]
> +- LocalRelation [col_name#29, data_type#30, comment#31]{code}
> The code below demonstrates the issue:
> {code:java}
> val tbl = s"${catalogAndNamespace}tbl"
> withTable(tbl) {
>   sql(s"CREATE TABLE $tbl (c0 INT) USING $v2Format")
>   val description = sql(s"DESCRIBE TABLE $tbl")
>   val noComment = description.drop("comment")
> }
> {code}






[jira] [Created] (SPARK-34562) Leverage parquet bloom filters

2021-02-27 Thread H. Vetinari (Jira)
H. Vetinari created SPARK-34562:
---

 Summary: Leverage parquet bloom filters
 Key: SPARK-34562
 URL: https://issues.apache.org/jira/browse/SPARK-34562
 Project: Spark
  Issue Type: Task
  Components: Input/Output
Affects Versions: 3.2.0
Reporter: H. Vetinari


The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
PARQUET-41.

From searching the issues, it seems there is no current tracker for this, 
though I found a comment from [~dongjoon] that points out the missing parquet 
support up until now.
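[Editor's note] If the upgrade lands, writing bloom filters would presumably go through Parquet's Hadoop properties. A hedged sketch; the option keys are parquet-mr 1.12 property names, and their pass-through from Spark's writer, the `df` value, and the column name are assumptions:

{code:scala}
// Sketch, assuming parquet-mr 1.12 property names pass through Spark's writer:
// enable a bloom filter for one column and size it by expected distinct values.
df.write
  .option("parquet.bloom.filter.enabled#user_id", "true")
  .option("parquet.bloom.filter.expected.ndv#user_id", "1000000")
  .parquet("/tmp/with_bloom_filters")
{code}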






[jira] [Updated] (SPARK-34562) Leverage parquet bloom filters

2021-02-27 Thread H. Vetinari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

H. Vetinari updated SPARK-34562:

Issue Type: Improvement  (was: Task)

> Leverage parquet bloom filters
> --
>
> Key: SPARK-34562
> URL: https://issues.apache.org/jira/browse/SPARK-34562
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
> PARQUET-41.
> From searching the issues, it seems there is no current tracker for this, 
> though I found a comment from [~dongjoon] that points out the missing parquet 
> support up until now.






[jira] [Updated] (SPARK-34562) Leverage parquet bloom filters

2021-02-27 Thread H. Vetinari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

H. Vetinari updated SPARK-34562:

Description: 
The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
PARQUET-41.

From searching the issues, it seems there is no current tracker for this, 
though I found a 
[comment|https://issues.apache.org/jira/browse/SPARK-20901?focusedCommentId=17052473&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17052473]
 from [~dongjoon] that points out the missing parquet support up until now.

  was:
The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
PARQUET-41.

From searching the issues, it seems there is no current tracker for this, 
though I found a comment from [~dongjoon] that points out the missing parquet 
support up until now.


> Leverage parquet bloom filters
> --
>
> Key: SPARK-34562
> URL: https://issues.apache.org/jira/browse/SPARK-34562
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
> PARQUET-41.
> From searching the issues, it seems there is no current tracker for this, 
> though I found a 
> [comment|https://issues.apache.org/jira/browse/SPARK-20901?focusedCommentId=17052473&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17052473]
>  from [~dongjoon] that points out the missing parquet support up until now.






[jira] [Resolved] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34479.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31673
[https://github.com/apache/spark/pull/31673]

> Add zstandard codec to spark.sql.avro.compression.codec
> ---
>
> Key: SPARK-34479
> URL: https://issues.apache.org/jira/browse/SPARK-34479
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Avro added the zstandard codec in AVRO-2195.
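[Editor's note] Once available, selecting the codec should be a one-line conf change. A minimal sketch; the `zstandard` value matches the codec this ticket adds, while `df` and the write path are assumptions:

{code:scala}
// Sketch: route Avro output through zstandard via the SQL conf in the title.
spark.conf.set("spark.sql.avro.compression.codec", "zstandard")
df.write.format("avro").save("/tmp/events_zstd")  // df is assumed
{code}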






[jira] [Assigned] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34479:
-

Assignee: Yuming Wang

> Add zstandard codec to spark.sql.avro.compression.codec
> ---
>
> Key: SPARK-34479
> URL: https://issues.apache.org/jira/browse/SPARK-34479
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Avro added the zstandard codec in AVRO-2195.






[jira] [Created] (SPARK-34563) Checkpointing a union with another checkpoint fails

2021-02-27 Thread Michael Kamprath (Jira)
Michael Kamprath created SPARK-34563:


 Summary: Checkpointing a union with another checkpoint fails
 Key: SPARK-34563
 URL: https://issues.apache.org/jira/browse/SPARK-34563
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.0.2
 Environment: I am running Spark 3.0.2 in standalone cluster mode, 
built for Hadoop 2.7 and Scala 2.12.12. I am using QFS 2.2.2 (Quantcast File 
System) as the underlying DFS. The nodes run on Debian Stretch, and Java is 
openjdk version "1.8.0_275". 
Reporter: Michael Kamprath


I have some PySpark code that periodically checkpoints a data frame that I am 
building in pieces by unioning those pieces together as they are constructed. 
(Py)Spark fails on the second checkpoint, which would be a union of a new piece 
of the desired data frame with a previously checkpointed piece. Simplified 
PySpark code that triggers this problem is:

 
{code:java}
RANGE_STEP = 1
PARTITIONS = 5
COUNT_UNIONS = 20

df = spark.range(1, RANGE_STEP+1, numPartitions=PARTITIONS)

for i in range(1, COUNT_UNIONS+1):
    print('Processing i = {0}'.format(i))
    new_df = spark.range(RANGE_STEP*i + 1, RANGE_STEP*(i+1) + 1,
                         numPartitions=PARTITIONS)
    df = df.union(new_df).checkpoint()

df.count()
{code}
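[Editor's note] One setup detail the snippet leaves implicit: `checkpoint()` requires a checkpoint directory to be set first, otherwise Spark fails with a "checkpoint directory has not been set" error. A minimal sketch, shown in Scala (the PySpark call is the same method on `spark.sparkContext`; the path is an assumption):

{code:scala}
// Required once per application before any checkpoint() call.
spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints")
{code}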
When this code gets to the checkpoint on the second loop iteration (i=2) the 
job fails with an error:

 
{code:java}
Py4JJavaError: An error occurred while calling o119.checkpoint.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in 
stage 10.0 failed 4 times, most recent failure: Lost task 9.3 in stage 10.0 
(TID 264, 10.20.30.13, executor 0): com.esotericsoftware.kryo.KryoException: 
Encountered unregistered class ID: 9062
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:804)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:296)
at 
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:168)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
at 
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2154)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
at scala.Option.foreach(Option.scala:407)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at 
org.ap

[jira] [Updated] (SPARK-34563) Checkpointing a union with another checkpoint fails

2021-02-27 Thread Michael Kamprath (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kamprath updated SPARK-34563:
-
Description: 
I have some PySpark code that periodically checkpoints a data frame that I am 
building in pieces by unioning those pieces together as they are constructed. 
(Py)Spark fails on the second checkpoint, which would be a union of a new piece 
of the desired data frame with a previously checkpointed piece. Simplified 
PySpark code that triggers this problem is:

 
{code:java}
RANGE_STEP = 1
PARTITIONS = 5
COUNT_UNIONS = 20

df = spark.range(1, RANGE_STEP+1, numPartitions=PARTITIONS)

for i in range(1, COUNT_UNIONS+1):
    print('Processing i = {0}'.format(i))
    new_df = spark.range(RANGE_STEP*i + 1, RANGE_STEP*(i+1) + 1,
                         numPartitions=PARTITIONS)
    df = df.union(new_df).checkpoint()

df.count()
{code}
When this code gets to the checkpoint on the second loop iteration (i=2) the 
job fails with an error:

 
{code:java}
Py4JJavaError: An error occurred while calling o119.checkpoint.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in 
stage 10.0 failed 4 times, most recent failure: Lost task 9.3 in stage 10.0 
(TID 264, 10.20.30.13, executor 0): com.esotericsoftware.kryo.KryoException: 
Encountered unregistered class ID: 9062
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:804)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:296)
at 
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:168)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
at 
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2154)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
at scala.Option.foreach(Option.scala:407)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2135)
at org.apache.spark.SparkContext.runJob(SparkContext.sc

[jira] [Created] (SPARK-34564) DateTimeUtils.fromJavaDate fails for very late dates during casting to Int

2021-02-27 Thread kondziolka9ld (Jira)
kondziolka9ld created SPARK-34564:
-

 Summary: DateTimeUtils.fromJavaDate fails for very late dates 
during casting to Int
 Key: SPARK-34564
 URL: https://issues.apache.org/jira/browse/SPARK-34564
 Project: Spark
  Issue Type: Question
  Components: SQL
Affects Versions: 3.0.1
Reporter: kondziolka9ld


Please consider the following scenario on *spark-3.0.1*:

 
{code:java}
scala> List(("some date", new Date(Int.MaxValue)), ("some corner case date", 
new Date(Long.MaxValue))).toDF

java.lang.RuntimeException: Error while encoding: 
java.lang.ArithmeticException: integer overflow
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, 
false) AS _1#0
staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, DateType, 
fromJavaDate, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, 
true, false) AS _2#1
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
  at 
org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353)
  at 
org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231)
  ... 51 elided
Caused by: java.lang.ArithmeticException: integer overflow
  at java.lang.Math.toIntExact(Math.java:1011)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(DateTimeUtils.scala)
  at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211)
  ... 60 more
{code}
 

This is in contrast to *spark-2.4.7*, where it is possible to create a 
dataframe with such values: 

 
{code:java}
scala> val df = List(("some date", new Date(Int.MaxValue)), ("some corner case 
date", new Date(Long.MaxValue))).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: date]

scala> df.show
+--------------------+-------------+
|                  _1|           _2|
+--------------------+-------------+
|           some date|   1970-01-25|
|some corner case ...|1701498-03-18|
+--------------------+-------------+

{code}
Anyway, I am aware that when collecting these data I will get a 
different result:

 

 
{code:java}
scala> df.collect
res10: Array[org.apache.spark.sql.Row] = Array([some date,1970-01-25], [some 
corner case date,?498-03-18])
{code}
which seems natural, as:

 
{code:java}
scala> new java.sql.Date(Long.MaxValue)
res1: java.sql.Date = ?994-08-17
{code}
 

 

For an easier reproduction, please consider:

 
{code:java}
scala> org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(new 
java.sql.Date(Long.MaxValue))
java.lang.ArithmeticException: integer overflow
  at java.lang.Math.toIntExact(Math.java:1011)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111)
  ... 47 elided

{code}
 

However, the question is: even if such late dates are not supported, could it 
fail in a more gentle way?
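[Editor's note] For intuition, the overflow is just the epoch-day count blowing past Int range. A sketch of the raw arithmetic only; the real `fromJavaDate` also does calendar rebasing before `Math.toIntExact`:

{code:scala}
// Long.MaxValue milliseconds is ~1.07e14 days since the epoch, far beyond
// Int.MaxValue (~2.1e9), so Math.toIntExact throws instead of wrapping.
val days = Long.MaxValue / 86400000L
days > Int.MaxValue.toLong       // true
java.lang.Math.toIntExact(days)  // java.lang.ArithmeticException: integer overflow
{code}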

 






[jira] [Updated] (SPARK-34564) DateTimeUtils.fromJavaDate fails for very late dates during casting to Int

2021-02-27 Thread kondziolka9ld (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kondziolka9ld updated SPARK-34564:
--
Description: 
Please consider the following scenario on *spark-3.0.1*: 
{code:java}
scala> List(("some date", new Date(Int.MaxValue)), ("some corner case date", 
new Date(Long.MaxValue))).toDF

java.lang.RuntimeException: Error while encoding: 
java.lang.ArithmeticException: integer overflow
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, 
false) AS _1#0
staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, DateType, 
fromJavaDate, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, 
true, false) AS _2#1
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
  at 
org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353)
  at 
org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231)
  ... 51 elided
Caused by: java.lang.ArithmeticException: integer overflow
  at java.lang.Math.toIntExact(Math.java:1011)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(DateTimeUtils.scala)
  at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211)
  ... 60 more
{code}
This is in contrast to *spark-2.4.7*, where it is possible to create a 
dataframe with such values:  
{code:java}
scala> val df = List(("some date", new Date(Int.MaxValue)), ("some corner case 
date", new Date(Long.MaxValue))).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: date]

scala> df.show
+--------------------+-------------+
|                  _1|           _2|
+--------------------+-------------+
|           some date|   1970-01-25|
|some corner case ...|1701498-03-18|
+--------------------+-------------+

{code}
Anyway, I am aware that when collecting these data I will get a 
different result: 
{code:java}
scala> df.collect
res10: Array[org.apache.spark.sql.Row] = Array([some date,1970-01-25], [some 
corner case date,?498-03-18])
{code}
which seems natural, as: 
{code:java}
scala> new java.sql.Date(Long.MaxValue)
res1: java.sql.Date = ?994-08-17
{code}
  

For an easier reproduction, please consider: 
{code:java}
scala> org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(new 
java.sql.Date(Long.MaxValue))
java.lang.ArithmeticException: integer overflow
  at java.lang.Math.toIntExact(Math.java:1011)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111)
  ... 47 elided

{code}
 However, the question is: even if such late dates are not supported, could it 
fail in a more gentle way?

 

  was:
Please consider the following scenario on *spark-3.0.1*:

 
{code:java}
scala> List(("some date", new Date(Int.MaxValue)), ("some corner case date", 
new Date(Long.MaxValue))).toDF

java.lang.RuntimeException: Error while encoding: 
java.lang.ArithmeticException: integer overflow
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, 
false) AS _1#0
staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, DateType, 
fromJavaDate, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, 
true, false) AS _2#1
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
  at 
org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353)
  at 
org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231)
  ... 51 elided
Caused by: java.lang.A

[jira] [Updated] (SPARK-34564) DateTimeUtils.fromJavaDate fails for very late dates during casting to Int

2021-02-27 Thread kondziolka9ld (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kondziolka9ld updated SPARK-34564:
--
Description: 
Please consider the following scenario on *spark-3.0.1*: 
{code:java}
scala> List(("some date", new Date(Int.MaxValue)), ("some corner case date", 
new Date(Long.MaxValue))).toDF

java.lang.RuntimeException: Error while encoding: 
java.lang.ArithmeticException: integer overflow
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, 
false) AS _1#0
staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, DateType, 
fromJavaDate, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, 
true, false) AS _2#1
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
  at 
org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353)
  at 
org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231)
  ... 51 elided
Caused by: java.lang.ArithmeticException: integer overflow
  at java.lang.Math.toIntExact(Math.java:1011)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(DateTimeUtils.scala)
  at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211)
  ... 60 more
{code}
 In contrast to *spark-2.4.7*, where it is possible to create a dataframe with 
such values:  
{code:java}
scala> val df = List(("some date", new Date(Int.MaxValue)), ("some corner case 
date", new Date(Long.MaxValue))).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: date]

scala> df.show
+--------------------+-------------+
|                  _1|           _2|
+--------------------+-------------+
|           some date|   1970-01-25|
|some corner case ...|1701498-03-18|
+--------------------+-------------+

{code}
Anyway, I am aware that collecting these data will give a different result: 
{code:java}
scala> df.collect
res10: Array[org.apache.spark.sql.Row] = Array([some date,1970-01-25], [some 
corner case date,?498-03-18])
{code}
which seems natural given the behaviour of *java.sql.Date*:
{code:java}
scala> new java.sql.Date(Long.MaxValue)
res1: java.sql.Date = ?994-08-17
{code}
  

When it comes to easier reproduction, please consider: 
{code:java}
scala> org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(new 
java.sql.Date(Long.MaxValue))
java.lang.ArithmeticException: integer overflow
  at java.lang.Math.toIntExact(Math.java:1011)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111)
  ... 47 elided

{code}
 However, the question is: even if such late dates are not supported, could it 
fail in a more gentle way?
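A minimal sketch of such gentler behaviour (not Spark's actual implementation, which also rebases between Julian and proleptic Gregorian calendars; the name fromJavaDateGently and the plain day arithmetic are illustrative assumptions):
{code:java}
import java.sql.Date
import java.util.concurrent.TimeUnit

// Convert a java.sql.Date to an epoch-day Int, failing with a descriptive
// message instead of a bare "integer overflow" when the date is out of range.
def fromJavaDateGently(date: Date): Int = {
  val days = TimeUnit.MILLISECONDS.toDays(date.getTime)
  if (days < Int.MinValue || days > Int.MaxValue)
    throw new IllegalArgumentException(
      s"Date '$date' is outside the supported range: day offset $days does not fit in an Int")
  days.toInt
}
{code}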

 

  was:
Please consider the following scenario on *spark-3.0.1*: 
{code:java}
scala> List(("some date", new Date(Int.MaxValue)), ("some corner case date", 
new Date(Long.MaxValue))).toDF

java.lang.RuntimeException: Error while encoding: 
java.lang.ArithmeticException: integer overflow
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, 
false) AS _1#0
staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, DateType, 
fromJavaDate, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, 
true, false) AS _2#1
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
  at 
org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353)
  at 
org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231)
  ..

[jira] [Commented] (SPARK-34542) Upgrade Parquet to 1.12.0

2021-02-27 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292239#comment-17292239
 ] 

H. Vetinari commented on SPARK-34542:
-

It would be amazing if this could make it into 3.2, given all the features in 
Parquet 1.12 (e.g. bloom filters and encryption).

> Upgrade Parquet to 1.12.0
> -
>
> Key: SPARK-34542
> URL: https://issues.apache.org/jira/browse/SPARK-34542
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Parquet-1.12.0 release notes:
> https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0-rc2/CHANGES.md






[jira] [Created] (SPARK-34565) Collapse Window nodes with Project between them

2021-02-27 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-34565:
--

 Summary: Collapse Window nodes with Project between them
 Key: SPARK-34565
 URL: https://issues.apache.org/jira/browse/SPARK-34565
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Tanel Kiis


The CollapseWindow optimizer rule can be improved to also collapse Window nodes 
that have a Project between them. Such Window - Project - Window chains arise 
when chaining dataframe.withColumn calls, as in the sketch below.
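
For illustration, a hypothetical spark-shell snippet (relying on the shell's pre-imported implicits; column names and data are made up) that produces such a plan:
{code:java}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val w = Window.partitionBy($"key").orderBy($"ts")
val df = Seq((1, 1, 10), (1, 2, 20), (2, 1, 30)).toDF("key", "ts", "value")

// Each withColumn over a window adds a Window node; the second one ends up
// behind the Project introduced by the first withColumn.
val out = df
  .withColumn("prev", lag($"value", 1).over(w))
  .withColumn("next", lead($"value", 1).over(w))

out.explain(true)  // analyzed plan shows Window - Project - Window
{code}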






[jira] [Assigned] (SPARK-34565) Collapse Window nodes with Project between them

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34565:


Assignee: (was: Apache Spark)

> Collapse Window nodes with Project between them
> ---
>
> Key: SPARK-34565
> URL: https://issues.apache.org/jira/browse/SPARK-34565
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> The CollapseWindow optimizer rule can be improved to also collapse Window 
> nodes that have a Project between them. Such Window - Project - Window 
> chains arise when chaining dataframe.withColumn calls.






[jira] [Commented] (SPARK-34565) Collapse Window nodes with Project between them

2021-02-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292250#comment-17292250
 ] 

Apache Spark commented on SPARK-34565:
--

User 'tanelk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31677

> Collapse Window nodes with Project between them
> ---
>
> Key: SPARK-34565
> URL: https://issues.apache.org/jira/browse/SPARK-34565
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> The CollapseWindow optimizer rule can be improved to also collapse Window 
> nodes that have a Project between them. Such Window - Project - Window 
> chains arise when chaining dataframe.withColumn calls.






[jira] [Assigned] (SPARK-34565) Collapse Window nodes with Project between them

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34565:


Assignee: Apache Spark

> Collapse Window nodes with Project between them
> ---
>
> Key: SPARK-34565
> URL: https://issues.apache.org/jira/browse/SPARK-34565
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Apache Spark
>Priority: Major
>
> The CollapseWindow optimizer rule can be improved to also collapse Window 
> nodes that have a Project between them. Such Window - Project - Window 
> chains arise when chaining dataframe.withColumn calls.







[jira] [Updated] (SPARK-34562) Leverage parquet bloom filters

2021-02-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-34562:

Component/s: (was: Input/Output)
 SQL

> Leverage parquet bloom filters
> --
>
> Key: SPARK-34562
> URL: https://issues.apache.org/jira/browse/SPARK-34562
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
> PARQUET-41.
> From searching the issues, it seems there is no current tracker for this, 
> though I found a 
> [comment|https://issues.apache.org/jira/browse/SPARK-20901?focusedCommentId=17052473&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17052473]
>  from [~dongjoon] that points out the missing parquet support up until now.






[jira] [Commented] (SPARK-34562) Leverage parquet bloom filters

2021-02-27 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292256#comment-17292256
 ] 

Yuming Wang commented on SPARK-34562:
-

[~h-vetinari] Here is an example of how to build a bloom filter: 
https://issues.apache.org/jira/browse/PARQUET-41?focusedCommentId=17276854&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17276854
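
For reference, a hedged sketch of a bloom-filter-enabled write (the per-column keys below are parquet-mr hadoop options passed through as write options; the column name and output path are hypothetical):
{code:java}
spark.range(1000000).toDF("id")
  .write
  .option("parquet.bloom.filter.enabled#id", "true")         // enable for column "id"
  .option("parquet.bloom.filter.expected.ndv#id", "1000000") // expected distinct values
  .parquet("/tmp/parquet_bloom_example")
{code}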

> Leverage parquet bloom filters
> --
>
> Key: SPARK-34562
> URL: https://issues.apache.org/jira/browse/SPARK-34562
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: H. Vetinari
>Priority: Major
>
> The currently in-progress SPARK-34542 brings in parquet 1.12, which contains 
> PARQUET-41.
> From searching the issues, it seems there is no current tracker for this, 
> though I found a 
> [comment|https://issues.apache.org/jira/browse/SPARK-20901?focusedCommentId=17052473&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17052473]
>  from [~dongjoon] that points out the missing parquet support up until now.






[jira] [Created] (SPARK-34566) Fix typo error of `spark.launcher.childConectionTimeout`

2021-02-27 Thread angerszhu (Jira)
angerszhu created SPARK-34566:
-

 Summary: Fix typo error of `spark.launcher.childConectionTimeout`
 Key: SPARK-34566
 URL: https://issues.apache.org/jira/browse/SPARK-34566
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.0, 3.1.1
Reporter: angerszhu









[jira] [Commented] (SPARK-34566) Fix typo error of `spark.launcher.childConectionTimeout`

2021-02-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292290#comment-17292290
 ] 

Apache Spark commented on SPARK-34566:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31678

> Fix typo error of `spark.launcher.childConectionTimeout`
> 
>
> Key: SPARK-34566
> URL: https://issues.apache.org/jira/browse/SPARK-34566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1
>Reporter: angerszhu
>Priority: Major
>







[jira] [Assigned] (SPARK-34566) Fix typo error of `spark.launcher.childConectionTimeout`

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34566:


Assignee: (was: Apache Spark)

> Fix typo error of `spark.launcher.childConectionTimeout`
> 
>
> Key: SPARK-34566
> URL: https://issues.apache.org/jira/browse/SPARK-34566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1
>Reporter: angerszhu
>Priority: Major
>







[jira] [Assigned] (SPARK-34566) Fix typo error of `spark.launcher.childConectionTimeout`

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34566:


Assignee: Apache Spark

> Fix typo error of `spark.launcher.childConectionTimeout`
> 
>
> Key: SPARK-34566
> URL: https://issues.apache.org/jira/browse/SPARK-34566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Updated] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34543:
--
Fix Version/s: 3.0.3

> Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
> --
>
> Key: SPARK-34543
> URL: https://issues.apache.org/jira/browse/SPARK-34543
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.2, 3.1.1
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0, 3.1.2, 3.0.3
>
>
> SHOW PARTITIONS is case sensitive and doesn't respect the SQL config 
> *spark.sql.caseSensitive*, which is false by default. For instance:
> {code:sql}
> spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part);
> spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0;
> spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0);
> Location: 
> file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0
> spark-sql> ALTER TABLE tbl ADD PARTITION (part=1);
> spark-sql> SELECT * FROM tbl;
> 0 0
> {code}
> Create new partition folder in the file system:
> {code}
> $ cp -r 
> /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 
> /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa
> {code}
> Set new location for the partition part=1:
> {code:sql}
> spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION 
> '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa';
> spark-sql> SELECT * FROM tbl;
> 0 0
> 0 1
> spark-sql> ALTER TABLE tbl ADD PARTITION (PART=2);
> spark-sql> SELECT * FROM tbl;
> 0 0
> 0 1
> {code}
> Set location for a partition in the upper case:
> {code}
> $ cp -r 
> /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 
> /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb
> {code}
> {code:sql}
> spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION 
> '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb';
> Error in query: Partition spec is invalid. The spec (PART) must match the 
> partition spec (part) defined in table '`default`.`tbl`'
> {code}
>  






[jira] [Created] (SPARK-34567) CreateTableAsSelect should have metrics update too

2021-02-27 Thread angerszhu (Jira)
angerszhu created SPARK-34567:
-

 Summary: CreateTableAsSelect should have metrics update too
 Key: SPARK-34567
 URL: https://issues.apache.org/jira/browse/SPARK-34567
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: angerszhu









[jira] [Commented] (SPARK-34567) CreateTableAsSelect should have metrics update too

2021-02-27 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292301#comment-17292301
 ] 

angerszhu commented on SPARK-34567:
---

Will raise a PR soon.

> CreateTableAsSelect should have metrics update too
> --
>
> Key: SPARK-34567
> URL: https://issues.apache.org/jira/browse/SPARK-34567
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: angerszhu
>Priority: Major
>







[jira] [Assigned] (SPARK-34567) CreateTableAsSelect should have metrics update too

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34567:


Assignee: Apache Spark

> CreateTableAsSelect should have metrics update too
> --
>
> Key: SPARK-34567
> URL: https://issues.apache.org/jira/browse/SPARK-34567
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-34567) CreateTableAsSelect should have metrics update too

2021-02-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292325#comment-17292325
 ] 

Apache Spark commented on SPARK-34567:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31679

> CreateTableAsSelect should have metrics update too
> --
>
> Key: SPARK-34567
> URL: https://issues.apache.org/jira/browse/SPARK-34567
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: angerszhu
>Priority: Major
>








[jira] [Assigned] (SPARK-34567) CreateTableAsSelect should have metrics update too

2021-02-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34567:


Assignee: (was: Apache Spark)

> CreateTableAsSelect should have metrics update too
> --
>
> Key: SPARK-34567
> URL: https://issues.apache.org/jira/browse/SPARK-34567
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: angerszhu
>Priority: Major
>



