[jira] [Updated] (SPARK-34481) Refactor dataframe reader/writer path option logic
[ https://issues.apache.org/jira/browse/SPARK-34481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuchen Huo updated SPARK-34481: --- Priority: Trivial (was: Major) > Refactor dataframe reader/writer path option logic > -- > > Key: SPARK-34481 > URL: https://issues.apache.org/jira/browse/SPARK-34481 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.0 > Reporter: Yuchen Huo > Priority: Trivial > > Refactor the dataframe reader/writer logic so that the path-in-options handling logic has its own function. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10816) EventTime based sessionization (session window)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287585#comment-17287585 ] Yuanjian Li commented on SPARK-10816: - Many thanks for the heads up! [~viirya] [~kabhwan] {quote}Now that there're two committers from different teams finding the feature as useful, looks like we could try pushing this out again. {quote} Big +1. Really excited to revive this feature with you. I'll also take some time to reload the old context soon. {quote}Probably the code size is different because the design is actually quite different {quote} That's right. From my rough investigation, the main differences are listed below: * State store format design: As Shixiong described in [this comment|https://issues.apache.org/jira/browse/SPARK-10816?focusedCommentId=16645370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16645370], my approach is easy to implement but does not scale well for non-numeric aggregates. * The structure of the physical plan node: Jungtaek's approach leverages the aggregation iterator; my approach reuses the approach of `WindowExec`. About authorship, I really appreciate your trust [~kabhwan]! I can help with confirming with the co-authors. Compared with the other issues, I think this should be the easiest one and can be discussed at the end. :) > EventTime based sessionization (session window) > --- > > Key: SPARK-10816 > URL: https://issues.apache.org/jira/browse/SPARK-10816 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming > Reporter: Reynold Xin > Priority: Major > Attachments: SPARK-10816 Support session window natively.pdf, Session > Window Support For Structure Streaming.pdf > > > Currently structured streaming supports two kinds of windows: the tumbling window > and the sliding window. Another useful window type is the session window, which > is not supported by SS. 
> Unlike time windows (tumbling and sliding windows), a session window > doesn't have a static window begin and end time. Session window creation > depends on a defined session gap, which can be static or dynamic. > For a static session gap, events that fall within a certain period of > time (the gap) are grouped into one session window. A session window closes when > it does not receive events for the gap. For a dynamic gap, the gap can be > different from event to event.
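The static-gap semantics described above are easy to see in a tiny stand-alone sketch. This is not Spark code; the `sessionize` function, its arguments, and the `(start, end)` window representation are all hypothetical, chosen only to illustrate how a session closes once no event arrives within the gap:

```python
def sessionize(event_times, gap):
    """Group sorted event timestamps into static-gap session windows.

    A new session starts whenever the time since the end of the current
    window is exceeded; each window is (start, end), where end is the
    last event time plus `gap`.
    """
    windows = []
    for t in sorted(event_times):
        if windows and t <= windows[-1][1]:
            # Event arrives before the open window's end: extend the session.
            start, _ = windows[-1]
            windows[-1] = (start, t + gap)
        else:
            # Gap exceeded (or first event): open a new session window.
            windows.append((t, t + gap))
    return windows
```

For example, with a gap of 5, events at times 1, 2, and 10 form two sessions: `sessionize([1, 2, 10], 5)` yields `[(1, 7), (10, 15)]`, because the event at 10 arrives more than 5 units after the event at 2.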
[jira] [Assigned] (SPARK-34481) Refactor dataframe reader/writer path option logic
[ https://issues.apache.org/jira/browse/SPARK-34481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34481: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-34481) Refactor dataframe reader/writer path option logic
[ https://issues.apache.org/jira/browse/SPARK-34481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287583#comment-17287583 ] Apache Spark commented on SPARK-34481: -- User 'yuchenhuo' has created a pull request for this issue: https://github.com/apache/spark/pull/31599
[jira] [Assigned] (SPARK-34481) Refactor dataframe reader/writer path option logic
[ https://issues.apache.org/jira/browse/SPARK-34481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34481: Assignee: Apache Spark
[jira] [Created] (SPARK-34481) Refactor dataframe reader/writer path option logic
Yuchen Huo created SPARK-34481: -- Summary: Refactor dataframe reader/writer path option logic Key: SPARK-34481 URL: https://issues.apache.org/jira/browse/SPARK-34481 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Yuchen Huo Refactor the dataframe reader/writer logic so that the path-in-options handling logic has its own function.
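As a rough illustration of the refactor's goal, a single function owning the path-in-options handling, here is a hypothetical sketch. The function name, option keys, and the ambiguity check are assumptions for illustration, not Spark's actual implementation:

```python
def merge_paths_into_options(options, paths):
    """Hypothetical helper that owns all path-option handling.

    Merges explicit load/save path arguments into the option map and
    rejects the ambiguous case where a 'path' option and explicit path
    arguments are both supplied.
    """
    if "path" in options and paths:
        raise ValueError(
            "Either set the 'path' option or pass path arguments, not both")
    merged = dict(options)
    if len(paths) == 1:
        merged["path"] = paths[0]
    elif paths:
        merged["paths"] = list(paths)
    return merged
```

Isolating this logic in one place means every reader/writer entry point resolves paths identically instead of re-implementing the merge inline.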
[jira] [Closed] (SPARK-34480) Module launcher build failed with profile hadoop-3.2 activated
[ https://issues.apache.org/jira/browse/SPARK-34480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lichuanliang closed SPARK-34480. Please ignore this issue. > Module launcher build failed with profile hadoop-3.2 activated > -- > > Key: SPARK-34480 > URL: https://issues.apache.org/jira/browse/SPARK-34480 > Project: Spark > Issue Type: Bug > Components: Spark Submit > Affects Versions: 3.0.1 > Reporter: Lichuanliang > Priority: Minor > > Build Spark 3.0.1 with the hadoop-3.2 profile activated: > {code:java} > build/mvn -pl :spark-launcher_2.12 package -DskipTests -Phadoop-3.2 -Phive > -Phive-thriftserver -Pkubernetes > {code} > When building the spark-launcher module, it complains that the commons-lang dependency is missing: > {code:java} > [INFO] --- scala-maven-plugin:4.3.0:compile (scala-compile-first) @ > spark-launcher_2.12 --- > [INFO] Using incremental compilation using Mixed compile order > [INFO] Compiler bridge file: > /Users/lichuanliang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.10__52.0-1.3.1_20191012T045515.jar > [INFO] Compiling 20 Java sources to > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/target/scala-2.12/classes > ... 
> [ERROR] [Error] > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:20: > package org.apache.commons.lang does not exist > [ERROR] [Error] > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:226: > cannot find symbol > symbol: variable StringUtils > location: class org.apache.spark.launcher.SparkSubmitCommandBuilder > [ERROR] [Error] > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:227: > cannot find symbol > symbol: variable StringUtils > location: class org.apache.spark.launcher.SparkSubmitCommandBuilder > [ERROR] [Error] > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:232: > cannot find symbol > symbol: variable StringUtils > location: class org.apache.spark.launcher.SparkSubmitCommandBuilder > [INFO] > > [INFO] BUILD FAILURE > {code} > >
[jira] [Resolved] (SPARK-34480) Module launcher build failed with profile hadoop-3.2 activated
[ https://issues.apache.org/jira/browse/SPARK-34480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lichuanliang resolved SPARK-34480. -- Resolution: Fixed
[jira] [Created] (SPARK-34480) Module launcher build failed with profile hadoop-3.2 activated
Lichuanliang created SPARK-34480: Summary: Module launcher build failed with profile hadoop-3.2 activated Key: SPARK-34480 URL: https://issues.apache.org/jira/browse/SPARK-34480 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 3.0.1 Reporter: Lichuanliang Build Spark 3.0.1 with the hadoop-3.2 profile activated: {code:java} build/mvn -pl :spark-launcher_2.12 package -DskipTests -Phadoop-3.2 -Phive -Phive-thriftserver -Pkubernetes {code} When building the spark-launcher module, it complains that the commons-lang dependency is missing (the same compile errors as quoted above).
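The compile errors come from a fork (note the `qtt-spark-3.0` checkout path); upstream Spark's launcher module does not import `org.apache.commons.lang`. If the fork really needs that package, one possible fix is to declare the dependency explicitly in the launcher module's POM. This is an illustrative fragment, with the version number assumed:

```xml
<!-- Hypothetical addition to launcher/pom.xml: provides the
     org.apache.commons.lang package (commons-lang 2.x line). -->
<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.6</version>
</dependency>
```

Alternatively, the fork could migrate those call sites to `org.apache.commons.lang3` (artifact `commons-lang3`), which is the maintained line of the library.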
[jira] [Resolved] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-34471. -- Fix Version/s: 3.1.1 Resolution: Fixed Issue resolved by pull request 31590 [https://github.com/apache/spark/pull/31590] > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming > Affects Versions: 3.1.1 > Reporter: Bo Zhang > Assignee: Bo Zhang > Priority: Major > Fix For: 3.1.1 > > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above.
[jira] [Assigned] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-34471: Assignee: Bo Zhang
[jira] [Commented] (SPARK-33602) Group exception messages in execution/datasources
[ https://issues.apache.org/jira/browse/SPARK-33602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287492#comment-17287492 ] jiaan.geng commented on SPARK-33602: ping [~allisonwang-db] Should we put the AnalysisException in DataSource.scala into QueryCompilationErrors or QueryExecutionErrors? > Group exception messages in execution/datasources > - > > Key: SPARK-33602 > URL: https://issues.apache.org/jira/browse/SPARK-33602 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources' > || Filename || Count || > | DataSource.scala | 9 | > | DataSourceStrategy.scala | 1 | > | DataSourceUtils.scala | 2 | > | FileFormat.scala | 1 | > | FileFormatWriter.scala | 3 | > | FileScanRDD.scala | 2 | > | InsertIntoHadoopFsRelationCommand.scala | 2 | > | PartitioningAwareFileIndex.scala | 1 | > | PartitioningUtils.scala | 3 | > | RecordReaderIterator.scala | 1 | > | rules.scala | 4 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile' > || Filename || Count || > | BinaryFileFormat.scala | 2 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc' > || Filename || Count || > | JDBCOptions.scala | 2 | > | JdbcUtils.scala | 6 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc' > || Filename || Count || > | OrcDeserializer.scala | 1 | > | OrcFilters.scala | 1 | > | OrcSerializer.scala | 1 | > | OrcUtils.scala | 2 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet' > || Filename || Count || > | ParquetFileFormat.scala | 2 | > | ParquetReadSupport.scala | 1 | > | ParquetSchemaConverter.scala | 6 |
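The "grouping" pursued by this sub-task series moves inline exception construction into central error-constructor objects (QueryCompilationErrors and QueryExecutionErrors in Spark's Scala code). A minimal sketch of the pattern, here in Python with hypothetical names and illustrative messages, not Spark's actual API:

```python
class AnalysisException(Exception):
    """Stand-in for Spark's AnalysisException."""

# Central module of error constructors: call sites raise via these
# helpers instead of formatting messages inline, so wording stays
# consistent and auditable in one place.
def data_source_not_found_error(provider):
    return AnalysisException(f"Failed to find data source: {provider}")

def path_option_not_set_error():
    return AnalysisException("'path' option is not specified")
```

A call site then does `raise data_source_not_found_error("kafka")` rather than building the message string itself, which is what makes the per-file message counts in the tables above shrink to zero.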
[jira] [Issue Comment Deleted] (SPARK-33601) Group exception messages in catalyst/parser
[ https://issues.apache.org/jira/browse/SPARK-33601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-33601: --- Comment: was deleted (was: I'm working on.) > Group exception messages in catalyst/parser > --- > > Key: SPARK-33601 > URL: https://issues.apache.org/jira/browse/SPARK-33601 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Assignee: Apache Spark > Priority: Major > Fix For: 3.2.0 > > > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser' > || Filename || Count || > | AstBuilder.scala | 36 | > | LegacyTypeStringParser.scala | 1 | > | ParseDriver.scala | 3 | > | ParserUtils.scala | 4 |
[jira] [Issue Comment Deleted] (SPARK-33541) Group exception messages in catalyst/expressions
[ https://issues.apache.org/jira/browse/SPARK-33541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-33541: --- Comment: was deleted (was: I'm working on.) > Group exception messages in catalyst/expressions > > > Key: SPARK-33541 > URL: https://issues.apache.org/jira/browse/SPARK-33541 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Assignee: jiaan.geng > Priority: Major > Fix For: 3.2.0 > > > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions' > || Filename || Count || > | Cast.scala | 18 | > | ExprUtils.scala | 2 | > | Expression.scala | 8 | > | InterpretedUnsafeProjection.scala | 1 | > | ScalaUDF.scala | 2 | > | SelectedField.scala | 3 | > | SubExprEvaluationRuntime.scala | 1 | > | arithmetic.scala | 8 | > | collectionOperations.scala | 4 | > | complexTypeExtractors.scala | 3 | > | csvExpressions.scala | 3 | > | datetimeExpressions.scala | 4 | > | decimalExpressions.scala | 2 | > | generators.scala | 2 | > | higherOrderFunctions.scala | 6 | > | jsonExpressions.scala | 2 | > | literals.scala | 3 | > | misc.scala | 2 | > | namedExpressions.scala | 1 | > | ordering.scala | 1 | > | package.scala | 1 | > | regexpExpressions.scala | 1 | > | stringExpressions.scala | 1 | > | windowExpressions.scala | 5 | > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate' > || Filename || Count || > | ApproximatePercentile.scala | 2 | > | HyperLogLogPlusPlus.scala | 1 | > | Percentile.scala | 1 | > | interfaces.scala | 2 | > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen' > || Filename || Count || > | CodeGenerator.scala | 5 | > | javaCode.scala | 1 | > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects' > || Filename || Count || > | objects.scala | 12 |
[jira] [Commented] (SPARK-33602) Group exception messages in execution/datasources
[ https://issues.apache.org/jira/browse/SPARK-33602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287485#comment-17287485 ] jiaan.geng commented on SPARK-33602: I'm working on it.
[jira] [Issue Comment Deleted] (SPARK-33542) Group exception messages in catalyst/catalog
[ https://issues.apache.org/jira/browse/SPARK-33542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-33542: --- Comment: was deleted (was: I'm working on.) > Group exception messages in catalyst/catalog > > > Key: SPARK-33542 > URL: https://issues.apache.org/jira/browse/SPARK-33542 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Assignee: jiaan.geng > Priority: Major > Fix For: 3.2.0 > > > Group all exception messages in > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog. > ||Filename||Count|| > |ExternalCatalog.scala|4| > |GlobalTempViewManager.scala|1| > |InMemoryCatalog.scala|18| > |SessionCatalog.scala|17| > |functionResources.scala|1| > |interface.scala|4|
[jira] [Issue Comment Deleted] (SPARK-33599) Group exception messages in catalyst/analysis
[ https://issues.apache.org/jira/browse/SPARK-33599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-33599: --- Comment: was deleted (was: I'm working on.) > Group exception messages in catalyst/analysis > - > > Key: SPARK-33599 > URL: https://issues.apache.org/jira/browse/SPARK-33599 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Assignee: jiaan.geng > Priority: Major > Fix For: 3.2.0 > > > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis' > || Filename || Count || > | Analyzer.scala | 1 | > | CheckAnalysis.scala | 1 | > | FunctionRegistry.scala | 5 | > | ResolveCatalogs.scala | 1 | > | ResolveHints.scala | 1 | > | package.scala | 2 | > | unresolved.scala | 43 | > '/core/src/main/scala/org/apache/spark/sql/catalyst/analysis' > || Filename || Count || > | ResolveSessionCatalog.scala | 12 |
[jira] [Resolved] (SPARK-34449) Upgrade Jetty to fix CVE-2020-27218
[ https://issues.apache.org/jira/browse/SPARK-34449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34449. -- Fix Version/s: 3.1.2 3.0.3 2.4.8 Resolution: Fixed Issue resolved by pull request 31583 [https://github.com/apache/spark/pull/31583] > Upgrade Jetty to fix CVE-2020-27218 > --- > > Key: SPARK-34449 > URL: https://issues.apache.org/jira/browse/SPARK-34449 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 2.4.7, 3.0.1, 3.2.0, 3.1.1 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta > Priority: Major > Fix For: 2.4.8, 3.0.3, 3.1.2 > > > CVE-2020-27218 affects the currently used Jetty 9.4.34, so let's upgrade it. > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27218.
[jira] [Assigned] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34478: Assignee: Apache Spark > Ignore or reject wrong config when start sparksession > - > > Key: SPARK-34478 > URL: https://issues.apache.org/jira/browse/SPARK-34478 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL > Affects Versions: 3.2.0 > Reporter: angerszhu > Assignee: Apache Spark > Priority: Major > > When using > {code:java} > SparkSession.builder().config() > {code} > a user may set `spark.driver.memory`. But by the time this code runs, the JVM has already started, so the setting takes no effect, while the Spark UI still displays the configured value. > We should therefore ignore or reject such configurations.
[jira] [Commented] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287478#comment-17287478 ] Apache Spark commented on SPARK-34478: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31598
[jira] [Assigned] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34478: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287479#comment-17287479 ] Apache Spark commented on SPARK-34478: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31598
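The fix being requested can be sketched independently of Spark: a validator that drops (or, in a strict mode, rejects) options that are only read when the driver JVM is launched. Everything below — the function, the option set, the strict flag — is a hypothetical illustration of the idea, not Spark's actual behavior:

```python
# Options read at driver-JVM launch time; setting them on an
# already-running session has no effect (the assumed set is illustrative).
LAUNCH_TIME_OPTIONS = {"spark.driver.memory", "spark.driver.extraJavaOptions"}

def validate_builder_config(config, jvm_started, strict=False):
    """Return the subset of `config` that can still take effect.

    If the JVM is already running, launch-time options are dropped;
    with strict=True they are rejected with an error instead of being
    silently ignored.
    """
    if not jvm_started:
        return dict(config)
    bad = LAUNCH_TIME_OPTIONS & set(config)
    if bad and strict:
        raise ValueError(f"Cannot set at runtime: {sorted(bad)}")
    return {k: v for k, v in config.items() if k not in bad}
```

Dropping the option (rather than keeping it) also prevents the UI-display inconsistency described in the issue, since the ineffective value never enters the session's config map.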
[jira] [Assigned] (SPARK-34477) Kryo NPEs when serializing Avro GenericData objects (except GenericRecord)
[ https://issues.apache.org/jira/browse/SPARK-34477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34477: Assignee: (was: Apache Spark) > Kryo NPEs when serializing Avro GenericData objects (except GenericRecord) > --- > > Key: SPARK-34477 > URL: https://issues.apache.org/jira/browse/SPARK-34477 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.0.0, 3.0.0 > Reporter: Shardul Mahadik > Priority: Major > > SPARK-746 added KryoSerializer for GenericRecord and GenericData.Record Avro > objects. However, Kryo serialization of other GenericData types like array, > enum and fixed fails. Note that if such objects are within a GenericRecord, > then the current code works. However, if these types are top-level objects we want > to distribute, then Kryo fails. > We should register KryoSerializer(s) for these GenericData types. > Code to reproduce: > {code:scala} > import org.apache.avro.{Schema, SchemaBuilder} > import org.apache.avro.generic.GenericData.Array > val arraySchema = SchemaBuilder.array().items().intType() > val array = new Array[Integer](1, arraySchema) > array.add(1) > sc.parallelize((0 until 10).map((_, array)), 2).collect > {code} > Similar code can be written for enum and fixed types. > Errors: > GenericData.Array > {code:java} > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410) > at > org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) > at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at org.apache.avro.generic.GenericData$Array.add(GenericData.java:383) > at java.util.AbstractList.add(AbstractList.java:108) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:35) > at > 
com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:303) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2(ParallelCollectionRDD.scala:79) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2$adapted(ParallelCollectionRDD.scala:79) > at > org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:171) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$1(ParallelCollection
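The fix direction named in the report ("register KryoSerializer(s) for these GenericData types") could be sketched as below. This is an illustration, not the actual patch: the registrator class and the commented-out serializer names are hypothetical, and a real serializer must carry the Avro schema (as SPARK-746 did for GenericData.Record), since registration alone does not avoid the no-arg-constructor NPE.
{code:scala}
import com.esotericsoftware.kryo.Kryo
import org.apache.avro.generic.GenericData
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical sketch: register schema-aware serializers for the remaining
// Avro GenericData types that currently NPE when deserialized by Kryo.
class AvroGenericDataRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Each serializer would write schema.toString alongside the datum and
    // reconstruct the object from the schema on read. Serializer classes
    // below are assumed, not real Spark classes:
    kryo.register(classOf[GenericData.Array[_]] /*, new AvroArraySerializer */)
    kryo.register(classOf[GenericData.EnumSymbol] /*, new AvroEnumSerializer */)
    kryo.register(classOf[GenericData.Fixed] /*, new AvroFixedSerializer */)
  }
}

// Enabled via: spark.kryo.registrator=AvroGenericDataRegistrator
{code}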
[jira] [Commented] (SPARK-34477) Kryo NPEs when serializing Avro GenericData objects (except GenericRecord)
[ https://issues.apache.org/jira/browse/SPARK-34477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287474#comment-17287474 ] Apache Spark commented on SPARK-34477: -- User 'shardulm94' has created a pull request for this issue: https://github.com/apache/spark/pull/31597 > Kryo NPEs when serializing Avro GenericData objects (except GenericRecord) > --- > > Key: SPARK-34477 > URL: https://issues.apache.org/jira/browse/SPARK-34477 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0, 3.0.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-746 added KryoSerializer for GenericRecord and GenericData.Record Avro > objects. However, Kryo serialization of other GenericData types like array, > enum and fixed fails. Note that if such objects are within a GenericRecord, > then current code works. However if these types are top level objects we want > to distribute, then Kryo fails. > We should register KryoSerializer(s) for these GenericData types. 
> Code to reproduce: > {code:scala} > import org.apache.avro.{Schema, SchemaBuilder} > import org.apache.avro.generic.GenericData.Array > val arraySchema = SchemaBuilder.array().items().intType() > val array = new Array[Integer](1, arraySchema) > array.add(1) > sc.parallelize((0 until 10).map((_, array)), 2).collect > {code} > Similar code can be written for enums and fixed types > Errors: > GenericData.Array > {code:java} > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410) > at > org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at org.apache.avro.generic.GenericData$Array.add(GenericData.java:383) > at java.util.AbstractList.add(AbstractList.java:108) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:35) > at > com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:303) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2(ParallelCollectionRDD.scala:79) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2$adapted(ParallelCollectionRDD.scala:79) > at > org.apache.spark.util.Utils$.deserializeViaNestedStream(Uti
[jira] [Assigned] (SPARK-34477) Kryo NPEs when serializing Avro GenericData objects (except GenericRecord)
[ https://issues.apache.org/jira/browse/SPARK-34477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34477: Assignee: Apache Spark > Kryo NPEs when serializing Avro GenericData objects (except GenericRecord) > --- > > Key: SPARK-34477 > URL: https://issues.apache.org/jira/browse/SPARK-34477 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0, 3.0.0 >Reporter: Shardul Mahadik >Assignee: Apache Spark >Priority: Major > > SPARK-746 added KryoSerializer for GenericRecord and GenericData.Record Avro > objects. However, Kryo serialization of other GenericData types like array, > enum and fixed fails. Note that if such objects are within a GenericRecord, > then current code works. However if these types are top level objects we want > to distribute, then Kryo fails. > We should register KryoSerializer(s) for these GenericData types. > Code to reproduce: > {code:scala} > import org.apache.avro.{Schema, SchemaBuilder} > import org.apache.avro.generic.GenericData.Array > val arraySchema = SchemaBuilder.array().items().intType() > val array = new Array[Integer](1, arraySchema) > array.add(1) > sc.parallelize((0 until 10).map((_, array)), 2).collect > {code} > Similar code can be written for enums and fixed types > Errors: > GenericData.Array > {code:java} > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410) > at > org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) > at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at org.apache.avro.generic.GenericData$Array.add(GenericData.java:383) > at java.util.AbstractList.add(AbstractList.java:108) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:35) > at > 
com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:303) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2(ParallelCollectionRDD.scala:79) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2$adapted(ParallelCollectionRDD.scala:79) > at > org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:171) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readOb
[jira] [Comment Edited] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287463#comment-17287463 ] Yuming Wang edited comment on SPARK-34479 at 2/20/21, 3:02 AM: --- But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 {noformat} Caused by: java.lang.NoSuchMethodError: com.github.luben.zstd.ZstdOutputStream.setCloseFrameOnFlush(Z)Lcom/github/luben/zstd/ZstdOutputStream; at org.apache.avro.file.ZstandardLoader.output(ZstandardLoader.java:40) at org.apache.avro.file.ZstandardCodec.compress(ZstandardCodec.java:67) at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:386) at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:407) at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:428) at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:437) at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:460) at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.close(SparkAvroKeyOutputFormat.java:88) at org.apache.spark.sql.avro.AvroOutputWriter.close(AvroOutputWriter.scala:86) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:58) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:75) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:281) {noformat} [~iemejia] May be we need to release Avro 1.10.2 or 1.11.0. was (Author: q79969786): But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 [~iemejia] May be we need to release Avro 1.10.2 or 1.11.0. 
{noformat} Caused by: java.lang.NoSuchMethodError: com.github.luben.zstd.ZstdOutputStream.setCloseFrameOnFlush(Z)Lcom/github/luben/zstd/ZstdOutputStream; at org.apache.avro.file.ZstandardLoader.output(ZstandardLoader.java:40) at org.apache.avro.file.ZstandardCodec.compress(ZstandardCodec.java:67) at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:386) at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:407) at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:428) at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:437) at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:460) at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.close(SparkAvroKeyOutputFormat.java:88) at org.apache.spark.sql.avro.AvroOutputWriter.close(AvroOutputWriter.scala:86) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:58) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:75) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:281) {noformat} > Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro add zstandard codec since AVRO-2195. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287463#comment-17287463 ] Yuming Wang edited comment on SPARK-34479 at 2/20/21, 3:02 AM: --- But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 [~iemejia] May be we need to release Avro 1.10.2 or 1.11.0. {noformat} Caused by: java.lang.NoSuchMethodError: com.github.luben.zstd.ZstdOutputStream.setCloseFrameOnFlush(Z)Lcom/github/luben/zstd/ZstdOutputStream; at org.apache.avro.file.ZstandardLoader.output(ZstandardLoader.java:40) at org.apache.avro.file.ZstandardCodec.compress(ZstandardCodec.java:67) at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:386) at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:407) at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:428) at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:437) at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:460) at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.close(SparkAvroKeyOutputFormat.java:88) at org.apache.spark.sql.avro.AvroOutputWriter.close(AvroOutputWriter.scala:86) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:58) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:75) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:281) {noformat} was (Author: q79969786): But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 [~iemejia] May be we need to release Avro 1.10.2 or 1.11.0. 
> Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro add zstandard codec since AVRO-2195. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287463#comment-17287463 ] Yuming Wang commented on SPARK-34479: - But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 [~iemejia] Maybe we need to release Avro 1.10.2 or 1.11.0. > Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro added a zstandard codec in AVRO-2195. 
[jira] [Updated] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-34479: Description: Avro add zstandard codec since AVRO-2195. (was: Avro add AVRO-2195) > Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro add zstandard codec since AVRO-2195. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-34479: Description: Avro add AVRO-2195 > Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro add AVRO-2195 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
Yuming Wang created SPARK-34479: --- Summary: Add zstandard codec to spark.sql.avro.compression.codec Key: SPARK-34479 URL: https://issues.apache.org/jira/browse/SPARK-34479 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Yuming Wang 
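If the proposed codec is added, usage would presumably mirror the existing codecs accepted by {{spark.sql.avro.compression.codec}}. A hedged sketch; the "zstandard" value is an assumption pending this improvement, and the output path is illustrative:
{code:scala}
// Sketch only: assumes SPARK-34479 lands and "zstandard" joins the accepted
// values (uncompressed, deflate, snappy, bzip2, xz) of this SQL config.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("avro-zstd-sketch")
  .getOrCreate()

// Select the Avro block compression codec for subsequent Avro writes.
spark.conf.set("spark.sql.avro.compression.codec", "zstandard")

spark.range(0, 1000).toDF("id")
  .write
  .format("avro")
  .save("/tmp/avro-zstd-out") // illustrative path
{code}
Note the comment above this one: the zstd-jni version pulled in by Avro 1.10.1 (1.4.5-12) and the one pinned by Spark (1.4.8-4) would also have to agree, or writes fail with the NoSuchMethodError shown.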
[jira] [Commented] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287459#comment-17287459 ] angerszhu commented on SPARK-34478: --- Will raise a PR soon. > Ignore or reject wrong config when start sparksession > - > > Key: SPARK-34478 > URL: https://issues.apache.org/jira/browse/SPARK-34478 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > When using > {code:java} > SparkSession.builder().config() > {code} > a user may set `spark.driver.memory`. By the time this code runs, however, the JVM has already started, so the setting takes no effect, yet the Spark UI still displays it as if it had been applied. We should ignore or reject such configurations. 
[jira] [Created] (SPARK-34478) Ignore or reject wrong config when start sparksession
angerszhu created SPARK-34478: - Summary: Ignore or reject wrong config when start sparksession Key: SPARK-34478 URL: https://issues.apache.org/jira/browse/SPARK-34478 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 3.2.0 Reporter: angerszhu When using {code:java} SparkSession.builder().config() {code} a user may set `spark.driver.memory`. By the time this code runs, however, the JVM has already started, so the setting takes no effect, yet the Spark UI still displays it as if it had been applied. We should ignore or reject such configurations. 
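The timing problem the issue describes can be illustrated as follows. {{spark.driver.memory}} sizes the driver JVM heap, so it must be known before that JVM launches; setting it through the builder from inside an already-running driver is too late:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("config-timing-sketch")
  // No effect here: the driver JVM (and its heap) already exists by the time
  // this line executes, yet the value still shows up in the Spark UI.
  .config("spark.driver.memory", "8g")
  .getOrCreate()

// Driver-JVM settings belong where they are read before JVM launch, e.g.:
//   spark-submit --driver-memory 8g ...
// or in conf/spark-defaults.conf:
//   spark.driver.memory 8g
{code}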
[jira] [Created] (SPARK-34477) Kryo NPEs when serializing Avro GenericData objects (except GenericRecord)
Shardul Mahadik created SPARK-34477: --- Summary: Kryo NPEs when serializing Avro GenericData objects (except GenericRecord) Key: SPARK-34477 URL: https://issues.apache.org/jira/browse/SPARK-34477 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0, 2.0.0 Reporter: Shardul Mahadik SPARK-746 added KryoSerializer for GenericRecord and GenericData.Record Avro objects. However, Kryo serialization of other GenericData types like array, enum and fixed fails. Note that if such objects are within a GenericRecord, then current code works. However if these types are top level objects we want to distribute, then Kryo fails. We should register KryoSerializer(s) for these GenericData types. Code to reproduce: {code:scala} import org.apache.avro.{Schema, SchemaBuilder} import org.apache.avro.generic.GenericData.Array val arraySchema = SchemaBuilder.array().items().intType() val array = new Array[Integer](1, arraySchema) array.add(1) sc.parallelize((0 until 10).map((_, array)), 2).collect {code} Similar code can be written for enums and fixed types Errors: GenericData.Array {code:java} java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410) at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.avro.generic.GenericData$Array.add(GenericData.java:383) at java.util.AbstractList.add(AbstractList.java:108) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:35) at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:303) at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2(ParallelCollectionRDD.scala:79) at 
org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2$adapted(ParallelCollectionRDD.scala:79) at org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:171) at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$1(ParallelCollectionRDD.scala:79) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1403) ... 20 more {code} GenericData.EnumSymbol {code:java} com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException Serialization trace: props (org.apac
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287437#comment-17287437 ] Dongjoon Hyun commented on SPARK-25075: --- [~MasseGuillaume]. Feel free to create a new independent Jira issue if you think that's a problem. > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, MLlib, Project Infra, Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287403#comment-17287403 ] Ted Yu commented on SPARK-34476: The basic jsonb test is here: https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-cql-4x/src/test/java/org/yb/loadtest/TestSpark3Jsonb.java I am working on adding get_json_string() function (via Spark extension) which is similar to get_json_object() but expands the last jsonb field using '->>' instead of '->'. > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. 
> Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-34476: --- Description: When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as part of filter). was: When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. 
Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter). > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. 
> Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
Ted Yu created SPARK-34476: -- Summary: Duplicate referenceNames are given for ambiguousReferences Key: SPARK-34476 URL: https://issues.apache.org/jira/browse/SPARK-34476 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Ted Yu When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter).
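The report above notes that both candidates after 'could be' are identical. A minimal, hypothetical Python sketch (not Spark's actual Scala implementation; the function name and message wording are invented) of deduplicating the candidate list before building the error message:

```python
def format_ambiguous_error(name, candidates):
    """Build an 'ambiguous reference' message, collapsing duplicate
    candidate names so the hint is actually informative."""
    seen = set()
    unique = []
    for c in candidates:          # preserve first-seen order
        if c not in seen:
            seen.add(c)
            unique.append(c)
    if len(unique) == 1:
        # All candidates are the same string: the name is not ambiguous at
        # the name level, so report the single resolved name instead.
        return f"Reference '{name}' resolves to {unique[0]} more than once"
    return f"Reference '{name}' is ambiguous, could be: {', '.join(unique)}"

cand = ["mycatalog.test.person.phone->>'b'", "mycatalog.test.person.phone->>'b'"]
print(format_ambiguous_error("phone->>'b'", cand))
```

With a deduplicated list, the duplicate-candidate message from the bug report would no longer print the same name twice.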
[jira] [Assigned] (SPARK-24818) Ensure all the barrier tasks in the same stage are launched together
[ https://issues.apache.org/jira/browse/SPARK-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-24818: --- Assignee: wuyi > Ensure all the barrier tasks in the same stage are launched together > > > Key: SPARK-24818 > URL: https://issues.apache.org/jira/browse/SPARK-24818 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Xingbo Jiang >Assignee: wuyi >Priority: Major > Fix For: 3.2.0 > > > When some executors/hosts are blacklisted, it may happen that only a part of > the tasks in the same barrier stage can be launched. We shall detect the case > and revert the allocated resource offers.
[jira] [Resolved] (SPARK-24818) Ensure all the barrier tasks in the same stage are launched together
[ https://issues.apache.org/jira/browse/SPARK-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-24818. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30650 [https://github.com/apache/spark/pull/30650] > Ensure all the barrier tasks in the same stage are launched together > > > Key: SPARK-24818 > URL: https://issues.apache.org/jira/browse/SPARK-24818 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Xingbo Jiang >Priority: Major > Fix For: 3.2.0 > > > When some executors/hosts are blacklisted, it may happen that only a part of > the tasks in the same barrier stage can be launched. We shall detect the case > and revert the allocated resource offers.
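The requirement above, "detect the case and revert the allocated resource offers", amounts to all-or-nothing scheduling for a barrier stage. A toy Python sketch, purely illustrative (the offer structure and field names are assumptions, not Spark's scheduler API):

```python
def schedule_barrier_stage(offers, num_barrier_tasks):
    """All-or-nothing launch for a barrier stage: either every task in the
    stage gets a usable slot in this round of offers, or no slots are
    consumed at all. Toy model of the SPARK-24818 behaviour."""
    slots = [o for o in offers if o["cores"] > 0 and not o["blacklisted"]]
    if len(slots) < num_barrier_tasks:
        return []  # revert: launch nothing, keep all offers available
    return [s["executor"] for s in slots[:num_barrier_tasks]]

offers = [
    {"executor": "e1", "cores": 2, "blacklisted": False},
    {"executor": "e2", "cores": 2, "blacklisted": True},   # blacklisted host
    {"executor": "e3", "cores": 2, "blacklisted": False},
]
print(schedule_barrier_stage(offers, 3))  # only 2 usable slots: launch nothing
print(schedule_barrier_stage(offers, 2))  # enough slots: launch both tasks
```

The key property is that a partial launch never happens: with a blacklisted executor reducing the usable slots below the task count, the whole round is reverted rather than launching a subset of the barrier tasks.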
[jira] [Assigned] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34475: Assignee: Maxim Gekk (was: Apache Spark) > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: +
[jira] [Commented] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287340#comment-17287340 ] Apache Spark commented on SPARK-34475: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31596 > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: +
[jira] [Assigned] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34475: Assignee: Apache Spark (was: Maxim Gekk) > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: +
[jira] [Updated] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34475: --- Description: Rename v2 logical nodes for simplicity in the form: + (was: To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec) > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: +
[jira] [Created] (SPARK-34475) Rename v2 logical nodes
Maxim Gekk created SPARK-34475: -- Summary: Rename v2 logical nodes Key: SPARK-34475 URL: https://issues.apache.org/jira/browse/SPARK-34475 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec
[jira] [Assigned] (SPARK-34474) Remove unnecessary Union under Distinct like operators
[ https://issues.apache.org/jira/browse/SPARK-34474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34474: Assignee: L. C. Hsieh (was: Apache Spark) > Remove unnecessary Union under Distinct like operators > -- > > Key: SPARK-34474 > URL: https://issues.apache.org/jira/browse/SPARK-34474 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > For a Union under Distinct-like operators, if its children are all the same, > we can keep just one of them and remove the Union.
[jira] [Assigned] (SPARK-34474) Remove unnecessary Union under Distinct like operators
[ https://issues.apache.org/jira/browse/SPARK-34474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34474: Assignee: Apache Spark (was: L. C. Hsieh) > Remove unnecessary Union under Distinct like operators > -- > > Key: SPARK-34474 > URL: https://issues.apache.org/jira/browse/SPARK-34474 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > > For a Union under Distinct-like operators, if its children are all the same, > we can keep just one of them and remove the Union.
[jira] [Commented] (SPARK-34474) Remove unnecessary Union under Distinct like operators
[ https://issues.apache.org/jira/browse/SPARK-34474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287320#comment-17287320 ] Apache Spark commented on SPARK-34474: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/31595 > Remove unnecessary Union under Distinct like operators > -- > > Key: SPARK-34474 > URL: https://issues.apache.org/jira/browse/SPARK-34474 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > For a Union under Distinct-like operators, if its children are all the same, > we can keep just one of them and remove the Union.
[jira] [Created] (SPARK-34474) Remove unnecessary Union under Distinct like operators
L. C. Hsieh created SPARK-34474: --- Summary: Remove unnecessary Union under Distinct like operators Key: SPARK-34474 URL: https://issues.apache.org/jira/browse/SPARK-34474 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh For a Union under Distinct-like operators, if its children are all the same, we can keep just one of them and remove the Union.
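The rule described here, keeping one child when all Union children are identical under a Distinct-like operator, can be sketched on a toy plan tree. Hypothetical Python, not the Catalyst rule itself; plan nodes are simple (operator, children) tuples:

```python
# Toy optimizer rule mirroring SPARK-34474: under a Distinct-like operator,
# Union(a, a, ..., a) over identical children can be replaced by one child,
# because deduplication makes the repeated branches redundant.
def simplify_distinct_union(node):
    op, children = node
    if op == "Distinct" and children and children[0][0] == "Union":
        branches = children[0][1]
        if branches and all(b == branches[0] for b in branches):
            return ("Distinct", [branches[0]])  # keep one branch, drop Union
    return node  # anything else is left untouched

same = ("Distinct", [("Union", [("Scan", "t"), ("Scan", "t")])])
diff = ("Distinct", [("Union", [("Scan", "t"), ("Scan", "u")])])
print(simplify_distinct_union(same))  # Union removed
print(simplify_distinct_union(diff))  # unchanged: children differ
```

Note the guard: the rewrite is only sound under the Distinct wrapper, since a bare Union of identical children is bag (multiset) semantics and duplicates would matter.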
[jira] [Updated] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
[ https://issues.apache.org/jira/browse/SPARK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34424: -- Fix Version/s: (was: 3.0.2) 3.0.3 > HiveOrcHadoopFsRelationSuite fails with seed 610710213676 > - > > Key: SPARK-34424 > URL: https://issues.apache.org/jira/browse/SPARK-34424 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.1, 3.0.3 > > > The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with: > {code:java} > == Results == > !== Correct Answer - 20 ==== Spark Answer - 20 == > struct struct > [1,1582-10-15] [1,1582-10-15] > [2,null] [2,null] > [3,1970-01-01] [3,1970-01-01] > [4,1681-08-06] [4,1681-08-06] > [5,1582-10-15] [5,1582-10-15] > [6,-12-31] [6,-12-31] > [7,0583-01-04] [7,0583-01-04] > [8,6077-03-04] [8,6077-03-04] > ![9,1582-10-06] [9,1582-10-15] > [10,1582-10-15] [10,1582-10-15] > [11,-12-31] [11,-12-31] > [12,9722-10-04] [12,9722-10-04] > [13,0243-12-19] [13,0243-12-19] > [14,-12-31] [14,-12-31] > [15,8743-01-24] [15,8743-01-24] > [16,1039-10-31] [16,1039-10-31] > [17,-12-31] [17,-12-31] > [18,1582-10-15] [18,1582-10-15] > [19,1582-10-15] [19,1582-10-15] > [20,1582-10-15] [20,1582-10-15] > {code}
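The single mismatched row in the test output above (1582-10-06 written, 1582-10-15 read back) falls inside the Julian-to-Gregorian cutover gap: in the hybrid calendar used by legacy Hive/ORC writers, October 5-14, 1582 do not exist, while the proleptic Gregorian calendar (used internally by Spark 3.x and by Python's datetime) accepts them. A small illustration; the clamp-forward rebase below is a simplification for demonstration, not Spark's exact rebase logic:

```python
from datetime import date

# 1582-10-06 is a valid date in the proleptic Gregorian calendar...
d = date(1582, 10, 6)
print(d.isoformat())

# ...but in the hybrid Julian/Gregorian calendar, October 5-14, 1582 were
# skipped: the day after 1582-10-04 is 1582-10-15. A rebase that clamps
# gap dates forward shows why the test could get 1582-10-15 back:
def rebase_to_hybrid(d):
    if date(1582, 10, 5) <= d <= date(1582, 10, 14):
        return date(1582, 10, 15)
    return d

print(rebase_to_hybrid(d))  # the gap date maps forward to 1582-10-15
```

This also explains why the failure only reproduces with particular random seeds: the generated date has to land in the ten-day gap.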
[jira] [Assigned] (SPARK-34468) Fix v2 ALTER TABLE .. RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-34468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34468: Assignee: Apache Spark > Fix v2 ALTER TABLE .. RENAME TO > --- > > Key: SPARK-34468 > URL: https://issues.apache.org/jira/browse/SPARK-34468 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > The v2 `ALTER TABLE .. RENAME TO` command should rename a table in-place > instead of moving it to the "root" namespace: > {code:scala} > sql("ALTER TABLE ns1.ns2.ns3.src_tbl RENAME TO dst_tbl") > sql(s"SHOW TABLES IN $catalog").show(false) > +-+-+---+ > |namespace|tableName|isTemporary| > +-+-+---+ > | |dst_tbl |false | > +-+-+---+ > {code}
[jira] [Assigned] (SPARK-34468) Fix v2 ALTER TABLE .. RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-34468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34468: Assignee: (was: Apache Spark) > Fix v2 ALTER TABLE .. RENAME TO > --- > > Key: SPARK-34468 > URL: https://issues.apache.org/jira/browse/SPARK-34468 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The v2 `ALTER TABLE .. RENAME TO` command should rename a table in-place > instead of moving it to the "root" namespace: > {code:scala} > sql("ALTER TABLE ns1.ns2.ns3.src_tbl RENAME TO dst_tbl") > sql(s"SHOW TABLES IN $catalog").show(false) > +-+-+---+ > |namespace|tableName|isTemporary| > +-+-+---+ > | |dst_tbl |false | > +-+-+---+ > {code}
[jira] [Commented] (SPARK-34468) Fix v2 ALTER TABLE .. RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-34468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287274#comment-17287274 ] Apache Spark commented on SPARK-34468: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31594 > Fix v2 ALTER TABLE .. RENAME TO > --- > > Key: SPARK-34468 > URL: https://issues.apache.org/jira/browse/SPARK-34468 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The v2 `ALTER TABLE .. RENAME TO` command should rename a table in-place > instead of moving it to the "root" namespace: > {code:scala} > sql("ALTER TABLE ns1.ns2.ns3.src_tbl RENAME TO dst_tbl") > sql(s"SHOW TABLES IN $catalog").show(false) > +-+-+---+ > |namespace|tableName|isTemporary| > +-+-+---+ > | |dst_tbl |false | > +-+-+---+ > {code}
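The buggy versus intended behavior for SPARK-34468, dropping the namespace versus renaming in place, can be shown with a toy identifier model. Hypothetical Python; Spark's real code works on v2 Identifier objects, and the `buggy` flag here only exists to contrast the two outcomes:

```python
# A table identifier is modeled as (namespace_tuple, table_name).
def rename_identifier(old, new_name, buggy=False):
    """Compute the destination identifier for RENAME TO."""
    namespace, _ = old
    if buggy:
        return ((), new_name)      # pre-fix behaviour: namespace is dropped,
                                   # so the table lands in the root namespace
    return (namespace, new_name)   # intended: rename within the namespace

src = (("ns1", "ns2", "ns3"), "src_tbl")
print(rename_identifier(src, "dst_tbl", buggy=True))  # root-namespace result
print(rename_identifier(src, "dst_tbl"))              # in-place result
```

The empty namespace in the buggy output corresponds to the blank `namespace` column in the SHOW TABLES listing quoted in the issue.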
[jira] [Assigned] (SPARK-34469) Ignore RegisterExecutor when SparkContext is stopped
[ https://issues.apache.org/jira/browse/SPARK-34469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-34469: - Assignee: Dongjoon Hyun > Ignore RegisterExecutor when SparkContext is stopped > > > Key: SPARK-34469 > URL: https://issues.apache.org/jira/browse/SPARK-34469 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major >
[jira] [Resolved] (SPARK-34469) Ignore RegisterExecutor when SparkContext is stopped
[ https://issues.apache.org/jira/browse/SPARK-34469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-34469. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31587 [https://github.com/apache/spark/pull/31587] > Ignore RegisterExecutor when SparkContext is stopped > > > Key: SPARK-34469 > URL: https://issues.apache.org/jira/browse/SPARK-34469 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > >
[jira] [Assigned] (SPARK-34283) Combines all adjacent 'Union' operators into a single 'Union' when using 'Dataset.union.distinct.union.distinct'
[ https://issues.apache.org/jira/browse/SPARK-34283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34283: --- Assignee: Zhichao Zhang > Combines all adjacent 'Union' operators into a single 'Union' when using > 'Dataset.union.distinct.union.distinct' > > > Key: SPARK-34283 > URL: https://issues.apache.org/jira/browse/SPARK-34283 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Major > Fix For: 3.2.0 > > Attachments: image-2021-01-29-11-12-44-112.png, > image-2021-01-29-11-13-42-055.png, image-2021-01-29-11-14-08-822.png, > image-2021-01-29-11-14-42-700.png > > > Problem: > Currently, when using 'Dataset.union.distinct.union.distinct' to union some > datasets, the Optimizer can't combine all adjacent 'Union' operators into a > single 'Union', though it can handle this case when using SQL. > For example: > !image-2021-01-29-11-12-44-112.png! > The 'Physical Plan' is shown below: > !image-2021-01-29-11-13-42-055.png! > But using SQL: > !image-2021-01-29-11-14-08-822.png! > The 'Physical Plan' is shown below: > !image-2021-01-29-11-14-42-700.png! > > Root cause: > When using 'Dataset.union.distinct.union.distinct', the operator is > 'Deduplicate(Keys, Union)', but AstBuilder transforms SQL 'Union' into the operator > 'Distinct(Union)'; the rule 'CombineUnions' in the Optimizer only handles the > 'Distinct(Union)' operator, not Deduplicate(Keys, Union). > >
[jira] [Resolved] (SPARK-34283) Combines all adjacent 'Union' operators into a single 'Union' when using 'Dataset.union.distinct.union.distinct'
[ https://issues.apache.org/jira/browse/SPARK-34283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34283. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31404 [https://github.com/apache/spark/pull/31404] > Combines all adjacent 'Union' operators into a single 'Union' when using > 'Dataset.union.distinct.union.distinct' > > > Key: SPARK-34283 > URL: https://issues.apache.org/jira/browse/SPARK-34283 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhichao Zhang >Priority: Major > Fix For: 3.2.0 > > Attachments: image-2021-01-29-11-12-44-112.png, > image-2021-01-29-11-13-42-055.png, image-2021-01-29-11-14-08-822.png, > image-2021-01-29-11-14-42-700.png > > > Problem: > Currently, when using 'Dataset.union.distinct.union.distinct' to union some > datasets, the Optimizer can't combine all adjacent 'Union' operators into a > single 'Union', though it can handle this case when using SQL. > For example: > !image-2021-01-29-11-12-44-112.png! > The 'Physical Plan' is shown below: > !image-2021-01-29-11-13-42-055.png! > But using SQL: > !image-2021-01-29-11-14-08-822.png! > The 'Physical Plan' is shown below: > !image-2021-01-29-11-14-42-700.png! > > Root cause: > When using 'Dataset.union.distinct.union.distinct', the operator is > 'Deduplicate(Keys, Union)', but AstBuilder transforms SQL 'Union' into the operator > 'Distinct(Union)'; the rule 'CombineUnions' in the Optimizer only handles the > 'Distinct(Union)' operator, not Deduplicate(Keys, Union). > >
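The root cause above can be modeled on a toy plan tree: a CombineUnions-style rule that matches not only Distinct(Union) but also Deduplicate(keys, Union) flattens the nested Unions. This is a hypothetical Python sketch, not the Catalyst rule; it also assumes the outer deduplication subsumes the inner one, which holds for the Dataset.union.distinct chain described in the issue:

```python
def flatten(children):
    """Inline nested Union children; an inner Distinct/Deduplicate over a
    Union can be unwrapped because the outer deduplication subsumes it
    (a simplification that matches the union.distinct chain here)."""
    out = []
    for c in children:
        if c[0] == "Union":
            out.extend(flatten(c[1]))
        elif c[0] in ("Distinct", "Deduplicate") and c[-1][0] == "Union":
            out.extend(flatten(c[-1][1]))
        else:
            out.append(c)
    return out

def combine_unions(plan):
    # The reported gap: matching only "Distinct" here left the
    # Deduplicate(keys, Union) shape produced by the Dataset API untouched.
    if plan[0] in ("Distinct", "Deduplicate") and plan[-1][0] == "Union":
        return plan[:-1] + (("Union", flatten(plan[-1][1])),)
    return plan

# Shape of df1.union(df2).distinct().union(df3).distinct():
plan = ("Deduplicate", ("k",),
        ("Union", [("Deduplicate", ("k",),
                    ("Union", [("Scan", "a"), ("Scan", "b")])),
                   ("Scan", "c")]))
print(combine_unions(plan))  # one Deduplicate over a single flat Union
```

With the wrapper check restricted to "Distinct", the same input would come back unchanged, which is the nested-Union physical plan the reporter observed.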
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287109#comment-17287109 ] Guillaume Martres commented on SPARK-25075: --- [~dongjoon] I think something is wrong with the published snapshots: they seem to depend on both Scala 2.12 and Scala 2.13 artifacts, leading to crashes at runtime. Indeed, if I look at [https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.13/3.2.0-SNAPSHOT/spark-parent_2.13-3.2.0-20210219.011324-25.pom] I see: 2.12.10. So I assume a config file wasn't updated somewhere. > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, MLlib, Project Infra, Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone.
[jira] [Assigned] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34473: Assignee: Apache Spark > avoid NPE in DataFrameReader.schema(StructType) > --- > > Key: SPARK-34473 > URL: https://issues.apache.org/jira/browse/SPARK-34473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287096#comment-17287096 ] Apache Spark commented on SPARK-34473: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/31593 > avoid NPE in DataFrameReader.schema(StructType) > --- > > Key: SPARK-34473 > URL: https://issues.apache.org/jira/browse/SPARK-34473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Assigned] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34473: Assignee: (was: Apache Spark) > avoid NPE in DataFrameReader.schema(StructType) > --- > > Key: SPARK-34473 > URL: https://issues.apache.org/jira/browse/SPARK-34473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Created] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
Wenchen Fan created SPARK-34473: --- Summary: avoid NPE in DataFrameReader.schema(StructType) Key: SPARK-34473 URL: https://issues.apache.org/jira/browse/SPARK-34473 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Wenchen Fan
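The fix for SPARK-34473 amounts to failing fast on a null schema with a clear message rather than letting a NullPointerException surface later. A hypothetical Python analogue (Spark's reader is Scala; the class name and message here are invented for illustration):

```python
class DataFrameReaderSketch:
    """Toy stand-in for a reader builder that validates its inputs eagerly."""

    def __init__(self):
        self.user_specified_schema = None

    def schema(self, schema):
        # Reject None at the call site, where the mistake is visible,
        # instead of deferring to a confusing failure deep in planning.
        if schema is None:
            raise ValueError("schema cannot be null")
        self.user_specified_schema = schema
        return self  # builder style: allow chaining

reader = DataFrameReaderSketch()
try:
    reader.schema(None)
except ValueError as e:
    print("rejected eagerly:", e)
```

Eager validation in builder-style APIs keeps the stack trace pointing at the caller's bug, which is the general pattern behind "avoid NPE" fixes like this one.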
[jira] [Assigned] (SPARK-28123) String Functions: Add support btrim
[ https://issues.apache.org/jira/browse/SPARK-28123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28123: --- Assignee: jiaan.geng > String Functions: Add support btrim > --- > > Key: SPARK-28123 > URL: https://issues.apache.org/jira/browse/SPARK-28123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: jiaan.geng >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{btrim(_{{string}}_}}{{bytea}}{{, > _{{bytes}}_}}{{bytea}}{{)}}|{{bytea}}|Remove the longest string containing > only bytes appearing in _{{bytes}}_from the start and end of > _{{string}}_|{{btrim('\000trim\001'::bytea, '\000\001'::bytea)}}|{{trim}}| > More details: https://www.postgresql.org/docs/11/functions-binarystring.html
[jira] [Resolved] (SPARK-28123) String Functions: Add support btrim
[ https://issues.apache.org/jira/browse/SPARK-28123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28123. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31390 [https://github.com/apache/spark/pull/31390] > String Functions: Add support btrim > --- > > Key: SPARK-28123 > URL: https://issues.apache.org/jira/browse/SPARK-28123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: jiaan.geng >Priority: Major > Fix For: 3.2.0 > > > ||Function||Return Type||Description||Example||Result|| > |{{btrim(_{{string}}_}}{{bytea}}{{, > _{{bytes}}_}}{{bytea}}{{)}}|{{bytea}}|Remove the longest string containing > only bytes appearing in _{{bytes}}_from the start and end of > _{{string}}_|{{btrim('\000trim\001'::bytea, '\000\001'::bytea)}}|{{trim}}| > More details: https://www.postgresql.org/docs/11/functions-binarystring.html
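The btrim semantics quoted in the table above, removing the longest run of characters drawn from a given set from both ends of a string, match Python's str.strip with an argument, which makes the PostgreSQL example easy to check independently of Spark:

```python
def btrim(s, chars=None):
    """Model of btrim: strip the longest leading/trailing run of characters
    that appear in `chars` (whitespace when `chars` is omitted). This mirrors
    the PostgreSQL semantics described in the issue; it is not Spark's code."""
    return s.strip(chars) if chars is not None else s.strip()

# The PostgreSQL example from the table: btrim('\000trim\001', '\000\001')
print(btrim("\x00trim\x01", "\x00\x01"))  # trim
print(btrim("  spark  "))                 # spark
```

Note that the second argument is a set of characters, not a prefix/suffix string: every leading or trailing character that belongs to the set is removed, regardless of order.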
[jira] [Resolved] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
[ https://issues.apache.org/jira/browse/SPARK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34424. - Fix Version/s: 3.1.1 3.0.2 Assignee: Maxim Gekk Resolution: Fixed > HiveOrcHadoopFsRelationSuite fails with seed 610710213676 > - > > Key: SPARK-34424 > URL: https://issues.apache.org/jira/browse/SPARK-34424 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.2, 3.1.1 > > > The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with: > {code:java} > == Results == > !== Correct Answer - 20 ==== Spark Answer - 20 == > struct struct > [1,1582-10-15] [1,1582-10-15] > [2,null] [2,null] > [3,1970-01-01] [3,1970-01-01] > [4,1681-08-06] [4,1681-08-06] > [5,1582-10-15] [5,1582-10-15] > [6,-12-31] [6,-12-31] > [7,0583-01-04] [7,0583-01-04] > [8,6077-03-04] [8,6077-03-04] > ![9,1582-10-06] [9,1582-10-15] > [10,1582-10-15] [10,1582-10-15] > [11,-12-31] [11,-12-31] > [12,9722-10-04] [12,9722-10-04] > [13,0243-12-19] [13,0243-12-19] > [14,-12-31] [14,-12-31] > [15,8743-01-24] [15,8743-01-24] > [16,1039-10-31] [16,1039-10-31] > [17,-12-31] [17,-12-31] > [18,1582-10-15] [18,1582-10-15] > [19,1582-10-15] [19,1582-10-15] > [20,1582-10-15] [20,1582-10-15] > {code}
[jira] [Commented] (SPARK-34421) Custom functions can't be used in temporary views with CTEs
[ https://issues.apache.org/jira/browse/SPARK-34421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287057#comment-17287057 ] Apache Spark commented on SPARK-34421: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/31592 > Custom functions can't be used in temporary views with CTEs > --- > > Key: SPARK-34421 > URL: https://issues.apache.org/jira/browse/SPARK-34421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Assignee: Peter Toth >Priority: Blocker > Fix For: 3.1.1 > > > The following query works in Spark 3.0 not Spark 3.1. > > Start with: > {{spark.udf.registerJavaFunction("custom_func", > "com.stuff.path.custom_func", LongType())}} > > Works: * {{select custom_func()}} > * {{create temporary view blaah as select custom_func()}} > * {{with step_1 as ( select custom_func() ) select * from step_1}} > Broken: > {{create temporary view blaah as with step_1 as ( select custom_func() ) > select * from step_1}} > > followed by: > {{select * from blaah}} > > Error: > {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF > '}}{{com.stuff.path.custom_func}}{{';}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287008#comment-17287008 ] Apache Spark commented on SPARK-34472: -- User 'shardulm94' has created a pull request for this issue: https://github.com/apache/spark/pull/31591 > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. 
> {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
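The proposed fix ("ship the ivySettings file to the driver") follows a common distribution pattern: copy the client-local file into the material that travels with the application, then load it from a driver-local path. A minimal, Spark-free sketch of that pattern, with hypothetical file and function names:

```python
import os
import shutil
import tempfile

def ship_settings(client_path: str, driver_work_dir: str) -> str:
    """Copy a client-local settings file into the directory distributed
    to the driver; return the driver-local path to load instead."""
    shipped = os.path.join(driver_work_dir, os.path.basename(client_path))
    shutil.copy(client_path, shipped)
    return shipped

# Simulate a settings file that exists only on the "client" machine.
with tempfile.TemporaryDirectory() as client, tempfile.TemporaryDirectory() as driver:
    src = os.path.join(client, "ivySettings.xml")
    with open(src, "w") as f:
        f.write("<ivysettings/>")
    local = ship_settings(src, driver)
    # The driver-side load now succeeds where the client path would not.
    assert os.path.exists(local)
```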
[jira] [Assigned] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34472: Assignee: Apache Spark > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Assignee: Apache Spark >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. > {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. 
[jira] [Assigned] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34472: Assignee: (was: Apache Spark) > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. > {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. 
[jira] [Commented] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287009#comment-17287009 ] Apache Spark commented on SPARK-34472: -- User 'shardulm94' has created a pull request for this issue: https://github.com/apache/spark/pull/31591 > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. 
> {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34421) Custom functions can't be used in temporary views with CTEs
[ https://issues.apache.org/jira/browse/SPARK-34421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34421: --- Assignee: Peter Toth > Custom functions can't be used in temporary views with CTEs > --- > > Key: SPARK-34421 > URL: https://issues.apache.org/jira/browse/SPARK-34421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Assignee: Peter Toth >Priority: Blocker > Fix For: 3.1.1 > > > Works in DBR 7.4, which is Spark 3.0.1. Breaks in DBR8.0(beta), which is > Spark 3.1. > > Start with: > {{spark.udf.registerJavaFunction("custom_func", "com.stuff.path.custom_func", > LongType())}} > > Works: * {{select custom_func()}} > * {{create temporary view blaah as select custom_func()}} > * {{with step_1 as ( select custom_func() ) select * from step_1}} > Broken: > {{create temporary view blaah as with step_1 as ( select custom_func() ) > select * from step_1}} > > followed by: > {{select * from blaah}} > > Error: > {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF > '}}{{com.stuff.path.custom_func}}{{';}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34421) Custom functions can't be used in temporary views with CTEs
[ https://issues.apache.org/jira/browse/SPARK-34421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-34421: Description: The following query works in Spark 3.0 not Spark 3.1. Start with: {{spark.udf.registerJavaFunction("custom_func", "com.stuff.path.custom_func", LongType())}} Works: * {{select custom_func()}} * {{create temporary view blaah as select custom_func()}} * {{with step_1 as ( select custom_func() ) select * from step_1}} Broken: {{create temporary view blaah as with step_1 as ( select custom_func() ) select * from step_1}} followed by: {{select * from blaah}} Error: {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF '}}{{com.stuff.path.custom_func}}{{';}} was: Works in DBR 7.4, which is Spark 3.0.1. Breaks in DBR8.0(beta), which is Spark 3.1. Start with: {{spark.udf.registerJavaFunction("custom_func", "com.stuff.path.custom_func", LongType())}} Works: * {{select custom_func()}} * {{create temporary view blaah as select custom_func()}} * {{with step_1 as ( select custom_func() ) select * from step_1}} Broken: {{create temporary view blaah as with step_1 as ( select custom_func() ) select * from step_1}} followed by: {{select * from blaah}} Error: {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF '}}{{com.stuff.path.custom_func}}{{';}} > Custom functions can't be used in temporary views with CTEs > --- > > Key: SPARK-34421 > URL: https://issues.apache.org/jira/browse/SPARK-34421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Assignee: Peter Toth >Priority: Blocker > Fix For: 3.1.1 > > > The following query works in Spark 3.0 not Spark 3.1. 
> > Start with: > {{spark.udf.registerJavaFunction("custom_func", > "com.stuff.path.custom_func", LongType())}} > > Works: * {{select custom_func()}} > * {{create temporary view blaah as select custom_func()}} > * {{with step_1 as ( select custom_func() ) select * from step_1}} > Broken: > {{create temporary view blaah as with step_1 as ( select custom_func() ) > select * from step_1}} > > followed by: > {{select * from blaah}} > > Error: > {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF > '}}{{com.stuff.path.custom_func}}{{';}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34421) Custom functions can't be used in temporary views with CTEs
[ https://issues.apache.org/jira/browse/SPARK-34421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34421. - Fix Version/s: 3.1.1 Resolution: Fixed Issue resolved by pull request 31550 [https://github.com/apache/spark/pull/31550] > Custom functions can't be used in temporary views with CTEs > --- > > Key: SPARK-34421 > URL: https://issues.apache.org/jira/browse/SPARK-34421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Priority: Blocker > Fix For: 3.1.1 > > > Works in DBR 7.4, which is Spark 3.0.1. Breaks in DBR8.0(beta), which is > Spark 3.1. > > Start with: > {{spark.udf.registerJavaFunction("custom_func", "com.stuff.path.custom_func", > LongType())}} > > Works: * {{select custom_func()}} > * {{create temporary view blaah as select custom_func()}} > * {{with step_1 as ( select custom_func() ) select * from step_1}} > Broken: > {{create temporary view blaah as with step_1 as ( select custom_func() ) > select * from step_1}} > > followed by: > {{select * from blaah}} > > Error: > {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF > '}}{{com.stuff.path.custom_func}}{{';}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286977#comment-17286977 ] Shardul Mahadik commented on SPARK-34472: - I will be sending a PR for this soon. > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. 
> {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
Shardul Mahadik created SPARK-34472: --- Summary: SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file Key: SPARK-34472 URL: https://issues.apache.org/jira/browse/SPARK-34472 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.0 Reporter: Shardul Mahadik SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL {{ADD JAR}}. If we use a custom ivySettings file using {{spark.jars.ivySettings}}, it is loaded at [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] However, this file is only accessible on the client machine. In cluster mode, this file is not available on the driver and so {{addJar}} fails. {code:sh} spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar {code} {code} java.lang.IllegalArgumentException: requirement failed: Ivy settings file /path/to/ivySettings.xml does not exist at scala.Predef$.require(Predef.scala:281) at org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) at org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) {code} We should ship the ivySettings file to the driver so that {{addJar}} is able to find it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286953#comment-17286953 ] Apache Spark commented on SPARK-34471: -- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/31590 > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Priority: Major > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34471: Assignee: (was: Apache Spark) > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Priority: Major > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34471: Assignee: Apache Spark > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Assignee: Apache Spark >Priority: Major > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286951#comment-17286951 ] Apache Spark commented on SPARK-34471: -- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/31590 > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Priority: Major > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
Bo Zhang created SPARK-34471: Summary: Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide Key: SPARK-34471 URL: https://issues.apache.org/jira/browse/SPARK-34471 Project: Spark Issue Type: Documentation Components: Documentation, Structured Streaming Affects Versions: 3.1.1 Reporter: Bo Zhang We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 and SPARK-33836. We need to update the Structured Streaming Programming Guide with the changes above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34314) Wrong discovered partition value
[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34314: --- Assignee: Maxim Gekk > Wrong discovered partition value > > > Key: SPARK-34314 > URL: https://issues.apache.org/jira/browse/SPARK-34314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > The example below portraits the issue: > {code:scala} > val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") > df.write > .partitionBy("part") > .format("parquet") > .save(path) > val readback = spark.read.parquet(path) > readback.printSchema() > readback.show(false) > {code} > It write the partition value as string: > {code} > /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d > ├── _SUCCESS > ├── part=-0 > │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > └── part=AA > └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > {code} > *"-0"* and "AA". > but when Spark reads data back, it transforms "-0" to "0" > {code} > root > |-- id: integer (nullable = true) > |-- part: string (nullable = true) > +---++ > |id |part| > +---++ > |0 |AA | > |1 |0 | > +---++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34314) Wrong discovered partition value
[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34314. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31549 [https://github.com/apache/spark/pull/31549] > Wrong discovered partition value > > > Key: SPARK-34314 > URL: https://issues.apache.org/jira/browse/SPARK-34314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > The example below portraits the issue: > {code:scala} > val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") > df.write > .partitionBy("part") > .format("parquet") > .save(path) > val readback = spark.read.parquet(path) > readback.printSchema() > readback.show(false) > {code} > It write the partition value as string: > {code} > /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d > ├── _SUCCESS > ├── part=-0 > │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > └── part=AA > └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > {code} > *"-0"* and "AA". > but when Spark reads data back, it transforms "-0" to "0" > {code} > root > |-- id: integer (nullable = true) > |-- part: string (nullable = true) > +---++ > |id |part| > +---++ > |0 |AA | > |1 |0 | > +---++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
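The behavior reported above stems from partition-value type inference: the directory name "-0" parses as an integer, and rendering that integer back as text loses the original form. A minimal Python illustration of the effect (not Spark's actual inference code):

```python
def infer_partition_value(raw: str):
    """Mimic naive partition-type inference: try integer first,
    falling back to the raw string."""
    try:
        return int(raw)
    except ValueError:
        return raw

print(infer_partition_value("AA"))  # stays the string 'AA'
print(infer_partition_value("-0"))  # parses to the integer 0, so '-0' is lost
```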
[jira] [Commented] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
[ https://issues.apache.org/jira/browse/SPARK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286926#comment-17286926 ] Apache Spark commented on SPARK-34424: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31589 > HiveOrcHadoopFsRelationSuite fails with seed 610710213676 > - > > Key: SPARK-34424 > URL: https://issues.apache.org/jira/browse/SPARK-34424 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Maxim Gekk >Priority: Major > > The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with: > {code:java} > == Results == > !== Correct Answer - 20 ==== Spark Answer - 20 == > struct struct > [1,1582-10-15] [1,1582-10-15] > [2,null] [2,null] > [3,1970-01-01] [3,1970-01-01] > [4,1681-08-06] [4,1681-08-06] > [5,1582-10-15] [5,1582-10-15] > [6,-12-31] [6,-12-31] > [7,0583-01-04] [7,0583-01-04] > [8,6077-03-04] [8,6077-03-04] > ![9,1582-10-06] [9,1582-10-15] > [10,1582-10-15] [10,1582-10-15] > [11,-12-31] [11,-12-31] > [12,9722-10-04] [12,9722-10-04] > [13,0243-12-19] [13,0243-12-19] > [14,-12-31] [14,-12-31] > [15,8743-01-24] [15,8743-01-24] > [16,1039-10-31] [16,1039-10-31] > [17,-12-31] [17,-12-31] > [18,1582-10-15] [18,1582-10-15] > [19,1582-10-15] [19,1582-10-15] > [20,1582-10-15] [20,1582-10-15] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21770) ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions
[ https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286916#comment-17286916 ] Weichen Xu commented on SPARK-21770: [~rishi-aga] Could you create a new ticket for this with reproducing code? We should find the root cause of why it generates all-zero probabilities. > ProbabilisticClassificationModel: Improve normalization of all-zero raw > predictions > --- > > Key: SPARK-21770 > URL: https://issues.apache.org/jira/browse/SPARK-21770 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.3.0 >Reporter: Siddharth Murching >Assignee: Weichen Xu >Priority: Minor > Fix For: 2.3.0 > > > Given an n-element raw prediction vector of all-zeros, > ProbabilisticClassifierModel.normalizeToProbabilitiesInPlace() should output > a probability vector of all-equal 1/n entries -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
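The contract described in the issue (an all-zero raw-prediction vector should normalize to a uniform 1/n probability vector) can be sketched as follows; this illustrates the expected behavior, not the MLlib implementation:

```python
def normalize_to_probabilities(raw):
    """Normalize non-negative raw scores into probabilities; an
    all-zero vector maps to the uniform distribution 1/n."""
    total = sum(raw)
    n = len(raw)
    if total == 0.0:
        return [1.0 / n] * n
    return [v / total for v in raw]

print(normalize_to_probabilities([0.0, 0.0, 0.0, 0.0]))  # [0.25, 0.25, 0.25, 0.25]
print(normalize_to_probabilities([1.0, 3.0]))            # [0.25, 0.75]
```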