[jira] [Updated] (SPARK-34481) Refactor dataframe reader/writer path option logic
[ https://issues.apache.org/jira/browse/SPARK-34481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuchen Huo updated SPARK-34481: --- Priority: Trivial (was: Major) > Refactor dataframe reader/writer path option logic > -- > > Key: SPARK-34481 > URL: https://issues.apache.org/jira/browse/SPARK-34481 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.0 > Reporter: Yuchen Huo > Priority: Trivial > > Refactor the dataframe reader/writer logic so that the path-in-options handling logic has its own function. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10816) EventTime based sessionization (session window)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287585#comment-17287585 ] Yuanjian Li commented on SPARK-10816: - Many thanks for the heads up! [~viirya] [~kabhwan] {quote}Now that there're two committers from different teams finding the feature as useful, looks like we could try pushing this out again. {quote} Big +1. Really excited to revive this feature with you. I'll also take some time to reload the old context soon. {quote}Probably the code size is different because the design is actually quite different {quote} That's right. From my rough investigation, the main differences are listed below: * State store format design: As Shixiong described in [this comment|https://issues.apache.org/jira/browse/SPARK-10816?focusedCommentId=16645370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16645370], my approach is easy to implement but does not scale well for non-numeric aggregates. * The structure of the physical plan node: Jungtaek's approach leverages the aggregation iterator; my approach reuses the approach of `WindowExec`. About authorship, I really appreciate your trust [~kabhwan]! I can help with confirming with the co-authors. Compared with the other issues, I think this should be the easiest one and can be discussed at the end. :) > EventTime based sessionization (session window) > --- > > Key: SPARK-10816 > URL: https://issues.apache.org/jira/browse/SPARK-10816 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming > Reporter: Reynold Xin > Priority: Major > Attachments: SPARK-10816 Support session window natively.pdf, Session > Window Support For Structure Streaming.pdf > > > Currently structured streaming supports two kinds of windows: the tumbling window > and the sliding window. Another useful window type is the session window, which > is not supported by SS. 
> Unlike time windows (tumbling and sliding windows), a session window > doesn't have a static window begin and end time. Session window creation > depends on a defined session gap, which can be static or dynamic. > For a static session gap, events that fall within a certain period of > time (the gap) are grouped into one session window. A session window closes when > it does not receive events for the gap. For a dynamic gap, the gap can be > different from event to event.
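The static-gap semantics described above are easy to see in a tiny stand-alone sketch. This is not Spark code; the `sessionize` function, its arguments, and the `(start, end)` window representation are all hypothetical, chosen only to illustrate how a session closes once no event arrives within the gap:

```python
def sessionize(event_times, gap):
    """Group sorted event timestamps into static-gap session windows.

    A new session starts whenever the time since the end of the current
    window is exceeded; each window is (start, end), where end is the
    last event time plus `gap`.
    """
    windows = []
    for t in sorted(event_times):
        if windows and t <= windows[-1][1]:
            # Event arrives before the open window's end: extend the session.
            start, _ = windows[-1]
            windows[-1] = (start, t + gap)
        else:
            # Gap exceeded (or first event): open a new session window.
            windows.append((t, t + gap))
    return windows
```

For example, with a gap of 5, events at times 1, 2, and 10 form two sessions: `sessionize([1, 2, 10], 5)` yields `[(1, 7), (10, 15)]`, because the event at 10 arrives more than 5 units after the event at 2.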
[jira] [Assigned] (SPARK-34481) Refactor dataframe reader/writer path option logic
[ https://issues.apache.org/jira/browse/SPARK-34481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34481: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-34481) Refactor dataframe reader/writer path option logic
[ https://issues.apache.org/jira/browse/SPARK-34481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287583#comment-17287583 ] Apache Spark commented on SPARK-34481: -- User 'yuchenhuo' has created a pull request for this issue: https://github.com/apache/spark/pull/31599
[jira] [Assigned] (SPARK-34481) Refactor dataframe reader/writer path option logic
[ https://issues.apache.org/jira/browse/SPARK-34481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34481: Assignee: Apache Spark
[jira] [Created] (SPARK-34481) Refactor dataframe reader/writer path option logic
Yuchen Huo created SPARK-34481: -- Summary: Refactor dataframe reader/writer path option logic Key: SPARK-34481 URL: https://issues.apache.org/jira/browse/SPARK-34481 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Yuchen Huo Refactor the dataframe reader/writer logic so that the path-in-options handling logic has its own function.
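As a rough illustration of the refactor's goal, a single function owning the path-in-options handling, here is a hypothetical sketch. The function name, option keys, and the ambiguity check are assumptions for illustration, not Spark's actual implementation:

```python
def merge_paths_into_options(options, paths):
    """Hypothetical helper that owns all path-option handling.

    Merges explicit load/save path arguments into the option map and
    rejects the ambiguous case where a 'path' option and explicit path
    arguments are both supplied.
    """
    if "path" in options and paths:
        raise ValueError(
            "Either set the 'path' option or pass path arguments, not both")
    merged = dict(options)
    if len(paths) == 1:
        merged["path"] = paths[0]
    elif paths:
        merged["paths"] = list(paths)
    return merged
```

Isolating this logic in one place means every reader/writer entry point resolves paths identically instead of re-implementing the merge inline.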
[jira] [Closed] (SPARK-34480) Module launcher build failed with profile hadoop-3.2 activated
[ https://issues.apache.org/jira/browse/SPARK-34480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lichuanliang closed SPARK-34480. Please ignore this issue. > Module launcher build failed with profile hadoop-3.2 activated > -- > > Key: SPARK-34480 > URL: https://issues.apache.org/jira/browse/SPARK-34480 > Project: Spark > Issue Type: Bug > Components: Spark Submit > Affects Versions: 3.0.1 > Reporter: Lichuanliang > Priority: Minor > > Build Spark 3.0.1 with the hadoop-3.2 profile activated: > {code:java} > build/mvn -pl :spark-launcher_2.12 package -DskipTests -Phadoop-3.2 -Phive > -Phive-thriftserver -Pkubernetes > {code} > When building the spark-launcher module, it complains that the commons-lang dependency is missing: > {code:java} > [INFO] --- scala-maven-plugin:4.3.0:compile (scala-compile-first) @ > spark-launcher_2.12 --- > [INFO] Using incremental compilation using Mixed compile order > [INFO] Compiler bridge file: > /Users/lichuanliang/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.10__52.0-1.3.1_20191012T045515.jar > [INFO] Compiling 20 Java sources to > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/target/scala-2.12/classes > ... 
> [ERROR] [Error] > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:20: > package org.apache.commons.lang does not exist > [ERROR] [Error] > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:226: > cannot find symbol > symbol: variable StringUtils > location: class org.apache.spark.launcher.SparkSubmitCommandBuilder > [ERROR] [Error] > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:227: > cannot find symbol > symbol: variable StringUtils > location: class org.apache.spark.launcher.SparkSubmitCommandBuilder > [ERROR] [Error] > /Users/lichuanliang/IdeaProjects/qtt-spark-3.0/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:232: > cannot find symbol > symbol: variable StringUtils > location: class org.apache.spark.launcher.SparkSubmitCommandBuilder > [INFO] > > [INFO] BUILD FAILURE > {code} > >
[jira] [Resolved] (SPARK-34480) Module launcher build failed with profile hadoop-3.2 activated
[ https://issues.apache.org/jira/browse/SPARK-34480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lichuanliang resolved SPARK-34480. -- Resolution: Fixed
[jira] [Created] (SPARK-34480) Module launcher build failed with profile hadoop-3.2 activated
Lichuanliang created SPARK-34480: Summary: Module launcher build failed with profile hadoop-3.2 activated Key: SPARK-34480 URL: https://issues.apache.org/jira/browse/SPARK-34480 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 3.0.1 Reporter: Lichuanliang Build Spark 3.0.1 with the hadoop-3.2 profile activated: {code:java} build/mvn -pl :spark-launcher_2.12 package -DskipTests -Phadoop-3.2 -Phive -Phive-thriftserver -Pkubernetes {code} When building the spark-launcher module, it complains that the commons-lang dependency is missing (the same compile errors as quoted above).
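The compile errors come from a fork (note the `qtt-spark-3.0` checkout path); upstream Spark's launcher module does not import `org.apache.commons.lang`. If the fork really needs that package, one possible fix is to declare the dependency explicitly in the launcher module's POM. This is an illustrative fragment, with the version number assumed:

```xml
<!-- Hypothetical addition to launcher/pom.xml: provides the
     org.apache.commons.lang package (commons-lang 2.x line). -->
<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.6</version>
</dependency>
```

Alternatively, the fork could migrate those call sites to `org.apache.commons.lang3` (artifact `commons-lang3`), which is the maintained line of the library.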
[jira] [Resolved] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-34471. -- Fix Version/s: 3.1.1 Resolution: Fixed Issue resolved by pull request 31590 [https://github.com/apache/spark/pull/31590] > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming > Affects Versions: 3.1.1 > Reporter: Bo Zhang > Assignee: Bo Zhang > Priority: Major > Fix For: 3.1.1 > > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above.
[jira] [Assigned] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-34471: Assignee: Bo Zhang
[jira] [Commented] (SPARK-33602) Group exception messages in execution/datasources
[ https://issues.apache.org/jira/browse/SPARK-33602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287492#comment-17287492 ] jiaan.geng commented on SPARK-33602: ping [~allisonwang-db] Should we put the AnalysisException in DataSource.scala into QueryCompilationErrors or QueryExecutionErrors? > Group exception messages in execution/datasources > - > > Key: SPARK-33602 > URL: https://issues.apache.org/jira/browse/SPARK-33602 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources' > || Filename || Count || > | DataSource.scala | 9 | > | DataSourceStrategy.scala | 1 | > | DataSourceUtils.scala | 2 | > | FileFormat.scala | 1 | > | FileFormatWriter.scala | 3 | > | FileScanRDD.scala | 2 | > | InsertIntoHadoopFsRelationCommand.scala | 2 | > | PartitioningAwareFileIndex.scala | 1 | > | PartitioningUtils.scala | 3 | > | RecordReaderIterator.scala | 1 | > | rules.scala | 4 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile' > || Filename || Count || > | BinaryFileFormat.scala | 2 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc' > || Filename || Count || > | JDBCOptions.scala | 2 | > | JdbcUtils.scala | 6 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc' > || Filename || Count || > | OrcDeserializer.scala | 1 | > | OrcFilters.scala | 1 | > | OrcSerializer.scala | 1 | > | OrcUtils.scala | 2 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet' > || Filename || Count || > | ParquetFileFormat.scala | 2 | > | ParquetReadSupport.scala | 1 | > | ParquetSchemaConverter.scala | 6 |
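The "grouping" pursued by this sub-task series moves inline exception construction into central error-constructor objects (QueryCompilationErrors and QueryExecutionErrors in Spark's Scala code). A minimal sketch of the pattern, here in Python with hypothetical names and illustrative messages, not Spark's actual API:

```python
class AnalysisException(Exception):
    """Stand-in for Spark's AnalysisException."""

# Central module of error constructors: call sites raise via these
# helpers instead of formatting messages inline, so wording stays
# consistent and auditable in one place.
def data_source_not_found_error(provider):
    return AnalysisException(f"Failed to find data source: {provider}")

def path_option_not_set_error():
    return AnalysisException("'path' option is not specified")
```

A call site then does `raise data_source_not_found_error("kafka")` rather than building the message string itself, which is what makes the per-file message counts in the tables above shrink to zero.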
[jira] [Issue Comment Deleted] (SPARK-33601) Group exception messages in catalyst/parser
[ https://issues.apache.org/jira/browse/SPARK-33601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-33601: --- Comment: was deleted (was: I'm working on.) > Group exception messages in catalyst/parser > --- > > Key: SPARK-33601 > URL: https://issues.apache.org/jira/browse/SPARK-33601 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Assignee: Apache Spark > Priority: Major > Fix For: 3.2.0 > > > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser' > || Filename || Count || > | AstBuilder.scala | 36 | > | LegacyTypeStringParser.scala | 1 | > | ParseDriver.scala | 3 | > | ParserUtils.scala | 4 |
[jira] [Issue Comment Deleted] (SPARK-33541) Group exception messages in catalyst/expressions
[ https://issues.apache.org/jira/browse/SPARK-33541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-33541: --- Comment: was deleted (was: I'm working on.) > Group exception messages in catalyst/expressions > > > Key: SPARK-33541 > URL: https://issues.apache.org/jira/browse/SPARK-33541 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Assignee: jiaan.geng > Priority: Major > Fix For: 3.2.0 > > > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions' > || Filename || Count || > | Cast.scala | 18 | > | ExprUtils.scala | 2 | > | Expression.scala | 8 | > | InterpretedUnsafeProjection.scala | 1 | > | ScalaUDF.scala | 2 | > | SelectedField.scala | 3 | > | SubExprEvaluationRuntime.scala | 1 | > | arithmetic.scala | 8 | > | collectionOperations.scala | 4 | > | complexTypeExtractors.scala | 3 | > | csvExpressions.scala | 3 | > | datetimeExpressions.scala | 4 | > | decimalExpressions.scala | 2 | > | generators.scala | 2 | > | higherOrderFunctions.scala | 6 | > | jsonExpressions.scala | 2 | > | literals.scala | 3 | > | misc.scala | 2 | > | namedExpressions.scala | 1 | > | ordering.scala | 1 | > | package.scala | 1 | > | regexpExpressions.scala | 1 | > | stringExpressions.scala | 1 | > | windowExpressions.scala | 5 | > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate' > || Filename || Count || > | ApproximatePercentile.scala | 2 | > | HyperLogLogPlusPlus.scala | 1 | > | Percentile.scala | 1 | > | interfaces.scala | 2 | > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen' > || Filename || Count || > | CodeGenerator.scala | 5 | > | javaCode.scala | 1 | > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects' > || Filename || Count || > | objects.scala | 12 |
[jira] [Commented] (SPARK-33602) Group exception messages in execution/datasources
[ https://issues.apache.org/jira/browse/SPARK-33602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287485#comment-17287485 ] jiaan.geng commented on SPARK-33602: I'm working on it.
[jira] [Issue Comment Deleted] (SPARK-33542) Group exception messages in catalyst/catalog
[ https://issues.apache.org/jira/browse/SPARK-33542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-33542: --- Comment: was deleted (was: I'm working on.) > Group exception messages in catalyst/catalog > > > Key: SPARK-33542 > URL: https://issues.apache.org/jira/browse/SPARK-33542 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Assignee: jiaan.geng > Priority: Major > Fix For: 3.2.0 > > > Group all exception messages in > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog. > ||Filename||Count|| > |ExternalCatalog.scala|4| > |GlobalTempViewManager.scala|1| > |InMemoryCatalog.scala|18| > |SessionCatalog.scala|17| > |functionResources.scala|1| > |interface.scala|4|
[jira] [Issue Comment Deleted] (SPARK-33599) Group exception messages in catalyst/analysis
[ https://issues.apache.org/jira/browse/SPARK-33599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-33599: --- Comment: was deleted (was: I'm working on.) > Group exception messages in catalyst/analysis > - > > Key: SPARK-33599 > URL: https://issues.apache.org/jira/browse/SPARK-33599 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.2.0 > Reporter: Allison Wang > Assignee: jiaan.geng > Priority: Major > Fix For: 3.2.0 > > > '/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis' > || Filename || Count || > | Analyzer.scala | 1 | > | CheckAnalysis.scala | 1 | > | FunctionRegistry.scala | 5 | > | ResolveCatalogs.scala | 1 | > | ResolveHints.scala | 1 | > | package.scala | 2 | > | unresolved.scala | 43 | > '/core/src/main/scala/org/apache/spark/sql/catalyst/analysis' > || Filename || Count || > | ResolveSessionCatalog.scala | 12 |
[jira] [Resolved] (SPARK-34449) Upgrade Jetty to fix CVE-2020-27218
[ https://issues.apache.org/jira/browse/SPARK-34449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34449. -- Fix Version/s: 3.1.2 3.0.3 2.4.8 Resolution: Fixed Issue resolved by pull request 31583 [https://github.com/apache/spark/pull/31583] > Upgrade Jetty to fix CVE-2020-27218 > --- > > Key: SPARK-34449 > URL: https://issues.apache.org/jira/browse/SPARK-34449 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 2.4.7, 3.0.1, 3.2.0, 3.1.1 > Reporter: Kousuke Saruta > Assignee: Kousuke Saruta > Priority: Major > Fix For: 2.4.8, 3.0.3, 3.1.2 > > > CVE-2020-27218 affects the currently used Jetty 9.4.34, so let's upgrade it. > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27218.
[jira] [Assigned] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34478: Assignee: Apache Spark > Ignore or reject wrong config when start sparksession > - > > Key: SPARK-34478 > URL: https://issues.apache.org/jira/browse/SPARK-34478 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL > Affects Versions: 3.2.0 > Reporter: angerszhu > Assignee: Apache Spark > Priority: Major > > When using > {code:java} > SparkSession.builder().config() > {code} > a user may set `spark.driver.memory`. But by the time this code runs, the JVM has already started, so the setting takes no effect, while the Spark UI still displays the configured value. > We should therefore ignore or reject such configurations.
[jira] [Commented] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287478#comment-17287478 ] Apache Spark commented on SPARK-34478: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31598
[jira] [Assigned] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34478: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287479#comment-17287479 ] Apache Spark commented on SPARK-34478: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31598
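The fix being requested can be sketched independently of Spark: a validator that drops (or, in a strict mode, rejects) options that are only read when the driver JVM is launched. Everything below — the function, the option set, the strict flag — is a hypothetical illustration of the idea, not Spark's actual behavior:

```python
# Options read at driver-JVM launch time; setting them on an
# already-running session has no effect (the assumed set is illustrative).
LAUNCH_TIME_OPTIONS = {"spark.driver.memory", "spark.driver.extraJavaOptions"}

def validate_builder_config(config, jvm_started, strict=False):
    """Return the subset of `config` that can still take effect.

    If the JVM is already running, launch-time options are dropped;
    with strict=True they are rejected with an error instead of being
    silently ignored.
    """
    if not jvm_started:
        return dict(config)
    bad = LAUNCH_TIME_OPTIONS & set(config)
    if bad and strict:
        raise ValueError(f"Cannot set at runtime: {sorted(bad)}")
    return {k: v for k, v in config.items() if k not in bad}
```

Dropping the option (rather than keeping it) also prevents the UI-display inconsistency described in the issue, since the ineffective value never enters the session's config map.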
[jira] [Assigned] (SPARK-34477) Kryo NPEs when serializing Avro GenericData objects (except GenericRecord)
[ https://issues.apache.org/jira/browse/SPARK-34477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34477: Assignee: (was: Apache Spark) > Kryo NPEs when serializing Avro GenericData objects (except GenericRecord) > --- > > Key: SPARK-34477 > URL: https://issues.apache.org/jira/browse/SPARK-34477 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.0.0, 3.0.0 > Reporter: Shardul Mahadik > Priority: Major > > SPARK-746 added KryoSerializer for GenericRecord and GenericData.Record Avro > objects. However, Kryo serialization of other GenericData types like array, > enum and fixed fails. Note that if such objects are within a GenericRecord, > then the current code works. However, if these types are top-level objects we want > to distribute, then Kryo fails. > We should register KryoSerializer(s) for these GenericData types. > Code to reproduce: > {code:scala} > import org.apache.avro.{Schema, SchemaBuilder} > import org.apache.avro.generic.GenericData.Array > val arraySchema = SchemaBuilder.array().items().intType() > val array = new Array[Integer](1, arraySchema) > array.add(1) > sc.parallelize((0 until 10).map((_, array)), 2).collect > {code} > Similar code can be written for enum and fixed types. > Errors: > GenericData.Array > {code:java} > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410) > at > org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) > at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at org.apache.avro.generic.GenericData$Array.add(GenericData.java:383) > at java.util.AbstractList.add(AbstractList.java:108) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:35) > at > 
com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:303) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2(ParallelCollectionRDD.scala:79) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2$adapted(ParallelCollectionRDD.scala:79) > at > org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:171) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$1(ParallelCollection
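The fix direction named in the report ("register KryoSerializer(s) for these GenericData types") could be sketched as below. This is an illustration, not the actual patch: the registrator class and the commented-out serializer names are hypothetical, and a real serializer must carry the Avro schema (as SPARK-746 did for GenericData.Record), since registration alone does not avoid the no-arg-constructor NPE.
{code:scala}
import com.esotericsoftware.kryo.Kryo
import org.apache.avro.generic.GenericData
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical sketch: register schema-aware serializers for the remaining
// Avro GenericData types that currently NPE when deserialized by Kryo.
class AvroGenericDataRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Each serializer would write schema.toString alongside the datum and
    // reconstruct the object from the schema on read. Serializer classes
    // below are assumed, not real Spark classes:
    kryo.register(classOf[GenericData.Array[_]] /*, new AvroArraySerializer */)
    kryo.register(classOf[GenericData.EnumSymbol] /*, new AvroEnumSerializer */)
    kryo.register(classOf[GenericData.Fixed] /*, new AvroFixedSerializer */)
  }
}

// Enabled via: spark.kryo.registrator=AvroGenericDataRegistrator
{code}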
[jira] [Commented] (SPARK-34477) Kryo NPEs when serializing Avro GenericData objects (except GenericRecord)
[ https://issues.apache.org/jira/browse/SPARK-34477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287474#comment-17287474 ] Apache Spark commented on SPARK-34477: -- User 'shardulm94' has created a pull request for this issue: https://github.com/apache/spark/pull/31597 > Kryo NPEs when serializing Avro GenericData objects (except GenericRecord) > --- > > Key: SPARK-34477 > URL: https://issues.apache.org/jira/browse/SPARK-34477 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0, 3.0.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-746 added KryoSerializer for GenericRecord and GenericData.Record Avro > objects. However, Kryo serialization of other GenericData types like array, > enum and fixed fails. Note that if such objects are within a GenericRecord, > then current code works. However if these types are top level objects we want > to distribute, then Kryo fails. > We should register KryoSerializer(s) for these GenericData types. 
> Code to reproduce: > {code:scala} > import org.apache.avro.{Schema, SchemaBuilder} > import org.apache.avro.generic.GenericData.Array > val arraySchema = SchemaBuilder.array().items().intType() > val array = new Array[Integer](1, arraySchema) > array.add(1) > sc.parallelize((0 until 10).map((_, array)), 2).collect > {code} > Similar code can be written for enums and fixed types > Errors: > GenericData.Array > {code:java} > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410) > at > org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at org.apache.avro.generic.GenericData$Array.add(GenericData.java:383) > at java.util.AbstractList.add(AbstractList.java:108) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:35) > at > com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:303) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2(ParallelCollectionRDD.scala:79) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2$adapted(ParallelCollectionRDD.scala:79) > at > org.apache.spark.util.Utils$.deserializeViaNestedStream(Uti
[jira] [Assigned] (SPARK-34477) Kryo NPEs when serializing Avro GenericData objects (except GenericRecord)
[ https://issues.apache.org/jira/browse/SPARK-34477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34477: Assignee: Apache Spark > Kryo NPEs when serializing Avro GenericData objects (except GenericRecord) > --- > > Key: SPARK-34477 > URL: https://issues.apache.org/jira/browse/SPARK-34477 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0, 3.0.0 >Reporter: Shardul Mahadik >Assignee: Apache Spark >Priority: Major > > SPARK-746 added KryoSerializer for GenericRecord and GenericData.Record Avro > objects. However, Kryo serialization of other GenericData types like array, > enum and fixed fails. Note that if such objects are within a GenericRecord, > then current code works. However if these types are top level objects we want > to distribute, then Kryo fails. > We should register KryoSerializer(s) for these GenericData types. > Code to reproduce: > {code:scala} > import org.apache.avro.{Schema, SchemaBuilder} > import org.apache.avro.generic.GenericData.Array > val arraySchema = SchemaBuilder.array().items().intType() > val array = new Array[Integer](1, arraySchema) > array.add(1) > sc.parallelize((0 until 10).map((_, array)), 2).collect > {code} > Similar code can be written for enums and fixed types > Errors: > GenericData.Array > {code:java} > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410) > at > org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) > at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at org.apache.avro.generic.GenericData$Array.add(GenericData.java:383) > at java.util.AbstractList.add(AbstractList.java:108) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:35) > at > 
com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:303) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2(ParallelCollectionRDD.scala:79) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2$adapted(ParallelCollectionRDD.scala:79) > at > org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:171) > at > org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readOb
[jira] [Comment Edited] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287463#comment-17287463 ] Yuming Wang edited comment on SPARK-34479 at 2/20/21, 3:02 AM: --- But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 {noformat} Caused by: java.lang.NoSuchMethodError: com.github.luben.zstd.ZstdOutputStream.setCloseFrameOnFlush(Z)Lcom/github/luben/zstd/ZstdOutputStream; at org.apache.avro.file.ZstandardLoader.output(ZstandardLoader.java:40) at org.apache.avro.file.ZstandardCodec.compress(ZstandardCodec.java:67) at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:386) at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:407) at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:428) at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:437) at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:460) at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.close(SparkAvroKeyOutputFormat.java:88) at org.apache.spark.sql.avro.AvroOutputWriter.close(AvroOutputWriter.scala:86) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:58) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:75) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:281) {noformat} [~iemejia] May be we need to release Avro 1.10.2 or 1.11.0. was (Author: q79969786): But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 [~iemejia] May be we need to release Avro 1.10.2 or 1.11.0. 
{noformat} Caused by: java.lang.NoSuchMethodError: com.github.luben.zstd.ZstdOutputStream.setCloseFrameOnFlush(Z)Lcom/github/luben/zstd/ZstdOutputStream; at org.apache.avro.file.ZstandardLoader.output(ZstandardLoader.java:40) at org.apache.avro.file.ZstandardCodec.compress(ZstandardCodec.java:67) at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:386) at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:407) at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:428) at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:437) at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:460) at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.close(SparkAvroKeyOutputFormat.java:88) at org.apache.spark.sql.avro.AvroOutputWriter.close(AvroOutputWriter.scala:86) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:58) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:75) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:281) {noformat} > Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro add zstandard codec since AVRO-2195. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287463#comment-17287463 ] Yuming Wang edited comment on SPARK-34479 at 2/20/21, 3:02 AM: --- But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 [~iemejia] May be we need to release Avro 1.10.2 or 1.11.0. {noformat} Caused by: java.lang.NoSuchMethodError: com.github.luben.zstd.ZstdOutputStream.setCloseFrameOnFlush(Z)Lcom/github/luben/zstd/ZstdOutputStream; at org.apache.avro.file.ZstandardLoader.output(ZstandardLoader.java:40) at org.apache.avro.file.ZstandardCodec.compress(ZstandardCodec.java:67) at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:386) at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:407) at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:428) at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:437) at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:460) at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.close(SparkAvroKeyOutputFormat.java:88) at org.apache.spark.sql.avro.AvroOutputWriter.close(AvroOutputWriter.scala:86) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:58) at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:75) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:281) {noformat} was (Author: q79969786): But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 [~iemejia] May be we need to release Avro 1.10.2 or 1.11.0. 
> Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro add zstandard codec since AVRO-2195. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287463#comment-17287463 ] Yuming Wang commented on SPARK-34479: - But zstd 1.4.5-12 is not compatible with 1.4.8-4. https://github.com/apache/avro/blob/release-1.10.1/lang/java/pom.xml#L64 https://github.com/apache/spark/blob/331c6fd4efcb337d903b7179b05997dca2dae2a8/pom.xml#L703 [~iemejia] Maybe we need to release Avro 1.10.2 or 1.11.0. > Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro added a zstandard codec in AVRO-2195. 
[jira] [Updated] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-34479: Description: Avro add zstandard codec since AVRO-2195. (was: Avro add AVRO-2195) > Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro add zstandard codec since AVRO-2195. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
[ https://issues.apache.org/jira/browse/SPARK-34479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-34479: Description: Avro add AVRO-2195 > Add zstandard codec to spark.sql.avro.compression.codec > --- > > Key: SPARK-34479 > URL: https://issues.apache.org/jira/browse/SPARK-34479 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Avro add AVRO-2195 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34479) Add zstandard codec to spark.sql.avro.compression.codec
Yuming Wang created SPARK-34479: --- Summary: Add zstandard codec to spark.sql.avro.compression.codec Key: SPARK-34479 URL: https://issues.apache.org/jira/browse/SPARK-34479 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Yuming Wang 
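If the proposed codec is added, usage would presumably mirror the existing codecs accepted by {{spark.sql.avro.compression.codec}}. A hedged sketch; the "zstandard" value is an assumption pending this improvement, and the output path is illustrative:
{code:scala}
// Sketch only: assumes SPARK-34479 lands and "zstandard" joins the accepted
// values (uncompressed, deflate, snappy, bzip2, xz) of this SQL config.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("avro-zstd-sketch")
  .getOrCreate()

// Select the Avro block compression codec for subsequent Avro writes.
spark.conf.set("spark.sql.avro.compression.codec", "zstandard")

spark.range(0, 1000).toDF("id")
  .write
  .format("avro")
  .save("/tmp/avro-zstd-out") // illustrative path
{code}
Note the comment above this one: the zstd-jni version pulled in by Avro 1.10.1 (1.4.5-12) and the one pinned by Spark (1.4.8-4) would also have to agree, or writes fail with the NoSuchMethodError shown.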
[jira] [Commented] (SPARK-34478) Ignore or reject wrong config when start sparksession
[ https://issues.apache.org/jira/browse/SPARK-34478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287459#comment-17287459 ] angerszhu commented on SPARK-34478: --- Will raise a PR soon. > Ignore or reject wrong config when start sparksession > - > > Key: SPARK-34478 > URL: https://issues.apache.org/jira/browse/SPARK-34478 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > When using > {code:java} > SparkSession.builder().config() > {code} > a user may set `spark.driver.memory`. By the time this code runs, however, the JVM has already started, so the setting takes no effect, yet the Spark UI still displays it as if it had been applied. We should ignore or reject such configurations. 
[jira] [Created] (SPARK-34478) Ignore or reject wrong config when start sparksession
angerszhu created SPARK-34478: - Summary: Ignore or reject wrong config when start sparksession Key: SPARK-34478 URL: https://issues.apache.org/jira/browse/SPARK-34478 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 3.2.0 Reporter: angerszhu When using {code:java} SparkSession.builder().config() {code} a user may set `spark.driver.memory`. By the time this code runs, however, the JVM has already started, so the setting takes no effect, yet the Spark UI still displays it as if it had been applied. We should ignore or reject such configurations. 
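The timing problem the issue describes can be illustrated as follows. {{spark.driver.memory}} sizes the driver JVM heap, so it must be known before that JVM launches; setting it through the builder from inside an already-running driver is too late:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("config-timing-sketch")
  // No effect here: the driver JVM (and its heap) already exists by the time
  // this line executes, yet the value still shows up in the Spark UI.
  .config("spark.driver.memory", "8g")
  .getOrCreate()

// Driver-JVM settings belong where they are read before JVM launch, e.g.:
//   spark-submit --driver-memory 8g ...
// or in conf/spark-defaults.conf:
//   spark.driver.memory 8g
{code}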
[jira] [Created] (SPARK-34477) Kryo NPEs when serializing Avro GenericData objects (except GenericRecord)
Shardul Mahadik created SPARK-34477: --- Summary: Kryo NPEs when serializing Avro GenericData objects (except GenericRecord) Key: SPARK-34477 URL: https://issues.apache.org/jira/browse/SPARK-34477 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0, 2.0.0 Reporter: Shardul Mahadik SPARK-746 added KryoSerializer for GenericRecord and GenericData.Record Avro objects. However, Kryo serialization of other GenericData types like array, enum and fixed fails. Note that if such objects are within a GenericRecord, then current code works. However if these types are top level objects we want to distribute, then Kryo fails. We should register KryoSerializer(s) for these GenericData types. Code to reproduce: {code:scala} import org.apache.avro.{Schema, SchemaBuilder} import org.apache.avro.generic.GenericData.Array val arraySchema = SchemaBuilder.array().items().intType() val array = new Array[Integer](1, arraySchema) array.add(1) sc.parallelize((0 until 10).map((_, array)), 2).collect {code} Similar code can be written for enums and fixed types Errors: GenericData.Array {code:java} java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1410) at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:69) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.avro.generic.GenericData$Array.add(GenericData.java:383) at java.util.AbstractList.add(AbstractList.java:108) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:35) at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:303) at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2(ParallelCollectionRDD.scala:79) at 
org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$2$adapted(ParallelCollectionRDD.scala:79) at org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:171) at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$readObject$1(ParallelCollectionRDD.scala:79) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1403) ... 20 more {code} GenericData.EnumSymbol {code:java} com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException Serialization trace: props (org.apac
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287437#comment-17287437 ] Dongjoon Hyun commented on SPARK-25075: --- [~MasseGuillaume]. Feel free to create a new independent Jira issue if you think that's a problem. > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, MLlib, Project Infra, Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287403#comment-17287403 ] Ted Yu commented on SPARK-34476: The basic jsonb test is here: https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-cql-4x/src/test/java/org/yb/loadtest/TestSpark3Jsonb.java I am working on adding get_json_string() function (via Spark extension) which is similar to get_json_object() but expands the last jsonb field using '->>' instead of '->'. > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. 
> Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-34476: --- Description: When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as part of filter). was: When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. 
Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter). > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. 
> Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
Ted Yu created SPARK-34476: -- Summary: Duplicate referenceNames are given for ambiguousReferences Key: SPARK-34476 URL: https://issues.apache.org/jira/browse/SPARK-34476 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Ted Yu When running test with Spark extension that converts custom function to json path expression, I saw the following in test output: {code} 2021-02-19 21:57:24,550 (Time-limited test) [INFO - org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == Physical Plan == org.apache.spark.sql.AnalysisException: Reference 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: mycatalog.test.person.phone->'key'->1->'m'->2->>'b', mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 {code} Please note the candidates following 'could be' are the same. Here is the physical plan for a working query where phone is a jsonb column: {code} TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], output=[id#6,address#7,key#0]) +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra Scan: test.person - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] {code} The difference for the failed query is that it tries to use phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter).
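The report above notes that both candidates after 'could be' are identical. A minimal, hypothetical Python sketch (not Spark's actual Scala implementation; the function name and message wording are invented) of deduplicating the candidate list before building the error message:

```python
def format_ambiguous_error(name, candidates):
    """Build an 'ambiguous reference' message, collapsing duplicate
    candidate names so the hint is actually informative."""
    seen = set()
    unique = []
    for c in candidates:          # preserve first-seen order
        if c not in seen:
            seen.add(c)
            unique.append(c)
    if len(unique) == 1:
        # All candidates are the same string: the name is not ambiguous at
        # the name level, so report the single resolved name instead.
        return f"Reference '{name}' resolves to {unique[0]} more than once"
    return f"Reference '{name}' is ambiguous, could be: {', '.join(unique)}"

cand = ["mycatalog.test.person.phone->>'b'", "mycatalog.test.person.phone->>'b'"]
print(format_ambiguous_error("phone->>'b'", cand))
```

With a deduplicated list, the duplicate-candidate message from the bug report would no longer print the same name twice.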
[jira] [Assigned] (SPARK-24818) Ensure all the barrier tasks in the same stage are launched together
[ https://issues.apache.org/jira/browse/SPARK-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-24818: --- Assignee: wuyi > Ensure all the barrier tasks in the same stage are launched together > > > Key: SPARK-24818 > URL: https://issues.apache.org/jira/browse/SPARK-24818 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Xingbo Jiang >Assignee: wuyi >Priority: Major > Fix For: 3.2.0 > > > When some executors/hosts are blacklisted, it may happen that only a part of > the tasks in the same barrier stage can be launched. We shall detect the case > and revert the allocated resource offers.
[jira] [Resolved] (SPARK-24818) Ensure all the barrier tasks in the same stage are launched together
[ https://issues.apache.org/jira/browse/SPARK-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-24818. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30650 [https://github.com/apache/spark/pull/30650] > Ensure all the barrier tasks in the same stage are launched together > > > Key: SPARK-24818 > URL: https://issues.apache.org/jira/browse/SPARK-24818 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Xingbo Jiang >Priority: Major > Fix For: 3.2.0 > > > When some executors/hosts are blacklisted, it may happen that only a part of > the tasks in the same barrier stage can be launched. We shall detect the case > and revert the allocated resource offers.
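The requirement above, "detect the case and revert the allocated resource offers", amounts to all-or-nothing scheduling for a barrier stage. A toy Python sketch, purely illustrative (the offer structure and field names are assumptions, not Spark's scheduler API):

```python
def schedule_barrier_stage(offers, num_barrier_tasks):
    """All-or-nothing launch for a barrier stage: either every task in the
    stage gets a usable slot in this round of offers, or no slots are
    consumed at all. Toy model of the SPARK-24818 behaviour."""
    slots = [o for o in offers if o["cores"] > 0 and not o["blacklisted"]]
    if len(slots) < num_barrier_tasks:
        return []  # revert: launch nothing, keep all offers available
    return [s["executor"] for s in slots[:num_barrier_tasks]]

offers = [
    {"executor": "e1", "cores": 2, "blacklisted": False},
    {"executor": "e2", "cores": 2, "blacklisted": True},   # blacklisted host
    {"executor": "e3", "cores": 2, "blacklisted": False},
]
print(schedule_barrier_stage(offers, 3))  # only 2 usable slots: launch nothing
print(schedule_barrier_stage(offers, 2))  # enough slots: launch both tasks
```

The key property is that a partial launch never happens: with a blacklisted executor reducing the usable slots below the task count, the whole round is reverted rather than launching a subset of the barrier tasks.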
[jira] [Assigned] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34475: Assignee: Maxim Gekk (was: Apache Spark) > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: +
[jira] [Commented] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287340#comment-17287340 ] Apache Spark commented on SPARK-34475: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31596 > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: +
[jira] [Assigned] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34475: Assignee: Apache Spark (was: Maxim Gekk) > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: +
[jira] [Updated] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34475: --- Description: Rename v2 logical nodes for simplicity in the form: + (was: To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec) > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: +
[jira] [Created] (SPARK-34475) Rename v2 logical nodes
Maxim Gekk created SPARK-34475: -- Summary: Rename v2 logical nodes Key: SPARK-34475 URL: https://issues.apache.org/jira/browse/SPARK-34475 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec
[jira] [Assigned] (SPARK-34474) Remove unnecessary Union under Distinct like operators
[ https://issues.apache.org/jira/browse/SPARK-34474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34474: Assignee: L. C. Hsieh (was: Apache Spark) > Remove unnecessary Union under Distinct like operators > -- > > Key: SPARK-34474 > URL: https://issues.apache.org/jira/browse/SPARK-34474 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > For a Union under Distinct-like operators, if its children are all the same, > we can keep just one of them and remove the Union.
[jira] [Assigned] (SPARK-34474) Remove unnecessary Union under Distinct like operators
[ https://issues.apache.org/jira/browse/SPARK-34474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34474: Assignee: Apache Spark (was: L. C. Hsieh) > Remove unnecessary Union under Distinct like operators > -- > > Key: SPARK-34474 > URL: https://issues.apache.org/jira/browse/SPARK-34474 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > > For a Union under Distinct-like operators, if its children are all the same, > we can keep just one of them and remove the Union.
[jira] [Commented] (SPARK-34474) Remove unnecessary Union under Distinct like operators
[ https://issues.apache.org/jira/browse/SPARK-34474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287320#comment-17287320 ] Apache Spark commented on SPARK-34474: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/31595 > Remove unnecessary Union under Distinct like operators > -- > > Key: SPARK-34474 > URL: https://issues.apache.org/jira/browse/SPARK-34474 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > For a Union under Distinct-like operators, if its children are all the same, > we can keep just one of them and remove the Union.
[jira] [Created] (SPARK-34474) Remove unnecessary Union under Distinct like operators
L. C. Hsieh created SPARK-34474: --- Summary: Remove unnecessary Union under Distinct like operators Key: SPARK-34474 URL: https://issues.apache.org/jira/browse/SPARK-34474 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh For a Union under Distinct-like operators, if its children are all the same, we can keep just one of them and remove the Union.
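The rule described here, keeping one child when all Union children are identical under a Distinct-like operator, can be sketched on a toy plan tree. Hypothetical Python, not the Catalyst rule itself; plan nodes are simple (operator, children) tuples:

```python
# Toy optimizer rule mirroring SPARK-34474: under a Distinct-like operator,
# Union(a, a, ..., a) over identical children can be replaced by one child,
# because deduplication makes the repeated branches redundant.
def simplify_distinct_union(node):
    op, children = node
    if op == "Distinct" and children and children[0][0] == "Union":
        branches = children[0][1]
        if branches and all(b == branches[0] for b in branches):
            return ("Distinct", [branches[0]])  # keep one branch, drop Union
    return node  # anything else is left untouched

same = ("Distinct", [("Union", [("Scan", "t"), ("Scan", "t")])])
diff = ("Distinct", [("Union", [("Scan", "t"), ("Scan", "u")])])
print(simplify_distinct_union(same))  # Union removed
print(simplify_distinct_union(diff))  # unchanged: children differ
```

Note the guard: the rewrite is only sound under the Distinct wrapper, since a bare Union of identical children is bag (multiset) semantics and duplicates would matter.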
[jira] [Updated] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
[ https://issues.apache.org/jira/browse/SPARK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34424: -- Fix Version/s: (was: 3.0.2) 3.0.3 > HiveOrcHadoopFsRelationSuite fails with seed 610710213676 > - > > Key: SPARK-34424 > URL: https://issues.apache.org/jira/browse/SPARK-34424 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.1, 3.0.3 > > > The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with: > {code:java} > == Results == > !== Correct Answer - 20 ==== Spark Answer - 20 == > struct struct > [1,1582-10-15] [1,1582-10-15] > [2,null] [2,null] > [3,1970-01-01] [3,1970-01-01] > [4,1681-08-06] [4,1681-08-06] > [5,1582-10-15] [5,1582-10-15] > [6,-12-31] [6,-12-31] > [7,0583-01-04] [7,0583-01-04] > [8,6077-03-04] [8,6077-03-04] > ![9,1582-10-06] [9,1582-10-15] > [10,1582-10-15] [10,1582-10-15] > [11,-12-31] [11,-12-31] > [12,9722-10-04] [12,9722-10-04] > [13,0243-12-19] [13,0243-12-19] > [14,-12-31] [14,-12-31] > [15,8743-01-24] [15,8743-01-24] > [16,1039-10-31] [16,1039-10-31] > [17,-12-31] [17,-12-31] > [18,1582-10-15] [18,1582-10-15] > [19,1582-10-15] [19,1582-10-15] > [20,1582-10-15] [20,1582-10-15] > {code}
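The single mismatched row in the test output above (1582-10-06 written, 1582-10-15 read back) falls inside the Julian-to-Gregorian cutover gap: in the hybrid calendar used by legacy Hive/ORC writers, October 5-14, 1582 do not exist, while the proleptic Gregorian calendar (used internally by Spark 3.x and by Python's datetime) accepts them. A small illustration; the clamp-forward rebase below is a simplification for demonstration, not Spark's exact rebase logic:

```python
from datetime import date

# 1582-10-06 is a valid date in the proleptic Gregorian calendar...
d = date(1582, 10, 6)
print(d.isoformat())

# ...but in the hybrid Julian/Gregorian calendar, October 5-14, 1582 were
# skipped: the day after 1582-10-04 is 1582-10-15. A rebase that clamps
# gap dates forward shows why the test could get 1582-10-15 back:
def rebase_to_hybrid(d):
    if date(1582, 10, 5) <= d <= date(1582, 10, 14):
        return date(1582, 10, 15)
    return d

print(rebase_to_hybrid(d))  # the gap date maps forward to 1582-10-15
```

This also explains why the failure only reproduces with particular random seeds: the generated date has to land in the ten-day gap.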
[jira] [Assigned] (SPARK-34468) Fix v2 ALTER TABLE .. RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-34468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34468: Assignee: Apache Spark > Fix v2 ALTER TABLE .. RENAME TO > --- > > Key: SPARK-34468 > URL: https://issues.apache.org/jira/browse/SPARK-34468 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > The v2 `ALTER TABLE .. RENAME TO` command should rename a table in-place > instead of moving it to the "root" namespace: > {code:scala} > sql("ALTER TABLE ns1.ns2.ns3.src_tbl RENAME TO dst_tbl") > sql(s"SHOW TABLES IN $catalog").show(false) > +-+-+---+ > |namespace|tableName|isTemporary| > +-+-+---+ > | |dst_tbl |false | > +-+-+---+ > {code}
[jira] [Assigned] (SPARK-34468) Fix v2 ALTER TABLE .. RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-34468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34468: Assignee: (was: Apache Spark) > Fix v2 ALTER TABLE .. RENAME TO > --- > > Key: SPARK-34468 > URL: https://issues.apache.org/jira/browse/SPARK-34468 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The v2 `ALTER TABLE .. RENAME TO` command should rename a table in-place > instead of moving it to the "root" namespace: > {code:scala} > sql("ALTER TABLE ns1.ns2.ns3.src_tbl RENAME TO dst_tbl") > sql(s"SHOW TABLES IN $catalog").show(false) > +-+-+---+ > |namespace|tableName|isTemporary| > +-+-+---+ > | |dst_tbl |false | > +-+-+---+ > {code}
[jira] [Commented] (SPARK-34468) Fix v2 ALTER TABLE .. RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-34468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287274#comment-17287274 ] Apache Spark commented on SPARK-34468: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31594 > Fix v2 ALTER TABLE .. RENAME TO > --- > > Key: SPARK-34468 > URL: https://issues.apache.org/jira/browse/SPARK-34468 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The v2 `ALTER TABLE .. RENAME TO` command should rename a table in-place > instead of moving it to the "root" namespace: > {code:scala} > sql("ALTER TABLE ns1.ns2.ns3.src_tbl RENAME TO dst_tbl") > sql(s"SHOW TABLES IN $catalog").show(false) > +-+-+---+ > |namespace|tableName|isTemporary| > +-+-+---+ > | |dst_tbl |false | > +-+-+---+ > {code}
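The buggy versus intended behavior for SPARK-34468, dropping the namespace versus renaming in place, can be shown with a toy identifier model. Hypothetical Python; Spark's real code works on v2 Identifier objects, and the `buggy` flag here only exists to contrast the two outcomes:

```python
# A table identifier is modeled as (namespace_tuple, table_name).
def rename_identifier(old, new_name, buggy=False):
    """Compute the destination identifier for RENAME TO."""
    namespace, _ = old
    if buggy:
        return ((), new_name)      # pre-fix behaviour: namespace is dropped,
                                   # so the table lands in the root namespace
    return (namespace, new_name)   # intended: rename within the namespace

src = (("ns1", "ns2", "ns3"), "src_tbl")
print(rename_identifier(src, "dst_tbl", buggy=True))  # root-namespace result
print(rename_identifier(src, "dst_tbl"))              # in-place result
```

The empty namespace in the buggy output corresponds to the blank `namespace` column in the SHOW TABLES listing quoted in the issue.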
[jira] [Assigned] (SPARK-34469) Ignore RegisterExecutor when SparkContext is stopped
[ https://issues.apache.org/jira/browse/SPARK-34469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-34469: - Assignee: Dongjoon Hyun > Ignore RegisterExecutor when SparkContext is stopped > > > Key: SPARK-34469 > URL: https://issues.apache.org/jira/browse/SPARK-34469 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major >
[jira] [Resolved] (SPARK-34469) Ignore RegisterExecutor when SparkContext is stopped
[ https://issues.apache.org/jira/browse/SPARK-34469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-34469. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31587 [https://github.com/apache/spark/pull/31587] > Ignore RegisterExecutor when SparkContext is stopped > > > Key: SPARK-34469 > URL: https://issues.apache.org/jira/browse/SPARK-34469 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > >
[jira] [Assigned] (SPARK-34283) Combines all adjacent 'Union' operators into a single 'Union' when using 'Dataset.union.distinct.union.distinct'
[ https://issues.apache.org/jira/browse/SPARK-34283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34283: --- Assignee: Zhichao Zhang > Combines all adjacent 'Union' operators into a single 'Union' when using > 'Dataset.union.distinct.union.distinct' > > > Key: SPARK-34283 > URL: https://issues.apache.org/jira/browse/SPARK-34283 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Major > Fix For: 3.2.0 > > Attachments: image-2021-01-29-11-12-44-112.png, > image-2021-01-29-11-13-42-055.png, image-2021-01-29-11-14-08-822.png, > image-2021-01-29-11-14-42-700.png > > > Problem: > Currently, when using 'Dataset.union.distinct.union.distinct' to union some > datasets, the Optimizer can't combine all adjacent 'Union' operators into a > single 'Union', though it can handle this case when using SQL. > For example: > !image-2021-01-29-11-12-44-112.png! > The 'Physical Plan' is shown below: > !image-2021-01-29-11-13-42-055.png! > But using SQL: > !image-2021-01-29-11-14-08-822.png! > The 'Physical Plan' is shown below: > !image-2021-01-29-11-14-42-700.png! > > Root cause: > When using 'Dataset.union.distinct.union.distinct', the operator is > 'Deduplicate(Keys, Union)', but AstBuilder transforms SQL 'Union' into the operator > 'Distinct(Union)'; the rule 'CombineUnions' in the Optimizer only handles the > 'Distinct(Union)' operator, not Deduplicate(Keys, Union). > >
[jira] [Resolved] (SPARK-34283) Combines all adjacent 'Union' operators into a single 'Union' when using 'Dataset.union.distinct.union.distinct'
[ https://issues.apache.org/jira/browse/SPARK-34283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34283. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31404 [https://github.com/apache/spark/pull/31404] > Combines all adjacent 'Union' operators into a single 'Union' when using > 'Dataset.union.distinct.union.distinct' > > > Key: SPARK-34283 > URL: https://issues.apache.org/jira/browse/SPARK-34283 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhichao Zhang >Priority: Major > Fix For: 3.2.0 > > Attachments: image-2021-01-29-11-12-44-112.png, > image-2021-01-29-11-13-42-055.png, image-2021-01-29-11-14-08-822.png, > image-2021-01-29-11-14-42-700.png > > > Problem: > Currently, when using 'Dataset.union.distinct.union.distinct' to union some > datasets, the Optimizer can't combine all adjacent 'Union' operators into a > single 'Union', though it can handle this case when using SQL. > For example: > !image-2021-01-29-11-12-44-112.png! > The 'Physical Plan' is shown below: > !image-2021-01-29-11-13-42-055.png! > But using SQL: > !image-2021-01-29-11-14-08-822.png! > The 'Physical Plan' is shown below: > !image-2021-01-29-11-14-42-700.png! > > Root cause: > When using 'Dataset.union.distinct.union.distinct', the operator is > 'Deduplicate(Keys, Union)', but AstBuilder transforms SQL 'Union' into the operator > 'Distinct(Union)'; the rule 'CombineUnions' in the Optimizer only handles the > 'Distinct(Union)' operator, not Deduplicate(Keys, Union). > >
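The root cause above can be modeled on a toy plan tree: a CombineUnions-style rule that matches not only Distinct(Union) but also Deduplicate(keys, Union) flattens the nested Unions. This is a hypothetical Python sketch, not the Catalyst rule; it also assumes the outer deduplication subsumes the inner one, which holds for the Dataset.union.distinct chain described in the issue:

```python
def flatten(children):
    """Inline nested Union children; an inner Distinct/Deduplicate over a
    Union can be unwrapped because the outer deduplication subsumes it
    (a simplification that matches the union.distinct chain here)."""
    out = []
    for c in children:
        if c[0] == "Union":
            out.extend(flatten(c[1]))
        elif c[0] in ("Distinct", "Deduplicate") and c[-1][0] == "Union":
            out.extend(flatten(c[-1][1]))
        else:
            out.append(c)
    return out

def combine_unions(plan):
    # The reported gap: matching only "Distinct" here left the
    # Deduplicate(keys, Union) shape produced by the Dataset API untouched.
    if plan[0] in ("Distinct", "Deduplicate") and plan[-1][0] == "Union":
        return plan[:-1] + (("Union", flatten(plan[-1][1])),)
    return plan

# Shape of df1.union(df2).distinct().union(df3).distinct():
plan = ("Deduplicate", ("k",),
        ("Union", [("Deduplicate", ("k",),
                    ("Union", [("Scan", "a"), ("Scan", "b")])),
                   ("Scan", "c")]))
print(combine_unions(plan))  # one Deduplicate over a single flat Union
```

With the wrapper check restricted to "Distinct", the same input would come back unchanged, which is the nested-Union physical plan the reporter observed.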
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287109#comment-17287109 ] Guillaume Martres commented on SPARK-25075: --- [~dongjoon] I think something is wrong with the published snapshots: they seem to depend on both Scala 2.12 and Scala 2.13 artifacts, leading to crashes at runtime. Indeed, if I look at [https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.13/3.2.0-SNAPSHOT/spark-parent_2.13-3.2.0-20210219.011324-25.pom] I see: 2.12.10. So I assume a config file wasn't updated somewhere. > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, MLlib, Project Infra, Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone.
[jira] [Assigned] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34473: Assignee: Apache Spark > avoid NPE in DataFrameReader.schema(StructType) > --- > > Key: SPARK-34473 > URL: https://issues.apache.org/jira/browse/SPARK-34473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287096#comment-17287096 ] Apache Spark commented on SPARK-34473: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/31593 > avoid NPE in DataFrameReader.schema(StructType) > --- > > Key: SPARK-34473 > URL: https://issues.apache.org/jira/browse/SPARK-34473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Assigned] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34473: Assignee: (was: Apache Spark) > avoid NPE in DataFrameReader.schema(StructType) > --- > > Key: SPARK-34473 > URL: https://issues.apache.org/jira/browse/SPARK-34473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Created] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
Wenchen Fan created SPARK-34473: --- Summary: avoid NPE in DataFrameReader.schema(StructType) Key: SPARK-34473 URL: https://issues.apache.org/jira/browse/SPARK-34473 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Wenchen Fan
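The fix for SPARK-34473 amounts to failing fast on a null schema with a clear message rather than letting a NullPointerException surface later. A hypothetical Python analogue (Spark's reader is Scala; the class name and message here are invented for illustration):

```python
class DataFrameReaderSketch:
    """Toy stand-in for a reader builder that validates its inputs eagerly."""

    def __init__(self):
        self.user_specified_schema = None

    def schema(self, schema):
        # Reject None at the call site, where the mistake is visible,
        # instead of deferring to a confusing failure deep in planning.
        if schema is None:
            raise ValueError("schema cannot be null")
        self.user_specified_schema = schema
        return self  # builder style: allow chaining

reader = DataFrameReaderSketch()
try:
    reader.schema(None)
except ValueError as e:
    print("rejected eagerly:", e)
```

Eager validation in builder-style APIs keeps the stack trace pointing at the caller's bug, which is the general pattern behind "avoid NPE" fixes like this one.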
[jira] [Assigned] (SPARK-28123) String Functions: Add support btrim
[ https://issues.apache.org/jira/browse/SPARK-28123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28123: --- Assignee: jiaan.geng > String Functions: Add support btrim > --- > > Key: SPARK-28123 > URL: https://issues.apache.org/jira/browse/SPARK-28123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: jiaan.geng >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{btrim(_{{string}}_}}{{bytea}}{{, > _{{bytes}}_}}{{bytea}}{{)}}|{{bytea}}|Remove the longest string containing > only bytes appearing in _{{bytes}}_from the start and end of > _{{string}}_|{{btrim('\000trim\001'::bytea, '\000\001'::bytea)}}|{{trim}}| > More details: https://www.postgresql.org/docs/11/functions-binarystring.html
[jira] [Resolved] (SPARK-28123) String Functions: Add support btrim
[ https://issues.apache.org/jira/browse/SPARK-28123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28123. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31390 [https://github.com/apache/spark/pull/31390] > String Functions: Add support btrim > --- > > Key: SPARK-28123 > URL: https://issues.apache.org/jira/browse/SPARK-28123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: jiaan.geng >Priority: Major > Fix For: 3.2.0 > > > ||Function||Return Type||Description||Example||Result|| > |{{btrim(_{{string}}_}}{{bytea}}{{, > _{{bytes}}_}}{{bytea}}{{)}}|{{bytea}}|Remove the longest string containing > only bytes appearing in _{{bytes}}_from the start and end of > _{{string}}_|{{btrim('\000trim\001'::bytea, '\000\001'::bytea)}}|{{trim}}| > More details: https://www.postgresql.org/docs/11/functions-binarystring.html
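The btrim semantics quoted in the table above, removing the longest run of characters drawn from a given set from both ends of a string, match Python's str.strip with an argument, which makes the PostgreSQL example easy to check independently of Spark:

```python
def btrim(s, chars=None):
    """Model of btrim: strip the longest leading/trailing run of characters
    that appear in `chars` (whitespace when `chars` is omitted). This mirrors
    the PostgreSQL semantics described in the issue; it is not Spark's code."""
    return s.strip(chars) if chars is not None else s.strip()

# The PostgreSQL example from the table: btrim('\000trim\001', '\000\001')
print(btrim("\x00trim\x01", "\x00\x01"))  # trim
print(btrim("  spark  "))                 # spark
```

Note that the second argument is a set of characters, not a prefix/suffix string: every leading or trailing character that belongs to the set is removed, regardless of order.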
[jira] [Resolved] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
[ https://issues.apache.org/jira/browse/SPARK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34424. - Fix Version/s: 3.1.1 3.0.2 Assignee: Maxim Gekk Resolution: Fixed > HiveOrcHadoopFsRelationSuite fails with seed 610710213676 > - > > Key: SPARK-34424 > URL: https://issues.apache.org/jira/browse/SPARK-34424 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.2, 3.1.1 > > > The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with: > {code:java} > == Results == > !== Correct Answer - 20 ==== Spark Answer - 20 == > struct struct > [1,1582-10-15] [1,1582-10-15] > [2,null] [2,null] > [3,1970-01-01] [3,1970-01-01] > [4,1681-08-06] [4,1681-08-06] > [5,1582-10-15] [5,1582-10-15] > [6,-12-31] [6,-12-31] > [7,0583-01-04] [7,0583-01-04] > [8,6077-03-04] [8,6077-03-04] > ![9,1582-10-06] [9,1582-10-15] > [10,1582-10-15] [10,1582-10-15] > [11,-12-31] [11,-12-31] > [12,9722-10-04] [12,9722-10-04] > [13,0243-12-19] [13,0243-12-19] > [14,-12-31] [14,-12-31] > [15,8743-01-24] [15,8743-01-24] > [16,1039-10-31] [16,1039-10-31] > [17,-12-31] [17,-12-31] > [18,1582-10-15] [18,1582-10-15] > [19,1582-10-15] [19,1582-10-15] > [20,1582-10-15] [20,1582-10-15] > {code}
[jira] [Commented] (SPARK-34421) Custom functions can't be used in temporary views with CTEs
[ https://issues.apache.org/jira/browse/SPARK-34421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287057#comment-17287057 ] Apache Spark commented on SPARK-34421: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/31592 > Custom functions can't be used in temporary views with CTEs > --- > > Key: SPARK-34421 > URL: https://issues.apache.org/jira/browse/SPARK-34421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Assignee: Peter Toth >Priority: Blocker > Fix For: 3.1.1 > > > The following query works in Spark 3.0 not Spark 3.1. > > Start with: > {{spark.udf.registerJavaFunction("custom_func", > "com.stuff.path.custom_func", LongType())}} > > Works: * {{select custom_func()}} > * {{create temporary view blaah as select custom_func()}} > * {{with step_1 as ( select custom_func() ) select * from step_1}} > Broken: > {{create temporary view blaah as with step_1 as ( select custom_func() ) > select * from step_1}} > > followed by: > {{select * from blaah}} > > Error: > {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF > '}}{{com.stuff.path.custom_func}}{{';}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287008#comment-17287008 ] Apache Spark commented on SPARK-34472: -- User 'shardulm94' has created a pull request for this issue: https://github.com/apache/spark/pull/31591 > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. 
> {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
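The proposed fix ("ship the ivySettings file to the driver") follows a common distribution pattern: copy the client-local file into the material that travels with the application, then load it from a driver-local path. A minimal, Spark-free sketch of that pattern, with hypothetical file and function names:

```python
import os
import shutil
import tempfile

def ship_settings(client_path: str, driver_work_dir: str) -> str:
    """Copy a client-local settings file into the directory distributed
    to the driver; return the driver-local path to load instead."""
    shipped = os.path.join(driver_work_dir, os.path.basename(client_path))
    shutil.copy(client_path, shipped)
    return shipped

# Simulate a settings file that exists only on the "client" machine.
with tempfile.TemporaryDirectory() as client, tempfile.TemporaryDirectory() as driver:
    src = os.path.join(client, "ivySettings.xml")
    with open(src, "w") as f:
        f.write("<ivysettings/>")
    local = ship_settings(src, driver)
    # The driver-side load now succeeds where the client path would not.
    assert os.path.exists(local)
```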
[jira] [Assigned] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34472: Assignee: Apache Spark > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Assignee: Apache Spark >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. > {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. 
[jira] [Assigned] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34472: Assignee: (was: Apache Spark) > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. > {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. 
[jira] [Commented] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287009#comment-17287009 ] Apache Spark commented on SPARK-34472: -- User 'shardulm94' has created a pull request for this issue: https://github.com/apache/spark/pull/31591 > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. 
> {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34421) Custom functions can't be used in temporary views with CTEs
[ https://issues.apache.org/jira/browse/SPARK-34421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34421: --- Assignee: Peter Toth > Custom functions can't be used in temporary views with CTEs > --- > > Key: SPARK-34421 > URL: https://issues.apache.org/jira/browse/SPARK-34421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Assignee: Peter Toth >Priority: Blocker > Fix For: 3.1.1 > > > Works in DBR 7.4, which is Spark 3.0.1. Breaks in DBR8.0(beta), which is > Spark 3.1. > > Start with: > {{spark.udf.registerJavaFunction("custom_func", "com.stuff.path.custom_func", > LongType())}} > > Works: * {{select custom_func()}} > * {{create temporary view blaah as select custom_func()}} > * {{with step_1 as ( select custom_func() ) select * from step_1}} > Broken: > {{create temporary view blaah as with step_1 as ( select custom_func() ) > select * from step_1}} > > followed by: > {{select * from blaah}} > > Error: > {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF > '}}{{com.stuff.path.custom_func}}{{';}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34421) Custom functions can't be used in temporary views with CTEs
[ https://issues.apache.org/jira/browse/SPARK-34421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-34421: Description: The following query works in Spark 3.0 not Spark 3.1. Start with: {{spark.udf.registerJavaFunction("custom_func", "com.stuff.path.custom_func", LongType())}} Works: * {{select custom_func()}} * {{create temporary view blaah as select custom_func()}} * {{with step_1 as ( select custom_func() ) select * from step_1}} Broken: {{create temporary view blaah as with step_1 as ( select custom_func() ) select * from step_1}} followed by: {{select * from blaah}} Error: {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF '}}{{com.stuff.path.custom_func}}{{';}} was: Works in DBR 7.4, which is Spark 3.0.1. Breaks in DBR8.0(beta), which is Spark 3.1. Start with: {{spark.udf.registerJavaFunction("custom_func", "com.stuff.path.custom_func", LongType())}} Works: * {{select custom_func()}} * {{create temporary view blaah as select custom_func()}} * {{with step_1 as ( select custom_func() ) select * from step_1}} Broken: {{create temporary view blaah as with step_1 as ( select custom_func() ) select * from step_1}} followed by: {{select * from blaah}} Error: {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF '}}{{com.stuff.path.custom_func}}{{';}} > Custom functions can't be used in temporary views with CTEs > --- > > Key: SPARK-34421 > URL: https://issues.apache.org/jira/browse/SPARK-34421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Assignee: Peter Toth >Priority: Blocker > Fix For: 3.1.1 > > > The following query works in Spark 3.0 not Spark 3.1. 
> > Start with: > {{spark.udf.registerJavaFunction("custom_func", > "com.stuff.path.custom_func", LongType())}} > > Works: * {{select custom_func()}} > * {{create temporary view blaah as select custom_func()}} > * {{with step_1 as ( select custom_func() ) select * from step_1}} > Broken: > {{create temporary view blaah as with step_1 as ( select custom_func() ) > select * from step_1}} > > followed by: > {{select * from blaah}} > > Error: > {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF > '}}{{com.stuff.path.custom_func}}{{';}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34421) Custom functions can't be used in temporary views with CTEs
[ https://issues.apache.org/jira/browse/SPARK-34421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34421. - Fix Version/s: 3.1.1 Resolution: Fixed Issue resolved by pull request 31550 [https://github.com/apache/spark/pull/31550] > Custom functions can't be used in temporary views with CTEs > --- > > Key: SPARK-34421 > URL: https://issues.apache.org/jira/browse/SPARK-34421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 > Environment: Databricks Runtime 8.0 >Reporter: Lauri Koobas >Priority: Blocker > Fix For: 3.1.1 > > > Works in DBR 7.4, which is Spark 3.0.1. Breaks in DBR8.0(beta), which is > Spark 3.1. > > Start with: > {{spark.udf.registerJavaFunction("custom_func", "com.stuff.path.custom_func", > LongType())}} > > Works: * {{select custom_func()}} > * {{create temporary view blaah as select custom_func()}} > * {{with step_1 as ( select custom_func() ) select * from step_1}} > Broken: > {{create temporary view blaah as with step_1 as ( select custom_func() ) > select * from step_1}} > > followed by: > {{select * from blaah}} > > Error: > {{Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF > '}}{{com.stuff.path.custom_func}}{{';}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
[ https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286977#comment-17286977 ] Shardul Mahadik commented on SPARK-34472: - I will be sending a PR for this soon. > SparkContext.addJar with an ivy path fails in cluster mode with a custom > ivySettings file > - > > Key: SPARK-34472 > URL: https://issues.apache.org/jira/browse/SPARK-34472 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Shardul Mahadik >Priority: Major > > SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL > {{ADD JAR}}. If we use a custom ivySettings file using > {{spark.jars.ivySettings}}, it is loaded at > [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] > However, this file is only accessible on the client machine. In cluster > mode, this file is not available on the driver and so {{addJar}} fails. 
> {code:sh} > spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample > --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar > {code} > {code} > java.lang.IllegalArgumentException: requirement failed: Ivy settings file > /path/to/ivySettings.xml does not exist > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) > at > org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) > at > org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) > at > org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) > {code} > We should ship the ivySettings file to the driver so that {{addJar}} is able > to find it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file
Shardul Mahadik created SPARK-34472: --- Summary: SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file Key: SPARK-34472 URL: https://issues.apache.org/jira/browse/SPARK-34472 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.0 Reporter: Shardul Mahadik SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL {{ADD JAR}}. If we use a custom ivySettings file using {{spark.jars.ivySettings}}, it is loaded at [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.] However, this file is only accessible on the client machine. In cluster mode, this file is not available on the driver and so {{addJar}} fails. {code:sh} spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar {code} {code} java.lang.IllegalArgumentException: requirement failed: Ivy settings file /path/to/ivySettings.xml does not exist at scala.Predef$.require(Predef.scala:281) at org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331) at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176) at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156) at org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166) at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133) at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40) {code} We should ship the ivySettings file to the driver so that {{addJar}} is able to find it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286953#comment-17286953 ] Apache Spark commented on SPARK-34471: -- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/31590 > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Priority: Major > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34471: Assignee: (was: Apache Spark) > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Priority: Major > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34471: Assignee: Apache Spark > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Assignee: Apache Spark >Priority: Major > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
[ https://issues.apache.org/jira/browse/SPARK-34471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286951#comment-17286951 ] Apache Spark commented on SPARK-34471: -- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/31590 > Document DataStreamReader/Writer table APIs in Structured Streaming > Programming Guide > - > > Key: SPARK-34471 > URL: https://issues.apache.org/jira/browse/SPARK-34471 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 3.1.1 >Reporter: Bo Zhang >Priority: Major > > We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 > and SPARK-33836. > We need to update the Structured Streaming Programming Guide with the changes > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
Bo Zhang created SPARK-34471: Summary: Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide Key: SPARK-34471 URL: https://issues.apache.org/jira/browse/SPARK-34471 Project: Spark Issue Type: Documentation Components: Documentation, Structured Streaming Affects Versions: 3.1.1 Reporter: Bo Zhang We added APIs to enable read/write with tables in SPARK-32885, SPARK-32896 and SPARK-33836. We need to update the Structured Streaming Programming Guide with the changes above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34314) Wrong discovered partition value
[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34314: --- Assignee: Maxim Gekk > Wrong discovered partition value > > > Key: SPARK-34314 > URL: https://issues.apache.org/jira/browse/SPARK-34314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > The example below portraits the issue: > {code:scala} > val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") > df.write > .partitionBy("part") > .format("parquet") > .save(path) > val readback = spark.read.parquet(path) > readback.printSchema() > readback.show(false) > {code} > It write the partition value as string: > {code} > /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d > ├── _SUCCESS > ├── part=-0 > │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > └── part=AA > └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > {code} > *"-0"* and "AA". > but when Spark reads data back, it transforms "-0" to "0" > {code} > root > |-- id: integer (nullable = true) > |-- part: string (nullable = true) > +---++ > |id |part| > +---++ > |0 |AA | > |1 |0 | > +---++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34314) Wrong discovered partition value
[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34314. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31549 [https://github.com/apache/spark/pull/31549] > Wrong discovered partition value > > > Key: SPARK-34314 > URL: https://issues.apache.org/jira/browse/SPARK-34314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > The example below portraits the issue: > {code:scala} > val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") > df.write > .partitionBy("part") > .format("parquet") > .save(path) > val readback = spark.read.parquet(path) > readback.printSchema() > readback.show(false) > {code} > It write the partition value as string: > {code} > /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d > ├── _SUCCESS > ├── part=-0 > │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > └── part=AA > └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > {code} > *"-0"* and "AA". > but when Spark reads data back, it transforms "-0" to "0" > {code} > root > |-- id: integer (nullable = true) > |-- part: string (nullable = true) > +---++ > |id |part| > +---++ > |0 |AA | > |1 |0 | > +---++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
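The behavior reported above stems from partition-value type inference: the directory name "-0" parses as an integer, and rendering that integer back as text loses the original form. A minimal Python illustration of the effect (not Spark's actual inference code):

```python
def infer_partition_value(raw: str):
    """Mimic naive partition-type inference: try integer first,
    falling back to the raw string."""
    try:
        return int(raw)
    except ValueError:
        return raw

print(infer_partition_value("AA"))  # stays the string 'AA'
print(infer_partition_value("-0"))  # parses to the integer 0, so '-0' is lost
```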
[jira] [Commented] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
[ https://issues.apache.org/jira/browse/SPARK-34424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286926#comment-17286926 ] Apache Spark commented on SPARK-34424: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31589 > HiveOrcHadoopFsRelationSuite fails with seed 610710213676 > - > > Key: SPARK-34424 > URL: https://issues.apache.org/jira/browse/SPARK-34424 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Maxim Gekk >Priority: Major > > The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with: > {code:java} > == Results == > !== Correct Answer - 20 ==== Spark Answer - 20 == > struct struct > [1,1582-10-15] [1,1582-10-15] > [2,null] [2,null] > [3,1970-01-01] [3,1970-01-01] > [4,1681-08-06] [4,1681-08-06] > [5,1582-10-15] [5,1582-10-15] > [6,-12-31] [6,-12-31] > [7,0583-01-04] [7,0583-01-04] > [8,6077-03-04] [8,6077-03-04] > ![9,1582-10-06] [9,1582-10-15] > [10,1582-10-15] [10,1582-10-15] > [11,-12-31] [11,-12-31] > [12,9722-10-04] [12,9722-10-04] > [13,0243-12-19] [13,0243-12-19] > [14,-12-31] [14,-12-31] > [15,8743-01-24] [15,8743-01-24] > [16,1039-10-31] [16,1039-10-31] > [17,-12-31] [17,-12-31] > [18,1582-10-15] [18,1582-10-15] > [19,1582-10-15] [19,1582-10-15] > [20,1582-10-15] [20,1582-10-15] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21770) ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions
[ https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286916#comment-17286916 ] Weichen Xu commented on SPARK-21770: [~rishi-aga] Could you create a new ticket for this with reproducing code? We should find the root cause of why it generates all-zero probabilities. > ProbabilisticClassificationModel: Improve normalization of all-zero raw > predictions > --- > > Key: SPARK-21770 > URL: https://issues.apache.org/jira/browse/SPARK-21770 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.3.0 >Reporter: Siddharth Murching >Assignee: Weichen Xu >Priority: Minor > Fix For: 2.3.0 > > > Given an n-element raw prediction vector of all-zeros, > ProbabilisticClassifierModel.normalizeToProbabilitiesInPlace() should output > a probability vector of all-equal 1/n entries -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
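The contract described in the issue (an all-zero raw-prediction vector should normalize to a uniform 1/n probability vector) can be sketched as follows; this illustrates the expected behavior, not the MLlib implementation:

```python
def normalize_to_probabilities(raw):
    """Normalize non-negative raw scores into probabilities; an
    all-zero vector maps to the uniform distribution 1/n."""
    total = sum(raw)
    n = len(raw)
    if total == 0.0:
        return [1.0 / n] * n
    return [v / total for v in raw]

print(normalize_to_probabilities([0.0, 0.0, 0.0, 0.0]))  # [0.25, 0.25, 0.25, 0.25]
print(normalize_to_probabilities([1.0, 3.0]))            # [0.25, 0.75]
```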