[jira] [Commented] (SPARK-37578) DSV2 is not updating Output Metrics
[ https://issues.apache.org/jira/browse/SPARK-37578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480360#comment-17480360 ]

Apache Spark commented on SPARK-37578:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35277

> DSV2 is not updating Output Metrics
> -----------------------------------
>
> Key: SPARK-37578
> URL: https://issues.apache.org/jira/browse/SPARK-37578
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Sandeep Katta
> Assignee: L. C. Hsieh
> Priority: Major
> Fix For: 3.3.0
>
> Repro code:
> ./bin/spark-shell --master local --jars /Users/jars/iceberg-spark3-runtime-0.12.1.jar
>
> {code:java}
> import scala.collection.mutable
> import org.apache.spark.scheduler._
>
> val bytesWritten = new mutable.ArrayBuffer[Long]()
> val recordsWritten = new mutable.ArrayBuffer[Long]()
> val bytesWrittenListener = new SparkListener() {
>   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
>     bytesWritten += taskEnd.taskMetrics.outputMetrics.bytesWritten
>     recordsWritten += taskEnd.taskMetrics.outputMetrics.recordsWritten
>   }
> }
> spark.sparkContext.addSparkListener(bytesWrittenListener)
>
> try {
>   val df = spark.range(1000).toDF("id")
>   df.write.format("iceberg").save("Users/data/dsv2_test")
>
>   assert(bytesWritten.sum > 0)
>   assert(recordsWritten.sum > 0)
> } finally {
>   spark.sparkContext.removeSparkListener(bytesWrittenListener)
> }
> {code}
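For context, the listener pattern in the repro does work with Spark's built-in v1 file sinks; the point of the ticket is that a DSV2 sink left the metrics at zero. A self-contained sketch of the same check against parquet (assumptions: a local spark-shell session, a writable /tmp, and a crude sleep standing in for draining the asynchronous listener bus):

{code:java}
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

val recordsWritten = new mutable.ArrayBuffer[Long]()
val listener = new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    recordsWritten += taskEnd.taskMetrics.outputMetrics.recordsWritten
  }
}
spark.sparkContext.addSparkListener(listener)
try {
  // A v1 sink such as parquet reports output metrics; per this ticket,
  // the DSV2 write path did not.
  spark.range(1000).toDF("id").write.mode("overwrite").parquet("/tmp/dsv1_metrics_test")
  Thread.sleep(1000)  // crude: give the async listener bus time to deliver events
  assert(recordsWritten.sum == 1000)
} finally {
  spark.sparkContext.removeSparkListener(listener)
}
{code}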
[jira] [Assigned] (SPARK-37985) Fix flaky test SPARK-37578
[ https://issues.apache.org/jira/browse/SPARK-37985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37985:
------------------------------------

    Assignee: Apache Spark

> Fix flaky test SPARK-37578
> ---------------------------
>
> Key: SPARK-37985
> URL: https://issues.apache.org/jira/browse/SPARK-37985
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Assignee: Apache Spark
> Priority: Major
>
> 2022-01-22T01:58:29.8444339Z [info] - SPARK-36030: Report metrics from Datasource v2 write (90 milliseconds)
> 2022-01-22T01:58:29.9427049Z [info] - SPARK-37578: Update output metrics from Datasource v2 *** FAILED *** (65 milliseconds)
> 2022-01-22T01:58:29.9428038Z [info]   123 did not equal 246 (SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9428531Z [info]   org.scalatest.exceptions.TestFailedException:
> 2022-01-22T01:58:29.9429101Z [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 2022-01-22T01:58:29.9429717Z [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 2022-01-22T01:58:29.9430298Z [info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 2022-01-22T01:58:29.9430840Z [info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 2022-01-22T01:58:29.9431512Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9432305Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
> 2022-01-22T01:58:29.9432982Z [info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
> 2022-01-22T01:58:29.9433695Z [info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9434276Z [info]   at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
> 2022-01-22T01:58:29.9435040Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9435764Z [info]   at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9436354Z [info]   at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
> 2022-01-22T01:58:29.9437063Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9437851Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)
[jira] [Assigned] (SPARK-37985) Fix flaky test SPARK-37578
[ https://issues.apache.org/jira/browse/SPARK-37985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37985:
------------------------------------

    Assignee: (was: Apache Spark)

> Fix flaky test SPARK-37578
> ---------------------------
>
> Key: SPARK-37985
> URL: https://issues.apache.org/jira/browse/SPARK-37985
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Priority: Major
>
> 2022-01-22T01:58:29.8444339Z [info] - SPARK-36030: Report metrics from Datasource v2 write (90 milliseconds)
> 2022-01-22T01:58:29.9427049Z [info] - SPARK-37578: Update output metrics from Datasource v2 *** FAILED *** (65 milliseconds)
> 2022-01-22T01:58:29.9428038Z [info]   123 did not equal 246 (SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9428531Z [info]   org.scalatest.exceptions.TestFailedException:
> 2022-01-22T01:58:29.9429101Z [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 2022-01-22T01:58:29.9429717Z [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 2022-01-22T01:58:29.9430298Z [info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 2022-01-22T01:58:29.9430840Z [info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 2022-01-22T01:58:29.9431512Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9432305Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
> 2022-01-22T01:58:29.9432982Z [info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
> 2022-01-22T01:58:29.9433695Z [info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9434276Z [info]   at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
> 2022-01-22T01:58:29.9435040Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9435764Z [info]   at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9436354Z [info]   at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
> 2022-01-22T01:58:29.9437063Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9437851Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)
[jira] [Commented] (SPARK-37985) Fix flaky test SPARK-37578
[ https://issues.apache.org/jira/browse/SPARK-37985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480359#comment-17480359 ]

Apache Spark commented on SPARK-37985:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35277

> Fix flaky test SPARK-37578
> ---------------------------
>
> Key: SPARK-37985
> URL: https://issues.apache.org/jira/browse/SPARK-37985
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Priority: Major
>
> 2022-01-22T01:58:29.8444339Z [info] - SPARK-36030: Report metrics from Datasource v2 write (90 milliseconds)
> 2022-01-22T01:58:29.9427049Z [info] - SPARK-37578: Update output metrics from Datasource v2 *** FAILED *** (65 milliseconds)
> 2022-01-22T01:58:29.9428038Z [info]   123 did not equal 246 (SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9428531Z [info]   org.scalatest.exceptions.TestFailedException:
> 2022-01-22T01:58:29.9429101Z [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 2022-01-22T01:58:29.9429717Z [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 2022-01-22T01:58:29.9430298Z [info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 2022-01-22T01:58:29.9430840Z [info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 2022-01-22T01:58:29.9431512Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9432305Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
> 2022-01-22T01:58:29.9432982Z [info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
> 2022-01-22T01:58:29.9433695Z [info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9434276Z [info]   at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
> 2022-01-22T01:58:29.9435040Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9435764Z [info]   at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9436354Z [info]   at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
> 2022-01-22T01:58:29.9437063Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9437851Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)
[jira] [Created] (SPARK-37985) Fix flaky test SPARK-37578
angerszhu created SPARK-37985:
---------------------------------

Summary: Fix flaky test SPARK-37578
Key: SPARK-37985
URL: https://issues.apache.org/jira/browse/SPARK-37985
Project: Spark
Issue Type: Task
Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu

2022-01-22T01:58:29.8444339Z [info] - SPARK-36030: Report metrics from Datasource v2 write (90 milliseconds)
2022-01-22T01:58:29.9427049Z [info] - SPARK-37578: Update output metrics from Datasource v2 *** FAILED *** (65 milliseconds)
2022-01-22T01:58:29.9428038Z [info]   123 did not equal 246 (SQLAppStatusListenerSuite.scala:936)
2022-01-22T01:58:29.9428531Z [info]   org.scalatest.exceptions.TestFailedException:
2022-01-22T01:58:29.9429101Z [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
2022-01-22T01:58:29.9429717Z [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
2022-01-22T01:58:29.9430298Z [info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
2022-01-22T01:58:29.9430840Z [info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
2022-01-22T01:58:29.9431512Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
2022-01-22T01:58:29.9432305Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
2022-01-22T01:58:29.9432982Z [info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
2022-01-22T01:58:29.9433695Z [info]   at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
2022-01-22T01:58:29.9434276Z [info]   at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
2022-01-22T01:58:29.9435040Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
2022-01-22T01:58:29.9435764Z [info]   at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
2022-01-22T01:58:29.9436354Z [info]   at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
2022-01-22T01:58:29.9437063Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
2022-01-22T01:58:29.9437851Z [info]   at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)
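A general note on this class of flakiness: listener-based assertions race the asynchronous listener bus. A minimal sketch of the usual remedy inside Spark's own test suites ({{listenerBus}} is {{private[spark]}}, so this compiles only within Spark's test code; the timeout and the expected value are placeholders, and this is not necessarily the approach PR 35277 takes):

{code:java}
// Drain pending events so every onTaskEnd has reached the listener
// before asserting on the collected metrics (internal, test-only API).
spark.sparkContext.listenerBus.waitUntilEmpty(10000L)
assert(recordsWritten.sum == expectedRecords)  // `expectedRecords` is hypothetical
{code}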
[jira] [Assigned] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.
[ https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37984:
------------------------------------

    Assignee: Apache Spark

> Avoid calculating all outstanding requests to improve performance.
> -------------------------------------------------------------------
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
> Reporter: weixiuli
> Assignee: Apache Spark
> Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.
[jira] [Assigned] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.
[ https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37984:
------------------------------------

    Assignee: (was: Apache Spark)

> Avoid calculating all outstanding requests to improve performance.
> -------------------------------------------------------------------
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
> Reporter: weixiuli
> Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.
[jira] [Commented] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.
[ https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480351#comment-17480351 ]

Apache Spark commented on SPARK-37984:
--------------------------------------

User 'weixiuli' has created a pull request for this issue:
https://github.com/apache/spark/pull/35276

> Avoid calculating all outstanding requests to improve performance.
> -------------------------------------------------------------------
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
> Reporter: weixiuli
> Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.
[jira] [Updated] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.
[ https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weixiuli updated SPARK-37984:
-----------------------------

    Description: Avoid calculating all outstanding requests to improve performance.  (was: Avoid computing all outstanding requests to improve performance.)
        Summary: Avoid calculating all outstanding requests to improve performance.  (was: Avoid computing all outstanding requests to improve performance.)

> Avoid calculating all outstanding requests to improve performance.
> -------------------------------------------------------------------
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
> Reporter: weixiuli
> Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.
[jira] [Created] (SPARK-37984) Avoid computing all outstanding requests to improve performance.
weixiuli created SPARK-37984:
---------------------------------

Summary: Avoid computing all outstanding requests to improve performance.
Key: SPARK-37984
URL: https://issues.apache.org/jira/browse/SPARK-37984
Project: Spark
Issue Type: Improvement
Components: Shuffle, Spark Core
Affects Versions: 3.2.0, 3.1.2, 3.1.0, 3.0.3, 3.0.1, 3.0.0
Reporter: weixiuli

Avoid computing all outstanding requests to improve performance.
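The ticket text is terse, so here is an illustrative sketch only (not the actual patch in PR 35276) of the general optimization the title names: keep a running counter that is bumped as requests start and finish, instead of summing the sizes of several outstanding-request maps on every query:

{code:java}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical tracker: O(1) reads instead of summing several concurrent
// map sizes each time the number of outstanding requests is needed.
class OutstandingRequestTracker {
  private val outstanding = new AtomicLong(0L)
  def requestStarted(): Unit = outstanding.incrementAndGet()
  def requestFinished(): Unit = outstanding.decrementAndGet()
  def numOutstandingRequests: Long = outstanding.get()
}
{code}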
[jira] [Commented] (SPARK-37731) refactor and cleanup function lookup in Analyzer
[ https://issues.apache.org/jira/browse/SPARK-37731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480323#comment-17480323 ]

Apache Spark commented on SPARK-37731:
--------------------------------------

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/35275

> refactor and cleanup function lookup in Analyzer
> -------------------------------------------------
>
> Key: SPARK-37731
> URL: https://issues.apache.org/jira/browse/SPARK-37731
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Fix For: 3.3.0
[jira] [Commented] (SPARK-37982) Use error classes in the execution errors related to unsupported input type
[ https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480309#comment-17480309 ]

Apache Spark commented on SPARK-37982:
--------------------------------------

User 'leesf' has created a pull request for this issue:
https://github.com/apache/spark/pull/35274

> Use error classes in the execution errors related to unsupported input type
> ----------------------------------------------------------------------------
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: leesf
> Priority: Major
> Fix For: 3.3.0
[jira] [Assigned] (SPARK-37982) Use error classes in the execution errors related to unsupported input type
[ https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37982:
------------------------------------

    Assignee: Apache Spark

> Use error classes in the execution errors related to unsupported input type
> ----------------------------------------------------------------------------
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: leesf
> Assignee: Apache Spark
> Priority: Major
> Fix For: 3.3.0
[jira] [Assigned] (SPARK-37982) Use error classes in the execution errors related to unsupported input type
[ https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37982:
------------------------------------

    Assignee: (was: Apache Spark)

> Use error classes in the execution errors related to unsupported input type
> ----------------------------------------------------------------------------
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: leesf
> Priority: Major
> Fix For: 3.3.0
[jira] [Commented] (SPARK-37982) Use error classes in the execution errors related to unsupported input type
[ https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480308#comment-17480308 ]

Apache Spark commented on SPARK-37982:
--------------------------------------

User 'leesf' has created a pull request for this issue:
https://github.com/apache/spark/pull/35274

> Use error classes in the execution errors related to unsupported input type
> ----------------------------------------------------------------------------
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: leesf
> Priority: Major
> Fix For: 3.3.0
[jira] [Resolved] (SPARK-37981) Deletes columns with all Null as default.
[ https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maciej Szymkiewicz resolved SPARK-37981.
----------------------------------------
    Resolution: Duplicate

> Deletes columns with all Null as default.
> ------------------------------------------
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: json_null.json
>
> Spark 3.2.1-RC2
> During write.json, Spark deletes columns with all Null as default.
>
> Spark does have dropFieldIfAllNull false as default, according to https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>
> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>
> def get_spark_session(app_name: str, conf: SparkConf):
>     conf.setMaster('local[*]')
>     conf \
>         .set('spark.driver.memory', '64g') \
>         .set("fs.s3a.access.key", "minio") \
>         .set("fs.s3a.secret.key", "") \
>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>         .set("spark.sql.adaptive.enabled", "True") \
>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>         .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>         .set("sc.setLogLevel", "error")
>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>
> spark = get_spark_session("Falk", SparkConf())
>
> d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
>
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}
[jira] [Commented] (SPARK-18591) Replace hash-based aggregates with sort-based ones if inputs already sorted
[ https://issues.apache.org/jira/browse/SPARK-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480300#comment-17480300 ]

Cheng Su commented on SPARK-18591:
----------------------------------

Just FYI, this Jira should be fixed by https://issues.apache.org/jira/browse/SPARK-37455 . The related code is merged already and should be released in the next Spark release - Spark 3.3.0.

> Replace hash-based aggregates with sort-based ones if inputs already sorted
> ----------------------------------------------------------------------------
>
> Key: SPARK-18591
> URL: https://issues.apache.org/jira/browse/SPARK-18591
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Takeshi Yamamuro
> Priority: Major
> Labels: bulk-closed
>
> Spark currently uses sort-based aggregates only in a limited condition: the cases where Spark cannot use partial aggregates and hash-based ones. However, if the input ordering already satisfies the requirements of sort-based aggregates, it seems sort-based ones are faster than the others.
> {code}
> ./bin/spark-shell --conf spark.sql.shuffle.partitions=1
>
> val df = spark.range(1000).selectExpr("id AS key", "id % 10 AS value").sort($"key").cache
>
> def timer[R](block: => R): R = {
>   val t0 = System.nanoTime()
>   val result = block
>   val t1 = System.nanoTime()
>   // nanoseconds -> seconds
>   println("Elapsed time: " + ((t1 - t0 + 0.0) / 1000000000.0) + "s")
>   result
> }
>
> timer {
>   df.groupBy("key").count().count
> }
>
> // codegen'd hash aggregate
> Elapsed time: 7.116962977s
> // non-codegen'd sort aggregate
> Elapsed time: 3.088816662s
> {code}
> If codegen'd sort-based aggregates are supported in SPARK-16844, this seems to make the performance gap bigger;
> {code}
> - codegen'd sort aggregate
> Elapsed time: 1.645234684s
> {code}
> Therefore, it'd be better to use sort-based ones in this case.
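As a quick way to see which physical aggregate Spark planned for a given query (assuming the `df` from the snippet above), the plan can be inspected directly; SPARK-37455's change should show up as a SortAggregate node replacing HashAggregate when the child ordering already matches the grouping keys:

{code:java}
// Look for HashAggregate vs. SortAggregate nodes in the printed physical plan.
df.groupBy("key").count().explain()
{code}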
[jira] [Assigned] (SPARK-37983) Backout agg build time metrics from sort aggregate
[ https://issues.apache.org/jira/browse/SPARK-37983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37983:
------------------------------------

    Assignee: (was: Apache Spark)

> Backout agg build time metrics from sort aggregate
> ---------------------------------------------------
>
> Key: SPARK-37983
> URL: https://issues.apache.org/jira/browse/SPARK-37983
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Cheng Su
> Priority: Trivial
>
> This is a followup of https://issues.apache.org/jira/browse/SPARK-37564 . I realized the agg build time metric for sort aggregate is actually not correctly recorded. We don't have a hash build phase for sort aggregate, so there is really no way to measure the so-called build time for sort aggregate. So here I make the change to back out the change introduced in [https://github.com/apache/spark/pull/34826] for the agg build time metric.
[jira] [Assigned] (SPARK-37983) Backout agg build time metrics from sort aggregate
[ https://issues.apache.org/jira/browse/SPARK-37983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37983:
------------------------------------

    Assignee: Apache Spark

> Backout agg build time metrics from sort aggregate
> ---------------------------------------------------
>
> Key: SPARK-37983
> URL: https://issues.apache.org/jira/browse/SPARK-37983
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Cheng Su
> Assignee: Apache Spark
> Priority: Trivial
>
> This is a followup of https://issues.apache.org/jira/browse/SPARK-37564 . I realized the agg build time metric for sort aggregate is actually not correctly recorded. We don't have a hash build phase for sort aggregate, so there is really no way to measure the so-called build time for sort aggregate. So here I make the change to back out the change introduced in [https://github.com/apache/spark/pull/34826] for the agg build time metric.
[jira] [Commented] (SPARK-37983) Backout agg build time metrics from sort aggregate
[ https://issues.apache.org/jira/browse/SPARK-37983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480298#comment-17480298 ]

Apache Spark commented on SPARK-37983:
--------------------------------------

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/35273

> Backout agg build time metrics from sort aggregate
> ---------------------------------------------------
>
> Key: SPARK-37983
> URL: https://issues.apache.org/jira/browse/SPARK-37983
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Cheng Su
> Priority: Trivial
>
> This is a followup of https://issues.apache.org/jira/browse/SPARK-37564 . I realized the agg build time metric for sort aggregate is actually not correctly recorded. We don't have a hash build phase for sort aggregate, so there is really no way to measure the so-called build time for sort aggregate. So here I make the change to back out the change introduced in [https://github.com/apache/spark/pull/34826] for the agg build time metric.
[jira] [Created] (SPARK-37983) Backout agg build time metrics from sort aggregate
Cheng Su created SPARK-37983:
---------------------------------

Summary: Backout agg build time metrics from sort aggregate
Key: SPARK-37983
URL: https://issues.apache.org/jira/browse/SPARK-37983
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.3.0
Reporter: Cheng Su

This is a followup of https://issues.apache.org/jira/browse/SPARK-37564 . I realized the agg build time metric for sort aggregate is actually not correctly recorded. We don't have a hash build phase for sort aggregate, so there is really no way to measure the so-called build time for sort aggregate. So here I make the change to back out the change introduced in [https://github.com/apache/spark/pull/34826] for the agg build time metric.
[jira] [Commented] (SPARK-37981) Deletes columns with all Null as default.
[ https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480293#comment-17480293 ]

Bjørn Jørgensen commented on SPARK-37981:
-----------------------------------------

[^json_null.json]

{code:java}
from pyspark import pandas as ps
import re
import numpy as np
import os
import pandas as pd
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster('local[*]')
    conf \
        .set('spark.driver.memory', '64g') \
        .set("fs.s3a.access.key", "minio") \
        .set("fs.s3a.secret.key", "") \
        .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
        .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .set("spark.hadoop.fs.s3a.path.style.access", "true") \
        .set("spark.sql.repl.eagerEval.enabled", "True") \
        .set("spark.sql.adaptive.enabled", "True") \
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
        .set("sc.setLogLevel", "error")
    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("Falk", SparkConf())

df = spark.read.option("multiline", "true").json("json_null.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(df.shape())
(1, 4)

df.write.json("df.json")
df = spark.read.json("df.json/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(df.shape())
(1, 3)
{code}

> Deletes columns with all Null as default.
> ------------------------------------------
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: json_null.json
>
> Spark 3.2.1-RC2
> During write.json, Spark deletes columns with all Null as default.
>
> Spark does have dropFieldIfAllNull false as default, according to https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>
> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>
> def get_spark_session(app_name: str, conf: SparkConf):
>     conf.setMaster('local[*]')
>     conf \
>         .set('spark.driver.memory', '64g') \
>         .set("fs.s3a.access.key", "minio") \
>         .set("fs.s3a.secret.key", "") \
>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>         .set("spark.sql.adaptive.enabled", "True") \
>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>         .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>         .set("sc.setLogLevel", "error")
>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>
> spark = get_spark_session("Falk", SparkConf())
>
> d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
>
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}
[jira] [Created] (SPARK-37982) Use error classes in the execution errors related to unsupported input type
leesf created SPARK-37982:
-----------------------------

Summary: Use error classes in the execution errors related to unsupported input type
Key: SPARK-37982
URL: https://issues.apache.org/jira/browse/SPARK-37982
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.2.0
Reporter: leesf
Fix For: 3.3.0
[jira] [Comment Edited] (SPARK-37981) Deletes columns with all Null as default.
[ https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480291#comment-17480291 ]

Maciej Szymkiewicz edited comment on SPARK-37981 at 1/21/22, 11:37 PM:
-----------------------------------------------------------------------

This doesn't seem valid. {{dropFieldIfAllNull}} is a reader option. For writes, we use {{ignoreNullFields}}. So your write code should use the appropriate option:

{code}
d3.write.option("ignoreNullFields", "false").json("d3.json")
{code}

was (Author: zero323):
This doesn't seem valid. {{dropFieldIfAllNull}} is a reader option. For writes, we use {{ignoreNullFields}}. So your code should be

{code}
d3.write.option("ignoreNullFields", "false").json("d3.json")
{code}

> Deletes columns with all Null as default.
> ------------------------------------------
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: json_null.json
>
> Spark 3.2.1-RC2
> During write.json, Spark deletes columns with all Null as default.
>
> Spark does have dropFieldIfAllNull false as default, according to https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>
> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>
> def get_spark_session(app_name: str, conf: SparkConf):
>     conf.setMaster('local[*]')
>     conf \
>         .set('spark.driver.memory', '64g') \
>         .set("fs.s3a.access.key", "minio") \
>         .set("fs.s3a.secret.key", "") \
>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>         .set("spark.sql.adaptive.enabled", "True") \
>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>         .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>         .set("sc.setLogLevel", "error")
>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>
> spark = get_spark_session("Falk", SparkConf())
>
> d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
>
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}
[jira] [Commented] (SPARK-37981) Deletes columns with all Null as default.
[ https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480291#comment-17480291 ]

Maciej Szymkiewicz commented on SPARK-37981:
--------------------------------------------

This doesn't seem valid. {{dropFieldIfAllNull}} is a reader option. For writes, we use {{ignoreNullFields}}. So your code should be

{code}
d3.write.option("ignoreNullFields", "false").json("d3.json")
{code}

> Deletes columns with all Null as default.
> ------------------------------------------
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: json_null.json
>
> Spark 3.2.1-RC2
> During write.json, Spark deletes columns with all Null as default.
>
> Spark does have dropFieldIfAllNull false as default, according to https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>
> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>
> def get_spark_session(app_name: str, conf: SparkConf):
>     conf.setMaster('local[*]')
>     conf \
>         .set('spark.driver.memory', '64g') \
>         .set("fs.s3a.access.key", "minio") \
>         .set("fs.s3a.secret.key", "") \
>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>         .set("spark.sql.adaptive.enabled", "True") \
>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>         .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>         .set("sc.setLogLevel", "error")
>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>
> spark = get_spark_session("Falk", SparkConf())
>
> d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
>
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}
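To illustrate the writer-side option Maciej points to, a sketch in Scala rather than PySpark (assumes a SparkSession {{spark}} and a DataFrame {{df}} containing an all-null column; note that on read-back an all-null JSON column is inferred as string):

{code:java}
// Keep null fields in the emitted JSON instead of dropping them; the default
// ignoreNullFields=true is why all-null columns vanish on a write/read round trip.
df.write.option("ignoreNullFields", "false").json("/tmp/df_keep_nulls.json")
spark.read.json("/tmp/df_keep_nulls.json").printSchema()
{code}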
[jira] [Updated] (SPARK-37981) Deletes columns with all Null as default.
[ https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bjørn Jørgensen updated SPARK-37981:
------------------------------------
    Attachment: json_null.json

> Deletes columns with all Null as default.
> ------------------------------------------
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: json_null.json
>
> Spark 3.2.1-RC2
> During write.json, Spark deletes columns with all Null as default.
>
> Spark does have dropFieldIfAllNull false as default, according to https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>
> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>
> def get_spark_session(app_name: str, conf: SparkConf):
>     conf.setMaster('local[*]')
>     conf \
>         .set('spark.driver.memory', '64g') \
>         .set("fs.s3a.access.key", "minio") \
>         .set("fs.s3a.secret.key", "") \
>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>         .set("spark.sql.adaptive.enabled", "True") \
>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>         .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>         .set("sc.setLogLevel", "error")
>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>
> spark = get_spark_session("Falk", SparkConf())
>
> d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
>
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}
[jira] [Updated] (SPARK-37981) Deletes columns with all Null as default.
[ https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen updated SPARK-37981:
---------------------------------
    Affects Version/s: 3.2.0  (was: 3.2.1)
             Priority: Major  (was: Critical)

This isn't possible to evaluate without seeing some input data.

> Deletes columns with all Null as default.
> ------------------------------------------
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Bjørn Jørgensen
> Priority: Major
>
> Spark 3.2.1-RC2
> During write.json, Spark deletes columns with all Null as default.
>
> Spark does have dropFieldIfAllNull false as default, according to https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>
> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>
> def get_spark_session(app_name: str, conf: SparkConf):
>     conf.setMaster('local[*]')
>     conf \
>         .set('spark.driver.memory', '64g') \
>         .set("fs.s3a.access.key", "minio") \
>         .set("fs.s3a.secret.key", "") \
>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>         .set("spark.sql.adaptive.enabled", "True") \
>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>         .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>         .set("sc.setLogLevel", "error")
>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>
> spark = get_spark_session("Falk", SparkConf())
>
> d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
>
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
>
> import pyspark
> def sparkShape(dataFrame):
>     return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}
[jira] [Created] (SPARK-37981) Deletes columns with all Null as default.
Bjørn Jørgensen created SPARK-37981:
---------------------------------------

Summary: Deletes columns with all Null as default.
Key: SPARK-37981
URL: https://issues.apache.org/jira/browse/SPARK-37981
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.2.1
Reporter: Bjørn Jørgensen

Spark 3.2.1-RC2

During write.json, Spark deletes columns with all Null as default.

Spark does have dropFieldIfAllNull false as default, according to https://spark.apache.org/docs/latest/sql-data-sources-json.html

{code:java}
from pyspark import pandas as ps
import re
import numpy as np
import os
import pandas as pd
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster('local[*]')
    conf \
        .set('spark.driver.memory', '64g') \
        .set("fs.s3a.access.key", "minio") \
        .set("fs.s3a.secret.key", "") \
        .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
        .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .set("spark.hadoop.fs.s3a.path.style.access", "true") \
        .set("spark.sql.repl.eagerEval.enabled", "True") \
        .set("spark.sql.adaptive.enabled", "True") \
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
        .set("sc.setLogLevel", "error")
    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("Falk", SparkConf())

d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(d3.shape())
(653610, 267)

d3.write.json("d3.json")
d3 = spark.read.json("d3.json/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(d3.shape())
(653610, 186)
{code}
[jira] [Created] (SPARK-37980) Extend METADATA column to support row indexes for file based data sources
Prakhar Jain created SPARK-37980:
------------------------------------

Summary: Extend METADATA column to support row indexes for file based data sources
Key: SPARK-37980
URL: https://issues.apache.org/jira/browse/SPARK-37980
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3
Reporter: Prakhar Jain

Spark recently added hidden metadata column support for file-based datasources as part of SPARK-37273. We should extend it to support ROW_INDEX also.

Definition: ROW_INDEX is basically the index of a row within a file. E.g. the 5th row in a file will have ROW_INDEX 5.

Use cases: Row indexes can be used in a variety of ways. A (fileName, rowIndex) tuple uniquely identifies a row in a table. This information can be used to mark rows, and an index can easily be created using the row indexes.
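To make the proposal concrete, a sketch of how such a column might be queried: the {{_metadata}} struct and {{_metadata.file_path}} exist per SPARK-37273, while the {{row_index}} field below is the proposed addition, not an API that exists at the time of this ticket (a SparkSession {{spark}} and a parquet path are assumed):

{code:java}
import org.apache.spark.sql.functions.col

// (file_path, row_index) would uniquely identify each row of the table.
spark.read.parquet("/tmp/some_table")
  .select(col("_metadata.file_path"), col("_metadata.row_index"), col("*"))
  .show(truncate = false)
{code}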
[jira] [Commented] (SPARK-37936) Use error classes in the parsing errors of intervals
[ https://issues.apache.org/jira/browse/SPARK-37936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480099#comment-17480099 ]

Senthil Kumar commented on SPARK-37936:
---------------------------------------

[~maxgekk], I have queries which match:

* invalidIntervalFormError - "SELECT INTERVAL '1 DAY 2' HOUR"
* fromToIntervalUnsupportedError - "SELECT extract(MONTH FROM INTERVAL '2021-11' YEAR TO DAY)"

It would be helpful if you could share queries for the scenarios below:

* moreThanOneFromToUnitInIntervalLiteralError
* invalidIntervalLiteralError
* invalidFromToUnitValueError
* mixedIntervalUnitsError

> Use error classes in the parsing errors of intervals
> -----------------------------------------------------
>
> Key: SPARK-37936
> URL: https://issues.apache.org/jira/browse/SPARK-37936
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Priority: Major
>
> Modify the following methods in QueryParsingErrors:
> * moreThanOneFromToUnitInIntervalLiteralError
> * invalidIntervalLiteralError
> * invalidIntervalFormError
> * invalidFromToUnitValueError
> * fromToIntervalUnsupportedError
> * mixedIntervalUnitsError
> to use error classes. Throw an implementation of SparkThrowable. Also write a test per every error in QueryParsingErrorsSuite.
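For the four remaining methods, candidate queries are easy to sketch but hard to guarantee without running them against the parser; the following are unverified guesses only, and each may well land in a different error path than the one named:

{code:java}
// All four are assumptions, not confirmed repros.
spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH '3' DAY")  // moreThanOneFromToUnitInIntervalLiteralError?
spark.sql("SELECT INTERVAL 'foo'")                        // invalidIntervalLiteralError?
spark.sql("SELECT INTERVAL '1 month' YEAR TO MONTH")      // invalidFromToUnitValueError?
spark.sql("SELECT INTERVAL 1 YEAR 2 HOUR")                // mixedIntervalUnitsError?
{code}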
[jira] [Assigned] (SPARK-37907) StaticInvoke should support ConstantFolding
[ https://issues.apache.org/jira/browse/SPARK-37907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-37907:
-----------------------------------

    Assignee: angerszhu

> StaticInvoke should support ConstantFolding
> --------------------------------------------
>
> Key: SPARK-37907
> URL: https://issues.apache.org/jira/browse/SPARK-37907
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
>
> StaticInvoke does not implement foldable; it should support it.
[jira] [Resolved] (SPARK-37907) StaticInvoke should support ConstantFolding
[ https://issues.apache.org/jira/browse/SPARK-37907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-37907.
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 35207
[https://github.com/apache/spark/pull/35207]

> StaticInvoke should support ConstantFolding
> --------------------------------------------
>
> Key: SPARK-37907
> URL: https://issues.apache.org/jira/browse/SPARK-37907
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Fix For: 3.3.0
>
> StaticInvoke does not implement foldable; it should support it.
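As a rough illustration (not the actual change) of what it means for an expression such as StaticInvoke to participate in ConstantFolding: the optimizer rule replaces any foldable, deterministic expression with a Literal holding its evaluated result, so an expression class opts in by reporting {{foldable = true}} when its children are foldable:

{code:java}
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

// Sketch of ConstantFolding's core step: evaluate once at optimization time
// and substitute the constant result for the expression subtree.
def foldIfPossible(e: Expression): Expression =
  if (e.foldable && e.deterministic) Literal.create(e.eval(), e.dataType) else e
{code}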
[jira] [Resolved] (SPARK-37950) Take EXTERNAL as a reserved table property
[ https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37950. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35268 [https://github.com/apache/spark/pull/35268] > Take EXTERNAL as a reserved table property > -- > > Key: SPARK-37950 > URL: https://issues.apache.org/jira/browse/SPARK-37950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Major > Fix For: 3.3.0 > > > Currently, {{EXTERNAL}} is not a reserved table property. We should make > {{EXTERNAL}} a truly reserved property. > [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
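A hedged illustration of what "reserved" means here, assuming the usual Spark treatment of reserved table properties (specifying them directly is rejected, and the dedicated syntax must be used instead); the exact error raised is per the merged change and not reproduced here:

{code:scala}
// Sketch of expected behavior once EXTERNAL is reserved.
spark.sql(
  "CREATE TABLE t (id INT) USING parquet TBLPROPERTIES ('EXTERNAL'='true')")
// expected: rejected with an error naming EXTERNAL as a reserved property

spark.sql(
  "CREATE EXTERNAL TABLE t2 (id INT) USING parquet LOCATION '/tmp/t2'")
// the dedicated syntax remains the supported way to get an external table
{code}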
[jira] [Assigned] (SPARK-37950) Take EXTERNAL as a reserved table property
[ https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37950: --- Assignee: PengLei > Take EXTERNAL as a reserved table property > -- > > Key: SPARK-37950 > URL: https://issues.apache.org/jira/browse/SPARK-37950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Major > > Currently, {{EXTERNAL}} is not a reserved table property. We should make > {{EXTERNAL}} a truly reserved property. > [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37979) Switch to more generic error classes in AES functions
[ https://issues.apache.org/jira/browse/SPARK-37979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480021#comment-17480021 ] Apache Spark commented on SPARK-37979: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/35272 > Switch to more generic error classes in AES functions > - > > Key: SPARK-37979 > URL: https://issues.apache.org/jira/browse/SPARK-37979 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Switch from the existing error classes to more generic ones in AES functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37979) Switch to more generic error classes in AES functions
[ https://issues.apache.org/jira/browse/SPARK-37979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37979: Assignee: Apache Spark (was: Max Gekk) > Switch to more generic error classes in AES functions > - > > Key: SPARK-37979 > URL: https://issues.apache.org/jira/browse/SPARK-37979 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Switch from the existing error classes to more generic ones in AES functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37979) Switch to more generic error classes in AES functions
[ https://issues.apache.org/jira/browse/SPARK-37979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480020#comment-17480020 ] Apache Spark commented on SPARK-37979: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/35272 > Switch to more generic error classes in AES functions > - > > Key: SPARK-37979 > URL: https://issues.apache.org/jira/browse/SPARK-37979 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Switch from the existing error classes to more generic ones in AES functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37979) Switch to more generic error classes in AES functions
[ https://issues.apache.org/jira/browse/SPARK-37979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37979: Assignee: Max Gekk (was: Apache Spark) > Switch to more generic error classes in AES functions > - > > Key: SPARK-37979 > URL: https://issues.apache.org/jira/browse/SPARK-37979 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Switch from the existing error classes to more generic ones in AES functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37979) Switch to more generic error classes in AES functions
Max Gekk created SPARK-37979: Summary: Switch to more generic error classes in AES functions Key: SPARK-37979 URL: https://issues.apache.org/jira/browse/SPARK-37979 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk Assignee: Max Gekk Switch from the existing error classes to more generic ones in AES functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
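For context, these are the built-in aes_encrypt/aes_decrypt SQL functions; key validation is one failure path that gets a more generic error class. A sketch of the behavior in question (the concrete error class names come from the PR and are not reproduced here):

{code:scala}
// Round trip with a valid 16-byte key.
spark.sql(
  "SELECT CAST(aes_decrypt(aes_encrypt('Spark', '0000111122223333'), " +
  "'0000111122223333') AS STRING)").show()

// Invalid key length (not 16/24/32 bytes): fails, and after this ticket the
// failure should carry one of the more generic error classes.
spark.sql("SELECT aes_encrypt('Spark', 'bad-key')").show()
{code}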
[jira] [Resolved] (SPARK-37972) Typing incompatibilities with numpy==1.22.x
[ https://issues.apache.org/jira/browse/SPARK-37972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37972. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35261 [https://github.com/apache/spark/pull/35261] > Typing incompatibilities with numpy==1.22.x > --- > > Key: SPARK-37972 > URL: https://issues.apache.org/jira/browse/SPARK-37972 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > Fix For: 3.3.0 > > > When type checked against {{numpy==1.22}}, mypy detects the following issues: > {code:python} > python/pyspark/mllib/linalg/__init__.py:412: error: Argument 2 to "norm" has > incompatible type "Union[float, str]"; expected "Union[None, float, > Literal['fro'], Literal['nuc']]" [arg-type] > python/pyspark/mllib/linalg/__init__.py:457: error: No overload variant of > "dot" matches argument types "ndarray[Any, Any]", "Iterable[float]" > [call-overload] > python/pyspark/mllib/linalg/__init__.py:457: note: Possible overload variant: > python/pyspark/mllib/linalg/__init__.py:457: note: def dot(a: > Union[_SupportsArray[dtype[Any]], > _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, > bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], b: > Union[_SupportsArray[dtype[Any]], > _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, > bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], out: > None = ...) -> Any > python/pyspark/mllib/linalg/__init__.py:457: note: <1 more non-matching > overload not shown> > python/pyspark/mllib/linalg/__init__.py:707: error: Argument 2 to "norm" has > incompatible type "Union[float, str]"; expected "Union[None, float, > Literal['fro'], Literal['nuc']]" [arg-type] > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37972) Typing incompatibilities with numpy==1.22.x
[ https://issues.apache.org/jira/browse/SPARK-37972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37972: -- Assignee: Maciej Szymkiewicz > Typing incompatibilities with numpy==1.22.x > --- > > Key: SPARK-37972 > URL: https://issues.apache.org/jira/browse/SPARK-37972 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > > When type checked against {{numpy==1.22}}, mypy detects the following issues: > {code:python} > python/pyspark/mllib/linalg/__init__.py:412: error: Argument 2 to "norm" has > incompatible type "Union[float, str]"; expected "Union[None, float, > Literal['fro'], Literal['nuc']]" [arg-type] > python/pyspark/mllib/linalg/__init__.py:457: error: No overload variant of > "dot" matches argument types "ndarray[Any, Any]", "Iterable[float]" > [call-overload] > python/pyspark/mllib/linalg/__init__.py:457: note: Possible overload variant: > python/pyspark/mllib/linalg/__init__.py:457: note: def dot(a: > Union[_SupportsArray[dtype[Any]], > _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, > bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], b: > Union[_SupportsArray[dtype[Any]], > _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, > bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], out: > None = ...) -> Any > python/pyspark/mllib/linalg/__init__.py:457: note: <1 more non-matching > overload not shown> > python/pyspark/mllib/linalg/__init__.py:707: error: Argument 2 to "norm" has > incompatible type "Union[float, str]"; expected "Union[None, float, > Literal['fro'], Literal['nuc']]" [arg-type] > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34805) PySpark loses metadata in DataFrame fields when selecting nested columns
[ https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479989#comment-17479989 ] Apache Spark commented on SPARK-34805: -- User 'kevinwallimann' has created a pull request for this issue: https://github.com/apache/spark/pull/35270 > PySpark loses metadata in DataFrame fields when selecting nested columns > > > Key: SPARK-34805 > URL: https://issues.apache.org/jira/browse/SPARK-34805 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.1, 3.1.1 >Reporter: Mark Ressler >Priority: Major > Attachments: jsonMetadataTest.py, nested_columns_metadata.scala > > > For a DataFrame schema with nested StructTypes, where metadata is set for > fields in the schema, that metadata is lost when a DataFrame selects nested > fields. For example, suppose > {code:java} > df.schema.fields[0].dataType.fields[0].metadata > {code} > returns a non-empty dictionary, then > {code:java} > df.select('Field0.SubField0').schema.fields[0].metadata{code} > returns an empty dictionary, where "Field0" is the name of the first field in > the DataFrame and "SubField0" is the name of the first nested field under > "Field0". > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34805) PySpark loses metadata in DataFrame fields when selecting nested columns
[ https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34805: Assignee: Apache Spark > PySpark loses metadata in DataFrame fields when selecting nested columns > > > Key: SPARK-34805 > URL: https://issues.apache.org/jira/browse/SPARK-34805 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.1, 3.1.1 >Reporter: Mark Ressler >Assignee: Apache Spark >Priority: Major > Attachments: jsonMetadataTest.py, nested_columns_metadata.scala > > > For a DataFrame schema with nested StructTypes, where metadata is set for > fields in the schema, that metadata is lost when a DataFrame selects nested > fields. For example, suppose > {code:java} > df.schema.fields[0].dataType.fields[0].metadata > {code} > returns a non-empty dictionary, then > {code:java} > df.select('Field0.SubField0').schema.fields[0].metadata{code} > returns an empty dictionary, where "Field0" is the name of the first field in > the DataFrame and "SubField0" is the name of the first nested field under > "Field0". > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34805) PySpark loses metadata in DataFrame fields when selecting nested columns
[ https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34805: Assignee: (was: Apache Spark) > PySpark loses metadata in DataFrame fields when selecting nested columns > > > Key: SPARK-34805 > URL: https://issues.apache.org/jira/browse/SPARK-34805 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.1, 3.1.1 >Reporter: Mark Ressler >Priority: Major > Attachments: jsonMetadataTest.py, nested_columns_metadata.scala > > > For a DataFrame schema with nested StructTypes, where metadata is set for > fields in the schema, that metadata is lost when a DataFrame selects nested > fields. For example, suppose > {code:java} > df.schema.fields[0].dataType.fields[0].metadata > {code} > returns a non-empty dictionary, then > {code:java} > df.select('Field0.SubField0').schema.fields[0].metadata{code} > returns an empty dictionary, where "Field0" is the name of the first field in > the DataFrame and "SubField0" is the name of the first nested field under > "Field0". > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34805) PySpark loses metadata in DataFrame fields when selecting nested columns
[ https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479991#comment-17479991 ] Apache Spark commented on SPARK-34805: -- User 'kevinwallimann' has created a pull request for this issue: https://github.com/apache/spark/pull/35270 > PySpark loses metadata in DataFrame fields when selecting nested columns > > > Key: SPARK-34805 > URL: https://issues.apache.org/jira/browse/SPARK-34805 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.1, 3.1.1 >Reporter: Mark Ressler >Priority: Major > Attachments: jsonMetadataTest.py, nested_columns_metadata.scala > > > For a DataFrame schema with nested StructTypes, where metadata is set for > fields in the schema, that metadata is lost when a DataFrame selects nested > fields. For example, suppose > {code:java} > df.schema.fields[0].dataType.fields[0].metadata > {code} > returns a non-empty dictionary, then > {code:java} > df.select('Field0.SubField0').schema.fields[0].metadata{code} > returns an empty dictionary, where "Field0" is the name of the first field in > the DataFrame and "SubField0" is the name of the first nested field under > "Field0". > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37929) Support cascade mode for `dropNamespace` API
[ https://issues.apache.org/jira/browse/SPARK-37929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479988#comment-17479988 ] Apache Spark commented on SPARK-37929: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/35271 > Support cascade mode for `dropNamespace` API > - > > Key: SPARK-37929 > URL: https://issues.apache.org/jira/browse/SPARK-37929 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
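The SQL surface this maps to is DROP NAMESPACE ... RESTRICT | CASCADE; the ticket plumbs the cascade choice through the DSv2 catalog's dropNamespace call. A sketch of the user-visible behavior (the exact connector-API signature is per the PR and assumed here):

{code:scala}
spark.sql("CREATE NAMESPACE testns")
spark.sql("CREATE TABLE testns.t (id INT) USING parquet")

// RESTRICT (the default) refuses to drop a non-empty namespace.
spark.sql("DROP NAMESPACE testns RESTRICT")  // expected to fail: non-empty

// CASCADE drops the namespace together with the objects it contains.
spark.sql("DROP NAMESPACE testns CASCADE")
{code}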
[jira] [Resolved] (SPARK-36649) Support Trigger.AvailableNow on Kafka data source
[ https://issues.apache.org/jira/browse/SPARK-36649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li resolved SPARK-36649. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35238 [https://github.com/apache/spark/pull/35238] > Support Trigger.AvailableNow on Kafka data source > - > > Key: SPARK-36649 > URL: https://issues.apache.org/jira/browse/SPARK-36649 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Jungtaek Lim >Priority: Major > Fix For: 3.3.0 > > > SPARK-36533 introduced a new trigger, Trigger.AvailableNow, but only > wired the new functionality into the file stream source. Given that the Kafka > data source is one of the major data sources used in streaming queries, > we should make the Kafka data source support it as well. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
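In practice this lets a Kafka-backed query consume everything available at query start, possibly across multiple micro-batches, and then stop. A minimal sketch (bootstrap servers, topic, and paths are placeholders):

{code:scala}
import org.apache.spark.sql.streaming.Trigger

val in = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

val query = in.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/kafka-available-now-ckpt")
  .trigger(Trigger.AvailableNow())
  .start()

// Returns once all data available at query start has been processed.
query.awaitTermination()
{code}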
[jira] (SPARK-28516) Data Type Formatting Functions: `to_char`
[ https://issues.apache.org/jira/browse/SPARK-28516 ] jiaan.geng deleted comment on SPARK-28516: was (Author: beliefer): I'm working on. > Data Type Formatting Functions: `to_char` > - > > Key: SPARK-28516 > URL: https://issues.apache.org/jira/browse/SPARK-28516 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not have support for `to_char`. PgSQL, however, > [does|https://www.postgresql.org/docs/12/functions-formatting.html]: > Query example: > {code:sql} > SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 > FOLLOWING),'9D9') > {code} > ||Function||Return Type||Description||Example|| > |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to > string|{{to_char(current_timestamp, 'HH12:MI:SS')}}| > |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to > string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}| > |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to > string|{{to_char(125, '999')}}| > |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert > real/double precision to string|{{to_char(125.8::real, '999D9')}}| > |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to > string|{{to_char(-125.8, '999D99S')}}| -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28516) Data Type Formatting Functions: `to_char`
[ https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479903#comment-17479903 ] Apache Spark commented on SPARK-28516: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/35269 > Data Type Formatting Functions: `to_char` > - > > Key: SPARK-28516 > URL: https://issues.apache.org/jira/browse/SPARK-28516 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not have support for `to_char`. PgSQL, however, > [does|https://www.postgresql.org/docs/12/functions-formatting.html]: > Query example: > {code:sql} > SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 > FOLLOWING),'9D9') > {code} > ||Function||Return Type||Description||Example|| > |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to > string|{{to_char(current_timestamp, 'HH12:MI:SS')}}| > |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to > string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}| > |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to > string|{{to_char(125, '999')}}| > |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert > real/double precision to string|{{to_char(125.8::real, '999D9')}}| > |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to > string|{{to_char(-125.8, '999D99S')}}| -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
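For a feel of the semantics being ported, here are hedged expectations for the numeric cases, mirroring the PostgreSQL examples quoted above; whether Spark's to_char ultimately accepts all of these forms is decided by the PR, not here:

{code:scala}
// Expected outputs follow PostgreSQL's to_char:
spark.sql("SELECT to_char(125, '999')").show()        // " 125"  (9 = digit slot)
spark.sql("SELECT to_char(-125.8, '999D99S')").show() // "125.80-" (D = decimal, S = sign)
{code}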
[jira] [Assigned] (SPARK-28516) Data Type Formatting Functions: `to_char`
[ https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28516: Assignee: Apache Spark > Data Type Formatting Functions: `to_char` > - > > Key: SPARK-28516 > URL: https://issues.apache.org/jira/browse/SPARK-28516 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Assignee: Apache Spark >Priority: Major > > Currently, Spark does not have support for `to_char`. PgSQL, however, > [does|https://www.postgresql.org/docs/12/functions-formatting.html]: > Query example: > {code:sql} > SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 > FOLLOWING),'9D9') > {code} > ||Function||Return Type||Description||Example|| > |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to > string|{{to_char(current_timestamp, 'HH12:MI:SS')}}| > |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to > string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}| > |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to > string|{{to_char(125, '999')}}| > |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert > real/double precision to string|{{to_char(125.8::real, '999D9')}}| > |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to > string|{{to_char(-125.8, '999D99S')}}| -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28516) Data Type Formatting Functions: `to_char`
[ https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28516: Assignee: (was: Apache Spark) > Data Type Formatting Functions: `to_char` > - > > Key: SPARK-28516 > URL: https://issues.apache.org/jira/browse/SPARK-28516 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not have support for `to_char`. PgSQL, however, > [does|https://www.postgresql.org/docs/12/functions-formatting.html]: > Query example: > {code:sql} > SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 > FOLLOWING),'9D9') > {code} > ||Function||Return Type||Description||Example|| > |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to > string|{{to_char(current_timestamp, 'HH12:MI:SS')}}| > |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to > string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}| > |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to > string|{{to_char(125, '999')}}| > |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert > real/double precision to string|{{to_char(125.8::real, '999D9')}}| > |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to > string|{{to_char(-125.8, '999D99S')}}| -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37950) Take EXTERNAL as a reserved table property
[ https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479902#comment-17479902 ] Apache Spark commented on SPARK-37950: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/35268 > Take EXTERNAL as a reserved table property > -- > > Key: SPARK-37950 > URL: https://issues.apache.org/jira/browse/SPARK-37950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > > Currently, {{EXTERNAL}} is not a reserved table property. We should make > {{EXTERNAL}} a truly reserved property. > [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37950) Take EXTERNAL as a reserved table property
[ https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37950: Assignee: (was: Apache Spark) > Take EXTERNAL as a reserved table property > -- > > Key: SPARK-37950 > URL: https://issues.apache.org/jira/browse/SPARK-37950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > > Currently, {{EXTERNAL}} is not a reserved table property. We should make > {{EXTERNAL}} a truly reserved property. > [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37950) Take EXTERNAL as a reserved table property
[ https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479901#comment-17479901 ] Apache Spark commented on SPARK-37950: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/35268 > Take EXTERNAL as a reserved table property > -- > > Key: SPARK-37950 > URL: https://issues.apache.org/jira/browse/SPARK-37950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > > Currently, {{EXTERNAL}} is not a reserved table property. We should make > {{EXTERNAL}} a truly reserved property. > [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37950) Take EXTERNAL as a reserved table property
[ https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37950: Assignee: Apache Spark > Take EXTERNAL as a reserved table property > -- > > Key: SPARK-37950 > URL: https://issues.apache.org/jira/browse/SPARK-37950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: Apache Spark >Priority: Major > > Currently, {{EXTERNAL}} is not a reserved table property. We should make > {{EXTERNAL}} a truly reserved property. > [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37978) Remove the unnecessary ChunkFetchFailureException class
[ https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37978: - Description: The ChunkFetchFailureException is unnecessary and can be replaced by RuntimeException. (was: Remove the useless ChunkFetchFailureException class) > Remove the unnecessary ChunkFetchFailureException class > --- > > Key: SPARK-37978 > URL: https://issues.apache.org/jira/browse/SPARK-37978 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0 >Reporter: weixiuli >Priority: Major > > The ChunkFetchFailureException is unnecessary and can be replaced by > RuntimeException. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
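The shape of the refactor, as a toy sketch (the real code is Java in the network layer; this assumes, as the ticket implies, that no catch site matches on the subclass, which is what makes the class removable without behavior change):

{code:scala}
// Before: a dedicated exception type adding nothing beyond its message.
class ChunkFetchFailureException(message: String)
  extends RuntimeException(message)

// After: throw the base type directly at the throw site.
def onChunkFetchFailure(streamChunkId: String, errorMsg: String): Nothing =
  throw new RuntimeException(
    s"Failure while fetching $streamChunkId: $errorMsg")
{code}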
[jira] [Updated] (SPARK-37978) Remove the unnecessary ChunkFetchFailureException class
[ https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37978: - Summary: Remove the unnecessary ChunkFetchFailureException class (was: Remove the useless ChunkFetchFailureException class) > Remove the unnecessary ChunkFetchFailureException class > --- > > Key: SPARK-37978 > URL: https://issues.apache.org/jira/browse/SPARK-37978 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0 >Reporter: weixiuli >Priority: Major > > Remove the useless ChunkFetchFailureException class -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37978) Remove the useless ChunkFetchFailureException class
[ https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37978: Assignee: (was: Apache Spark) > Remove the useless ChunkFetchFailureException class > --- > > Key: SPARK-37978 > URL: https://issues.apache.org/jira/browse/SPARK-37978 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0 >Reporter: weixiuli >Priority: Major > > Remove the useless ChunkFetchFailureException class -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37978) Remove the useless ChunkFetchFailureException class
[ https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479881#comment-17479881 ] Apache Spark commented on SPARK-37978: -- User 'weixiuli' has created a pull request for this issue: https://github.com/apache/spark/pull/35267 > Remove the useless ChunkFetchFailureException class > --- > > Key: SPARK-37978 > URL: https://issues.apache.org/jira/browse/SPARK-37978 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0 >Reporter: weixiuli >Priority: Major > > Remove the useless ChunkFetchFailureException class -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37978) Remove the useless ChunkFetchFailureException class
[ https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37978: Assignee: Apache Spark > Remove the useless ChunkFetchFailureException class > --- > > Key: SPARK-37978 > URL: https://issues.apache.org/jira/browse/SPARK-37978 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0 >Reporter: weixiuli >Assignee: Apache Spark >Priority: Major > > Remove the useless ChunkFetchFailureException class -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org