[jira] [Commented] (SPARK-37578) DSV2 is not updating Output Metrics

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480360#comment-17480360
 ] 

Apache Spark commented on SPARK-37578:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35277

> DSV2 is not updating Output Metrics
> ---
>
> Key: SPARK-37578
> URL: https://issues.apache.org/jira/browse/SPARK-37578
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Sandeep Katta
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> Repro code
> ./bin/spark-shell --master local --jars /Users/jars/iceberg-spark3-runtime-0.12.1.jar
>  
> {code:java}
> import scala.collection.mutable
> import org.apache.spark.scheduler._
>
> val bytesWritten = new mutable.ArrayBuffer[Long]()
> val recordsWritten = new mutable.ArrayBuffer[Long]()
> val bytesWrittenListener = new SparkListener() {
>   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
>     bytesWritten += taskEnd.taskMetrics.outputMetrics.bytesWritten
>     recordsWritten += taskEnd.taskMetrics.outputMetrics.recordsWritten
>   }
> }
> spark.sparkContext.addSparkListener(bytesWrittenListener)
> try {
>   val df = spark.range(1000).toDF("id")
>   df.write.format("iceberg").save("Users/data/dsv2_test")
>   assert(bytesWritten.sum > 0)
>   assert(recordsWritten.sum > 0)
> } finally {
>   spark.sparkContext.removeSparkListener(bytesWrittenListener)
> } {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37985) Fix flaky test SPARK-37578

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37985:


Assignee: Apache Spark

> Fix flaky test SPARK-37578
> --
>
> Key: SPARK-37985
> URL: https://issues.apache.org/jira/browse/SPARK-37985
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> 2022-01-22T01:58:29.8444339Z [info] - 
> SPARK-36030: Report metrics from Datasource v2 write (90 milliseconds)
> 2022-01-22T01:58:29.9427049Z [info] - 
> SPARK-37578: Update output metrics from Datasource v2 *** FAILED *** (65 
> milliseconds)
> 2022-01-22T01:58:29.9428038Z [info]   123 did 
> not equal 246 (SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9428531Z [info]   
> org.scalatest.exceptions.TestFailedException:
> 2022-01-22T01:58:29.9429101Z [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 2022-01-22T01:58:29.9429717Z [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 2022-01-22T01:58:29.9430298Z [info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 2022-01-22T01:58:29.9430840Z [info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 2022-01-22T01:58:29.9431512Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9432305Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
> 2022-01-22T01:58:29.9432982Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
> 2022-01-22T01:58:29.9433695Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9434276Z [info]   at 
> org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
> 2022-01-22T01:58:29.9435040Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9435764Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9436354Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
> 2022-01-22T01:58:29.9437063Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9437851Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37985) Fix flaky test SPARK-37578

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37985:


Assignee: (was: Apache Spark)

> Fix flaky test SPARK-37578
> --
>
> Key: SPARK-37985
> URL: https://issues.apache.org/jira/browse/SPARK-37985
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> 2022-01-22T01:58:29.8444339Z [info] - 
> SPARK-36030: Report metrics from Datasource v2 write (90 milliseconds)
> 2022-01-22T01:58:29.9427049Z [info] - 
> SPARK-37578: Update output metrics from Datasource v2 *** FAILED *** (65 
> milliseconds)
> 2022-01-22T01:58:29.9428038Z [info]   123 did 
> not equal 246 (SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9428531Z [info]   
> org.scalatest.exceptions.TestFailedException:
> 2022-01-22T01:58:29.9429101Z [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 2022-01-22T01:58:29.9429717Z [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 2022-01-22T01:58:29.9430298Z [info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 2022-01-22T01:58:29.9430840Z [info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 2022-01-22T01:58:29.9431512Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9432305Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
> 2022-01-22T01:58:29.9432982Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
> 2022-01-22T01:58:29.9433695Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9434276Z [info]   at 
> org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
> 2022-01-22T01:58:29.9435040Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9435764Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9436354Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
> 2022-01-22T01:58:29.9437063Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9437851Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37985) Fix flaky test SPARK-37578

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480359#comment-17480359
 ] 

Apache Spark commented on SPARK-37985:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/35277

> Fix flaky test SPARK-37578
> --
>
> Key: SPARK-37985
> URL: https://issues.apache.org/jira/browse/SPARK-37985
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> 2022-01-22T01:58:29.8444339Z [info] - 
> SPARK-36030: Report metrics from Datasource v2 write (90 milliseconds)
> 2022-01-22T01:58:29.9427049Z [info] - 
> SPARK-37578: Update output metrics from Datasource v2 *** FAILED *** (65 
> milliseconds)
> 2022-01-22T01:58:29.9428038Z [info]   123 did 
> not equal 246 (SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9428531Z [info]   
> org.scalatest.exceptions.TestFailedException:
> 2022-01-22T01:58:29.9429101Z [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 2022-01-22T01:58:29.9429717Z [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 2022-01-22T01:58:29.9430298Z [info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 2022-01-22T01:58:29.9430840Z [info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 2022-01-22T01:58:29.9431512Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
> 2022-01-22T01:58:29.9432305Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
> 2022-01-22T01:58:29.9432982Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
> 2022-01-22T01:58:29.9433695Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9434276Z [info]   at 
> org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
> 2022-01-22T01:58:29.9435040Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9435764Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
> 2022-01-22T01:58:29.9436354Z [info]   at 
> org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
> 2022-01-22T01:58:29.9437063Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
> 2022-01-22T01:58:29.9437851Z [info]   at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37985) Fix flaky test SPARK-37578

2022-01-21 Thread angerszhu (Jira)
angerszhu created SPARK-37985:
-

 Summary: Fix flaky test SPARK-37578
 Key: SPARK-37985
 URL: https://issues.apache.org/jira/browse/SPARK-37985
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu


2022-01-22T01:58:29.8444339Z [info] - SPARK-36030: 
Report metrics from Datasource v2 write (90 milliseconds)
2022-01-22T01:58:29.9427049Z [info] - SPARK-37578: 
Update output metrics from Datasource v2 *** FAILED *** (65 
milliseconds)
2022-01-22T01:58:29.9428038Z [info]   123 did not 
equal 246 (SQLAppStatusListenerSuite.scala:936)
2022-01-22T01:58:29.9428531Z [info]   
org.scalatest.exceptions.TestFailedException:
2022-01-22T01:58:29.9429101Z [info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
2022-01-22T01:58:29.9429717Z [info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
2022-01-22T01:58:29.9430298Z [info]   at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
2022-01-22T01:58:29.9430840Z [info]   at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
2022-01-22T01:58:29.9431512Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
2022-01-22T01:58:29.9432305Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
2022-01-22T01:58:29.9432982Z [info]   at 
org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
2022-01-22T01:58:29.9433695Z [info]   at 
org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
2022-01-22T01:58:29.9434276Z [info]   at 
org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
2022-01-22T01:58:29.9435040Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
2022-01-22T01:58:29.9435764Z [info]   at 
org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
2022-01-22T01:58:29.9436354Z [info]   at 
org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
2022-01-22T01:58:29.9437063Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
2022-01-22T01:58:29.9437851Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37984:


Assignee: Apache Spark

> Avoid calculating all outstanding requests to improve performance.
> --
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
>Reporter: weixiuli
>Assignee: Apache Spark
>Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37984:


Assignee: (was: Apache Spark)

> Avoid calculating all outstanding requests to improve performance.
> --
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480351#comment-17480351
 ] 

Apache Spark commented on SPARK-37984:
--

User 'weixiuli' has created a pull request for this issue:
https://github.com/apache/spark/pull/35276

> Avoid calculating all outstanding requests to improve performance.
> --
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37984) Avoid calculating all outstanding requests to improve performance.

2022-01-21 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37984:
-
Description: Avoid calculating all outstanding requests to improve 
performance.  (was: Avoid computing all outstanding requests to improve 
performance.)
Summary: Avoid calculating all outstanding requests to improve 
performance.  (was: Avoid computing all outstanding requests to improve 
performance.)

> Avoid calculating all outstanding requests to improve performance.
> --
>
> Key: SPARK-37984
> URL: https://issues.apache.org/jira/browse/SPARK-37984
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.0, 3.1.2, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Avoid calculating all outstanding requests to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37984) Avoid computing all outstanding requests to improve performance.

2022-01-21 Thread weixiuli (Jira)
weixiuli created SPARK-37984:


 Summary: Avoid computing all outstanding requests to improve 
performance.
 Key: SPARK-37984
 URL: https://issues.apache.org/jira/browse/SPARK-37984
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 3.2.0, 3.1.2, 3.1.0, 3.0.3, 3.0.1, 3.0.0
Reporter: weixiuli


Avoid computing all outstanding requests to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37731) refactor and cleanup function lookup in Analyzer

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480323#comment-17480323
 ] 

Apache Spark commented on SPARK-37731:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/35275

> refactor and cleanup function lookup in Analyzer
> 
>
> Key: SPARK-37731
> URL: https://issues.apache.org/jira/browse/SPARK-37731
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37982) Use error classes in the execution errors related to unsupported input type

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480309#comment-17480309
 ] 

Apache Spark commented on SPARK-37982:
--

User 'leesf' has created a pull request for this issue:
https://github.com/apache/spark/pull/35274

> Use error classes in the execution errors related to unsupported input type
> ---
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: leesf
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37982) Use error classes in the execution errors related to unsupported input type

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37982:


Assignee: Apache Spark

> Use error classes in the execution errors related to unsupported input type
> ---
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: leesf
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37982) Use error classes in the execution errors related to unsupported input type

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37982:


Assignee: (was: Apache Spark)

> Use error classes in the execution errors related to unsupported input type
> ---
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: leesf
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37982) Use error classes in the execution errors related to unsupported input type

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480308#comment-17480308
 ] 

Apache Spark commented on SPARK-37982:
--

User 'leesf' has created a pull request for this issue:
https://github.com/apache/spark/pull/35274

> Use error classes in the execution errors related to unsupported input type
> ---
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: leesf
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37981) Deletes columns with all Null as default.

2022-01-21 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-37981.

Resolution: Duplicate

> Deletes columns with all Null as default.
> -
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: json_null.json
>
>
> Spark 3.2.1-RC2 
> During write.json spark deletes columns with all Null as default. 
>  
> Spark does have dropFieldIfAllNull false as default, according to 
> https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType,IntegerType
> os.environ["PYARROW_IGNORE_TIMEZONE"]="1"
> def get_spark_session(app_name: str, conf: SparkConf):
> conf.setMaster('local[*]')
> conf \
>   .set('spark.driver.memory', '64g')\
>   .set("fs.s3a.access.key", "minio") \
>   .set("fs.s3a.secret.key", "") \
>   .set("fs.s3a.endpoint", "http://192.168.1.127:9000";) \
>   .set("spark.hadoop.fs.s3a.impl", 
> "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>   .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>   .set("spark.sql.repl.eagerEval.enabled", "True") \
>   .set("spark.sql.adaptive.enabled", "True") \
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>   .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>   .set("sc.setLogLevel", "error")
>
> return 
> SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
> spark = get_spark_session("Falk", SparkConf())
> d3 = 
> spark.read.option("multiline","true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18591) Replace hash-based aggregates with sort-based ones if inputs already sorted

2022-01-21 Thread Cheng Su (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480300#comment-17480300
 ] 

Cheng Su commented on SPARK-18591:
--

Just FYI, this Jira should be fixed by 
https://issues.apache.org/jira/browse/SPARK-37455 . The related code is already 
merged and should ship in the next Spark release, Spark 3.3.0.

> Replace hash-based aggregates with sort-based ones if inputs already sorted
> ---
>
> Key: SPARK-18591
> URL: https://issues.apache.org/jira/browse/SPARK-18591
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Takeshi Yamamuro
>Priority: Major
>  Labels: bulk-closed
>
> Spark currently uses sort-based aggregates only in limited condition; the 
> cases where spark cannot use partial aggregates and hash-based ones.
> However, if input ordering has already satisfied the requirements of 
> sort-based aggregates, it seems sort-based ones are faster than the other.
> {code}
> ./bin/spark-shell --conf spark.sql.shuffle.partitions=1
> val df = spark.range(1000).selectExpr("id AS key", "id % 10 AS 
> value").sort($"key").cache
> def timer[R](block: => R): R = {
>   val t0 = System.nanoTime()
>   val result = block
>   val t1 = System.nanoTime()
>   println("Elapsed time: " + ((t1 - t0 + 0.0) / 10.0)+ "s")
>   result
> }
> timer {
>   df.groupBy("key").count().count
> }
> // codegen'd hash aggregate
> Elapsed time: 7.116962977s
> // non-codegen'd sort aggregate
> Elapsed time: 3.088816662s
> {code}
> If codegen'd sort-based aggregates are supported in SPARK-16844, this seems 
> to make the performance gap bigger;
> {code}
> - codegen'd sort aggregate
> Elapsed time: 1.645234684s
> {code} 
> Therefore, it'd be better to use sort-based ones in this case.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37983) Backout agg build time metrics from sort aggregate

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37983:


Assignee: (was: Apache Spark)

> Backout agg build time metrics from sort aggregate
> --
>
> Key: SPARK-37983
> URL: https://issues.apache.org/jira/browse/SPARK-37983
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Priority: Trivial
>
> This is a followup of https://issues.apache.org/jira/browse/SPARK-37564 . I 
> realize the agg build time metrics for sort aggregate is actually not 
> correctly recorded. We don't have a hash build phase for sort aggregate, so 
> there is really no way to measure so-called build time for sort aggregate. So 
> here I make the change to back out the change introduced in 
> [https://github.com/apache/spark/pull/34826] for agg build time metric.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37983) Backout agg build time metrics from sort aggregate

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37983:


Assignee: Apache Spark

> Backout agg build time metrics from sort aggregate
> --
>
> Key: SPARK-37983
> URL: https://issues.apache.org/jira/browse/SPARK-37983
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Trivial
>
> This is a followup of https://issues.apache.org/jira/browse/SPARK-37564 . I 
> realize the agg build time metrics for sort aggregate is actually not 
> correctly recorded. We don't have a hash build phase for sort aggregate, so 
> there is really no way to measure so-called build time for sort aggregate. So 
> here I make the change to back out the change introduced in 
> [https://github.com/apache/spark/pull/34826] for agg build time metric.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37983) Backout agg build time metrics from sort aggregate

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480298#comment-17480298
 ] 

Apache Spark commented on SPARK-37983:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/35273

> Backout agg build time metrics from sort aggregate
> --
>
> Key: SPARK-37983
> URL: https://issues.apache.org/jira/browse/SPARK-37983
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Su
>Priority: Trivial
>
> This is a followup of https://issues.apache.org/jira/browse/SPARK-37564 . I 
> realize the agg build time metrics for sort aggregate is actually not 
> correctly recorded. We don't have a hash build phase for sort aggregate, so 
> there is really no way to measure so-called build time for sort aggregate. So 
> here I make the change to back out the change introduced in 
> [https://github.com/apache/spark/pull/34826] for agg build time metric.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37983) Backout agg build time metrics from sort aggregate

2022-01-21 Thread Cheng Su (Jira)
Cheng Su created SPARK-37983:


 Summary: Backout agg build time metrics from sort aggregate
 Key: SPARK-37983
 URL: https://issues.apache.org/jira/browse/SPARK-37983
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Cheng Su


This is a follow-up to https://issues.apache.org/jira/browse/SPARK-37564 . I 
realized the agg build time metric for sort aggregate is not actually recorded 
correctly. There is no hash build phase for sort aggregate, so there is really 
no way to measure a so-called build time for it. So here I back out the change 
introduced in [https://github.com/apache/spark/pull/34826] for the agg build 
time metric.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37981) Deletes columns with all Null as default.

2022-01-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480293#comment-17480293
 ] 

Bjørn Jørgensen commented on SPARK-37981:
-

 [^json_null.json] 



{code:java}
from pyspark import pandas as ps
import re
import numpy as np
import os
import pandas as pd

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster('local[*]')
    conf \
      .set('spark.driver.memory', '64g') \
      .set("fs.s3a.access.key", "minio") \
      .set("fs.s3a.secret.key", "") \
      .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
      .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
      .set("spark.hadoop.fs.s3a.path.style.access", "true") \
      .set("spark.sql.repl.eagerEval.enabled", "True") \
      .set("spark.sql.adaptive.enabled", "True") \
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
      .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
      .set("sc.setLogLevel", "error")

    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("Falk", SparkConf())

df = spark.read.option("multiline", "true").json("json_null.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(df.shape())

(1, 4)

df.write.json("df.json")

df = spark.read.json("df.json/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(df.shape())

(1, 3)
{code}


> Deletes columns with all Null as default.
> -
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: json_null.json
>
>
> Spark 3.2.1-RC2 
> During write.json spark deletes columns with all Null as default. 
>  
> Spark does have dropFieldIfAllNull false as default, according to 
> https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType,IntegerType
> os.environ["PYARROW_IGNORE_TIMEZONE"]="1"
> def get_spark_session(app_name: str, conf: SparkConf):
> conf.setMaster('local[*]')
> conf \
>   .set('spark.driver.memory', '64g')\
>   .set("fs.s3a.access.key", "minio") \
>   .set("fs.s3a.secret.key", "") \
>   .set("fs.s3a.endpoint", "http://192.168.1.127:9000";) \
>   .set("spark.hadoop.fs.s3a.impl", 
> "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>   .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>   .set("spark.sql.repl.eagerEval.enabled", "True") \
>   .set("spark.sql.adaptive.enabled", "True") \
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>   .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>   .set("sc.setLogLevel", "error")
>
> return 
> SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
> spark = get_spark_session("Falk", SparkConf())
> d3 = 
> spark.read.option("multiline","true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37982) Use error classes in the execution errors related to unsupported input type

2022-01-21 Thread leesf (Jira)
leesf created SPARK-37982:
-

 Summary: Use error classes in the execution errors related to 
unsupported input type
 Key: SPARK-37982
 URL: https://issues.apache.org/jira/browse/SPARK-37982
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: leesf
 Fix For: 3.3.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37981) Deletes columns with all Null as default.

2022-01-21 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480291#comment-17480291
 ] 

Maciej Szymkiewicz edited comment on SPARK-37981 at 1/21/22, 11:37 PM:
---

This doesn't seem valid.

{{dropFieldIfAllNull}} is a reader option. For writes, we use 
{{ignoreNullFields}}.

So your write code should use the appropriate option:

{code}

d3.write.option("ignoreNullFields", "false").json("d3.json")

{code}


was (Author: zero323):
This doesn't seem valid.

{{dropFieldIfAllNull}} is a reader option. For writes, we use 
{{ignoreNullFields}}.

So your code should be 

{code}

d3.write.option("ignoreNullFields", "false").json("d3.json")

{code}

> Deletes columns with all Null as default.
> -
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: json_null.json
>
>
> Spark 3.2.1-RC2 
> During write.json spark deletes columns with all Null as default. 
>  
> Spark does have dropFieldIfAllNull false as default, according to 
> https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType,IntegerType
> os.environ["PYARROW_IGNORE_TIMEZONE"]="1"
> def get_spark_session(app_name: str, conf: SparkConf):
> conf.setMaster('local[*]')
> conf \
>   .set('spark.driver.memory', '64g')\
>   .set("fs.s3a.access.key", "minio") \
>   .set("fs.s3a.secret.key", "") \
>   .set("fs.s3a.endpoint", "http://192.168.1.127:9000";) \
>   .set("spark.hadoop.fs.s3a.impl", 
> "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>   .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>   .set("spark.sql.repl.eagerEval.enabled", "True") \
>   .set("spark.sql.adaptive.enabled", "True") \
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>   .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>   .set("sc.setLogLevel", "error")
>
> return 
> SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
> spark = get_spark_session("Falk", SparkConf())
> d3 = 
> spark.read.option("multiline","true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37981) Deletes columns with all Null as default.

2022-01-21 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480291#comment-17480291
 ] 

Maciej Szymkiewicz commented on SPARK-37981:


This doesn't seem valid.

{{dropFieldIfAllNull}} is a reader option. For writes, we use 
{{ignoreNullFields}}.

So your code should be 

{code}

d3.write.option("ignoreNullFields", "false").json("d3.json")

{code}
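
A minimal PySpark sketch of that suggestion, reusing the json_null.json attachment from this ticket; the expected column counts are based on the report above and the documented option defaults, not on a run performed here:

{code}
# Sketch only: round-trip the attached file while keeping all-null fields.
df = spark.read.option("multiline", "true").json("json_null.json")
print(len(df.columns))   # 4, per the comment above

# The JSON *writer* drops null fields by default (ignoreNullFields=true),
# so disable it explicitly; dropFieldIfAllNull is only a *reader* option.
df.write.option("ignoreNullFields", "false").json("df_keep_nulls.json")

df2 = spark.read.json("df_keep_nulls.json/*.json")
print(len(df2.columns))  # expected to remain 4
{code}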

> Deletes columns with all Null as default.
> -
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: json_null.json
>
>
> Spark 3.2.1-RC2 
> During write.json spark deletes columns with all Null as default. 
>  
> Spark does have dropFieldIfAllNull false as default, according to 
> https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType,IntegerType
> os.environ["PYARROW_IGNORE_TIMEZONE"]="1"
> def get_spark_session(app_name: str, conf: SparkConf):
> conf.setMaster('local[*]')
> conf \
>   .set('spark.driver.memory', '64g')\
>   .set("fs.s3a.access.key", "minio") \
>   .set("fs.s3a.secret.key", "") \
>   .set("fs.s3a.endpoint", "http://192.168.1.127:9000";) \
>   .set("spark.hadoop.fs.s3a.impl", 
> "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>   .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>   .set("spark.sql.repl.eagerEval.enabled", "True") \
>   .set("spark.sql.adaptive.enabled", "True") \
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>   .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>   .set("sc.setLogLevel", "error")
>
> return 
> SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
> spark = get_spark_session("Falk", SparkConf())
> d3 = 
> spark.read.option("multiline","true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37981) Deletes columns with all Null as default.

2022-01-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjørn Jørgensen updated SPARK-37981:

Attachment: json_null.json

> Deletes columns with all Null as default.
> -
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Bjørn Jørgensen
>Priority: Major
> Attachments: json_null.json
>
>
> Spark 3.2.1-RC2 
> During write.json spark deletes columns with all Null as default. 
>  
> Spark does have dropFieldIfAllNull false as default, according to 
> https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType,IntegerType
> os.environ["PYARROW_IGNORE_TIMEZONE"]="1"
> def get_spark_session(app_name: str, conf: SparkConf):
> conf.setMaster('local[*]')
> conf \
>   .set('spark.driver.memory', '64g')\
>   .set("fs.s3a.access.key", "minio") \
>   .set("fs.s3a.secret.key", "") \
>   .set("fs.s3a.endpoint", "http://192.168.1.127:9000";) \
>   .set("spark.hadoop.fs.s3a.impl", 
> "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>   .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>   .set("spark.sql.repl.eagerEval.enabled", "True") \
>   .set("spark.sql.adaptive.enabled", "True") \
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>   .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>   .set("sc.setLogLevel", "error")
>
> return 
> SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
> spark = get_spark_session("Falk", SparkConf())
> d3 = 
> spark.read.option("multiline","true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37981) Deletes columns with all Null as default.

2022-01-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-37981:
-
Affects Version/s: 3.2.0
   (was: 3.2.1)
 Priority: Major  (was: Critical)

This isn't possible to evaluate without seeing some input data

> Deletes columns with all Null as default.
> -
>
> Key: SPARK-37981
> URL: https://issues.apache.org/jira/browse/SPARK-37981
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Spark 3.2.1-RC2 
> During write.json spark deletes columns with all Null as default. 
>  
> Spark does have dropFieldIfAllNull false as default, according to 
> https://spark.apache.org/docs/latest/sql-data-sources-json.html
> {code:java}
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> import pandas as pd
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType,IntegerType
> os.environ["PYARROW_IGNORE_TIMEZONE"]="1"
> def get_spark_session(app_name: str, conf: SparkConf):
> conf.setMaster('local[*]')
> conf \
>   .set('spark.driver.memory', '64g')\
>   .set("fs.s3a.access.key", "minio") \
>   .set("fs.s3a.secret.key", "") \
>   .set("fs.s3a.endpoint", "http://192.168.1.127:9000";) \
>   .set("spark.hadoop.fs.s3a.impl", 
> "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>   .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>   .set("spark.sql.repl.eagerEval.enabled", "True") \
>   .set("spark.sql.adaptive.enabled", "True") \
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>   .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
>   .set("sc.setLogLevel", "error")
>
> return 
> SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
> spark = get_spark_session("Falk", SparkConf())
> d3 = 
> spark.read.option("multiline","true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 267)
> d3.write.json("d3.json")
> d3 = spark.read.json("d3.json/*.json")
> import pyspark
> def sparkShape(dataFrame):
> return (dataFrame.count(), len(dataFrame.columns))
> pyspark.sql.dataframe.DataFrame.shape = sparkShape
> print(d3.shape())
> (653610, 186)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37981) Deletes columns with all Null as default.

2022-01-21 Thread Jira
Bjørn Jørgensen created SPARK-37981:
---

 Summary: Deletes columns with all Null as default.
 Key: SPARK-37981
 URL: https://issues.apache.org/jira/browse/SPARK-37981
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.2.1
Reporter: Bjørn Jørgensen


Spark 3.2.1-RC2

During write.json, Spark deletes columns that are all null by default.

Spark does have dropFieldIfAllNull false as default, according to 
https://spark.apache.org/docs/latest/sql-data-sources-json.html

{code:java}
from pyspark import pandas as ps
import re
import numpy as np
import os
import pandas as pd

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster('local[*]')
    conf \
      .set('spark.driver.memory', '64g') \
      .set("fs.s3a.access.key", "minio") \
      .set("fs.s3a.secret.key", "") \
      .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
      .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
      .set("spark.hadoop.fs.s3a.path.style.access", "true") \
      .set("spark.sql.repl.eagerEval.enabled", "True") \
      .set("spark.sql.adaptive.enabled", "True") \
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
      .set("spark.sql.repl.eagerEval.maxNumRows", "1") \
      .set("sc.setLogLevel", "error")

    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("Falk", SparkConf())

d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(d3.shape())

(653610, 267)

d3.write.json("d3.json")

d3 = spark.read.json("d3.json/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(d3.shape())

(653610, 186)
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37980) Extend METADATA column to support row indexes for file based data sources

2022-01-21 Thread Prakhar Jain (Jira)
Prakhar Jain created SPARK-37980:


 Summary: Extend METADATA column to support row indexes for file 
based data sources
 Key: SPARK-37980
 URL: https://issues.apache.org/jira/browse/SPARK-37980
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3
Reporter: Prakhar Jain


Spark recently added hidden metadata column support for file-based data sources 
as part of SPARK-37273.

We should extend it to support ROW_INDEX as well.

Definition:

ROW_INDEX is the index of a row within a file; e.g. the 5th row in a file has 
ROW_INDEX 5.

Use cases:

Row indexes can be used in a variety of ways. A (fileName, rowIndex) tuple 
uniquely identifies a row in a table. This information can be used to mark rows, 
and an index can easily be built from row indexes.
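
As a rough sketch (not part of the original ticket): the existing hidden _metadata column from SPARK-37273 already exposes per-file fields such as file_path and file_name; the row_index field below does not exist and only illustrates how the proposed (file, row) coordinate could surface through the same column.

{code:java}
# Minimal sketch, assuming Spark 3.3+ where file-based sources expose the
# hidden _metadata column added by SPARK-37273. The row_index field is the
# hypothetical extension this ticket proposes, not an existing field.
df = spark.read.parquet("/tmp/events")

# Existing metadata fields:
df.select("*", "_metadata.file_path", "_metadata.file_name").show()

# Proposed: (file_path, row_index) would uniquely identify a row, e.g. for
# marking rows or building a secondary index.
# df.select("*", "_metadata.file_path", "_metadata.row_index")
{code}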



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37936) Use error classes in the parsing errors of intervals

2022-01-21 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480099#comment-17480099
 ] 

Senthil Kumar commented on SPARK-37936:
---

[~maxgekk], I have queries which match

 * invalidIntervalFormError - "SELECT INTERVAL '1 DAY 2' HOUR"
 * fromToIntervalUnsupportedError - "SELECT extract(MONTH FROM INTERVAL '2021-11' YEAR TO DAY)"

It would be helpful if you could share queries for the scenarios below:
 * moreThanOneFromToUnitInIntervalLiteralError
 * invalidIntervalLiteralError
 * invalidFromToUnitValueError
 * mixedIntervalUnitsError

> Use error classes in the parsing errors of intervals
> 
>
> Key: SPARK-37936
> URL: https://issues.apache.org/jira/browse/SPARK-37936
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Modify the following methods in QueryParsingErrors:
>  * moreThanOneFromToUnitInIntervalLiteralError
>  * invalidIntervalLiteralError
>  * invalidIntervalFormError
>  * invalidFromToUnitValueError
>  * fromToIntervalUnsupportedError
>  * mixedIntervalUnitsError
> onto use error classes. Throw an implementation of SparkThrowable. Also write 
> a test per every error in QueryParsingErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37907) StaticInvoke should support ConstantFolding

2022-01-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37907:
---

Assignee: angerszhu

> StaticInvoke should support ConstantFolding
> ---
>
> Key: SPARK-37907
> URL: https://issues.apache.org/jira/browse/SPARK-37907
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> StaticInvoke does not implement foldable; it should support it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37907) StaticInvoke should support ConstantFolding

2022-01-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37907.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35207
[https://github.com/apache/spark/pull/35207]

> StaticInvoke should support ConstantFolding
> ---
>
> Key: SPARK-37907
> URL: https://issues.apache.org/jira/browse/SPARK-37907
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> StaticInvoke does not implement foldable; it should, so that ConstantFolding can apply to it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37950) Take EXTERNAL as a reserved table property

2022-01-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37950.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35268
[https://github.com/apache/spark/pull/35268]

> Take EXTERNAL as a reserved table property
> --
>
> Key: SPARK-37950
> URL: https://issues.apache.org/jira/browse/SPARK-37950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, {{EXTERNAL}} is not a reserved table property. We should make 
> {{EXTERNAL}} a truly reserved property. 
> [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053]
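For illustration, the user-visible effect should be that {{EXTERNAL}} can no longer be passed as an ordinary table property; which exception or error class is raised is an assumption here, not taken from the ticket.

{code:scala}
// Runs in spark-shell. Once EXTERNAL is reserved, this statement should be
// rejected instead of silently accepting the property.
spark.sql("""
  CREATE TABLE t (id INT) USING parquet
  TBLPROPERTIES ('external' = 'true')
""")
{code}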



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37950) Take EXTERNAL as a reserved table property

2022-01-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37950:
---

Assignee: PengLei

> Take EXTERNAL as a reserved table property
> --
>
> Key: SPARK-37950
> URL: https://issues.apache.org/jira/browse/SPARK-37950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Major
>
> Currently, {{EXTERNAL}} is not a reserved table property. We should make 
> {{EXTERNAL}} a truly reserved property. 
> [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37979) Switch to more generic error classes in AES functions

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480021#comment-17480021
 ] 

Apache Spark commented on SPARK-37979:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35272

> Switch to more generic error classes in AES functions
> -
>
> Key: SPARK-37979
> URL: https://issues.apache.org/jira/browse/SPARK-37979
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Switch from the existing error classes to more generic ones in AES functions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37979) Switch to more generic error classes in AES functions

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37979:


Assignee: Apache Spark  (was: Max Gekk)

> Switch to more generic error classes in AES functions
> -
>
> Key: SPARK-37979
> URL: https://issues.apache.org/jira/browse/SPARK-37979
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Switch from the existing error classes to more generic ones in AES functions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37979) Switch to more generic error classes in AES functions

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480020#comment-17480020
 ] 

Apache Spark commented on SPARK-37979:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35272

> Switch to more generic error classes in AES functions
> -
>
> Key: SPARK-37979
> URL: https://issues.apache.org/jira/browse/SPARK-37979
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Switch from the existing error classes to more generic ones in AES functions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37979) Switch to more generic error classes in AES functions

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37979:


Assignee: Max Gekk  (was: Apache Spark)

> Switch to more generic error classes in AES functions
> -
>
> Key: SPARK-37979
> URL: https://issues.apache.org/jira/browse/SPARK-37979
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Switch from the existing error classes to more generic ones in AES functions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37979) Switch to more generic error classes in AES functions

2022-01-21 Thread Max Gekk (Jira)
Max Gekk created SPARK-37979:


 Summary: Switch to more generic error classes in AES functions
 Key: SPARK-37979
 URL: https://issues.apache.org/jira/browse/SPARK-37979
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk
Assignee: Max Gekk


Switch from the existing error classes to more generic ones in AES functions.
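A simple way to exercise these error paths from spark-shell is shown below; the key is deliberately too short, and which generic error class the failure maps to after this change is not specified in the ticket.

{code:scala}
// aes_encrypt requires a 16/24/32-byte key; this call fails and goes through
// the AES error-reporting path being reworked here.
spark.sql("SELECT aes_encrypt('Spark', 'short_key')").collect()
{code}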



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37972) Typing incompatibilities with numpy==1.22.x

2022-01-21 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-37972.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35261
[https://github.com/apache/spark/pull/35261]

> Typing incompatibilities with numpy==1.22.x
> ---
>
> Key: SPARK-37972
> URL: https://issues.apache.org/jira/browse/SPARK-37972
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 3.3.0
>
>
> When type checked against {{numpy==1.22}}, mypy detects the following issues:
> {code:python}
> python/pyspark/mllib/linalg/__init__.py:412: error: Argument 2 to "norm" has 
> incompatible type "Union[float, str]"; expected "Union[None, float, 
> Literal['fro'], Literal['nuc']]"  [arg-type]
> python/pyspark/mllib/linalg/__init__.py:457: error: No overload variant of 
> "dot" matches argument types "ndarray[Any, Any]", "Iterable[float]"  
> [call-overload]
> python/pyspark/mllib/linalg/__init__.py:457: note: Possible overload variant:
> python/pyspark/mllib/linalg/__init__.py:457: note: def dot(a: 
> Union[_SupportsArray[dtype[Any]], 
> _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, 
> bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], b: 
> Union[_SupportsArray[dtype[Any]], 
> _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, 
> bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], out: 
> None = ...) -> Any
> python/pyspark/mllib/linalg/__init__.py:457: note: <1 more non-matching 
> overload not shown>
> python/pyspark/mllib/linalg/__init__.py:707: error: Argument 2 to "norm" has 
> incompatible type "Union[float, str]"; expected "Union[None, float, 
> Literal['fro'], Literal['nuc']]"  [arg-type]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37972) Typing incompatibilities with numpy==1.22.x

2022-01-21 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz reassigned SPARK-37972:
--

Assignee: Maciej Szymkiewicz

> Typing incompatibilities with numpy==1.22.x
> ---
>
> Key: SPARK-37972
> URL: https://issues.apache.org/jira/browse/SPARK-37972
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
>
> When type checked against {{numpy==1.22}}, mypy detects the following issues:
> {code:python}
> python/pyspark/mllib/linalg/__init__.py:412: error: Argument 2 to "norm" has 
> incompatible type "Union[float, str]"; expected "Union[None, float, 
> Literal['fro'], Literal['nuc']]"  [arg-type]
> python/pyspark/mllib/linalg/__init__.py:457: error: No overload variant of 
> "dot" matches argument types "ndarray[Any, Any]", "Iterable[float]"  
> [call-overload]
> python/pyspark/mllib/linalg/__init__.py:457: note: Possible overload variant:
> python/pyspark/mllib/linalg/__init__.py:457: note: def dot(a: 
> Union[_SupportsArray[dtype[Any]], 
> _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, 
> bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], b: 
> Union[_SupportsArray[dtype[Any]], 
> _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, 
> bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], out: 
> None = ...) -> Any
> python/pyspark/mllib/linalg/__init__.py:457: note: <1 more non-matching 
> overload not shown>
> python/pyspark/mllib/linalg/__init__.py:707: error: Argument 2 to "norm" has 
> incompatible type "Union[float, str]"; expected "Union[None, float, 
> Literal['fro'], Literal['nuc']]"  [arg-type]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34805) PySpark loses metadata in DataFrame fields when selecting nested columns

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479989#comment-17479989
 ] 

Apache Spark commented on SPARK-34805:
--

User 'kevinwallimann' has created a pull request for this issue:
https://github.com/apache/spark/pull/35270

> PySpark loses metadata in DataFrame fields when selecting nested columns
> 
>
> Key: SPARK-34805
> URL: https://issues.apache.org/jira/browse/SPARK-34805
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Mark Ressler
>Priority: Major
> Attachments: jsonMetadataTest.py, nested_columns_metadata.scala
>
>
> For a DataFrame schema with nested StructTypes, where metadata is set for 
> fields in the schema, that metadata is lost when a DataFrame selects nested 
> fields.  For example, suppose
> {code:java}
> df.schema.fields[0].dataType.fields[0].metadata
> {code}
> returns a non-empty dictionary, then
> {code:java}
> df.select('Field0.SubField0').schema.fields[0].metadata{code}
> returns an empty dictionary, where "Field0" is the name of the first field in 
> the DataFrame and "SubField0" is the name of the first nested field under 
> "Field0".
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34805) PySpark loses metadata in DataFrame fields when selecting nested columns

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34805:


Assignee: Apache Spark

> PySpark loses metadata in DataFrame fields when selecting nested columns
> 
>
> Key: SPARK-34805
> URL: https://issues.apache.org/jira/browse/SPARK-34805
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Mark Ressler
>Assignee: Apache Spark
>Priority: Major
> Attachments: jsonMetadataTest.py, nested_columns_metadata.scala
>
>
> For a DataFrame schema with nested StructTypes, where metadata is set for 
> fields in the schema, that metadata is lost when a DataFrame selects nested 
> fields.  For example, suppose
> {code:java}
> df.schema.fields[0].dataType.fields[0].metadata
> {code}
> returns a non-empty dictionary, then
> {code:java}
> df.select('Field0.SubField0').schema.fields[0].metadata{code}
> returns an empty dictionary, where "Field0" is the name of the first field in 
> the DataFrame and "SubField0" is the name of the first nested field under 
> "Field0".
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34805) PySpark loses metadata in DataFrame fields when selecting nested columns

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34805:


Assignee: (was: Apache Spark)

> PySpark loses metadata in DataFrame fields when selecting nested columns
> 
>
> Key: SPARK-34805
> URL: https://issues.apache.org/jira/browse/SPARK-34805
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Mark Ressler
>Priority: Major
> Attachments: jsonMetadataTest.py, nested_columns_metadata.scala
>
>
> For a DataFrame schema with nested StructTypes, where metadata is set for 
> fields in the schema, that metadata is lost when a DataFrame selects nested 
> fields.  For example, suppose
> {code:java}
> df.schema.fields[0].dataType.fields[0].metadata
> {code}
> returns a non-empty dictionary, then
> {code:java}
> df.select('Field0.SubField0').schema.fields[0].metadata{code}
> returns an empty dictionary, where "Field0" is the name of the first field in 
> the DataFrame and "SubField0" is the name of the first nested field under 
> "Field0".
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34805) PySpark loses metadata in DataFrame fields when selecting nested columns

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479991#comment-17479991
 ] 

Apache Spark commented on SPARK-34805:
--

User 'kevinwallimann' has created a pull request for this issue:
https://github.com/apache/spark/pull/35270

> PySpark loses metadata in DataFrame fields when selecting nested columns
> 
>
> Key: SPARK-34805
> URL: https://issues.apache.org/jira/browse/SPARK-34805
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Mark Ressler
>Priority: Major
> Attachments: jsonMetadataTest.py, nested_columns_metadata.scala
>
>
> For a DataFrame schema with nested StructTypes, where metadata is set for 
> fields in the schema, that metadata is lost when a DataFrame selects nested 
> fields.  For example, suppose
> {code:java}
> df.schema.fields[0].dataType.fields[0].metadata
> {code}
> returns a non-empty dictionary, then
> {code:java}
> df.select('Field0.SubField0').schema.fields[0].metadata{code}
> returns an empty dictionary, where "Field0" is the name of the first field in 
> the DataFrame and "SubField0" is the name of the first nested field under 
> "Field0".
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37929) Support cascade mode for `dropNamespace` API

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479988#comment-17479988
 ] 

Apache Spark commented on SPARK-37929:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/35271

> Support cascade mode for `dropNamespace` API 
> -
>
> Key: SPARK-37929
> URL: https://issues.apache.org/jira/browse/SPARK-37929
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36649) Support Trigger.AvailableNow on Kafka data source

2022-01-21 Thread Yuanjian Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanjian Li resolved SPARK-36649.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35238
[https://github.com/apache/spark/pull/35238]

> Support Trigger.AvailableNow on Kafka data source
> -
>
> Key: SPARK-36649
> URL: https://issues.apache.org/jira/browse/SPARK-36649
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Major
> Fix For: 3.3.0
>
>
> SPARK-36533 introduces a new trigger, Trigger.AvailableNow, but only wires the 
> new functionality into the file stream source. Given that the Kafka data source 
> is one of the major data sources used in streaming queries, we should make the 
> Kafka data source support this trigger as well.
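As a usage sketch of what this enables once Kafka supports the trigger (broker, topic, and paths below are placeholders):

{code:scala}
import org.apache.spark.sql.streaming.Trigger

// Process everything currently available in the topic, then stop -- the same
// semantics Trigger.AvailableNow already provides for the file source.
val query = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()
  .writeStream
  .format("parquet")
  .option("path", "/tmp/kafka-out")
  .option("checkpointLocation", "/tmp/kafka-ckpt")
  .trigger(Trigger.AvailableNow())
  .start()

query.awaitTermination()
{code}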



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-28516) Data Type Formatting Functions: `to_char`

2022-01-21 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-28516 ]


jiaan.geng deleted comment on SPARK-28516:


was (Author: beliefer):
I'm working on.

> Data Type Formatting Functions: `to_char`
> -
>
> Key: SPARK-28516
> URL: https://issues.apache.org/jira/browse/SPARK-28516
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark does not have support for `to_char`. PgSQL, however, 
> [does|https://www.postgresql.org/docs/12/functions-formatting.html]:
> Query example: 
> {code:sql}
> SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 
> FOLLOWING),'9D9')
> {code}
> ||Function||Return Type||Description||Example||
> |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to 
> string|{{to_char(current_timestamp, 'HH12:MI:SS')}}|
> |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to 
> string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}|
> |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to 
> string|{{to_char(125, '999')}}|
> |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert 
> real/double precision to string|{{to_char(125.8::real, '999D9')}}|
> |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to 
> string|{{to_char(-125.8, '999D99S')}}|



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28516) Data Type Formatting Functions: `to_char`

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479903#comment-17479903
 ] 

Apache Spark commented on SPARK-28516:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/35269

> Data Type Formatting Functions: `to_char`
> -
>
> Key: SPARK-28516
> URL: https://issues.apache.org/jira/browse/SPARK-28516
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark does not have support for `to_char`. PgSQL, however, 
> [does|https://www.postgresql.org/docs/12/functions-formatting.html]:
> Query example: 
> {code:sql}
> SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 
> FOLLOWING),'9D9')
> {code}
> ||Function||Return Type||Description||Example||
> |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to 
> string|{{to_char(current_timestamp, 'HH12:MI:SS')}}|
> |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to 
> string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}|
> |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to 
> string|{{to_char(125, '999')}}|
> |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert 
> real/double precision to string|{{to_char(125.8::real, '999D9')}}|
> |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to 
> string|{{to_char(-125.8, '999D99S')}}|



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28516) Data Type Formatting Functions: `to_char`

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28516:


Assignee: Apache Spark

> Data Type Formatting Functions: `to_char`
> -
>
> Key: SPARK-28516
> URL: https://issues.apache.org/jira/browse/SPARK-28516
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Assignee: Apache Spark
>Priority: Major
>
> Currently, Spark does not have support for `to_char`. PgSQL, however, 
> [does|https://www.postgresql.org/docs/12/functions-formatting.html]:
> Query example: 
> {code:sql}
> SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 
> FOLLOWING),'9D9')
> {code}
> ||Function||Return Type||Description||Example||
> |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to 
> string|{{to_char(current_timestamp, 'HH12:MI:SS')}}|
> |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to 
> string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}|
> |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to 
> string|{{to_char(125, '999')}}|
> |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert 
> real/double precision to string|{{to_char(125.8::real, '999D9')}}|
> |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to 
> string|{{to_char(-125.8, '999D99S')}}|



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28516) Data Type Formatting Functions: `to_char`

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28516:


Assignee: (was: Apache Spark)

> Data Type Formatting Functions: `to_char`
> -
>
> Key: SPARK-28516
> URL: https://issues.apache.org/jira/browse/SPARK-28516
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark does not have support for `to_char`. PgSQL, however, 
> [does|https://www.postgresql.org/docs/12/functions-formatting.html]:
> Query example: 
> {code:sql}
> SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 
> FOLLOWING),'9D9')
> {code}
> ||Function||Return Type||Description||Example||
> |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to 
> string|{{to_char(current_timestamp, 'HH12:MI:SS')}}|
> |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to 
> string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}|
> |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to 
> string|{{to_char(125, '999')}}|
> |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert 
> real/double precision to string|{{to_char(125.8::real, '999D9')}}|
> |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to 
> string|{{to_char(-125.8, '999D99S')}}|



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37950) Take EXTERNAL as a reserved table property

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479902#comment-17479902
 ] 

Apache Spark commented on SPARK-37950:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/35268

> Take EXTERNAL as a reserved table property
> --
>
> Key: SPARK-37950
> URL: https://issues.apache.org/jira/browse/SPARK-37950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Priority: Major
>
> Currently, {{EXTERNAL}} is not a reserved table property. We should make 
> {{EXTERNAL}} a truly reserved property. 
> [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37950) Take EXTERNAL as a reserved table property

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37950:


Assignee: (was: Apache Spark)

> Take EXTERNAL as a reserved table property
> --
>
> Key: SPARK-37950
> URL: https://issues.apache.org/jira/browse/SPARK-37950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Priority: Major
>
> Currently, {{EXTERNAL}} is not a reserved table property. We should make 
> {{EXTERNAL}} a truly reserved property. 
> [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37950) Take EXTERNAL as a reserved table property

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479901#comment-17479901
 ] 

Apache Spark commented on SPARK-37950:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/35268

> Take EXTERNAL as a reserved table property
> --
>
> Key: SPARK-37950
> URL: https://issues.apache.org/jira/browse/SPARK-37950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Priority: Major
>
> Currently, {{EXTERNAL}} is not a reserved table property. We should make 
> {{EXTERNAL}} a truly reserved property. 
> [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37950) Take EXTERNAL as a reserved table property

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37950:


Assignee: Apache Spark

> Take EXTERNAL as a reserved table property
> --
>
> Key: SPARK-37950
> URL: https://issues.apache.org/jira/browse/SPARK-37950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: Apache Spark
>Priority: Major
>
> At now. the {{EXTERNAL}} is not table reserved property. we should make 
> {{EXTERNAL}} a truly reserved property 
> [discuss|https://github.com/apache/spark/pull/35204#issuecomment-1014752053]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37978) Remove the unnecessary ChunkFetchFailureException class

2022-01-21 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37978:
-
Description: The ChunkFetchFailureException is unnecessary and can be 
replaced by RuntimeException.  (was: Remove the useless 
ChunkFetchFailureException class)

> Remove the unnecessary ChunkFetchFailureException class
> ---
>
> Key: SPARK-37978
> URL: https://issues.apache.org/jira/browse/SPARK-37978
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> The ChunkFetchFailureException is unnecessary and can be replaced by 
> RuntimeException.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37978) Remove the unnecessary ChunkFetchFailureException class

2022-01-21 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37978:
-
Summary: Remove the unnecessary ChunkFetchFailureException class  (was: 
Remove the useless ChunkFetchFailureException class)

> Remove the unnecessary ChunkFetchFailureException class
> ---
>
> Key: SPARK-37978
> URL: https://issues.apache.org/jira/browse/SPARK-37978
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Remove the useless ChunkFetchFailureException class



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37978) Remove the useless ChunkFetchFailureException class

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37978:


Assignee: (was: Apache Spark)

> Remove the useless ChunkFetchFailureException class
> ---
>
> Key: SPARK-37978
> URL: https://issues.apache.org/jira/browse/SPARK-37978
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Remove the useless ChunkFetchFailureException class



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37978) Remove the useless ChunkFetchFailureException class

2022-01-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479881#comment-17479881
 ] 

Apache Spark commented on SPARK-37978:
--

User 'weixiuli' has created a pull request for this issue:
https://github.com/apache/spark/pull/35267

> Remove the useless ChunkFetchFailureException class
> ---
>
> Key: SPARK-37978
> URL: https://issues.apache.org/jira/browse/SPARK-37978
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0
>Reporter: weixiuli
>Priority: Major
>
> Remove the useless ChunkFetchFailureException class



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37978) Remove the useless ChunkFetchFailureException class

2022-01-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37978:


Assignee: Apache Spark

> Remove the useless ChunkFetchFailureException class
> ---
>
> Key: SPARK-37978
> URL: https://issues.apache.org/jira/browse/SPARK-37978
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.3, 3.1.1, 3.2.0
>Reporter: weixiuli
>Assignee: Apache Spark
>Priority: Major
>
> Remove the useless ChunkFetchFailureException class



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org