[jira] [Resolved] (SPARK-45775) Drop table skipped when CatalogV2Util loadTable meets unexpected Exception

2023-11-06 Thread konwu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

konwu resolved SPARK-45775.
---
Resolution: Invalid

> Drop table skipped when CatalogV2Util loadTable meets unexpected Exception
> 
>
> Key: SPARK-45775
> URL: https://issues.apache.org/jira/browse/SPARK-45775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3
> Environment: spark 3.1.3
>Reporter: konwu
>Priority: Major
>
> Currently the CatalogV2Util.loadTable method catches only NoSuch*Exception, as
> shown below
> {code:java}
>   def loadTable(catalog: CatalogPlugin, ident: Identifier): Option[Table] =
>     try {
>       Option(catalog.asTableCatalog.loadTable(ident))
>     } catch {
>       case _: NoSuchTableException => None
>       case _: NoSuchDatabaseException => None
>       case _: NoSuchNamespaceException => None
>     } {code}
> It will skip the table drop when communication with the metastore times out or
> another unexpected exception occurs, because the method always returns None.
> Perhaps we should catch exceptions as below
> {code:java}
> def loadTable(catalog: CatalogPlugin, ident: Identifier): Option[Table] =
>   try {
>     Option(catalog.asTableCatalog.loadTable(ident))
>   } catch {
>     case _: NoSuchTableException => None
>     case _: NoSuchDatabaseException => None
>     case _: NoSuchNamespaceException => None
>     case e: Throwable => throw e
>   } {code}






[jira] [Assigned] (SPARK-45013) Flaky Test with NPE: track allocated resources by taskId

2023-11-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45013:
-

Assignee: Kent Yao

> Flaky Test with NPE: track allocated resources by taskId
> 
>
> Key: SPARK-45013
> URL: https://issues.apache.org/jira/browse/SPARK-45013
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> - track allocated resources by taskId *** FAILED *** (76 milliseconds)
> 28782[info]   java.lang.NullPointerException:
> 28783[info]   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:267)
> 28784[info]   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackendSuite.$anonfun$new$22(CoarseGrainedExecutorBackendSuite.scala:347)
> 28785[info]   at 
> org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> 28786[info]   at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> 28787[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> 28788[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> 28789[info]   at 
> org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> 28790[info]   at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> 28791[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 28792[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 28793[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 28794[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 28795[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 28796[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> 28797[info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> 28798[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 28799[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 28800[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 28801[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 28802[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 28803[info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> 28804[info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> 28805[info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> 28806[info]   at 
> org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69)
> 28807[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 28808[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) {code}






[jira] [Resolved] (SPARK-45013) Flaky Test with NPE: track allocated resources by taskId

2023-11-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45013.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43693
[https://github.com/apache/spark/pull/43693]

> Flaky Test with NPE: track allocated resources by taskId
> 
>
> Key: SPARK-45013
> URL: https://issues.apache.org/jira/browse/SPARK-45013
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> - track allocated resources by taskId *** FAILED *** (76 milliseconds)
> 28782[info]   java.lang.NullPointerException:
> 28783[info]   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:267)
> 28784[info]   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackendSuite.$anonfun$new$22(CoarseGrainedExecutorBackendSuite.scala:347)
> 28785[info]   at 
> org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> 28786[info]   at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> 28787[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> 28788[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> 28789[info]   at 
> org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> 28790[info]   at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> 28791[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 28792[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 28793[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 28794[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 28795[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 28796[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> 28797[info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> 28798[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 28799[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 28800[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 28801[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 28802[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 28803[info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> 28804[info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> 28805[info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> 28806[info]   at 
> org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69)
> 28807[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 28808[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) {code}






[jira] [Updated] (SPARK-45816) Return null when overflowing during casting from timestamp to integers

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45816:
---
Labels: pull-request-available  (was: )

> Return null when overflowing during casting from timestamp to integers
> --
>
> Key: SPARK-45816
> URL: https://issues.apache.org/jira/browse/SPARK-45816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>
> Spark cast works in two modes: ANSI and non-ANSI. When a cast overflows, the
> common behavior under non-ANSI mode is to return null. However, casting from
> Timestamp to Int/Short/Byte currently returns a wrapped value. Silently
> overflowing like this doesn't make sense.






[jira] [Created] (SPARK-45816) Return null when overflowing during casting from timestamp to integers

2023-11-06 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-45816:
---

 Summary: Return null when overflowing during casting from 
timestamp to integers
 Key: SPARK-45816
 URL: https://issues.apache.org/jira/browse/SPARK-45816
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.1, 3.3.3
Reporter: L. C. Hsieh


Spark cast works in two modes: ANSI and non-ANSI. When a cast overflows, the
common behavior under non-ANSI mode is to return null. However, casting from
Timestamp to Int/Short/Byte currently returns a wrapped value. Silently
overflowing like this doesn't make sense.
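A minimal repro sketch of the reported behavior (assumptions: a spark-shell session with non-ANSI mode, i.e. spark.sql.ansi.enabled=false; the timestamp is just an arbitrary value whose seconds-since-epoch exceed Int range):

{code:scala}
// Seconds since epoch for year 2262 (~9.2e9) exceed Int.MaxValue (~2.1e9),
// so casting this timestamp to INT overflows.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT CAST(TIMESTAMP '2262-01-01 00:00:00' AS INT) AS v").show()
// Reported behavior: a wrapped (low-order bits) integer.
// Proposed behavior: NULL, matching other numeric casts in non-ANSI mode.
{code}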






[jira] [Updated] (SPARK-45013) Flaky Test with NPE: track allocated resources by taskId

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45013:
---
Labels: pull-request-available  (was: )

> Flaky Test with NPE: track allocated resources by taskId
> 
>
> Key: SPARK-45013
> URL: https://issues.apache.org/jira/browse/SPARK-45013
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> - track allocated resources by taskId *** FAILED *** (76 milliseconds)
> 28782[info]   java.lang.NullPointerException:
> 28783[info]   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:267)
> 28784[info]   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackendSuite.$anonfun$new$22(CoarseGrainedExecutorBackendSuite.scala:347)
> 28785[info]   at 
> org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> 28786[info]   at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> 28787[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> 28788[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> 28789[info]   at 
> org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> 28790[info]   at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> 28791[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 28792[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 28793[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 28794[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 28795[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 28796[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> 28797[info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> 28798[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 28799[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 28800[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 28801[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 28802[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 28803[info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> 28804[info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> 28805[info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> 28806[info]   at 
> org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69)
> 28807[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 28808[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) {code}






[jira] [Resolved] (SPARK-45812) Upgrade Pandas to 2.1.2

2023-11-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45812.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43689
[https://github.com/apache/spark/pull/43689]

> Upgrade Pandas to 2.1.2
> ---
>
> Key: SPARK-45812
> URL: https://issues.apache.org/jira/browse/SPARK-45812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45804) Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off

2023-11-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45804.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43674
[https://github.com/apache/spark/pull/43674]

> Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off
> -
>
> Key: SPARK-45804
> URL: https://issues.apache.org/jira/browse/SPARK-45804
> Project: Spark
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45804) Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off

2023-11-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45804:
-

Assignee: Kent Yao

> Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off
> -
>
> Key: SPARK-45804
> URL: https://issues.apache.org/jira/browse/SPARK-45804
> Project: Spark
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45815:
---
Labels: pull-request-available  (was: )

> Provide an interface for Streaming sources to add _metadata columns
> ---
>
> Key: SPARK-45815
> URL: https://issues.apache.org/jira/browse/SPARK-45815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 3.5.1
>Reporter: Yaohua Zhao
>Priority: Major
>  Labels: pull-request-available
>
> Currently, only the native V1 file-based streaming source can read the
> {{_metadata}} column:
> [https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63]
>
> Our goal is to create an interface that allows other streaming sources to add
> {{_metadata}} columns. For instance, we would like the Delta Streaming source
> ([https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49])
> to extend this interface and provide the {{_metadata}} column for its
> underlying storage format, such as Parquet.






[jira] [Updated] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns

2023-11-06 Thread Yaohua Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaohua Zhao updated SPARK-45815:

Component/s: Structured Streaming

> Provide an interface for Streaming sources to add _metadata columns
> ---
>
> Key: SPARK-45815
> URL: https://issues.apache.org/jira/browse/SPARK-45815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 3.5.1
>Reporter: Yaohua Zhao
>Priority: Major
>
> Currently, only the native V1 file-based streaming source can read the
> {{_metadata}} column:
> [https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63]
>
> Our goal is to create an interface that allows other streaming sources to add
> {{_metadata}} columns. For instance, we would like the Delta Streaming source
> ([https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49])
> to extend this interface and provide the {{_metadata}} column for its
> underlying storage format, such as Parquet.






[jira] [Updated] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns

2023-11-06 Thread Yaohua Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaohua Zhao updated SPARK-45815:

Description: 
Currently, only the native V1 file-based streaming source can read the
{{_metadata}} column:
[https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63]

Our goal is to create an interface that allows other streaming sources to add
{{_metadata}} columns. For instance, we would like the Delta Streaming source
([https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49])
to extend this interface and provide the {{_metadata}} column for its
underlying storage format, such as Parquet.

  was:
Currently, only the native V1 file-based streaming source can read the
{{_metadata}} column:
https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63

Our goal is to create an interface that allows other streaming sources to add
{{_metadata}} columns. For instance, we would like the Delta Streaming source
([https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49])
to extend this interface and provide the {{_metadata}} column for its
underlying storage format, such as Parquet.


> Provide an interface for Streaming sources to add _metadata columns
> ---
>
> Key: SPARK-45815
> URL: https://issues.apache.org/jira/browse/SPARK-45815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Yaohua Zhao
>Priority: Major
>
> Currently, only the native V1 file-based streaming source can read the
> {{_metadata}} column:
> [https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63]
>
> Our goal is to create an interface that allows other streaming sources to add
> {{_metadata}} columns. For instance, we would like the Delta Streaming source
> ([https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49])
> to extend this interface and provide the {{_metadata}} column for its
> underlying storage format, such as Parquet.






[jira] [Created] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns

2023-11-06 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-45815:
---

 Summary: Provide an interface for Streaming sources to add 
_metadata columns
 Key: SPARK-45815
 URL: https://issues.apache.org/jira/browse/SPARK-45815
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.1
Reporter: Yaohua Zhao


Currently, only the native V1 file-based streaming source can read the
{{_metadata}} column:
https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63

Our goal is to create an interface that allows other streaming sources to add
{{_metadata}} columns. For instance, we would like the Delta Streaming source
([https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49])
to extend this interface and provide the {{_metadata}} column for its
underlying storage format, such as Parquet.
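A hedged sketch of what such an interface could look like (the trait and method names below are hypothetical illustrations, not the actual API proposed in the linked code):

{code:scala}
import org.apache.spark.sql.types.StructType

// A streaming source opts in by declaring the schema of the _metadata
// struct it can attach to each row; the planner can then expose the column
// the same way it already does for the V1 file-based source.
trait SupportsStreamingMetadataColumns {
  def metadataSchema: StructType
}
{code}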






[jira] [Updated] (SPARK-45811) XML: Refine docstring of from_xml

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45811:
---
Labels: pull-request-available  (was: )

> XML: Refine docstring of from_xml
> -
>
> Key: SPARK-45811
> URL: https://issues.apache.org/jira/browse/SPARK-45811
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45814) ArrowConverters.createEmptyArrowBatch may cause memory leak

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45814:
---
Labels: pull-request-available  (was: )

> ArrowConverters.createEmptyArrowBatch may cause memory leak
> ---
>
> Key: SPARK-45814
> URL: https://issues.apache.org/jira/browse/SPARK-45814
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: xie shuiahu
>Priority: Minor
>  Labels: pull-request-available
>
> ArrowConverters.createEmptyArrowBatch doesn't call hasNext; if TaskContext.get
> is None, a memory leak happens.






[jira] [Created] (SPARK-45814) ArrowConverters.createEmptyArrowBatch may cause memory leak

2023-11-06 Thread xie shuiahu (Jira)
xie shuiahu created SPARK-45814:
---

 Summary: ArrowConverters.createEmptyArrowBatch may cause memory 
leak
 Key: SPARK-45814
 URL: https://issues.apache.org/jira/browse/SPARK-45814
 Project: Spark
  Issue Type: Bug
  Components: Connect, SQL
Affects Versions: 3.5.0, 3.4.1
Reporter: xie shuiahu


ArrowConverters.createEmptyArrowBatch doesn't call hasNext; if TaskContext.get
is None, a memory leak happens.
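An illustrative sketch of the leak pattern being described (all names are stand-ins; `Allocator` plays the role of Arrow's buffer allocator, not the real API):

{code:scala}
import org.apache.spark.TaskContext

class Allocator { def close(): Unit = () } // stand-in for the real resource

def withTaskScopedCleanup(allocator: Allocator): Unit = {
  val ctx = TaskContext.get()
  if (ctx != null) {
    // Released when the task completes.
    ctx.addTaskCompletionListener[Unit](_ => allocator.close())
  }
  // With no running task (ctx == null), and with hasNext -- which would
  // otherwise trigger cleanup -- never called, nothing closes the
  // allocator: the leak this ticket reports.
}
{code}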






[jira] [Updated] (SPARK-45813) Return the observed metrics from commands

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45813:
---
Labels: pull-request-available  (was: )

> Return the observed metrics from commands
> -
>
> Key: SPARK-45813
> URL: https://issues.apache.org/jira/browse/SPARK-45813
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-34444) Pushdown scalar-subquery filter to FileSourceScan

2023-11-06 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You resolved SPARK-34444.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Pushdown scalar-subquery filter to FileSourceScan
> -
>
> Key: SPARK-34444
> URL: https://issues.apache.org/jira/browse/SPARK-34444
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> We can push down {{a < (select max(d) from t2)}} to FileSourceScan:
> {code:scala}
> sql("CREATE TABLE t1 using parquet AS SELECT id AS a, id AS b FROM 
> range(5L)")
> sql("CREATE TABLE t2 using parquet AS SELECT id AS d FROM range(20)")
> sql("SELECT * FROM t1 WHERE b = (select max(d) from t2)").show
> {code}






[jira] [Created] (SPARK-45813) Return the observed metrics from commands

2023-11-06 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45813:
-

 Summary: Return the observed metrics from commands
 Key: SPARK-45813
 URL: https://issues.apache.org/jira/browse/SPARK-45813
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Updated] (SPARK-45812) Upgrade Pandas to 2.1.2

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45812:
---
Labels: pull-request-available  (was: )

> Upgrade Pandas to 2.1.2
> ---
>
> Key: SPARK-45812
> URL: https://issues.apache.org/jira/browse/SPARK-45812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45812) Upgrade Pandas to 2.1.2

2023-11-06 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-45812:
---

 Summary: Upgrade Pandas to 2.1.2
 Key: SPARK-45812
 URL: https://issues.apache.org/jira/browse/SPARK-45812
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 4.0.0
Reporter: Haejoon Lee









[jira] [Updated] (SPARK-45223) Refine docstring of `Column.when`

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45223:
---
Labels: pull-request-available  (was: )

> Refine docstring of `Column.when`
> -
>
> Key: SPARK-45223
> URL: https://issues.apache.org/jira/browse/SPARK-45223
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refine the docstring of Column.when 






[jira] [Updated] (SPARK-44728) Improve PySpark documentations

2023-11-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44728:
-
Fix Version/s: (was: 4.0.0)

> Improve PySpark documentations
> --
>
> Key: SPARK-44728
> URL: https://issues.apache.org/jira/browse/SPARK-44728
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> An umbrella Jira ticket to improve the PySpark documentation.
>  
>  






[jira] [Resolved] (SPARK-45803) Remove the no longer used `RpcAbortException`.

2023-11-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45803.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43673
[https://github.com/apache/spark/pull/43673]

> Remove the no longer used `RpcAbortException`.
> --
>
> Key: SPARK-45803
> URL: https://issues.apache.org/jira/browse/SPARK-45803
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45803) Remove the no longer used `RpcAbortException`.

2023-11-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45803:
-

Assignee: Yang Jie

> Remove the no longer used `RpcAbortException`.
> --
>
> Key: SPARK-45803
> URL: https://issues.apache.org/jira/browse/SPARK-45803
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45222) Refine docstring of `DataFrameReader.json`

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45222:
---
Labels: pull-request-available  (was: )

> Refine docstring of `DataFrameReader.json`
> --
>
> Key: SPARK-45222
> URL: https://issues.apache.org/jira/browse/SPARK-45222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refine the docstring of `DataFrameReader.json`






[jira] [Updated] (SPARK-45260) Refine docstring of count_distinct

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45260:
---
Labels: pull-request-available  (was: )

> Refine docstring of count_distinct
> --
>
> Key: SPARK-45260
> URL: https://issues.apache.org/jira/browse/SPARK-45260
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refine the docstring of the function `count_distinct` (e.g., provide examples
> with groupBy)






[jira] [Updated] (SPARK-45259) Refine docstring of `count`

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45259:
---
Labels: pull-request-available  (was: )

> Refine docstring of `count`
> ---
>
> Key: SPARK-45259
> URL: https://issues.apache.org/jira/browse/SPARK-45259
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refine the docstring of the function `count` (e.g., provide examples with
> groupBy)






[jira] [Resolved] (SPARK-45805) Eliminate magic numbers in withOrigin

2023-11-06 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth resolved SPARK-45805.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43671
[https://github.com/apache/spark/pull/43671]

> Eliminate magic numbers in withOrigin
> -
>
> Key: SPARK-45805
> URL: https://issues.apache.org/jira/browse/SPARK-45805
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Refactor `withOrigin` and make it more generic by eliminating the magic
> number at which the traversal of stack traces starts.






[jira] [Updated] (SPARK-45258) Refine docstring of `sum`

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45258:
---
Labels: pull-request-available  (was: )

> Refine docstring of `sum`
> -
>
> Key: SPARK-45258
> URL: https://issues.apache.org/jira/browse/SPARK-45258
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refine the docstring of the function `sum` (e.g., provide examples with groupBy)






[jira] [Updated] (SPARK-45810) Create API to stop consuming rows from the input table

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45810:
---
Labels: pull-request-available  (was: )

> Create API to stop consuming rows from the input table
> --
>
> Key: SPARK-45810
> URL: https://issues.apache.org/jira/browse/SPARK-45810
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45186) XML: Refine docstring of schema_of_xml

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45186:
---
Labels: pull-request-available  (was: )

> XML: Refine docstring of schema_of_xml
> --
>
> Key: SPARK-45186
> URL: https://issues.apache.org/jira/browse/SPARK-45186
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45186) XML: Refine docstring of schema_of_xml

2023-11-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-45186:
-
Summary: XML: Refine docstring of schema_of_xml  (was: XML: Refine 
docstring of from_xml, schema_of_xml)

> XML: Refine docstring of schema_of_xml
> --
>
> Key: SPARK-45186
> URL: https://issues.apache.org/jira/browse/SPARK-45186
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Major
>







[jira] [Created] (SPARK-45811) XML: Refine docstring of from_xml

2023-11-06 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45811:


 Summary: XML: Refine docstring of from_xml
 Key: SPARK-45811
 URL: https://issues.apache.org/jira/browse/SPARK-45811
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Sandip Agarwala









[jira] [Created] (SPARK-45810) Create API to stop consuming rows from the input table

2023-11-06 Thread Daniel (Jira)
Daniel created SPARK-45810:
--

 Summary: Create API to stop consuming rows from the input table
 Key: SPARK-45810
 URL: https://issues.apache.org/jira/browse/SPARK-45810
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Daniel









[jira] [Updated] (SPARK-45809) Refine docstring of `lit`

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45809:
---
Labels: pull-request-available  (was: )

> Refine docstring of `lit`
> -
>
> Key: SPARK-45809
> URL: https://issues.apache.org/jira/browse/SPARK-45809
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45809) Refine docstring of `lit`

2023-11-06 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45809:


 Summary: Refine docstring of `lit`
 Key: SPARK-45809
 URL: https://issues.apache.org/jira/browse/SPARK-45809
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Updated] (SPARK-45808) Improve error details for Spark Connect Client in Python

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45808:
---
Labels: pull-request-available  (was: )

> Improve error details for Spark Connect Client in Python
> 
>
> Key: SPARK-45808
> URL: https://issues.apache.org/jira/browse/SPARK-45808
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Priority: Major
>  Labels: pull-request-available
>
> Improve the error handling in Spark Connect Python Client.






[jira] [Created] (SPARK-45808) Improve error details for Spark Connect Client in Python

2023-11-06 Thread Martin Grund (Jira)
Martin Grund created SPARK-45808:


 Summary: Improve error details for Spark Connect Client in Python
 Key: SPARK-45808
 URL: https://issues.apache.org/jira/browse/SPARK-45808
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Martin Grund


Improve the error handling in Spark Connect Python Client.






[jira] [Assigned] (SPARK-45773) Refine docstring of `SparkSession.builder.config`

2023-11-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45773:


Assignee: Allison Wang

> Refine docstring of `SparkSession.builder.config`
> -
>
> Key: SPARK-45773
> URL: https://issues.apache.org/jira/browse/SPARK-45773
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Refine the docstring of SparkSession.builder.config
>  






[jira] [Resolved] (SPARK-45773) Refine docstring of `SparkSession.builder.config`

2023-11-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45773.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43639
[https://github.com/apache/spark/pull/43639]

> Refine docstring of `SparkSession.builder.config`
> -
>
> Key: SPARK-45773
> URL: https://issues.apache.org/jira/browse/SPARK-45773
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Refine the docstring of SparkSession.builder.config
>  






[jira] [Updated] (SPARK-45786) Inaccurate Decimal multiplication and division results

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45786:
---
Labels: pull-request-available  (was: )

> Inaccurate Decimal multiplication and division results
> --
>
> Key: SPARK-45786
> URL: https://issues.apache.org/jira/browse/SPARK-45786
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.4, 3.3.3, 3.4.1, 3.5.0, 4.0.0
>Reporter: Kazuyuki Tanimura
>Priority: Major
>  Labels: pull-request-available
>
> Decimal multiplication and division results may be inaccurate due to rounding 
> issues.
> h2. Multiplication:
> {code:scala}
> scala> sql("select -14120025096157587712113961295153.858047 * -0.4652").show(truncate=false)
> +----------------------------------------------------+
> |(-14120025096157587712113961295153.858047 * -0.4652)|
> +----------------------------------------------------+
> |6568635674732509803675414794505.574764              |
> +----------------------------------------------------+
> {code}
> The correct answer is
> {quote}6568635674732509803675414794505.574763
> {quote}
> Please note that the last digit is 3 instead of 4 as
>  
> {code:scala}
> scala> new java.math.BigDecimal("-14120025096157587712113961295153.858047").multiply(new java.math.BigDecimal("-0.4652"))
> val res21: java.math.BigDecimal = 6568635674732509803675414794505.5747634644
> {code}
> Since the fractional part .574763 is followed by 4644, it should not be
> rounded up.
> h2. Division:
> {code:scala}
> scala> sql("select -0.172787979 / 533704665545018957788294905796.5").show(truncate=false)
> +-------------------------------------------------+
> |(-0.172787979 / 533704665545018957788294905796.5)|
> +-------------------------------------------------+
> |-3.237521E-31                                    |
> +-------------------------------------------------+
> {code}
> The correct answer is
> {quote}-3.237520E-31
> {quote}
> Please note that the last digit is 0 instead of 1 as
>  
> {code:scala}
> scala> new java.math.BigDecimal("-0.172787979").divide(new java.math.BigDecimal("533704665545018957788294905796.5"), 100, java.math.RoundingMode.DOWN)
> val res22: java.math.BigDecimal = 
> -3.237520489418037889998826491401059986665344697406144511563561222578738E-31
> {code}
> Since the fractional part .237520 is followed by 4894..., it should not be
> rounded up.
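For reference, the expected answers can be reproduced by rounding the exact values to the result scale explicitly (a verification sketch, not Spark code):

{code:scala}
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

val exactProduct = new JBigDecimal("-14120025096157587712113961295153.858047")
  .multiply(new JBigDecimal("-0.4652"))
// The digit after scale 6 is 4, so HALF_UP keeps ...574763.
println(exactProduct.setScale(6, RoundingMode.HALF_UP))
// 6568635674732509803675414794505.574763
{code}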






[jira] [Updated] (SPARK-43402) FileSourceScanExec supports push down data filter with scalar subquery

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43402:
---
Labels: pull-request-available  (was: )

> FileSourceScanExec supports push down data filter with scalar subquery
> --
>
> Key: SPARK-43402
> URL: https://issues.apache.org/jira/browse/SPARK-43402
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> A scalar subquery can be pushed down as a data filter at runtime, since we
> always execute the subquery first.
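An illustrative example (adapted from the related SPARK-34444 report; the tables and data are arbitrary):

{code:scala}
sql("CREATE TABLE t1 USING parquet AS SELECT id AS a FROM range(5L)")
sql("CREATE TABLE t2 USING parquet AS SELECT id AS d FROM range(20L)")
// The scalar subquery runs first, so its result (19) can be bound into a
// plain data filter (a < 19) and pushed down to the file scan.
sql("SELECT * FROM t1 WHERE a < (select max(d) from t2)").show()
{code}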






[jira] [Resolved] (SPARK-45801) Introduce two helper methods for `QueryTest` that accept the `Array` type `expectedAnswer`

2023-11-06 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45801.
--
Resolution: Won't Fix

> Introduce two helper methods for `QueryTest` that accept the `Array` type 
> `expectedAnswer`
> --
>
> Key: SPARK-45801
> URL: https://issues.apache.org/jira/browse/SPARK-45801
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>
> They are used to bridge
> {code:java}
> protected def checkAnswer(df: => DataFrame, expectedAnswer: Seq[Row]): Unit 
> {code}
> and
> {code:java}
> def checkAnswer(df: DataFrame, expectedAnswer: Seq[Row], checkToRDD: Boolean 
> = true): Unit {code}
> to reduce compilation warnings related to `method 
> copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated`
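For context, a minimal sketch of such a bridging overload (hypothetical, given the ticket was resolved as Won't Fix):

{code:scala}
// Accept an Array once and convert it explicitly, so call sites avoid the
// deprecated implicit Array -> immutable.ArraySeq conversion in Scala 2.13.
protected def checkAnswer(df: => DataFrame, expectedAnswer: Array[Row]): Unit =
  checkAnswer(df, expectedAnswer.toSeq)
{code}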






[jira] [Updated] (SPARK-45807) DataSourceV2: Improve ViewCatalog API

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45807:
---
Labels: pull-request-available  (was: )

> DataSourceV2: Improve ViewCatalog API
> -
>
> Key: SPARK-45807
> URL: https://issues.apache.org/jira/browse/SPARK-45807
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Eduard Tudenhoefner
>Priority: Major
>  Labels: pull-request-available
>
> The goal is to add createOrReplaceView(..) and replaceView(..) methods to the 
> ViewCatalog API






[jira] [Created] (SPARK-45807) DataSourceV2: Improve ViewCatalog API

2023-11-06 Thread Eduard Tudenhoefner (Jira)
Eduard Tudenhoefner created SPARK-45807:
---

 Summary: DataSourceV2: Improve ViewCatalog API
 Key: SPARK-45807
 URL: https://issues.apache.org/jira/browse/SPARK-45807
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Eduard Tudenhoefner


The goal is to add createOrReplaceView(..) and replaceView(..) methods to the 
ViewCatalog API
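A hedged sketch of the shape of these additions (all types below are local stand-ins rather than Spark's actual connector classes, and the parameter lists are illustrative only):

{code:scala}
case class Identifier(namespace: Seq[String], name: String)
case class View(ident: Identifier, sql: String)

trait ViewCatalogSketch {
  // Create the view, or replace it if it already exists.
  def createOrReplaceView(ident: Identifier, sql: String): View
  // Replace an existing view; implementations would fail if it is absent.
  def replaceView(ident: Identifier, sql: String): View
}
{code}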






[jira] [Created] (SPARK-45806) GROUP BY ALL doesn't work in ANSI mode

2023-11-06 Thread XiaozongCui (Jira)
XiaozongCui created SPARK-45806:
---

 Summary: GROUP BY ALL doesn't work in ANSI mode
 Key: SPARK-45806
 URL: https://issues.apache.org/jira/browse/SPARK-45806
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.0
Reporter: XiaozongCui


I noticed that we treat 'ALL' in 'GROUP BY ALL' as an identifier during parsing,
and this causes a problem when I turn on the ANSI keyword behavior:
*set spark.sql.ansi.enabled=true;*
*set spark.sql.ansi.enforceReservedKeywords=true;*

spark-sql (default)> select a,b,c, count(*) from values(1,2,3)t(a,b,c) group by 
all;

[PARSE_SYNTAX_ERROR] Syntax error at or near 'all'.(line 1, pos 59)

== SQL ==
select a,b,c, count(*) from values(1,2,3)t(a,b,c) group by all
-----------------------------------------------------------^^^

Can we allow this reserved keyword in ANSI mode?






[jira] [Updated] (SPARK-45805) Eliminate magic numbers in withOrigin

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45805:
---
Labels: pull-request-available  (was: )

> Eliminate magic numbers in withOrigin
> -
>
> Key: SPARK-45805
> URL: https://issues.apache.org/jira/browse/SPARK-45805
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Refactor `withOrigin` and make it more generic by eliminating the magic
> number at which the traversal of stack traces starts.






[jira] [Resolved] (SPARK-45793) Improve the built-in compression codecs

2023-11-06 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-45793.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43659
[https://github.com/apache/spark/pull/43659]

> Improve the built-in compression codecs
> ---
>
> Key: SPARK-45793
> URL: https://issues.apache.org/jira/browse/SPARK-45793
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, Spark supports many built-in compression codecs used for I/O and
> storage.
> There are a lot of magic strings copied from the built-in compression codec
> names, so developers have to maintain their consistency manually. This is
> error-prone and reduces development efficiency.






[jira] [Created] (SPARK-45805) Eliminate magic numbers in withOrigin

2023-11-06 Thread Max Gekk (Jira)
Max Gekk created SPARK-45805:


 Summary: Eliminate magic numbers in withOrigin
 Key: SPARK-45805
 URL: https://issues.apache.org/jira/browse/SPARK-45805
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk


Refactor `withOrigin` and make it more generic by eliminating the magic number 
at which the traversal of stack traces starts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45804) Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45804:
---
Labels: pull-request-available  (was: )

> Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off
> -
>
> Key: SPARK-45804
> URL: https://issues.apache.org/jira/browse/SPARK-45804
> Project: Spark
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
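
Assuming the config lands under the name given in the summary, toggling it 
would presumably look like any other boolean UI setting; a hypothetical sketch:
{code:java}
import org.apache.spark.SparkConf

// Hypothetical usage sketch: the config name is taken from the issue summary;
// whether it defaults to on or off is an assumption here.
val conf = new SparkConf()
  .set("spark.ui.threadDump.flamegraphEnabled", "false")
{code}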




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45804) Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off

2023-11-06 Thread Kent Yao (Jira)
Kent Yao created SPARK-45804:


 Summary: Add spark.ui.threadDump.flamegraphEnabled config to 
switch flame graph on/off
 Key: SPARK-45804
 URL: https://issues.apache.org/jira/browse/SPARK-45804
 Project: Spark
  Issue Type: Sub-task
  Components: UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45758) Introduce a mapper for hadoop compression codecs

2023-11-06 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-45758.

Resolution: Resolved

> Introduce a mapper for hadoop compression codecs
> 
>
> Key: SPARK-45758
> URL: https://issues.apache.org/jira/browse/SPARK-45758
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Spark supports a subset of the Hadoop compression codecs, but the 
> codecs supported by Hadoop and those supported by Spark do not map one-to-one, 
> because Spark introduces two pseudo compression codecs, none and uncompress.
> There are a lot of magic strings copied from the Hadoop compression codecs. 
> This forces developers to keep them consistent by hand, which is error-prone 
> and reduces development efficiency.
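
A sketch of what such a mapper could look like, with hypothetical names (not 
the actual change): a single table from Spark-facing codec names to Hadoop 
codec classes, where the two pseudo codecs map to no class at all.
{code:java}
// Hypothetical sketch; None marks a pseudo codec with no Hadoop counterpart.
object HadoopCodecMapper {
  private val codecClassByName: Map[String, Option[String]] = Map(
    "none" -> None,
    "uncompressed" -> None,
    "gzip" -> Some("org.apache.hadoop.io.compress.GzipCodec"),
    "bzip2" -> Some("org.apache.hadoop.io.compress.BZip2Codec"),
    "deflate" -> Some("org.apache.hadoop.io.compress.DeflateCodec"),
    "snappy" -> Some("org.apache.hadoop.io.compress.SnappyCodec"),
    "lz4" -> Some("org.apache.hadoop.io.compress.Lz4Codec"))

  def codecClass(name: String): Option[String] =
    codecClassByName.getOrElse(
      name.toLowerCase(java.util.Locale.ROOT),
      throw new IllegalArgumentException(s"Unknown compression codec: $name"))
}
{code}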



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45758) Introduce a mapper for hadoop compression codecs

2023-11-06 Thread Jiaan Geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783169#comment-17783169
 ] 

Jiaan Geng commented on SPARK-45758:


Resolved by https://github.com/apache/spark/pull/43620

> Introduce a mapper for hadoop compression codecs
> 
>
> Key: SPARK-45758
> URL: https://issues.apache.org/jira/browse/SPARK-45758
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Spark supported partial Hadoop compression codecs, but the Hadoop 
> supported compression codecs and spark supported are not completely 
> one-on-one due to Spark introduce two fake compression codecs none and 
> uncompress.
> There are a lot of magic strings copy from Hadoop compression codecs. This 
> issue lead to developers need to manually maintain its consistency. It is 
> easy to make mistakes and reduce development efficiency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41635) GROUP BY ALL

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-41635:
---
Labels: pull-request-available  (was: )

> GROUP BY ALL
> 
>
> Key: SPARK-41635
> URL: https://issues.apache.org/jira/browse/SPARK-41635
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> This patch implements GROUP BY ALL, similar to the one initially implemented 
> in DuckDB. When specified, the analyzer automatically infers the grouping 
> columns based on the expressions specified in the select clause: all 
> expressions that don't include any aggregate expressions are pulled 
> implicitly into the grouping columns. This saves users from having to 
> individually spell out the list of grouping columns in most cases.
> Examples: 
> {noformat}
> select key, count, sum(score) from table group by all
> -- rewritten to
> select key, count, sum(score) from table group by key, count{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45798) Assert server-side session ID in Spark Connect

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45798:
--

Assignee: (was: Apache Spark)

> Assert server-side session ID in Spark Connect
> --
>
> Key: SPARK-45798
> URL: https://issues.apache.org/jira/browse/SPARK-45798
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Priority: Major
>  Labels: pull-request-available
>
> When accessing the Spark Session remotely, it is possible that the server has 
> silently restarted and we lose temporary state such as views or function 
> definitions.
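
A minimal sketch of the idea behind the assertion, with hypothetical names (not 
Spark Connect's actual API): pin the server-side session ID seen on the first 
response and fail fast if a later response carries a different one.
{code:java}
// Hypothetical sketch: detect a silent server restart via the session ID.
class ServerSessionGuard {
  private var pinnedId: Option[String] = None

  def verify(observedId: String): Unit = synchronized {
    pinnedId match {
      case None => pinnedId = Some(observedId)
      case Some(id) if id != observedId =>
        throw new IllegalStateException(
          s"Server-side session changed from $id to $observedId; " +
            "temporary state such as views or functions may have been lost.")
      case _ => () // same session as before, nothing to do
    }
  }
}
{code}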



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45798) Assert server-side session ID in Spark Connect

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45798:
--

Assignee: Apache Spark

> Assert server-side session ID in Spark Connect
> --
>
> Key: SPARK-45798
> URL: https://issues.apache.org/jira/browse/SPARK-45798
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> When accessing the Spark Session remotely, it is possible that the server has 
> silently restarted and we lose temporary state such as views or function 
> definitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45556) Inconsistent status code between web page and REST API when exception is thrown

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45556:
--

Assignee: (was: Apache Spark)

> Inconsistent status code between web page and REST API when exception is 
> thrown
> ---
>
> Key: SPARK-45556
> URL: https://issues.apache.org/jira/browse/SPARK-45556
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.5.0
>Reporter: wy
>Priority: Minor
>  Labels: pull-request-available
>
> The Spark history server provides 
> [AppHistoryServerPlugin|https://github.com/kuwii/spark/blob/dev/status-code/core/src/main/scala/org/apache/spark/status/AppHistoryServerPlugin.scala]
>  to add extra REST APIs and web pages. However, there's an issue when 
> exceptions are thrown, causing inconsistent status codes between the web page 
> and the REST API.
> For the REST API, if the thrown exception is an instance of 
> WebApplicationException, the status code is set to the one defined within the 
> exception.
> For web pages, however, all exceptions are wrapped in a 500 response.
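
A sketch of the mismatch, in hypothetical code (not the plugin's actual 
implementation): the REST layer honors the status carried by a 
WebApplicationException, while the page renderer collapses every failure to 500.
{code:java}
import javax.ws.rs.WebApplicationException
import javax.ws.rs.core.Response

// REST path: the JAX-RS layer uses the status inside the exception, so the
// client sees 404 here.
def restEndpoint(): Nothing =
  throw new WebApplicationException(Response.Status.NOT_FOUND)

// Web page path: every exception is wrapped in a plain 500 response,
// discarding the status the exception carried.
def renderWebPage(): Int =
  try restEndpoint()
  catch { case _: Exception => 500 }
{code}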



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45556) Inconsistent status code between web page and REST API when exception is thrown

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45556:
--

Assignee: Apache Spark

> Inconsistent status code between web page and REST API when exception is 
> thrown
> ---
>
> Key: SPARK-45556
> URL: https://issues.apache.org/jira/browse/SPARK-45556
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.5.0
>Reporter: wy
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>
> The Spark history server provides 
> [AppHistoryServerPlugin|https://github.com/kuwii/spark/blob/dev/status-code/core/src/main/scala/org/apache/spark/status/AppHistoryServerPlugin.scala]
>  to add extra REST APIs and web pages. However, there's an issue when 
> exceptions are thrown, causing inconsistent status codes between the web page 
> and the REST API.
> For the REST API, if the thrown exception is an instance of 
> WebApplicationException, the status code is set to the one defined within the 
> exception.
> For web pages, however, all exceptions are wrapped in a 500 response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45527) Task fraction resource request is not expected

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45527:
--

Assignee: Apache Spark

> Task fraction resource request is not expected
> --
>
> Key: SPARK-45527
> URL: https://issues.apache.org/jira/browse/SPARK-45527
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
>  
> {code:java}
> test("SPARK-XXX") {
>   import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
>   withTempDir { dir =>
>     val scriptPath = createTempScriptWithExpectedOutput(dir, "gpuDiscoveryScript",
>       """{"name": "gpu","addresses":["0"]}""")
>     val conf = new SparkConf()
>       .setAppName("test")
>       .setMaster("local-cluster[1, 12, 1024]")
>       .set("spark.executor.cores", "12")
>     // Default task amount: 0.08 GPU per task, i.e. 12 slots per GPU address.
>     conf.set(TASK_GPU_ID.amountConf, "0.08")
>     conf.set(WORKER_GPU_ID.amountConf, "1")
>     conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
>     conf.set(EXECUTOR_GPU_ID.amountConf, "1")
>     sc = new SparkContext(conf)
>     val rdd = sc.range(0, 100, 1, 4)
>     var rdd1 = rdd.repartition(3)
>     // A new profile requesting a whole GPU per task; this should override
>     // the 0.08 default and force the 3 tasks to run sequentially.
>     val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
>     val rp = new ResourceProfileBuilder().require(treqs).build
>     rdd1 = rdd1.withResources(rp)
>     assert(rdd1.collect().size === 100)
>   }
> } {code}
> In the above test, the 3 tasks generated by rdd1 are expected to execute in 
> sequence, since "new TaskResourceRequests().cpus(1).resource("gpu", 1.0)" 
> should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". In fact, however, 
> those 3 tasks run in parallel.
> The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static. 
> In this case, "gpu.numParts" is initialized to 12 (1/0.08) and won't change 
> even when there's a new task resource request (e.g., resource("gpu", 1.0) in 
> this case). Thus, those 3 tasks can execute in parallel.
>  
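
Spelling out the slot arithmetic described above (illustrative values only):
{code:java}
// With a default task amount of 0.08 GPU, one GPU address is split into
// floor(1 / 0.08) = 12 slots, so up to 12 tasks can share it concurrently.
val defaultTaskGpuAmount = 0.08
val numParts = (1.0 / defaultTaskGpuAmount).toInt // 12

// A ResourceProfile asking for a whole GPU per task should shrink this to a
// single slot, but since numParts is computed once and kept static, the 3
// repartitioned tasks each still find a free slot and run in parallel.
val requestedAmount = 1.0
val expectedParts = (1.0 / requestedAmount).toInt // 1
{code}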



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45527) Task fraction resource request is not expected

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45527:
--

Assignee: (was: Apache Spark)

> Task fraction resource request is not expected
> --
>
> Key: SPARK-45527
> URL: https://issues.apache.org/jira/browse/SPARK-45527
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Priority: Major
>  Labels: pull-request-available
>
>  
> {code:java}
> test("SPARK-XXX") {
>   import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
>   withTempDir { dir =>
>     val scriptPath = createTempScriptWithExpectedOutput(dir, "gpuDiscoveryScript",
>       """{"name": "gpu","addresses":["0"]}""")
>     val conf = new SparkConf()
>       .setAppName("test")
>       .setMaster("local-cluster[1, 12, 1024]")
>       .set("spark.executor.cores", "12")
>     // Default task amount: 0.08 GPU per task, i.e. 12 slots per GPU address.
>     conf.set(TASK_GPU_ID.amountConf, "0.08")
>     conf.set(WORKER_GPU_ID.amountConf, "1")
>     conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
>     conf.set(EXECUTOR_GPU_ID.amountConf, "1")
>     sc = new SparkContext(conf)
>     val rdd = sc.range(0, 100, 1, 4)
>     var rdd1 = rdd.repartition(3)
>     // A new profile requesting a whole GPU per task; this should override
>     // the 0.08 default and force the 3 tasks to run sequentially.
>     val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
>     val rp = new ResourceProfileBuilder().require(treqs).build
>     rdd1 = rdd1.withResources(rp)
>     assert(rdd1.collect().size === 100)
>   }
> } {code}
> In the above test, the 3 tasks generated by rdd1 are expected to execute in 
> sequence, since "new TaskResourceRequests().cpus(1).resource("gpu", 1.0)" 
> should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". In fact, however, 
> those 3 tasks run in parallel.
> The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static. 
> In this case, "gpu.numParts" is initialized to 12 (1/0.08) and won't change 
> even when there's a new task resource request (e.g., resource("gpu", 1.0) in 
> this case). Thus, those 3 tasks can execute in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45803) Remove the no longer used `RpcAbortException`.

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45803:
---
Labels: pull-request-available  (was: )

> Remove the no longer used `RpcAbortException`.
> --
>
> Key: SPARK-45803
> URL: https://issues.apache.org/jira/browse/SPARK-45803
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45803) Remove the no longer used `RpcAbortException`.

2023-11-06 Thread Yang Jie (Jira)
Yang Jie created SPARK-45803:


 Summary: Remove the no longer used `RpcAbortException`.
 Key: SPARK-45803
 URL: https://issues.apache.org/jira/browse/SPARK-45803
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45802) Remove no longer needed Java majorVersion checks in `Platform`

2023-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45802:
---
Labels: pull-request-available  (was: )

> Remove no longer needed Java majorVersion checks in `Platform`
> --
>
> Key: SPARK-45802
> URL: https://issues.apache.org/jira/browse/SPARK-45802
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45802) Remove no longer needed Java majorVersion checks in `Platform`

2023-11-06 Thread Yang Jie (Jira)
Yang Jie created SPARK-45802:


 Summary: Remove no longer needed Java majorVersion checks in 
`Platform`
 Key: SPARK-45802
 URL: https://issues.apache.org/jira/browse/SPARK-45802
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org