[jira] [Resolved] (SPARK-45775) Drop table skipped when CatalogV2Util loadTable meets unexpected Exception
[ https://issues.apache.org/jira/browse/SPARK-45775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] konwu resolved SPARK-45775.
---
Resolution: Invalid

> Drop table skipped when CatalogV2Util loadTable meets unexpected Exception
> -
>
> Key: SPARK-45775
> URL: https://issues.apache.org/jira/browse/SPARK-45775
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.3
> Environment: spark 3.1.3
> Reporter: konwu
> Priority: Major
>
> Currently the CatalogV2Util.loadTable method catches only the NoSuch*Exceptions, like below:
> {code:java}
> def loadTable(catalog: CatalogPlugin, ident: Identifier): Option[Table] =
>   try {
>     Option(catalog.asTableCatalog.loadTable(ident))
>   } catch {
>     case _: NoSuchTableException => None
>     case _: NoSuchDatabaseException => None
>     case _: NoSuchNamespaceException => None
>   }
> {code}
> This silently skips the drop of the table when communication with the metastore times out or another unexpected exception occurs, because the method always returns None. Perhaps we should catch it like below:
> {code:java}
> def loadTable(catalog: CatalogPlugin, ident: Identifier): Option[Table] =
>   try {
>     Option(catalog.asTableCatalog.loadTable(ident))
>   } catch {
>     case _: NoSuchTableException => None
>     case _: NoSuchDatabaseException => None
>     case _: NoSuchNamespaceException => None
>     case e: Throwable => throw e
>   }
> {code}
--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
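[Editorial sketch] The pattern proposed above — treating only the NoSuch* exceptions as "table absent" and letting everything else propagate, so a DROP TABLE is not silently skipped — can be illustrated without Spark. This is a minimal, self-contained sketch: the exception classes below are hypothetical stand-ins for Spark's catalog exceptions, not the real types.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-ins for the catalog exceptions named in the ticket.
class NoSuchTableException extends RuntimeException("no such table")
class MetastoreTimeoutException extends RuntimeException("metastore timeout")

object LoadTableSketch {
  // "Not found" maps to None; any other failure propagates to the caller,
  // so the caller can distinguish "table absent" from "lookup failed".
  def loadTable(load: () => String): Option[String] =
    try Option(load())
    catch {
      case _: NoSuchTableException => None  // table genuinely absent
      case NonFatal(e)             => throw e // e.g. a metastore timeout
    }
}
```

In Scala the rethrow clause is strictly redundant — any exception not matched by a `case` propagates anyway — but, like the `case e: Throwable => throw e` in the ticket, it makes explicit that only the NoSuch* cases are swallowed.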
[jira] [Assigned] (SPARK-45013) Flaky Test with NPE: track allocated resources by taskId
[ https://issues.apache.org/jira/browse/SPARK-45013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45013: - Assignee: Kent Yao > Flaky Test with NPE: track allocated resources by taskId > > > Key: SPARK-45013 > URL: https://issues.apache.org/jira/browse/SPARK-45013 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > > {code:java} > - track allocated resources by taskId *** FAILED *** (76 milliseconds) > 28782[info] java.lang.NullPointerException: > 28783[info] at > org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:267) > 28784[info] at > org.apache.spark.executor.CoarseGrainedExecutorBackendSuite.$anonfun$new$22(CoarseGrainedExecutorBackendSuite.scala:347) > 28785[info] at > org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > 28786[info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > 28787[info] at > org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > 28788[info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > 28789[info] at > org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > 28790[info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > 28791[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 28792[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 28793[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 28794[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 28795[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 28796[info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > 28797[info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > 28798[info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 28799[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 28800[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 28801[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 28802[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 28803[info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > 28804[info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > 28805[info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > 28806[info] at > org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > 28807[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 28808[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45013) Flaky Test with NPE: track allocated resources by taskId
[ https://issues.apache.org/jira/browse/SPARK-45013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45013. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43693 [https://github.com/apache/spark/pull/43693] > Flaky Test with NPE: track allocated resources by taskId > > > Key: SPARK-45013 > URL: https://issues.apache.org/jira/browse/SPARK-45013 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > - track allocated resources by taskId *** FAILED *** (76 milliseconds) > 28782[info] java.lang.NullPointerException: > 28783[info] at > org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:267) > 28784[info] at > org.apache.spark.executor.CoarseGrainedExecutorBackendSuite.$anonfun$new$22(CoarseGrainedExecutorBackendSuite.scala:347) > 28785[info] at > org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > 28786[info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > 28787[info] at > org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > 28788[info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > 28789[info] at > org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > 28790[info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > 28791[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 28792[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 28793[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 28794[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 28795[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 28796[info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > 
28797[info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > 28798[info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 28799[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 28800[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 28801[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 28802[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 28803[info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > 28804[info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > 28805[info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > 28806[info] at > org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > 28807[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 28808[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45816) Return null when overflowing during casting from timestamp to integers
[ https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45816: --- Labels: pull-request-available (was: ) > Return null when overflowing during casting from timestamp to integers > -- > > Key: SPARK-45816 > URL: https://issues.apache.org/jira/browse/SPARK-45816 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > > Spark cast works in two modes: ansi and non-ansi. When overflowing during > casting, the common behavior under non-ansi mode is to return null. However, > casting from Timestamp to Int/Short/Byte returns a wrapping value now. The > behavior to silently overflow doesn't make sense. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45816) Return null when overflowing during casting from timestamp to integers
L. C. Hsieh created SPARK-45816: --- Summary: Return null when overflowing during casting from timestamp to integers Key: SPARK-45816 URL: https://issues.apache.org/jira/browse/SPARK-45816 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 3.4.1, 3.3.3 Reporter: L. C. Hsieh Spark cast works in two modes: ansi and non-ansi. When overflowing during casting, the common behavior under non-ansi mode is to return null. However, casting from Timestamp to Int/Short/Byte returns a wrapping value now. The behavior to silently overflow doesn't make sense. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
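[Editorial sketch] The "wrapping value" described above is ordinary two's-complement truncation when a long is narrowed to a 32-bit int. A minimal self-contained sketch of that behavior, using an illustrative epoch-seconds value (Spark's actual timestamp-to-seconds conversion is not shown here, so the numbers are for illustration only):

```scala
object CastOverflowSketch {
  // 2100-01-01 00:00:00 UTC in epoch seconds — larger than Int.MaxValue.
  val epochSeconds: Long = 4102444800L

  // .toInt keeps only the low 32 bits, so the result silently wraps to a
  // negative number rather than failing or becoming null.
  val wrapped: Int = epochSeconds.toInt
}
```

SPARK-45816 proposes that, under non-ansi mode, such an out-of-range cast should yield null instead of a wrapped value, consistent with Spark's other non-ansi numeric casts.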
[jira] [Updated] (SPARK-45013) Flaky Test with NPE: track allocated resources by taskId
[ https://issues.apache.org/jira/browse/SPARK-45013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45013: --- Labels: pull-request-available (was: ) > Flaky Test with NPE: track allocated resources by taskId > > > Key: SPARK-45013 > URL: https://issues.apache.org/jira/browse/SPARK-45013 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > > {code:java} > - track allocated resources by taskId *** FAILED *** (76 milliseconds) > 28782[info] java.lang.NullPointerException: > 28783[info] at > org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:267) > 28784[info] at > org.apache.spark.executor.CoarseGrainedExecutorBackendSuite.$anonfun$new$22(CoarseGrainedExecutorBackendSuite.scala:347) > 28785[info] at > org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > 28786[info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > 28787[info] at > org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > 28788[info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > 28789[info] at > org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > 28790[info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > 28791[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 28792[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 28793[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 28794[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 28795[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 28796[info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > 28797[info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > 28798[info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 28799[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 28800[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 28801[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 28802[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 28803[info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > 28804[info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > 28805[info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > 28806[info] at > org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > 28807[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 28808[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45812) Upgrade Pandas to 2.1.2
[ https://issues.apache.org/jira/browse/SPARK-45812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45812. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43689 [https://github.com/apache/spark/pull/43689] > Upgrade Pandas to 2.1.2 > --- > > Key: SPARK-45812 > URL: https://issues.apache.org/jira/browse/SPARK-45812 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45804) Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off
[ https://issues.apache.org/jira/browse/SPARK-45804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45804. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43674 [https://github.com/apache/spark/pull/43674] > Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off > - > > Key: SPARK-45804 > URL: https://issues.apache.org/jira/browse/SPARK-45804 > Project: Spark > Issue Type: Sub-task > Components: UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45804) Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off
[ https://issues.apache.org/jira/browse/SPARK-45804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45804: - Assignee: Kent Yao > Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off > - > > Key: SPARK-45804 > URL: https://issues.apache.org/jira/browse/SPARK-45804 > Project: Spark > Issue Type: Sub-task > Components: UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns
[ https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45815: --- Labels: pull-request-available (was: ) > Provide an interface for Streaming sources to add _metadata columns > --- > > Key: SPARK-45815 > URL: https://issues.apache.org/jira/browse/SPARK-45815 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.5.1 >Reporter: Yaohua Zhao >Priority: Major > Labels: pull-request-available > > Currently, only the native V1 file-based streaming source can read the > `_metadata` column: > [https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63] > > Our goal is to create an interface that allows other streaming sources to add > `_metadata` columns. For instance, we would like the Delta Streaming > source, which you can find here: > [https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49], > to extend this interface and provide the `_metadata` column for its > underlying storage format, such as Parquet. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns
[ https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-45815: Component/s: Structured Streaming > Provide an interface for Streaming sources to add _metadata columns > --- > > Key: SPARK-45815 > URL: https://issues.apache.org/jira/browse/SPARK-45815 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.5.1 >Reporter: Yaohua Zhao >Priority: Major > > Currently, only the native V1 file-based streaming source can read the > `_metadata` column: > [https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63] > > Our goal is to create an interface that allows other streaming sources to add > `_metadata` columns. For instance, we would like the Delta Streaming > source, which you can find here: > [https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49], > to extend this interface and provide the `_metadata` column for its > underlying storage format, such as Parquet. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns
[ https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-45815: Description: Currently, only the native V1 file-based streaming source can read the `_metadata` column: [https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63] Our goal is to create an interface that allows other streaming sources to add `_metadata` columns. For instance, we would like the Delta Streaming source, which you can find here: [https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49], to extend this interface and provide the `_metadata` column for its underlying storage format, such as Parquet. was: Currently, only the native V1 file-based streaming source can read the `_metadata` column: https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63 Our goal is to create an interface that allows other streaming sources to add `_metadata` columns. For instance, we would like the Delta Streaming source, which you can find here: [https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49], to extend this interface and provide the `_metadata` column for its underlying storage format, such as Parquet.
> Provide an interface for Streaming sources to add _metadata columns > --- > > Key: SPARK-45815 > URL: https://issues.apache.org/jira/browse/SPARK-45815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Yaohua Zhao >Priority: Major > > Currently, only the native V1 file-based streaming source can read the > `_metadata` column: > [https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63] > > Our goal is to create an interface that allows other streaming sources to add > `_metadata` columns. For instance, we would like the Delta Streaming > source, which you can find here: > [https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49], > to extend this interface and provide the `_metadata` column for its > underlying storage format, such as Parquet. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns
Yaohua Zhao created SPARK-45815: --- Summary: Provide an interface for Streaming sources to add _metadata columns Key: SPARK-45815 URL: https://issues.apache.org/jira/browse/SPARK-45815 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1 Reporter: Yaohua Zhao Currently, only the native V1 file-based streaming source can read the `_metadata` column: https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63 Our goal is to create an interface that allows other streaming sources to add `_metadata` columns. For instance, we would like the Delta Streaming source, which you can find here: [https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49], to extend this interface and provide the `_metadata` column for its underlying storage format, such as Parquet. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45811) XML: Refine docstring of from_xml
[ https://issues.apache.org/jira/browse/SPARK-45811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45811: --- Labels: pull-request-available (was: ) > XML: Refine docstring of from_xml > - > > Key: SPARK-45811 > URL: https://issues.apache.org/jira/browse/SPARK-45811 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45814) ArrowConverters.createEmptyArrowBatch may cause memory leak
[ https://issues.apache.org/jira/browse/SPARK-45814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45814: --- Labels: pull-request-available (was: ) > ArrowConverters.createEmptyArrowBatch may cause memory leak > --- > > Key: SPARK-45814 > URL: https://issues.apache.org/jira/browse/SPARK-45814 > Project: Spark > Issue Type: Bug > Components: Connect, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: xie shuiahu >Priority: Minor > Labels: pull-request-available > > ArrowConverters.createEmptyArrowBatch doesn't call hasNext; if TaskContext.get > is None, a memory leak occurs -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45814) ArrowConverters.createEmptyArrowBatch may cause memory leak
xie shuiahu created SPARK-45814: --- Summary: ArrowConverters.createEmptyArrowBatch may cause memory leak Key: SPARK-45814 URL: https://issues.apache.org/jira/browse/SPARK-45814 Project: Spark Issue Type: Bug Components: Connect, SQL Affects Versions: 3.5.0, 3.4.1 Reporter: xie shuiahu ArrowConverters.createEmptyArrowBatch doesn't call hasNext; if TaskContext.get is None, a memory leak occurs -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45813) Return the observed metrics from commands
[ https://issues.apache.org/jira/browse/SPARK-45813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45813: --- Labels: pull-request-available (was: ) > Return the observed metrics from commands > - > > Key: SPARK-45813 > URL: https://issues.apache.org/jira/browse/SPARK-45813 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34444) Pushdown scalar-subquery filter to FileSourceScan
[ https://issues.apache.org/jira/browse/SPARK-34444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You resolved SPARK-34444.
---
Fix Version/s: 4.0.0
Resolution: Fixed

> Pushdown scalar-subquery filter to FileSourceScan
> -
>
> Key: SPARK-34444
> URL: https://issues.apache.org/jira/browse/SPARK-34444
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Yuming Wang
> Priority: Major
> Fix For: 4.0.0
>
> We can pushdown {{a < (select max(d) from t2)}} to FileSourceScan:
> {code:scala}
> sql("CREATE TABLE t1 using parquet AS SELECT id AS a, id AS b FROM range(5L)")
> sql("CREATE TABLE t2 using parquet AS SELECT id AS d FROM range(20)")
> sql("SELECT * FROM t1 WHERE b = (select max(d) from t2)").show
> {code}
--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45813) Return the observed metrics from commands
Takuya Ueshin created SPARK-45813: - Summary: Return the observed metrics from commands Key: SPARK-45813 URL: https://issues.apache.org/jira/browse/SPARK-45813 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45812) Upgrade Pandas to 2.1.2
[ https://issues.apache.org/jira/browse/SPARK-45812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45812: --- Labels: pull-request-available (was: ) > Upgrade Pandas to 2.1.2 > --- > > Key: SPARK-45812 > URL: https://issues.apache.org/jira/browse/SPARK-45812 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45812) Upgrade Pandas to 2.1.2
Haejoon Lee created SPARK-45812: --- Summary: Upgrade Pandas to 2.1.2 Key: SPARK-45812 URL: https://issues.apache.org/jira/browse/SPARK-45812 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45223) Refine docstring of `Column.when`
[ https://issues.apache.org/jira/browse/SPARK-45223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45223: --- Labels: pull-request-available (was: ) > Refine docstring of `Column.when` > - > > Key: SPARK-45223 > URL: https://issues.apache.org/jira/browse/SPARK-45223 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of Column.when -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44728) Improve PySpark documentations
[ https://issues.apache.org/jira/browse/SPARK-44728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-44728: - Fix Version/s: (was: 4.0.0) > Improve PySpark documentations > -- > > Key: SPARK-44728 > URL: https://issues.apache.org/jira/browse/SPARK-44728 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Priority: Major > > An umbrella Jira ticket to improve the PySpark documentation. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45803) Remove the no longer used `RpcAbortException`.
[ https://issues.apache.org/jira/browse/SPARK-45803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45803. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43673 [https://github.com/apache/spark/pull/43673] > Remove the no longer used `RpcAbortException`. > -- > > Key: SPARK-45803 > URL: https://issues.apache.org/jira/browse/SPARK-45803 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45803) Remove the no longer used `RpcAbortException`.
[ https://issues.apache.org/jira/browse/SPARK-45803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45803: - Assignee: Yang Jie > Remove the no longer used `RpcAbortException`. > -- > > Key: SPARK-45803 > URL: https://issues.apache.org/jira/browse/SPARK-45803 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45222) Refine docstring of `DataFrameReader.json`
[ https://issues.apache.org/jira/browse/SPARK-45222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45222: --- Labels: pull-request-available (was: ) > Refine docstring of `DataFrameReader.json` > -- > > Key: SPARK-45222 > URL: https://issues.apache.org/jira/browse/SPARK-45222 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of read json -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45260) Refine docstring of count_distinct
[ https://issues.apache.org/jira/browse/SPARK-45260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45260: --- Labels: pull-request-available (was: ) > Refine docstring of count_distinct > -- > > Key: SPARK-45260 > URL: https://issues.apache.org/jira/browse/SPARK-45260 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of the function `count_distinct` (e.g. provide examples with > groupBy) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45259) Refine docstring of `count`
[ https://issues.apache.org/jira/browse/SPARK-45259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45259: --- Labels: pull-request-available (was: ) > Refine docstring of `count` > --- > > Key: SPARK-45259 > URL: https://issues.apache.org/jira/browse/SPARK-45259 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of the function `count` (e.g. provide examples with > groupBy) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45805) Eliminate magic numbers in withOrigin
[ https://issues.apache.org/jira/browse/SPARK-45805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Toth resolved SPARK-45805. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43671 [https://github.com/apache/spark/pull/43671] > Eliminate magic numbers in withOrigin > - > > Key: SPARK-45805 > URL: https://issues.apache.org/jira/browse/SPARK-45805 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Refactor `withOrigin`, and make it more generic by eliminating the magic > number from which the traversal of stack traces starts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45258) Refine docstring of `sum`
[ https://issues.apache.org/jira/browse/SPARK-45258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45258: --- Labels: pull-request-available (was: ) > Refine docstring of `sum` > - > > Key: SPARK-45258 > URL: https://issues.apache.org/jira/browse/SPARK-45258 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of the function `sum` (e.g. provide examples with groupBy) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45810) Create API to stop consuming rows from the input table
[ https://issues.apache.org/jira/browse/SPARK-45810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45810: --- Labels: pull-request-available (was: ) > Create API to stop consuming rows from the input table > -- > > Key: SPARK-45810 > URL: https://issues.apache.org/jira/browse/SPARK-45810 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45186) XML: Refine docstring of schema_of_xml
[ https://issues.apache.org/jira/browse/SPARK-45186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45186: --- Labels: pull-request-available (was: ) > XML: Refine docstring of schema_of_xml > -- > > Key: SPARK-45186 > URL: https://issues.apache.org/jira/browse/SPARK-45186 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45186) XML: Refine docstring of schema_of_xml
[ https://issues.apache.org/jira/browse/SPARK-45186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45186: - Summary: XML: Refine docstring of schema_of_xml (was: XML: Refine docstring of from_xml, schema_of_xml) > XML: Refine docstring of schema_of_xml > -- > > Key: SPARK-45186 > URL: https://issues.apache.org/jira/browse/SPARK-45186 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45811) XML: Refine docstring of from_xml
Hyukjin Kwon created SPARK-45811: Summary: XML: Refine docstring of from_xml Key: SPARK-45811 URL: https://issues.apache.org/jira/browse/SPARK-45811 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Sandip Agarwala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45810) Create API to stop consuming rows from the input table
Daniel created SPARK-45810: -- Summary: Create API to stop consuming rows from the input table Key: SPARK-45810 URL: https://issues.apache.org/jira/browse/SPARK-45810 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Daniel -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45809) Refine docstring of `lit`
[ https://issues.apache.org/jira/browse/SPARK-45809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45809: --- Labels: pull-request-available (was: ) > Refine docstring of `lit` > - > > Key: SPARK-45809 > URL: https://issues.apache.org/jira/browse/SPARK-45809 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45809) Refine docstring of `lit`
Hyukjin Kwon created SPARK-45809: Summary: Refine docstring of `lit` Key: SPARK-45809 URL: https://issues.apache.org/jira/browse/SPARK-45809 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45808) Improve error details for Spark Connect Client in Python
[ https://issues.apache.org/jira/browse/SPARK-45808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45808: --- Labels: pull-request-available (was: ) > Improve error details for Spark Connect Client in Python > > > Key: SPARK-45808 > URL: https://issues.apache.org/jira/browse/SPARK-45808 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Priority: Major > Labels: pull-request-available > > Improve the error handling in Spark Connect Python Client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45808) Improve error details for Spark Connect Client in Python
Martin Grund created SPARK-45808: Summary: Improve error details for Spark Connect Client in Python Key: SPARK-45808 URL: https://issues.apache.org/jira/browse/SPARK-45808 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Martin Grund Improve the error handling in Spark Connect Python Client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45773) Refine docstring of `SparkSession.builder.config`
[ https://issues.apache.org/jira/browse/SPARK-45773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45773: Assignee: Allison Wang > Refine docstring of `SparkSession.builder.config` > - > > Key: SPARK-45773 > URL: https://issues.apache.org/jira/browse/SPARK-45773 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Refine the docstring of SparkSession.builder.config > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45773) Refine docstring of `SparkSession.builder.config`
[ https://issues.apache.org/jira/browse/SPARK-45773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45773. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43639 [https://github.com/apache/spark/pull/43639] > Refine docstring of `SparkSession.builder.config` > - > > Key: SPARK-45773 > URL: https://issues.apache.org/jira/browse/SPARK-45773 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Refine the docstring of SparkSession.builder.config > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45786) Inaccurate Decimal multiplication and division results
[ https://issues.apache.org/jira/browse/SPARK-45786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45786: --- Labels: pull-request-available (was: ) > Inaccurate Decimal multiplication and division results > -- > > Key: SPARK-45786 > URL: https://issues.apache.org/jira/browse/SPARK-45786 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.4, 3.3.3, 3.4.1, 3.5.0, 4.0.0 >Reporter: Kazuyuki Tanimura >Priority: Major > Labels: pull-request-available > > Decimal multiplication and division results may be inaccurate due to rounding > issues. > h2. Multiplication: > {code:scala} > scala> sql("select -14120025096157587712113961295153.858047 * > -0.4652").show(truncate=false) > ++ > > |(-14120025096157587712113961295153.858047 * -0.4652)| > ++ > |6568635674732509803675414794505.574764 | > ++ > {code} > The correct answer is > {quote}6568635674732509803675414794505.574763 > {quote} > Please note that the last digit is 3 instead of 4 as > > {code:scala} > scala> > java.math.BigDecimal("-14120025096157587712113961295153.858047").multiply(java.math.BigDecimal("-0.4652")) > val res21: java.math.BigDecimal = 6568635674732509803675414794505.5747634644 > {code} > Since the fractional part .574763 is followed by 4644, it should not be > rounded up. > h2. 
Division: > {code:scala} > scala> sql("select -0.172787979 / > 533704665545018957788294905796.5").show(truncate=false) > +-+ > |(-0.172787979 / 533704665545018957788294905796.5)| > +-+ > |-3.237521E-31| > +-+ > {code} > The correct answer is > {quote}-3.237520E-31 > {quote} > Please note that the last digit is 0 instead of 1 as > > {code:scala} > scala> > java.math.BigDecimal("-0.172787979").divide(java.math.BigDecimal("533704665545018957788294905796.5"), > 100, java.math.RoundingMode.DOWN) > val res22: java.math.BigDecimal = > -3.237520489418037889998826491401059986665344697406144511563561222578738E-31 > {code} > Since the fractional part .237520 is followed by 4894..., it should not be > rounded up. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
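The rounding behavior reported above can be checked outside Spark with Python's `decimal` module; the following is an illustrative sketch (the precision settings are arbitrary, chosen only to make the results exact enough to compare against the ticket's BigDecimal output):

```python
from decimal import Decimal, getcontext, ROUND_DOWN

# Use enough precision that the multiplication below is computed exactly
# (the full product has about 41 significant digits).
getcontext().prec = 60

# Multiplication: the exact product ends in ...574763 4644, so rounding the
# fractional part to 6 digits must keep 3, not round up to 4.
product = Decimal("-14120025096157587712113961295153.858047") * Decimal("-0.4652")
print(product)  # 6568635674732509803675414794505.5747634644

# Division: truncating (ROUND_DOWN) shows the digits after .237520 are 489...,
# so the last kept digit should stay 0, not become 1.
getcontext().rounding = ROUND_DOWN
quotient = Decimal("-0.172787979") / Decimal("533704665545018957788294905796.5")
print(quotient)  # begins -3.237520489..., confirming truncation keeps 0
```

This mirrors the `java.math.BigDecimal` checks quoted in the ticket, only in Python for convenience.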
[jira] [Updated] (SPARK-43402) FileSourceScanExec supports push down data filter with scalar subquery
[ https://issues.apache.org/jira/browse/SPARK-43402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43402: --- Labels: pull-request-available (was: ) > FileSourceScanExec supports push down data filter with scalar subquery > -- > > Key: SPARK-43402 > URL: https://issues.apache.org/jira/browse/SPARK-43402 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Scalar subquery can be pushed down as data filter at runtime, since we always > execute subquery first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45801) Introduce two helper methods for `QueryTest` that accept the `Array` type `expectedAnswer`
[ https://issues.apache.org/jira/browse/SPARK-45801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45801. -- Resolution: Won't Fix > Introduce two helper methods for `QueryTest` that accept the `Array` type > `expectedAnswer` > -- > > Key: SPARK-45801 > URL: https://issues.apache.org/jira/browse/SPARK-45801 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > > They are used to bridge > {code:java} > protected def checkAnswer(df: => DataFrame, expectedAnswer: Seq[Row]): Unit > {code} > and > {code:java} > def checkAnswer(df: DataFrame, expectedAnswer: Seq[Row], checkToRDD: Boolean > = true): Unit {code} > to reduce compilation warnings related to `method > copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45807) DataSourceV2: Improve ViewCatalog API
[ https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45807: --- Labels: pull-request-available (was: ) > DataSourceV2: Improve ViewCatalog API > - > > Key: SPARK-45807 > URL: https://issues.apache.org/jira/browse/SPARK-45807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eduard Tudenhoefner >Priority: Major > Labels: pull-request-available > > The goal is to add createOrReplaceView(..) and replaceView(..) methods to the > ViewCatalog API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45807) DataSourceV2: Improve ViewCatalog API
Eduard Tudenhoefner created SPARK-45807: --- Summary: DataSourceV2: Improve ViewCatalog API Key: SPARK-45807 URL: https://issues.apache.org/jira/browse/SPARK-45807 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Eduard Tudenhoefner The goal is to add createOrReplaceView(..) and replaceView(..) methods to the ViewCatalog API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45806) GROUP BY ALL doesn't work in ANSI mode
XiaozongCui created SPARK-45806: --- Summary: GROUP BY ALL doesn't work in ANSI mode Key: SPARK-45806 URL: https://issues.apache.org/jira/browse/SPARK-45806 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 3.4.0 Reporter: XiaozongCui I noticed that we treat 'ALL' in 'GROUP BY ALL' as an Identifier in parsing, and this causes a problem when I turn on ANSI keyword behavior *set spark.sql.ansi.enabled=true;* *set spark.sql.ansi.enforceReservedKeywords=true;* spark-sql (default)> select a,b,c, count(*) from values(1,2,3)t(a,b,c) group by all; [PARSE_SYNTAX_ERROR] Syntax error at or near 'all'.(line 1, pos 59) == SQL == select a,b,c, count(*) from values(1,2,3)t(a,b,c) group by all ---^^^ Can we allow this reserved keyword in ANSI mode? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45805) Eliminate magic numbers in withOrigin
[ https://issues.apache.org/jira/browse/SPARK-45805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45805: --- Labels: pull-request-available (was: ) > Eliminate magic numbers in withOrigin > - > > Key: SPARK-45805 > URL: https://issues.apache.org/jira/browse/SPARK-45805 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Refactor `withOrigin`, and make it more generic by eliminating the magic > number from which the traversal of stack traces starts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45793) Improve the built-in compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45793. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43659 [https://github.com/apache/spark/pull/43659] > Improve the built-in compression codecs > --- > > Key: SPARK-45793 > URL: https://issues.apache.org/jira/browse/SPARK-45793 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, Spark supports many built-in compression codecs for I/O and > storage. > There are a lot of magic strings copied from the built-in compression codecs, > which forces developers to maintain their consistency manually. This makes > mistakes easy and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45805) Eliminate magic numbers in withOrigin
Max Gekk created SPARK-45805: Summary: Eliminate magic numbers in withOrigin Key: SPARK-45805 URL: https://issues.apache.org/jira/browse/SPARK-45805 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Refactor `withOrigin`, and make it more generic by eliminating the magic number from which the traversal of stack traces starts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45804) Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off
[ https://issues.apache.org/jira/browse/SPARK-45804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45804: --- Labels: pull-request-available (was: ) > Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off > - > > Key: SPARK-45804 > URL: https://issues.apache.org/jira/browse/SPARK-45804 > Project: Spark > Issue Type: Sub-task > Components: UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45804) Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off
Kent Yao created SPARK-45804: Summary: Add spark.ui.threadDump.flamegraphEnabled config to switch flame graph on/off Key: SPARK-45804 URL: https://issues.apache.org/jira/browse/SPARK-45804 Project: Spark Issue Type: Sub-task Components: UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45758) Introduce a mapper for hadoop compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45758. Resolution: Resolved > Introduce a mapper for hadoop compression codecs > > > Key: SPARK-45758 > URL: https://issues.apache.org/jira/browse/SPARK-45758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, Spark supports a subset of the Hadoop compression codecs, but the > two sets do not map one-to-one because Spark introduces two fake compression > codecs, none and uncompress. > There are a lot of magic strings copied from the Hadoop compression codecs, > which forces developers to maintain their consistency manually. This makes > mistakes easy and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45758) Introduce a mapper for hadoop compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783169#comment-17783169 ] Jiaan Geng commented on SPARK-45758: Resolved by https://github.com/apache/spark/pull/43620 > Introduce a mapper for hadoop compression codecs > > > Key: SPARK-45758 > URL: https://issues.apache.org/jira/browse/SPARK-45758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, Spark supports a subset of the Hadoop compression codecs, but the > two sets do not map one-to-one because Spark introduces two fake compression > codecs, none and uncompress. > There are a lot of magic strings copied from the Hadoop compression codecs, > which forces developers to maintain their consistency manually. This makes > mistakes easy and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
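The "mapper" idea described in SPARK-45758 can be sketched as a single table that centralizes codec names instead of scattering magic strings, including the two Spark-only pseudo-codecs (none and uncompress) that have no Hadoop counterpart. This is a hedged illustration in Python, not Spark's actual Scala mapper; the Hadoop class names are the real `org.apache.hadoop.io.compress` codec classes, while `codec_class` is an invented helper:

```python
# One central mapping from codec name to Hadoop codec class; None marks the
# Spark-only pseudo-codecs that disable compression.
CODEC_CLASSES = {
    "none": None,        # Spark-only pseudo-codec
    "uncompress": None,  # Spark-only pseudo-codec
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "lz4": "org.apache.hadoop.io.compress.Lz4Codec",
}

def codec_class(name: str):
    # Case-insensitive lookup, with an explicit error instead of a silent typo.
    try:
        return CODEC_CLASSES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown compression codec: {name}") from None

print(codec_class("SNAPPY"))  # org.apache.hadoop.io.compress.SnappyCodec
```

With all names in one place, a typo becomes a loud `ValueError` rather than a string that silently fails to match anywhere downstream.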
[jira] [Updated] (SPARK-41635) GROUP BY ALL
[ https://issues.apache.org/jira/browse/SPARK-41635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41635: --- Labels: pull-request-available (was: ) > GROUP BY ALL > > > Key: SPARK-41635 > URL: https://issues.apache.org/jira/browse/SPARK-41635 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > This patch implements GROUP BY ALL, similar to the one initially implemented > in DuckDB. When specified, the analyzer automatically infers the grouping > columns based on the expressions specified in the select clause: all > expressions that don't include any aggregate expressions are pulled > implicitly into the grouping columns. This avoids users having to specify > individually the list of grouping columns in most cases. > Examples: > {noformat} > select key, count, sum(score) from table group by all > -- rewritten to > select key, count, sum(score) from table group by key{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
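The inference rule described in SPARK-41635 (every select expression that contains no aggregate is pulled into the grouping list) can be sketched in a few lines. This is a toy illustration over string expressions, not Spark's analyzer; the aggregate set and the crude `fn(...)` detection are assumptions made for the example:

```python
# Toy sketch of GROUP BY ALL inference: non-aggregate select expressions
# become the grouping columns.
AGGREGATES = {"sum", "count", "avg", "min", "max"}

def contains_aggregate(expr: str) -> bool:
    # Treat "fn(...)" whose function name is an aggregate as an aggregate
    # expression; bare column names never match.
    head = expr.split("(", 1)[0].strip().lower()
    return "(" in expr and head in AGGREGATES

def infer_group_by_all(select_exprs):
    # Pull every non-aggregate select expression into the grouping list.
    return [e for e in select_exprs if not contains_aggregate(e)]

# The ticket's example: "select key, count, sum(score) ... group by all"
# is rewritten to group by key, count.
print(infer_group_by_all(["key", "count", "sum(score)"]))  # ['key', 'count']
```

Note how the bare column named `count` is kept as a grouping column while the call `sum(score)` is not, matching the rewrite shown in the ticket.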
[jira] [Assigned] (SPARK-45798) Assert server-side session ID in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-45798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45798: -- Assignee: (was: Apache Spark) > Assert server-side session ID in Spark Connect > -- > > Key: SPARK-45798 > URL: https://issues.apache.org/jira/browse/SPARK-45798 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Priority: Major > Labels: pull-request-available > > When accessing the Spark Session remotely, it is possible that the server has > silently restarted and we lose temporary state such as views or > function definitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45798) Assert server-side session ID in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-45798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45798: -- Assignee: Apache Spark > Assert server-side session ID in Spark Connect > -- > > Key: SPARK-45798 > URL: https://issues.apache.org/jira/browse/SPARK-45798 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > When accessing the Spark Session remotely, it is possible that the server has > silently restarted and we lose temporary state such as views or > function definitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45556) Inconsistent status code between web page and REST API when exception is thrown
[ https://issues.apache.org/jira/browse/SPARK-45556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45556: -- Assignee: (was: Apache Spark) > Inconsistent status code between web page and REST API when exception is > thrown > --- > > Key: SPARK-45556 > URL: https://issues.apache.org/jira/browse/SPARK-45556 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.5.0 >Reporter: wy >Priority: Minor > Labels: pull-request-available > > Spark history server provides > [AppHistoryServerPlugin|https://github.com/kuwii/spark/blob/dev/status-code/core/src/main/scala/org/apache/spark/status/AppHistoryServerPlugin.scala] > to add extra REST API and web pages. However, there's an issue when > exceptions are thrown, causing inconsistent status codes between the web page > and the REST API. > For the REST API, if the thrown exception is an instance of > WebApplicationException, the status code is set to the one defined > within the exception. > For the web page, however, all exceptions are wrapped in a 500 response. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45556) Inconsistent status code between web page and REST API when exception is thrown
[ https://issues.apache.org/jira/browse/SPARK-45556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45556: -- Assignee: Apache Spark > Inconsistent status code between web page and REST API when exception is > thrown > --- > > Key: SPARK-45556 > URL: https://issues.apache.org/jira/browse/SPARK-45556 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.5.0 >Reporter: wy >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > > Spark history server provides > [AppHistoryServerPlugin|https://github.com/kuwii/spark/blob/dev/status-code/core/src/main/scala/org/apache/spark/status/AppHistoryServerPlugin.scala] > to add extra REST API and web pages. However, there's an issue when > exceptions are thrown, causing inconsistent status codes between the web page > and the REST API. > For the REST API, if the thrown exception is an instance of > WebApplicationException, the status code is set to the one defined > within the exception. > For the web page, however, all exceptions are wrapped in a 500 response. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
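The inconsistency described in SPARK-45556 boils down to one handler honoring the status code carried by the exception and the other always answering 500. A minimal sketch of the unified behavior, with invented names (this is not Spark's servlet code; `WebApplicationError` stands in for JAX-RS's `WebApplicationException`):

```python
class WebApplicationError(Exception):
    """Stand-in for an exception that carries its own HTTP status code."""
    def __init__(self, status: int, message: str = ""):
        super().__init__(message)
        self.status = status

def handle(exc: Exception) -> int:
    # Honor the status carried by the exception when present; otherwise fall
    # back to 500. The REST API already behaves this way; the fix is to make
    # the web-page handler do the same instead of always returning 500.
    if isinstance(exc, WebApplicationError):
        return exc.status
    return 500

print(handle(WebApplicationError(404, "no such app")))  # 404
print(handle(RuntimeError("boom")))                     # 500
```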
[jira] [Assigned] (SPARK-45527) Task fraction resource request is not expected
[ https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-45527:
--------------------------------------

    Assignee: Apache Spark

> Task fraction resource request is not expected
> ----------------------------------------------
>
>                 Key: SPARK-45527
>                 URL: https://issues.apache.org/jira/browse/SPARK-45527
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.1, 3.3.3, 3.4.1, 3.5.0
>            Reporter: wuyi
>            Assignee: Apache Spark
>            Priority: Major
>              Labels: pull-request-available
>
> {code:java}
> test("SPARK-XXX") {
>   import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
>   withTempDir { dir =>
>     val scriptPath = createTempScriptWithExpectedOutput(dir, "gpuDiscoveryScript",
>       """{"name": "gpu","addresses":["0"]}""")
>     val conf = new SparkConf()
>       .setAppName("test")
>       .setMaster("local-cluster[1, 12, 1024]")
>       .set("spark.executor.cores", "12")
>     conf.set(TASK_GPU_ID.amountConf, "0.08")
>     conf.set(WORKER_GPU_ID.amountConf, "1")
>     conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
>     conf.set(EXECUTOR_GPU_ID.amountConf, "1")
>     sc = new SparkContext(conf)
>     val rdd = sc.range(0, 100, 1, 4)
>     var rdd1 = rdd.repartition(3)
>     val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
>     val rp = new ResourceProfileBuilder().require(treqs).build
>     rdd1 = rdd1.withResources(rp)
>     assert(rdd1.collect().size === 100)
>   }
> } {code}
> In the above test, the 3 tasks generated by rdd1 are expected to run in
> sequence, since "new TaskResourceRequests().cpus(1).resource("gpu", 1.0)"
> should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". In fact, however,
> those 3 tasks run in parallel.
> The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static.
> In this case, "gpu.numParts" is initialized to 12 (1 / 0.08) and does not
> change even when a new task resource request arrives (e.g., resource("gpu",
> 1.0) in this case). Thus, those 3 tasks can be executed in parallel.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
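The root cause described above (the slot count is derived once from the initial fractional amount and never recomputed) can be sketched with some toy arithmetic. This is a minimal illustration of the reported behaviour, not Spark's actual `ExecutorResourceInfo` implementation; the function name `num_parts` is hypothetical:

```python
# Illustrative sketch (not Spark's actual code) of the slot arithmetic the
# report describes: one resource address is split into numParts task slots
# based on the *initial* per-task amount, and that split is never recomputed.

def num_parts(task_amount: float) -> int:
    """Number of concurrent task slots one resource address provides."""
    if task_amount >= 1.0:
        # A whole amount gives exactly one slot per address.
        return 1
    # A fractional amount slices one address into floor(1 / amount) slots.
    return int(1.0 / task_amount)

# Initial config from the test: spark.task.resource.gpu.amount = 0.08
initial_slots = num_parts(0.08)           # floor(1 / 0.08) = 12 slots

# What the new ResourceProfile (gpu = 1.0) *should* yield if numParts
# were recomputed: a single slot, forcing the 3 tasks to run in sequence.
slots_per_profile = num_parts(1.0)

# Because numParts stays at its initial value, all 3 repartitioned tasks
# fit into the 12 slots at once and run in parallel.
print(initial_slots, slots_per_profile)
```

With 12 stale slots available, the scheduler sees room for all 3 tasks simultaneously, which matches the parallel execution observed in the test.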
[jira] [Assigned] (SPARK-45527) Task fraction resource request is not expected
[ https://issues.apache.org/jira/browse/SPARK-45527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-45527:
--------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Updated] (SPARK-45803) Remove the no longer used `RpcAbortException`.
[ https://issues.apache.org/jira/browse/SPARK-45803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-45803:
-----------------------------------
    Labels: pull-request-available  (was: )

> Remove the no longer used `RpcAbortException`.
> ----------------------------------------------
>
>                 Key: SPARK-45803
>                 URL: https://issues.apache.org/jira/browse/SPARK-45803
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Created] (SPARK-45803) Remove the no longer used `RpcAbortException`.
Yang Jie created SPARK-45803:
--------------------------------

             Summary: Remove the no longer used `RpcAbortException`.
                 Key: SPARK-45803
                 URL: https://issues.apache.org/jira/browse/SPARK-45803
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Yang Jie
[jira] [Updated] (SPARK-45802) Remove no longer needed Java majorVersion checks in `Platform`
[ https://issues.apache.org/jira/browse/SPARK-45802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-45802:
-----------------------------------
    Labels: pull-request-available  (was: )

> Remove no longer needed Java majorVersion checks in `Platform`
> --------------------------------------------------------------
>
>                 Key: SPARK-45802
>                 URL: https://issues.apache.org/jira/browse/SPARK-45802
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Created] (SPARK-45802) Remove no longer needed Java majorVersion checks in `Platform`
Yang Jie created SPARK-45802:
--------------------------------

             Summary: Remove no longer needed Java majorVersion checks in `Platform`
                 Key: SPARK-45802
                 URL: https://issues.apache.org/jira/browse/SPARK-45802
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Yang Jie