[jira] [Assigned] (SPARK-45730) [CORE] Make ReloadingX509TrustManagerSuite less flaky
[ https://issues.apache.org/jira/browse/SPARK-45730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-45730: --- Assignee: Hasnain Lakhani > [CORE] Make ReloadingX509TrustManagerSuite less flaky > - > > Key: SPARK-45730 > URL: https://issues.apache.org/jira/browse/SPARK-45730 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Assignee: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45730) [CORE] Make ReloadingX509TrustManagerSuite less flaky
[ https://issues.apache.org/jira/browse/SPARK-45730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-45730. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43596 [https://github.com/apache/spark/pull/43596] > [CORE] Make ReloadingX509TrustManagerSuite less flaky > - > > Key: SPARK-45730 > URL: https://issues.apache.org/jira/browse/SPARK-45730 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Assignee: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45556) Inconsistent status code between web page and REST API when exception is thrown
[ https://issues.apache.org/jira/browse/SPARK-45556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45556: --- Labels: pull-request-available (was: ) > Inconsistent status code between web page and REST API when exception is > thrown > --- > > Key: SPARK-45556 > URL: https://issues.apache.org/jira/browse/SPARK-45556 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.5.0 >Reporter: wy >Priority: Minor > Labels: pull-request-available > > Spark history server provides > [AppHistoryServerPlugin|https://github.com/kuwii/spark/blob/dev/status-code/core/src/main/scala/org/apache/spark/status/AppHistoryServerPlugin.scala] > to add extra REST APIs and web pages. However, there's an issue when > exceptions are thrown, causing an inconsistent status code between the web page > and the REST API. > For the REST API, if the thrown exception is an instance of > WebApplicationException, then the status code is set to the one defined > within the exception. > However, for the web page, all exceptions are wrapped in a 500 response. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
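The mismatch described above can be modeled in a few lines. This is a toy sketch, not Spark's actual handler code: the class and both functions below are hypothetical stand-ins for the two error-handling paths.

```scala
// Toy model of the inconsistency: the REST layer honors the status carried by
// a WebApplicationException, while the web-page layer wraps everything in 500.
class WebApplicationException(val status: Int) extends RuntimeException

// REST API path: keep the status declared inside the exception.
def restStatus(e: Throwable): Int = e match {
  case w: WebApplicationException => w.status
  case _                          => 500
}

// Web page path: every exception becomes a 500, hence the mismatch.
def pageStatus(e: Throwable): Int = 500

object StatusDemo extends App {
  val notFound = new WebApplicationException(404)
  assert(restStatus(notFound) == 404)
  assert(pageStatus(notFound) == 500) // same exception, different status code
  println("ok")
}
```

The fix the issue implies is to make the page path inspect WebApplicationException the same way the REST path does.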
[jira] [Updated] (SPARK-45777) Support `spark.test.appId` in `LocalSchedulerBackend`
[ https://issues.apache.org/jira/browse/SPARK-45777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45777: --- Labels: pull-request-available (was: ) > Support `spark.test.appId` in `LocalSchedulerBackend` > - > > Key: SPARK-45777 > URL: https://issues.apache.org/jira/browse/SPARK-45777 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45777) Support `spark.test.appId` in `LocalSchedulerBackend`
Dongjoon Hyun created SPARK-45777: - Summary: Support `spark.test.appId` in `LocalSchedulerBackend` Key: SPARK-45777 URL: https://issues.apache.org/jira/browse/SPARK-45777 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45776) Remove the defensive null check added in SPARK-39553.
[ https://issues.apache.org/jira/browse/SPARK-45776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45776: --- Labels: pull-request-available (was: ) > Remove the defensive null check added in SPARK-39553. > - > > Key: SPARK-45776 > URL: https://issues.apache.org/jira/browse/SPARK-45776 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > > {code:java} > def unregisterShuffle(shuffleId: Int): Unit = { > shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => > // SPARK-39553: Add protection for Scala 2.13 due to > https://github.com/scala/bug/issues/12613 > // We should revert this if Scala 2.13 solves this issue. > if (shuffleStatus != null) { > shuffleStatus.invalidateSerializedMapOutputStatusCache() > shuffleStatus.invalidateSerializedMergeOutputStatusCache() > } > } > } {code} > This issue has been fixed in Scala 2.13.9. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45774) Support `spark.ui.historyServerUrl` in ApplicationPage
[ https://issues.apache.org/jira/browse/SPARK-45774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45774: - Assignee: Dongjoon Hyun > Support `spark.ui.historyServerUrl` in ApplicationPage > -- > > Key: SPARK-45774 > URL: https://issues.apache.org/jira/browse/SPARK-45774 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45774) Support `spark.ui.historyServerUrl` in `ApplicationPage`
[ https://issues.apache.org/jira/browse/SPARK-45774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45774: -- Summary: Support `spark.ui.historyServerUrl` in `ApplicationPage` (was: Support `spark.ui.historyServerUrl` in ApplicationPage) > Support `spark.ui.historyServerUrl` in `ApplicationPage` > > > Key: SPARK-45774 > URL: https://issues.apache.org/jira/browse/SPARK-45774 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45776) Remove the defensive null check added in SPARK-39553.
[ https://issues.apache.org/jira/browse/SPARK-45776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45776: - Description: {code:java} def unregisterShuffle(shuffleId: Int): Unit = { shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => // SPARK-39553: Add protection for Scala 2.13 due to https://github.com/scala/bug/issues/12613 // We should revert this if Scala 2.13 solves this issue. if (shuffleStatus != null) { shuffleStatus.invalidateSerializedMapOutputStatusCache() shuffleStatus.invalidateSerializedMergeOutputStatusCache() } } } {code} This issue has been fixed in Scala 2.13.9. > Remove the defensive null check added in SPARK-39553. > - > > Key: SPARK-45776 > URL: https://issues.apache.org/jira/browse/SPARK-45776 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > {code:java} > def unregisterShuffle(shuffleId: Int): Unit = { > shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => > // SPARK-39553: Add protection for Scala 2.13 due to > https://github.com/scala/bug/issues/12613 > // We should revert this if Scala 2.13 solves this issue. > if (shuffleStatus != null) { > shuffleStatus.invalidateSerializedMapOutputStatusCache() > shuffleStatus.invalidateSerializedMergeOutputStatusCache() > } > } > } {code} > This issue has been fixed in Scala 2.13.9. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45776) Remove the defensive null check added in SPARK-39553.
[ https://issues.apache.org/jira/browse/SPARK-45776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45776: - Environment: (was: {code:java} def unregisterShuffle(shuffleId: Int): Unit = { shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => // SPARK-39553: Add protection for Scala 2.13 due to https://github.com/scala/bug/issues/12613 // We should revert this if Scala 2.13 solves this issue. if (shuffleStatus != null) { shuffleStatus.invalidateSerializedMapOutputStatusCache() shuffleStatus.invalidateSerializedMergeOutputStatusCache() } } } {code} This issue has been fixed in Scala 2.13.9.) > Remove the defensive null check added in SPARK-39553. > - > > Key: SPARK-45776 > URL: https://issues.apache.org/jira/browse/SPARK-45776 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45776) Remove the defensive null check added in SPARK-39553.
Yang Jie created SPARK-45776: Summary: Remove the defensive null check added in SPARK-39553. Key: SPARK-45776 URL: https://issues.apache.org/jira/browse/SPARK-45776 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Environment: {code:java} def unregisterShuffle(shuffleId: Int): Unit = { shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => // SPARK-39553: Add protection for Scala 2.13 due to https://github.com/scala/bug/issues/12613 // We should revert this if Scala 2.13 solves this issue. if (shuffleStatus != null) { shuffleStatus.invalidateSerializedMapOutputStatusCache() shuffleStatus.invalidateSerializedMergeOutputStatusCache() } } } {code} This issue has been fixed in Scala 2.13.9. Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45775) Drop table skipped when CatalogV2Util loadTable meets an unexpected exception
konwu created SPARK-45775: - Summary: Drop table skipped when CatalogV2Util loadTable meets an unexpected exception Key: SPARK-45775 URL: https://issues.apache.org/jira/browse/SPARK-45775 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.3 Environment: spark 3.1.3 Reporter: konwu

Currently the CatalogV2Util.loadTable method catches only the NoSuch*Exception types, like below:

{code:java}
def loadTable(catalog: CatalogPlugin, ident: Identifier): Option[Table] =
  try {
    Option(catalog.asTableCatalog.loadTable(ident))
  } catch {
    case _: NoSuchTableException => None
    case _: NoSuchDatabaseException => None
    case _: NoSuchNamespaceException => None
  }
{code}

This skips the drop of the table when communication with the metastore times out or another unexpected exception occurs, because the method always returns None. Maybe we should catch it like below:

{code:java}
def loadTable(catalog: CatalogPlugin, ident: Identifier): Option[Table] =
  try {
    Option(catalog.asTableCatalog.loadTable(ident))
  } catch {
    case e: NoSuchTableException => None
    case e: NoSuchDatabaseException => None
    case e: NoSuchNamespaceException => None
    case e: Throwable => throw e
  }
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
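For illustration, here is a self-contained sketch of the behavior the report asks for, using hypothetical exception classes rather than Spark's real NoSuch*Exception types. Note that an explicit `case e: Throwable => throw e` is equivalent to simply omitting the catch-all, which is what this sketch does:

```scala
// Hypothetical stand-ins for Spark's NoSuch*Exception types, for illustration only.
class NoSuchTableException(msg: String) extends Exception(msg)
class MetastoreTimeoutException(msg: String) extends Exception(msg)

// Sketch of the desired behavior: swallow only "not found" errors and let
// everything else (e.g. a metastore timeout) propagate to the caller, so a
// DROP TABLE is not silently skipped when the metastore is merely unreachable.
def loadTable(load: () => Option[String]): Option[String] =
  try load()
  catch {
    case _: NoSuchTableException => None
    // No catch-all: any other Throwable propagates.
  }

object LoadTableDemo extends App {
  // A missing table maps to None.
  assert(loadTable(() => throw new NoSuchTableException("t")).isEmpty)
  // A timeout propagates instead of being swallowed as None.
  val propagated =
    try { loadTable(() => throw new MetastoreTimeoutException("timeout")); false }
    catch { case _: MetastoreTimeoutException => true }
  assert(propagated)
  println("ok")
}
```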
[jira] [Updated] (SPARK-45774) Support `spark.ui.historyServerUrl` in ApplicationPage
[ https://issues.apache.org/jira/browse/SPARK-45774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45774: --- Labels: pull-request-available (was: ) > Support `spark.ui.historyServerUrl` in ApplicationPage > -- > > Key: SPARK-45774 > URL: https://issues.apache.org/jira/browse/SPARK-45774 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45774) Support `spark.ui.historyServerUrl` in ApplicationPage
Dongjoon Hyun created SPARK-45774: - Summary: Support `spark.ui.historyServerUrl` in ApplicationPage Key: SPARK-45774 URL: https://issues.apache.org/jira/browse/SPARK-45774 Project: Spark Issue Type: Sub-task Components: Spark Core, Web UI Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45695) Fix `method force in trait View is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45695. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43637 [https://github.com/apache/spark/pull/43637] > Fix `method force in trait View is deprecated` > -- > > Key: SPARK-45695 > URL: https://issues.apache.org/jira/browse/SPARK-45695 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Tengfei Huang >Priority: Minor > Fix For: 4.0.0 > > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala:368:36: > method force in trait View is deprecated (since 2.13.0): Views no longer > know about their underlying collection type; .force always returns an > IndexedSeq > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.trees.TreeNode.legacyWithNewChildren.newArgs.$anonfun, > origin=scala.collection.View.force, version=2.13. > [warn] m.mapValues(mapChild).view.force.toMap > [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
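The deprecated call at `TreeNode.scala:368` has a mechanical replacement. A minimal sketch, assuming Scala 2.13 (the map and function below are made up for the demo): going through a lazy MapView and materializing with toMap produces the same Map without the deprecated `.force`.

```scala
object ViewForceDemo extends App {
  val m = Map(1 -> "a", 2 -> "b")
  val mapChild: String => String = _.toUpperCase

  // Deprecated on Scala 2.13: .force on a View always returns an IndexedSeq
  // and no longer remembers the underlying collection type.
  // val out = m.mapValues(mapChild).view.force.toMap

  // Equivalent without the deprecated call: transform lazily through a
  // MapView, then materialize directly with toMap.
  val out = m.view.mapValues(mapChild).toMap

  assert(out == Map(1 -> "A", 2 -> "B"))
  println(out)
}
```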
[jira] [Assigned] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45694: Assignee: Tengfei Huang > Fix `method signum in trait ScalaNumberProxy is deprecated` > --- > > Key: SPARK-45694 > URL: https://issues.apache.org/jira/browse/SPARK-45694 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Tengfei Huang >Priority: Minor > Labels: pull-request-available > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scalalang:194:25: > method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use > `sign` method instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, > origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0 > [warn] val uc = useCount.signum > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45694. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43637 [https://github.com/apache/spark/pull/43637] > Fix `method signum in trait ScalaNumberProxy is deprecated` > --- > > Key: SPARK-45694 > URL: https://issues.apache.org/jira/browse/SPARK-45694 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Tengfei Huang >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scalalang:194:25: > method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use > `sign` method instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, > origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0 > [warn] val uc = useCount.signum > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
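The replacement the warning itself suggests is a one-word change. A minimal sketch, assuming Scala 2.13 (the useCount value is made up for the demo):

```scala
object SignumDemo extends App {
  val useCount: Int = -3

  // Deprecated since 2.13.0 (scala.runtime.ScalaNumberProxy.signum):
  // val uc = useCount.signum

  // Replacement suggested by the warning: use `sign` instead.
  val uc = useCount.sign

  assert(uc == -1)
  assert(5.sign == 1 && 0.sign == 0) // same -1/0/1 contract as signum
  println(uc)
}
```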
[jira] [Updated] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45687: --- Labels: pull-request-available (was: ) > Fix `Passing an explicit array value to a Scala varargs method is deprecated` > - > > Key: SPARK-45687 > URL: https://issues.apache.org/jira/browse/SPARK-45687 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0 > [warn] df.agg(udaf(allColumns: _*)), > [warn] ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn]
^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, > aggFunctions.tail: _*), > [warn] > ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
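The warning names its own fix: wrap the array without copying. A minimal sketch assuming Scala 2.13, with a hypothetical varargs method `agg` standing in for the DataFrame API:

```scala
import scala.collection.immutable.ArraySeq

object VarargsDemo extends App {
  // Hypothetical varargs method, standing in for df.agg(...) in the warnings.
  def agg(cols: String*): String = cols.mkString(",")

  val allColumns: Array[String] = Array("a", "b", "c")

  // Deprecated on Scala 2.13: passing an explicit Array to a varargs
  // parameter triggers a defensive copy.
  // agg(allColumns: _*)

  // Non-copying alternative recommended by the warning: wrap the array in an
  // immutable ArraySeq backed by the same storage.
  val out = agg(ArraySeq.unsafeWrapArray(allColumns): _*)

  assert(out == "a,b,c")
  println(out)
}
```

`unsafeWrapArray` is "unsafe" only in that later mutation of the original array would show through the wrapper, which is fine when the array is not mutated afterwards.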
[jira] [Resolved] (SPARK-45768) Make faulthandler a runtime configuration for Python execution in SQL
[ https://issues.apache.org/jira/browse/SPARK-45768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45768. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43635 [https://github.com/apache/spark/pull/43635] > Make faulthandler a runtime configuration for Python execution in SQL > - > > Key: SPARK-45768 > URL: https://issues.apache.org/jira/browse/SPARK-45768 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The faulthandler feature within PySpark is really useful, especially for > debugging errors that the regular Python interpreter cannot catch out of the > box, such as segmentation faults; see also > https://github.com/apache/spark/pull/43600. It would be very useful to > convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45768) Make faulthandler a runtime configuration for Python execution in SQL
[ https://issues.apache.org/jira/browse/SPARK-45768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45768: Assignee: Hyukjin Kwon > Make faulthandler a runtime configuration for Python execution in SQL > - > > Key: SPARK-45768 > URL: https://issues.apache.org/jira/browse/SPARK-45768 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The faulthandler feature within PySpark is really useful, especially for > debugging errors that the regular Python interpreter cannot catch out of the > box, such as segmentation faults; see also > https://github.com/apache/spark/pull/43600. It would be very useful to > convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44065) Optimize BroadcastHashJoin skew when localShuffleReader is disabled
[ https://issues.apache.org/jira/browse/SPARK-44065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44065: --- Labels: pull-request-available (was: ) > Optimize BroadcastHashJoin skew when localShuffleReader is disabled > --- > > Key: SPARK-44065 > URL: https://issues.apache.org/jira/browse/SPARK-44065 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Zhen Wang >Priority: Major > Labels: pull-request-available > > In RemoteShuffleService services such as uniffle and celeborn, it is > recommended to disable localShuffleReader by default for better performance. > But it may make BroadcastHashJoin skewed, so I want to optimize > BroadcastHashJoin skew in OptimizeSkewedJoin when localShuffleReader is > disabled. > > Refer to: > https://github.com/apache/incubator-celeborn#spark-configuration > https://github.com/apache/incubator-uniffle/blob/master/docs/client_guide.md#support-spark-aqe -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44517) first operator should respect the nullability of child expression as well as ignoreNulls option
[ https://issues.apache.org/jira/browse/SPARK-44517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44517: --- Labels: pull-request-available (was: ) > first operator should respect the nullability of child expression as well as > ignoreNulls option > --- > > Key: SPARK-44517 > URL: https://issues.apache.org/jira/browse/SPARK-44517 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, > 3.4.0, 3.4.1 >Reporter: Nan Zhu >Priority: Major > Labels: pull-request-available > > I found the following problem when using Spark recently: > > {code:java} > import spark.implicits._ > val s = Seq((1.2, "s", 2.2)).toDF("v1", "v2", "v3") > val schema = StructType(Seq(StructField("v1", DoubleType, nullable = false), > StructField("v2", StringType, nullable = true), > StructField("v3", DoubleType, nullable = false))) > val df = spark.createDataFrame(s.rdd, schema) > val inputDF = df.dropDuplicates("v3") > spark.sql("CREATE TABLE local.db.table (\n v1 DOUBLE NOT NULL,\n v2 STRING, v3 DOUBLE NOT NULL)") > inputDF.write.mode("overwrite").format("iceberg").save("local.db.table") {code} > > When I use the above code to write to Iceberg (I guess Delta Lake will have > the same problem), I got a very confusing exception: > {code:java} > Exception in thread "main" java.lang.IllegalArgumentException: Cannot write > incompatible dataset to table with schema: > table > { 1: v1: required double 2: v2: optional string 3: v3: required double} > Provided schema: > table { 1: v1: optional double 2: v2: optional string 3: v3: required > double} {code} > Basically, it complains that v1 is a nullable column in our > `inputDF` above, which is not allowed since we created the table with v1 as > not nullable.
The confusion comes from the fact that, if we check the schema of inputDF with > printSchema(), v1 is not nullable: > {noformat} > root > |-- v1: double (nullable = false) > |-- v2: string (nullable = true) > |-- v3: double (nullable = false){noformat} > Clearly, something changed v1's nullability unexpectedly! > > After some debugging, I found that the key is the dropDuplicates("v3") call. In the > optimization phase, ReplaceDeduplicateWithAggregate replaces the > Deduplicate with an aggregate on v3 that runs first() over all other columns. > However, the first() operator hard-codes nullable as always "true", which is > the source of v1's changed nullability. > > This is very confusing behavior in Spark, and probably no one noticed it > because nullability mattered little before the new table formats like Delta > Lake and Iceberg, which check nullability correctly. Nowadays users adopt > them more and more, so the issue has surfaced. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
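The report above can be condensed into a toy model. This is not Spark's real ReplaceDeduplicateWithAggregate rule, just a hypothetical simplification showing how wrapping every non-key column in a first() whose nullability is hard-coded to true flips v1's nullability:

```scala
// Toy model of the bug: dropDuplicates("v3") is rewritten into an aggregate
// that wraps every non-key column in first(...), and first(...) reports
// nullable = true regardless of its child expression.
case class Field(name: String, nullable: Boolean)

// Hypothetical simplification of the rewrite, for illustration only.
def replaceDeduplicateWithAggregate(schema: Seq[Field], keys: Set[String]): Seq[Field] =
  schema.map { f =>
    if (keys.contains(f.name)) f            // grouping key keeps its nullability
    else f.copy(nullable = true)            // first() hard-codes nullable = true
  }

object NullabilityDemo extends App {
  val input = Seq(Field("v1", false), Field("v2", true), Field("v3", false))
  val out   = replaceDeduplicateWithAggregate(input, Set("v3"))
  // v1 unexpectedly becomes nullable; v3 (the dedup key) is unchanged.
  assert(out == Seq(Field("v1", true), Field("v2", true), Field("v3", false)))
  println(out)
}
```

The fix the issue title asks for is to make first() propagate its child's nullability (subject to the ignoreNulls option) instead of the hard-coded true.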
[jira] [Updated] (SPARK-4836) Web UI should display separate information for all stage attempts
[ https://issues.apache.org/jira/browse/SPARK-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-4836: -- Labels: bulk-closed pull-request-available (was: bulk-closed) > Web UI should display separate information for all stage attempts > - > > Key: SPARK-4836 > URL: https://issues.apache.org/jira/browse/SPARK-4836 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.1.1, 1.2.0 >Reporter: Josh Rosen >Priority: Major > Labels: bulk-closed, pull-request-available > > I've run into some cases where the web UI job page will say that a job took > 12 minutes but the sum of that job's stage times is something like 10 > seconds. In this case, it turns out that my job ran a stage to completion > (which took, say, 5 minutes) then lost some partitions of that stage and had > to run a new stage attempt to recompute one or two tasks from that stage. As > a result, the latest attempt for that stage reports only one or two tasks. > In the web UI, it seems that we only show the latest stage attempt, not all > attempts, which can lead to confusing / misleading displays for jobs with > failed / partially-recomputed stages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-4836) Web UI should display separate information for all stage attempts
[ https://issues.apache.org/jira/browse/SPARK-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reopened SPARK-4836: --- > Web UI should display separate information for all stage attempts > - > > Key: SPARK-4836 > URL: https://issues.apache.org/jira/browse/SPARK-4836 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.1.1, 1.2.0 >Reporter: Josh Rosen >Priority: Major > Labels: bulk-closed > > I've run into some cases where the web UI job page will say that a job took > 12 minutes but the sum of that job's stage times is something like 10 > seconds. In this case, it turns out that my job ran a stage to completion > (which took, say, 5 minutes) then lost some partitions of that stage and had > to run a new stage attempt to recompute one or two tasks from that stage. As > a result, the latest attempt for that stage reports only one or two tasks. > In the web UI, it seems that we only show the latest stage attempt, not all > attempts, which can lead to confusing / misleading displays for jobs with > failed / partially-recomputed stages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45773) Refine docstring of `SparkSession.builder.config`
[ https://issues.apache.org/jira/browse/SPARK-45773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45773: --- Labels: pull-request-available (was: ) > Refine docstring of `SparkSession.builder.config` > - > > Key: SPARK-45773 > URL: https://issues.apache.org/jira/browse/SPARK-45773 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of SparkSession.builder.config > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45773) Refine docstring of `SparkSession.builder.config`
Allison Wang created SPARK-45773: Summary: Refine docstring of `SparkSession.builder.config` Key: SPARK-45773 URL: https://issues.apache.org/jira/browse/SPARK-45773 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of SparkSession.builder.config -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45757) Avoid re-computation of NNZ in Binarizer
[ https://issues.apache.org/jira/browse/SPARK-45757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45757: - Assignee: Ruifeng Zheng > Avoid re-computation of NNZ in Binarizer > > > Key: SPARK-45757 > URL: https://issues.apache.org/jira/browse/SPARK-45757 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45757) Avoid re-computation of NNZ in Binarizer
[ https://issues.apache.org/jira/browse/SPARK-45757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45757. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43619 [https://github.com/apache/spark/pull/43619] > Avoid re-computation of NNZ in Binarizer > > > Key: SPARK-45757 > URL: https://issues.apache.org/jira/browse/SPARK-45757 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45718) Remove remaining deprecated Pandas APIs from Spark 3.4.0
[ https://issues.apache.org/jira/browse/SPARK-45718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45718. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43581 [https://github.com/apache/spark/pull/43581] > Remove remaining deprecated Pandas APIs from Spark 3.4.0 > > > Key: SPARK-45718 > URL: https://issues.apache.org/jira/browse/SPARK-45718 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Remove remaining deprecated Pandas APIs from Spark 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45718) Remove remaining deprecated Pandas APIs from Spark 3.4.0
[ https://issues.apache.org/jira/browse/SPARK-45718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45718: - Assignee: Haejoon Lee > Remove remaining deprecated Pandas APIs from Spark 3.4.0 > > > Key: SPARK-45718 > URL: https://issues.apache.org/jira/browse/SPARK-45718 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Remove remaining deprecated Pandas APIs from Spark 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-45771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45771: - Assignee: Dongjoon Hyun > Enable spark.eventLog.rolling.enabled by default > > > Key: SPARK-45771 > URL: https://issues.apache.org/jira/browse/SPARK-45771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-45771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45771. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43638 [https://github.com/apache/spark/pull/43638] > Enable spark.eventLog.rolling.enabled by default > > > Key: SPARK-45771 > URL: https://issues.apache.org/jira/browse/SPARK-45771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45772) Add additional test coverage for input_file_name() expr + Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-45772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Utkarsh Agarwal updated SPARK-45772: Summary: Add additional test coverage for input_file_name() expr + Python UDFs (was: Add additional test coverage for input_file_name_expr) > Add additional test coverage for input_file_name() expr + Python UDFs > - > > Key: SPARK-45772 > URL: https://issues.apache.org/jira/browse/SPARK-45772 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Utkarsh Agarwal >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-44705 introduced changes to the > evaluation of the `input_file_name()` expression in the presence of Python > UDFs. This was done to maintain the behavior of the `input_file_name()` > expression when the execution model of the PythonRunner was made > single-threaded by https://issues.apache.org/jira/browse/SPARK-44705. We > should add additional test coverage for `input_file_name()` + Python UDFs to > prevent future breakages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45772) Add additional test coverage for input_file_name_expr
Utkarsh Agarwal created SPARK-45772: --- Summary: Add additional test coverage for input_file_name_expr Key: SPARK-45772 URL: https://issues.apache.org/jira/browse/SPARK-45772 Project: Spark Issue Type: Task Components: PySpark Affects Versions: 4.0.0 Reporter: Utkarsh Agarwal https://issues.apache.org/jira/browse/SPARK-44705 introduced changes to the evaluation of the `input_file_name()` expression in the presence of Python UDFs. This was done to maintain the behavior of the `input_file_name()` expression when the execution model of the PythonRunner was made single-threaded by https://issues.apache.org/jira/browse/SPARK-44705. We should add additional test coverage for `input_file_name()` + Python UDFs to prevent future breakages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-45771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45771: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Enable spark.eventLog.rolling.enabled by default > > > Key: SPARK-45771 > URL: https://issues.apache.org/jira/browse/SPARK-45771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-45771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45771: --- Labels: pull-request-available (was: ) > Enable spark.eventLog.rolling.enabled by default > > > Key: SPARK-45771 > URL: https://issues.apache.org/jira/browse/SPARK-45771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
Dongjoon Hyun created SPARK-45771: - Summary: Enable spark.eventLog.rolling.enabled by default Key: SPARK-45771 URL: https://issues.apache.org/jira/browse/SPARK-45771 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
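For readers who haven't used the feature this ticket enables by default: event log rolling splits an application's single, ever-growing event log into size-bounded chunks that the history server can later compact. A minimal sketch of the related settings in spark-defaults.conf (values are illustrative examples, not the defaults this change introduces):

```
# spark-defaults.conf -- illustrative values, not the defaults set by this ticket
spark.eventLog.enabled                                true
spark.eventLog.rolling.enabled                        true
# Roll over to a new event log file once the current one exceeds this size.
spark.eventLog.rolling.maxFileSize                    128m
# History-server side: number of recent rolled files to retain after compaction.
spark.history.fs.eventLog.rolling.maxFilesToRetain    2
```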
[jira] [Resolved] (SPARK-45767) Delete `TimeStampedHashMap` and its UT
[ https://issues.apache.org/jira/browse/SPARK-45767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45767. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43633 [https://github.com/apache/spark/pull/43633] > Delete `TimeStampedHashMap` and its UT > -- > > Key: SPARK-45767 > URL: https://issues.apache.org/jira/browse/SPARK-45767 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45767) Delete `TimeStampedHashMap` and its UT
[ https://issues.apache.org/jira/browse/SPARK-45767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45767: Assignee: BingKun Pan > Delete `TimeStampedHashMap` and its UT > -- > > Key: SPARK-45767 > URL: https://issues.apache.org/jira/browse/SPARK-45767 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45694: --- Labels: pull-request-available (was: ) > Fix `method signum in trait ScalaNumberProxy is deprecated` > --- > > Key: SPARK-45694 > URL: https://issues.apache.org/jira/browse/SPARK-45694 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25: > method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use > `sign` method instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, > origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0 > [warn] val uc = useCount.signum > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45770) Fix column resolution in DataFrame.drop
[ https://issues.apache.org/jira/browse/SPARK-45770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45770: --- Labels: pull-request-available (was: ) > Fix column resolution in DataFrame.drop > --- > > Key: SPARK-45770 > URL: https://issues.apache.org/jira/browse/SPARK-45770 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41454) Support Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-41454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41454: --- Labels: pull-request-available (was: ) > Support Python 3.11 > --- > > Key: SPARK-41454 > URL: https://issues.apache.org/jira/browse/SPARK-41454 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45769) data retrieval fails on executors with spark connect
Steven Ottens created SPARK-45769: - Summary: data retrieval fails on executors with spark connect Key: SPARK-45769 URL: https://issues.apache.org/jira/browse/SPARK-45769 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Steven Ottens We have an OpenShift cluster with Spark and JupyterHub and we use Spark-Connect to access Spark from within Jupyter. This worked fine with Spark 3.4.1. However, after upgrading to Spark 3.5.0 we were not able to access any data in our Delta Tables through Spark. Initially I assumed it was a bug in Delta: [https://github.com/delta-io/delta/issues/2235] The actual error is {code:java} SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 13) (172.31.15.72 executor 4): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD{code} However, after further investigation I discovered that this is a regression in Spark 3.5.0. The issue is similar to SPARK-36917; however, I am not using any custom functions or any classes other than spark-connect, and this setup used to work in 3.4.1. The issue only occurs when remote executors are used in a Kubernetes environment. Running a plain Spark-Connect server, e.g. {code:java} ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.0{code} doesn't produce the error. The issue occurs both in a full OpenShift cluster and in a tiny minikube setup. The steps to reproduce are based on the minikube setup. You need to have a minimal Spark 3.5.0 setup with 1 driver and at least 1 executor and use Python to access data through Spark. 
The query I used to test this is {code:java} from pyspark.sql import SparkSession logFile = '/opt/spark/work-dir/data.csv' spark = SparkSession.builder.remote('sc://spark-connect').getOrCreate() df = spark.read.csv(logFile) df.count() {code} However, it doesn't matter whether the data is local or remote on S3 storage, nor whether the data is plain text, CSV, or a Delta Table. h3. Steps to reproduce: # Install minikube # Create a service account 'spark' {code:java} kubectl create sa spark{code} # Bind the 'edit' role to the service account {code:java} kubectl create rolebinding spark-edit \ --clusterrole=edit \ --serviceaccount=default:spark \ --namespace=default{code} # Create a service for spark {code:java} kubectl create -f service.yml{code} # Create a Spark-Connect deployment with the default Spark Docker image: [https://hub.docker.com/_/spark] (do change the deployment.yml to point to the Kubernetes API endpoint) {code:java} kubectl create -f deployment.yml{code} # Add data to both the executor and the driver pods, e.g. log in on the terminal of the pods and run on both pods {code:java} touch data.csv echo id,name > data.csv echo 1,2 >> data.csv {code} # Start a spark-remote session to access the newly created data. I logged in on the driver pod and installed the necessary Python packages: {code:java} python3 -m pip install pandas pyspark grpcio-tools grpcio-status pyarrow{code} Started a Python shell and executed: {code:java} from pyspark.sql import SparkSession logFile = '/opt/spark/work-dir/data.csv' spark = SparkSession.builder.remote('sc://spark-connect').getOrCreate() df = spark.read.csv(logFile) df.count() {code} h3. Necessary files: Service.yml: {code:java} apiVersion: v1 kind: Service metadata: labels: app: spark-connect name: spark-connect namespace: default spec: ipFamilies: - IPv4 ports: - name: connect-grpc protocol: TCP port: 15002 # Port the service listens on. 
targetPort: 15002 # Port on the backing pods to which the service forwards connections - name: sparkui protocol: TCP port: 4040 # Port the service listens on. targetPort: 4040 # Port on the backing pods to which the service forwards connections - name: spark-rpc protocol: TCP port: 7078 # Port the service listens on. targetPort: 7078 # Port on the backing pods to which the service forwards connections - name: blockmanager protocol: TCP port: 7079 # Port the service listens on. targetPort: 7079 # Port on the backing pods to which the service forwards connections internalTrafficPolicy: Cluster type: ClusterIP ipFamilyPolicy: SingleStack sessionAffinity: None selector: app: spark-connect {code} deployment.yml: (do replace the spark.master URL with the correct one for your setup) {code:java} kind: Deployment apiVersion: apps/v1 metadata: name: spark-connect namespace: default uid:
[jira] [Updated] (SPARK-45768) Make faulthandler a runtime configuration for Python execution in SQL
[ https://issues.apache.org/jira/browse/SPARK-45768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45768: --- Labels: pull-request-available (was: ) > Make faulthandler a runtime configuration for Python execution in SQL > - > > Key: SPARK-45768 > URL: https://issues.apache.org/jira/browse/SPARK-45768 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The faulthandler feature within PySpark is really useful, especially for debugging > errors that the regular Python interpreter cannot catch out of the box, such as > segmentation fault errors; see also > https://github.com/apache/spark/pull/43600. It would be very useful to > convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45768) Make faulthandler a runtime configuration for Python execution in SQL
[ https://issues.apache.org/jira/browse/SPARK-45768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45768: - Summary: Make faulthandler a runtime configuration for Python execution in SQL (was: Make faulthanlder a runtime configuration for Python execution in SQL) > Make faulthandler a runtime configuration for Python execution in SQL > - > > Key: SPARK-45768 > URL: https://issues.apache.org/jira/browse/SPARK-45768 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > The faulthandler feature within PySpark is really useful, especially for debugging > errors that the regular Python interpreter cannot catch out of the box, such as > segmentation fault errors; see also > https://github.com/apache/spark/pull/43600. It would be very useful to > convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45768) Make faulthanlder a runtime configuration for Python execution in SQL
Hyukjin Kwon created SPARK-45768: Summary: Make faulthanlder a runtime configuration for Python execution in SQL Key: SPARK-45768 URL: https://issues.apache.org/jira/browse/SPARK-45768 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon The faulthandler feature within PySpark is really useful, especially for debugging errors that the regular Python interpreter cannot catch out of the box, such as segmentation fault errors; see also https://github.com/apache/spark/pull/43600. It would be very useful to convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
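For context on what this ticket makes configurable at runtime: PySpark's worker-side faulthandler support wraps Python's stdlib faulthandler module, which writes stack traces straight to a file descriptor so they survive C-level crashes (e.g. segmentation faults) that would otherwise kill the interpreter without any Python traceback. The sketch below shows only the stdlib mechanism; the pre-existing static conf name (`spark.python.worker.faulthandler.enabled`) is stated from memory and should be checked against the PR.

```python
import faulthandler
import tempfile

# faulthandler writes the stack of every live thread directly to a file
# descriptor, bypassing Python-level I/O. That is why it still produces a
# traceback when a segfault would kill the interpreter before a normal
# traceback could be printed. (Related Spark conf, an assumption here:
# spark.python.worker.faulthandler.enabled)
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f)
    f.seek(0)
    dump = f.read()

# Each frame in the dump appears as: File "<path>", line N in <name>
print("File" in dump)
```

In a Spark worker, the same dump is triggered on crash and forwarded to the executor logs, which is what makes the feature valuable for debugging segfaulting UDFs.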