[jira] [Commented] (SPARK-38911) 'test 1 resource profile' throws exception when running it in IDEA separately
[ https://issues.apache.org/jira/browse/SPARK-38911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522646#comment-17522646 ] Bobby Wang commented on SPARK-38911:

[~tgraves] Could you help to check this?

> 'test 1 resource profile' throws exception when running it in IDEA separately
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-38911
>                 URL: https://issues.apache.org/jira/browse/SPARK-38911
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.2.1
>            Reporter: Bobby Wang
>            Priority: Minor
>
> The test `test 1 resource profile` of DAGSchedulerSuite fails if I run it in IDEA separately.
>
> The root cause is that the ResourceProfile is initialized before the SparkContext, so it takes `DEFAULT_RESOURCE_PROFILE_ID` as the resource profile id, while the test asserts that the id is not equal to DEFAULT_RESOURCE_PROFILE_ID.
>
> {code:java}
> assert(expectedid.get != ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID){code}
>
> The exception is as follows:
>
> {code:java}
> 0 equaled 0
> ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at (DAGSchedulerSuite.scala:3269)
> org.scalatest.exceptions.TestFailedException: 0 equaled 0
> at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> at org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$191(DAGSchedulerSuite.scala:3269)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85){code}
>
> This issue does not exist when running all of DAGSchedulerSuite's tests, since the SparkContext is initialized at the very beginning.
>
> I will submit a patch to fix it.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
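The initialization-order dependency described in the report can be sketched as a hedged, stand-alone Java analogue. The class and method names below are illustrative stand-ins, not Spark's actual ResourceProfile/SparkContext API: a "profile" created before the "context" initializes captures the default id, which is exactly what makes the non-default-id assertion fail when the test runs alone.

```java
// Hypothetical analogue of the ordering bug: a profile id counter that
// only moves past the default id once a "context" has been initialized.
class ProfileOrderingDemo {
    static final int DEFAULT_RESOURCE_PROFILE_ID = 0;
    private static int nextProfileId = DEFAULT_RESOURCE_PROFILE_ID;

    // Stand-in for creating a ResourceProfile: it captures whatever the
    // id counter holds at creation time.
    static int createProfile() {
        return nextProfileId;
    }

    // Stand-in for SparkContext initialization, which advances the
    // counter past the default id.
    static void initContext() {
        nextProfileId = DEFAULT_RESOURCE_PROFILE_ID + 1;
    }

    // Restores the pre-context state (for repeatable demonstration).
    static void reset() {
        nextProfileId = DEFAULT_RESOURCE_PROFILE_ID;
    }

    public static void main(String[] args) {
        int early = createProfile();   // created BEFORE the context
        initContext();
        int late = createProfile();    // created AFTER the context
        System.out.println(early == DEFAULT_RESOURCE_PROFILE_ID); // true
        System.out.println(late == DEFAULT_RESOURCE_PROFILE_ID);  // false
    }
}
```

Running the whole suite initializes the "context" first, so every profile gets a non-default id and the assertion passes; running the single test skips that step, reproducing the `0 equaled 0` failure.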
[jira] [Commented] (SPARK-38911) 'test 1 resource profile' throws exception when running it in IDEA separately
[ https://issues.apache.org/jira/browse/SPARK-38911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522645#comment-17522645 ] Bobby Wang commented on SPARK-38911:

Just submitted a PR for this issue: https://github.com/apache/spark/pull/36208
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38911) 'test 1 resource profile' throws exception when running it in IDEA separately
[ https://issues.apache.org/jira/browse/SPARK-38911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522644#comment-17522644 ] Apache Spark commented on SPARK-38911:
--

User 'wbo4958' has created a pull request for this issue: https://github.com/apache/spark/pull/36208
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38911) 'test 1 resource profile' throws exception when running it in IDEA separately
[ https://issues.apache.org/jira/browse/SPARK-38911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522643#comment-17522643 ] Apache Spark commented on SPARK-38911:
--

User 'wbo4958' has created a pull request for this issue: https://github.com/apache/spark/pull/36208
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38911) 'test 1 resource profile' throws exception when running it in IDEA separately
[ https://issues.apache.org/jira/browse/SPARK-38911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38911:

Assignee: Apache Spark
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38911) 'test 1 resource profile' throws exception when running it in IDEA separately
[ https://issues.apache.org/jira/browse/SPARK-38911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38911:

Assignee: (was: Apache Spark)

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38911) 'test 1 resource profile' throws exception when running it in IDEA separately
Bobby Wang created SPARK-38911:
--------------------------------

             Summary: 'test 1 resource profile' throws exception when running it in IDEA separately
                 Key: SPARK-38911
                 URL: https://issues.apache.org/jira/browse/SPARK-38911
             Project: Spark
          Issue Type: Test
          Components: Tests
    Affects Versions: 3.2.1
            Reporter: Bobby Wang

The test `test 1 resource profile` of DAGSchedulerSuite fails if I run it in IDEA separately.

The root cause is that the ResourceProfile is initialized before the SparkContext, so it takes `DEFAULT_RESOURCE_PROFILE_ID` as the resource profile id, while the test asserts that the id is not equal to DEFAULT_RESOURCE_PROFILE_ID.

{code:java}
assert(expectedid.get != ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID){code}

The exception is as follows:

{code:java}
0 equaled 0
ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at (DAGSchedulerSuite.scala:3269)
org.scalatest.exceptions.TestFailedException: 0 equaled 0
at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
at org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$191(DAGSchedulerSuite.scala:3269)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85){code}

This issue does not exist when running all of DAGSchedulerSuite's tests, since the SparkContext is initialized at the very beginning.

I will submit a patch to fix it.

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38910) Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too
[ https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38910:

Assignee: (was: Apache Spark)

> Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too
> ----------------------------------------------------------------
>
>                 Key: SPARK-38910
>                 URL: https://issues.apache.org/jira/browse/SPARK-38910
>             Project: Spark
>          Issue Type: Task
>          Components: YARN
>    Affects Versions: 3.2.1, 3.3.0
>            Reporter: angerszhu
>            Priority: Major
>
> {code:java}
> def run(): Unit = {
>   submitApplication()
>   if (!launcherBackend.isConnected() && fireAndForget) {
>     val report = getApplicationReport(appId)
>     val state = report.getYarnApplicationState
>     logInfo(s"Application report for $appId (state: $state)")
>     logInfo(formatReportDetails(report, getDriverLogsLink(report)))
>     if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
>       throw new SparkException(s"Application $appId finished with status: $state")
>     }
>   } else {
>     val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
>     if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
>       var amContainerSucceed = false
>       val amContainerExitMsg = s"AM Container for " +
>         s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " +
>         s"exited with exitCode: 0"
>       diags.foreach { err =>
>         logError(s"Application diagnostics message: $err")
>         if (err.contains(amContainerExitMsg)) {
>           amContainerSucceed = true
> {code}
>
> The staging dir is not cleaned when matching the case:
> {code:java}
> !launcherBackend.isConnected() && fireAndForget
> {code}

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38910) Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too
[ https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522638#comment-17522638 ] Apache Spark commented on SPARK-38910:
--

User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/36207

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38910) Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too
[ https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38910:

Assignee: Apache Spark

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38910) Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too
angerszhu created SPARK-38910:
-------------------------------

             Summary: Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too
                 Key: SPARK-38910
                 URL: https://issues.apache.org/jira/browse/SPARK-38910
             Project: Spark
          Issue Type: Task
          Components: YARN
    Affects Versions: 3.2.1, 3.3.0
            Reporter: angerszhu

{code:java}
def run(): Unit = {
  submitApplication()
  if (!launcherBackend.isConnected() && fireAndForget) {
    val report = getApplicationReport(appId)
    val state = report.getYarnApplicationState
    logInfo(s"Application report for $appId (state: $state)")
    logInfo(formatReportDetails(report, getDriverLogsLink(report)))
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
      throw new SparkException(s"Application $appId finished with status: $state")
    }
  } else {
    val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
    if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
      var amContainerSucceed = false
      val amContainerExitMsg = s"AM Container for " +
        s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " +
        s"exited with exitCode: 0"
      diags.foreach { err =>
        logError(s"Application diagnostics message: $err")
        if (err.contains(amContainerExitMsg)) {
          amContainerSucceed = true
{code}

The staging dir is not cleaned when matching the case:
{code:java}
!launcherBackend.isConnected() && fireAndForget
{code}

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
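One hedged sketch of the fix direction implied by the report above (the staging directory must be cleaned on the fire-and-forget path too) is to move the cleanup into a finally block so it runs whether the monitored path or the early-exit path is taken. The class and method names below are illustrative stand-ins, not Spark's actual YARN Client code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal stand-in for the YARN client's run() flow, with cleanup
// guaranteed on both the fire-and-forget and the monitoring path.
class StagingCleanupDemo {
    static Path lastStagingDir;   // recorded so callers can inspect it

    // Stand-in for submitApplication(): "uploads" to a staging dir.
    static Path submitApplication() {
        try {
            lastStagingDir = Files.createTempDirectory("sparkStaging");
            return lastStagingDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void run(boolean fireAndForget) {
        Path stagingDir = submitApplication();
        try {
            if (fireAndForget) {
                // Fire-and-forget path: previously this could throw or
                // return WITHOUT cleaning the staging dir.
                throw new RuntimeException("Application finished with status: FAILED");
            }
            // ... the monitorApplication(appId) path would go here ...
        } finally {
            try {
                Files.deleteIfExists(stagingDir);  // now runs on BOTH paths
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

With the cleanup in finally, even the `!launcherBackend.isConnected() && fireAndForget` branch that throws a failure exception leaves no staging directory behind.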
[jira] [Comment Edited] (SPARK-37814) Migrating from log4j 1 to log4j 2
[ https://issues.apache.org/jira/browse/SPARK-37814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522630#comment-17522630 ] Dongjoon Hyun edited comment on SPARK-37814 at 4/15/22 3:07 AM:

BTW, one more tip for you, [~brentwritescode]. This JIRA, SPARK-37814, is already resolved in `branch-3.3` and irrelevant to your future suggestion for Apache Spark 3.4. Since you can open a new one for your suggestion, feel free to suggest anything on your own JIRA. You're welcome.

{quote}If you think this is a good path forward for the Spark project, I'd be happy to make a Jira or GitHub issue for it if no one has yet.
{quote}

was (Author: dongjoon):
BTW, one more tip to you, [~brentwritescode]. This JIRA SPARK-37814 is already resolved in `branch-3.3` and irrelevant to your future suggestion for Apache Spark 3.4. Since you can open a new one for your suggestion, free free to suggest anything on your own JIRA. You're welcome.

{quote}If you think this is a good path forward for the Spark project, I'd be happy to make a Jira or GitHub issue for it if no one has yet.
{quote}

> Migrating from log4j 1 to log4j 2
> ---------------------------------
>
>                 Key: SPARK-37814
>                 URL: https://issues.apache.org/jira/browse/SPARK-37814
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build
>    Affects Versions: 3.3.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>              Labels: releasenotes
>             Fix For: 3.3.0
>
> This is an umbrella ticket for all tasks related to migrating to log4j2.

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37814) Migrating from log4j 1 to log4j 2
[ https://issues.apache.org/jira/browse/SPARK-37814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522630#comment-17522630 ] Dongjoon Hyun commented on SPARK-37814:
---

BTW, one more tip for you, [~brentwritescode]. This JIRA, SPARK-37814, is already resolved in `branch-3.3` and irrelevant to your future suggestion for Apache Spark 3.4. Since you can open a new one for your suggestion, feel free to suggest anything on your own JIRA. You're welcome.

{quote}If you think this is a good path forward for the Spark project, I'd be happy to make a Jira or GitHub issue for it if no one has yet.
{quote}

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37814) Migrating from log4j 1 to log4j 2
[ https://issues.apache.org/jira/browse/SPARK-37814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522627#comment-17522627 ] Dongjoon Hyun commented on SPARK-37814:
---

Hi, [~brentwritescode]. Thank you for the suggestion, but none of them are released yet, are they?
- First, I don't think the future release plan of the Apache Hadoop community can be a blocker for Apache Spark releases. As you know, Apache Spark 3.3+ has already moved to log4j2.
- For Hadoop 2, the Apache Spark binary distribution uses Hadoop 2.7.4, and we have no plan to upgrade to Hadoop 2.10.x. So, that doesn't look like a path for us, unfortunately.
- For Hadoop 3, I'm sure the Apache Spark community is going to try Apache Hadoop 3.3.4 in the Apache Spark 3.4 timeframe. However, there is no guarantee in the open source community. Apache Hadoop 3.3.2 is also still under active testing, and we might revert back to the old one during the RC period.

Hadoop is one of several key dependencies which we consider seriously. As for the Apache Hadoop releases, let's talk later when the real releases arrive, so that we can play around with them.

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38909) Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB
[ https://issues.apache.org/jira/browse/SPARK-38909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38909:

Assignee: Apache Spark

> Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38909
>                 URL: https://issues.apache.org/jira/browse/SPARK-38909
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Assignee: Apache Spark
>            Priority: Minor
>
> {{ExternalShuffleBlockResolver}} and {{YarnShuffleService}} use {{LevelDB}} directly; this is not conducive to extending the use of {{RocksDB}} in this scenario. This PR encapsulates the DB access for extensibility. It will be the pre-work of SPARK-3

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38909) Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB
[ https://issues.apache.org/jira/browse/SPARK-38909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-38909:
-

Description: {{ExternalShuffleBlockResolver}} and {{YarnShuffleService}} use {{LevelDB}} directly; this is not conducive to extending the use of {{RocksDB}} in this scenario. This PR encapsulates the DB access for extensibility. It will be the pre-work of SPARK-3

(was: {{ExternalShuffleBlockResolver}} and {{YarnShuffleService}} use {{{}LevelDB directly{}}}, this is not conducive to extending the use of {{RocksDB}} in this scenario. This pr is encapsulated for expansibility. It will be the pre pr of SPARK-3)

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38909) Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB
[ https://issues.apache.org/jira/browse/SPARK-38909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522622#comment-17522622 ] Apache Spark commented on SPARK-38909:
--

User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36200

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38909) Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB
[ https://issues.apache.org/jira/browse/SPARK-38909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38909:

Assignee: (was: Apache Spark)

-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38909) Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB
Yang Jie created SPARK-38909: Summary: Encapsulate LevelDB used by ExternalShuffleBlockResolver and YarnShuffleService as LocalDB Key: SPARK-38909 URL: https://issues.apache.org/jira/browse/SPARK-38909 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 3.4.0 Reporter: Yang Jie {{ExternalShuffleBlockResolver}} and {{YarnShuffleService}} use {{LevelDB}} directly; this is not conducive to extending the use of {{RocksDB}} in this scenario. This PR encapsulates them for extensibility. It will be the preparatory PR for SPARK-3 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
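The encapsulation proposed in SPARK-38909 above can be illustrated with a small interface sketch. Note this is a hedged illustration only: the class and method names below are hypothetical, not the actual Spark API. The idea is that callers program against an abstract key-value store, and the concrete backend (LevelDB today, RocksDB later) is swapped in behind it.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class LocalDB(ABC):
    """Hypothetical minimal key-value store interface; both a LevelDB-backed
    and a RocksDB-backed implementation would satisfy it."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def close(self) -> None: ...


class InMemoryDB(LocalDB):
    """Stand-in backend used only for illustration; a real implementation
    would delegate to leveldb/rocksdb bindings instead of a dict."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def close(self) -> None:
        self._data.clear()


# Callers depend only on the LocalDB interface, never on a concrete store.
db: LocalDB = InMemoryDB()
db.put(b"app_1/shuffle_0", b'{"executor": "1"}')
```

The design point is simply dependency inversion: once ExternalShuffleBlockResolver and YarnShuffleService talk to such an interface, adding a RocksDB backend does not touch the callers.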
[jira] [Commented] (SPARK-38908) Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean
[ https://issues.apache.org/jira/browse/SPARK-38908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522613#comment-17522613 ] Apache Spark commented on SPARK-38908: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/36206 > Provide query context in runtime error of Casting from String to > Number/Date/Timestamp/Boolean > -- > > Key: SPARK-38908 > URL: https://issues.apache.org/jira/browse/SPARK-38908 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38908) Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean
[ https://issues.apache.org/jira/browse/SPARK-38908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38908: Assignee: Gengliang Wang (was: Apache Spark) > Provide query context in runtime error of Casting from String to > Number/Date/Timestamp/Boolean > -- > > Key: SPARK-38908 > URL: https://issues.apache.org/jira/browse/SPARK-38908 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38908) Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean
[ https://issues.apache.org/jira/browse/SPARK-38908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38908: Assignee: Apache Spark (was: Gengliang Wang) > Provide query context in runtime error of Casting from String to > Number/Date/Timestamp/Boolean > -- > > Key: SPARK-38908 > URL: https://issues.apache.org/jira/browse/SPARK-38908 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38908) Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean
Gengliang Wang created SPARK-38908: -- Summary: Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean Key: SPARK-38908 URL: https://issues.apache.org/jira/browse/SPARK-38908 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38904) Low cost DataFrame schema swap util
[ https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522610#comment-17522610 ] Rafal Wojdyla commented on SPARK-38904: --- [~hyukjin.kwon] thanks for the comment, sounds good to me, just want to point out that at least in my case it's important that the metadata of the columns gets "updated". > Low cost DataFrame schema swap util > --- > > Key: SPARK-38904 > URL: https://issues.apache.org/jira/browse/SPARK-38904 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.1 >Reporter: Rafal Wojdyla >Priority: Major > > This question is related to [https://stackoverflow.com/a/37090151/1661491]. > Let's assume I have a pyspark DataFrame with certain schema, and I would like > to overwrite that schema with a new schema that I *know* is compatible, I > could do: > {code:python} > df: DataFrame > new_schema = ... > df.rdd.toDF(schema=new_schema) > {code} > Unfortunately this triggers computation as described in the link above. Is > there a way to do that at the metadata level (or lazy), without eagerly > triggering computation or conversions? 
> Edit, note: > * the schema can be arbitrarily complicated (nested etc) > * new schema includes updates to description, nullability and additional > metadata (bonus points for updates to the type) > * I would like to avoid writing a custom query expression generator, > *unless* there's one already built into Spark that can generate query based > on the schema/{{{}StructType{}}} > Copied from: > [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] > See POC of workaround/util in > [https://github.com/ravwojdyla/spark-schema-utils] > Also posted in > [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
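The ticket above hinges on the schema being *known* compatible. As a rough, Spark-independent sketch of what that assumption means, a metadata-only swap only needs field names and types to line up, while nullability and per-column metadata may change freely. The field dicts below merely stand in for StructField; none of this is the pyspark API.

```python
# Each field dict stands in for a StructField: name, type, nullable, metadata.
def is_compatible(old_schema, new_schema):
    """True when a metadata-only swap would be safe: same field names and
    types, in order; nullability and metadata are allowed to differ."""
    if len(old_schema) != len(new_schema):
        return False
    return all(o["name"] == n["name"] and o["type"] == n["type"]
               for o, n in zip(old_schema, new_schema))


old = [{"name": "id", "type": "long", "nullable": True, "metadata": {}}]
# Same name/type, but tightened nullability and added column metadata.
new = [{"name": "id", "type": "long", "nullable": False,
        "metadata": {"comment": "primary key"}}]
```

Under this check, the `old`/`new` pair above is a compatible swap, whereas changing a field's type would not be.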
[jira] [Assigned] (SPARK-38907) Impl DataFrame.corrwith
[ https://issues.apache.org/jira/browse/SPARK-38907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38907: Assignee: Apache Spark > Impl DataFrame.corrwith > --- > > Key: SPARK-38907 > URL: https://issues.apache.org/jira/browse/SPARK-38907 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: zhengruifeng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38907) Impl DataFrame.corrwith
[ https://issues.apache.org/jira/browse/SPARK-38907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38907: Assignee: (was: Apache Spark) > Impl DataFrame.corrwith > --- > > Key: SPARK-38907 > URL: https://issues.apache.org/jira/browse/SPARK-38907 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: zhengruifeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38907) Impl DataFrame.corrwith
[ https://issues.apache.org/jira/browse/SPARK-38907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522608#comment-17522608 ] Apache Spark commented on SPARK-38907: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/36205 > Impl DataFrame.corrwith > --- > > Key: SPARK-38907 > URL: https://issues.apache.org/jira/browse/SPARK-38907 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: zhengruifeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38907) Impl DataFrame.corrwith
zhengruifeng created SPARK-38907: Summary: Impl DataFrame.corrwith Key: SPARK-38907 URL: https://issues.apache.org/jira/browse/SPARK-38907 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: zhengruifeng -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38902) cast as char/varchar result is string, not expect data type
[ https://issues.apache.org/jira/browse/SPARK-38902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuanGuanhu updated SPARK-38902: --- Issue Type: Improvement (was: Bug) > cast as char/varchar result is string, not expect data type > --- > > Key: SPARK-38902 > URL: https://issues.apache.org/jira/browse/SPARK-38902 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1, 3.3.0 >Reporter: YuanGuanhu >Priority: Major > > when casting a column to char/varchar type, the result is string, not the expected data > type -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38904) Low cost DataFrame schema swap util
[ https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522599#comment-17522599 ] Hyukjin Kwon commented on SPARK-38904: -- I think we should have an API like DataFrame.select(StructType) so we don't need to trigger another ser/de via conversion between RDD and DataFrame. > Low cost DataFrame schema swap util > --- > > Key: SPARK-38904 > URL: https://issues.apache.org/jira/browse/SPARK-38904 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.1 >Reporter: Rafal Wojdyla >Priority: Major > > This question is related to [https://stackoverflow.com/a/37090151/1661491]. > Let's assume I have a pyspark DataFrame with certain schema, and I would like > to overwrite that schema with a new schema that I *know* is compatible, I > could do: > {code:python} > df: DataFrame > new_schema = ... > df.rdd.toDF(schema=new_schema) > {code} > Unfortunately this triggers computation as described in the link above. Is > there a way to do that at the metadata level (or lazy), without eagerly > triggering computation or conversions? > Edit, note: > * the schema can be arbitrarily complicated (nested etc) > * new schema includes updates to description, nullability and additional > metadata (bonus points for updates to the type) > * I would like to avoid writing a custom query expression generator, > *unless* there's one already built into Spark that can generate query based > on the schema/{{{}StructType{}}} > Copied from: > [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] > See POC of workaround/util in > [https://github.com/ravwojdyla/spark-schema-utils] > Also posted in > [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38905) Upgrade ORC to 1.6.14
[ https://issues.apache.org/jira/browse/SPARK-38905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38905: - Assignee: Dongjoon Hyun > Upgrade ORC to 1.6.14 > - > > Key: SPARK-38905 > URL: https://issues.apache.org/jira/browse/SPARK-38905 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38905) Upgrade ORC to 1.6.14
[ https://issues.apache.org/jira/browse/SPARK-38905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38905. --- Fix Version/s: 3.2.2 Resolution: Fixed Issue resolved by pull request 36204 [https://github.com/apache/spark/pull/36204] > Upgrade ORC to 1.6.14 > - > > Key: SPARK-38905 > URL: https://issues.apache.org/jira/browse/SPARK-38905 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37814) Migrating from log4j 1 to log4j 2
[ https://issues.apache.org/jira/browse/SPARK-37814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522589#comment-17522589 ] Brent commented on SPARK-37814: --- [~kabhwan] [~dongjoon] I happened to notice your conversation about seeing what Hadoop does with regards to maintenance versions and I was just looking at their GitHub and Jira a little while ago. They did indeed move to Reload4j for their 3.3.x, 3.2.x and 2.10.x release lines (while I believe they're moving to Logback for 3.4.x and beyond). For reference, here is the Jira: https://issues.apache.org/jira/browse/HADOOP-18088 And here are the pull requests: * Hadoop 2.10.2: [https://github.com/apache/hadoop/pull/4151] * Hadoop 3.2.4: [https://github.com/apache/hadoop/pull/4084] * Hadoop 3.3.4: [https://github.com/apache/hadoop/pull/4052] If you think this is a good path forward for the Spark project, I'd be happy to make a Jira or GitHub issue for it if no one has yet. > Migrating from log4j 1 to log4j 2 > - > > Key: SPARK-37814 > URL: https://issues.apache.org/jira/browse/SPARK-37814 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 3.3.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: releasenotes > Fix For: 3.3.0 > > > This is umbrella ticket for all tasks related to migrating to log4j2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38823) Incorrect result of dataset reduceGroups in java
[ https://issues.apache.org/jira/browse/SPARK-38823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38823. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36183 [https://github.com/apache/spark/pull/36183] > Incorrect result of dataset reduceGroups in java > > > Key: SPARK-38823 > URL: https://issues.apache.org/jira/browse/SPARK-38823 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 3.3.0, 3.4.0 >Reporter: IKozar >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > Fix For: 3.3.0 > > > {code:java} > @Data > @NoArgsConstructor > @AllArgsConstructor > public static class Item implements Serializable { > private String x; > private String y; > private int z; > public Item addZ(int z) { > return new Item(x, y, this.z + z); > } > } {code} > {code:java} > List items = List.of( > new Item("X1", "Y1", 1), > new Item("X2", "Y1", 1), > new Item("X1", "Y1", 1), > new Item("X2", "Y1", 1), > new Item("X3", "Y1", 1), > new Item("X1", "Y1", 1), > new Item("X1", "Y2", 1), > new Item("X2", "Y1", 1)); > Dataset ds = spark.createDataFrame(items, > Item.class).as(Encoders.bean(Item.class)); > ds.groupByKey((MapFunction>) item -> > Tuple2.apply(item.getX(), item.getY()), > Encoders.tuple(Encoders.STRING(), Encoders.STRING())) > .reduceGroups((ReduceFunction) (item1, item2) -> > item1.addZ(item2.getZ())) > .show(10); > {code} > result is > {noformat} > ++--+ > | key|ReduceAggregator(poc.job.JavaSparkReduce$Item)| > ++--+ > |{X1, Y1}| {X2, Y1, 2}|-- expected 3 > |{X2, Y1}| {X2, Y1, 2}|-- expected 3 > |{X1, Y2}| {X2, Y1, 1}| > |{X3, Y1}| {X2, Y1, 1}| > ++--+{noformat} > note that the key doesn't match the value -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38823) Incorrect result of dataset reduceGroups in java
[ https://issues.apache.org/jira/browse/SPARK-38823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38823: Assignee: Bruce Robbins > Incorrect result of dataset reduceGroups in java > > > Key: SPARK-38823 > URL: https://issues.apache.org/jira/browse/SPARK-38823 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 3.3.0, 3.4.0 >Reporter: IKozar >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > > {code:java} > @Data > @NoArgsConstructor > @AllArgsConstructor > public static class Item implements Serializable { > private String x; > private String y; > private int z; > public Item addZ(int z) { > return new Item(x, y, this.z + z); > } > } {code} > {code:java} > List items = List.of( > new Item("X1", "Y1", 1), > new Item("X2", "Y1", 1), > new Item("X1", "Y1", 1), > new Item("X2", "Y1", 1), > new Item("X3", "Y1", 1), > new Item("X1", "Y1", 1), > new Item("X1", "Y2", 1), > new Item("X2", "Y1", 1)); > Dataset ds = spark.createDataFrame(items, > Item.class).as(Encoders.bean(Item.class)); > ds.groupByKey((MapFunction>) item -> > Tuple2.apply(item.getX(), item.getY()), > Encoders.tuple(Encoders.STRING(), Encoders.STRING())) > .reduceGroups((ReduceFunction) (item1, item2) -> > item1.addZ(item2.getZ())) > .show(10); > {code} > result is > {noformat} > ++--+ > | key|ReduceAggregator(poc.job.JavaSparkReduce$Item)| > ++--+ > |{X1, Y1}| {X2, Y1, 2}|-- expected 3 > |{X2, Y1}| {X2, Y1, 2}|-- expected 3 > |{X1, Y2}| {X2, Y1, 1}| > |{X3, Y1}| {X2, Y1, 1}| > ++--+{noformat} > note that the key doesn't match the value -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
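For reference, the result the SPARK-38823 ticket above expects can be reproduced with a plain-Python group-and-reduce mirroring the groupByKey/reduceGroups semantics (a Spark-independent sketch, not the fix itself): each (x, y) key's z values are summed pairwise, so the (X1, Y1) and (X2, Y1) groups should each yield z = 3, with keys matching their values.

```python
from functools import reduce

# Same rows as the ticket's List<Item> of (x, y, z) tuples
items = [("X1", "Y1", 1), ("X2", "Y1", 1), ("X1", "Y1", 1), ("X2", "Y1", 1),
         ("X3", "Y1", 1), ("X1", "Y1", 1), ("X1", "Y2", 1), ("X2", "Y1", 1)]

# groupByKey: bucket items by their (x, y) key
groups = {}
for x, y, z in items:
    groups.setdefault((x, y), []).append((x, y, z))

# reduceGroups: pairwise reduce within each bucket, mirroring Item.addZ
def add_z(a, b):
    return (a[0], a[1], a[2] + b[2])

reduced = {key: reduce(add_z, bucket) for key, bucket in groups.items()}
```

Against this reference, the buggy output in the ticket is wrong in two ways: the z totals are 2 instead of 3, and every value row shows the fields of one item (X2, Y1) regardless of its key.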
[jira] [Assigned] (SPARK-38898) Failed to build python docker images due to .cache not found
[ https://issues.apache.org/jira/browse/SPARK-38898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38898: Assignee: Yikun Jiang > Failed to build python docker images due to .cache not found > > > Key: SPARK-38898 > URL: https://issues.apache.org/jira/browse/SPARK-38898 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > > rm: cannot remove '/root/.cache': No such file or directory > Related: > [https://github.com/volcano-sh/volcano/runs/6020604500?check_suite_focus=true#step:10:2381] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38898) Failed to build python docker images due to .cache not found
[ https://issues.apache.org/jira/browse/SPARK-38898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38898. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36198 [https://github.com/apache/spark/pull/36198] > Failed to build python docker images due to .cache not found > > > Key: SPARK-38898 > URL: https://issues.apache.org/jira/browse/SPARK-38898 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.4.0 > > > rm: cannot remove '/root/.cache': No such file or directory > Related: > [https://github.com/volcano-sh/volcano/runs/6020604500?check_suite_focus=true#step:10:2381] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36664) Log time spent waiting for cluster resources
[ https://issues.apache.org/jira/browse/SPARK-36664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36664: -- Affects Version/s: 3.4.0 (was: 3.2.0) (was: 3.3.0) > Log time spent waiting for cluster resources > > > Key: SPARK-36664 > URL: https://issues.apache.org/jira/browse/SPARK-36664 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Holden Karau >Priority: Major > > To provide better visibility into why jobs might be running slow it would be > useful to log when we are waiting for resources and how long we are waiting > for resources so if there is an underlying cluster issue the user can be > aware. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36664) Log time spent waiting for cluster resources
[ https://issues.apache.org/jira/browse/SPARK-36664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36664: -- Target Version/s: (was: 3.3.0) > Log time spent waiting for cluster resources > > > Key: SPARK-36664 > URL: https://issues.apache.org/jira/browse/SPARK-36664 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0, 3.3.0 >Reporter: Holden Karau >Priority: Major > > To provide better visibility into why jobs might be running slow it would be > useful to log when we are waiting for resources and how long we are waiting > for resources so if there is an underlying cluster issue the user can be > aware. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38906) Support joinWith() with watermarks
John P created SPARK-38906: -- Summary: Support joinWith() with watermarks Key: SPARK-38906 URL: https://issues.apache.org/jira/browse/SPARK-38906 Project: Spark Issue Type: Improvement Components: SQL, Structured Streaming Affects Versions: 3.2.0 Reporter: John P Structured streaming requires a watermark for outer joins. This makes it impossible to use joinWith. I've attached a self-contained project example [here|https://gist.github.com/jpassaro/886d5febe6e40bced03a3691115c84d5], but this is the relevant part: {code:scala} streamingDatasetLeft .withWatermark("whenA", "1 second") .joinWith( streamingDatasetRight.withWatermark("whenB", "1 second"), ($"idA" === $"idB") && $"whenB".between($"whenA" - interval, $"whenA" + interval), "leftOuter" ) {code} stack trace: {noformat} [error] org.apache.spark.sql.AnalysisException: Stream-stream LeftOuter join between two streaming DataFrame/Datasets is not supported without a watermark in the join keys, or a watermark on the nullable side and an appropriate range condition; [error] Join LeftOuter, ((_1#40.idA = _2#41.idB) AND ((_2#41.whenB >= cast(whenA#9-T1000ms - INTERVAL '00.2' SECOND as timestamp)) AND (_2#41.whenB <= cast(_1#40.whenA + INTERVAL '00.2' SECOND as timestamp [error] :- Project [named_struct(idA, idA#8, whenA, whenA#9-T1000ms) AS _1#40] [error] : +- EventTimeWatermark whenA#9: timestamp, 1 seconds [error] : +- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, example.ExampleA, true])).idA, true, false) AS idA#8, staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, fromJavaTimestamp, knownnotnull(assertnotnull(input[0, example.ExampleA, true])).whenA, true, false) AS whenA#9] [error] :+- MapElements example.Main$$$Lambda$23191/0x0008061a5840@5af65edf, interface org.apache.spark.sql.Row, [StructField(timestamp,TimestampType,true), StructField(value,LongType,true)], obj#7: example.ExampleA 
[error] : +- DeserializeToObject createexternalrow(staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, ObjectType(class java.sql.Timestamp), toJavaTimestamp, timestamp#0, true, false), value#1L, StructField(timestamp,TimestampType,true), StructField(value,LongType,true)), obj#6: org.apache.spark.sql.Row [error] : +- GlobalLimit 1 [error] : +- LocalLimit 1 [error] :+- StreamingRelationV2 org.apache.spark.sql.execution.streaming.sources.RateStreamProvider@715e4ad8, rate, org.apache.spark.sql.execution.streaming.sources.RateStreamTable@7963c608, [rowsPerSecond=1], [timestamp#0, value#1L] [error] +- Project [named_struct(idB, idB#22, value, value#23, whenB, whenB#24-T1000ms) AS _2#41] [error]+- EventTimeWatermark whenB#24: timestamp, 1 seconds [error] +- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, example.ExampleB, true])).idB, true, false) AS idB#22, knownnotnull(assertnotnull(input[0, example.ExampleB, true])).value AS value#23, staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, fromJavaTimestamp, knownnotnull(assertnotnull(input[0, example.ExampleB, true])).whenB, true, false) AS whenB#24] [error] +- MapElements example.Main$$$Lambda$23254/0x0008061da840@2a5a573b, interface org.apache.spark.sql.Row, [StructField(timestamp,TimestampType,true), StructField(value,LongType,true)], obj#21: example.ExampleB [error] +- DeserializeToObject createexternalrow(staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, ObjectType(class java.sql.Timestamp), toJavaTimestamp, timestamp#13, true, false), value#14L, StructField(timestamp,TimestampType,true), StructField(value,LongType,true)), obj#20: org.apache.spark.sql.Row [error]+- GlobalLimit 1 [error] +- LocalLimit 1 [error] +- StreamingRelationV2 org.apache.spark.sql.execution.streaming.sources.RateStreamProvider@46bcce3d, rate, 
org.apache.spark.sql.execution.streaming.sources.RateStreamTable@25a4c0a2, [rowsPerSecond=1], [timestamp#13, value#14L] [error] {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37395) Inline type hint files for files in python/pyspark/ml
[ https://issues.apache.org/jira/browse/SPARK-37395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37395. Fix Version/s: 3.3.0 Resolution: Fixed > Inline type hint files for files in python/pyspark/ml > - > > Key: SPARK-37395 > URL: https://issues.apache.org/jira/browse/SPARK-37395 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.3.0 > > > Currently there are type hint stub files ({{*.pyi}}) to show the expected > types for functions, but we can also take advantage of static type checking > within the functions by inlining the type hints. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38905) Upgrade ORC to 1.6.14
[ https://issues.apache.org/jira/browse/SPARK-38905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38905: Assignee: Apache Spark > Upgrade ORC to 1.6.14 > - > > Key: SPARK-38905 > URL: https://issues.apache.org/jira/browse/SPARK-38905 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.1 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38905) Upgrade ORC to 1.6.14
[ https://issues.apache.org/jira/browse/SPARK-38905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38905: Assignee: (was: Apache Spark) > Upgrade ORC to 1.6.14 > - > > Key: SPARK-38905 > URL: https://issues.apache.org/jira/browse/SPARK-38905 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.1 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38905) Upgrade ORC to 1.6.14
[ https://issues.apache.org/jira/browse/SPARK-38905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522511#comment-17522511 ] Apache Spark commented on SPARK-38905: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36204 > Upgrade ORC to 1.6.14 > - > > Key: SPARK-38905 > URL: https://issues.apache.org/jira/browse/SPARK-38905 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.1 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38905) Upgrade ORC to 1.6.14
[ https://issues.apache.org/jira/browse/SPARK-38905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522512#comment-17522512 ] Apache Spark commented on SPARK-38905: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36204 > Upgrade ORC to 1.6.14 > - > > Key: SPARK-38905 > URL: https://issues.apache.org/jira/browse/SPARK-38905 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.1 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38905) Upgrade ORC to 1.6.14
Dongjoon Hyun created SPARK-38905: - Summary: Upgrade ORC to 1.6.14 Key: SPARK-38905 URL: https://issues.apache.org/jira/browse/SPARK-38905 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.2.1 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37405) Inline type hints for python/pyspark/ml/feature.py
[ https://issues.apache.org/jira/browse/SPARK-37405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522509#comment-17522509 ] Apache Spark commented on SPARK-37405: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/36203 > Inline type hints for python/pyspark/ml/feature.py > -- > > Key: SPARK-37405 > URL: https://issues.apache.org/jira/browse/SPARK-37405 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > > Inline type hints from python/pyspark/ml/feature.pyi to > python/pyspark/ml/feature.py -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37405) Inline type hints for python/pyspark/ml/feature.py
[ https://issues.apache.org/jira/browse/SPARK-37405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37405. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35530 [https://github.com/apache/spark/pull/35530] > Inline type hints for python/pyspark/ml/feature.py > -- > > Key: SPARK-37405 > URL: https://issues.apache.org/jira/browse/SPARK-37405 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > > Inline type hints from python/pyspark/ml/feature.pyi to > python/pyspark/ml/feature.py
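For readers unfamiliar with the change tracked above, "inlining" means moving annotations out of a separate `.pyi` stub file into the `.py` module itself. A minimal, self-contained sketch of the before/after shape (the `tokenize` function is hypothetical, not part of `pyspark.ml.feature`):

```python
from typing import List

# Before inlining: feature.py defined `def tokenize(text): ...` with no
# annotations, while a parallel feature.pyi stub carried the signature
# `def tokenize(text: str) -> List[str]: ...` for type checkers only.
# After inlining, the annotations live directly on the implementation,
# so the stub file can be deleted:

def tokenize(text: str) -> List[str]:
    """Split a string into lowercase whitespace-separated tokens."""
    return text.lower().split()
```

Inline hints keep the signature and the implementation in one place, and make the annotations visible at runtime through `__annotations__` rather than only to static checkers.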
[jira] [Commented] (SPARK-38667) Optimizer generates error when using inner join along with sequence
[ https://issues.apache.org/jira/browse/SPARK-38667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522500#comment-17522500 ] Bruce Robbins commented on SPARK-38667: --- Strangely, I cannot reproduce on 3.2.1 (or master). Maybe I am missing some configuration? My optimized plan doesn't contain the size check in the {{Join}}: {noformat} == Optimized Logical Plan == Generate explode(sequence(a2#5, b2#13, Some(1), Some(America/Vancouver))), false, [x#25] +- Join Inner, ((a2#5 < b2#13) AND (a1#4 = b1#12)) :- LocalRelation [a1#4, a2#5] +- LocalRelation [b1#12, b2#13] {noformat} > Optimizer generates error when using inner join along with sequence > --- > > Key: SPARK-38667 > URL: https://issues.apache.org/jira/browse/SPARK-38667 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.2.1 >Reporter: Lars >Priority: Major > > This issue occurred in a more complex scenario, so I've broken it down into a > simple case. > {*}Steps to reproduce{*}: Execute the following example. The code should run > without errors, but instead a *java.lang.IllegalArgumentException: Illegal > sequence boundaries: 4 to 2 by 1* is thrown. > {code:java} > package com.example > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.functions._ > object SparkIssue { > def main(args: Array[String]): Unit = { > val spark = SparkSession > .builder() > .master("local[*]") > .getOrCreate() > val dfA = spark > .createDataFrame(Seq((1, 1), (2, 4))) > .toDF("a1", "a2") > val dfB = spark > .createDataFrame(Seq((1, 5), (2, 2))) > .toDF("b1", "b2") > dfA.join(dfB, dfA("a1") === dfB("b1"), "inner") > .where(col("a2") < col("b2")) > .withColumn("x", explode(sequence(col("a2"), col("b2"), lit(1)))) > .show() > spark.stop() > } > } > {code} > When I look at the Optimized Logical Plan I can see that the Inner Join and > the Filter are brought together, with an additional check for an empty > Sequence. 
The exception is thrown because the Sequence check is executed > before the Filter. > {code:java} > == Parsed Logical Plan == > 'Project [a1#4, a2#5, b1#12, b2#13, explode(sequence('a2, 'b2, Some(1), > None)) AS x#24] > +- Filter (a2#5 < b2#13) > +- Join Inner, (a1#4 = b1#12) > :- Project [_1#0 AS a1#4, _2#1 AS a2#5] > : +- LocalRelation [_1#0, _2#1] > +- Project [_1#8 AS b1#12, _2#9 AS b2#13] > +- LocalRelation [_1#8, _2#9] > == Analyzed Logical Plan == > a1: int, a2: int, b1: int, b2: int, x: int > Project [a1#4, a2#5, b1#12, b2#13, x#25] > +- Generate explode(sequence(a2#5, b2#13, Some(1), Some(Europe/Berlin))), > false, [x#25] > +- Filter (a2#5 < b2#13) > +- Join Inner, (a1#4 = b1#12) > :- Project [_1#0 AS a1#4, _2#1 AS a2#5] > : +- LocalRelation [_1#0, _2#1] > +- Project [_1#8 AS b1#12, _2#9 AS b2#13] > +- LocalRelation [_1#8, _2#9] > == Optimized Logical Plan == > Generate explode(sequence(a2#5, b2#13, Some(1), Some(Europe/Berlin))), false, > [x#25] > +- Join Inner, (((size(sequence(a2#5, b2#13, Some(1), Some(Europe/Berlin)), > true) > 0) AND (a2#5 < b2#13)) AND (a1#4 = b1#12)) > :- LocalRelation [a1#4, a2#5] > +- LocalRelation [b1#12, b2#13] > == Physical Plan == > Generate explode(sequence(a2#5, b2#13, Some(1), Some(Europe/Berlin))), [a1#4, > a2#5, b1#12, b2#13], false, [x#25] > +- *(1) BroadcastHashJoin [a1#4], [b1#12], Inner, BuildRight, > ((size(sequence(a2#5, b2#13, Some(1), Some(Europe/Berlin)), true) > 0) AND > (a2#5 < b2#13)), false > :- *(1) LocalTableScan [a1#4, a2#5] > +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, > false] as bigint)),false), [id=#15] > +- LocalTableScan [b1#12, b2#13] > {code}
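The evaluation-order problem the reporter describes can be illustrated without Spark at all. In this minimal Python sketch (all names are illustrative, not Spark APIs), `seq` mimics the `sequence` expression's rejection of inverted bounds, and `optimized_guard` plays the role of the size check the optimizer pushes into the join condition:

```python
# Minimal, Spark-free sketch of the evaluation-order bug described above.

def seq(start, stop, step=1):
    """Mimics Spark's `sequence` expression, which rejects inverted bounds."""
    if step > 0 and start > stop:
        raise ValueError(f"Illegal sequence boundaries: {start} to {stop} by {step}")
    return list(range(start, stop + 1, step))

# Result of the inner join on a1 == b1, before the a2 < b2 filter,
# matching the dfA/dfB data in the reproducer.
joined = [
    {"a1": 1, "a2": 1, "b1": 1, "b2": 5},
    {"a1": 2, "a2": 4, "b1": 2, "b2": 2},
]

# Order the query asks for: filter first, then explode the sequence.
survivors = [r for r in joined if r["a2"] < r["b2"]]
exploded = [x for r in survivors for x in seq(r["a2"], r["b2"])]

# The optimizer's rewrite effectively evaluates the sequence-size guard on
# every joined row; on {"a2": 4, "b2": 2} the guard itself raises before
# the a2 < b2 predicate can filter the row out.
def optimized_guard(row):
    return len(seq(row["a2"], row["b2"])) > 0 and row["a2"] < row["b2"]
```

Running the filtered path yields the expected `[1, 2, 3, 4, 5]`, while applying `optimized_guard` to the unfiltered row `(a2=4, b2=2)` raises the same "Illegal sequence boundaries: 4 to 2 by 1" shape of error reported in the issue.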
[jira] [Updated] (SPARK-38904) Low cost DataFrame schema swap util
[ https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated SPARK-38904: -- Description: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *know* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/{{{}StructType{}}} Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in [https://github.com/ravwojdyla/spark-schema-utils] Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] was: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *know* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? 
Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/{ Unknown macro: {{StructType}} Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in [https://github.com/ravwojdyla/spark-schema-utils] Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] > Low cost DataFrame schema swap util > --- > > Key: SPARK-38904 > URL: https://issues.apache.org/jira/browse/SPARK-38904 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.1 >Reporter: Rafal Wojdyla >Priority: Major > > This question is related to [https://stackoverflow.com/a/37090151/1661491]. > Let's assume I have a pyspark DataFrame with certain schema, and I would like > to overwrite that schema with a new schema that I *know* is compatible, I > could do: > {code:python} > df: DataFrame > new_schema = ... > df.rdd.toDF(schema=new_schema) > {code} > Unfortunately this triggers computation as described in the link above. Is > there a way to do that at the metadata level (or lazy), without eagerly > triggering computation or conversions? 
> Edit, note: > * the schema can be arbitrarily complicated (nested etc) > * new schema includes updates to description, nullability and additional > metadata (bonus points for updates to the type) > * I would like to avoid writing a custom query expression generator, > *unless* there's one already built into Spark that can generate query based > on the schema/{{{}StructType{}}} > Copied from: > [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] > See POC of workaround/util in > [https://github.com/ravwojdyla/spark-schema-utils] > Also posted in > [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj]
[jira] [Updated] (SPARK-38904) Low cost DataFrame schema swap util
[ https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated SPARK-38904: -- Description: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *know* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/{ Unknown macro: {{StructType}} Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in [https://github.com/ravwojdyla/spark-schema-utils] Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] was: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *know* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? 
Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/\{{StructType}} Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in [https://github.com/ravwojdyla/spark-schema-utils] Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] > Low cost DataFrame schema swap util > --- > > Key: SPARK-38904 > URL: https://issues.apache.org/jira/browse/SPARK-38904 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.1 >Reporter: Rafal Wojdyla >Priority: Major > > This question is related to [https://stackoverflow.com/a/37090151/1661491]. > Let's assume I have a pyspark DataFrame with certain schema, and I would like > to overwrite that schema with a new schema that I *know* is compatible, I > could do: > {code:python} > df: DataFrame > new_schema = ... > df.rdd.toDF(schema=new_schema) > {code} > Unfortunately this triggers computation as described in the link above. Is > there a way to do that at the metadata level (or lazy), without eagerly > triggering computation or conversions? 
> Edit, note: > * the schema can be arbitrarily complicated (nested etc) > * new schema includes updates to description, nullability and additional > metadata (bonus points for updates to the type) > * I would like to avoid writing a custom query expression generator, > *unless* there's one already built into Spark that can generate query based > on the schema/{ > Unknown macro: {{StructType}} > Copied from: > [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] > See POC of workaround/util in > [https://github.com/ravwojdyla/spark-schema-utils] > Also posted in > [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj]
[jira] [Updated] (SPARK-38904) Low cost DataFrame schema swap util
[ https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated SPARK-38904: -- Description: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *know* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/\{{StructType}} Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in [https://github.com/ravwojdyla/spark-schema-utils] Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] was: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *know* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? 
Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/`StructType` Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in [https://github.com/ravwojdyla/spark-schema-utils] Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] > Low cost DataFrame schema swap util > --- > > Key: SPARK-38904 > URL: https://issues.apache.org/jira/browse/SPARK-38904 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.1 >Reporter: Rafal Wojdyla >Priority: Major > > This question is related to [https://stackoverflow.com/a/37090151/1661491]. > Let's assume I have a pyspark DataFrame with certain schema, and I would like > to overwrite that schema with a new schema that I *know* is compatible, I > could do: > {code:python} > df: DataFrame > new_schema = ... > df.rdd.toDF(schema=new_schema) > {code} > Unfortunately this triggers computation as described in the link above. Is > there a way to do that at the metadata level (or lazy), without eagerly > triggering computation or conversions? 
> Edit, note: > * the schema can be arbitrarily complicated (nested etc) > * new schema includes updates to description, nullability and additional > metadata (bonus points for updates to the type) > * I would like to avoid writing a custom query expression generator, > *unless* there's one already built into Spark that can generate query based > on the schema/\{{StructType}} > Copied from: > [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] > See POC of workaround/util in > [https://github.com/ravwojdyla/spark-schema-utils] > Also posted in > [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj]
[jira] [Updated] (SPARK-38904) Low cost DataFrame schema swap util
[ https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated SPARK-38904: -- Description: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *know* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/`StructType` Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in [https://github.com/ravwojdyla/spark-schema-utils] Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] was: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *{*}know{*}* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? 
Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/`StructType` Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in https://github.com/ravwojdyla/spark-schema-utils Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] > Low cost DataFrame schema swap util > --- > > Key: SPARK-38904 > URL: https://issues.apache.org/jira/browse/SPARK-38904 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.1 >Reporter: Rafal Wojdyla >Priority: Major > > This question is related to [https://stackoverflow.com/a/37090151/1661491]. > Let's assume I have a pyspark DataFrame with certain schema, and I would like > to overwrite that schema with a new schema that I *know* is compatible, I > could do: > {code:python} > df: DataFrame > new_schema = ... > df.rdd.toDF(schema=new_schema) > {code} > Unfortunately this triggers computation as described in the link above. Is > there a way to do that at the metadata level (or lazy), without eagerly > triggering computation or conversions? 
> Edit, note: > * the schema can be arbitrarily complicated (nested etc) > * new schema includes updates to description, nullability and additional > metadata (bonus points for updates to the type) > * I would like to avoid writing a custom query expression generator, > *unless* there's one already built into Spark that can generate query based > on the schema/`StructType` > Copied from: > [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] > See POC of workaround/util in > [https://github.com/ravwojdyla/spark-schema-utils] > Also posted in > [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj]
[jira] [Updated] (SPARK-38904) Low cost DataFrame schema swap util
[ https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated SPARK-38904: -- Description: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *{*}know{*}* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *unless* there's one already built into Spark that can generate query based on the schema/`StructType` Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in https://github.com/ravwojdyla/spark-schema-utils Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] was: This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *{*}know{*}* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? 
Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *{*}unless{*}* there's one already built into Spark that can generate query based on the schema/`StructType` Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in https://github.com/ravwojdyla/spark-schema-utils Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj] > Low cost DataFrame schema swap util > --- > > Key: SPARK-38904 > URL: https://issues.apache.org/jira/browse/SPARK-38904 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.1 >Reporter: Rafal Wojdyla >Priority: Major > > This question is related to [https://stackoverflow.com/a/37090151/1661491]. > Let's assume I have a pyspark DataFrame with certain schema, and I would like > to overwrite that schema with a new schema that I *{*}know{*}* is compatible, > I could do: > {code:python} > df: DataFrame > new_schema = ... > df.rdd.toDF(schema=new_schema) > {code} > Unfortunately this triggers computation as described in the link above. Is > there a way to do that at the metadata level (or lazy), without eagerly > triggering computation or conversions? 
> Edit, note: > * the schema can be arbitrarily complicated (nested etc) > * new schema includes updates to description, nullability and additional > metadata (bonus points for updates to the type) > * I would like to avoid writing a custom query expression generator, > *unless* there's one already built into Spark that can generate query based > on the schema/`StructType` > Copied from: > [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] > See POC of workaround/util in https://github.com/ravwojdyla/spark-schema-utils > Also posted in > [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj]
[jira] [Created] (SPARK-38904) Low cost DataFrame schema swap util
Rafal Wojdyla created SPARK-38904: - Summary: Low cost DataFrame schema swap util Key: SPARK-38904 URL: https://issues.apache.org/jira/browse/SPARK-38904 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.2.1 Reporter: Rafal Wojdyla This question is related to [https://stackoverflow.com/a/37090151/1661491]. Let's assume I have a pyspark DataFrame with certain schema, and I would like to overwrite that schema with a new schema that I *{*}know{*}* is compatible, I could do: {code:python} df: DataFrame new_schema = ... df.rdd.toDF(schema=new_schema) {code} Unfortunately this triggers computation as described in the link above. Is there a way to do that at the metadata level (or lazy), without eagerly triggering computation or conversions? Edit, note: * the schema can be arbitrarily complicated (nested etc) * new schema includes updates to description, nullability and additional metadata (bonus points for updates to the type) * I would like to avoid writing a custom query expression generator, *{*}unless{*}* there's one already built into Spark that can generate query based on the schema/`StructType` Copied from: [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan] See POC of workaround/util in https://github.com/ravwojdyla/spark-schema-utils Also posted in [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj]
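The distinction the reporter draws between an eager conversion (`df.rdd.toDF(...)`) and a metadata-level swap can be sketched without Spark. In this toy model (all names hypothetical, not Spark APIs), a frame is a deferred computation plus a schema, and swapping the schema replaces only the metadata, never invoking the computation:

```python
# Spark-free sketch of the "metadata-level schema swap" requested above.

class LazyFrame:
    """Toy lazy dataframe: a zero-arg compute callable plus a schema dict."""

    def __init__(self, compute, schema):
        self._compute = compute   # deferred row-producing computation
        self.schema = schema      # plain dict: column name -> metadata

    def with_schema(self, new_schema):
        # Metadata-only swap: reuse the same deferred computation, so no
        # rows are materialized. Only sanity-check the column set.
        if set(new_schema) != set(self.schema):
            raise ValueError("new schema must cover the same columns")
        return LazyFrame(self._compute, new_schema)

    def collect(self):
        return self._compute()

# Count invocations to prove the swap does not trigger computation.
calls = {"n": 0}

def expensive_scan():
    calls["n"] += 1
    return [(1, "a"), (2, "b")]

df = LazyFrame(expensive_scan, {"id": {"nullable": False}, "v": {}})
# Update nullability and description without touching the data.
df2 = df.with_schema({"id": {"nullable": True, "desc": "row id"}, "v": {}})
```

After `with_schema`, `calls["n"]` is still 0; the scan runs only once `collect()` is called. This is the behavior the POC linked above aims for, in contrast to `df.rdd.toDF(schema=...)`, which forces a round-trip through the RDD representation.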
[jira] [Assigned] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38891: Assignee: (was: Apache Spark) > Skipping allocating vector for repetition & definition levels when possible > --- > > Key: SPARK-38891 > URL: https://issues.apache.org/jira/browse/SPARK-38891 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Chao Sun >Priority: Major > > Currently the vectorized Parquet reader will allocate vectors for repetition > and definition levels in all cases. However in certain cases (e.g., when > reading primitive types) this is not necessary and should be avoided.
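The intuition behind the proposed optimization comes from Parquet's Dremel encoding: a column's maximum repetition level counts its repeated ancestors, and its maximum definition level counts its optional or repeated ancestors. A rough sketch of that arithmetic (not Spark's actual reader code) shows why a required top-level primitive column needs no level vectors at all:

```python
# Rough sketch of Dremel-style level rules, not Spark's implementation.
# When both maxima are 0 (a required, non-nested primitive column), every
# value has repetition level 0 and definition level 0, so vectors holding
# those levels carry no information and can be skipped.

def max_levels(path):
    """path: repetition labels ('required'|'optional'|'repeated') from the
    schema root down to the column, root excluded."""
    max_def = sum(1 for r in path if r in ("optional", "repeated"))
    max_rep = sum(1 for r in path if r == "repeated")
    return max_rep, max_def

def needs_level_vectors(path):
    max_rep, max_def = max_levels(path)
    return max_rep > 0 or max_def > 0
```

For example, a flat `required int32 id` column has maxima `(0, 0)` and can skip the allocation, while a column nested under a repeated group genuinely needs both level vectors to reconstruct nesting and nulls.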
[jira] [Commented] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522458#comment-17522458 ] Apache Spark commented on SPARK-38891: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/36202 > Skipping allocating vector for repetition & definition levels when possible > --- > > Key: SPARK-38891 > URL: https://issues.apache.org/jira/browse/SPARK-38891 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Chao Sun >Priority: Major > > Currently the vectorized Parquet reader will allocate vectors for repetition > and definition levels in all cases. However in certain cases (e.g., when > reading primitive types) this is not necessary and should be avoided.
[jira] [Commented] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522460#comment-17522460 ] Apache Spark commented on SPARK-38891: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/36202 > Skipping allocating vector for repetition & definition levels when possible > --- > > Key: SPARK-38891 > URL: https://issues.apache.org/jira/browse/SPARK-38891 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Chao Sun >Priority: Major > > Currently the vectorized Parquet reader will allocate vectors for repetition > and definition levels in all cases. However in certain cases (e.g., when > reading primitive types) this is not necessary and should be avoided.
[jira] [Assigned] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38891: Assignee: Apache Spark > Skipping allocating vector for repetition & definition levels when possible > --- > > Key: SPARK-38891 > URL: https://issues.apache.org/jira/browse/SPARK-38891 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Chao Sun >Assignee: Apache Spark >Priority: Major > > Currently the vectorized Parquet reader will allocate vectors for repetition > and definition levels in all cases. However in certain cases (e.g., when > reading primitive types) this is not necessary and should be avoided.
[jira] [Commented] (SPARK-38903) Implement `ignore_index` of `Series.sort_values` and `Series.sort_index`
[ https://issues.apache.org/jira/browse/SPARK-38903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522422#comment-17522422 ] Apache Spark commented on SPARK-38903: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36186 > Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` > > > Key: SPARK-38903 > URL: https://issues.apache.org/jira/browse/SPARK-38903 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ignore_index` of `Series.sort_values` and `Series.sort_index`
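For context, the `ignore_index` flag being implemented here mirrors pandas semantics: a normal sort keeps each value's original index label, while `ignore_index=True` relabels the result `0..n-1`. A pure-Python sketch of that behavior, modeling a Series as `(label, value)` pairs (illustrative only, not the pandas-on-Spark implementation):

```python
# Pure-Python sketch of the pandas `ignore_index` semantics that
# pandas-on-Spark `Series.sort_values` is adopting in this ticket.

def sort_values(series, ignore_index=False):
    """series: list of (index_label, value) pairs, like a tiny Series."""
    ordered = sorted(series, key=lambda kv: kv[1])
    if ignore_index:
        # Discard the original labels and renumber the result 0..n-1.
        return [(i, v) for i, (_, v) in enumerate(ordered)]
    return ordered

s = [(0, 3), (1, 1), (2, 2)]
```

With the default, `sort_values(s)` keeps the labels `1, 2, 0` alongside the sorted values; with `ignore_index=True` the labels become `0, 1, 2`, matching `pandas.Series.sort_values(ignore_index=True)`.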
[jira] [Assigned] (SPARK-38903) Implement `ignore_index` of `Series.sort_values` and `Series.sort_index`
[ https://issues.apache.org/jira/browse/SPARK-38903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38903: Assignee: (was: Apache Spark) > Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` > > > Key: SPARK-38903 > URL: https://issues.apache.org/jira/browse/SPARK-38903 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38903) Implement `ignore_index` of `Series.sort_values` and `Series.sort_index`
[ https://issues.apache.org/jira/browse/SPARK-38903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522421#comment-17522421 ] Apache Spark commented on SPARK-38903: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36186 > Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` > > > Key: SPARK-38903 > URL: https://issues.apache.org/jira/browse/SPARK-38903 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38903) Implement `ignore_index` of `Series.sort_values` and `Series.sort_index`
[ https://issues.apache.org/jira/browse/SPARK-38903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38903: Assignee: Apache Spark > Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` > > > Key: SPARK-38903 > URL: https://issues.apache.org/jira/browse/SPARK-38903 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38903) Implement `ignore_index` of `Series.sort_values` and `Series.sort_index`
Xinrong Meng created SPARK-38903: Summary: Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` Key: SPARK-38903 URL: https://issues.apache.org/jira/browse/SPARK-38903 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
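[Editor's note] For readers unfamiliar with the semantics SPARK-38903 asks for: the pandas-on-Spark API mirrors plain pandas, where `ignore_index` already exists on `Series.sort_values`. A minimal pandas sketch (not pandas-on-Spark code) of the behavior being implemented:

```python
import pandas as pd

# A Series with a non-default index.
s = pd.Series([3, 1, 2], index=[10, 11, 12])

# Default (ignore_index=False): sorted values keep their original index labels.
kept = s.sort_values()

# ignore_index=True: the result is relabeled 0, 1, ..., n-1.
relabeled = s.sort_values(ignore_index=True)
```

`Series.sort_index(ignore_index=True)` behaves analogously: after sorting, the resulting axis is replaced with a fresh default index.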
[jira] [Commented] (SPARK-38462) Use error classes in org.apache.spark.executor
[ https://issues.apache.org/jira/browse/SPARK-38462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522331#comment-17522331 ] huangtengfei commented on SPARK-38462: -- I am working on this. Thanks [~bozhang] > Use error classes in org.apache.spark.executor > -- > > Key: SPARK-38462 > URL: https://issues.apache.org/jira/browse/SPARK-38462 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38881) PySpark Kinesis Streaming should expose metricsLevel CloudWatch config that is already supported in the Scala/Java APIs
[ https://issues.apache.org/jira/browse/SPARK-38881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Khaitman updated SPARK-38881: -- Description: This relates to https://issues.apache.org/jira/browse/SPARK-27420 which was merged as part of Spark 3.0.0 This change is desirable as it further exposes the metricsLevel config parameter that was added for the Scala/Java Spark APIs when working with the Kinesis Streaming integration, and makes it available to the PySpark API as well. This change passes all tests, and local testing was done with a development Kinesis stream in AWS, in order to confirm that metrics were no longer being reported to CloudWatch after specifying MetricsLevel.NONE in the PySpark Kinesis streaming context creation, and also worked as it does today when leaving the MetricsLevel parameter out, which would result in a default of DETAILED, with CloudWatch metrics appearing again. https://github.com/apache/spark/pull/36201 was: This relates to https://issues.apache.org/jira/browse/SPARK-27420 which was merged as part of Spark 3.0.0 This change is desirable as it further exposes the metricsLevel config parameter that was added for the Scala/Java Spark APIs when working with the Kinesis Streaming integration, and makes it available to the PySpark API as well. This change passes all tests, and local testing was done with a development Kinesis stream in AWS, in order to confirm that metrics were no longer being reported to CloudWatch after specifying MetricsLevel.NONE in the PySpark Kinesis streaming context creation, and also worked as it does today when leaving the MetricsLevel parameter out, which would result in a default of DETAILED, with CloudWatch metrics appearing again. 
https://github.com/apache/spark/pull/36166 > PySpark Kinesis Streaming should expose metricsLevel CloudWatch config that > is already supported in the Scala/Java APIs > --- > > Key: SPARK-38881 > URL: https://issues.apache.org/jira/browse/SPARK-38881 > Project: Spark > Issue Type: Improvement > Components: DStreams, Input/Output, PySpark >Affects Versions: 3.2.1 >Reporter: Mark Khaitman >Priority: Major > > This relates to https://issues.apache.org/jira/browse/SPARK-27420 which was > merged as part of Spark 3.0.0 > This change is desirable as it further exposes the metricsLevel config > parameter that was added for the Scala/Java Spark APIs when working with the > Kinesis Streaming integration, and makes it available to the PySpark API as > well. > This change passes all tests, and local testing was done with a development > Kinesis stream in AWS, in order to confirm that metrics were no longer being > reported to CloudWatch after specifying MetricsLevel.NONE in the PySpark > Kinesis streaming context creation, and also worked as it does today when > leaving the MetricsLevel parameter out, which would result in a default of > DETAILED, with CloudWatch metrics appearing again. > https://github.com/apache/spark/pull/36201 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38881) PySpark Kinesis Streaming should expose metricsLevel CloudWatch config that is already supported in the Scala/Java APIs
[ https://issues.apache.org/jira/browse/SPARK-38881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522308#comment-17522308 ] Apache Spark commented on SPARK-38881: -- User 'mkman84' has created a pull request for this issue: https://github.com/apache/spark/pull/36201 > PySpark Kinesis Streaming should expose metricsLevel CloudWatch config that > is already supported in the Scala/Java APIs > --- > > Key: SPARK-38881 > URL: https://issues.apache.org/jira/browse/SPARK-38881 > Project: Spark > Issue Type: Improvement > Components: DStreams, Input/Output, PySpark >Affects Versions: 3.2.1 >Reporter: Mark Khaitman >Priority: Major > > This relates to https://issues.apache.org/jira/browse/SPARK-27420 which was > merged as part of Spark 3.0.0 > This change is desirable as it further exposes the metricsLevel config > parameter that was added for the Scala/Java Spark APIs when working with the > Kinesis Streaming integration, and makes it available to the PySpark API as > well. > This change passes all tests, and local testing was done with a development > Kinesis stream in AWS, in order to confirm that metrics were no longer being > reported to CloudWatch after specifying MetricsLevel.NONE in the PySpark > Kinesis streaming context creation, and also worked as it does today when > leaving the MetricsLevel parameter out, which would result in a default of > DETAILED, with CloudWatch metrics appearing again. > https://github.com/apache/spark/pull/36166 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38881) PySpark Kinesis Streaming should expose metricsLevel CloudWatch config that is already supported in the Scala/Java APIs
[ https://issues.apache.org/jira/browse/SPARK-38881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522307#comment-17522307 ] Apache Spark commented on SPARK-38881: -- User 'mkman84' has created a pull request for this issue: https://github.com/apache/spark/pull/36201 > PySpark Kinesis Streaming should expose metricsLevel CloudWatch config that > is already supported in the Scala/Java APIs > --- > > Key: SPARK-38881 > URL: https://issues.apache.org/jira/browse/SPARK-38881 > Project: Spark > Issue Type: Improvement > Components: DStreams, Input/Output, PySpark >Affects Versions: 3.2.1 >Reporter: Mark Khaitman >Priority: Major > > This relates to https://issues.apache.org/jira/browse/SPARK-27420 which was > merged as part of Spark 3.0.0 > This change is desirable as it further exposes the metricsLevel config > parameter that was added for the Scala/Java Spark APIs when working with the > Kinesis Streaming integration, and makes it available to the PySpark API as > well. > This change passes all tests, and local testing was done with a development > Kinesis stream in AWS, in order to confirm that metrics were no longer being > reported to CloudWatch after specifying MetricsLevel.NONE in the PySpark > Kinesis streaming context creation, and also worked as it does today when > leaving the MetricsLevel parameter out, which would result in a default of > DETAILED, with CloudWatch metrics appearing again. > https://github.com/apache/spark/pull/36166 > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`
[ https://issues.apache.org/jira/browse/SPARK-38888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522294#comment-17522294 ] Yang Jie commented on SPARK-38888: -- I have provided a [draft PR|https://github.com/apache/spark/pull/36200] to refactor the `LevelDB` usage in this area; that PR is pre-work for this JIRA, and the refactoring will help us extend the use of `RocksDB`. [~dongjoon] If you have time, please help to check if this refactoring work is acceptable, thanks ~ > Add `RocksDBProvider` similar to `LevelDBProvider` > -- > > Key: SPARK-38888 > URL: https://issues.apache.org/jira/browse/SPARK-38888 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, YARN >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and > `YarnShuffleService`; a corresponding `RocksDB` implementation should be added -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38753) Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-38753. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36196 [https://github.com/apache/spark/pull/36196] > Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite > - > > Key: SPARK-38753 > URL: https://issues.apache.org/jira/browse/SPARK-38753 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Move tests for the error class *WRITING_JOB_ABORTED* from DataSourceV2Suite > to QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38753) Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-38753: Assignee: Max Gekk > Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite > - > > Key: SPARK-38753 > URL: https://issues.apache.org/jira/browse/SPARK-38753 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Move tests for the error class *WRITING_JOB_ABORTED* from DataSourceV2Suite > to QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38722) Test the error class: CAST_CAUSES_OVERFLOW
[ https://issues.apache.org/jira/browse/SPARK-38722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522263#comment-17522263 ] Apache Spark commented on SPARK-38722: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/36199 > Test the error class: CAST_CAUSES_OVERFLOW > -- > > Key: SPARK-38722 > URL: https://issues.apache.org/jira/browse/SPARK-38722 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add at least one test for the error class *CAST_CAUSES_OVERFLOW* to > QueryExecutionErrorsSuite. The test should cover the exception throw in > QueryExecutionErrors: > {code:scala} > def castingCauseOverflowError(t: Any, dataType: DataType): > ArithmeticException = { > new SparkArithmeticException(errorClass = "CAST_CAUSES_OVERFLOW", > messageParameters = Array(t.toString, dataType.catalogString, > SQLConf.ANSI_ENABLED.key)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38722) Test the error class: CAST_CAUSES_OVERFLOW
[ https://issues.apache.org/jira/browse/SPARK-38722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38722: Assignee: Apache Spark > Test the error class: CAST_CAUSES_OVERFLOW > -- > > Key: SPARK-38722 > URL: https://issues.apache.org/jira/browse/SPARK-38722 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: starter > > Add at least one test for the error class *CAST_CAUSES_OVERFLOW* to > QueryExecutionErrorsSuite. The test should cover the exception throw in > QueryExecutionErrors: > {code:scala} > def castingCauseOverflowError(t: Any, dataType: DataType): > ArithmeticException = { > new SparkArithmeticException(errorClass = "CAST_CAUSES_OVERFLOW", > messageParameters = Array(t.toString, dataType.catalogString, > SQLConf.ANSI_ENABLED.key)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38722) Test the error class: CAST_CAUSES_OVERFLOW
[ https://issues.apache.org/jira/browse/SPARK-38722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38722: Assignee: (was: Apache Spark) > Test the error class: CAST_CAUSES_OVERFLOW > -- > > Key: SPARK-38722 > URL: https://issues.apache.org/jira/browse/SPARK-38722 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add at least one test for the error class *CAST_CAUSES_OVERFLOW* to > QueryExecutionErrorsSuite. The test should cover the exception throw in > QueryExecutionErrors: > {code:scala} > def castingCauseOverflowError(t: Any, dataType: DataType): > ArithmeticException = { > new SparkArithmeticException(errorClass = "CAST_CAUSES_OVERFLOW", > messageParameters = Array(t.toString, dataType.catalogString, > SQLConf.ANSI_ENABLED.key)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
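[Editor's note] For context on what *CAST_CAUSES_OVERFLOW* guards against: under ANSI mode, Spark raises an error instead of silently wrapping when a value does not fit the target type. A hedged plain-Python illustration of that checked-cast idea (this is not Spark's implementation; the function name is hypothetical):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def checked_int_cast(value: int) -> int:
    """Simulate an ANSI-style checked cast to a 32-bit INT:
    raise on overflow rather than wrapping around."""
    if not (INT32_MIN <= value <= INT32_MAX):
        raise OverflowError(f"casting {value} to INT causes overflow")
    return value
```

With ANSI mode off, Spark's cast instead wraps modulo 2^32, which is exactly the silent behavior the error class exists to prevent.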
[jira] [Created] (SPARK-38902) cast as char/varchar result is string, not expected data type
YuanGuanhu created SPARK-38902: -- Summary: cast as char/varchar result is string, not expected data type Key: SPARK-38902 URL: https://issues.apache.org/jira/browse/SPARK-38902 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1, 3.3.0 Reporter: YuanGuanhu When casting a column to the char/varchar type, the result is a string, not the expected data type. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38901) Support for more misc functions to push down to data source
Zhixiong Chen created SPARK-38901: - Summary: Support for more misc functions to push down to data source Key: SPARK-38901 URL: https://issues.apache.org/jira/browse/SPARK-38901 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38900) Support for more collection functions to push down to data source
Zhixiong Chen created SPARK-38900: - Summary: Support for more collection functions to push down to data source Key: SPARK-38900 URL: https://issues.apache.org/jira/browse/SPARK-38900 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38899) Support for more datetime functions to push down to data source
Zhixiong Chen created SPARK-38899: - Summary: Support for more datetime functions to push down to data source Key: SPARK-38899 URL: https://issues.apache.org/jira/browse/SPARK-38899 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38898) Failed to build python docker images due to .cache not found
[ https://issues.apache.org/jira/browse/SPARK-38898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38898: Assignee: Apache Spark > Failed to build python docker images due to .cache not found > > > Key: SPARK-38898 > URL: https://issues.apache.org/jira/browse/SPARK-38898 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > > rm: cannot remove '/root/.cache': No such file or directory > Related: > [https://github.com/volcano-sh/volcano/runs/6020604500?check_suite_focus=true#step:10:2381] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38898) Failed to build python docker images due to .cache not found
[ https://issues.apache.org/jira/browse/SPARK-38898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522203#comment-17522203 ] Apache Spark commented on SPARK-38898: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/36198 > Failed to build python docker images due to .cache not found > > > Key: SPARK-38898 > URL: https://issues.apache.org/jira/browse/SPARK-38898 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > rm: cannot remove '/root/.cache': No such file or directory > Related: > [https://github.com/volcano-sh/volcano/runs/6020604500?check_suite_focus=true#step:10:2381] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38898) Failed to build python docker images due to .cache not found
[ https://issues.apache.org/jira/browse/SPARK-38898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38898: Assignee: (was: Apache Spark) > Failed to build python docker images due to .cache not found > > > Key: SPARK-38898 > URL: https://issues.apache.org/jira/browse/SPARK-38898 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > rm: cannot remove '/root/.cache': No such file or directory > Related: > [https://github.com/volcano-sh/volcano/runs/6020604500?check_suite_focus=true#step:10:2381] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38898) Failed to build python docker images due to .cache not found
[ https://issues.apache.org/jira/browse/SPARK-38898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522201#comment-17522201 ] Apache Spark commented on SPARK-38898: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/36198 > Failed to build python docker images due to .cache not found > > > Key: SPARK-38898 > URL: https://issues.apache.org/jira/browse/SPARK-38898 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > > rm: cannot remove '/root/.cache': No such file or directory > Related: > [https://github.com/volcano-sh/volcano/runs/6020604500?check_suite_focus=true#step:10:2381] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38898) Failed to build python docker images due to .cache not found
Yikun Jiang created SPARK-38898: --- Summary: Failed to build python docker images due to .cache not found Key: SPARK-38898 URL: https://issues.apache.org/jira/browse/SPARK-38898 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.4.0 Reporter: Yikun Jiang rm: cannot remove '/root/.cache': No such file or directory Related: [https://github.com/volcano-sh/volcano/runs/6020604500?check_suite_focus=true#step:10:2381] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38897) Support for more string functions to push down to data source
[ https://issues.apache.org/jira/browse/SPARK-38897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated SPARK-38897: -- Parent: SPARK-38852 Issue Type: Sub-task (was: Improvement) > Support for more string functions to push down to data source > - > > Key: SPARK-38897 > URL: https://issues.apache.org/jira/browse/SPARK-38897 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Zhixiong Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38897) Support for more string functions to push down to data source
[ https://issues.apache.org/jira/browse/SPARK-38897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhixiong Chen updated SPARK-38897: -- Summary: Support for more string functions to push down to data source (was: More string functions support to pushdown to data source) > Support for more string functions to push down to data source > - > > Key: SPARK-38897 > URL: https://issues.apache.org/jira/browse/SPARK-38897 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Zhixiong Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38897) More string functions support to pushdown to data source
Zhixiong Chen created SPARK-38897: - Summary: More string functions support to pushdown to data source Key: SPARK-38897 URL: https://issues.apache.org/jira/browse/SPARK-38897 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Zhixiong Chen -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38438) Can't update spark.jars.packages on existing global/default context
[ https://issues.apache.org/jira/browse/SPARK-38438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522176#comment-17522176 ] Rafal Wojdyla commented on SPARK-38438: --- For posterity see the context in https://lists.apache.org/thread/42rsmbyqc5p1zfv956rwz4wk9lhj4s6w. [~srowen] thanks for the comment, feel free to close this issue if you believe there's no chance of getting this one in. > Can't update spark.jars.packages on existing global/default context > --- > > Key: SPARK-38438 > URL: https://issues.apache.org/jira/browse/SPARK-38438 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core >Affects Versions: 3.2.1 > Environment: py: 3.9 > spark: 3.2.1 >Reporter: Rafal Wojdyla >Priority: Minor > > Reproduction: > {code:python} > from pyspark.sql import SparkSession > # default session: > s = SparkSession.builder.getOrCreate() > # later on we want to update jars.packages, here's e.g. spark-hats > s = (SparkSession.builder > .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2") > .getOrCreate()) > # line below returns None, the config was not propagated: > s._sc._conf.get("spark.jars.packages") > {code} > Stopping the context doesn't help, in fact it's even more confusing, because > the configuration is updated, but doesn't have an effect: > {code:python} > from pyspark.sql import SparkSession > # default session: > s = SparkSession.builder.getOrCreate() > s.stop() > s = (SparkSession.builder > .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2") > .getOrCreate()) > # now this line returns 'za.co.absa:spark-hats_2.12:0.2.2', but the context > # doesn't download the jar/package, as it would if there was no global context > # thus the extra package is unusable. It's not downloaded, or added to the > # classpath. 
> s._sc._conf.get("spark.jars.packages") > {code} > One workaround is to stop the context AND kill the JVM gateway, which seems > to be a kind of hard reset: > {code:python} > from pyspark import SparkContext > from pyspark.sql import SparkSession > # default session: > s = SparkSession.builder.getOrCreate() > # Hard reset: > s.stop() > s._sc._gateway.shutdown() > s._sc._gateway.proc.stdin.close() > SparkContext._gateway = None > SparkContext._jvm = None > s = (SparkSession.builder > .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2") > .getOrCreate()) > # Now we are guaranteed there's a new spark session, and packages > # are downloaded, added to the classpath etc. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38747) Move the tests for `PARSE_SYNTAX_ERROR` to QueryParsingErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38747: Assignee: Apache Spark > Move the tests for `PARSE_SYNTAX_ERROR` to QueryParsingErrorsSuite > -- > > Key: SPARK-38747 > URL: https://issues.apache.org/jira/browse/SPARK-38747 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Move tests for the error class *PARSE_SYNTAX_ERROR* from ErrorParserSuite to > QueryParsingErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38747) Move the tests for `PARSE_SYNTAX_ERROR` to QueryParsingErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38747: Assignee: (was: Apache Spark) > Move the tests for `PARSE_SYNTAX_ERROR` to QueryParsingErrorsSuite > -- > > Key: SPARK-38747 > URL: https://issues.apache.org/jira/browse/SPARK-38747 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Move tests for the error class *PARSE_SYNTAX_ERROR* from ErrorParserSuite to > QueryParsingErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38747) Move the tests for `PARSE_SYNTAX_ERROR` to QueryParsingErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522169#comment-17522169 ]

Apache Spark commented on SPARK-38747:
--------------------------------------

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36197

> Move the tests for `PARSE_SYNTAX_ERROR` to QueryParsingErrorsSuite
> ------------------------------------------------------------------
>
> Key: SPARK-38747
> URL: https://issues.apache.org/jira/browse/SPARK-38747
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Priority: Major
>
> Move tests for the error class *PARSE_SYNTAX_ERROR* from ErrorParserSuite to
> QueryParsingErrorsSuite.
[jira] [Assigned] (SPARK-38894) Exclude pyspark.cloudpickle in test coverage report
[ https://issues.apache.org/jira/browse/SPARK-38894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-38894:
------------------------------------

Assignee: Hyukjin Kwon

> Exclude pyspark.cloudpickle in test coverage report
> ---------------------------------------------------
>
> Key: SPARK-38894
> URL: https://issues.apache.org/jira/browse/SPARK-38894
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Minor
>
> pyspark.cloudpickle is a verbatim copy of cloudpickle, so we don't need to
> check its test coverage again here.
[jira] [Resolved] (SPARK-38894) Exclude pyspark.cloudpickle in test coverage report
[ https://issues.apache.org/jira/browse/SPARK-38894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-38894.
----------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 36191
[https://github.com/apache/spark/pull/36191]

> Exclude pyspark.cloudpickle in test coverage report
> ---------------------------------------------------
>
> Key: SPARK-38894
> URL: https://issues.apache.org/jira/browse/SPARK-38894
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Minor
> Fix For: 3.4.0
>
> pyspark.cloudpickle is a verbatim copy of cloudpickle, so we don't need to
> check its test coverage again here.
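As background, excluding a vendored package from a coverage.py report is typically done with the `[run] omit` setting. A minimal sketch of such an exclusion (the exact config file name and path pattern used in Spark's repo are assumptions here, not taken from the issue):

```ini
# .coveragerc -- illustrative only; Spark's actual coverage config may differ
[run]
omit =
    # pyspark.cloudpickle is vendored as-is, so skip it in the coverage report
    */pyspark/cloudpickle/*
```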
[jira] [Assigned] (SPARK-38893) Test SourceProgress in PySpark
[ https://issues.apache.org/jira/browse/SPARK-38893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-38893:
------------------------------------

Assignee: Hyukjin Kwon

> Test SourceProgress in PySpark
> ------------------------------
>
> Key: SPARK-38893
> URL: https://issues.apache.org/jira/browse/SPARK-38893
> Project: Spark
> Issue Type: Test
> Components: PySpark, Structured Streaming
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
>
> There was a mistake and we're not testing SourceProgress (see
> https://app.codecov.io/gh/apache/spark/blob/master/python/pyspark/sql/streaming/listener.py)
> We should probably test it.
[jira] [Resolved] (SPARK-38893) Test SourceProgress in PySpark
[ https://issues.apache.org/jira/browse/SPARK-38893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-38893.
----------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 36190
[https://github.com/apache/spark/pull/36190]

> Test SourceProgress in PySpark
> ------------------------------
>
> Key: SPARK-38893
> URL: https://issues.apache.org/jira/browse/SPARK-38893
> Project: Spark
> Issue Type: Test
> Components: PySpark, Structured Streaming
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.4.0
>
> There was a mistake and we're not testing SourceProgress (see
> https://app.codecov.io/gh/apache/spark/blob/master/python/pyspark/sql/streaming/listener.py)
> We should probably test it.
[jira] [Commented] (SPARK-38753) Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522145#comment-17522145 ]

Apache Spark commented on SPARK-38753:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36196

> Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
> ---------------------------------------------------------------------
>
> Key: SPARK-38753
> URL: https://issues.apache.org/jira/browse/SPARK-38753
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Priority: Major
>
> Move tests for the error class *WRITING_JOB_ABORTED* from DataSourceV2Suite
> to QueryExecutionErrorsSuite.
[jira] [Commented] (SPARK-38753) Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522144#comment-17522144 ]

Apache Spark commented on SPARK-38753:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36196

> Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
> ---------------------------------------------------------------------
>
> Key: SPARK-38753
> URL: https://issues.apache.org/jira/browse/SPARK-38753
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Priority: Major
>
> Move tests for the error class *WRITING_JOB_ABORTED* from DataSourceV2Suite
> to QueryExecutionErrorsSuite.
[jira] [Assigned] (SPARK-38753) Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38753:
------------------------------------

Assignee: Apache Spark

> Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
> ---------------------------------------------------------------------
>
> Key: SPARK-38753
> URL: https://issues.apache.org/jira/browse/SPARK-38753
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Apache Spark
> Priority: Major
>
> Move tests for the error class *WRITING_JOB_ABORTED* from DataSourceV2Suite
> to QueryExecutionErrorsSuite.
[jira] [Assigned] (SPARK-38753) Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
[ https://issues.apache.org/jira/browse/SPARK-38753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38753:
------------------------------------

Assignee: (was: Apache Spark)

> Move the tests for `WRITING_JOB_ABORTED` to QueryExecutionErrorsSuite
> ---------------------------------------------------------------------
>
> Key: SPARK-38753
> URL: https://issues.apache.org/jira/browse/SPARK-38753
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Priority: Major
>
> Move tests for the error class *WRITING_JOB_ABORTED* from DataSourceV2Suite
> to QueryExecutionErrorsSuite.
[jira] [Assigned] (SPARK-38896) Use tryWithResource to recycling KVStoreIterator and remove finalize() from LevelDB/RocksDB
[ https://issues.apache.org/jira/browse/SPARK-38896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38896:
------------------------------------

Assignee: Apache Spark

> Use tryWithResource to recycling KVStoreIterator and remove finalize() from
> LevelDB/RocksDB
> ---------------------------------------------------------------------------
>
> Key: SPARK-38896
> URL: https://issues.apache.org/jira/browse/SPARK-38896
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Assignee: Apache Spark
> Priority: Minor
>
> Use `Utils.tryWithResource` to recycle all `KVStoreIterator` instances
> opened by RocksDB/LevelDB and remove the `finalize()` method from
> LevelDB/RocksDB.
[jira] [Commented] (SPARK-38896) Use tryWithResource to recycling KVStoreIterator and remove finalize() from LevelDB/RocksDB
[ https://issues.apache.org/jira/browse/SPARK-38896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522123#comment-17522123 ]

Apache Spark commented on SPARK-38896:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36195

> Use tryWithResource to recycling KVStoreIterator and remove finalize() from
> LevelDB/RocksDB
> ---------------------------------------------------------------------------
>
> Key: SPARK-38896
> URL: https://issues.apache.org/jira/browse/SPARK-38896
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Priority: Minor
>
> Use `Utils.tryWithResource` to recycle all `KVStoreIterator` instances
> opened by RocksDB/LevelDB and remove the `finalize()` method from
> LevelDB/RocksDB.
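For readers unfamiliar with the pattern the issue refers to: `Utils.tryWithResource` is a Scala helper that guarantees a resource is closed whether or not the work on it throws, which makes `finalize()`-based cleanup unnecessary. A minimal sketch of the same idea in Python (all names here are hypothetical stand-ins, not Spark APIs):

```python
class FakeIterator:
    """Stand-in for a KVStoreIterator-like resource that must be closed."""
    def __init__(self, items):
        self._items = list(items)
        self.closed = False

    def __iter__(self):
        return iter(self._items)

    def close(self):
        self.closed = True


def try_with_resource(resource, fn):
    """Apply fn to resource, closing the resource even if fn raises."""
    try:
        return fn(resource)
    finally:
        resource.close()


# Normal path: the resource is closed after use.
it = FakeIterator([1, 2, 3])
total = try_with_resource(it, lambda r: sum(r))
assert total == 6 and it.closed

# Error path: the resource is still closed when fn raises.
bad = FakeIterator([1])
try:
    try_with_resource(bad, lambda r: 1 / 0)
except ZeroDivisionError:
    pass
assert bad.closed
```

The `finally` block is what replaces the nondeterministic `finalize()` hook: cleanup happens deterministically at the call site instead of whenever the garbage collector runs.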