[jira] [Assigned] (SPARK-44960) Unescape and consist error summary across UI pages
[ https://issues.apache.org/jira/browse/SPARK-44960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-44960:
--------------------------------

    Assignee: Kent Yao

> Unescape and consist error summary across UI pages
> --------------------------------------------------
>
>                 Key: SPARK-44960
>                 URL: https://issues.apache.org/jira/browse/SPARK-44960
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>
> We HTML4-escape the error summary on some pages; this is not necessary.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44960) Unescape and consist error summary across UI pages
[ https://issues.apache.org/jira/browse/SPARK-44960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-44960.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 42674
[https://github.com/apache/spark/pull/42674]

> Unescape and consist error summary across UI pages
> --------------------------------------------------
>
>                 Key: SPARK-44960
>                 URL: https://issues.apache.org/jira/browse/SPARK-44960
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 4.0.0
>
> We HTML4-escape the error summary on some pages; this is not necessary.
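[Editor's note] SPARK-44960 removes an unnecessary HTML4-escaping pass on the error summary. The Spark UI code is Scala; as a language-neutral illustration of why an extra escaping pass garbles the summary, here is a sketch using Python's stdlib `html` module (the sample summary string is invented):

```python
import html

# An error summary that already contains markup-significant characters.
summary = 'Job aborted: expected <row count> & got "0"'

# Escaping once is what the rendering layer normally does on output.
escaped_once = html.escape(summary)   # ... &lt;row count&gt; &amp; ...

# Escaping again (server escapes, then the page escapes on render)
# double-encodes the entities, which shows up literally in the UI.
escaped_twice = html.escape(escaped_once)  # ... &amp;lt;row count&amp;gt; ...

# Unescaping the server-side copy once restores the raw text, so a single,
# consistent escape can be applied at render time across all pages.
assert html.unescape(escaped_once) == summary
assert "&amp;lt;" in escaped_twice
```

The takeaway is the usual rule: store the raw string and escape exactly once, at the final rendering boundary.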
[jira] [Updated] (SPARK-44091) Introduce withResourceTypes to `ResourceRequestTestHelper` to restore `resourceTypes` as default value after testing
[ https://issues.apache.org/jira/browse/SPARK-44091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44091: -- Summary: Introduce withResourceTypes to `ResourceRequestTestHelper` to restore `resourceTypes` as default value after testing (was: Yarn module test failed on MacOs/Apple Slicon) > Introduce withResourceTypes to `ResourceRequestTestHelper` to restore > `resourceTypes` as default value after testing > > > Key: SPARK-44091 > URL: https://issues.apache.org/jira/browse/SPARK-44091 > Project: Spark > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > When I run > ``` > build/sbt "yarn/test" -Pyarn > -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest > ``` > The YarnClusterSuite will have some test failures as follows: > ``` > [info] - run Spark in yarn-client mode *** FAILED *** (3 seconds, 123 > milliseconds) > [info] FAILED did not equal FINISHED (stdout/stderr was not captured) > (BaseYarnClusterSuite.scala:238) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.deploy.yarn.BaseYarnClusterSuite.checkResult(BaseYarnClusterSuite.scala:238) > [info] at > org.apache.spark.deploy.yarn.YarnClusterSuite.testBasicYarnApp(YarnClusterSuite.scala:350) > [info] at > org.apache.spark.deploy.yarn.YarnClusterSuite.$anonfun$new$1(YarnClusterSuite.scala:95) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > 
org.apache.spark.deploy.yarn.BaseYarnClusterSuite.$anonfun$test$1(BaseYarnClusterSuite.scala:77) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:221) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:67) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:67) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at >
[jira] [Assigned] (SPARK-44091) Yarn module test failed on macOS/Apple Silicon
[ https://issues.apache.org/jira/browse/SPARK-44091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44091: - Assignee: Yang Jie > Yarn module test failed on MacOs/Apple Slicon > - > > Key: SPARK-44091 > URL: https://issues.apache.org/jira/browse/SPARK-44091 > Project: Spark > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > When I run > ``` > build/sbt "yarn/test" -Pyarn > -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest > ``` > The YarnClusterSuite will have some test failures as follows: > ``` > [info] - run Spark in yarn-client mode *** FAILED *** (3 seconds, 123 > milliseconds) > [info] FAILED did not equal FINISHED (stdout/stderr was not captured) > (BaseYarnClusterSuite.scala:238) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.deploy.yarn.BaseYarnClusterSuite.checkResult(BaseYarnClusterSuite.scala:238) > [info] at > org.apache.spark.deploy.yarn.YarnClusterSuite.testBasicYarnApp(YarnClusterSuite.scala:350) > [info] at > org.apache.spark.deploy.yarn.YarnClusterSuite.$anonfun$new$1(YarnClusterSuite.scala:95) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.deploy.yarn.BaseYarnClusterSuite.$anonfun$test$1(BaseYarnClusterSuite.scala:77) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at 
org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:221) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:67) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:67) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > 
org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:67) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at
[jira] [Resolved] (SPARK-44091) Yarn module test failed on macOS/Apple Silicon
[ https://issues.apache.org/jira/browse/SPARK-44091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44091. --- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 41673 [https://github.com/apache/spark/pull/41673] > Yarn module test failed on MacOs/Apple Slicon > - > > Key: SPARK-44091 > URL: https://issues.apache.org/jira/browse/SPARK-44091 > Project: Spark > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > When I run > ``` > build/sbt "yarn/test" -Pyarn > -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest > ``` > The YarnClusterSuite will have some test failures as follows: > ``` > [info] - run Spark in yarn-client mode *** FAILED *** (3 seconds, 123 > milliseconds) > [info] FAILED did not equal FINISHED (stdout/stderr was not captured) > (BaseYarnClusterSuite.scala:238) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.deploy.yarn.BaseYarnClusterSuite.checkResult(BaseYarnClusterSuite.scala:238) > [info] at > org.apache.spark.deploy.yarn.YarnClusterSuite.testBasicYarnApp(YarnClusterSuite.scala:350) > [info] at > org.apache.spark.deploy.yarn.YarnClusterSuite.$anonfun$new$1(YarnClusterSuite.scala:95) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at > org.apache.spark.deploy.yarn.BaseYarnClusterSuite.$anonfun$test$1(BaseYarnClusterSuite.scala:77) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at 
org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:221) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:67) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:67) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at 
org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:67) > [info] at >
[jira] [Assigned] (SPARK-44945) Automate PySpark error class documentation
[ https://issues.apache.org/jira/browse/SPARK-44945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44945:
------------------------------------

    Assignee: Haejoon Lee

> Automate PySpark error class documentation
> ------------------------------------------
>
>                 Key: SPARK-44945
>                 URL: https://issues.apache.org/jira/browse/SPARK-44945
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, PySpark
>    Affects Versions: 3.5.0
>            Reporter: Haejoon Lee
>            Assignee: Haejoon Lee
>            Priority: Major
>
> We need to automate the process for PySpark error class documentation.
[jira] [Resolved] (SPARK-44945) Automate PySpark error class documentation
[ https://issues.apache.org/jira/browse/SPARK-44945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44945.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 42658
[https://github.com/apache/spark/pull/42658]

> Automate PySpark error class documentation
> ------------------------------------------
>
>                 Key: SPARK-44945
>                 URL: https://issues.apache.org/jira/browse/SPARK-44945
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, PySpark
>    Affects Versions: 3.5.0
>            Reporter: Haejoon Lee
>            Assignee: Haejoon Lee
>            Priority: Major
>             Fix For: 4.0.0
>
> We need to automate the process for PySpark error class documentation.
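[Editor's note] PySpark keeps its error classes in a Python registry (under `pyspark/errors/`); automating the documentation amounts to rendering that registry into doc pages instead of maintaining them by hand. The following is a minimal sketch only, with an invented two-entry registry standing in for the real one, and an invented renderer name:

```python
# Hypothetical miniature of PySpark's error-class registry; the real one
# has more entries and fields.
ERROR_CLASSES = {
    "COLUMN_IN_LIST": {"message": ["`<func_name>` does not allow a Column in a list."]},
    "NOT_A_COLUMN": {"message": ["Argument `<arg_name>` should be a Column."]},
}

def render_error_docs(classes):
    """Emit one documentation section per error class, in sorted order, so
    the docs page is generated from the registry rather than hand-written."""
    lines = []
    for name in sorted(classes):
        lines.append(name)
        lines.append("-" * len(name))          # underline in the doc's style
        lines.extend(classes[name]["message"])
        lines.append("")                       # blank line between sections
    return "\n".join(lines)

print(render_error_docs(ERROR_CLASSES))
```

Generating from the registry guarantees the doc page can never drift out of sync with the error classes the code actually raises.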
[jira] [Resolved] (SPARK-44897) Local Property Propagation to Subquery Broadcast Exec
[ https://issues.apache.org/jira/browse/SPARK-44897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-44897.
---------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 42587
[https://github.com/apache/spark/pull/42587]

> Local Property Propagation to Subquery Broadcast Exec
> -----------------------------------------------------
>
>                 Key: SPARK-44897
>                 URL: https://issues.apache.org/jira/browse/SPARK-44897
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Michael Chen
>            Assignee: Michael Chen
>            Priority: Major
>             Fix For: 3.5.0
>
>
> SPARK-32748 (https://issues.apache.org/jira/browse/SPARK-32748) was opened to address this issue and then, I believe, mistakenly reverted. The claim was that propagating local properties in SubqueryBroadcastExec to the dynamic pruning thread is not necessary because the broadcast threads will propagate them anyway. However, when the dynamic pruning thread is the first to initialize the broadcast relation future, the local properties are not propagated correctly, because the values propagated to the broadcast threads are already wrong at that point.
> I do not have a good way of reproducing this consistently, because the SubqueryBroadcastExec is generally not the first to initialize the broadcast relation future, but after adding a Thread.sleep(1) to the doPrepare method of SubqueryBroadcastExec, the following test always fails.
> {code:java}
> withSQLConf(StaticSQLConf.SUBQUERY_BROADCAST_MAX_THREAD_THRESHOLD.key -> "1") {
>   withTable("a", "b") {
>     val confKey = "spark.sql.y"
>     val confValue1 = UUID.randomUUID().toString()
>     val confValue2 = UUID.randomUUID().toString()
>     Seq((confValue1, "1")).toDF("key", "value")
>       .write
>       .format("parquet")
>       .partitionBy("key")
>       .mode("overwrite")
>       .saveAsTable("a")
>     val df1 = spark.table("a")
>
>     def generateBroadcastDataFrame(confKey: String, confValue: String): Dataset[String] = {
>       val df = spark.range(1).mapPartitions { _ =>
>         Iterator(TaskContext.get.getLocalProperty(confKey))
>       }.filter($"value".contains(confValue)).as("c")
>       df.hint("broadcast")
>     }
>
>     // set local property and assert
>     val df2 = generateBroadcastDataFrame(confKey, confValue1)
>     spark.sparkContext.setLocalProperty(confKey, confValue1)
>     val checkDF = df1.join(df2).where($"a.key" === $"c.value").select($"a.key", $"c.value")
>     val checks = checkDF.collect()
>     assert(checks.forall(_.toSeq == Seq(confValue1, confValue1)))
>
>     // change local property and re-assert
>     Seq((confValue2, "1")).toDF("key", "value")
>       .write
>       .format("parquet")
>       .partitionBy("key")
>       .mode("overwrite")
>       .saveAsTable("b")
>     val df3 = spark.table("b")
>     val df4 = generateBroadcastDataFrame(confKey, confValue2)
>     spark.sparkContext.setLocalProperty(confKey, confValue2)
>     val checks2DF = df3.join(df4).where($"b.key" === $"c.value").select($"b.key", $"c.value")
>     val checks2 = checks2DF.collect()
>     assert(checks2.forall(_.toSeq == Seq(confValue2, confValue2)))
>     assert(checks2.nonEmpty)
>   }
> }
> {code}
[jira] [Assigned] (SPARK-44897) Local Property Propagation to Subquery Broadcast Exec
[ https://issues.apache.org/jira/browse/SPARK-44897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-44897:
-----------------------------------
    Assignee: Michael Chen

> Local Property Propagation to Subquery Broadcast Exec
> -----------------------------------------------------
>
>                 Key: SPARK-44897
>                 URL: https://issues.apache.org/jira/browse/SPARK-44897
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Michael Chen
>            Assignee: Michael Chen
>            Priority: Major
>
> SPARK-32748 (https://issues.apache.org/jira/browse/SPARK-32748) was opened to address this issue and then, I believe, mistakenly reverted. The claim was that propagating local properties in SubqueryBroadcastExec to the dynamic pruning thread is not necessary because the broadcast threads will propagate them anyway. However, when the dynamic pruning thread is the first to initialize the broadcast relation future, the local properties are not propagated correctly, because the values propagated to the broadcast threads are already wrong at that point.
> I do not have a good way of reproducing this consistently, because the SubqueryBroadcastExec is generally not the first to initialize the broadcast relation future, but after adding a Thread.sleep(1) to the doPrepare method of SubqueryBroadcastExec, the following test always fails.
> {code:java}
> withSQLConf(StaticSQLConf.SUBQUERY_BROADCAST_MAX_THREAD_THRESHOLD.key -> "1") {
>   withTable("a", "b") {
>     val confKey = "spark.sql.y"
>     val confValue1 = UUID.randomUUID().toString()
>     val confValue2 = UUID.randomUUID().toString()
>     Seq((confValue1, "1")).toDF("key", "value")
>       .write
>       .format("parquet")
>       .partitionBy("key")
>       .mode("overwrite")
>       .saveAsTable("a")
>     val df1 = spark.table("a")
>
>     def generateBroadcastDataFrame(confKey: String, confValue: String): Dataset[String] = {
>       val df = spark.range(1).mapPartitions { _ =>
>         Iterator(TaskContext.get.getLocalProperty(confKey))
>       }.filter($"value".contains(confValue)).as("c")
>       df.hint("broadcast")
>     }
>
>     // set local property and assert
>     val df2 = generateBroadcastDataFrame(confKey, confValue1)
>     spark.sparkContext.setLocalProperty(confKey, confValue1)
>     val checkDF = df1.join(df2).where($"a.key" === $"c.value").select($"a.key", $"c.value")
>     val checks = checkDF.collect()
>     assert(checks.forall(_.toSeq == Seq(confValue1, confValue1)))
>
>     // change local property and re-assert
>     Seq((confValue2, "1")).toDF("key", "value")
>       .write
>       .format("parquet")
>       .partitionBy("key")
>       .mode("overwrite")
>       .saveAsTable("b")
>     val df3 = spark.table("b")
>     val df4 = generateBroadcastDataFrame(confKey, confValue2)
>     spark.sparkContext.setLocalProperty(confKey, confValue2)
>     val checks2DF = df3.join(df4).where($"b.key" === $"c.value").select($"b.key", $"c.value")
>     val checks2 = checks2DF.collect()
>     assert(checks2.forall(_.toSeq == Seq(confValue2, confValue2)))
>     assert(checks2.nonEmpty)
>   }
> }
> {code}
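[Editor's note] The race in SPARK-44897 comes down to when the submitting thread's local properties are read: a worker thread cannot see the submitter's thread-locals, so the values must be captured eagerly at submission time. A minimal Python model of that distinction follows; Spark's actual mechanism is JVM thread-locals in Scala, and the names here are illustrative only:

```python
import threading

# Minimal model of a "local property": a value owned by the submitting
# thread that worker threads should observe as of submission time.
local_prop = threading.local()

def spawn_and_read(capture_eagerly):
    """Start a worker thread and report which property value it observed."""
    # Eager capture happens on the submitting thread, like SubqueryBroadcastExec
    # snapshotting local properties before handing work to the pruning thread.
    captured = getattr(local_prop, "value", None) if capture_eagerly else None
    seen = {}

    def worker():
        if capture_eagerly:
            seen["value"] = captured  # snapshot taken at submission time
        else:
            # Thread-locals do not cross threads: this reads the worker's
            # own (unset) slot, not the submitter's value.
            seen["value"] = getattr(local_prop, "value", None)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen["value"]

local_prop.value = "spark.sql.y=v1"
assert spawn_and_read(capture_eagerly=True) == "spark.sql.y=v1"
assert spawn_and_read(capture_eagerly=False) is None  # property lost
```

This is why "the broadcast threads will propagate them anyway" fails when the pruning thread runs first: whichever thread initializes the future fixes the snapshot, so the snapshot must be taken on the submitting thread.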
[jira] [Assigned] (SPARK-44963) Make PySpark (pyspark-ml module) tests passing without any optional dependency
[ https://issues.apache.org/jira/browse/SPARK-44963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44963:
------------------------------------

    Assignee: Hyukjin Kwon

> Make PySpark (pyspark-ml module) tests passing without any optional dependency
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-44963
>                 URL: https://issues.apache.org/jira/browse/SPARK-44963
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Tests
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>
> {code}
> ./python/run-tests --python-executables=python3 --modules=pyspark-ml
> ...
> Starting test(python3): pyspark.ml.tests.test_model_cache (temp output: /Users/hyukjin.kwon/workspace/forked/spark/python/target/f6f88c1e-0cb2-43e6-980e-47f1cdb9b463/python3__pyspark.ml.tests.test_model_cache__zij05l1u.log)
> Traceback (most recent call last):
>   File "/Users/hyukjin.kwon/miniconda3/envs/vanilla-3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
>     return _run_code(code, main_globals, None,
>   File "/Users/hyukjin.kwon/miniconda3/envs/vanilla-3.10/lib/python3.10/runpy.py", line 86, in _run_code
>     exec(code, run_globals)
>   File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/ml/tests/test_functions.py", line 18, in <module>
>     import pandas as pd
> ModuleNotFoundError: No module named 'pandas'
> {code}
[jira] [Resolved] (SPARK-44963) Make PySpark (pyspark-ml module) tests passing without any optional dependency
[ https://issues.apache.org/jira/browse/SPARK-44963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44963.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 42678
[https://github.com/apache/spark/pull/42678]

> Make PySpark (pyspark-ml module) tests passing without any optional dependency
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-44963
>                 URL: https://issues.apache.org/jira/browse/SPARK-44963
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Tests
>    Affects Versions: 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 4.0.0
>
> {code}
> ./python/run-tests --python-executables=python3 --modules=pyspark-ml
> ...
> Starting test(python3): pyspark.ml.tests.test_model_cache (temp output: /Users/hyukjin.kwon/workspace/forked/spark/python/target/f6f88c1e-0cb2-43e6-980e-47f1cdb9b463/python3__pyspark.ml.tests.test_model_cache__zij05l1u.log)
> Traceback (most recent call last):
>   File "/Users/hyukjin.kwon/miniconda3/envs/vanilla-3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
>     return _run_code(code, main_globals, None,
>   File "/Users/hyukjin.kwon/miniconda3/envs/vanilla-3.10/lib/python3.10/runpy.py", line 86, in _run_code
>     exec(code, run_globals)
>   File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/ml/tests/test_functions.py", line 18, in <module>
>     import pandas as pd
> ModuleNotFoundError: No module named 'pandas'
> {code}
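[Editor's note] The failure in SPARK-44963 is a top-level `import pandas` that crashes test collection when the optional dependency is absent. The usual cure is to probe for the module and skip, importing only inside guarded tests. A standalone sketch of that guard pattern (the test-class name below is invented, not PySpark's):

```python
import importlib.util
import unittest

# Probe for the optional dependency without importing it; find_spec returns
# None when the module is not installed, so collection never crashes.
have_pandas = importlib.util.find_spec("pandas") is not None

class OptionalDepTests(unittest.TestCase):
    @unittest.skipIf(not have_pandas, "pandas is not installed")
    def test_needs_pandas(self):
        # Safe: this import only runs when pandas is actually available.
        import pandas as pd
        self.assertEqual(len(pd.DataFrame({"a": [1]})), 1)

    def test_runs_everywhere(self):
        # Tests with no optional dependency keep running either way.
        self.assertTrue(True)
```

With this layout the suite reports a skip instead of a ModuleNotFoundError when pandas is missing, which is exactly the behavior the ticket asks for.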
[jira] [Created] (SPARK-44982) Mark Spark Connect configurations as static configuration
Hyukjin Kwon created SPARK-44982:
------------------------------------

             Summary: Mark Spark Connect configurations as static configuration
                 Key: SPARK-44982
                 URL: https://issues.apache.org/jira/browse/SPARK-44982
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

Spark Connect server configurations are not yet marked as either static or runtime. We should mark them static.
[jira] [Created] (SPARK-44981) Filter out static configurations used in local mode
Hyukjin Kwon created SPARK-44981: Summary: Filter out static configurations used in local mode Key: SPARK-44981 URL: https://issues.apache.org/jira/browse/SPARK-44981 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon If you set a static configuration with `--remote local` mode, it shows a bunch of warnings as below: {code} 23/08/28 11:39:42 ERROR ErrorUtils: Spark Connect RPC error during: config. UserId: hyukjin.kwon. SessionId: 424674ef-af95-4b12-b10e-86479413f9fd. org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.connect.copyFromLocalToFs.allowDestLocal. at org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfStaticConfigError(QueryCompilationErrors.scala:3227) at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:162) at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42) at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:67) at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:65) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:65) at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:40) at org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:120) at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:751) at org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) at 
org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346) at org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860) at org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44981) Filter out static configurations used in local mode
[ https://issues.apache.org/jira/browse/SPARK-44981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-44981: - Priority: Minor (was: Major) > Filter out static configurations used in local mode > --- > > Key: SPARK-44981 > URL: https://issues.apache.org/jira/browse/SPARK-44981 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > If you set a static configuration with `--remote local` mode, it shows a > bunch of warnings as below: > {code} > 23/08/28 11:39:42 ERROR ErrorUtils: Spark Connect RPC error during: config. > UserId: hyukjin.kwon. SessionId: 424674ef-af95-4b12-b10e-86479413f9fd. > org.apache.spark.sql.AnalysisException: Cannot modify the value of a static > config: spark.connect.copyFromLocalToFs.allowDestLocal. > at > org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfStaticConfigError(QueryCompilationErrors.scala:3227) > at > org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:162) > at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42) > at > org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:67) > at > org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:65) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at > org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:65) > at > org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:40) > at > org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:120) > at > 
org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:751) > at > org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) > at > org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346) > at > org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860) > at > org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
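The direction the ticket title suggests can be sketched as client-side filtering: in `--remote local` mode, drop keys known to be static before issuing `set` requests, instead of letting each one fail server-side with `Cannot modify the value of a static config`. The key set below is illustrative only, not Spark's actual static-configuration registry, and the helper name is mine:

```python
# Hypothetical sketch: split a config dict into settable entries and
# static entries to skip. The static key set here is an assumption for
# illustration; Spark tracks static configs internally.
STATIC_CONFIG_KEYS = {
    "spark.connect.copyFromLocalToFs.allowDestLocal",
    "spark.sql.warehouse.dir",
}

def filter_static_configs(conf: dict) -> tuple[dict, list]:
    """Return (settable, skipped_static_keys)."""
    settable = {k: v for k, v in conf.items() if k not in STATIC_CONFIG_KEYS}
    skipped = sorted(k for k in conf if k in STATIC_CONFIG_KEYS)
    return settable, skipped
```

Filtering up front would silence the per-key AnalysisException shown in the stack trace above while still applying the runtime-settable configs.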
[jira] [Updated] (SPARK-44980) createDataFrame should respect the names namedtuples properly
[ https://issues.apache.org/jira/browse/SPARK-44980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-44980: - Description: {code} from collections import namedtuple MyTuple = namedtuple("MyTuple", ["zz", "b", "a"]) class MyInheritedTuple(MyTuple): pass df = spark.createDataFrame([MyInheritedTuple(1, 2, 3), MyInheritedTuple(11, 22, 33)]) df.collect() {code} {code} [Row(zz=None, b=None, a=None), Row(zz=None, b=None, a=None)] {code} should be {code} [Row(zz=1, b=2, a=3), Row(zz=11, b=22, a=33)] {code} was: {code} from collections import namedtuple MyTuple = namedtuple("MyTuple", ["zz", "b", "a"]) df = spark.createDataFrame([MyTuple(1, 2, 3), MyTuple(11, 22, 33)], "a: long, b: long, zz: long") df.show() {code} {code} +---+---+---+ | a| b| zz| +---+---+---+ | 1| 2| 3| | 11| 22| 33| +---+---+---+ {code} should be {code} +---+---+---+ | a| b| zz| +---+---+---+ | 3| 2| 1| | 33| 22| 11| +---+---+---+ {code} > createDataFrame should respect the names namedtuples properly > - > > Key: SPARK-44980 > URL: https://issues.apache.org/jira/browse/SPARK-44980 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > from collections import namedtuple > MyTuple = namedtuple("MyTuple", ["zz", "b", "a"]) > class MyInheritedTuple(MyTuple): > pass > df = spark.createDataFrame([MyInheritedTuple(1, 2, 3), MyInheritedTuple(11, > 22, 33)]) > df.collect() > {code} > {code} > [Row(zz=None, b=None, a=None), Row(zz=None, b=None, a=None)] > {code} > should be > {code} > [Row(zz=1, b=2, a=3), Row(zz=11, b=22, a=33)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44980) Fix inherited namedtuples to work in createDataFrame
[ https://issues.apache.org/jira/browse/SPARK-44980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-44980: - Summary: Fix inherited namedtuples to work in createDataFrame (was: createDataFrame should respect the names namedtuples properly) > Fix inherited namedtuples to work in createDataFrame > > > Key: SPARK-44980 > URL: https://issues.apache.org/jira/browse/SPARK-44980 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > from collections import namedtuple > MyTuple = namedtuple("MyTuple", ["zz", "b", "a"]) > class MyInheritedTuple(MyTuple): > pass > df = spark.createDataFrame([MyInheritedTuple(1, 2, 3), MyInheritedTuple(11, > 22, 33)]) > df.collect() > {code} > {code} > [Row(zz=None, b=None, a=None), Row(zz=None, b=None, a=None)] > {code} > should be > {code} > [Row(zz=1, b=2, a=3), Row(zz=11, b=22, a=33)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44980) createDataFrame should respect the names namedtuples properly
Hyukjin Kwon created SPARK-44980: Summary: createDataFrame should respect the names namedtuples properly Key: SPARK-44980 URL: https://issues.apache.org/jira/browse/SPARK-44980 Project: Spark Issue Type: Bug Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Hyukjin Kwon {code} from collections import namedtuple MyTuple = namedtuple("MyTuple", ["zz", "b", "a"]) df = spark.createDataFrame([MyTuple(1, 2, 3), MyTuple(11, 22, 33)], "a: long, b: long, zz: long") df.show() {code} {code} +---+---+---+ | a| b| zz| +---+---+---+ | 1| 2| 3| | 11| 22| 33| +---+---+---+ {code} should be {code} +---+---+---+ | a| b| zz| +---+---+---+ | 3| 2| 1| | 33| 22| 11| +---+---+---+ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
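The failure mode is reproducible without Spark: a subclass of a namedtuple is still a tuple and still carries `_fields`, so conversion logic that matches the exact namedtuple pattern (rather than duck-typing on `_fields`) silently falls through. A minimal sketch of a robust check (the helper name is mine, not PySpark's):

```python
from collections import namedtuple

MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])

class MyInheritedTuple(MyTuple):
    """Subclassing keeps tuple behavior and the _fields attribute."""
    pass

def to_row_dict(obj):
    # Detect namedtuple-like objects via _fields, which survives
    # inheritance, instead of matching the exact class.
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):
        return dict(zip(obj._fields, obj))
    raise TypeError(f"not a namedtuple-like object: {type(obj).__name__}")
```

With this check, `MyInheritedTuple(1, 2, 3)` maps to `{'zz': 1, 'b': 2, 'a': 3}`, matching the expected `Row(zz=1, b=2, a=3)` rather than all-`None` fields.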
[jira] [Commented] (SPARK-44131) Add call_function for Scala API
[ https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759408#comment-17759408 ] Xiao Li commented on SPARK-44131: - [https://github.com/apache/spark/pull/41950] reverted the deprecation. > Add call_function for Scala API > --- > > Key: SPARK-44131 > URL: https://issues.apache.org/jira/browse/SPARK-44131 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > The Scala SQL API has a method call_udf that is used to call user-defined > functions. > In fact, call_udf can also call built-in functions. > This behavior is confusing for users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44131) Add call_function for Scala API
[ https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-44131: Summary: Add call_function for Scala API (was: Add call_function and deprecate call_udf for Scala API) > Add call_function for Scala API > --- > > Key: SPARK-44131 > URL: https://issues.apache.org/jira/browse/SPARK-44131 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > The Scala SQL API has a method call_udf that is used to call user-defined > functions. > In fact, call_udf can also call built-in functions. > This behavior is confusing for users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44978) Fix SQLQueryTestSuite unable create table normally
[ https://issues.apache.org/jira/browse/SPARK-44978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44978. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42691 [https://github.com/apache/spark/pull/42691] > Fix SQLQueryTestSuite unable create table normally > -- > > Key: SPARK-44978 > URL: https://issues.apache.org/jira/browse/SPARK-44978 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Fix For: 4.0.0 > > Attachments: image-2023-08-27-14-25-21-843.png > > > When we repeatedly execute SQLQueryTestSuite to generate the golden file, the > warehouse file executed last time is not cleaned up (maybe killed when test > not finish), resulting in an error result > !image-2023-08-27-14-25-21-843.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44978) Fix SQLQueryTestSuite unable create table normally
[ https://issues.apache.org/jira/browse/SPARK-44978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44978: Assignee: Jia Fan > Fix SQLQueryTestSuite unable create table normally > -- > > Key: SPARK-44978 > URL: https://issues.apache.org/jira/browse/SPARK-44978 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Attachments: image-2023-08-27-14-25-21-843.png > > > When we repeatedly execute SQLQueryTestSuite to generate the golden file, the > warehouse file executed last time is not cleaned up (maybe killed when test > not finish), resulting in an error result > !image-2023-08-27-14-25-21-843.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44264) DeepSpeed Distributor
[ https://issues.apache.org/jira/browse/SPARK-44264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-44264: Summary: DeepSpeed Distributor (was: DeepSpeed Distrobutor) > DeepSpeed Distributor > - > > Key: SPARK-44264 > URL: https://issues.apache.org/jira/browse/SPARK-44264 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 3.4.1 >Reporter: Lu Wang >Priority: Critical > Fix For: 3.5.0 > > Attachments: Trying to Run Deepspeed Funcs.html > > > To make it easier for Pyspark users to run distributed training and inference > with DeepSpeed on spark clusters using PySpark. This was a project determined > by the Databricks ML Training Team. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44979) Cache results of simple udfs on executors if same arguments are passed.
[ https://issues.apache.org/jira/browse/SPARK-44979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Dharme updated SPARK-44979: -- Description: Consider two dataframes : {{keyword_given = [ ["green pstr",], ["greenpstr",], ["wlmrt", ], ["walmart",], ["walmart super",] ]}} {{variations = [ ("type green pstr", "ABC", 100), ("type green pstr","PQR",200), ("type green pstr", "NZSD", 2999), ("wlmrt payment","walmart",200), ("wlmrt solutions", "walmart", 200), ("nppssdwlmrt", "walmart", 2000) ]}} {{Imagine I have a task to do fuzzy substring matching between keyword and variation[0] using the in-built Levenshtein function. It is possible to optimize this further in the code itself where we extract out the uniques and then do fuzzy matching on the uniques and join back with the original tables. }} {{But it could be possible as an optimization to cache the results of the already computed udfs till now and do a lookup on each executor separately.}} Just a thought. Not sure if it makes any sense. This behaviour could be behind a config. was: Consider two dataframes : {{keyword_given = [ ["green pstr",], ["greenpstr",], ["wlmrt", ], ["walmart",], ["walmart super",] ]}} {{variations = [ ("type green pstr", "ABC", 100), ("type green pstr","PQR",200), ("type green pstr", "NZSD", 2999), ("wlmrt payment","walmart",200), ("wlmrt solutions", "walmart", 200), ("nppssdwlmrt", "walmart", 2000) ]}} {{Imagine I have a task to do fuzzy substring matching between keyword and variation[0] using the in-built Levenshtein function. It is possible to optimize this further in the code itself where we extract out the uniques and then do fuzzy matching on the uniques and join back with the original table. }} {{But it could be possible as an optimization to cache the results of the already computed udfs till now and do a lookup on each executor separately.}} Just a thought. Not sure if it makes any sense. This behaviour could be behind a config. 
> Cache results of simple udfs on executors if same arguments are passed. > --- > > Key: SPARK-44979 > URL: https://issues.apache.org/jira/browse/SPARK-44979 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Dinesh Dharme >Priority: Minor > > Consider two dataframes : > {{keyword_given = [ > ["green pstr",], > ["greenpstr",], > ["wlmrt", ], > ["walmart",], > ["walmart super",] > ]}} > {{variations = [ > ("type green pstr", "ABC", 100), > ("type green pstr","PQR",200), > ("type green pstr", "NZSD", 2999), > ("wlmrt payment","walmart",200), > ("wlmrt solutions", "walmart", 200), > ("nppssdwlmrt", "walmart", 2000) > ]}} > {{Imagine I have a task to do fuzzy substring matching between keyword and > variation[0] using the in-built Levenshtein function. It is possible to optimize > this further in the code itself where we extract out the uniques and then do > fuzzy matching on the uniques and join back with the original tables. }} > {{But it could be possible as an optimization to cache the results of the > already computed udfs till now and do a lookup on each executor separately.}} > Just a thought. Not sure if it makes any sense. This behaviour could be > behind a config. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44979) Cache results of simple udfs on executors if same arguments are passed.
Dinesh Dharme created SPARK-44979: - Summary: Cache results of simple udfs on executors if same arguments are passed. Key: SPARK-44979 URL: https://issues.apache.org/jira/browse/SPARK-44979 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.1 Reporter: Dinesh Dharme Consider two dataframes : {{keyword_given = [ ["green pstr",], ["greenpstr",], ["wlmrt", ], ["walmart",], ["walmart super",] ]}} {{variations = [ ("type green pstr", "ABC", 100), ("type green pstr","PQR",200), ("type green pstr", "NZSD", 2999), ("wlmrt payment","walmart",200), ("wlmrt solutions", "walmart", 200), ("nppssdwlmrt", "walmart", 2000) ]}} {{Imagine I have a task to do fuzzy substring matching between keyword and variation[0] using the in-built Levenshtein function. It is possible to optimize this further in the code itself where we extract out the uniques and then do fuzzy matching on the uniques and join back with the original table. }} {{But it could be possible as an optimization to cache the results of the already computed udfs till now and do a lookup on each executor separately.}} Just a thought. Not sure if it makes any sense. This behaviour could be behind a config. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
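The proposal amounts to per-process memoization of a deterministic function. Outside Spark, the same idea can be sketched with `functools.lru_cache` wrapped around a Levenshtein distance (the distance implementation below is a standard textbook dynamic-programming version, not Spark's built-in `levenshtein`):

```python
from functools import lru_cache

# Sketch of the proposed optimization: a deterministic function of
# hashable arguments caches its results in-process, which is roughly
# what per-executor UDF result caching would do for repeated inputs.
@lru_cache(maxsize=10_000)
def edit_distance(a: str, b: str) -> int:
    # Row-by-row Levenshtein dynamic program.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]
```

On skewed data with many repeated (keyword, variation) pairs, the cache turns repeat invocations into dictionary lookups; `edit_distance.cache_info()` reports the hit rate.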
[jira] [Updated] (SPARK-44978) Fix SQLQueryTestSuite unable create table normally
[ https://issues.apache.org/jira/browse/SPARK-44978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan updated SPARK-44978: Description: When we repeatedly execute SQLQueryTestSuite to generate the golden file, the warehouse file executed last time is not cleaned up (maybe killed when test not finish), resulting in an error result !image-2023-08-27-14-25-21-843.png! was:When we repeatedly execute SQLQueryTestSuite to generate the golden file, the warehouse file executed last time is not cleaned up (maybe killed when test not finish), resulting in an error result !image-2023-08-27-14-22-43-361.png! > Fix SQLQueryTestSuite unable create table normally > -- > > Key: SPARK-44978 > URL: https://issues.apache.org/jira/browse/SPARK-44978 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-08-27-14-25-21-843.png > > > When we repeatedly execute SQLQueryTestSuite to generate the golden file, the > warehouse file executed last time is not cleaned up (maybe killed when test > not finish), resulting in an error result > !image-2023-08-27-14-25-21-843.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44978) Fix SQLQueryTestSuite unable create table normally
[ https://issues.apache.org/jira/browse/SPARK-44978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan updated SPARK-44978: Attachment: image-2023-08-27-14-25-21-843.png > Fix SQLQueryTestSuite unable create table normally > -- > > Key: SPARK-44978 > URL: https://issues.apache.org/jira/browse/SPARK-44978 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-08-27-14-25-21-843.png > > > When we repeatedly execute SQLQueryTestSuite to generate the golden file, the > warehouse file executed last time is not cleaned up (maybe killed when test > not finish), resulting in an error result !image-2023-08-27-14-22-43-361.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44978) Fix SQLQueryTestSuite unable create table normally
Jia Fan created SPARK-44978: --- Summary: Fix SQLQueryTestSuite unable create table normally Key: SPARK-44978 URL: https://issues.apache.org/jira/browse/SPARK-44978 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Jia Fan When we repeatedly execute SQLQueryTestSuite to generate the golden file, the warehouse file executed last time is not cleaned up (maybe killed when test not finish), resulting in an error result !image-2023-08-27-14-22-43-361.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org