[jira] [Assigned] (SPARK-44960) Unescape and make the error summary consistent across UI pages

2023-08-27 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-44960:


Assignee: Kent Yao

> Unescape and make the error summary consistent across UI pages
> --
>
> Key: SPARK-44960
> URL: https://issues.apache.org/jira/browse/SPARK-44960
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> We escape HTML4 entities in the error summary on some pages; it's not
> necessary, and it leaves the summary inconsistent across pages.
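For orientation, a minimal sketch of the unescaping in question (illustrative
only: Spark's UI code is Scala; Python's stdlib stands in here for brevity, and
the sample string is made up):

{code:python}
import html

# A hypothetical error summary as it renders when left HTML4-escaped:
escaped = "java.lang.IllegalArgumentException: &lt;spill&gt; failed &amp; retried"
print(html.unescape(escaped))
# java.lang.IllegalArgumentException: <spill> failed & retried
{code}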






[jira] [Resolved] (SPARK-44960) Unescape and make the error summary consistent across UI pages

2023-08-27 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-44960.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42674
[https://github.com/apache/spark/pull/42674]

> Unescape and make the error summary consistent across UI pages
> --
>
> Key: SPARK-44960
> URL: https://issues.apache.org/jira/browse/SPARK-44960
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 4.0.0
>
>
> We escape HTML4 entities in the error summary on some pages; it's not
> necessary, and it leaves the summary inconsistent across pages.






[jira] [Updated] (SPARK-44091) Introduce withResourceTypes to `ResourceRequestTestHelper` to restore `resourceTypes` to its default value after testing

2023-08-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44091:
--
Summary: Introduce withResourceTypes to `ResourceRequestTestHelper` to 
restore `resourceTypes` to its default value after testing  (was: Yarn module 
test failed on macOS/Apple Silicon)
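For illustration, the shape of such a "withX" loan-pattern helper, sketched as
a Python context manager (the real helper is Scala; `_resource_types` below is
a stand-in for the global registry being restored, not Spark's actual API):

{code:python}
from contextlib import contextmanager

_resource_types = []  # stand-in for the global resource-type registry

@contextmanager
def with_resource_types(types):
    """Install custom resource types for a test, then restore the default."""
    global _resource_types
    saved = _resource_types
    _resource_types = list(types)
    try:
        yield
    finally:
        _resource_types = saved  # restored even when the test fails
{code}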

> Introduce withResourceTypes to `ResourceRequestTestHelper` to restore 
> `resourceTypes` to its default value after testing
> 
>
> Key: SPARK-44091
> URL: https://issues.apache.org/jira/browse/SPARK-44091
> Project: Spark
>  Issue Type: Bug
>  Components: Tests, YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> When I run 
> ```
> build/sbt "yarn/test" -Pyarn 
> -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
> ```
> the YarnClusterSuite fails with test failures like the following:
> ```
> [info] - run Spark in yarn-client mode *** FAILED *** (3 seconds, 123 
> milliseconds)
> [info]   FAILED did not equal FINISHED (stdout/stderr was not captured) 
> (BaseYarnClusterSuite.scala:238)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> [info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> [info]   at 
> org.apache.spark.deploy.yarn.BaseYarnClusterSuite.checkResult(BaseYarnClusterSuite.scala:238)
> [info]   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.testBasicYarnApp(YarnClusterSuite.scala:350)
> [info]   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.$anonfun$new$1(YarnClusterSuite.scala:95)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at 
> org.apache.spark.deploy.yarn.BaseYarnClusterSuite.$anonfun$test$1(BaseYarnClusterSuite.scala:77)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:221)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:67)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:67)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> [info]   at 
> 

[jira] [Assigned] (SPARK-44091) Yarn module test failed on macOS/Apple Silicon

2023-08-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44091:
-

Assignee: Yang Jie

> Yarn module test failed on macOS/Apple Silicon
> -
>
> Key: SPARK-44091
> URL: https://issues.apache.org/jira/browse/SPARK-44091
> Project: Spark
>  Issue Type: Bug
>  Components: Tests, YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> When I run 
> ```
> build/sbt "yarn/test" -Pyarn 
> -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
> ```
> the YarnClusterSuite fails with test failures like the following:
> ```
> [info] - run Spark in yarn-client mode *** FAILED *** (3 seconds, 123 
> milliseconds)
> [info]   FAILED did not equal FINISHED (stdout/stderr was not captured) 
> (BaseYarnClusterSuite.scala:238)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> [info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> [info]   at 
> org.apache.spark.deploy.yarn.BaseYarnClusterSuite.checkResult(BaseYarnClusterSuite.scala:238)
> [info]   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.testBasicYarnApp(YarnClusterSuite.scala:350)
> [info]   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.$anonfun$new$1(YarnClusterSuite.scala:95)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at 
> org.apache.spark.deploy.yarn.BaseYarnClusterSuite.$anonfun$test$1(BaseYarnClusterSuite.scala:77)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:221)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:67)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:67)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:67)
> [info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info]   at 

[jira] [Resolved] (SPARK-44091) Yarn module test failed on macOS/Apple Silicon

2023-08-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44091.
---
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 41673
[https://github.com/apache/spark/pull/41673]

> Yarn module test failed on macOS/Apple Silicon
> -
>
> Key: SPARK-44091
> URL: https://issues.apache.org/jira/browse/SPARK-44091
> Project: Spark
>  Issue Type: Bug
>  Components: Tests, YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> When I run 
> ```
> build/sbt "yarn/test" -Pyarn 
> -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
> ```
> the YarnClusterSuite fails with test failures like the following:
> ```
> [info] - run Spark in yarn-client mode *** FAILED *** (3 seconds, 123 
> milliseconds)
> [info]   FAILED did not equal FINISHED (stdout/stderr was not captured) 
> (BaseYarnClusterSuite.scala:238)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> [info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> [info]   at 
> org.apache.spark.deploy.yarn.BaseYarnClusterSuite.checkResult(BaseYarnClusterSuite.scala:238)
> [info]   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.testBasicYarnApp(YarnClusterSuite.scala:350)
> [info]   at 
> org.apache.spark.deploy.yarn.YarnClusterSuite.$anonfun$new$1(YarnClusterSuite.scala:95)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at 
> org.apache.spark.deploy.yarn.BaseYarnClusterSuite.$anonfun$test$1(BaseYarnClusterSuite.scala:77)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:221)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:67)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:67)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:67)
> [info]   at 
> 

[jira] [Assigned] (SPARK-44945) Automate PySpark error class documentation

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44945:


Assignee: Haejoon Lee

> Automate PySpark error class documentation
> --
>
> Key: SPARK-44945
> URL: https://issues.apache.org/jira/browse/SPARK-44945
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We need to automate the generation of the PySpark error class documentation.
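A hedged sketch of what such automation might look like (the registry file
name and schema below are assumptions for illustration, not PySpark's actual
layout):

{code:python}
import json

# Assumed machine-readable registry mapping error class -> metadata.
with open("error-classes.json") as f:
    classes = json.load(f)

# Emit one documentation entry per error class, sorted for stable output.
for name, info in sorted(classes.items()):
    message = " ".join(info.get("message", []))
    print(f"{name}\n    {message}\n")
{code}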






[jira] [Resolved] (SPARK-44945) Automate PySpark error class documentation

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44945.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42658
[https://github.com/apache/spark/pull/42658]

> Automate PySpark error class documentation
> --
>
> Key: SPARK-44945
> URL: https://issues.apache.org/jira/browse/SPARK-44945
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> We need to automate the generation of the PySpark error class documentation.






[jira] [Resolved] (SPARK-44897) Local Property Propagation to Subquery Broadcast Exec

2023-08-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44897.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 42587
[https://github.com/apache/spark/pull/42587]

> Local Property Propagation to Subquery Broadcast Exec
> -
>
> Key: SPARK-44897
> URL: https://issues.apache.org/jira/browse/SPARK-44897
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Michael Chen
>Assignee: Michael Chen
>Priority: Major
> Fix For: 3.5.0
>
>
> https://issues.apache.org/jira/browse/SPARK-32748 was opened, and then, I 
> believe, mistakenly reverted; this issue addresses that. The claim was that 
> propagating local properties from SubqueryBroadcastExec to the dynamic 
> pruning thread is unnecessary because the broadcast threads will propagate 
> them anyway. However, when the dynamic pruning thread is the first to 
> initialize the broadcast relation future, the local properties are not 
> propagated correctly, because the properties propagated to the broadcast 
> threads are already incorrect.
> I do not have a good way of reproducing this consistently, because 
> SubqueryBroadcastExec is generally not the first to initialize the broadcast 
> relation future; but after adding a Thread.sleep(1) to the doPrepare method 
> of SubqueryBroadcastExec, the following test always fails.
> {code:java}
> withSQLConf(StaticSQLConf.SUBQUERY_BROADCAST_MAX_THREAD_THRESHOLD.key -> "1") 
> {
>   withTable("a", "b") {
> val confKey = "spark.sql.y"
> val confValue1 = UUID.randomUUID().toString()
> val confValue2 = UUID.randomUUID().toString()
> Seq((confValue1, "1")).toDF("key", "value")
>   .write
>   .format("parquet")
>   .partitionBy("key")
>   .mode("overwrite")
>   .saveAsTable("a")
> val df1 = spark.table("a")
> def generateBroadcastDataFrame(confKey: String, confValue: String): 
> Dataset[String] = {
>   val df = spark.range(1).mapPartitions { _ =>
> Iterator(TaskContext.get.getLocalProperty(confKey))
>   }.filter($"value".contains(confValue)).as("c")
>   df.hint("broadcast")
> }
> // set local property and assert
> val df2 = generateBroadcastDataFrame(confKey, confValue1)
> spark.sparkContext.setLocalProperty(confKey, confValue1)
> val checkDF = df1.join(df2).where($"a.key" === 
> $"c.value").select($"a.key", $"c.value")
> val checks = checkDF.collect()
> assert(checks.forall(_.toSeq == Seq(confValue1, confValue1)))
> // change local property and re-assert
> Seq((confValue2, "1")).toDF("key", "value")
>   .write
>   .format("parquet")
>   .partitionBy("key")
>   .mode("overwrite")
>   .saveAsTable("b")
> val df3 = spark.table("b")
> val df4 = generateBroadcastDataFrame(confKey, confValue2)
> spark.sparkContext.setLocalProperty(confKey, confValue2)
> val checks2DF = df3.join(df4).where($"b.key" === 
> $"c.value").select($"b.key", $"c.value")
> val checks2 = checks2DF.collect()
> assert(checks2.forall(_.toSeq == Seq(confValue2, confValue2)))
> assert(checks2.nonEmpty)
>   }
> } {code}
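For orientation (the test above is Scala), the PySpark analogue of the
mechanism the test exercises: local properties set on the driver are readable
from tasks via TaskContext. A minimal sketch, assuming an active SparkContext
`sc`:

{code:python}
from pyspark import TaskContext

sc.setLocalProperty("spark.sql.y", "value-1")
vals = sc.parallelize([0], numSlices=1).map(
    lambda _: TaskContext.get().getLocalProperty("spark.sql.y")
).collect()
print(vals)  # ['value-1'] when propagation works correctly
{code}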






[jira] [Assigned] (SPARK-44897) Local Property Propagation to Subquery Broadcast Exec

2023-08-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-44897:
---

Assignee: Michael Chen

> Local Property Propagation to Subquery Broadcast Exec
> -
>
> Key: SPARK-44897
> URL: https://issues.apache.org/jira/browse/SPARK-44897
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Michael Chen
>Assignee: Michael Chen
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-32748 was opened, and then, I 
> believe, mistakenly reverted; this issue addresses that. The claim was that 
> propagating local properties from SubqueryBroadcastExec to the dynamic 
> pruning thread is unnecessary because the broadcast threads will propagate 
> them anyway. However, when the dynamic pruning thread is the first to 
> initialize the broadcast relation future, the local properties are not 
> propagated correctly, because the properties propagated to the broadcast 
> threads are already incorrect.
> I do not have a good way of reproducing this consistently, because 
> SubqueryBroadcastExec is generally not the first to initialize the broadcast 
> relation future; but after adding a Thread.sleep(1) to the doPrepare method 
> of SubqueryBroadcastExec, the following test always fails.
> {code:java}
> withSQLConf(StaticSQLConf.SUBQUERY_BROADCAST_MAX_THREAD_THRESHOLD.key -> "1") 
> {
>   withTable("a", "b") {
> val confKey = "spark.sql.y"
> val confValue1 = UUID.randomUUID().toString()
> val confValue2 = UUID.randomUUID().toString()
> Seq((confValue1, "1")).toDF("key", "value")
>   .write
>   .format("parquet")
>   .partitionBy("key")
>   .mode("overwrite")
>   .saveAsTable("a")
> val df1 = spark.table("a")
> def generateBroadcastDataFrame(confKey: String, confValue: String): 
> Dataset[String] = {
>   val df = spark.range(1).mapPartitions { _ =>
> Iterator(TaskContext.get.getLocalProperty(confKey))
>   }.filter($"value".contains(confValue)).as("c")
>   df.hint("broadcast")
> }
> // set local property and assert
> val df2 = generateBroadcastDataFrame(confKey, confValue1)
> spark.sparkContext.setLocalProperty(confKey, confValue1)
> val checkDF = df1.join(df2).where($"a.key" === 
> $"c.value").select($"a.key", $"c.value")
> val checks = checkDF.collect()
> assert(checks.forall(_.toSeq == Seq(confValue1, confValue1)))
> // change local property and re-assert
> Seq((confValue2, "1")).toDF("key", "value")
>   .write
>   .format("parquet")
>   .partitionBy("key")
>   .mode("overwrite")
>   .saveAsTable("b")
> val df3 = spark.table("b")
> val df4 = generateBroadcastDataFrame(confKey, confValue2)
> spark.sparkContext.setLocalProperty(confKey, confValue2)
> val checks2DF = df3.join(df4).where($"b.key" === 
> $"c.value").select($"b.key", $"c.value")
> val checks2 = checks2DF.collect()
> assert(checks2.forall(_.toSeq == Seq(confValue2, confValue2)))
> assert(checks2.nonEmpty)
>   }
> } {code}






[jira] [Assigned] (SPARK-44963) Make PySpark (pyspark-ml module) tests pass without any optional dependency

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44963:


Assignee: Hyukjin Kwon

> Make PySpark (pyspark-ml module) tests pass without any optional dependency
> --
>
> Key: SPARK-44963
> URL: https://issues.apache.org/jira/browse/SPARK-44963
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> {code}
> ./python/run-tests --python-executables=python3 --modules=pyspark-ml
> ...
> Starting test(python3): pyspark.ml.tests.test_model_cache (temp output: 
> /Users/hyukjin.kwon/workspace/forked/spark/python/target/f6f88c1e-0cb2-43e6-980e-47f1cdb9b463/python3__pyspark.ml.tests.test_model_cache__zij05l1u.log)
> Traceback (most recent call last):
>   File 
> "/Users/hyukjin.kwon/miniconda3/envs/vanilla-3.10/lib/python3.10/runpy.py", 
> line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File 
> "/Users/hyukjin.kwon/miniconda3/envs/vanilla-3.10/lib/python3.10/runpy.py", 
> line 86, in _run_code
> exec(code, run_globals)
>   File 
> "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/ml/tests/test_functions.py",
>  line 18, in 
> import pandas as pd
> ModuleNotFoundError: No module named 'pandas'
> {code}
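A minimal sketch of the usual remedy, assuming the standard unittest skip
pattern (names below are illustrative, not PySpark's actual test helpers):

{code:python}
import unittest

try:
    import pandas  # noqa: F401
    have_pandas = True
except ImportError:
    have_pandas = False

@unittest.skipIf(not have_pandas, "pandas is not installed")
class FunctionsTests(unittest.TestCase):
    def test_needs_pandas(self):
        import pandas as pd
        self.assertEqual(len(pd.DataFrame({"a": [1, 2]})), 2)
{code}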






[jira] [Resolved] (SPARK-44963) Make PySpark (pyspark-ml module) tests pass without any optional dependency

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44963.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42678
[https://github.com/apache/spark/pull/42678]

> Make PySpark (pyspark-ml module) tests pass without any optional dependency
> --
>
> Key: SPARK-44963
> URL: https://issues.apache.org/jira/browse/SPARK-44963
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 4.0.0
>
>
> {code}
> ./python/run-tests --python-executables=python3 --modules=pyspark-ml
> ...
> Starting test(python3): pyspark.ml.tests.test_model_cache (temp output: 
> /Users/hyukjin.kwon/workspace/forked/spark/python/target/f6f88c1e-0cb2-43e6-980e-47f1cdb9b463/python3__pyspark.ml.tests.test_model_cache__zij05l1u.log)
> Traceback (most recent call last):
>   File 
> "/Users/hyukjin.kwon/miniconda3/envs/vanilla-3.10/lib/python3.10/runpy.py", 
> line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File 
> "/Users/hyukjin.kwon/miniconda3/envs/vanilla-3.10/lib/python3.10/runpy.py", 
> line 86, in _run_code
> exec(code, run_globals)
>   File 
> "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/ml/tests/test_functions.py",
>  line 18, in 
> import pandas as pd
> ModuleNotFoundError: No module named 'pandas'
> {code}






[jira] [Created] (SPARK-44982) Mark Spark Connect configurations as static configurations

2023-08-27 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-44982:


 Summary: Mark Spark Connect configurations as static configurations
 Key: SPARK-44982
 URL: https://issues.apache.org/jira/browse/SPARK-44982
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Spark Connect server configurations are not yet marked as either static or 
runtime. We should mark them as static.






[jira] [Created] (SPARK-44981) Filter out static configurations used in local mode

2023-08-27 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-44981:


 Summary: Filter out static configurations used in local mode
 Key: SPARK-44981
 URL: https://issues.apache.org/jira/browse/SPARK-44981
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


If you set a static configuration in `--remote local` mode, the server logs 
errors like the following:

{code}
23/08/28 11:39:42 ERROR ErrorUtils: Spark Connect RPC error during: config. 
UserId: hyukjin.kwon. SessionId: 424674ef-af95-4b12-b10e-86479413f9fd.
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static 
config: spark.connect.copyFromLocalToFs.allowDestLocal.
at 
org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfStaticConfigError(QueryCompilationErrors.scala:3227)
at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:162)
at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:67)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:65)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:65)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:40)
at 
org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:120)
at 
org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:751)
at 
org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
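A minimal reproduction sketch, assuming a `spark` session created with
`--remote local` (the config key is taken from the log above):

{code:python}
from pyspark.errors import AnalysisException

try:
    spark.conf.set("spark.connect.copyFromLocalToFs.allowDestLocal", "true")
except AnalysisException as e:
    print(e)  # Cannot modify the value of a static config: ...
{code}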






[jira] [Updated] (SPARK-44981) Filter out static configurations used in local mode

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44981:
-
Priority: Minor  (was: Major)

> Filter out static configurations used in local mode
> ---
>
> Key: SPARK-44981
> URL: https://issues.apache.org/jira/browse/SPARK-44981
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> If you set a static configuration in `--remote local` mode, the server logs 
> errors like the following:
> {code}
> 23/08/28 11:39:42 ERROR ErrorUtils: Spark Connect RPC error during: config. 
> UserId: hyukjin.kwon. SessionId: 424674ef-af95-4b12-b10e-86479413f9fd.
> org.apache.spark.sql.AnalysisException: Cannot modify the value of a static 
> config: spark.connect.copyFromLocalToFs.allowDestLocal.
>   at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfStaticConfigError(QueryCompilationErrors.scala:3227)
>   at 
> org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:162)
>   at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
>   at 
> org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:67)
>   at 
> org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:65)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at 
> org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:65)
>   at 
> org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:40)
>   at 
> org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:120)
>   at 
> org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:751)
>   at 
> org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
>   at 
> org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346)
>   at 
> org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860)
>   at 
> org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (SPARK-44980) createDataFrame should respect the names of namedtuples properly

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44980:
-
Description: 
{code}
from collections import namedtuple
MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])

class MyInheritedTuple(MyTuple):
pass

df = spark.createDataFrame([MyInheritedTuple(1, 2, 3), MyInheritedTuple(11, 22, 
33)])
df.collect()
{code}

{code}
[Row(zz=None, b=None, a=None), Row(zz=None, b=None, a=None)]
{code}

should be

{code}
[Row(zz=1, b=2, a=3), Row(zz=11, b=22, a=33)]
{code}

  was:
{code}
from collections import namedtuple
MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])
df = spark.createDataFrame([MyTuple(1, 2, 3), MyTuple(11, 22, 33)], "a: long, 
b: long, zz: long")
df.show()
{code}

{code}
+---+---+---+
|  a|  b| zz|
+---+---+---+
|  1|  2|  3|
| 11| 22| 33|
+---+---+---+
{code}

should be

{code}
+---+---+---+
|  a|  b| zz|
+---+---+---+
|  3|  2|  1|
| 33| 22| 11|
+---+---+---+
{code}


> createDataFrame should respect the names of namedtuples properly
> -
>
> Key: SPARK-44980
> URL: https://issues.apache.org/jira/browse/SPARK-44980
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> from collections import namedtuple
> MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])
> class MyInheritedTuple(MyTuple):
> pass
> df = spark.createDataFrame([MyInheritedTuple(1, 2, 3), MyInheritedTuple(11, 
> 22, 33)])
> df.collect()
> {code}
> {code}
> [Row(zz=None, b=None, a=None), Row(zz=None, b=None, a=None)]
> {code}
> should be
> {code}
> [Row(zz=1, b=2, a=3), Row(zz=11, b=22, a=33)]
> {code}






[jira] [Updated] (SPARK-44980) Fix inherited namedtuples to work in createDataFrame

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44980:
-
Summary: Fix inherited namedtuples to work in createDataFrame  (was: 
createDataFrame should respect the names of namedtuples properly)

> Fix inherited namedtuples to work in createDataFrame
> 
>
> Key: SPARK-44980
> URL: https://issues.apache.org/jira/browse/SPARK-44980
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> from collections import namedtuple
> MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])
> class MyInheritedTuple(MyTuple):
> pass
> df = spark.createDataFrame([MyInheritedTuple(1, 2, 3), MyInheritedTuple(11, 
> 22, 33)])
> df.collect()
> {code}
> {code}
> [Row(zz=None, b=None, a=None), Row(zz=None, b=None, a=None)]
> {code}
> should be
> {code}
> [Row(zz=1, b=2, a=3), Row(zz=11, b=22, a=33)]
> {code}






[jira] [Created] (SPARK-44980) createDataFrame should respect the names of namedtuples properly

2023-08-27 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-44980:


 Summary: createDataFrame should respect the names of namedtuples 
properly
 Key: SPARK-44980
 URL: https://issues.apache.org/jira/browse/SPARK-44980
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Hyukjin Kwon


{code}
from collections import namedtuple
MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])
df = spark.createDataFrame([MyTuple(1, 2, 3), MyTuple(11, 22, 33)], "a: long, 
b: long, zz: long")
df.show()
{code}

{code}
+---+---+---+
|  a|  b| zz|
+---+---+---+
|  1|  2|  3|
| 11| 22| 33|
+---+---+---+
{code}

should be

{code}
+---+---+---+
|  a|  b| zz|
+---+---+---+
|  3|  2|  1|
| 33| 22| 11|
+---+---+---+
{code}
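For reference, the stdlib behavior relevant here: a namedtuple subclass still
exposes `_fields`, so conversion can map values by field name rather than by
position. A quick sanity check:

{code:python}
from collections import namedtuple

MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])

class MyInheritedTuple(MyTuple):
    pass

t = MyInheritedTuple(1, 2, 3)
assert isinstance(t, tuple) and hasattr(t, "_fields")
# Field names survive inheritance, so values can be keyed by name:
assert dict(zip(t._fields, t)) == {"zz": 1, "b": 2, "a": 3}
{code}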






[jira] [Commented] (SPARK-44131) Add call_function for Scala API

2023-08-27 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759408#comment-17759408
 ] 

Xiao Li commented on SPARK-44131:
-

[https://github.com/apache/spark/pull/41950] reverted the deprecation. 
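As background, a hedged sketch of the API this issue added, shown through its
Python mirror `pyspark.sql.functions.call_function` (which I believe also
landed in 3.5); assumes an existing `spark` session:

{code:python}
from pyspark.sql import functions as F

df = spark.range(3)
# call_function resolves built-in and registered functions by name alike,
# so there is no need to reach for call_udf when calling a built-in.
df.select(F.call_function("sqrt", F.col("id")).alias("root")).show()
{code}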

> Add call_function for Scala API
> ---
>
> Key: SPARK-44131
> URL: https://issues.apache.org/jira/browse/SPARK-44131
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>
> The Scala SQL API has a method, call_udf, used to call user-defined 
> functions.
> In fact, call_udf can also call built-in functions.
> This behavior is confusing for users.






[jira] [Updated] (SPARK-44131) Add call_function for Scala API

2023-08-27 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-44131:

Summary: Add call_function for Scala API  (was: Add call_function and 
deprecate call_udf for Scala API)

> Add call_function for Scala API
> ---
>
> Key: SPARK-44131
> URL: https://issues.apache.org/jira/browse/SPARK-44131
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>
> The Scala SQL API has a method, call_udf, used to call user-defined 
> functions.
> In fact, call_udf can also call built-in functions.
> This behavior is confusing for users.






[jira] [Resolved] (SPARK-44978) Fix SQLQueryTestSuite being unable to create tables normally

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44978.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42691
[https://github.com/apache/spark/pull/42691]

> Fix SQLQueryTestSuite being unable to create tables normally
> --
>
> Key: SPARK-44978
> URL: https://issues.apache.org/jira/browse/SPARK-44978
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: image-2023-08-27-14-25-21-843.png
>
>
> When we repeatedly execute SQLQueryTestSuite to generate the golden files, 
> the warehouse files from the previous run are not cleaned up (for example, 
> when the test was killed before finishing), resulting in erroneous results: 
> !image-2023-08-27-14-25-21-843.png!
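A sketch of the remedy under stated assumptions (the path below is
illustrative, and the actual suite is Scala): remove any leftover warehouse
directory before regenerating golden files, so table creation starts from a
clean slate.

{code:python}
import os
import shutil

warehouse = "spark-warehouse"  # hypothetical leftover from a killed run
if os.path.isdir(warehouse):
    shutil.rmtree(warehouse)  # ensure CREATE TABLE is not tripped up by stale state
{code}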






[jira] [Assigned] (SPARK-44978) Fix SQLQueryTestSuite being unable to create tables normally

2023-08-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44978:


Assignee: Jia Fan

> Fix SQLQueryTestSuite being unable to create tables normally
> --
>
> Key: SPARK-44978
> URL: https://issues.apache.org/jira/browse/SPARK-44978
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
> Attachments: image-2023-08-27-14-25-21-843.png
>
>
> When we repeatedly execute SQLQueryTestSuite to generate the golden files, 
> the warehouse files from the previous run are not cleaned up (for example, 
> when the test was killed before finishing), resulting in erroneous results: 
> !image-2023-08-27-14-25-21-843.png!






[jira] [Updated] (SPARK-44264) DeepSpeed Distributor

2023-08-27 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-44264:

Summary: DeepSpeed Distributor  (was: DeepSpeed Distrobutor)

> DeepSpeed Distributor
> -
>
> Key: SPARK-44264
> URL: https://issues.apache.org/jira/browse/SPARK-44264
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.4.1
>Reporter: Lu Wang
>Priority: Critical
> Fix For: 3.5.0
>
> Attachments: Trying to Run Deepspeed Funcs.html
>
>
> Make it easier for PySpark users to run distributed training and inference 
> with DeepSpeed on Spark clusters. This was a project determined by the 
> Databricks ML Training Team.






[jira] [Updated] (SPARK-44979) Cache results of simple UDFs on executors if the same arguments are passed

2023-08-27 Thread Dinesh Dharme (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Dharme updated SPARK-44979:
--
Description: 
Consider two dataframes:

{{keyword_given = [
["green pstr",],
["greenpstr",],
["wlmrt", ],
["walmart",],
["walmart super",]
]}}

{{variations = [
("type green pstr", "ABC", 100),
("type green pstr","PQR",200),
("type green pstr", "NZSD", 2999),
("wlmrt payment","walmart",200),
("wlmrt solutions", "walmart", 200),
("nppssdwlmrt", "walmart", 2000)
]}}

{{Imagine I have a task to do fuzzy substring matching between keyword and 
variation[0] using the built-in levenshtein function. It is possible to 
optimize this further in the code itself, where we extract the unique values, 
do the fuzzy matching on the uniques, and join back with the original tables. }}

{{But it could be possible, as an optimization, to cache the results of the 
already computed UDFs and do a lookup on each executor separately.}}

Just a thought. Not sure if it makes any sense. This behaviour could be behind 
a config.

 

  was:
Consider two dataframes:

{{keyword_given = [
["green pstr",],
["greenpstr",],
["wlmrt", ],
["walmart",],
["walmart super",]
]}}

{{variations = [
("type green pstr", "ABC", 100),
("type green pstr","PQR",200),
("type green pstr", "NZSD", 2999),
("wlmrt payment","walmart",200),
("wlmrt solutions", "walmart", 200),
("nppssdwlmrt", "walmart", 2000)
 ]}}

{{Imagine I have a task to do fuzzy substring matching between keyword and 
variation[0] using the built-in levenshtein function. It is possible to 
optimize this further in the code itself, where we extract the unique values, 
do the fuzzy matching on the uniques, and join back with the original table. }}

{{But it could be possible, as an optimization, to cache the results of the 
already computed UDFs and do a lookup on each executor separately.}}

Just a thought. Not sure if it makes any sense. This behaviour could be behind 
a config.


> Cache results of simple UDFs on executors if the same arguments are passed
> ---
>
> Key: SPARK-44979
> URL: https://issues.apache.org/jira/browse/SPARK-44979
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Dinesh Dharme
>Priority: Minor
>
> Consider two dataframes:
> {{keyword_given = [
> ["green pstr",],
> ["greenpstr",],
> ["wlmrt", ],
> ["walmart",],
> ["walmart super",]
> ]}}
> {{variations = [
> ("type green pstr", "ABC", 100),
> ("type green pstr","PQR",200),
> ("type green pstr", "NZSD", 2999),
> ("wlmrt payment","walmart",200),
> ("wlmrt solutions", "walmart", 200),
> ("nppssdwlmrt", "walmart", 2000)
> ]}}
> {{Imagine I have a task to do fuzzy substring matching between keyword and 
> variation[0] using the built-in levenshtein function. It is possible to 
> optimize this further in the code itself, where we extract the unique values, 
> do the fuzzy matching on the uniques, and join back with the original tables. }}
> {{But it could be possible, as an optimization, to cache the results of the 
> already computed UDFs and do a lookup on each executor separately.}}
> Just a thought. Not sure if it makes any sense. This behaviour could be 
> behind a config.
>  
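Today a user can approximate this per executor with memoization inside the UDF
itself; a minimal sketch of the idea (this is not Spark behavior, just
`functools.lru_cache` around a plain edit-distance function):

{code:python}
import functools

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

@functools.lru_cache(maxsize=100_000)
def _levenshtein(s, t):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

# Repeated (s, t) pairs on an executor hit the cache instead of recomputing.
levenshtein_udf = udf(_levenshtein, IntegerType())
{code}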






[jira] [Created] (SPARK-44979) Cache results of simple UDFs on executors if the same arguments are passed

2023-08-27 Thread Dinesh Dharme (Jira)
Dinesh Dharme created SPARK-44979:
-

 Summary: Cache results of simple UDFs on executors if the same 
arguments are passed
 Key: SPARK-44979
 URL: https://issues.apache.org/jira/browse/SPARK-44979
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.1
Reporter: Dinesh Dharme


Consider two dataframes:

{{keyword_given = [
["green pstr",],
["greenpstr",],
["wlmrt", ],
["walmart",],
["walmart super",]
]}}

{{variations = [
("type green pstr", "ABC", 100),
("type green pstr","PQR",200),
("type green pstr", "NZSD", 2999),
("wlmrt payment","walmart",200),
("wlmrt solutions", "walmart", 200),
("nppssdwlmrt", "walmart", 2000)
 ]}}

{{Imagine I have a task to do fuzzy substring matching between keyword and 
variation[0] using the built-in levenshtein function. It is possible to 
optimize this further in the code itself, where we extract the unique values, 
do the fuzzy matching on the uniques, and join back with the original table. }}

{{But it could be possible, as an optimization, to cache the results of the 
already computed UDFs and do a lookup on each executor separately.}}

Just a thought. Not sure if it makes any sense. This behaviour could be behind 
a config.






[jira] [Updated] (SPARK-44978) Fix SQLQueryTestSuite being unable to create tables normally

2023-08-27 Thread Jia Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Fan updated SPARK-44978:

Description: 
When we repeatedly execute SQLQueryTestSuite to generate the golden files, the 
warehouse files from the previous run are not cleaned up (for example, when 
the test was killed before finishing), resulting in erroneous results: 

!image-2023-08-27-14-25-21-843.png!

  was:When we repeatedly execute SQLQueryTestSuite to generate the golden 
files, the warehouse files from the previous run are not cleaned up (for 
example, when the test was killed before finishing), resulting in erroneous 
results: !image-2023-08-27-14-22-43-361.png!


> Fix SQLQueryTestSuite being unable to create tables normally
> --
>
> Key: SPARK-44978
> URL: https://issues.apache.org/jira/browse/SPARK-44978
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-08-27-14-25-21-843.png
>
>
> When we repeatedly execute SQLQueryTestSuite to generate the golden files, 
> the warehouse files from the previous run are not cleaned up (for example, 
> when the test was killed before finishing), resulting in erroneous results: 
> !image-2023-08-27-14-25-21-843.png!






[jira] [Updated] (SPARK-44978) Fix SQLQueryTestSuite being unable to create tables normally

2023-08-27 Thread Jia Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Fan updated SPARK-44978:

Attachment: image-2023-08-27-14-25-21-843.png

> Fix SQLQueryTestSuite being unable to create tables normally
> --
>
> Key: SPARK-44978
> URL: https://issues.apache.org/jira/browse/SPARK-44978
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Priority: Major
> Attachments: image-2023-08-27-14-25-21-843.png
>
>
> When we repeatedly execute SQLQueryTestSuite to generate the golden files, 
> the warehouse files from the previous run are not cleaned up (for example, 
> when the test was killed before finishing), resulting in erroneous results: 
> !image-2023-08-27-14-22-43-361.png!






[jira] [Created] (SPARK-44978) Fix SQLQueryTestSuite being unable to create tables normally

2023-08-27 Thread Jia Fan (Jira)
Jia Fan created SPARK-44978:
---

 Summary: Fix SQLQueryTestSuite being unable to create tables normally
 Key: SPARK-44978
 URL: https://issues.apache.org/jira/browse/SPARK-44978
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Jia Fan


When we repeatedly execute SQLQueryTestSuite to generate the golden files, the 
warehouse files from the previous run are not cleaned up (for example, when 
the test was killed before finishing), resulting in erroneous results: 
!image-2023-08-27-14-22-43-361.png!


