[spark] branch branch-3.0 updated (5412009 -> d1a3fad)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5412009  [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
     add d1a3fad  [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
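The SPARK-31966 change above is a one-line timeout increase in `test_streaming_algorithms.py`. For context, streaming tests like `test_training_and_prediction` rely on a poll-until-condition helper, and the fix simply passes it a larger timeout. The sketch below is illustrative only: the helper name, signature, and defaults are invented here and do not match PySpark's actual `eventually` utility.

```python
import time

def eventually(condition, timeout=30.0, interval=0.01):
    """Poll `condition` until it returns True or `timeout` elapses.

    Illustrative sketch, not PySpark's real helper. A flaky streaming
    test often fails here not because the logic is wrong but because
    the model needs more wall-clock time to converge on slow CI
    machines, which is why bumping the timeout is the typical fix.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1f seconds" % timeout)

# A condition that becomes true only after a few polls, like a
# streaming metric that improves as more batches arrive.
state = {"polls": 0}

def improved_enough():
    state["polls"] += 1
    return state["polls"] >= 3

assert eventually(improved_enough, timeout=5.0)
```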
[spark] branch master updated (4afe2b1 -> 56d4f27)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4afe2b1  [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
     add 56d4f27  [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-3.0 updated: [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 5412009  [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
5412009 is described below

commit 5412009d157f77ee4c90de12079502046f9c8682
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Wed Jun 10 21:32:16 2020 -0700

    [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package

    ### What changes were proposed in this pull request?

    This PR proposes to remove package private in classes/objects in sql.execution package, as per SPARK-16964.

    ### Why are the changes needed?

    This is per post-hoc review comment, see https://github.com/apache/spark/pull/24996#discussion_r437126445

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    Closes #28790 from HeartSaVioR/SPARK-28199-FOLLOWUP-apply-SPARK-16964.

    Authored-by: Jungtaek Lim (HeartSaVioR)
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 4afe2b1bc9ef190c0117e28da447871b90100622)
    Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/sql/execution/streaming/Triggers.scala | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala
index d40208f..28171f4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala
@@ -50,17 +50,17 @@ private object Triggers {
  * A [[Trigger]] that processes only one batch of data in a streaming query then terminates
  * the query.
  */
-private[sql] case object OneTimeTrigger extends Trigger
+case object OneTimeTrigger extends Trigger

 /**
  * A [[Trigger]] that runs a query periodically based on the processing time. If `interval` is 0,
  * the query will run as fast as possible.
  */
-private[sql] case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger {
+case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger {
   Triggers.validate(intervalMs)
 }

-private[sql] object ProcessingTimeTrigger {
+object ProcessingTimeTrigger {
   import Triggers._

   def apply(interval: String): ProcessingTimeTrigger = {
@@ -84,11 +84,11 @@ private[sql] object ProcessingTimeTrigger {
  * A [[Trigger]] that continuously processes streaming data, asynchronously checkpointing at
  * the specified interval.
  */
-private[sql] case class ContinuousTrigger(intervalMs: Long) extends Trigger {
+case class ContinuousTrigger(intervalMs: Long) extends Trigger {
   Triggers.validate(intervalMs)
 }

-private[sql] object ContinuousTrigger {
+object ContinuousTrigger {
   import Triggers._

   def apply(interval: String): ContinuousTrigger = {
[spark] branch master updated: [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4afe2b1  [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package
4afe2b1 is described below

commit 4afe2b1bc9ef190c0117e28da447871b90100622
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Wed Jun 10 21:32:16 2020 -0700

    [SPARK-28199][SS][FOLLOWUP] Remove package private in class/object in sql.execution package

    ### What changes were proposed in this pull request?

    This PR proposes to remove package private in classes/objects in sql.execution package, as per SPARK-16964.

    ### Why are the changes needed?

    This is per post-hoc review comment, see https://github.com/apache/spark/pull/24996#discussion_r437126445

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    Closes #28790 from HeartSaVioR/SPARK-28199-FOLLOWUP-apply-SPARK-16964.

    Authored-by: Jungtaek Lim (HeartSaVioR)
    Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/sql/execution/streaming/Triggers.scala | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala
index f29970d..ebd237b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Triggers.scala
@@ -50,17 +50,17 @@ private object Triggers {
  * A [[Trigger]] that processes only one batch of data in a streaming query then terminates
  * the query.
  */
-private[sql] case object OneTimeTrigger extends Trigger
+case object OneTimeTrigger extends Trigger

 /**
  * A [[Trigger]] that runs a query periodically based on the processing time. If `interval` is 0,
  * the query will run as fast as possible.
  */
-private[sql] case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger {
+case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger {
   Triggers.validate(intervalMs)
 }

-private[sql] object ProcessingTimeTrigger {
+object ProcessingTimeTrigger {
   import Triggers._

   def apply(interval: String): ProcessingTimeTrigger = {
@@ -84,11 +84,11 @@ private[sql] object ProcessingTimeTrigger {
  * A [[Trigger]] that continuously processes streaming data, asynchronously checkpointing at
  * the specified interval.
  */
-private[sql] case class ContinuousTrigger(intervalMs: Long) extends Trigger {
+case class ContinuousTrigger(intervalMs: Long) extends Trigger {
   Triggers.validate(intervalMs)
 }

-private[sql] object ContinuousTrigger {
+object ContinuousTrigger {
   import Triggers._

   def apply(interval: String): ContinuousTrigger = {
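The `apply(interval: String)` factories shown in the diff parse an interval string into milliseconds and validate it in the constructor body via `Triggers.validate(intervalMs)`. The following Python sketch of that parse-then-validate idea is purely illustrative: the unit table, function names, and error message are invented here, and Spark's real parser handles the much richer CalendarInterval grammar.

```python
# Hypothetical sketch of the idea behind ProcessingTimeTrigger.apply
# and Triggers.validate: an interval string becomes milliseconds,
# which must not be negative.

_UNITS_TO_MS = {"milliseconds": 1, "seconds": 1000, "minutes": 60000}

def validate(interval_ms):
    # Mirrors the constructor-body check `Triggers.validate(intervalMs)`.
    if interval_ms < 0:
        raise ValueError("interval must not be negative: %d ms" % interval_ms)

def parse_interval_ms(interval):
    # "5 seconds" -> 5000; unit table is a toy subset for illustration.
    value, unit = interval.strip().split()
    ms = int(value) * _UNITS_TO_MS[unit]
    validate(ms)
    return ms

assert parse_interval_ms("5 seconds") == 5000
```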
[spark] branch branch-3.0 updated: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8ad9b83  [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
8ad9b83 is described below

commit 8ad9b83edc239eae6b468d619419af5c0f41b4d0
Author: HyukjinKwon
AuthorDate: Wed Jun 10 21:15:40 2020 -0700

    [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally

    ### What changes were proposed in this pull request?

    This PR proposes to move the doctests in `registerJavaUDAF` and `registerJavaFunction` to the proper unittests that run conditionally when the test classes are present. Both tests are dependent on the test classes in JVM side, `test.org.apache.spark.sql.JavaStringLength` and `test.org.apache.spark.sql.MyDoubleAvg`. So if you run the tests against the plain `sbt package`, it fails as below:

    ```
    **
    File "/.../spark/python/pyspark/sql/udf.py", line 366, in pyspark.sql.udf.UDFRegistration.registerJavaFunction
    Failed example:
        spark.udf.registerJavaFunction(
            "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    Exception raised:
        Traceback (most recent call last):
        ...
        test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath;
    ...
    6 of 7 in pyspark.sql.udf.UDFRegistration.registerJavaFunction
    2 of 4 in pyspark.sql.udf.UDFRegistration.registerJavaUDAF
    ***Test Failed*** 8 failures.
    ```

    ### Why are the changes needed?

    In order to support to run the tests against the plain SBT build. See also https://spark.apache.org/developer-tools.html

    ### Does this PR introduce _any_ user-facing change?

    No, it's test-only.

    ### How was this patch tested?

    Manually tested as below:

    ```bash
    ./build/sbt -DskipTests -Phive-thriftserver clean package
    cd python
    ./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
    ./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
    ```

    ```bash
    ./build/sbt -DskipTests -Phive-thriftserver clean test:package
    cd python
    ./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
    ./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
    ```

    Closes #28795 from HyukjinKwon/SPARK-31965.

    Authored-by: HyukjinKwon
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 56264fb5d3ad1a488be5e08feb2e0304d1c2ed6a)
    Signed-off-by: Dongjoon Hyun
---
 python/pyspark/sql/tests/test_udf.py | 28
 python/pyspark/sql/udf.py            | 14 +-
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/sql/tests/test_udf.py b/python/pyspark/sql/tests/test_udf.py
index 061d3f5..ea7ec9f 100644
--- a/python/pyspark/sql/tests/test_udf.py
+++ b/python/pyspark/sql/tests/test_udf.py
@@ -21,6 +21,8 @@ import shutil
 import tempfile
 import unittest

+import py4j
+
 from pyspark import SparkContext
 from pyspark.sql import SparkSession, Column, Row
 from pyspark.sql.functions import UserDefinedFunction, udf
@@ -357,6 +359,32 @@ class UDFTests(ReusedSQLTestCase):
             df.select(add_four("id").alias("plus_four")).collect()
         )

+    @unittest.skipIf(not test_compiled, test_not_compiled_message)
+    def test_register_java_function(self):
+        self.spark.udf.registerJavaFunction(
+            "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
+        [value] = self.spark.sql("SELECT javaStringLength('test')").first()
+        self.assertEqual(value, 4)
+
+        self.spark.udf.registerJavaFunction(
+            "javaStringLength2", "test.org.apache.spark.sql.JavaStringLength")
+        [value] = self.spark.sql("SELECT javaStringLength2('test')").first()
+        self.assertEqual(value, 4)
+
+        self.spark.udf.registerJavaFunction(
+            "javaStringLength3", "test.org.apache.spark.sql.JavaStringLength", "integer")
+        [value] = self.spark.sql("SELECT javaStringLength3('test')").first()
+        self.assertEqual(value, 4)
+
+    @unittest.skipIf(not test_compiled, test_not_compiled_message)
+    def test_register_java_udaf(self):
+        self.spark.udf.registerJavaUDAF("javaUDAF", "test.org.apache.spark.sql.MyDoubleAvg")
+        df = self.spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "name"])
+        df.createOrReplaceTempView("df")
+        row = self.spark.sql(
+            "SELECT name, javaUDAF(id) as avg
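The diff above gates the moved tests on whether the JVM test classes were compiled, using `unittest.skipIf`. A self-contained sketch of that guard pattern follows; the `test_compiled` flag is hard-coded here for illustration, whereas PySpark derives it (roughly, by checking for the sbt-built test classes on the classpath):

```python
import unittest

# Stand-in for PySpark's real flag, which is computed by probing for
# the compiled JVM test classes. Hard-coded False here so the guarded
# test is skipped, exactly as it would be against a plain `sbt package`.
test_compiled = False
test_not_compiled_message = "Spark JVM test classes were not compiled"

class JavaUDFTests(unittest.TestCase):
    @unittest.skipIf(not test_compiled, test_not_compiled_message)
    def test_register_java_function(self):
        # Would exercise spark.udf.registerJavaFunction(...) against a
        # real session; never reached while the guard skips this test.
        self.fail("unreachable when skipped")

    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)

# Running the suite skips the guarded test instead of failing the build.
suite = unittest.TestLoader().loadTestsFromTestCase(JavaUDFTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert len(result.skipped) == 1 and result.wasSuccessful()
```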
[spark] branch master updated (76b5ed4 -> 56264fb)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 76b5ed4  [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
     add 56264fb  [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/test_udf.py | 28
 python/pyspark/sql/udf.py            | 14 +-
 2 files changed, 37 insertions(+), 5 deletions(-)
[spark] branch master updated (76b5ed4 -> 56264fb)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 76b5ed4 [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3 add 56264fb [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally No new revisions were added by this update. Summary of changes: python/pyspark/sql/tests/test_udf.py | 28 python/pyspark/sql/udf.py| 14 +- 2 files changed, 37 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8ad9b83 [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally 8ad9b83 is described below commit 8ad9b83edc239eae6b468d619419af5c0f41b4d0 Author: HyukjinKwon AuthorDate: Wed Jun 10 21:15:40 2020 -0700 [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally ### What changes were proposed in this pull request? This PR proposes to move the doctests in `registerJavaUDAF` and `registerJavaFunction` to the proper unittests that run conditionally when the test classes are present. Both tests are dependent on the test classes in JVM side, `test.org.apache.spark.sql.JavaStringLength` and `test.org.apache.spark.sql.MyDoubleAvg`. So if you run the tests against the plain `sbt package`, it fails as below: ``` ** File "/.../spark/python/pyspark/sql/udf.py", line 366, in pyspark.sql.udf.UDFRegistration.registerJavaFunction Failed example: spark.udf.registerJavaFunction( "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType()) Exception raised: Traceback (most recent call last): ... test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath; ... 6 of 7 in pyspark.sql.udf.UDFRegistration.registerJavaFunction 2 of 4 in pyspark.sql.udf.UDFRegistration.registerJavaUDAF ***Test Failed*** 8 failures. ``` ### Why are the changes needed? In order to support to run the tests against the plain SBT build. See also https://spark.apache.org/developer-tools.html ### Does this PR introduce _any_ user-facing change? No, it's test-only. ### How was this patch tested? 
Manually tested as below: ```bash ./build/sbt -DskipTests -Phive-thriftserver clean package cd python ./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction" ./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests" ``` ```bash ./build/sbt -DskipTests -Phive-thriftserver clean test:package cd python ./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction" ./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests" ``` Closes #28795 from HyukjinKwon/SPARK-31965. Authored-by: HyukjinKwon Signed-off-by: Dongjoon Hyun (cherry picked from commit 56264fb5d3ad1a488be5e08feb2e0304d1c2ed6a) Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/tests/test_udf.py | 28 python/pyspark/sql/udf.py| 14 +- 2 files changed, 37 insertions(+), 5 deletions(-) diff --git a/python/pyspark/sql/tests/test_udf.py b/python/pyspark/sql/tests/test_udf.py index 061d3f5..ea7ec9f 100644 --- a/python/pyspark/sql/tests/test_udf.py +++ b/python/pyspark/sql/tests/test_udf.py @@ -21,6 +21,8 @@ import shutil import tempfile import unittest +import py4j + from pyspark import SparkContext from pyspark.sql import SparkSession, Column, Row from pyspark.sql.functions import UserDefinedFunction, udf @@ -357,6 +359,32 @@ class UDFTests(ReusedSQLTestCase): df.select(add_four("id").alias("plus_four")).collect() ) +@unittest.skipIf(not test_compiled, test_not_compiled_message) +def test_register_java_function(self): +self.spark.udf.registerJavaFunction( +"javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType()) +[value] = self.spark.sql("SELECT javaStringLength('test')").first() +self.assertEqual(value, 4) + +self.spark.udf.registerJavaFunction( +"javaStringLength2", "test.org.apache.spark.sql.JavaStringLength") +[value] = self.spark.sql("SELECT javaStringLength2('test')").first() +self.assertEqual(value, 4) + +self.spark.udf.registerJavaFunction( +"javaStringLength3", 
"test.org.apache.spark.sql.JavaStringLength", "integer") +[value] = self.spark.sql("SELECT javaStringLength3('test')").first() +self.assertEqual(value, 4) + +@unittest.skipIf(not test_compiled, test_not_compiled_message) +def test_register_java_udaf(self): +self.spark.udf.registerJavaUDAF("javaUDAF", "test.org.apache.spark.sql.MyDoubleAvg") +df = self.spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "name"]) +df.createOrReplaceTempView("df") +row = self.spark.sql( +"SELECT name, javaUDAF(id) as avg
[spark] branch branch-3.0 updated: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8ad9b83 [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally 8ad9b83 is described below commit 8ad9b83edc239eae6b468d619419af5c0f41b4d0 Author: HyukjinKwon AuthorDate: Wed Jun 10 21:15:40 2020 -0700 [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally ### What changes were proposed in this pull request? This PR proposes to move the doctests in `registerJavaUDAF` and `registerJavaFunction` to the proper unittests that run conditionally when the test classes are present. Both tests are dependent on the test classes in JVM side, `test.org.apache.spark.sql.JavaStringLength` and `test.org.apache.spark.sql.MyDoubleAvg`. So if you run the tests against the plain `sbt package`, it fails as below: ``` ** File "/.../spark/python/pyspark/sql/udf.py", line 366, in pyspark.sql.udf.UDFRegistration.registerJavaFunction Failed example: spark.udf.registerJavaFunction( "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType()) Exception raised: Traceback (most recent call last): ... test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath; ... 6 of 7 in pyspark.sql.udf.UDFRegistration.registerJavaFunction 2 of 4 in pyspark.sql.udf.UDFRegistration.registerJavaUDAF ***Test Failed*** 8 failures. ``` ### Why are the changes needed? In order to support to run the tests against the plain SBT build. See also https://spark.apache.org/developer-tools.html ### Does this PR introduce _any_ user-facing change? No, it's test-only. ### How was this patch tested? 
[spark] branch master updated (76b5ed4 -> 56264fb)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 76b5ed4  [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
     add 56264fb  [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/test_udf.py | 28
 python/pyspark/sql/udf.py            | 14 +-
 2 files changed, 37 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8ad9b83  [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally

8ad9b83 is described below

commit 8ad9b83edc239eae6b468d619419af5c0f41b4d0
Author: HyukjinKwon
AuthorDate: Wed Jun 10 21:15:40 2020 -0700

    [SPARK-31965][TESTS][PYTHON] Move doctests related to Java function registration to test conditionally

    ### What changes were proposed in this pull request?

    This PR proposes to move the doctests in `registerJavaUDAF` and `registerJavaFunction` to proper unittests that run conditionally when the test classes are present. Both tests depend on test classes on the JVM side, `test.org.apache.spark.sql.JavaStringLength` and `test.org.apache.spark.sql.MyDoubleAvg`, so running the tests against a plain `sbt package` fails as below:

    ```
    **
    File "/.../spark/python/pyspark/sql/udf.py", line 366, in pyspark.sql.udf.UDFRegistration.registerJavaFunction
    Failed example:
        spark.udf.registerJavaFunction(
            "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
    Exception raised:
        Traceback (most recent call last):
        ...
        test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath;
    ...
    6 of 7 in pyspark.sql.udf.UDFRegistration.registerJavaFunction
    2 of 4 in pyspark.sql.udf.UDFRegistration.registerJavaUDAF
    ***Test Failed*** 8 failures.
    ```

    ### Why are the changes needed?

    To support running the tests against the plain SBT build. See also https://spark.apache.org/developer-tools.html

    ### Does this PR introduce _any_ user-facing change?

    No, it's test-only.

    ### How was this patch tested?

    Manually tested as below:

    ```bash
    ./build/sbt -DskipTests -Phive-thriftserver clean package
    cd python
    ./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
    ./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
    ```

    ```bash
    ./build/sbt -DskipTests -Phive-thriftserver clean test:package
    cd python
    ./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
    ./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
    ```

    Closes #28795 from HyukjinKwon/SPARK-31965.

    Authored-by: HyukjinKwon
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 56264fb5d3ad1a488be5e08feb2e0304d1c2ed6a)
    Signed-off-by: Dongjoon Hyun
---
 python/pyspark/sql/tests/test_udf.py | 28
 python/pyspark/sql/udf.py            | 14 +-
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/sql/tests/test_udf.py b/python/pyspark/sql/tests/test_udf.py
index 061d3f5..ea7ec9f 100644
--- a/python/pyspark/sql/tests/test_udf.py
+++ b/python/pyspark/sql/tests/test_udf.py
@@ -21,6 +21,8 @@ import shutil
 import tempfile
 import unittest
 
+import py4j
+
 from pyspark import SparkContext
 from pyspark.sql import SparkSession, Column, Row
 from pyspark.sql.functions import UserDefinedFunction, udf
@@ -357,6 +359,32 @@ class UDFTests(ReusedSQLTestCase):
             df.select(add_four("id").alias("plus_four")).collect()
         )
 
+    @unittest.skipIf(not test_compiled, test_not_compiled_message)
+    def test_register_java_function(self):
+        self.spark.udf.registerJavaFunction(
+            "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
+        [value] = self.spark.sql("SELECT javaStringLength('test')").first()
+        self.assertEqual(value, 4)
+
+        self.spark.udf.registerJavaFunction(
+            "javaStringLength2", "test.org.apache.spark.sql.JavaStringLength")
+        [value] = self.spark.sql("SELECT javaStringLength2('test')").first()
+        self.assertEqual(value, 4)
+
+        self.spark.udf.registerJavaFunction(
+            "javaStringLength3", "test.org.apache.spark.sql.JavaStringLength", "integer")
+        [value] = self.spark.sql("SELECT javaStringLength3('test')").first()
+        self.assertEqual(value, 4)
+
+    @unittest.skipIf(not test_compiled, test_not_compiled_message)
+    def test_register_java_udaf(self):
+        self.spark.udf.registerJavaUDAF("javaUDAF", "test.org.apache.spark.sql.MyDoubleAvg")
+        df = self.spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "name"])
+        df.createOrReplaceTempView("df")
+        row = self.spark.sql(
+            "SELECT name, javaUDAF(id) as avg
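The diff above guards the Java UDF tests with `unittest.skipIf` so they only run when the compiled JVM test classes are on the classpath. A minimal, self-contained sketch of that pattern follows; `test_compiled` and `test_not_compiled_message` are hypothetical stand-ins for the values `pyspark.testing.sqlutils` derives from whether `sbt test:package` built the helper jar:

```python
import unittest

# Hypothetical stand-ins for the flag and message that pyspark's test
# utilities compute from the presence of the compiled JVM test classes.
test_compiled = False
test_not_compiled_message = "Spark test jars were not compiled"

class JavaUDFTests(unittest.TestCase):
    @unittest.skipIf(not test_compiled, test_not_compiled_message)
    def test_register_java_function(self):
        # Would exercise spark.udf.registerJavaFunction(...) here; the
        # decorator skips the test when the JVM test classes are absent.
        self.fail("never reached while test_compiled is False")

# Running the suite shows the test is skipped, not failed.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(JavaUDFTests).run(result)
assert len(result.skipped) == 1 and not result.failures and not result.errors
```

With `test_compiled = True` the decorator becomes a no-op and the body runs normally, which is exactly why the doctests were moved here: the skip decision happens per-environment instead of failing the plain SBT build.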
[spark] branch master updated: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 76b5ed4  [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3

76b5ed4 is described below

commit 76b5ed4ffaa82241944aeae0a0238cf8ee86e44a
Author: Gengliang Wang
AuthorDate: Wed Jun 10 20:59:48 2020 -0700

    [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3

    ### What changes were proposed in this pull request?

    This PR updates the test cases to accept both the Hadoop 2 and Hadoop 3 error messages correctly.

    ### Why are the changes needed?

    SPARK-31935 (#28760) broke the Hadoop 3.2 UT because Hadoop 2 and Hadoop 3 have different exception messages. Two test suites were missed by the fix in https://github.com/apache/spark/pull/28791.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Unit test

    Closes #28796 from gengliangwang/SPARK-31926-followup.

    Authored-by: Gengliang Wang
    Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/sql/execution/datasources/DataSourceSuite.scala | 3 ++-
 .../scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceSuite.scala
index 9345158..aa91791 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceSuite.scala
@@ -142,7 +142,8 @@ class DataSourceSuite extends SharedSparkSession with PrivateMethodTester {
       val message = intercept[java.io.IOException] {
         dataSource invokePrivate checkAndGlobPathIfNecessary(false, false)
       }.getMessage
-      assert(message.equals("No FileSystem for scheme: nonexistsFs"))
+      val expectMessage = "No FileSystem for scheme nonexistsFs"
+      assert(message.filterNot(Set(':', '"').contains) == expectMessage)
     }
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
index 32dceaa..7b16aeb 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
@@ -536,11 +536,11 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
     withTempDir { dir =>
       val path = dir.getCanonicalPath
       val defaultFs = "nonexistFS://nonexistFS"
-      val expectMessage = "No FileSystem for scheme: nonexistFS"
+      val expectMessage = "No FileSystem for scheme nonexistFS"
       val message = intercept[java.io.IOException] {
         spark.readStream.option("fs.defaultFS", defaultFs).text(path)
       }.getMessage
-      assert(message == expectMessage)
+      assert(message.filterNot(Set(':', '"').contains) == expectMessage)
     }
   }
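The fix above makes the assertions tolerant of both Hadoop wordings by stripping `:` and `"` from the message before comparing, rather than matching one version's exact text. A small Python sketch of the same normalization idea; the exact Hadoop 3 wording shown is an assumption inferred from which characters the test filters out:

```python
def normalize(message: str) -> str:
    """Drop ':' and '"' so the Hadoop 2 and Hadoop 3 wordings compare
    equal, mirroring Scala's message.filterNot(Set(':', '"').contains)."""
    return "".join(ch for ch in message if ch not in {':', '"'})

# Hadoop 2 wording comes from the removed assertion in the diff; the
# Hadoop 3 wording below is an assumption (quotes instead of a colon).
hadoop2 = 'No FileSystem for scheme: nonexistFS'
hadoop3 = 'No FileSystem for scheme "nonexistFS"'
expected = 'No FileSystem for scheme nonexistFS'

assert normalize(hadoop2) == expected
assert normalize(hadoop3) == expected
```

Filtering a fixed character set keeps the assertion meaningful (the scheme name and phrasing must still match) while ignoring only the punctuation that differs between Hadoop versions.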
[spark] branch branch-3.0 updated: [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 15d2922  [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs

15d2922 is described below

commit 15d2922b1efd8c365059d9e223d1be753d5d16ee
Author: HyukjinKwon
AuthorDate: Wed Jun 10 15:54:07 2020 -0700

    [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs

    ### What changes were proposed in this pull request?

    This is another approach to fix the issue. See the previous attempt at https://github.com/apache/spark/pull/28745; it was too invasive, so this takes a more conservative approach.

    This PR proposes to resolve the grouping attributes separately first so they can be properly referred to when `FlatMapGroupsInPandas` and `FlatMapCoGroupsInPandas` are resolved without ambiguity.

    Previously,

    ```python
    from pyspark.sql.functions import *

    df = spark.createDataFrame([[1, 1]], ["column", "Score"])

    @pandas_udf("column integer, Score float", PandasUDFType.GROUPED_MAP)
    def my_pandas_udf(pdf):
        return pdf.assign(Score=0.5)

    df.groupby('COLUMN').apply(my_pandas_udf).show()
    ```

    failed as below:

    ```
    pyspark.sql.utils.AnalysisException: "Reference 'COLUMN' is ambiguous, could be: COLUMN, COLUMN.;"
    ```

    because the unresolved `COLUMN` in `FlatMapGroupsInPandas` doesn't know which reference to take from the child projection. After this fix, it resolves the child projection first with the grouping keys and passes, to `FlatMapGroupsInPandas`, the attribute as a grouping key from the child projection that is positionally selected.

    ### Why are the changes needed?

    To resolve grouping keys correctly.

    ### Does this PR introduce _any_ user-facing change?

    Yes,

    ```python
    from pyspark.sql.functions import *

    df = spark.createDataFrame([[1, 1]], ["column", "Score"])

    @pandas_udf("column integer, Score float", PandasUDFType.GROUPED_MAP)
    def my_pandas_udf(pdf):
        return pdf.assign(Score=0.5)

    df.groupby('COLUMN').apply(my_pandas_udf).show()
    ```

    ```python
    df1 = spark.createDataFrame([(1, 1)], ("column", "value"))
    df2 = spark.createDataFrame([(1, 1)], ("column", "value"))

    df1.groupby("COLUMN").cogroup(
        df2.groupby("COLUMN")
    ).applyInPandas(lambda r, l: r + l, df1.schema).show()
    ```

    Before:

    ```
    pyspark.sql.utils.AnalysisException: Reference 'COLUMN' is ambiguous, could be: COLUMN, COLUMN.;
    ```

    ```
    pyspark.sql.utils.AnalysisException: cannot resolve '`COLUMN`' given input columns: [COLUMN, COLUMN, value, value];;
    'FlatMapCoGroupsInPandas ['COLUMN], ['COLUMN], (column#9L, value#10L, column#13L, value#14L), [column#22L, value#23L]
    :- Project [COLUMN#9L, column#9L, value#10L]
    :  +- LogicalRDD [column#9L, value#10L], false
    +- Project [COLUMN#13L, column#13L, value#14L]
       +- LogicalRDD [column#13L, value#14L], false
    ```

    After:

    ```
    +------+-----+
    |column|Score|
    +------+-----+
    |     1|  0.5|
    +------+-----+
    ```

    ```
    +------+-----+
    |column|value|
    +------+-----+
    |     2|    2|
    +------+-----+
    ```

    ### How was this patch tested?

    Unittests were added and manually tested.

    Closes #28777 from HyukjinKwon/SPARK-31915-another.

    Authored-by: HyukjinKwon
    Signed-off-by: Bryan Cutler
---
 python/pyspark/sql/tests/test_pandas_cogrouped_map.py | 18 +-
 python/pyspark/sql/tests/test_pandas_grouped_map.py   | 10 ++
 .../apache/spark/sql/RelationalGroupedDataset.scala   | 17 ++---
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/sql/tests/test_pandas_cogrouped_map.py b/python/pyspark/sql/tests/test_pandas_cogrouped_map.py
index 3ed9d2a..c1cb30c 100644
--- a/python/pyspark/sql/tests/test_pandas_cogrouped_map.py
+++ b/python/pyspark/sql/tests/test_pandas_cogrouped_map.py
@@ -19,7 +19,7 @@ import unittest
 import sys
 
 from pyspark.sql.functions import array, explode, col, lit, udf, sum, pandas_udf, PandasUDFType
-from pyspark.sql.types import DoubleType, StructType, StructField
+from pyspark.sql.types import DoubleType, StructType, StructField, Row
 from pyspark.testing.sqlutils import ReusedSQLTestCase, have_pandas, have_pyarrow, \
     pandas_requirement_message, pyarrow_requirement_message
 from pyspark.testing.utils import QuietTest
@@ -193,6 +193,22 @@ class CogroupedMapInPandasTests(ReusedSQLTestCase):
         left.groupby('id').cogroup(right.groupby('id')) \
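The resolution strategy described in the commit message (resolve the possibly differently-cased grouping name against the child projection first, then refer to the chosen attribute positionally so the plan never sees the ambiguous name) can be illustrated with a pure-Python sketch. The function name and the simplified matching rule are illustrative only, not Spark's actual analyzer code:

```python
def resolve_grouping_key(name, columns, case_sensitive=False):
    """Return the position of the single column matching `name`,
    honoring case sensitivity. Later plan nodes can then refer to the
    grouping key by position instead of by an ambiguous name."""
    if case_sensitive:
        matches = [i for i, c in enumerate(columns) if c == name]
    else:
        matches = [i for i, c in enumerate(columns) if c.lower() == name.lower()]
    if len(matches) != 1:
        raise ValueError("Reference '%s' is ambiguous or unresolved" % name)
    return matches[0]

# 'COLUMN' resolves case-insensitively to the 'column' attribute and is
# referenced as position 0 from here on, avoiding the duplicate-name clash.
assert resolve_grouping_key("COLUMN", ["column", "Score"]) == 0
```

The key design point mirrored here is positional selection: once the grouping attribute is resolved against the child projection, downstream nodes hold a direct reference rather than re-resolving the (case-folded, now duplicated) name.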
[spark] branch master updated (22dda6e -> 5d78537)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 22dda6e  [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing
     add 5d78537  [SPARK-31942] Revert "[SPARK-31864][SQL] Adjust AQE skew join trigger condition"

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/OptimizeSkewedJoin.scala | 29 --
 1 file changed, 16 insertions(+), 13 deletions(-)
[spark] branch branch-3.0 updated (b9807ac -> 4638402)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b9807ac  Revert "[SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber"
     add 4638402  [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing

No new revisions were added by this update.

Summary of changes:
 .../catalyst/util/DateTimeFormatterHelper.scala    | 26 -
 .../sql/catalyst/util/DateFormatterSuite.scala     | 2 +
 .../sql/catalyst/util/DatetimeFormatterSuite.scala | 78 +++
 .../catalyst/util/TimestampFormatterSuite.scala    | 2 +
 .../sql-tests/inputs/datetime-parsing-invalid.sql  | 20
 ...time-legacy.sql => datetime-parsing-legacy.sql} | 2 +-
 .../sql-tests/inputs/datetime-parsing.sql          | 16 +++
 .../results/datetime-parsing-invalid.sql.out       | 110 +
 .../results/datetime-parsing-legacy.sql.out        | 106
 .../sql-tests/results/datetime-parsing.sql.out     | 106
 10 files changed, 464 insertions(+), 4 deletions(-)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing-invalid.sql
 copy sql/core/src/test/resources/sql-tests/inputs/{datetime-legacy.sql => datetime-parsing-legacy.sql} (61%)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing.sql
 create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-invalid.sql.out
 create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-legacy.sql.out
 create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing.sql.out
[spark] branch master updated (b7ef529 -> 22dda6e)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b7ef529  [SPARK-31964][PYTHON] Use Pandas is_categorical on Arrow category type conversion
     add 22dda6e  [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing

No new revisions were added by this update.

Summary of changes:
 .../catalyst/util/DateTimeFormatterHelper.scala    | 26 -
 .../sql/catalyst/util/DateFormatterSuite.scala     | 2 +
 .../sql/catalyst/util/DatetimeFormatterSuite.scala | 78 +++
 .../catalyst/util/TimestampFormatterSuite.scala    | 2 +
 .../sql-tests/inputs/datetime-parsing-invalid.sql  | 20
 ...time-legacy.sql => datetime-parsing-legacy.sql} | 2 +-
 .../sql-tests/inputs/datetime-parsing.sql          | 16 +++
 .../results/datetime-parsing-invalid.sql.out       | 110 +
 .../results/datetime-parsing-legacy.sql.out        | 106
 .../sql-tests/results/datetime-parsing.sql.out     | 106
 10 files changed, 464 insertions(+), 4 deletions(-)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing-invalid.sql
 copy sql/core/src/test/resources/sql-tests/inputs/{datetime-legacy.sql => datetime-parsing-legacy.sql} (61%)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing.sql
 create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-invalid.sql.out
 create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-legacy.sql.out
 create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing.sql.out
[spark] branch master updated (22dda6e -> 5d78537)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 22dda6e [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing add 5d78537 [SPARK-31942] Revert "[SPARK-31864][SQL] Adjust AQE skew join trigger condition No new revisions were added by this update. Summary of changes: .../execution/adaptive/OptimizeSkewedJoin.scala| 29 -- 1 file changed, 16 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b7ef529 -> 22dda6e)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b7ef529 [SPARK-31964][PYTHON] Use Pandas is_categorical on Arrow category type conversion add 22dda6e [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing No new revisions were added by this update. Summary of changes: .../catalyst/util/DateTimeFormatterHelper.scala| 26 - .../sql/catalyst/util/DateFormatterSuite.scala | 2 + .../sql/catalyst/util/DatetimeFormatterSuite.scala | 78 +++ .../catalyst/util/TimestampFormatterSuite.scala| 2 + .../sql-tests/inputs/datetime-parsing-invalid.sql | 20 ...time-legacy.sql => datetime-parsing-legacy.sql} | 2 +- .../sql-tests/inputs/datetime-parsing.sql | 16 +++ .../results/datetime-parsing-invalid.sql.out | 110 + .../results/datetime-parsing-legacy.sql.out| 106 .../sql-tests/results/datetime-parsing.sql.out | 106 10 files changed, 464 insertions(+), 4 deletions(-) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing-invalid.sql copy sql/core/src/test/resources/sql-tests/inputs/{datetime-legacy.sql => datetime-parsing-legacy.sql} (61%) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing.sql create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-invalid.sql.out create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-legacy.sql.out create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing.sql.out - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b7ef529 -> 22dda6e)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b7ef529 [SPARK-31964][PYTHON] Use Pandas is_categorical on Arrow category type conversion add 22dda6e [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing No new revisions were added by this update. Summary of changes: .../catalyst/util/DateTimeFormatterHelper.scala| 26 - .../sql/catalyst/util/DateFormatterSuite.scala | 2 + .../sql/catalyst/util/DatetimeFormatterSuite.scala | 78 +++ .../catalyst/util/TimestampFormatterSuite.scala| 2 + .../sql-tests/inputs/datetime-parsing-invalid.sql | 20 ...time-legacy.sql => datetime-parsing-legacy.sql} | 2 +- .../sql-tests/inputs/datetime-parsing.sql | 16 +++ .../results/datetime-parsing-invalid.sql.out | 110 + .../results/datetime-parsing-legacy.sql.out| 106 .../sql-tests/results/datetime-parsing.sql.out | 106 10 files changed, 464 insertions(+), 4 deletions(-) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing-invalid.sql copy sql/core/src/test/resources/sql-tests/inputs/{datetime-legacy.sql => datetime-parsing-legacy.sql} (61%) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing.sql create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-invalid.sql.out create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-legacy.sql.out create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing.sql.out - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (b9807ac -> 4638402)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from b9807ac Revert "[SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber" add 4638402 [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing No new revisions were added by this update. Summary of changes: .../catalyst/util/DateTimeFormatterHelper.scala| 26 - .../sql/catalyst/util/DateFormatterSuite.scala | 2 + .../sql/catalyst/util/DatetimeFormatterSuite.scala | 78 +++ .../catalyst/util/TimestampFormatterSuite.scala| 2 + .../sql-tests/inputs/datetime-parsing-invalid.sql | 20 ...time-legacy.sql => datetime-parsing-legacy.sql} | 2 +- .../sql-tests/inputs/datetime-parsing.sql | 16 +++ .../results/datetime-parsing-invalid.sql.out | 110 + .../results/datetime-parsing-legacy.sql.out| 106 .../sql-tests/results/datetime-parsing.sql.out | 106 10 files changed, 464 insertions(+), 4 deletions(-) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing-invalid.sql copy sql/core/src/test/resources/sql-tests/inputs/{datetime-legacy.sql => datetime-parsing-legacy.sql} (61%) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/datetime-parsing.sql create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-invalid.sql.out create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing-legacy.sql.out create mode 100644 sql/core/src/test/resources/sql-tests/results/datetime-parsing.sql.out - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Use 2.4.6 at download page example
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 06509a5 Use 2.4.6 at download page example 06509a5 is described below commit 06509a57b64c889cce85e05f1a6e291ef7a67a83 Author: Dongjoon Hyun AuthorDate: Wed Jun 10 20:12:45 2020 -0700 Use 2.4.6 at download page example --- downloads.md| 2 +- site/downloads.html | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/downloads.md b/downloads.md index f8f47fa..d6c3930 100644 --- a/downloads.md +++ b/downloads.md @@ -42,7 +42,7 @@ Spark artifacts are [hosted in Maven Central](https://search.maven.org/search?q= groupId: org.apache.spark artifactId: spark-core_2.11 -version: 2.4.5 +version: 2.4.6 ### Installing with PyPi PySpark (https://pypi.org/project/pyspark/) is now available in pypi. To install just run `pip install pyspark`. diff --git a/site/downloads.html b/site/downloads.html index b7c123d..1d8a065 100644 --- a/site/downloads.html +++ b/site/downloads.html @@ -242,7 +242,7 @@ You can select and download it above. groupId: org.apache.spark artifactId: spark-core_2.11 -version: 2.4.5 +version: 2.4.6 Installing with PyPi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun merged pull request #267: Use 2.4.6 at download page example
dongjoon-hyun merged pull request #267: URL: https://github.com/apache/spark-website/pull/267 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun commented on pull request #267: Use 2.4.6 at download page example
dongjoon-hyun commented on pull request #267: URL: https://github.com/apache/spark-website/pull/267#issuecomment-642379335 Thanks~ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun commented on pull request #267: Use 2.4.6 at download page example
dongjoon-hyun commented on pull request #267: URL: https://github.com/apache/spark-website/pull/267#issuecomment-642376509 Could you review this, @maropu ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun opened a new pull request #267: Use 2.4.6 at download page example
dongjoon-hyun opened a new pull request #267: URL: https://github.com/apache/spark-website/pull/267 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Fix 2-4-6 web build
This is an automated email from the ASF dual-hosted git repository. holden pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 3d9740f Fix 2-4-6 web build 3d9740f is described below commit 3d9740f38beca3b8609b8650409edb93a70c1aec Author: Holden Karau AuthorDate: Wed Jun 10 18:36:12 2020 -0700 Fix 2-4-6 web build Fix the 2.4.6 web build, the jekyll serve wrote some localhost values in the sitemap we don't want and add the generated release files. Author: Holden Karau Closes #266 from holdenk/spark-2-4-6-rebuild. --- site/mailing-lists.html| 2 +- site/{mailing-lists.html => news/spark-2-4-6.html} | 16 +- .../spark-release-2-4-6.html} | 56 +++- site/sitemap.xml | 370 ++--- 4 files changed, 248 insertions(+), 196 deletions(-) diff --git a/site/mailing-lists.html b/site/mailing-lists.html index 2f4a88f..f6686f9 100644 --- a/site/mailing-lists.html +++ b/site/mailing-lists.html @@ -12,7 +12,7 @@ -http://localhost:4000/community.html; /> +https://spark.apache.org/community.html; /> diff --git a/site/mailing-lists.html b/site/news/spark-2-4-6.html similarity index 94% copy from site/mailing-lists.html copy to site/news/spark-2-4-6.html index 2f4a88f..53d1399 100644 --- a/site/mailing-lists.html +++ b/site/news/spark-2-4-6.html @@ -6,14 +6,11 @@ - Mailing Lists | Apache Spark + Spark 2.4.6 released | Apache Spark - -http://localhost:4000/community.html; /> - @@ -203,7 +200,16 @@ - +Spark 2.4.6 released + + +We are happy to announce the availability of Spark 2.4.6! Visit the release notes to read about the new features, or download the release today. 
+ + + + +Spark News Archive + diff --git a/site/mailing-lists.html b/site/releases/spark-release-2-4-6.html similarity index 68% copy from site/mailing-lists.html copy to site/releases/spark-release-2-4-6.html index 2f4a88f..299cf58 100644 --- a/site/mailing-lists.html +++ b/site/releases/spark-release-2-4-6.html @@ -6,14 +6,11 @@ - Mailing Lists | Apache Spark + Spark Release 2.4.6 | Apache Spark - -http://localhost:4000/community.html; /> - @@ -203,7 +200,56 @@ - +Spark Release 2.4.6 + + +Spark 2.4.6 is a maintenance release containing stability, correctness, and security fixes. This release is based on the branch-2.4 maintenance branch of Spark. We strongly recommend all 2.4 users to upgrade to this stable release. + +Notable changes + + [SPARK-29419] (https://issues.apache.org/jira/browse/SPARK-29419): Seq.toDS / spark.createDataset(Seq) is not thread-safe + [SPARK-31519] (https://issues.apache.org/jira/browse/SPARK-31519): Cast in having aggregate expressions returns the wrong result + [SPARK-26293] (https://issues.apache.org/jira/browse/SPARK-26293): Cast exception when having python udf in subquery + [SPARK-30826] (https://issues.apache.org/jira/browse/SPARK-30826): LIKE returns wrong result from external table using parquet + [SPARK-30857] (https://issues.apache.org/jira/browse/SPARK-30857): Wrong truncations of timestamps before the epoch to hours and days + [SPARK-31256] (https://issues.apache.org/jira/browse/SPARK-31256): Dropna doesn't work for struct columns + [SPARK-31312] (https://issues.apache.org/jira/browse/SPARK-31312): Transforming Hive simple UDF (using JAR) expression may incur CNFE in later evaluation + [SPARK-31420] (https://issues.apache.org/jira/browse/SPARK-31420): Infinite timeline redraw in job details page + [SPARK-31485] (https://issues.apache.org/jira/browse/SPARK-31485): Barrier stage can hang if only partial tasks launched + [SPARK-31500] (https://issues.apache.org/jira/browse/SPARK-31500): collect_set() of BinaryType returns duplicate elements + [SPARK-31503] (https://issues.apache.org/jira/browse/SPARK-31503): fix the SQL string of the TRIM functions + [SPARK-31663] (https://issues.apache.org/jira/browse/SPARK-31663): Grouping sets with having clause returns the wrong result + [SPARK-26908] (https://issues.apache.org/jira/browse/SPARK-26908): Fix toMilis + [SPARK-31563] (https://issues.apache.org/jira/browse/SPARK-31563): Failure of Inset.sql for UTF8String collection + + +Dependency Changes + +While being a maintenance release we did still upgrade some dependencies in this release they are: + + netty-all to 4.1.47.Final ([CVE-2019-20445] (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-20445)) + Janino to 3.0.16 (SQL Generated code) + aws-java-sdk-sts to 1.11.655 (required for kinesis client upgrade) + snappy 1.1.7.5 (stability improvements ppc64le performance) + + +Known issues + + [SPARK-31170] (https://issues.apache.org/jira/browse/SPARK-31170):
[GitHub] [spark-website] asfgit closed pull request #266: Fix 2-4-6 web build
asfgit closed pull request #266: URL: https://github.com/apache/spark-website/pull/266 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] holdenk opened a new pull request #266: Fix 2-4-6 web build
holdenk opened a new pull request #266: URL: https://github.com/apache/spark-website/pull/266 Fix the 2.4.6 web build, the jekyll serve wrote some localhost values in the sitemap we don't want and add the generated release files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c7d45c0 -> b7ef529)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c7d45c0 [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3 add b7ef529 [SPARK-31964][PYTHON] Use Pandas is_categorical on Arrow category type conversion No new revisions were added by this update. Summary of changes: python/pyspark/sql/pandas/serializers.py | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
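[Editor's note] The one-line SPARK-31964 change makes PySpark's Arrow serializer rely on pandas' own categorical-dtype detection. As a hedged sketch of that general idea — not the actual `serializers.py` code path — detecting a categorical series and materializing its category values before handing it to a serializer looks roughly like this:

```python
import pandas as pd

def decategorize(s: pd.Series) -> pd.Series:
    # Detect the categorical dtype via pandas itself (a version-stable check),
    # then cast to the underlying category value dtype so a downstream
    # serializer (e.g. Arrow) sees plain values rather than integer codes.
    if isinstance(s.dtype, pd.CategoricalDtype):
        return s.astype(s.cat.categories.dtype)
    return s

s = pd.Series(["a", "b", "a"], dtype="category")
out = decategorize(s)
assert list(out) == ["a", "b", "a"]
assert not isinstance(out.dtype, pd.CategoricalDtype)
```

The helper name `decategorize` is illustrative only; see `python/pyspark/sql/pandas/serializers.py` for the real change.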
[spark] branch branch-3.0 updated (62fbff8 -> b9807ac)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 62fbff8 [SPARK-31956][SQL] Do not fail if there is no ambiguous self join add b9807ac Revert "[SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber" No new revisions were added by this update. Summary of changes: project/SparkBuild.scala | 3 +- .../sql/hive/thriftserver/SharedThriftServer.scala | 46 ++ .../thriftserver/ThriftServerQueryTestSuite.scala | 3 -- .../ThriftServerWithSparkContextSuite.scala| 11 +- .../service/cli/thrift/ThriftBinaryCLIService.java | 11 +- .../hive/service/cli/thrift/ThriftCLIService.java | 3 -- .../service/cli/thrift/ThriftHttpCLIService.java | 21 +++--- .../service/cli/thrift/ThriftBinaryCLIService.java | 11 +- .../hive/service/cli/thrift/ThriftCLIService.java | 3 -- .../service/cli/thrift/ThriftHttpCLIService.java | 21 +++--- 10 files changed, 29 insertions(+), 104 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c7d45c0 [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3 c7d45c0 is described below commit c7d45c0e0b8c077da8ed4a902503a6102becf255 Author: Dongjoon Hyun AuthorDate: Wed Jun 10 17:36:32 2020 -0700 [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3 ### What changes were proposed in this pull request? This PR updates the test case to accept Hadoop 2/3 error message correctly. ### Why are the changes needed? SPARK-31935(https://github.com/apache/spark/pull/28760) breaks Hadoop 3.2 UT because Hadoop 2 and Hadoop 3 have different exception messages. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with both Hadoop 2/3 or do the following manually. **Hadoop 2.7** ``` $ build/sbt "sql/testOnly *.FileBasedDataSourceSuite -- -z SPARK-31935" ... [info] All tests passed. ``` **Hadoop 3.2** ``` $ build/sbt "sql/testOnly *.FileBasedDataSourceSuite -- -z SPARK-31935" -Phadoop-3.2 ... [info] All tests passed. ``` Closes #28791 from dongjoon-hyun/SPARK-31935. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala index efc7cac..d8157d3 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala @@ -849,15 +849,15 @@ class FileBasedDataSourceSuite extends QueryTest withTempDir { dir => val path = dir.getCanonicalPath val defaultFs = "nonexistFS://nonexistFS" - val expectMessage = "No FileSystem for scheme: nonexistFS" + val expectMessage = "No FileSystem for scheme nonexistFS" val message1 = intercept[java.io.IOException] { spark.range(10).write.option("fs.defaultFS", defaultFs).parquet(path) }.getMessage - assert(message1 == expectMessage) + assert(message1.filterNot(Set(':', '"').contains) == expectMessage) val message2 = intercept[java.io.IOException] { spark.read.option("fs.defaultFS", defaultFs).parquet(path) }.getMessage - assert(message2 == expectMessage) + assert(message2.filterNot(Set(':', '"').contains) == expectMessage) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
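[Editor's note] The Scala fix above makes the test assertion tolerant of the punctuation difference between the Hadoop 2 message (`No FileSystem for scheme: nonexistFS`) and the Hadoop 3 message (`No FileSystem for scheme "nonexistFS"`) by stripping `:` and `"` from the actual message before comparing. The same normalization trick, sketched in Python purely for illustration:

```python
def normalize(msg: str, drop=frozenset(':"')) -> str:
    # Drop the characters that differ between Hadoop 2 and Hadoop 3 error
    # messages so a single expected string matches both variants.
    return "".join(ch for ch in msg if ch not in drop)

hadoop2 = "No FileSystem for scheme: nonexistFS"
hadoop3 = 'No FileSystem for scheme "nonexistFS"'
expected = "No FileSystem for scheme nonexistFS"
assert normalize(hadoop2) == expected
assert normalize(hadoop3) == expected
```

This mirrors the patch's `message.filterNot(Set(':', '"').contains)` in Scala.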
[spark] branch master updated (00d06ca -> 4a25200)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 00d06ca [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs add 4a25200 Revert "[SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber" No new revisions were added by this update. Summary of changes: project/SparkBuild.scala | 3 +- .../sql/hive/thriftserver/SharedThriftServer.scala | 46 ++ .../thriftserver/ThriftServerQueryTestSuite.scala | 3 -- .../ThriftServerWithSparkContextSuite.scala| 11 +- .../service/cli/thrift/ThriftBinaryCLIService.java | 11 +- .../hive/service/cli/thrift/ThriftCLIService.java | 3 -- .../service/cli/thrift/ThriftHttpCLIService.java | 21 +++--- .../service/cli/thrift/ThriftBinaryCLIService.java | 11 +- .../hive/service/cli/thrift/ThriftCLIService.java | 3 -- .../service/cli/thrift/ThriftHttpCLIService.java | 21 +++--- 10 files changed, 29 insertions(+), 104 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs
This is an automated email from the ASF dual-hosted git repository. cutlerb pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 00d06ca [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs 00d06ca is described below commit 00d06cad564d5e3e5f78a687776d02fe0695a861 Author: HyukjinKwon AuthorDate: Wed Jun 10 15:54:07 2020 -0700 [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs ### What changes were proposed in this pull request? This is another approach to fix the issue. See the previous try https://github.com/apache/spark/pull/28745. It was too invasive so I took more conservative approach. This PR proposes to resolve grouping attributes separately first so it can be properly referred when `FlatMapGroupsInPandas` and `FlatMapCoGroupsInPandas` are resolved without ambiguity. Previously, ```python from pyspark.sql.functions import * df = spark.createDataFrame([[1, 1]], ["column", "Score"]) pandas_udf("column integer, Score float", PandasUDFType.GROUPED_MAP) def my_pandas_udf(pdf): return pdf.assign(Score=0.5) df.groupby('COLUMN').apply(my_pandas_udf).show() ``` was failed as below: ``` pyspark.sql.utils.AnalysisException: "Reference 'COLUMN' is ambiguous, could be: COLUMN, COLUMN.;" ``` because the unresolved `COLUMN` in `FlatMapGroupsInPandas` doesn't know which reference to take from the child projection. After this fix, it resolves the child projection first with grouping keys and pass, to `FlatMapGroupsInPandas`, the attribute as a grouping key from the child projection that is positionally selected. ### Why are the changes needed? To resolve grouping keys correctly. ### Does this PR introduce _any_ user-facing change? 
Yes, ```python from pyspark.sql.functions import * df = spark.createDataFrame([[1, 1]], ["column", "Score"]) pandas_udf("column integer, Score float", PandasUDFType.GROUPED_MAP) def my_pandas_udf(pdf): return pdf.assign(Score=0.5) df.groupby('COLUMN').apply(my_pandas_udf).show() ``` ```python df1 = spark.createDataFrame([(1, 1)], ("column", "value")) df2 = spark.createDataFrame([(1, 1)], ("column", "value")) df1.groupby("COLUMN").cogroup( df2.groupby("COLUMN") ).applyInPandas(lambda r, l: r + l, df1.schema).show() ``` Before: ``` pyspark.sql.utils.AnalysisException: Reference 'COLUMN' is ambiguous, could be: COLUMN, COLUMN.; ``` ``` pyspark.sql.utils.AnalysisException: cannot resolve '`COLUMN`' given input columns: [COLUMN, COLUMN, value, value];; 'FlatMapCoGroupsInPandas ['COLUMN], ['COLUMN], (column#9L, value#10L, column#13L, value#14L), [column#22L, value#23L] :- Project [COLUMN#9L, column#9L, value#10L] : +- LogicalRDD [column#9L, value#10L], false +- Project [COLUMN#13L, column#13L, value#14L] +- LogicalRDD [column#13L, value#14L], false ``` After: ``` +--+-+ |column|Score| +--+-+ | 1| 0.5| +--+-+ ``` ``` +--+-+ |column|value| +--+-+ | 2|2| +--+-+ ``` ### How was this patch tested? Unittests were added and manually tested. Closes #28777 from HyukjinKwon/SPARK-31915-another. 
Authored-by: HyukjinKwon Signed-off-by: Bryan Cutler --- python/pyspark/sql/tests/test_pandas_cogrouped_map.py | 18 +- python/pyspark/sql/tests/test_pandas_grouped_map.py| 10 ++ .../apache/spark/sql/RelationalGroupedDataset.scala| 17 ++--- 3 files changed, 37 insertions(+), 8 deletions(-) diff --git a/python/pyspark/sql/tests/test_pandas_cogrouped_map.py b/python/pyspark/sql/tests/test_pandas_cogrouped_map.py index 3ed9d2a..c1cb30c 100644 --- a/python/pyspark/sql/tests/test_pandas_cogrouped_map.py +++ b/python/pyspark/sql/tests/test_pandas_cogrouped_map.py @@ -19,7 +19,7 @@ import unittest import sys from pyspark.sql.functions import array, explode, col, lit, udf, sum, pandas_udf, PandasUDFType -from pyspark.sql.types import DoubleType, StructType, StructField +from pyspark.sql.types import DoubleType, StructType, StructField, Row from pyspark.testing.sqlutils import ReusedSQLTestCase, have_pandas, have_pyarrow, \ pandas_requirement_message, pyarrow_requirement_message from pyspark.testing.utils import QuietTest @@ -193,6 +193,22 @@ class CogroupedMapInPandasTests(ReusedSQLTestCase): left.groupby('id').cogroup(right.groupby('id')) \ .applyInPandas(lambda:
[spark] branch master updated: [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs
This is an automated email from the ASF dual-hosted git repository. cutlerb pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 00d06ca  [SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs

00d06ca is described below

commit 00d06cad564d5e3e5f78a687776d02fe0695a861
Author: HyukjinKwon
AuthorDate: Wed Jun 10 15:54:07 2020 -0700

[SPARK-31915][SQL][PYTHON] Resolve the grouping column properly per the case sensitivity in grouped and cogrouped pandas UDFs

### What changes were proposed in this pull request?

This is another approach to fix the issue. See the previous try at https://github.com/apache/spark/pull/28745. It was too invasive, so I took a more conservative approach.

This PR proposes to resolve grouping attributes separately first so they can be properly referred to when `FlatMapGroupsInPandas` and `FlatMapCoGroupsInPandas` are resolved without ambiguity. Previously,

```python
from pyspark.sql.functions import *
df = spark.createDataFrame([[1, 1]], ["column", "Score"])

@pandas_udf("column integer, Score float", PandasUDFType.GROUPED_MAP)
def my_pandas_udf(pdf):
    return pdf.assign(Score=0.5)

df.groupby('COLUMN').apply(my_pandas_udf).show()
```

failed as below:

```
pyspark.sql.utils.AnalysisException: "Reference 'COLUMN' is ambiguous, could be: COLUMN, COLUMN.;"
```

because the unresolved `COLUMN` in `FlatMapGroupsInPandas` doesn't know which reference to take from the child projection. After this fix, it resolves the child projection first with the grouping keys and passes to `FlatMapGroupsInPandas` the attribute, as a grouping key, from the child projection that is positionally selected.

### Why are the changes needed?

To resolve grouping keys correctly.

### Does this PR introduce _any_ user-facing change?
Yes,

```python
from pyspark.sql.functions import *
df = spark.createDataFrame([[1, 1]], ["column", "Score"])

@pandas_udf("column integer, Score float", PandasUDFType.GROUPED_MAP)
def my_pandas_udf(pdf):
    return pdf.assign(Score=0.5)

df.groupby('COLUMN').apply(my_pandas_udf).show()
```

```python
df1 = spark.createDataFrame([(1, 1)], ("column", "value"))
df2 = spark.createDataFrame([(1, 1)], ("column", "value"))

df1.groupby("COLUMN").cogroup(
    df2.groupby("COLUMN")
).applyInPandas(lambda r, l: r + l, df1.schema).show()
```

Before:

```
pyspark.sql.utils.AnalysisException: Reference 'COLUMN' is ambiguous, could be: COLUMN, COLUMN.;
```

```
pyspark.sql.utils.AnalysisException: cannot resolve '`COLUMN`' given input columns: [COLUMN, COLUMN, value, value];;
'FlatMapCoGroupsInPandas ['COLUMN], ['COLUMN], <lambda>(column#9L, value#10L, column#13L, value#14L), [column#22L, value#23L]
:- Project [COLUMN#9L, column#9L, value#10L]
:  +- LogicalRDD [column#9L, value#10L], false
+- Project [COLUMN#13L, column#13L, value#14L]
   +- LogicalRDD [column#13L, value#14L], false
```

After:

```
+------+-----+
|column|Score|
+------+-----+
|     1|  0.5|
+------+-----+
```

```
+------+-----+
|column|value|
+------+-----+
|     2|    2|
+------+-----+
```

### How was this patch tested?

Unit tests were added and manually tested.

Closes #28777 from HyukjinKwon/SPARK-31915-another.
Authored-by: HyukjinKwon
Signed-off-by: Bryan Cutler
---
 python/pyspark/sql/tests/test_pandas_cogrouped_map.py  | 18 +-
 python/pyspark/sql/tests/test_pandas_grouped_map.py    | 10 ++
 .../apache/spark/sql/RelationalGroupedDataset.scala    | 17 ++---
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/sql/tests/test_pandas_cogrouped_map.py b/python/pyspark/sql/tests/test_pandas_cogrouped_map.py
index 3ed9d2a..c1cb30c 100644
--- a/python/pyspark/sql/tests/test_pandas_cogrouped_map.py
+++ b/python/pyspark/sql/tests/test_pandas_cogrouped_map.py
@@ -19,7 +19,7 @@ import unittest
 import sys

 from pyspark.sql.functions import array, explode, col, lit, udf, sum, pandas_udf, PandasUDFType
-from pyspark.sql.types import DoubleType, StructType, StructField
+from pyspark.sql.types import DoubleType, StructType, StructField, Row
 from pyspark.testing.sqlutils import ReusedSQLTestCase, have_pandas, have_pyarrow, \
     pandas_requirement_message, pyarrow_requirement_message
 from pyspark.testing.utils import QuietTest
@@ -193,6 +193,22 @@ class CogroupedMapInPandasTests(ReusedSQLTestCase):
         left.groupby('id').cogroup(right.groupby('id')) \
             .applyInPandas(lambda:
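The ambiguity this commit fixes comes from case-insensitive identifier resolution: with `spark.sql.caseSensitive=false` (the default), a reference like `'COLUMN'` must match exactly one attribute. The sketch below is a loose pure-Python illustration of that rule, not Spark's actual analyzer; the function name and error strings are made up for illustration.

```python
def resolve(name, attributes, case_sensitive=False):
    """Loosely mimic analyzer-style identifier resolution: return the
    single attribute matching `name`, or raise if zero or several match."""
    if case_sensitive:
        matches = [a for a in attributes if a == name]
    else:
        matches = [a for a in attributes if a.lower() == name.lower()]
    if len(matches) == 0:
        raise ValueError("cannot resolve '%s' given input columns: %s"
                         % (name, attributes))
    if len(matches) > 1:
        raise ValueError("Reference '%s' is ambiguous, could be: %s"
                         % (name, ", ".join(matches)))
    return matches[0]

# Before the fix, the child projection carried both the resolved grouping
# key and the original column, so 'COLUMN' matched twice and resolution
# failed; after the fix the grouping key is resolved up front and passed
# positionally, leaving a single candidate:
assert resolve("COLUMN", ["column", "Score"]) == "column"
```

Under this model, `resolve("COLUMN", ["COLUMN", "column", "Score"])` raises the "ambiguous" error, which is the shape of the failure shown above.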
[spark] branch master updated (c400519 -> 2ab82fa)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c400519 [SPARK-31956][SQL] Do not fail if there is no ambiguous self join add 2ab82fa [SPARK-31963][PYSPARK][SQL] Support both pandas 0.23 and 1.0 in serializers.py No new revisions were added by this update. Summary of changes: python/pyspark/sql/pandas/serializers.py | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
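Supporting two pandas major versions usually comes down to gating a code path on the installed version. The snippet below is a minimal, self-contained sketch of that pattern; it does not reproduce the actual logic in `serializers.py` (Spark itself compares versions with `LooseVersion`), and the path labels are illustrative.

```python
def version_tuple(version):
    """Parse a dotted version string into a comparable tuple of ints,
    stopping at the first non-numeric component (a minimal
    LooseVersion stand-in)."""
    parts = []
    for piece in version.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)

def pick_pandas_code_path(pandas_version):
    """Choose a compatibility path per pandas version, as shims that
    must support both pandas 0.23 and 1.0 typically do."""
    if version_tuple(pandas_version) >= (0, 24):
        return "new-path"   # behavior available in newer pandas
    return "old-path"       # fallback kept alive for pandas 0.23
```

For example, `pick_pandas_code_path("0.23.2")` takes the fallback branch while `pick_pandas_code_path("1.0.4")` takes the newer one, so a single build works against either installation.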
[spark] branch branch-3.0 updated: [SPARK-31956][SQL] Do not fail if there is no ambiguous self join
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 62fbff8  [SPARK-31956][SQL] Do not fail if there is no ambiguous self join

62fbff8 is described below

commit 62fbff8ad127f3a6dd2360f3c02a20f4391cdad4
Author: Wenchen Fan
AuthorDate: Wed Jun 10 13:11:24 2020 -0700

[SPARK-31956][SQL] Do not fail if there is no ambiguous self join

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/28695, to fix the problem completely.

The root cause is that `df("col").as("name")` is not a column reference anymore, and should not have the special column metadata. However, this was broken in https://github.com/apache/spark/commit/ba7adc494923de8104ab37d412edd78afe540f45#diff-ac415c903887e49486ba542a65eec980L1050-L1053

This PR fixes the regression by stripping the special column metadata in `Column.name`, which is the behavior before https://github.com/apache/spark/pull/28326.

### Why are the changes needed?

Fix a regression. We shouldn't fail if there is no ambiguous self-join.

### Does this PR introduce _any_ user-facing change?

Yes, the query in the test can run now.

### How was this patch tested?

updated test

Closes #28783 from cloud-fan/self-join.
Authored-by: Wenchen Fan
Signed-off-by: Dongjoon Hyun
(cherry picked from commit c40051932290db3a63f80324900a116019b1e589)
Signed-off-by: Dongjoon Hyun
---
 sql/core/src/main/scala/org/apache/spark/sql/Column.scala        | 2 +-
 .../test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala | 7 ++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Column.scala b/sql/core/src/main/scala/org/apache/spark/sql/Column.scala
index 2144472..e6f7b1d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Column.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Column.scala
@@ -1042,7 +1042,7 @@ class Column(val expr: Expression) extends Logging {
    * @since 2.0.0
    */
   def name(alias: String): Column = withExpr {
-    Alias(expr, alias)()
+    Alias(normalizedExpr(), alias)()
   }

   /**
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala
index fb58c98..3b3b54f 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala
@@ -204,7 +204,7 @@ class DataFrameSelfJoinSuite extends QueryTest with SharedSparkSession {
     }
   }

-  test("SPARK-28344: don't fail as ambiguous self join when there is no join") {
+  test("SPARK-28344: don't fail if there is no ambiguous self join") {
     withSQLConf(
       SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true") {
       val df = Seq(1, 1, 2, 2).toDF("a")
@@ -212,6 +212,11 @@ class DataFrameSelfJoinSuite extends QueryTest with SharedSparkSession {
       checkAnswer(
         df.select(df("a").alias("x"), sum(df("a")).over(w)),
         Seq((1, 2), (1, 2), (2, 4), (2, 4)).map(Row.fromTuple))
+
+      val joined = df.join(spark.range(1)).select($"a")
+      checkAnswer(
+        joined.select(joined("a").alias("x"), sum(joined("a")).over(w)),
+        Seq((1, 2), (1, 2), (2, 4), (2, 4)).map(Row.fromTuple))
     }
   }
 }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
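The essence of the fix is that aliasing a column should strip the special metadata used for ambiguous self-join detection, because an aliased column is no longer a direct dataset reference. The toy Python model below illustrates that idea only; the metadata key name is hypothetical and this is not Spark's `Column` implementation.

```python
class Column:
    """Toy model of the fix: aliasing drops the self-join detection
    metadata, mirroring Alias(normalizedExpr(), alias)()."""
    DATASET_ID_KEY = "__dataset_id"  # hypothetical key, not Spark's real one

    def __init__(self, name, metadata=None):
        self.name = name
        self.metadata = dict(metadata or {})

    def alias(self, new_name):
        # Drop the special metadata before wrapping in an alias, so a
        # later self-join check no longer sees the alias as a direct
        # reference to the originating dataset.
        cleaned = {k: v for k, v in self.metadata.items()
                   if k != self.DATASET_ID_KEY}
        return Column(new_name, cleaned)
```

With this model, a column tagged with the dataset id loses that tag once aliased, so a query with no actual ambiguous self-join is no longer rejected.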
[spark] branch master updated (43063e2 -> c400519)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 43063e2 [SPARK-27217][SQL] Nested column aliasing for more operators which can prune nested column add c400519 [SPARK-31956][SQL] Do not fail if there is no ambiguous self join No new revisions were added by this update. Summary of changes: sql/core/src/main/scala/org/apache/spark/sql/Column.scala | 2 +- .../test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala | 7 ++- 2 files changed, 7 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (82ff29b -> 43063e2)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 82ff29b [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore add 43063e2 [SPARK-27217][SQL] Nested column aliasing for more operators which can prune nested column No new revisions were added by this update. Summary of changes: .../catalyst/optimizer/NestedColumnAliasing.scala | 35 +--- .../optimizer/NestedColumnAliasingSuite.scala | 94 ++ .../execution/datasources/SchemaPruningSuite.scala | 71 3 files changed, 190 insertions(+), 10 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
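Nested column aliasing lets the optimizer read only the struct fields a query actually touches (e.g. `name.first`) instead of the whole struct. The snippet below sketches that pruning idea on a dict-based schema; it is an illustration of the concept, not Catalyst's `NestedColumnAliasing` rule.

```python
def prune_nested_schema(schema, required_paths):
    """Keep only the leaf fields reachable from `required_paths`
    (dotted paths like 'name.first'). Nested structs are modeled as
    plain dicts, leaf types as strings."""
    pruned = {}
    for field, field_type in schema.items():
        if isinstance(field_type, dict):
            # Collect the sub-paths under this struct and recurse.
            sub_paths = {p.split(".", 1)[1] for p in required_paths
                         if p.startswith(field + ".")}
            if sub_paths:
                pruned[field] = prune_nested_schema(field_type, sub_paths)
        elif field in required_paths:
            pruned[field] = field_type
    return pruned
```

For a schema `{"id": "long", "name": {"first": "string", "last": "string"}}` and a query touching only `name.first`, the pruned schema keeps just that leaf, which is what lets a columnar source skip reading the other nested columns.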
[spark] branch branch-2.4 updated: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 53f1349  [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore

53f1349 is described below

commit 53f1349e768be66a92542c3ebf0493ffb779ed91
Author: SaurabhChawla
AuthorDate: Wed Jun 10 16:51:19 2020 +0900

[SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore

### What changes were proposed in this pull request?

After SPARK-31632, SparkException is thrown from `applicationInfo`:

```
def applicationInfo(): v1.ApplicationInfo = {
  try {
    // The ApplicationInfo may not be available when Spark is starting up.
    store.view(classOf[ApplicationInfoWrapper]).max(1).iterator().next().info
  } catch {
    case _: NoSuchElementException =>
      throw new SparkException("Failed to get the application information. " +
        "If you are starting up Spark, please wait a while until it's ready.")
  }
}
```

whereas the caller of this method, `getSparkUser` in the Spark UI, does not handle SparkException in its catch:

```
def getSparkUser: String = {
  try {
    Option(store.applicationInfo().attempts.head.sparkUser)
      .orElse(store.environmentInfo().systemProperties.toMap.get("user.name"))
      .getOrElse("")
  } catch {
    case _: NoSuchElementException => ""
  }
}
```

So, on using `getSparkUser`, the application can error out. As part of this PR, we replace SparkException with NoSuchElementException for `applicationInfo` in AppStatusStore.

### Why are the changes needed?

On invoking `getSparkUser`, we can get a SparkException from `store.applicationInfo()`. This is not handled in the catch block, and `getSparkUser` will error out in this scenario.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
Done the manual testing using the spark-shell and spark-submit.

Closes #28768 from SaurabhChawla100/SPARK-31941.

Authored-by: SaurabhChawla
Signed-off-by: Kousuke Saruta
(cherry picked from commit 82ff29be7afa2ff6350310ab9bdf6b474398fdc1)
Signed-off-by: Kousuke Saruta
---
 core/src/main/scala/org/apache/spark/status/AppStatusStore.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
index e2086d6..8919dab 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
@@ -40,7 +40,7 @@ private[spark] class AppStatusStore(
       store.view(classOf[ApplicationInfoWrapper]).max(1).iterator().next().info
     } catch {
       case _: NoSuchElementException =>
-        throw new SparkException("Failed to get the application information. " +
+        throw new NoSuchElementException("Failed to get the application information. " +
           "If you are starting up Spark, please wait a while until it's ready.")
     }
   }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
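The general lesson of this fix is an exception-type contract: a wrapper should rethrow the exception type its callers already handle, or their fallback paths silently break. The toy Python sketch below models that contract using `KeyError` as a stand-in for `NoSuchElementException`; the class and function names mirror the Spark code only loosely.

```python
class ToyAppStatusStore:
    """Stand-in for AppStatusStore: application info may not be
    available while the application is starting up."""

    def __init__(self, info=None):
        self._info = info

    def application_info(self):
        if self._info is None:
            # The fix in a nutshell: raise the type the caller already
            # catches (here KeyError), not a broader exception type
            # that would escape the caller's fallback handler.
            raise KeyError("application info not ready yet")
        return self._info

def get_spark_user(store):
    """Stand-in for getSparkUser: falls back to "" when the info is
    not yet available, relying on the exception type being stable."""
    try:
        return store.application_info()["user"]
    except KeyError:
        return ""
```

Before the fix, the equivalent of `application_info` raised a different type (`SparkException`), so the `except` clause here would not trigger and the UI call errored out instead of returning the empty-string fallback.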
[spark] branch branch-3.0 updated (89b1d46 -> 9ba9d85)
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 89b1d46 [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list add 9ba9d85 [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8490eab -> 82ff29b)
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8490eab [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry for config "spark.standalone.submit.waitAppCompletion" add 82ff29b [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 53f1349  [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore

53f1349 is described below

commit 53f1349e768be66a92542c3ebf0493ffb779ed91
Author: SaurabhChawla
AuthorDate: Wed Jun 10 16:51:19 2020 +0900

    [SPARK-31941][CORE] Replace SparkException to NoSuchElementException for applicationInfo in AppStatusStore

    ### What changes were proposed in this pull request?
    After SPARK-31632, a SparkException is thrown from `applicationInfo`:

    ```
    def applicationInfo(): v1.ApplicationInfo = {
      try {
        // The ApplicationInfo may not be available when Spark is starting up.
        store.view(classOf[ApplicationInfoWrapper]).max(1).iterator().next().info
      } catch {
        case _: NoSuchElementException =>
          throw new SparkException("Failed to get the application information. " +
            "If you are starting up Spark, please wait a while until it's ready.")
      }
    }
    ```

    whereas the caller of this method in the Spark UI, `getSparkUser`, does not handle SparkException in its catch block:

    ```
    def getSparkUser: String = {
      try {
        Option(store.applicationInfo().attempts.head.sparkUser)
          .orElse(store.environmentInfo().systemProperties.toMap.get("user.name"))
          .getOrElse("")
      } catch {
        case _: NoSuchElementException => ""
      }
    }
    ```

    So a call to `getSparkUser` can error out the application. As part of this PR we replace the SparkException with a NoSuchElementException for `applicationInfo` in AppStatusStore.

    ### Why are the changes needed?
    Invoking `getSparkUser` can surface the SparkException thrown by `store.applicationInfo()`. Since that exception is not handled in the catch block, `getSparkUser` errors out in this scenario.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Done the manual testing using the spark-shell and spark-submit.

Closes #28768 from SaurabhChawla100/SPARK-31941.

Authored-by: SaurabhChawla
Signed-off-by: Kousuke Saruta
(cherry picked from commit 82ff29be7afa2ff6350310ab9bdf6b474398fdc1)
Signed-off-by: Kousuke Saruta
---
 core/src/main/scala/org/apache/spark/status/AppStatusStore.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
index e2086d6..8919dab 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
@@ -40,7 +40,7 @@ private[spark] class AppStatusStore(
       store.view(classOf[ApplicationInfoWrapper]).max(1).iterator().next().info
     } catch {
       case _: NoSuchElementException =>
-        throw new SparkException("Failed to get the application information. " +
+        throw new NoSuchElementException("Failed to get the application information. " +
           "If you are starting up Spark, please wait a while until it's ready.")
     }
   }
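The failure mode the commit message describes can be reproduced outside Spark. The sketch below uses hypothetical Java stand-ins (not Spark's actual classes; a plain RuntimeException stands in for SparkException) to show why a caller that only catches NoSuchElementException errors out when the callee rethrows a broader type:

```java
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical stand-ins for AppStatusStore.applicationInfo and
// getSparkUser, illustrating the SPARK-31941 fix.
public class ExceptionPropagationSketch {

    // Before the fix: failure rethrown as a generic RuntimeException
    // (standing in for SparkException).
    static String applicationInfoBefore() {
        throw new RuntimeException("Failed to get the application information.");
    }

    // After the fix: rethrown as NoSuchElementException, which existing
    // callers already handle.
    static String applicationInfoAfter() {
        throw new NoSuchElementException("Failed to get the application information.");
    }

    // getSparkUser-style caller: only NoSuchElementException is caught.
    static String getSparkUser(Supplier<String> appInfo) {
        try {
            return appInfo.get();
        } catch (NoSuchElementException e) {
            return ""; // degrade gracefully while the app is starting up
        }
    }

    public static void main(String[] args) {
        // After the fix the caller degrades to an empty user name.
        System.out.println("[" + getSparkUser(ExceptionPropagationSketch::applicationInfoAfter) + "]");
        // Before the fix the RuntimeException escaped the handler entirely.
        try {
            getSparkUser(ExceptionPropagationSketch::applicationInfoBefore);
        } catch (RuntimeException e) {
            System.out.println("uncaught: " + e.getMessage());
        }
    }
}
```

Because `NoSuchElementException` is itself a `RuntimeException`, narrowing the thrown type keeps the caller's existing catch clause sufficient without widening its signature.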
[spark] branch master updated (032d179 -> 8490eab)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 032d179  [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function
 add 8490eab  [SPARK-31486][CORE][FOLLOW-UP] Use ConfigEntry for config "spark.standalone.submit.waitAppCompletion"

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/Client.scala      | 3 +--
 .../main/scala/org/apache/spark/internal/config/package.scala | 9 +
 2 files changed, 10 insertions(+), 2 deletions(-)
[spark] branch master updated (e14029b -> 032d179)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e14029b  [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
 add 032d179  [SPARK-31945][SQL][PYSPARK] Enable cache for the same Python function

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/api/python/PythonRDD.scala    | 16 ++--
 .../scala/org/apache/spark/api/python/PythonRunner.scala | 2 +-
 python/pyspark/sql/tests/test_udf.py                     | 9 +
 .../spark/sql/execution/python/PythonUDFRunner.scala     | 2 +-
 4 files changed, 25 insertions(+), 4 deletions(-)
[spark] branch branch-3.0 updated (4b625bd -> 89b1d46)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 4b625bd  [SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber
 add 89b1d46  [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

No new revisions were added by this update.

Summary of changes:
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4   | 1 +
 .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala | 1 +
 2 files changed, 2 insertions(+)
[spark] branch master updated (f3771c6 -> e14029b)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from f3771c6  [SPARK-31935][SQL] Hadoop file system config should be effective in data source options
 add e14029b  [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

No new revisions were added by this update.

Summary of changes:
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4   | 1 +
 .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala | 1 +
 2 files changed, 2 insertions(+)
[spark] branch branch-3.0 updated: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 89b1d46  [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

89b1d46 is described below

commit 89b1d4614ef1a3d15ff0f1e745c770ebd8f5cddb
Author: Takeshi Yamamuro
AuthorDate: Wed Jun 10 16:29:43 2020 +0900

    [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

    ### What changes were proposed in this pull request?
    This PR intends to add `TYPE` to the ANSI non-reserved list because it is not reserved in the standard. See SPARK-26905 for the full set of reserved/non-reserved keywords of `SQL:2016`.

    Note: the current master behaviour is as follows;
    ```
    scala> sql("SET spark.sql.ansi.enabled=false")
    scala> sql("create table t1 (type int)")
    res4: org.apache.spark.sql.DataFrame = []

    scala> sql("SET spark.sql.ansi.enabled=true")
    scala> sql("create table t2 (type int)")
    org.apache.spark.sql.catalyst.parser.ParseException:
    no viable alternative at input 'type'(line 1, pos 17)

    == SQL ==
    create table t2 (type int)
    -^^^
    ```

    ### Why are the changes needed?
    To follow the ANSI/SQL standard.

    ### Does this PR introduce _any_ user-facing change?
    Yes; it lets users use `TYPE` as an identifier.

    ### How was this patch tested?
    Updated the keyword lists in `TableIdentifierParserSuite`.

Closes #28773 from maropu/SPARK-26905.

Authored-by: Takeshi Yamamuro
Signed-off-by: Takeshi Yamamuro
(cherry picked from commit e14029b18df10db5094f8abf8b9874dbc9186b4e)
Signed-off-by: Takeshi Yamamuro
---
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4   | 1 +
 .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala | 1 +
 2 files changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 2adaa9f..208a503 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -1153,6 +1153,7 @@ ansiNonReserved
     | TRIM
     | TRUE
     | TRUNCATE
+    | TYPE
     | UNARCHIVE
     | UNBOUNDED
     | UNCACHE
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
index d5b0885..bd617bf 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
@@ -513,6 +513,7 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
     "transform",
     "true",
     "truncate",
+    "type",
     "unarchive",
     "unbounded",
     "uncache",
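The distinction this commit relies on, a keyword the grammar recognizes but still admits as an identifier, can be modelled outside ANTLR. The toy validator below (hypothetical class and method names, not Spark's parser) sketches why moving `TYPE` into a non-reserved list makes `create table t (type int)` parse again:

```java
import java.util.Set;

// Toy model of reserved vs. non-reserved keywords (illustration only, not
// Spark's ANTLR grammar): reserved words can never name a column, while
// non-reserved words are still keywords but remain legal identifiers.
public class KeywordSketch {
    static final Set<String> RESERVED = Set.of("select", "from", "where");
    // Including "type" here mirrors the SPARK-26905 addition to ansiNonReserved.
    static final Set<String> NON_RESERVED = Set.of("truncate", "type", "unbounded");

    // A parser still treats these specially in keyword positions...
    static boolean isKeyword(String word) {
        String w = word.toLowerCase();
        return RESERVED.contains(w) || NON_RESERVED.contains(w);
    }

    // ...but only fully reserved words are barred as identifiers.
    static boolean validIdentifier(String word) {
        return !RESERVED.contains(word.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(validIdentifier("type"));   // usable as a column name
        System.out.println(validIdentifier("select")); // reserved, rejected
    }
}
```

Under ANSI mode Spark rejects identifiers that fall in the reserved set, so adding `TYPE` to the non-reserved rule is what restores the pre-ANSI behaviour shown in the commit's scala-shell transcript.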