[spark] branch branch-3.2 updated: [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 864b89b  [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
864b89b is described below

commit 864b89bbea8c2620938b28661fc8bea6271c7048
Author: Hyukjin Kwon
AuthorDate: Fri Mar 18 14:00:48 2022 +0900

    [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5

    This PR is a retry of https://github.com/apache/spark/pull/35871, bumping the version up to 0.10.9.5. That PR was reverted because Python 3.10 support was broken, and Python 3.10 was not officially supported in Py4J. In Py4J 0.10.9.5, the issue was fixed (https://github.com/py4j/py4j/pull/475), and Python 3.10 support was added officially with CI set up (https://github.com/py4j/py4j/pull/477).

    For why the changes are needed and whether this PR introduces any user-facing change, see https://github.com/apache/spark/pull/35871.

    Py4J sets up Python 3.10 CI now, and I manually tested PySpark with Python 3.10 with this patch:

    ```bash
    ./bin/pyspark
    ```
    ```
    import py4j
    py4j.__version__
    spark.range(10).show()
    ```
    ```
    Using Python version 3.10.0 (default, Mar 3 2022 03:57:21)
    Spark context Web UI available at http://172.30.5.50:4040
    Spark context available as 'sc' (master = local[*], app id = local-1647571387534).
    SparkSession available as 'spark'.
    >>> import py4j
    >>> py4j.__version__
    '0.10.9.5'
    >>> spark.range(10).show()
    +---+
    | id|
    +---+
    ...
    ```

    Closes #35907 from HyukjinKwon/SPARK-38563-followup.
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 97335ea037a9a036c013c86ef62d74ca638f808e) Signed-off-by: Hyukjin Kwon --- bin/pyspark| 2 +- bin/pyspark2.cmd | 2 +- core/pom.xml | 2 +- .../org/apache/spark/api/python/PythonUtils.scala | 2 +- dev/deps/spark-deps-hadoop-2.7-hive-2.3| 2 +- dev/deps/spark-deps-hadoop-3.2-hive-2.3| 2 +- docs/job-scheduling.md | 2 +- python/docs/Makefile | 2 +- python/docs/make2.bat | 2 +- python/docs/source/getting_started/install.rst | 2 +- python/lib/py4j-0.10.9.3-src.zip | Bin 42021 -> 0 bytes python/lib/py4j-0.10.9.5-src.zip | Bin 0 -> 42404 bytes python/pyspark/context.py | 6 ++-- python/pyspark/util.py | 32 +++-- python/setup.py| 2 +- sbin/spark-config.sh | 2 +- 16 files changed, 19 insertions(+), 43 deletions(-) diff --git a/bin/pyspark b/bin/pyspark index 4840589..21a514e 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -50,7 +50,7 @@ export PYSPARK_DRIVER_PYTHON_OPTS # Add the PySpark classes to the Python path: export PYTHONPATH="${SPARK_HOME}/python/:$PYTHONPATH" -export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH" +export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.5-src.zip:$PYTHONPATH" # Load the PySpark shell.py script when ./pyspark is used interactively: export OLD_PYTHONSTARTUP="$PYTHONSTARTUP" diff --git a/bin/pyspark2.cmd b/bin/pyspark2.cmd index a19627a..eec02a4 100644 --- a/bin/pyspark2.cmd +++ b/bin/pyspark2.cmd @@ -30,7 +30,7 @@ if "x%PYSPARK_DRIVER_PYTHON%"=="x" ( ) set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH% -set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.3-src.zip;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.5-src.zip;%PYTHONPATH% set OLD_PYTHONSTARTUP=%PYTHONSTARTUP% set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py diff --git a/core/pom.xml b/core/pom.xml index 3833794..4926664 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -433,7 +433,7 @@ net.sf.py4j py4j - 0.10.9.3 + 0.10.9.5 org.apache.spark diff --git 
a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index 8daba86..6336171 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala @@ -27,7 +27,7 @@ import org.apache.spark.SparkContext import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} private[spark] object PythonUtils { - val PY4J_ZIP_NAME = "py4j-0.10.9.3-src.zip" + val PY4J_ZIP_NAME = "py4j-0.10.9.5-src.zip" /** Get the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our JAR */ def sparkPythonPath: String = { diff --git
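The bump above changes the same archive name in several places at once: `PY4J_ZIP_NAME` in `PythonUtils.scala`, the `PYTHONPATH` exports in `bin/pyspark` and `bin/pyspark2.cmd`, and `python/setup.py`. As a sketch (the helper below is hypothetical, not part of PySpark), the path all of these files must agree on looks like:

```python
import os

# Version pinned by this commit; PythonUtils.scala, bin/pyspark, and
# setup.py must all reference the same zip name.
PY4J_VERSION = "0.10.9.5"
PY4J_ZIP_NAME = f"py4j-{PY4J_VERSION}-src.zip"

def py4j_path(spark_home: str) -> str:
    """Mirror of the PYTHONPATH entry that bin/pyspark exports."""
    return os.path.join(spark_home, "python", "lib", PY4J_ZIP_NAME)
```

If any one of the touched files kept the old `0.10.9.3` name, PySpark would fail at startup with an import error for `py4j`, which is why the upgrade has to land atomically across all of them.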
[spark] branch branch-3.3 updated: [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 7bb1d6f  [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
7bb1d6f is described below

commit 7bb1d6f01148b037acad12de8166cf742cd30ea3
Author: Hyukjin Kwon
AuthorDate: Fri Mar 18 14:00:48 2022 +0900

    [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5

    ### What changes were proposed in this pull request?
    This PR is a retry of https://github.com/apache/spark/pull/35871, bumping the version up to 0.10.9.5. That PR was reverted because Python 3.10 support was broken, and Python 3.10 was not officially supported in Py4J. In Py4J 0.10.9.5, the issue was fixed (https://github.com/py4j/py4j/pull/475), and Python 3.10 support was added officially with CI set up (https://github.com/py4j/py4j/pull/477).

    ### Why are the changes needed?
    See https://github.com/apache/spark/pull/35871

    ### Does this PR introduce _any_ user-facing change?
    See https://github.com/apache/spark/pull/35871

    ### How was this patch tested?
    Py4J sets up Python 3.10 CI now, and I manually tested PySpark with Python 3.10 with this patch:

    ```bash
    ./bin/pyspark
    ```
    ```
    import py4j
    py4j.__version__
    spark.range(10).show()
    ```
    ```
    Using Python version 3.10.0 (default, Mar 3 2022 03:57:21)
    Spark context Web UI available at http://172.30.5.50:4040
    Spark context available as 'sc' (master = local[*], app id = local-1647571387534).
    SparkSession available as 'spark'.
    >>> import py4j
    >>> py4j.__version__
    '0.10.9.5'
    >>> spark.range(10).show()
    +---+
    | id|
    +---+
    ...
    ```

    Closes #35907 from HyukjinKwon/SPARK-38563-followup.
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 97335ea037a9a036c013c86ef62d74ca638f808e) Signed-off-by: Hyukjin Kwon --- bin/pyspark| 2 +- bin/pyspark2.cmd | 2 +- core/pom.xml | 2 +- .../org/apache/spark/api/python/PythonUtils.scala | 2 +- dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- docs/job-scheduling.md | 2 +- python/docs/Makefile | 2 +- python/docs/make2.bat | 2 +- python/docs/source/getting_started/install.rst | 2 +- python/lib/py4j-0.10.9.3-src.zip | Bin 42021 -> 0 bytes python/lib/py4j-0.10.9.5-src.zip | Bin 0 -> 42404 bytes python/pyspark/context.py | 6 ++-- python/pyspark/util.py | 35 +++-- python/setup.py| 2 +- sbin/spark-config.sh | 2 +- 16 files changed, 20 insertions(+), 45 deletions(-) diff --git a/bin/pyspark b/bin/pyspark index 4840589..21a514e 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -50,7 +50,7 @@ export PYSPARK_DRIVER_PYTHON_OPTS # Add the PySpark classes to the Python path: export PYTHONPATH="${SPARK_HOME}/python/:$PYTHONPATH" -export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH" +export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.5-src.zip:$PYTHONPATH" # Load the PySpark shell.py script when ./pyspark is used interactively: export OLD_PYTHONSTARTUP="$PYTHONSTARTUP" diff --git a/bin/pyspark2.cmd b/bin/pyspark2.cmd index a19627a..eec02a4 100644 --- a/bin/pyspark2.cmd +++ b/bin/pyspark2.cmd @@ -30,7 +30,7 @@ if "x%PYSPARK_DRIVER_PYTHON%"=="x" ( ) set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH% -set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.3-src.zip;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.5-src.zip;%PYTHONPATH% set OLD_PYTHONSTARTUP=%PYTHONSTARTUP% set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py diff --git a/core/pom.xml b/core/pom.xml index 9d3b170..a753a59 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -423,7 +423,7 @@ net.sf.py4j py4j - 0.10.9.3 + 0.10.9.5 org.apache.spark diff --git 
a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index 8daba86..6336171 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala @@ -27,7 +27,7 @@ import org.apache.spark.SparkContext import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} private[spark] object PythonUtils { - val PY4J_ZIP_NAME = "py4j-0.10.9.3-src.zip" + val
[spark] branch master updated (f36a5fb -> 97335ea)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from  f36a5fb  [SPARK-38194] Followup: Fix k8s memory overhead passing to executor pods
   add  97335ea  [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5

No new revisions were added by this update.

Summary of changes:
 bin/pyspark                                       |   2 +-
 bin/pyspark2.cmd                                  |   2 +-
 core/pom.xml                                      |   2 +-
 .../org/apache/spark/api/python/PythonUtils.scala |   2 +-
 dev/deps/spark-deps-hadoop-2-hive-2.3             |   2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3             |   2 +-
 docs/job-scheduling.md                            |   2 +-
 python/docs/Makefile                              |   2 +-
 python/docs/make2.bat                             |   2 +-
 python/docs/source/getting_started/install.rst    |   2 +-
 python/lib/py4j-0.10.9.3-src.zip                  | Bin 42021 -> 0 bytes
 python/lib/py4j-0.10.9.5-src.zip                  | Bin 0 -> 42404 bytes
 python/pyspark/context.py                         |   6 ++--
 python/pyspark/util.py                            |  35 +++--
 python/setup.py                                   |   2 +-
 sbin/spark-config.sh                              |   2 +-
 16 files changed, 20 insertions(+), 45 deletions(-)
 delete mode 100644 python/lib/py4j-0.10.9.3-src.zip
 create mode 100644 python/lib/py4j-0.10.9.5-src.zip

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 8b73b72 Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4" 8b73b72 is described below commit 8b73b72770996ae4a81f092bddb01f1c26346efd Author: Hyukjin Kwon AuthorDate: Fri Mar 18 10:22:55 2022 +0900 Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4" This reverts commit 69903200845b68a0474ecb0a3317dc744490c521. --- bin/pyspark| 2 +- bin/pyspark2.cmd | 2 +- core/pom.xml | 2 +- .../org/apache/spark/api/python/PythonUtils.scala | 2 +- dev/deps/spark-deps-hadoop-2.7-hive-2.3| 2 +- dev/deps/spark-deps-hadoop-3.2-hive-2.3| 2 +- docs/job-scheduling.md | 2 +- python/docs/Makefile | 2 +- python/docs/make2.bat | 2 +- python/docs/source/getting_started/install.rst | 2 +- python/lib/py4j-0.10.9.3-src.zip | Bin 0 -> 42021 bytes python/lib/py4j-0.10.9.4-src.zip | Bin 42404 -> 0 bytes python/pyspark/context.py | 6 ++-- python/pyspark/util.py | 33 + python/setup.py| 2 +- sbin/spark-config.sh | 2 +- 16 files changed, 43 insertions(+), 20 deletions(-) diff --git a/bin/pyspark b/bin/pyspark index 1e16c56..4840589 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -50,7 +50,7 @@ export PYSPARK_DRIVER_PYTHON_OPTS # Add the PySpark classes to the Python path: export PYTHONPATH="${SPARK_HOME}/python/:$PYTHONPATH" -export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.4-src.zip:$PYTHONPATH" +export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH" # Load the PySpark shell.py script when ./pyspark is used interactively: export OLD_PYTHONSTARTUP="$PYTHONSTARTUP" diff --git a/bin/pyspark2.cmd b/bin/pyspark2.cmd index f20c320..a19627a 100644 --- a/bin/pyspark2.cmd +++ b/bin/pyspark2.cmd @@ -30,7 +30,7 @@ if "x%PYSPARK_DRIVER_PYTHON%"=="x" ( ) set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH% -set 
PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.4-src.zip;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.3-src.zip;%PYTHONPATH% set OLD_PYTHONSTARTUP=%PYTHONSTARTUP% set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py diff --git a/core/pom.xml b/core/pom.xml index 94b3e58..3833794 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -433,7 +433,7 @@ net.sf.py4j py4j - 0.10.9.4 + 0.10.9.3 org.apache.spark diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index a9c35369..8daba86 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala @@ -27,7 +27,7 @@ import org.apache.spark.SparkContext import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} private[spark] object PythonUtils { - val PY4J_ZIP_NAME = "py4j-0.10.9.4-src.zip" + val PY4J_ZIP_NAME = "py4j-0.10.9.3-src.zip" /** Get the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our JAR */ def sparkPythonPath: String = { diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 index 742710e..c2882bd 100644 --- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 @@ -208,7 +208,7 @@ parquet-format-structures/1.12.2//parquet-format-structures-1.12.2.jar parquet-hadoop/1.12.2//parquet-hadoop-1.12.2.jar parquet-jackson/1.12.2//parquet-jackson-1.12.2.jar protobuf-java/2.5.0//protobuf-java-2.5.0.jar -py4j/0.10.9.4//py4j-0.10.9.4.jar +py4j/0.10.9.3//py4j-0.10.9.3.jar pyrolite/4.30//pyrolite-4.30.jar rocksdbjni/6.20.3//rocksdbjni-6.20.3.jar scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 index ce0dc17..be4c7b8 100644 --- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 +++ 
b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 @@ -179,7 +179,7 @@ parquet-format-structures/1.12.2//parquet-format-structures-1.12.2.jar parquet-hadoop/1.12.2//parquet-hadoop-1.12.2.jar parquet-jackson/1.12.2//parquet-jackson-1.12.2.jar protobuf-java/2.5.0//protobuf-java-2.5.0.jar -py4j/0.10.9.4//py4j-0.10.9.4.jar +py4j/0.10.9.3//py4j-0.10.9.3.jar pyrolite/4.30//pyrolite-4.30.jar rocksdbjni/6.20.3//rocksdbjni-6.20.3.jar
[spark] branch branch-3.3 updated: Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new ef8773a Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4" ef8773a is described below commit ef8773ad5e90738a00408de87d2fe8566dc4acdc Author: Dongjoon Hyun AuthorDate: Thu Mar 17 17:58:40 2022 -0700 Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4" ### What changes were proposed in this pull request? This reverts commit 3bbf346d9ca984faa0c3e67cd1387a13b2bd1e37 from branch-3.3 to recover Apache Spark 3.3 on Python 3.10. ### Why are the changes needed? Py4J 0.10.9.4 has a regression which doesn't support Python 3.10. ### Does this PR introduce _any_ user-facing change? No. This is not released yet. ### How was this patch tested? Python UT with Python 3.10. Closes #35904 from dongjoon-hyun/SPARK-38563-3.3. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- bin/pyspark| 2 +- bin/pyspark2.cmd | 2 +- core/pom.xml | 2 +- .../org/apache/spark/api/python/PythonUtils.scala | 2 +- dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- docs/job-scheduling.md | 2 +- python/docs/Makefile | 2 +- python/docs/make2.bat | 2 +- python/docs/source/getting_started/install.rst | 2 +- python/lib/py4j-0.10.9.3-src.zip | Bin 0 -> 42021 bytes python/lib/py4j-0.10.9.4-src.zip | Bin 42404 -> 0 bytes python/pyspark/context.py | 6 ++-- python/pyspark/util.py | 35 ++--- python/setup.py| 2 +- sbin/spark-config.sh | 2 +- 16 files changed, 45 insertions(+), 20 deletions(-) diff --git a/bin/pyspark b/bin/pyspark index 1e16c56..4840589 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -50,7 +50,7 @@ export PYSPARK_DRIVER_PYTHON_OPTS # Add the PySpark classes to the Python path: export PYTHONPATH="${SPARK_HOME}/python/:$PYTHONPATH" -export 
PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.4-src.zip:$PYTHONPATH" +export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH" # Load the PySpark shell.py script when ./pyspark is used interactively: export OLD_PYTHONSTARTUP="$PYTHONSTARTUP" diff --git a/bin/pyspark2.cmd b/bin/pyspark2.cmd index f20c320..a19627a 100644 --- a/bin/pyspark2.cmd +++ b/bin/pyspark2.cmd @@ -30,7 +30,7 @@ if "x%PYSPARK_DRIVER_PYTHON%"=="x" ( ) set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH% -set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.4-src.zip;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.3-src.zip;%PYTHONPATH% set OLD_PYTHONSTARTUP=%PYTHONSTARTUP% set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py diff --git a/core/pom.xml b/core/pom.xml index 953c76b..9d3b170 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -423,7 +423,7 @@ net.sf.py4j py4j - 0.10.9.4 + 0.10.9.3 org.apache.spark diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index a9c35369..8daba86 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala @@ -27,7 +27,7 @@ import org.apache.spark.SparkContext import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} private[spark] object PythonUtils { - val PY4J_ZIP_NAME = "py4j-0.10.9.4-src.zip" + val PY4J_ZIP_NAME = "py4j-0.10.9.3-src.zip" /** Get the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our JAR */ def sparkPythonPath: String = { diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3 index f2db663..bcbf8b9 100644 --- a/dev/deps/spark-deps-hadoop-2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3 @@ -233,7 +233,7 @@ parquet-hadoop/1.12.2//parquet-hadoop-1.12.2.jar parquet-jackson/1.12.2//parquet-jackson-1.12.2.jar pickle/1.2//pickle-1.2.jar 
protobuf-java/2.5.0//protobuf-java-2.5.0.jar -py4j/0.10.9.4//py4j-0.10.9.4.jar +py4j/0.10.9.3//py4j-0.10.9.3.jar remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar rocksdbjni/6.20.3//rocksdbjni-6.20.3.jar scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index c56b4c9..8ca7880 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3
[spark] branch master updated: [SPARK-38194] Followup: Fix k8s memory overhead passing to executor pods
This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f36a5fb [SPARK-38194] Followup: Fix k8s memory overhead passing to executor pods f36a5fb is described below commit f36a5fb2b88620c1c490d087b0293c4e58d29979 Author: Adam Binford - Customer Site (Virginia) - CW 121796 AuthorDate: Thu Mar 17 18:32:29 2022 -0500 [SPARK-38194] Followup: Fix k8s memory overhead passing to executor pods ### What changes were proposed in this pull request? Follow up to https://github.com/apache/spark/pull/35504 to fix k8s memory overhead handling. ### Why are the changes needed? https://github.com/apache/spark/pull/35504 introduced a bug only caught by the K8S integration tests. ### Does this PR introduce _any_ user-facing change? Fix back to old behavior. ### How was this patch tested? See if IT passes Closes #35901 from Kimahriman/k8s-memory-overhead-executors. 
Authored-by: Adam Binford - Customer Site (Virginia) - CW 121796 Signed-off-by: Thomas Graves --- .../k8s/features/BasicDriverFeatureStep.scala | 30 +++- .../k8s/features/BasicDriverFeatureStepSuite.scala | 57 +- 2 files changed, 63 insertions(+), 24 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala index 9715149..413f5bc 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala @@ -53,28 +53,32 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf) // Memory settings private val driverMemoryMiB = conf.get(DRIVER_MEMORY) - private val memoryOverheadFactor = if (conf.contains(DRIVER_MEMORY_OVERHEAD_FACTOR)) { -conf.get(DRIVER_MEMORY_OVERHEAD_FACTOR) - } else { -conf.get(MEMORY_OVERHEAD_FACTOR) - } - // The memory overhead factor to use. If the user has not set it, then use a different - // value for non-JVM apps. This value is propagated to executors. - private val overheadFactor = + // The default memory overhead factor to use, derived from the deprecated + // `spark.kubernetes.memoryOverheadFactor` config or the default overhead values. + // If the user has not set it, then use a different default for non-JVM apps. This value is + // propagated to executors and used if the executor overhead factor is not set explicitly. 
+ private val defaultOverheadFactor = if (conf.mainAppResource.isInstanceOf[NonJVMResource]) { - if (conf.contains(MEMORY_OVERHEAD_FACTOR) || conf.contains(DRIVER_MEMORY_OVERHEAD_FACTOR)) { -memoryOverheadFactor + if (conf.contains(MEMORY_OVERHEAD_FACTOR)) { +conf.get(MEMORY_OVERHEAD_FACTOR) } else { NON_JVM_MEMORY_OVERHEAD_FACTOR } } else { - memoryOverheadFactor + conf.get(MEMORY_OVERHEAD_FACTOR) } + // Prefer the driver memory overhead factor if set explicitly + private val memoryOverheadFactor = if (conf.contains(DRIVER_MEMORY_OVERHEAD_FACTOR)) { +conf.get(DRIVER_MEMORY_OVERHEAD_FACTOR) + } else { +defaultOverheadFactor + } + private val memoryOverheadMiB = conf .get(DRIVER_MEMORY_OVERHEAD) -.getOrElse(math.max((overheadFactor * driverMemoryMiB).toInt, +.getOrElse(math.max((memoryOverheadFactor * driverMemoryMiB).toInt, ResourceProfile.MEMORY_OVERHEAD_MIN_MIB)) private val driverMemoryWithOverheadMiB = driverMemoryMiB + memoryOverheadMiB @@ -169,7 +173,7 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf) KUBERNETES_DRIVER_POD_NAME.key -> driverPodName, "spark.app.id" -> conf.appId, KUBERNETES_DRIVER_SUBMIT_CHECK.key -> "true", - DRIVER_MEMORY_OVERHEAD_FACTOR.key -> overheadFactor.toString) + MEMORY_OVERHEAD_FACTOR.key -> defaultOverheadFactor.toString) // try upload local, resolvable files to a hadoop compatible file system Seq(JARS, FILES, ARCHIVES, SUBMIT_PYTHON_FILES).foreach { key => val uris = conf.get(key).filter(uri => KubernetesUtils.isLocalAndResolvable(uri)) diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala index d45f5f9..9a3b06a 100644 ---
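The fix above restores a precedence chain for the driver memory overhead: an explicit absolute overhead wins over the driver-specific factor, which wins over the deprecated global factor, which falls back to a larger default for non-JVM apps. A Python sketch of that chain (not Spark's actual code; the constants 0.1, 0.4, and 384 MiB are assumed here to match Spark's documented defaults for `spark.kubernetes.memoryOverheadFactor`, non-JVM apps, and the minimum overhead):

```python
MEMORY_OVERHEAD_MIN_MIB = 384         # ResourceProfile.MEMORY_OVERHEAD_MIN_MIB
NON_JVM_MEMORY_OVERHEAD_FACTOR = 0.4  # larger default for PySpark / SparkR drivers
DEFAULT_OVERHEAD_FACTOR = 0.1         # default of spark.kubernetes.memoryOverheadFactor

def driver_memory_overhead_mib(driver_memory_mib, non_jvm,
                               memory_overhead_factor=None,  # deprecated global conf
                               driver_overhead_factor=None,  # driver-specific conf
                               explicit_overhead_mib=None):  # absolute overhead conf
    # Default factor: the deprecated global setting if the user set it,
    # otherwise a different default for non-JVM apps.
    if memory_overhead_factor is not None:
        default_factor = memory_overhead_factor
    elif non_jvm:
        default_factor = NON_JVM_MEMORY_OVERHEAD_FACTOR
    else:
        default_factor = DEFAULT_OVERHEAD_FACTOR
    # Prefer the driver-specific factor when it is set explicitly.
    factor = driver_overhead_factor if driver_overhead_factor is not None else default_factor
    # An explicit absolute overhead overrides any factor.
    if explicit_overhead_mib is not None:
        return explicit_overhead_mib
    return max(int(factor * driver_memory_mib), MEMORY_OVERHEAD_MIN_MIB)
```

The bug being fixed was in which of these values gets propagated to executors: the commit propagates the *default* factor (under the old `MEMORY_OVERHEAD_FACTOR` key), not the driver-specific one.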
[spark] branch master updated (2d1d18a -> 54fdb88)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from  2d1d18a  [SPARK-37425][PYTHON] Inline type hints for python/pyspark/mllib/recommendation.py
   add  54fdb88  Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"

No new revisions were added by this update.

Summary of changes:
 bin/pyspark                                       |   2 +-
 bin/pyspark2.cmd                                  |   2 +-
 core/pom.xml                                      |   2 +-
 .../org/apache/spark/api/python/PythonUtils.scala |   2 +-
 dev/deps/spark-deps-hadoop-2-hive-2.3             |   2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3             |   2 +-
 docs/job-scheduling.md                            |   2 +-
 python/docs/Makefile                              |   2 +-
 python/docs/make2.bat                             |   2 +-
 python/docs/source/getting_started/install.rst    |   2 +-
 python/lib/py4j-0.10.9.3-src.zip                  | Bin 0 -> 42021 bytes
 python/lib/py4j-0.10.9.4-src.zip                  | Bin 42404 -> 0 bytes
 python/pyspark/context.py                         |   6 ++--
 python/pyspark/util.py                            |  35 ++---
 python/setup.py                                   |   2 +-
 sbin/spark-config.sh                              |   2 +-
 16 files changed, 45 insertions(+), 20 deletions(-)
 create mode 100644 python/lib/py4j-0.10.9.3-src.zip
 delete mode 100644 python/lib/py4j-0.10.9.4-src.zip
[spark] branch master updated: [SPARK-37425][PYTHON] Inline type hints for python/pyspark/mllib/recommendation.py
This is an automated email from the ASF dual-hosted git repository. zero323 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2d1d18a [SPARK-37425][PYTHON] Inline type hints for python/pyspark/mllib/recommendation.py 2d1d18a is described below commit 2d1d18ab4268eb5ba2ff22f74a5b3b85988d65b4 Author: dch nguyen AuthorDate: Fri Mar 18 00:14:04 2022 +0100 [SPARK-37425][PYTHON] Inline type hints for python/pyspark/mllib/recommendation.py ### What changes were proposed in this pull request? Inline type hints for recommendation in python/pyspark/mllib/ ### Why are the changes needed? We can take advantage of static type checking within the functions by inlining the type hints. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests Closes #35766 from dchvn/SPARK-37425. Authored-by: dch nguyen Signed-off-by: zero323 --- python/pyspark/mllib/recommendation.py | 70 +++- python/pyspark/mllib/recommendation.pyi | 71 - 2 files changed, 42 insertions(+), 99 deletions(-) diff --git a/python/pyspark/mllib/recommendation.py b/python/pyspark/mllib/recommendation.py index cb412fb..55eae10 100644 --- a/python/pyspark/mllib/recommendation.py +++ b/python/pyspark/mllib/recommendation.py @@ -17,7 +17,7 @@ import array import sys -from collections import namedtuple +from typing import Any, List, NamedTuple, Optional, Tuple, Type, Union from pyspark import SparkContext, since from pyspark.rdd import RDD @@ -28,7 +28,7 @@ from pyspark.sql import DataFrame __all__ = ["MatrixFactorizationModel", "ALS", "Rating"] -class Rating(namedtuple("Rating", ["user", "product", "rating"])): +class Rating(NamedTuple): """ Represents a (user, product, rating) tuple. 
@@ -43,12 +43,18 @@ class Rating(namedtuple("Rating", ["user", "product", "rating"])): (1, 2, 5.0) """ -def __reduce__(self): +user: int +product: int +rating: float + +def __reduce__(self) -> Tuple[Type["Rating"], Tuple[int, int, float]]: return Rating, (int(self.user), int(self.product), float(self.rating)) @inherit_doc -class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): +class MatrixFactorizationModel( +JavaModelWrapper, JavaSaveable, JavaLoader["MatrixFactorizationModel"] +): """A matrix factorisation model trained by regularized alternating least-squares. @@ -135,14 +141,14 @@ class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): """ @since("0.9.0") -def predict(self, user, product): +def predict(self, user: int, product: int) -> float: """ Predicts rating for the given user and product. """ return self._java_model.predict(int(user), int(product)) @since("0.9.0") -def predictAll(self, user_product): +def predictAll(self, user_product: RDD[Tuple[int, int]]) -> RDD[Rating]: """ Returns a list of predicted ratings for input user and product pairs. @@ -154,7 +160,7 @@ class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): return self.call("predict", user_product) @since("1.2.0") -def userFeatures(self): +def userFeatures(self) -> RDD[Tuple[int, array.array]]: """ Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user. @@ -162,7 +168,7 @@ class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): return self.call("getUserFeatures").mapValues(lambda v: array.array("d", v)) @since("1.2.0") -def productFeatures(self): +def productFeatures(self) -> RDD[Tuple[int, array.array]]: """ Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product. 
@@ -170,7 +176,7 @@ class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): return self.call("getProductFeatures").mapValues(lambda v: array.array("d", v)) @since("1.4.0") -def recommendUsers(self, product, num): +def recommendUsers(self, product: int, num: int) -> List[Rating]: """ Recommends the top "num" number of users for a given product and returns a list of Rating objects sorted by the predicted rating in @@ -179,7 +185,7 @@ class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): return list(self.call("recommendUsers", product, num)) @since("1.4.0") -def recommendProducts(self, user, num): +def recommendProducts(self, user: int, num: int) -> List[Rating]: """ Recommends the top "num"
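The switch from a `collections.namedtuple` subclass to `typing.NamedTuple` in the diff above keeps the custom `__reduce__` hook, which normalizes field types when a `Rating` is pickled for shipment to the JVM. A minimal standalone sketch of the pattern:

```python
import pickle
from typing import NamedTuple, Tuple, Type

class Rating(NamedTuple):
    """A (user, product, rating) tuple, as in pyspark.mllib.recommendation."""
    user: int
    product: int
    rating: float

    def __reduce__(self) -> Tuple[Type["Rating"], Tuple[int, int, float]]:
        # Coerce fields to canonical types on pickling, matching the
        # behaviour of the old collections.namedtuple subclass.
        return Rating, (int(self.user), int(self.product), float(self.rating))

# Round-tripping through pickle normalizes loosely typed inputs.
r = pickle.loads(pickle.dumps(Rating(1, "2", "5.5")))
```

Because `NamedTuple` fields are plain type annotations (not enforced at construction), the `__reduce__` override is what guarantees the JVM side always sees `int`/`int`/`float`.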
[spark] branch branch-3.3 updated: Revert "[SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new cf0afa8 Revert "[SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable" cf0afa8 is described below commit cf0afa8619544ab6008fcec8a25891c2ff43625a Author: Thomas Graves AuthorDate: Thu Mar 17 12:54:50 2022 -0700 Revert "[SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable" ### What changes were proposed in this pull request? This reverts commit 8405ec352dbed6a3199fc2af3c60fae7186d15b5. ### Why are the changes needed? The original PR broke K8s integration tests so lets revert in branch-3.3 for now and fix on master. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CI. K8s IT is recovered like the following. ``` [info] KubernetesSuite: [info] - Run SparkPi with no resources (9 seconds, 832 milliseconds) [info] - Run SparkPi with no resources & statefulset allocation (9 seconds, 715 milliseconds) [info] - Run SparkPi with a very long application name. (8 seconds, 672 milliseconds) [info] - Use SparkLauncher.NO_RESOURCE (9 seconds, 614 milliseconds) [info] - Run SparkPi with a master URL without a scheme. (9 seconds, 616 milliseconds) [info] - Run SparkPi with an argument. (8 seconds, 633 milliseconds) [info] - Run SparkPi with custom labels, annotations, and environment variables. (8 seconds, 631 milliseconds) [info] - All pods have the same service account by default (8 seconds, 625 milliseconds) [info] - Run extraJVMOptions check on driver (4 seconds, 639 milliseconds) [info] - Run SparkRemoteFileTest using a remote data file (8 seconds, 699 milliseconds) [info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (14 seconds, 31 milliseconds) [info] - Run SparkPi with env and mount secrets. 
(17 seconds, 878 milliseconds) [info] - Run PySpark on simple pi.py example (9 seconds, 642 milliseconds) [info] - Run PySpark to test a pyfiles example (11 seconds, 883 milliseconds) [info] - Run PySpark with memory customization (9 seconds, 602 milliseconds) [info] - Run in client mode. (6 seconds, 303 milliseconds) [info] - Start pod creation from template (8 seconds, 864 milliseconds) [info] - SPARK-38398: Schedule pod creation from template (8 seconds, 665 milliseconds) [info] - Test basic decommissioning (41 seconds, 74 milliseconds) [info] - Test basic decommissioning with shuffle cleanup (41 seconds, 318 milliseconds) [info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 40 seconds) [info] - Test decommissioning timeouts (41 seconds, 892 milliseconds) [info] - SPARK-37576: Rolling decommissioning (1 minute, 7 seconds) [info] - Run SparkR on simple dataframe.R example (11 seconds, 643 milliseconds) [info] VolcanoSuite: [info] - Run SparkPi with no resources (9 seconds, 585 milliseconds) [info] - Run SparkPi with no resources & statefulset allocation (10 seconds, 607 milliseconds) [info] - Run SparkPi with a very long application name. (9 seconds, 636 milliseconds) [info] - Use SparkLauncher.NO_RESOURCE (10 seconds, 681 milliseconds) [info] - Run SparkPi with a master URL without a scheme. (10 seconds, 628 milliseconds) [info] - Run SparkPi with an argument. (9 seconds, 638 milliseconds) [info] - Run SparkPi with custom labels, annotations, and environment variables. (9 seconds, 626 milliseconds) [info] - All pods have the same service account by default (10 seconds, 615 milliseconds) [info] - Run extraJVMOptions check on driver (4 seconds, 590 milliseconds) [info] - Run SparkRemoteFileTest using a remote data file (9 seconds, 660 milliseconds) [info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (15 seconds, 277 milliseconds) [info] - Run SparkPi with env and mount secrets. 
(19 seconds, 300 milliseconds) [info] - Run PySpark on simple pi.py example (10 seconds, 641 milliseconds) [info] - Run PySpark to test a pyfiles example (12 seconds, 656 milliseconds) [info] - Run PySpark with memory customization (10 seconds, 599 milliseconds) [info] - Run in client mode. (7 seconds, 258 milliseconds) [info] - Start pod creation from template (10 seconds, 664 milliseconds) [info] - SPARK-38398: Schedule pod creation from template (10 seconds, 891 milliseconds) [info] - Test basic decommissioning (42 seconds, 85 milliseconds) [info] - Test basic decommissioning with shuffle cleanup (42 seconds, 384 milliseconds) [info] - Test decommissioning with dynamic allocation &
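For context, the reverted patch touched how Spark sizes the executor memory overhead. Below is a minimal Python sketch of the documented default behavior (a 0.10 factor for JVM workloads with a 384 MiB floor); the constants and function names are illustrative simplifications, not Spark's implementation:

```python
# Simplified model of Spark's executor memory overhead sizing; the reverted
# change would have made the 0.10 factor configurable per resource manager.
MIN_OVERHEAD_MIB = 384  # documented floor for the overhead

def memory_overhead_mib(executor_memory_mib: int, factor: float = 0.10) -> int:
    """Overhead is the larger of factor * executor memory and the floor."""
    return max(int(executor_memory_mib * factor), MIN_OVERHEAD_MIB)

print(memory_overhead_mib(8192))  # 819 MiB for an 8 GiB executor
print(memory_overhead_mib(2048))  # 384 MiB: the floor applies
```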
[spark] branch master updated (968bb34 -> cd86df8)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 968bb34 [SPARK-38575][INFRA][FOLLOW-UP] Use GITHUB_REF to get the current branch add cd86df8 [SPARK-38575][INFRA][FOLLOW-UP] Pin the branch to `master` for forked repositories No new revisions were added by this update. Summary of changes: .github/workflows/build_and_test.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-38575][INFRA][FOLLOW-UP] Use GITHUB_REF to get the current branch
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 968bb34 [SPARK-38575][INFRA][FOLLOW-UP] Use GITHUB_REF to get the current branch 968bb34 is described below commit 968bb34ef6f824d2c1b7c68ea5190d0f5ef32654 Author: Hyukjin Kwon AuthorDate: Thu Mar 17 18:52:26 2022 +0900 [SPARK-38575][INFRA][FOLLOW-UP] Use GITHUB_REF to get the current branch ### What changes were proposed in this pull request? Current `git --show-current` doesn't work because we're not in a git repository when we calculate the changes we need. Instead, this PR proposes to use `GITHUB_REF` environment variable to get the current branch. ### Why are the changes needed? To fix up the build. See https://github.com/apache/spark/runs/5583091595?check_suite_focus=true (branch names) ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Should monitor test. Closes #35892 from HyukjinKwon/SPARK-38575-follow2. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .github/workflows/build_and_test.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index c589b39..2e9f971 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -96,7 +96,7 @@ jobs: echo '::set-output name=hadoop::hadoop3' else echo '::set-output name=java::8' - echo "::set-output name=branch::`git branch --show-current`" + echo "::set-output name=branch::${GITHUB_REF#refs/heads/}" echo '::set-output name=type::regular' echo '::set-output name=envs::{"SPARK_ANSI_SQL_MODE": "${{ inputs.ansi_enabled }}"}' echo '::set-output name=hadoop::hadoop3'
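The replacement relies on shell parameter expansion: `${GITHUB_REF#refs/heads/}` strips the shortest leading match of `refs/heads/`, leaving the bare branch name. A small Python sketch of the same transformation (the sample ref values are illustrative):

```python
def branch_from_ref(ref: str) -> str:
    """Mirror the shell expansion ${GITHUB_REF#refs/heads/}: drop the
    refs/heads/ prefix when present, otherwise return the ref unchanged."""
    prefix = "refs/heads/"
    return ref[len(prefix):] if ref.startswith(prefix) else ref

print(branch_from_ref("refs/heads/branch-3.3"))  # branch-3.3
print(branch_from_ref("refs/tags/v3.3.0"))       # refs/tags/v3.3.0 (unchanged)
```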
[spark] branch branch-3.3 updated: [SPARK-38586][INFRA] Trigger notifying workflow in branch-3.3 and other future branches
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 2ee5705 [SPARK-38586][INFRA] Trigger notifying workflow in branch-3.3 and other future branches 2ee5705 is described below commit 2ee57055fd917a1829adce06c2504db0a68e3ba2 Author: Hyukjin Kwon AuthorDate: Thu Mar 17 18:43:42 2022 +0900 [SPARK-38586][INFRA] Trigger notifying workflow in branch-3.3 and other future branches This PR fixes `Notify test workflow` workflow to be triggered against PRs. In fact, we don't need to check if the branch is `master` since the event triggers the workflow that's found in the commit SHA. To link builds to the CI status in PRs. No, dev-only. Will be checked after it gets merged - it's pretty straightforward. Closes #35891 from HyukjinKwon/SPARK-38575-follow. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 5c4930ab96360fc8c9ec8f15316e36eb7d516560) Signed-off-by: Hyukjin Kwon --- .github/workflows/notify_test_workflow.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/.github/workflows/notify_test_workflow.yml b/.github/workflows/notify_test_workflow.yml index 04e7ab8..eb0da84 100644 --- a/.github/workflows/notify_test_workflow.yml +++ b/.github/workflows/notify_test_workflow.yml @@ -34,7 +34,6 @@ jobs: steps: - name: "Notify test workflow" uses: actions/github-script@f05a81df23035049204b043b50c3322045ce7eb3 # pin@v3 -if: ${{ github.base_ref == 'master' }} with: github-token: ${{ secrets.GITHUB_TOKEN }} script: |
[spark] branch master updated: [SPARK-38586][INFRA] Trigger notifying workflow in branch-3.3 and other future branches
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5c4930a [SPARK-38586][INFRA] Trigger notifying workflow in branch-3.3 and other future branches 5c4930a is described below commit 5c4930ab96360fc8c9ec8f15316e36eb7d516560 Author: Hyukjin Kwon AuthorDate: Thu Mar 17 18:43:42 2022 +0900 [SPARK-38586][INFRA] Trigger notifying workflow in branch-3.3 and other future branches ### What changes were proposed in this pull request? This PR fixes `Notify test workflow` workflow to be triggered against PRs. In fact, we don't need to check if the branch is `master` since the event triggers the workflow that's found in the commit SHA. ### Why are the changes needed? To link builds to the CI status in PRs. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Will be checked after it gets merged - it's pretty straightforward. Closes #35891 from HyukjinKwon/SPARK-38575-follow. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .github/workflows/notify_test_workflow.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/.github/workflows/notify_test_workflow.yml b/.github/workflows/notify_test_workflow.yml index 04e7ab8..eb0da84 100644 --- a/.github/workflows/notify_test_workflow.yml +++ b/.github/workflows/notify_test_workflow.yml @@ -34,7 +34,6 @@ jobs: steps: - name: "Notify test workflow" uses: actions/github-script@f05a81df23035049204b043b50c3322045ce7eb3 # pin@v3 -if: ${{ github.base_ref == 'master' }} with: github-token: ${{ secrets.GITHUB_TOKEN }} script: |
[spark] branch branch-3.3 updated (ebaee4f -> 1824c69)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git. from ebaee4f [SPARK-37995][SQL] PlanAdaptiveDynamicPruningFilters should use prepareExecutedPlan rather than createSparkPlan to re-plan subquery add 1824c69 [SPARK-38566][SQL][3.3] Revert the parser changes for DEFAULT column support No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md| 1 - .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 1 - .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 22 ++-- .../spark/sql/catalyst/parser/AstBuilder.scala | 61 +- .../spark/sql/errors/QueryParsingErrors.scala | 4 -- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 53 --- .../spark/sql/execution/SparkSqlParser.scala | 2 +- 7 files changed, 6 insertions(+), 138 deletions(-)
[spark] branch master updated: [MINOR][INFRA] Add ANTLR generated files to .gitignore
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 46ccc22 [MINOR][INFRA] Add ANTLR generated files to .gitignore 46ccc22 is described below commit 46ccc22ee40c780f6ae4a9af4562fb1ad10ccd9f Author: Yuto Akutsu AuthorDate: Thu Mar 17 18:12:13 2022 +0900 [MINOR][INFRA] Add ANTLR generated files to .gitignore ### What changes were proposed in this pull request? Add git ignore entries for files created by ANTLR. ### Why are the changes needed? To avoid developers from accidentally adding those files when working on parser/lexer. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? To make sure those files are ignored by `git status` when they exist. Closes #35838 from yutoacts/minor_gitignore. Authored-by: Yuto Akutsu Signed-off-by: Kousuke Saruta --- .gitignore | 5 + 1 file changed, 5 insertions(+) diff --git a/.gitignore b/.gitignore index b758781..0e2f59f 100644 --- a/.gitignore +++ b/.gitignore @@ -117,3 +117,8 @@ spark-warehouse/ # For Node.js node_modules + +# For Antlr +sql/catalyst/gen/ +sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.tokens +sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/gen/
[spark] branch branch-3.2 updated: [SPARK-37995][SQL] PlanAdaptiveDynamicPruningFilters should use prepareExecutedPlan rather than createSparkPlan to re-plan subquery
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new a27f13e [SPARK-37995][SQL] PlanAdaptiveDynamicPruningFilters should use prepareExecutedPlan rather than createSparkPlan to re-plan subquery a27f13e is described below commit a27f13ea8c14d72558c77bfb75d725d649cd2081 Author: ulysses-you AuthorDate: Thu Mar 17 16:58:22 2022 +0800 [SPARK-37995][SQL] PlanAdaptiveDynamicPruningFilters should use prepareExecutedPlan rather than createSparkPlan to re-plan subquery ### What changes were proposed in this pull request? Use exists adaptive execution context to re-compile subquery. ### Why are the changes needed? If a subquery which is inferred by dpp contains a nested subquery, AQE can not compile it correctly. the added test will fail before this pr: ```java java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Aggregate cannot be cast to org.apache.spark.sql.execution.SparkPlan at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75) at org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75) ``` ### Does this PR introduce _any_ user-facing change? 
yes, bug fix ### How was this patch tested? add test Closes #35849 from ulysses-you/SPARK-37995. Authored-by: ulysses-you Signed-off-by: Wenchen Fan (cherry picked from commit 3afc4fb08b01597a5677ce706731639c687fd2dd) Signed-off-by: Wenchen Fan --- .../apache/spark/sql/execution/QueryExecution.scala | 13 + .../adaptive/PlanAdaptiveDynamicPruningFilters.scala | 11 --- .../spark/sql/DynamicPartitionPruningSuite.scala | 20 +++- 3 files changed, 36 insertions(+), 8 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 0d3b752..4c5b0cc 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -482,6 +482,19 @@ object QueryExecution { prepareExecutedPlan(spark, sparkPlan) } + /** + * Prepare the [[SparkPlan]] for execution using exists adaptive execution context. + * This method is only called by [[PlanAdaptiveDynamicPruningFilters]]. 
+ */ + def prepareExecutedPlan( + session: SparkSession, + plan: LogicalPlan, + context: AdaptiveExecutionContext): SparkPlan = { +val sparkPlan = createSparkPlan(session, session.sessionState.planner, plan.clone()) +val preparationRules = preparations(session, Option(InsertAdaptiveSparkPlan(context)), true) +prepareForExecution(preparationRules, sparkPlan.clone()) + } + private val currentCteMap = new ThreadLocal[mutable.HashMap[Long, CTERelationDef]]() def cteMap: mutable.HashMap[Long, CTERelationDef] = currentCteMap.get() diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala index f9c1bbe..aa8f236 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala @@ -71,13 +71,10 @@ case class PlanAdaptiveDynamicPruningFilters( val aggregate = Aggregate(Seq(alias), Seq(alias), buildPlan) val session = adaptivePlan.context.session - val planner = session.sessionState.planner - // Here we can't call the QueryExecution.prepareExecutedPlan() method to - // get the sparkPlan as Non-AQE use case, which will cause the physical - // plan optimization rules be inserted twice, once in AQE framework and - // another in prepareExecutedPlan() method. - val
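The comment removed from `PlanAdaptiveDynamicPruningFilters` explains the constraint: routing the re-planned subquery through the generic `prepareExecutedPlan()` entry point would insert the physical-plan preparation rule a second time, once via the AQE framework and once via the entry point. A toy Python sketch of that failure mode (none of these names are Spark APIs; the tuple-wrapping merely stands in for rule application):

```python
# Toy model: a "plan" is wrapped once by each preparation rule applied to it.
def apply_rules(plan, rules):
    for rule in rules:
        plan = rule(plan)
    return plan

insert_adaptive = lambda plan: ("AdaptiveSparkPlan", plan)  # stand-in rule

# The AQE framework already applies the rule; calling the generic entry
# point applies it again, double-wrapping the re-planned subquery.
framework = [insert_adaptive]
generic_entry_point = [insert_adaptive]
plan = apply_rules("dpp_subquery", framework + generic_entry_point)
print(plan)  # ('AdaptiveSparkPlan', ('AdaptiveSparkPlan', 'dpp_subquery'))
```

The new overload avoids this by building the rule list once, with the existing adaptive execution context injected a single time.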
[spark] branch branch-3.3 updated: [SPARK-37995][SQL] PlanAdaptiveDynamicPruningFilters should use prepareExecutedPlan rather than createSparkPlan to re-plan subquery
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new ebaee4f [SPARK-37995][SQL] PlanAdaptiveDynamicPruningFilters should use prepareExecutedPlan rather than createSparkPlan to re-plan subquery ebaee4f is described below commit ebaee4fad1a09d5e19d74ced940ef09f2a416f71 Author: ulysses-you AuthorDate: Thu Mar 17 16:58:22 2022 +0800 [SPARK-37995][SQL] PlanAdaptiveDynamicPruningFilters should use prepareExecutedPlan rather than createSparkPlan to re-plan subquery ### What changes were proposed in this pull request? Use exists adaptive execution context to re-compile subquery. ### Why are the changes needed? If a subquery which is inferred by dpp contains a nested subquery, AQE can not compile it correctly. the added test will fail before this pr: ```java java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Aggregate cannot be cast to org.apache.spark.sql.execution.SparkPlan at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75) at org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75) ``` ### Does this PR introduce _any_ user-facing change? 
yes, bug fix ### How was this patch tested? add test Closes #35849 from ulysses-you/SPARK-37995. Authored-by: ulysses-you Signed-off-by: Wenchen Fan (cherry picked from commit 3afc4fb08b01597a5677ce706731639c687fd2dd) Signed-off-by: Wenchen Fan --- .../apache/spark/sql/execution/QueryExecution.scala | 13 + .../adaptive/PlanAdaptiveDynamicPruningFilters.scala | 11 --- .../spark/sql/DynamicPartitionPruningSuite.scala | 20 +++- 3 files changed, 36 insertions(+), 8 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 1b08994..9bf8de5 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -485,6 +485,19 @@ object QueryExecution { prepareExecutedPlan(spark, sparkPlan) } + /** + * Prepare the [[SparkPlan]] for execution using exists adaptive execution context. + * This method is only called by [[PlanAdaptiveDynamicPruningFilters]]. 
+ */ + def prepareExecutedPlan( + session: SparkSession, + plan: LogicalPlan, + context: AdaptiveExecutionContext): SparkPlan = { +val sparkPlan = createSparkPlan(session, session.sessionState.planner, plan.clone()) +val preparationRules = preparations(session, Option(InsertAdaptiveSparkPlan(context)), true) +prepareForExecution(preparationRules, sparkPlan.clone()) + } + private val currentCteMap = new ThreadLocal[mutable.HashMap[Long, CTERelationDef]]() def cteMap: mutable.HashMap[Long, CTERelationDef] = currentCteMap.get() diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala index 1cc39df..9a780c1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala @@ -71,13 +71,10 @@ case class PlanAdaptiveDynamicPruningFilters( val aggregate = Aggregate(Seq(alias), Seq(alias), buildPlan) val session = adaptivePlan.context.session - val planner = session.sessionState.planner - // Here we can't call the QueryExecution.prepareExecutedPlan() method to - // get the sparkPlan as Non-AQE use case, which will cause the physical - // plan optimization rules be inserted twice, once in AQE framework and - // another in prepareExecutedPlan() method. - val
[spark] branch master updated (f0b836b -> 3afc4fb)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f0b836b [SPARK-38560][SQL] If `Sum`, `Count`, `Any` accompany with distinct, cannot do partial agg push down add 3afc4fb [SPARK-37995][SQL] PlanAdaptiveDynamicPruningFilters should use prepareExecutedPlan rather than createSparkPlan to re-plan subquery No new revisions were added by this update. Summary of changes: .../apache/spark/sql/execution/QueryExecution.scala | 13 + .../adaptive/PlanAdaptiveDynamicPruningFilters.scala | 11 --- .../spark/sql/DynamicPartitionPruningSuite.scala | 20 +++- 3 files changed, 36 insertions(+), 8 deletions(-)
[spark] branch branch-3.3 updated: [SPARK-38560][SQL] If `Sum`, `Count`, `Any` accompany with distinct, cannot do partial agg push down
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 801c330 [SPARK-38560][SQL] If `Sum`, `Count`, `Any` accompany with distinct, cannot do partial agg push down 801c330 is described below commit 801c330036ef93789061daeea82f7cbc8b9cdebb Author: Jiaan Geng AuthorDate: Thu Mar 17 16:53:40 2022 +0800 [SPARK-38560][SQL] If `Sum`, `Count`, `Any` accompany with distinct, cannot do partial agg push down ### What changes were proposed in this pull request? Spark could partial push down sum(distinct col), count(distinct col) if data source have multiple partitions, and Spark will sum the value again. So the result may not correctly. ### Why are the changes needed? Fix the bug push down sum(distinct col), count(distinct col) to data source and return incorrect result. ### Does this PR introduce _any_ user-facing change? 'Yes'. Users will see the correct behavior. ### How was this patch tested? New tests. Closes #35873 from beliefer/SPARK-38560. 
Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan --- .../datasources/v2/V2ScanRelationPushDown.scala| 184 +++-- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 14 +- 2 files changed, 111 insertions(+), 87 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala index 3ff9176..b4bd027 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala @@ -26,7 +26,7 @@ import org.apache.spark.sql.catalyst.planning.ScanOperation import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, LeafNode, Limit, LocalLimit, LogicalPlan, Project, Sample, Sort} import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.connector.expressions.SortOrder -import org.apache.spark.sql.connector.expressions.aggregate.{Aggregation, Avg, GeneralAggregateFunc} +import org.apache.spark.sql.connector.expressions.aggregate.{Aggregation, Avg, Count, GeneralAggregateFunc, Sum} import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownAggregates, SupportsPushDownFilters, V1Scan} import org.apache.spark.sql.execution.datasources.DataSourceStrategy import org.apache.spark.sql.sources @@ -156,101 +156,106 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper { } } - val pushedAggregates = finalTranslatedAggregates.filter(r.pushAggregation) - if (pushedAggregates.isEmpty) { + if (finalTranslatedAggregates.isEmpty) { aggNode // return original plan node - } else if (!supportPartialAggPushDown(pushedAggregates.get) && -!r.supportCompletePushDown(pushedAggregates.get)) { + } else if (!r.supportCompletePushDown(finalTranslatedAggregates.get) && +!supportPartialAggPushDown(finalTranslatedAggregates.get)) { 
aggNode // return original plan node } else { -// No need to do column pruning because only the aggregate columns are used as -// DataSourceV2ScanRelation output columns. All the other columns are not -// included in the output. -val scan = sHolder.builder.build() - -// scalastyle:off -// use the group by columns and aggregate columns as the output columns -// e.g. TABLE t (c1 INT, c2 INT, c3 INT) -// SELECT min(c1), max(c1) FROM t GROUP BY c2; -// Use c2, min(c1), max(c1) as output for DataSourceV2ScanRelation -// We want to have the following logical plan: -// == Optimized Logical Plan == -// Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18] -// +- RelationV2[c2#10, min(c1)#21, max(c1)#22] -// scalastyle:on -val newOutput = scan.readSchema().toAttributes -assert(newOutput.length == groupingExpressions.length + finalAggregates.length) -val groupAttrs = normalizedGroupingExpressions.zip(newOutput).map { - case (a: Attribute, b: Attribute) => b.withExprId(a.exprId) - case (_, b) => b -} -val aggOutput = newOutput.drop(groupAttrs.length) -val output = groupAttrs ++ aggOutput - -logInfo( -
[spark] branch master updated: [SPARK-38560][SQL] If `Sum`, `Count`, `Any` accompany with distinct, cannot do partial agg push down
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f0b836b [SPARK-38560][SQL] If `Sum`, `Count`, `Any` accompany with distinct, cannot do partial agg push down f0b836b is described below commit f0b836b1a6d5926dd09018b67a27461aca5ce739 Author: Jiaan Geng AuthorDate: Thu Mar 17 16:53:40 2022 +0800 [SPARK-38560][SQL] If `Sum`, `Count`, `Any` accompany with distinct, cannot do partial agg push down ### What changes were proposed in this pull request? Spark could partial push down sum(distinct col), count(distinct col) if data source have multiple partitions, and Spark will sum the value again. So the result may not correctly. ### Why are the changes needed? Fix the bug push down sum(distinct col), count(distinct col) to data source and return incorrect result. ### Does this PR introduce _any_ user-facing change? 'Yes'. Users will see the correct behavior. ### How was this patch tested? New tests. Closes #35873 from beliefer/SPARK-38560. 
Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan --- .../datasources/v2/V2ScanRelationPushDown.scala| 184 +++-- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 14 +- 2 files changed, 111 insertions(+), 87 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala index 3ff9176..b4bd027 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala @@ -26,7 +26,7 @@ import org.apache.spark.sql.catalyst.planning.ScanOperation import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, LeafNode, Limit, LocalLimit, LogicalPlan, Project, Sample, Sort} import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.connector.expressions.SortOrder -import org.apache.spark.sql.connector.expressions.aggregate.{Aggregation, Avg, GeneralAggregateFunc} +import org.apache.spark.sql.connector.expressions.aggregate.{Aggregation, Avg, Count, GeneralAggregateFunc, Sum} import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownAggregates, SupportsPushDownFilters, V1Scan} import org.apache.spark.sql.execution.datasources.DataSourceStrategy import org.apache.spark.sql.sources @@ -156,101 +156,106 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper { } } - val pushedAggregates = finalTranslatedAggregates.filter(r.pushAggregation) - if (pushedAggregates.isEmpty) { + if (finalTranslatedAggregates.isEmpty) { aggNode // return original plan node - } else if (!supportPartialAggPushDown(pushedAggregates.get) && -!r.supportCompletePushDown(pushedAggregates.get)) { + } else if (!r.supportCompletePushDown(finalTranslatedAggregates.get) && +!supportPartialAggPushDown(finalTranslatedAggregates.get)) { 
aggNode // return original plan node } else { -// No need to do column pruning because only the aggregate columns are used as -// DataSourceV2ScanRelation output columns. All the other columns are not -// included in the output. -val scan = sHolder.builder.build() - -// scalastyle:off -// use the group by columns and aggregate columns as the output columns -// e.g. TABLE t (c1 INT, c2 INT, c3 INT) -// SELECT min(c1), max(c1) FROM t GROUP BY c2; -// Use c2, min(c1), max(c1) as output for DataSourceV2ScanRelation -// We want to have the following logical plan: -// == Optimized Logical Plan == -// Aggregate [c2#10], [min(min(c1)#21) AS min(c1)#17, max(max(c1)#22) AS max(c1)#18] -// +- RelationV2[c2#10, min(c1)#21, max(c1)#22] -// scalastyle:on -val newOutput = scan.readSchema().toAttributes -assert(newOutput.length == groupingExpressions.length + finalAggregates.length) -val groupAttrs = normalizedGroupingExpressions.zip(newOutput).map { - case (a: Attribute, b: Attribute) => b.withExprId(a.exprId) - case (_, b) => b -} -val aggOutput = newOutput.drop(groupAttrs.length) -val output = groupAttrs ++ aggOutput - -logInfo( - s"""
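The hazard this patch guards against is easy to reproduce outside Spark: re-aggregating per-partition `count(distinct)` (or `sum(distinct)`) results overcounts values that occur in more than one partition, whereas aggregates like `min`/`max` combine safely. A toy Python illustration (plain lists standing in for data source partitions):

```python
partitions = [[1, 2, 2], [2, 3]]  # value 2 appears in both partitions

# Partial push down: each partition counts its own distinct values...
partial = [len(set(p)) for p in partitions]           # [2, 2]
# ...and summing those partial counts double-counts the shared value.
pushed_down = sum(partial)                            # 4
correct = len({v for p in partitions for v in p})     # 3, i.e. {1, 2, 3}
assert pushed_down != correct

# min/max commute with partitioning, so their partial push down is safe.
assert min(min(p) for p in partitions) == min(v for p in partitions for v in p)
```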