This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 3477d14d802 [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide
3477d14d802 is described below

commit 3477d14d802a0b45970f2f99330dd4ddb9e6fefc
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Wed Feb 22 17:09:17 2023 -0800

    [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide

    ### What changes were proposed in this pull request?

    This PR aims to remove `Hadoop 2` from the PySpark installation guide.

    ### Why are the changes needed?

    From Apache Spark 3.4.0, we don't provide Hadoop 2 binaries.

    ### Does this PR introduce _any_ user-facing change?

    This is a documentation fix to be consistent with the new availability.

    ### How was this patch tested?

    Manual review.

    Closes #40127 from dongjoon-hyun/SPARK-42530.

    Authored-by: Dongjoon Hyun <dongj...@apache.org>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
    (cherry picked from commit 295617c5d8913fc1afc78fa9647d2f99b925ceaf)
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 python/docs/source/getting_started/install.rst | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index be2a1eae66d..3db6b278403 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -57,7 +57,7 @@ For PySpark with/without a specific Hadoop version, you can install it by using
 
 .. code-block:: bash
 
-    PYSPARK_HADOOP_VERSION=2 pip install pyspark
+    PYSPARK_HADOOP_VERSION=3 pip install pyspark
 
 The default distribution uses Hadoop 3.3 and Hive 2.3. If users specify different versions of Hadoop, the pip installation
 automatically downloads a different version and uses it in PySpark. Downloading it can take a while depending on
@@ -65,18 +65,17 @@ the network and the mirror chosen. ``PYSPARK_RELEASE_MIRROR`` can be set to manu
 
 .. code-block:: bash
 
-    PYSPARK_RELEASE_MIRROR=http://mirror.apache-kr.org PYSPARK_HADOOP_VERSION=2 pip install
+    PYSPARK_RELEASE_MIRROR=http://mirror.apache-kr.org PYSPARK_HADOOP_VERSION=3 pip install
 
 It is recommended to use ``-v`` option in ``pip`` to track the installation and download status.
 
 .. code-block:: bash
 
-    PYSPARK_HADOOP_VERSION=2 pip install pyspark -v
+    PYSPARK_HADOOP_VERSION=3 pip install pyspark -v
 
 Supported values in ``PYSPARK_HADOOP_VERSION`` are:
 
 - ``without``: Spark pre-built with user-provided Apache Hadoop
-- ``2``: Spark pre-built for Apache Hadoop 2.7
 - ``3``: Spark pre-built for Apache Hadoop 3.3 and later (default)
 
 Note that this installation of PySpark with/without a specific Hadoop version is experimental. It can change or be removed between minor releases.
@@ -132,7 +131,7 @@ to install Spark, for example, as below:
 
 .. code-block:: bash
 
-    tar xzvf spark-3.3.0-bin-hadoop3.tgz
+    tar xzvf spark-3.4.0-bin-hadoop3.tgz
 
 Ensure the ``SPARK_HOME`` environment variable points to the directory where the tar file has been extracted.
 Update ``PYTHONPATH`` environment variable such that it can find the PySpark and Py4J under ``SPARK_HOME/python/lib``.
@@ -140,7 +139,7 @@ One example of doing this is shown below:
 
 .. code-block:: bash
 
-    cd spark-3.3.0-bin-hadoop3
+    cd spark-3.4.0-bin-hadoop3
     export SPARK_HOME=`pwd`
     export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
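The ``export PYTHONPATH=...`` one-liner at the end of the diff is the densest part of the guide. Here is a minimal stand-alone sketch of what it does, using a hypothetical directory ``/tmp/spark-demo`` and a made-up Py4J zip name in place of a real extracted ``spark-3.4.0-bin-hadoop3`` distribution (both are illustrative stand-ins, not from the patch):

```shell
# Create a fake extracted-distribution layout so the glob below has
# something to match. These paths and the py4j version are hypothetical.
mkdir -p /tmp/spark-demo/python/lib
touch /tmp/spark-demo/python/lib/py4j-0.10.9.7-src.zip \
      /tmp/spark-demo/python/lib/pyspark.zip

export SPARK_HOME=/tmp/spark-demo
unset PYTHONPATH   # start from an empty PYTHONPATH for the demo

# The trick from the guide: glob every .zip under $SPARK_HOME/python/lib
# into a bash array, then join the array elements with ':' by setting IFS
# inside the command substitution, and prepend the result to PYTHONPATH.
export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH

echo "$PYTHONPATH"
```

Because the demo starts with ``PYTHONPATH`` unset, the result ends with a trailing ``:``; in a real shell the previous value would appear there instead. Note the array syntax requires bash, not POSIX ``sh``.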