HyukjinKwon commented on a change in pull request #29703: URL: https://github.com/apache/spark/pull/29703#discussion_r489952888
##########
File path: python/docs/source/getting_started/installation.rst
##########
@@ -38,8 +38,36 @@ PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_
 
 .. code-block:: bash
 
     pip install pyspark
-
-Using Conda
+
+For PySpark with different Hadoop and/or Hive, you can install it by using ``HIVE_VERSION`` and ``HADOOP_VERSION`` environment variables as below:
+
+.. code-block:: bash
+
+    HIVE_VERSION=2.3 pip install pyspark
+    HADOOP_VERSION=2.7 pip install pyspark
+    HIVE_VERSION=1.2 HADOOP_VERSION=2.7 pip install pyspark
+
+The default distribution has built-in Hadoop 3.2 and Hive 2.3. If users specify different versions, the pip installation automatically
+downloads a different version and use it in PySpark. Downloading it can take a while depending on the network and the mirror chosen.
+It is recommended to use `-v` option in `pip` to track the installation and download status.
+
+.. code-block:: bash
+
+    HADOOP_VERSION=2.7 pip install pyspark -v
+
+Supported versions are as below:
+
+====================================== ====================================== ======================================
+``HADOOP_VERSION`` \\ ``HIVE_VERSION`` 1.2                                    2.3 (default)
+====================================== ====================================== ======================================
+**2.7**                                O                                      O
+**3.2 (default)**                      X                                      O
+**without**                            X                                      O
+====================================== ====================================== ======================================
+
+Note that this installation of PySpark with different versions of Hadoop and Hive is experimental. It can change or be removed betweem minor releases.

Review comment:
```suggestion
Note that this installation of PySpark with different versions of Hadoop and Hive is experimental. It can change or be removed between minor releases.
```