[GitHub] [spark] HyukjinKwon commented on a change in pull request #29703: [SPARK-32017][PYTHON][BUILD] Make Pyspark Hadoop 3.2+ Variant available in PyPI

GitBox Wed, 16 Sep 2020 21:07:21 -0700


HyukjinKwon commented on a change in pull request #29703:
URL: https://github.com/apache/spark/pull/29703#discussion_r489952888




##########
File path: python/docs/source/getting_started/installation.rst
##########
@@ -38,8 +38,36 @@ PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_
 .. code-block:: bash
 
     pip install pyspark
-       
-Using Conda  
+
+For PySpark with different Hadoop and/or Hive, you can install it by using ``HIVE_VERSION`` and ``HADOOP_VERSION`` environment variables as below:
+
+.. code-block:: bash
+
+    HIVE_VERSION=2.3 pip install pyspark
+    HADOOP_VERSION=2.7 pip install pyspark
+    HIVE_VERSION=1.2 HADOOP_VERSION=2.7 pip install pyspark
+
+The default distribution has built-in Hadoop 3.2 and Hive 2.3. If users specify different versions, the pip installation automatically
+downloads a different version and use it in PySpark. Downloading it can take a while depending on the network and the mirror chosen.
+It is recommended to use `-v` option in `pip` to track the installation and download status.
+
+.. code-block:: bash
+
+    HADOOP_VERSION=2.7 pip install pyspark -v
+
+Supported versions are as below:
+
+====================================== ====================================== ======================================
+``HADOOP_VERSION`` \\ ``HIVE_VERSION`` 1.2                                    2.3 (default)
+====================================== ====================================== ======================================
+**2.7**                                O                                      O
+**3.2 (default)**                      X                                      O
+**without**                            X                                      O
+====================================== ====================================== ======================================
+
+Note that this installation of PySpark with different versions of Hadoop and Hive is experimental. It can change or be removed betweem minor releases.

Review comment:
       ```suggestion
   Note that this installation of PySpark with different versions of Hadoop and Hive is experimental. It can change or be removed between minor releases.
   ```
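
For readers checking the support table in the diff above, here is a minimal Python sketch of the matrix as a lookup. It is only an illustration and does not show how the pip installation validates versions. The names `SUPPORTED`, `is_supported` and `check_env` are hypothetical, and the assumption that `without` is passed literally as the `HADOOP_VERSION` value is taken from the table's row label, not confirmed by the patch.

```python
import os

# Support matrix transcribed from the table in the diff (O = supported, X = not).
# Keys are HADOOP_VERSION values, sets hold the HIVE_VERSION values marked "O".
# Assumption: "without" is used literally as a HADOOP_VERSION value.
SUPPORTED = {
    "2.7": {"1.2", "2.3"},
    "3.2": {"2.3"},
    "without": {"2.3"},
}

# Defaults stated in the diff: built-in Hadoop 3.2 and Hive 2.3.
DEFAULT_HADOOP = "3.2"
DEFAULT_HIVE = "2.3"


def is_supported(hadoop_version=DEFAULT_HADOOP, hive_version=DEFAULT_HIVE):
    """Return True if the table marks this Hadoop/Hive combination with "O"."""
    return hive_version in SUPPORTED.get(hadoop_version, set())


def check_env(environ=os.environ):
    """Read both variables from a mapping, falling back to the defaults."""
    hadoop = environ.get("HADOOP_VERSION", DEFAULT_HADOOP)
    hive = environ.get("HIVE_VERSION", DEFAULT_HIVE)
    return hadoop, hive, is_supported(hadoop, hive)
```

Calling `check_env()` with no arguments reads the current process environment; passing a plain dict makes it easy to try combinations without exporting variables.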




