This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 016ab0c  [SPARK-37050][PYTHON] Update Conda installation instructions
016ab0c is described below

commit 016ab0c6f2e5d29a2df08bc6d5e2bb7240a6577a
Author: H. Vetinari <h.vetin...@gmx.com>
AuthorDate: Fri Oct 22 13:34:23 2021 +0900

    [SPARK-37050][PYTHON] Update Conda installation instructions

    ### What changes were proposed in this pull request?
    Improve the conda installation docs.

    ### Why are the changes needed?
    As requested [here](https://github.com/apache/spark-website/pull/361#issuecomment-945660978).
    Ideally, this should be backported to the 3.2 branch (so it becomes visible in the 3.2.0
    installation documentation [here](https://spark.apache.org/docs/3.2.0/api/python/getting_started/install.html)).

    CC gengliangwang

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Not tested

    Closes #34315 from h-vetinari/conda-install.

    Lead-authored-by: H. Vetinari <h.vetin...@gmx.com>
    Co-authored-by: h-vetinari <h.vetin...@gmx.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/docs/source/getting_started/install.rst | 50 ++++++++++++--------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index 63a7eca..13c6f8f 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -83,46 +83,42 @@ Note that this installation way of PySpark with/without a specific Hadoop versio
 Using Conda
 -----------
 
-Conda is an open-source package management and environment management system which is a part of
-the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
-language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
-`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+Conda is an open-source package management and environment management system (developed by
+`Anaconda <https://www.anaconda.com/>`_), which is best installed through
+`Miniconda <https://docs.conda.io/en/latest/miniconda.html>`_ or `Miniforge <https://github.com/conda-forge/miniforge/>`_.
+The tool is both cross-platform and language-agnostic, and in practice, conda can replace both
+`pip <https://pip.pypa.io/en/latest/>`_ and `virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
 
-Create new virtual environment from your terminal as shown below:
+Conda uses so-called channels to distribute packages; besides the default channels provided by
+Anaconda itself, the most important channel is `conda-forge <https://conda-forge.org/>`_, the
+community-driven packaging effort that is the most extensive and the most up-to-date (and that
+also serves as the upstream for the Anaconda channels in most cases).
 
-.. code-block:: bash
-
-    conda create -n pyspark_env
-
-After the virtual environment is created, it should be visible under the list of Conda environments
-which can be seen using the following command:
-
-.. code-block:: bash
-
-    conda env list
-
-Now activate the newly created environment with the following command:
+To create a new conda environment from your terminal and activate it, proceed as shown below:
 
 .. code-block:: bash
 
+    conda create -n pyspark_env
     conda activate pyspark_env
 
-You can install pyspark by `Using PyPI <#using-pypi>`_ to install PySpark in the newly created
-environment, for example as below. It will install PySpark under the new virtual environment
-``pyspark_env`` created above.
+After activating the environment, use the following command to install pyspark,
+a Python version of your choice, and any other packages you want to use in
+the same session as pyspark (these can also be installed in several steps).
 
 .. code-block:: bash
 
-    pip install pyspark
-
-Alternatively, you can install PySpark from Conda itself as below:
+    conda install -c conda-forge pyspark  # can also add "python=3.8 some_package [etc.]" here
 
-.. code-block:: bash
+Note that `PySpark for conda <https://anaconda.org/conda-forge/pyspark>`_ is maintained
+separately by the community; while new versions generally get packaged quickly, their
+availability through conda(-forge) is not directly in sync with the PySpark release cycle.
 
-    conda install pyspark
+While using pip in a conda environment is technically feasible (with the same command as
+`above <#using-pypi>`_), this approach is `discouraged <https://www.anaconda.com/blog/using-pip-in-a-conda-environment/>`_,
+because pip does not interoperate with conda.
 
-However, note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is not necessarily
-synced with PySpark release cycle because it is maintained by the community separately.
+For a short summary of useful conda commands, see the conda
+`cheat sheet <https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html>`_.
 
 Manually Downloading
 --------------------
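As a quick reference for readers following the updated instructions above, the following is a
minimal end-to-end sketch of the conda-based workflow the new text describes. The environment
name ``pyspark_env`` and the conda-forge channel come from the docs themselves; the pinned
Python version and the final import check are illustrative additions, not part of the committed
change.

.. code-block:: bash

    # Create and activate an isolated conda environment (name taken from the docs).
    conda create -n pyspark_env
    conda activate pyspark_env

    # Install PySpark from conda-forge; a Python version and other packages
    # can be added to the same command (e.g. "python=3.8 pandas").
    conda install -c conda-forge pyspark python=3.8

    # Illustrative smoke test: confirm the installed PySpark version.
    python -c "import pyspark; print(pyspark.__version__)"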
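Because, as the updated text notes, conda-forge packaging is not directly in sync with the
PySpark release cycle, it can help to check which versions are currently packaged before
installing. The commands below are standard conda usage, shown here as an aside rather than as
part of the committed docs.

.. code-block:: bash

    # List the PySpark versions currently available on conda-forge.
    conda search -c conda-forge pyspark

    # After installation, show which version and channel the active environment uses.
    conda list pyspark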