GitHub user dusktreader opened a pull request:

    https://github.com/apache/spark/pull/18981

    Fixed pandoc dependency issue in python/setup.py

    ## Problem Description
    
    When pyspark is listed as a dependency of another package, installing
    that package fails while installing pyspark. During the install,
    pyspark's setup_requires dependencies, including pypandoc, are
    installed first. As a result, the exception handling at setup.py:152
    never triggers, because the pypandoc module is in fact importable.
    However, pypandoc.convert() fails when pandoc itself is not installed
    (in our use cases it is not). It raises an OSError that is not
    handled, so setup fails.
    
    The following is a sample failure:
    ```
    $ which pandoc
    $ pip freeze | grep pypandoc
    pypandoc==1.4
    $ pip install pyspark
    Collecting pyspark
      Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
        100% |████████████████████████████████| 188.3MB 16.8MB/s
        Complete output from command python setup.py egg_info:
        Maybe try:
    
            sudo apt-get install pandoc
        See http://johnmacfarlane.net/pandoc/installing.html
        for installation options
        ---------------------------------------------------------------
    
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
            long_description = pypandoc.convert('README.md', 'rst')
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
            outputfile=outputfile, filters=filters)
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
            _ensure_pandoc_path()
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
            raise OSError("No pandoc was found: either install pandoc and add it\n"
        OSError: No pandoc was found: either install pandoc and add it
        to your PATH or or call pypandoc.download_pandoc(...) or
        install pypandoc wheels with included pandoc.
    
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mfnizcwa/pyspark/
    ```
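The first two commands in the sample failure check the two conditions that combine to break the install: pypandoc is present, but the pandoc executable is not. The following is a rough Python equivalent of that check, written as a hypothetical diagnostic helper. `diagnose_pandoc_setup` is not part of PySpark. It uses only the standard library. Note that `find_spec` tests whether pypandoc is importable, which only approximates what `pip freeze` reports.

```python
import importlib.util
import shutil


def diagnose_pandoc_setup():
    """Report the two conditions behind the sample failure.

    Returns a (pypandoc_importable, pandoc_on_path) tuple of booleans.
    """
    # Roughly `pip freeze | grep pypandoc`: can the Python wrapper be imported?
    pypandoc_importable = importlib.util.find_spec("pypandoc") is not None
    # Equivalent of `which pandoc`: is the pandoc executable on PATH?
    pandoc_on_path = shutil.which("pandoc") is not None
    return pypandoc_importable, pandoc_on_path


if __name__ == "__main__":
    has_wrapper, has_binary = diagnose_pandoc_setup()
    if has_wrapper and not has_binary:
        # The combination from the sample failure: `import pypandoc` succeeds,
        # so an import-only guard passes, but pypandoc.convert() raises OSError.
        print("pypandoc is installed but pandoc is not: setup.py would fail")
    else:
        print("pypandoc importable: %s, pandoc on PATH: %s" % (has_wrapper, has_binary))
```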
    
    ## What changes were proposed in this pull request?
    
    This change adds an exception handler for the OSError raised by
    pypandoc.convert(). This allows pyspark to be installed client-side
    without requiring pandoc to be installed.
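The following is a minimal sketch of the kind of guard this change describes. It assumes the setup.py structure visible in the traceback: a `pypandoc.convert('README.md', 'rst')` call inside a try block, using the pypandoc 1.4 API. The function name `load_long_description`, the placeholder text, and the stderr messages are hypothetical and are not taken from the actual patch. The key point is that a second `except OSError` clause catches the error pypandoc raises when the pandoc binary is missing. That error is not a subclass of ImportError, so an import-only handler lets it through.

```python
import sys

# Hypothetical placeholder used when README.md cannot be converted.
FALLBACK_DESCRIPTION = "missing pandoc: do not upload to PyPI"


def load_long_description(readme="README.md"):
    """Convert the README to reStructuredText, tolerating missing pandoc tooling."""
    try:
        import pypandoc
        return pypandoc.convert(readme, "rst")
    except ImportError:
        # pypandoc itself is not installed.
        print("Could not import pypandoc", file=sys.stderr)
    except OSError:
        # pypandoc is importable (e.g. pulled in via setup_requires), but the
        # pandoc executable is missing, so convert() raises OSError.
        print("Could not convert README: pandoc is not installed", file=sys.stderr)
    return FALLBACK_DESCRIPTION
```

In setup.py, the return value would be assigned to `long_description` and passed to `setup()`. Either failure then degrades to a placeholder description instead of aborting `egg_info`.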
    
    ## How was this patch tested?
    
    I tested this by building a wheel package of pyspark with the change
    applied. Then, in a clean virtual environment with pypandoc installed
    but pandoc not available on the system, I installed pyspark from the wheel.
    
    Here is the output:
    
    ```
    $ pip freeze | grep pypandoc
    pypandoc==1.4
    $ which pandoc
    $ pip install --no-cache-dir ../spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
    Processing /home/tbeck/work/spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
    Requirement already satisfied: py4j==0.10.6 in /home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages (from pyspark==2.3.0.dev0)
    Installing collected packages: pyspark
    Successfully installed pyspark-2.3.0.dev0
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dusktreader/spark dusktreader/fix-pandoc-dependency-issue-in-setup_py

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18981.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18981
    
----
commit edd53828a23561144b535582430c0805fe54b355
Author: Tucker Beck <tucker.b...@rentrakmail.com>
Date:   2017-08-17T22:20:47Z

    Fixed pandoc dependency issue in python/setup.py
    

----

