GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/20487

    [SPARK-23319][TESTS] Explicitly skips PySpark tests for old Pandas and PyArrow

    ## What changes were proposed in this pull request?
    
    This PR proposes to explicitly skip the PySpark tests that need Pandas or PyArrow when either package is missing or older than the minimum supported version.
    
    We declared the extra dependencies:
    
    https://github.com/apache/spark/blob/b8bfce51abf28c66ba1fc67b0f25fe1617c81025/python/setup.py#L204
    
    However, we currently only check whether PyArrow is installed, not which version is installed. As a result, the tests already fail to run, instead of being skipped, when an older PyArrow is installed.
    
    We also already have a conditional skip for old Pandas; the condition appears to be Pandas >= 0.19.2.
    
    ## How was this patch tested?
    
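    Manually tested by temporarily modifying the conditions (for example, raising the minimum versions to 1.19.2 and 1.8.0 to trigger the "your version was" messages):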
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark pyarrow-pandas-skip

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20487.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20487
    
----
commit 08b42f80322636169fc440e0e2f36819b8d6e837
Author: hyukjinkwon <gurwls223@...>
Date:   2018-02-02T13:21:34Z

    Explicitly skips PySpark tests for old Pandas and PyArrow

----

