Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19816
  
    @felixcheung, I just tried lowering this by default and ran the tests. It 
seems some tests fail. For example, if we lower `spark.sql.shuffle.partitions` 
to 5, these additionally fail:
    
    ```
    Failed 
-------------------------------------------------------------------------
    1. Failure: spark.als (@test_mllib_recommendation.R#36) 
------------------------
    predictions$prediction not equal to c(-0.1380762, 2.6258414, -1.5018409).
    3/3 mismatches (average diff: 2.75)
    [1]  2.626 - -0.138 ==  2.76
    [2] -1.502 -  2.626 == -4.13
    [3] -0.138 - -1.502 ==  1.36
    
    
    2. Failure: pivot GroupedData column (@test_sparkSQL.R#1921) 
-------------------
    `sum1` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    3. Failure: pivot GroupedData column (@test_sparkSQL.R#1922) 
-------------------
    `sum2` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    4. Failure: pivot GroupedData column (@test_sparkSQL.R#1923) 
-------------------
    `sum3` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    5. Failure: pivot GroupedData column (@test_sparkSQL.R#1924) 
-------------------
    `sum4` not equal to correct_answer[, c("year", "R")].
    Component “year”: Mean relative difference: 0.0004961548
    Component “R”: Mean relative difference: 0.5454545
    ```
     
    Cases combining a shuffle with an R worker seem fairly infrequent (to be 
clear, a shuffle alone without R should be fine, IIUC).
    
    I don't have a strong opinion on lowering it: if we don't lower it, some 
tests in the future could hit this problem again; but if we do lower it, the 
required change looks quite large, and this case may not come up often.
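
    One way to keep such tests stable regardless of the default would be to 
pin the partition count explicitly in the test setup. A minimal SparkR sketch 
(the value 5 is just the example from above, not a recommendation):
    
    ```r
    library(SparkR)
    
    # Start a SparkR session with the shuffle partition count pinned,
    # so results that depend on partitioning do not change if the
    # global default is lowered.
    sparkR.session(master = "local[2]",
                   sparkConfig = list(spark.sql.shuffle.partitions = "5"))
    
    # It can also be changed on an already-running session via SQL:
    sql("SET spark.sql.shuffle.partitions = 5")
    ```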

