Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19816
  
    @felixcheung, I just tried lowering this by default and ran the tests. It 
seems some tests fail. For example, if we lower `spark.sql.shuffle.partitions` 
to 5, these additionally fail:
    
    ```
    Failed 
-------------------------------------------------------------------------
    1. Failure: spark.als (@test_mllib_recommendation.R#36) 
------------------------
    predictions$prediction not equal to c(-0.1380762, 2.6258414, -1.5018409).
    3/3 mismatches (average diff: 2.75)
    [1]  2.626 - -0.138 ==  2.76
    [2] -1.502 -  2.626 == -4.13
    [3] -0.138 - -1.502 ==  1.36
    
    
    2. Failure: pivot GroupedData column (@test_sparkSQL.R#1921) 
-------------------
    `sum1` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    3. Failure: pivot GroupedData column (@test_sparkSQL.R#1922) 
-------------------
    `sum2` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    4. Failure: pivot GroupedData column (@test_sparkSQL.R#1923) 
-------------------
    `sum3` not equal to `correct_answer`.
    Component “year”: Mean relative difference: 0.0004961548
    Component “Python”: Mean relative difference: 0.0952381
    Component “R”: Mean relative difference: 0.5454545
    
    
    5. Failure: pivot GroupedData column (@test_sparkSQL.R#1924) 
-------------------
    `sum4` not equal to correct_answer[, c("year", "R")].
    Component “year”: Mean relative difference: 0.0004961548
    Component “R”: Mean relative difference: 0.5454545
    ```
     
    Cases combining a shuffle with an R worker seem fairly infrequent (to be 
clear, a shuffle alone without R should be fine, IIUC).
    
    I don't have a strong opinion on lowering it: if we don't lower it, some 
tests in the future could hit this problem again; but if we do lower it, the 
required change looks quite large, and this case may not come up often.
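
    One way to keep such tests stable regardless of the default would be to 
pin the partition count explicitly in the test setup. A minimal SparkR sketch 
(the value 5 is just the example from above, not a recommendation):
    
    ```r
    library(SparkR)
    
    # Start a SparkR session with the shuffle partition count pinned,
    # so results that depend on partitioning do not change if the
    # global default is lowered.
    sparkR.session(master = "local[2]",
                   sparkConfig = list(spark.sql.shuffle.partitions = "5"))
    
    # It can also be changed on an already-running session via SQL:
    sql("SET spark.sql.shuffle.partitions = 5")
    ```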

