GitHub user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    @jiangxb1987 Any closure sensitive to iteration order [1] is affected by 
this, under that set of circumstances.
    If we cannot solve it in a principled manner (making shuffle repeatable, 
which I believe you have investigated and found to be difficult?), the next 
best thing, until we have a performant solution, would be to expose it to 
users and have them deal with it (which is what I did, for example), with 
hints on how to accomplish it.
    
    The proposed solution will cause cascading failures for non-trivial 
applications (chains of shuffles), introduce high cost, and can 
unfortunately cause application failures and unpredictable SLAs.
    
    
    [1] I gave the examples of zip* and sampling, but really any user-defined 
closure is affected, and we cannot special-case all of them.
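
    To illustrate the order sensitivity in [1], here is a minimal sketch in 
plain Python (no Spark involved). The two arrival orders and the helper 
`zip_with_index` are hypothetical stand-ins for what two attempts of the same 
task might see when reading shuffle output; sorting before applying the 
closure is shown as one possible user-side workaround, assumed here for 
illustration and not necessarily the approach used in practice.

```python
def zip_with_index(records):
    """Order-sensitive closure: pair each record with its position."""
    return [(rec, i) for i, rec in enumerate(records)]


# Hypothetical arrival orders for the same shuffle partition, as seen by
# the original task attempt and by a retry. Same records, different order.
arrival_first_attempt = ["a", "b", "c", "d"]
arrival_retry_attempt = ["c", "a", "d", "b"]

# Applying the closure directly gives different results per attempt.
first_result = zip_with_index(arrival_first_attempt)
retry_result = zip_with_index(arrival_retry_attempt)

# Assumed workaround: impose a deterministic order (here, a sort) before
# the order-sensitive closure runs, at the cost of the extra sort.
stable_first = zip_with_index(sorted(arrival_first_attempt))
stable_retry = zip_with_index(sorted(arrival_retry_attempt))

print(first_result == retry_result)   # attempts disagree
print(stable_first == stable_retry)   # attempts agree after sorting
```

    The sort only helps when the records have a total order that does not 
itself depend on arrival order; it is a sketch of the idea, not a general fix.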


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
