[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

mridulm Thu, 16 Aug 2018 19:26:22 -0700

Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/22112
  
    @tgravescs To understand better: are you suggesting that we do not support
any API and/or user closure that depends on input order?
    If so, that would break not just repartition + shuffle, but also other
publicly exposed APIs in Spark core and (my guess) non-trivial parts of
MLlib.
    
    Or is it that we support repartition and possibly a few other high-priority
cases (sampling in MLlib, for example?) and do not support the rest?
    
    My (unproven) contention is that a solution for repartition + shuffle would
be a general solution (or very close to it), which would then work for all
other cases with suitable modifications as required.
    By "expand solution to cover all later.", I was referring to follow-up
changes that leverage whatever we build for repartition in the other use
cases (for example, by setting appropriate parameters), deferred in the
interest of time.
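    To make "depends on input order" concrete, here is a simplified Python
sketch of a round-robin repartition, roughly the way repartition() spreads
records across output partitions. It is not Spark's actual code:
`round_robin_partition` and its fixed `start` offset are hypothetical. Because
a record's destination depends only on its position in the input, a retried
upstream task that emits the same records in a different order sends them to
different partitions, and downstream tasks that read outputs from different
attempts can see duplicated and missing records.

```python
def round_robin_partition(records, num_partitions, start=0):
    """Assign records to output partitions round-robin (hypothetical sketch).

    The destination of each record depends only on its position in
    `records`, not on its value. `start` stands in for the starting
    offset a real implementation might pick per input partition.
    """
    out = {p: [] for p in range(num_partitions)}
    for i, rec in enumerate(records):
        out[(start + i) % num_partitions].append(rec)
    return out


# First attempt: the upstream task emits records in this order.
first = round_robin_partition(["a", "b", "c", "d"], 2)
# Retry: the recomputed upstream emits the same records in another order.
retry = round_robin_partition(["b", "a", "c", "d"], 2)

# Suppose downstream task 0 already consumed partition 0 from the first
# attempt, and downstream task 1 reads partition 1 after the retry.
# The combined output contains "a" twice and never contains "b".
mixed = first[0] + retry[1]
print(mixed)  # ['a', 'c', 'a', 'd']
```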



