GitHub user mridulm commented on the issue:

https://github.com/apache/spark/pull/22112

@tgravescs To understand better: are you suggesting that we do not support any API and/or user closure that depends on input order? If so, that would break not just repartition + shuffle, but also other publicly exposed APIs in Spark core and (my guess) non-trivial parts of MLlib.

Or is it that we support repartition and possibly a few other high-priority cases (sampling in MLlib, for example?) and not support the rest?

My (unproven) contention is that a solution for repartition + shuffle would be a general solution, or very close to one, which would then work for all other cases with suitable modifications as required. By "expand solution to cover all later.", I was referring to changes that leverage whatever we build for repartition in other use cases (for example, setting appropriate parameters) in the interest of time.
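To make "depends on input order" concrete, here is a minimal sketch in plain Python (not Spark code). It assumes that repartition places each record by its position in the input (round-robin) rather than by its value. The function `round_robin_assign` and the record lists are hypothetical. If one reducer reads map output from the original attempt and another reads from a retried attempt that emitted the same records in a different order, some records end up duplicated and others are lost.

```python
def round_robin_assign(records, num_partitions, start=0):
    """Assign each record to an output partition in round-robin order.

    The target partition depends only on a record's position in the input,
    not on its value, so the same records in a different order can land in
    different partitions.
    """
    assignment = {}
    for position, record in enumerate(records):
        assignment[record] = (start + position) % num_partitions
    return assignment


first_attempt = ["a", "b", "c", "d"]
# Hypothetical: a recomputed upstream task emits the same records in a different order.
retry_attempt = ["b", "a", "d", "c"]

# Reducer 0 fetched its data from the first attempt; reducer 1 had to
# re-fetch after a failure and reads from the retried attempt instead.
partition_0 = [r for r, p in round_robin_assign(first_attempt, 2).items() if p == 0]
partition_1 = [r for r, p in round_robin_assign(retry_attempt, 2).items() if p == 1]
print(partition_0, partition_1)  # ['a', 'c'] ['a', 'c']
```

Here "a" and "c" are read twice while "b" and "d" are never read, even though each attempt on its own is a valid round-robin split.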