Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3632#issuecomment-68069421

The reason for separate classes is to cleanly segregate the available/supportable functionality. Not every `PairRDD` has keys that can be ordered, so `sortByKey` shouldn't be part of `PairRDD`. When keys can be ordered, there is often a natural ordering that is already implicitly in scope. When that is true, we don't want to force the user to explicitly provide an `Ordering` -- e.g. if you have an `RDD[(Int, Foo)]`, then `rdd.sortByKey()` should just work. If you want a different ordering, you just need to bring a new implicit `Ordering` for that key type into scope.

Things aren't as cleanly separated in the Java API because Java lacks support for implicits, but that doesn't mean we should abandon the separation between `PairRDD` and `OrderedRDD` on the Scala side, or start dirtying up `PairRDD.scala` when we want to provide new methods for RDDs whose keys and values can both be ordered. I really think we want to repeat the pattern of `OrderedRDD` for this `DoublyOrderedRDD` -- or whatever better name you can come up with.

The biggest quirk I can see right now arises when the keys and values have the same type but you want to order them one way when sorting by key and a different way when doing the secondary sort on values. That won't work with implicits, since there can only be one implicit `Ordering` for a given type in scope at a time. The problem could either be avoided by using distinct types for the key and value roles, or a method signature with explicit orderings could be added to cover this corner case.
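To make the two points above concrete, here is a minimal plain-collections sketch of the pattern being described -- implicit `Ordering` resolution for the common case, plus a hypothetical explicit-orderings signature for the same-type corner case. The names `sortByKey` and `sortByKeyAndValue` on `Seq` are illustrative stand-ins, not Spark's actual API (Spark's real implementation lives in `OrderedRDDFunctions` and operates on RDDs):

```scala
object SecondarySortSketch {
  // The common case: sorting "just works" when an implicit Ordering[K]
  // is in scope, exactly as rdd.sortByKey() does for RDD[(Int, Foo)].
  // Bringing a different implicit Ordering[K] into scope changes the sort.
  def sortByKey[K, V](pairs: Seq[(K, V)])(implicit ord: Ordering[K]): Seq[(K, V)] =
    pairs.sortBy(_._1)(ord)

  // Hypothetical explicit-orderings signature for the corner case where
  // keys and values have the same type but need different orderings.
  // Explicit parameters sidestep the one-implicit-per-type limitation.
  def sortByKeyAndValue[K, V](pairs: Seq[(K, V)],
                              keyOrd: Ordering[K],
                              valOrd: Ordering[V]): Seq[(K, V)] =
    pairs.sortBy(identity)(Ordering.Tuple2(keyOrd, valOrd))
}
```

For example, `sortByKeyAndValue(pairs, Ordering.Int, Ordering.Int.reverse)` sorts keys ascending while breaking ties with values descending -- something no single implicit `Ordering[Int]` in scope could express for both roles at once.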