Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/3632#issuecomment-68069421
  
    The reason for separate classes is to cleanly segregate the 
available/supportable functionality.  Not every `PairRDD` has keys that can be 
ordered, so `sortByKey` shouldn't be part of `PairRDD`.  When keys can be 
ordered, there is often a natural ordering that is already implicitly in scope. 
 When that is true, we don't want to force the user to explicitly provide an 
`Ordering` -- e.g. if you have an `RDD[(Int, Foo)]`, then `rdd.sortByKey()` 
should just work.  If you want a different ordering, you just need to bring a 
new implicit `Ordering` for that key type into scope.
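    The implicit-resolution mechanism at work here can be sketched in plain Scala, without Spark (the toy `sortByKey` below is illustrative, not the actual Spark method):

```scala
// Sketch of how implicit Ordering resolution drives a sortByKey-style API.
// Ordering[K] is found implicitly at the call site, so callers with an
// ordered key type never have to pass one explicitly.
object ImplicitOrderingSketch {
  def sortByKey[K, V](pairs: Seq[(K, V)])(implicit ord: Ordering[K]): Seq[(K, V)] =
    pairs.sortBy(_._1)(ord)

  def main(args: Array[String]): Unit = {
    val data = Seq((3, "c"), (1, "a"), (2, "b"))

    // Ordering[Int] is already in implicit scope, so this "just works".
    println(sortByKey(data))

    // Bringing a different implicit Ordering into scope changes the result,
    // with no change to the call itself.
    locally {
      implicit val desc: Ordering[Int] = Ordering.Int.reverse
      println(sortByKey(data))
    }
  }
}
```

    The first call sorts ascending; the second, descending, purely because a different implicit `Ordering[Int]` is in scope at that call site.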
    
    Things aren't as cleanly separated in the Java API because of its lack of 
support for implicits, but that doesn't mean we should abandon the separation 
between `PairRDD` and `OrderedRDD` on the Scala side, or start dirtying up 
`PairRDD.scala` when we want to provide new methods for RDDs whose keys and 
values can both be ordered.
    
    I really think we want to repeat the pattern of `OrderedRDD` for this 
`DoublyOrderedRDD` -- or whatever better name you can come up with.  The 
biggest quirk I can see right now is when the key and value types are the same 
but you want to order them one way when sorting by key and a different way in 
the secondary sort on values.  That won't work with implicits, since only one 
implicit `Ordering` for the type can be in scope at a time.  The problem could 
either be avoided by using distinct types for the key and value roles, or a 
method signature with explicit orderings could be added to address this corner 
case.
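    A minimal sketch of that pattern, again in plain Scala rather than Spark: the enrichment's methods only become available when both the key and value types have an `Ordering`, mirroring how `OrderedRDDFunctions` gates `sortByKey`.  The names `DoublyOrderedFunctions` and `sortByKeyAndValue` are hypothetical, not real Spark API:

```scala
// Enrichment available only when BOTH K and V have an Ordering.
object DoublyOrderedSketch {
  implicit class DoublyOrderedFunctions[K: Ordering, V: Ordering](pairs: Seq[(K, V)]) {
    // Secondary sort: by key first, then by value, using the implicit orderings.
    // Ordering.Tuple2 derives the pair ordering from the two element orderings.
    def sortByKeyAndValue: Seq[(K, V)] = pairs.sorted

    // Explicit-orderings overload for the corner case where K and V are the
    // same type but need different orderings in the two roles.
    def sortByKeyAndValue(keyOrd: Ordering[K], valOrd: Ordering[V]): Seq[(K, V)] =
      pairs.sorted(Ordering.Tuple2(keyOrd, valOrd))
  }
}
```

    With this shape, `Seq((1, 5), (1, 2)).sortByKeyAndValue(Ordering.Int, Ordering.Int.reverse)` sorts keys ascending but values descending, even though key and value share a type.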


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
