[ https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060267#comment-14060267 ]
Hans Uhlig commented on SPARK-2278:
-----------------------------------

So I just checked the current 1.0.0 API, and JavaPairRDD implements the following (there was no sortBy that I could find):

    JavaPairRDD<K,V> JavaPairRDD.sortByKey()
    JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator comp)
    JavaPairRDD<K,V> JavaPairRDD.sortByKey(boolean ascending)
    JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator comp, boolean ascending)
    JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator comp, boolean ascending, int numPartitions)

    JavaPairRDD<K,Iterable<T>> JavaRDD.groupBy( groupBy(Function<T,K> arg0) )
    JavaPairRDD<K,Iterable<T>> JavaRDD.groupBy( JavaPairRDD<K,Iterable<Tuple2<K,V>>> groupBy(Function<Tuple2<K,V>,K> f) )
    JavaPairRDD<K,Iterable<T>> JavaRDD.groupBy( JavaPairRDD<K,Iterable<T>> groupBy(Function<T,K> arg0, int arg1) )
    JavaPairRDD<K,Iterable<T>> JavaRDD.groupBy( JavaPairRDD<K,Iterable<Tuple2<K,V>>> groupBy(Function<Tuple2<K,V>,K> f, int numPartitions) )

    JavaPairRDD.groupByKey()
    JavaPairRDD.groupByKey(Partitioner partitioner )
    JavaPairRDD.groupByKey(int numPartitions )

The base functions (the ones with no implied parameters) should provide the following interfaces for optimum control and flexibility:

    JavaRDD<K,V> JavaRDD.sortBy(Comparator comp, boolean ascending, Partitioner partitioner, int numPartitions)
    JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator comp, boolean ascending, Partitioner partitioner, int numPartitions)
    JavaRDD<K,Iterable<T>> JavaRDD.groupBy(JavaPairRDD<K,Iterable<T>> groupBy(Function<T,K> func()), Comparator comp, boolean ascending, Partitioner partitioner, int numPartitions)
    JavaPairRDD<K,Iterable<V>> JavaPairRDD.groupByKey( JavaPairRDD<K,Iterable<T>> groupBy(Function<T,K> func), Comparator comp, boolean ascending, Partitioner partitioner, int numPartitions)

groupByKey's function reference should look something like "Iterable<O> Function<K,V,O> (K key, Iterable<V> values)", unless there is a different function for that particular job that I am missing.
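To make the requested grouping semantics concrete, here is a minimal sketch in plain Java. It deliberately does not use Spark: the class GroupByComparatorSketch and its groupByKey helper are hypothetical names made up for illustration, and a TreeMap stands in for the shuffle. It only shows the idea of treating keys as one group whenever the comparator reports them equal, which is roughly what a MapReduce grouping comparator does.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class GroupByComparatorSketch {

    // Hypothetical helper, NOT part of the Spark API. Keys for which
    // comp.compare(a, b) == 0 end up in the same group; the first key seen
    // for a group is the one kept as the map key.
    public static <K, V> SortedMap<K, List<V>> groupByKey(
            List<Map.Entry<K, V>> pairs, Comparator<? super K> comp) {
        // A TreeMap built with a comparator uses it for key equality as well
        // as for ordering, so the groups also come out sorted by key.
        SortedMap<K, List<V>> groups = new TreeMap<>(comp);
        for (Map.Entry<K, V> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                  .add(pair.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                new SimpleEntry<>("apple", 1),
                new SimpleEntry<>("Apple", 2),
                new SimpleEntry<>("banana", 3),
                new SimpleEntry<>("APPLE", 4));

        // Case-insensitive key equality: "apple", "Apple" and "APPLE" are one group.
        SortedMap<String, List<Integer>> groups =
                groupByKey(pairs, String.CASE_INSENSITIVE_ORDER);

        System.out.println(groups); // prints {apple=[1, 2, 4], banana=[3]}
    }
}
```

With the default equals/hashCode grouping, the three spellings of "apple" would form three separate groups; passing a comparator is what collapses them into one.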
The lack of descriptions of what the inputs and outputs of the function references should be makes this a bit difficult to discern sometimes.

> groupBy & groupByKey should support custom comparator
> -----------------------------------------------------
>
>                 Key: SPARK-2278
>                 URL: https://issues.apache.org/jira/browse/SPARK-2278
>             Project: Spark
>          Issue Type: New Feature
>          Components: Java API
>    Affects Versions: 1.0.0
>            Reporter: Hans Uhlig
>
> To maintain parity with MapReduce you should be able to specify a custom key
> equality function in groupBy/groupByKey similar to sortByKey.

--
This message was sent by Atlassian JIRA
(v6.2#6252)