[ 
https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060718#comment-14060718
 ] 

Sean Owen commented on SPARK-2278:
----------------------------------

groupBy / groupByKey vs collectBy / collectByKey?

(This is kind of a tangent, but a Comparator is not ideal here. The real 
requirement is to define key equality differently. That can be expressed as 
"when compare() == 0", although no ordering is needed. But yes, Comparator does 
get used this way in Java.)
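
To make the "compare() == 0 as equality" point concrete, here is a rough sketch in plain Java, not the Spark API. The class and method names (ComparatorEquality, groupByComparator) are made up for illustration. It leans on the fact that a TreeMap built with a Comparator treats two keys as the same key whenever compare() returns 0:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ComparatorEquality {
  // Hypothetical helper: two items fall into the same group whenever
  // comp.compare(a, b) == 0. The first item seen becomes the group's key.
  static <T> Map<T, List<T>> groupByComparator(List<T> items, Comparator<T> comp) {
    Map<T, List<T>> groups = new TreeMap<T, List<T>>(comp);
    for (T item : items) {
      List<T> group = groups.get(item);
      if (group == null) {
        group = new ArrayList<T>();
        groups.put(item, group);
      }
      group.add(item);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("alice", "Bob", "ALICE", "bob");
    // Case-insensitive comparison makes "alice" and "ALICE" the same key.
    Map<String, List<String>> groups =
        groupByComparator(names, String.CASE_INSENSITIVE_ORDER);
    System.out.println(groups);
  }
}
```

Note that the ordering the Comparator imposes is only a side effect here (it determines the TreeMap iteration order); the grouping itself depends only on which pairs compare as 0.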

In JavaRDD, you define a function of the key which yields a value whose 
equality matches how you want to group. Again given the hypothetical Employee, 
grouping by name:

{code}
// Key function: employees with equal names end up in the same group.
new Function<Employee,String>() {
  @Override
  public String call(Employee e) {
    return e.getName();
  }
}
{code}

No copy is made here. This may not have been what you had in mind, though. 
In plain Java, with a Comparator, it would have been:

{code}
// Comparator: two employees count as the same key when compare() returns 0.
new Comparator<Employee>() {
  @Override
  public int compare(Employee e1, Employee e2) {
    return e1.getName().compareTo(e2.getName());
  }
}
{code}

That's the equivalent. The one disadvantage I see right now in JavaRDD is that 
you can't further define the ordering you want in cases like sortBy, where the 
right ordering isn't the natural ordering of some function of the values.
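
As a sketch of what that sortBy gap looks like, here is a plain-Java illustration, not the Spark API. SortByKeySketch, its sortBy helper, and this minimal Employee class are all hypothetical. The idea is a key function plus a separate Comparator on the extracted key, for when the key's natural ordering is not the one you want:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class SortByKeySketch {
  // Hypothetical Employee with only the field used in this thread.
  static class Employee {
    private final String name;
    Employee(String name) { this.name = name; }
    String getName() { return name; }
  }

  // Sorts a copy of items by the key that keyFunc extracts, ordered by
  // keyComp instead of the key's natural ordering.
  static <T, K> List<T> sortBy(List<T> items,
                               final Function<T, K> keyFunc,
                               final Comparator<K> keyComp) {
    List<T> sorted = new ArrayList<T>(items);
    Collections.sort(sorted, new Comparator<T>() {
      @Override
      public int compare(T a, T b) {
        return keyComp.compare(keyFunc.apply(a), keyFunc.apply(b));
      }
    });
    return sorted;
  }

  public static void main(String[] args) {
    List<Employee> staff = new ArrayList<Employee>();
    staff.add(new Employee("Bob"));
    staff.add(new Employee("Alice"));
    staff.add(new Employee("Carol"));
    // Reverse order of names: not the natural ordering of the String key.
    List<Employee> byNameDesc =
        sortBy(staff, Employee::getName, Collections.<String>reverseOrder());
    for (Employee e : byNameDesc) {
      System.out.println(e.getName());
    }
  }
}
```

With only a key function, the caller is stuck with the natural ordering of whatever the function returns; the extra Comparator argument is what lets the ordering be chosen independently of the key.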

What is the role of func vs comp in your example for groupBy (?) though?

> groupBy & groupByKey should support custom comparator
> -----------------------------------------------------
>
>                 Key: SPARK-2278
>                 URL: https://issues.apache.org/jira/browse/SPARK-2278
>             Project: Spark
>          Issue Type: New Feature
>          Components: Java API
>    Affects Versions: 1.0.0
>            Reporter: Hans Uhlig
>
> To maintain parity with MapReduce you should be able to specify a custom key 
> equality function in groupBy/groupByKey similar to sortByKey. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
