groupByKey does not run a map-side combiner, so be careful about performance: it shuffles every record across the network even when the values for a key are already co-located in the same partition.
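As a minimal sketch (plain Java, no Spark, hypothetical data) of what a map-side combiner buys you: a reduceByKey-style operation merges values per key within each partition first, so only one partial result per key per partition has to cross the network, whereas groupByKey ships every raw record.

```java
import java.util.*;

public class CombinerSketch {
    public static void main(String[] args) {
        // Two hypothetical partitions of (key, value) pairs.
        List<int[]> p1 = Arrays.asList(new int[]{1, 2}, new int[]{1, 3}, new int[]{2, 5});
        List<int[]> p2 = Arrays.asList(new int[]{1, 4}, new int[]{2, 1});

        // Map-side combine: sum values per key within each partition first.
        Map<Integer, Integer> c1 = combine(p1);  // {1=5, 2=5}
        Map<Integer, Integer> c2 = combine(p2);  // {1=4, 2=1}

        // "Shuffle" and merge the per-partition partial sums.
        Map<Integer, Integer> merged = new HashMap<>(c1);
        c2.forEach((k, v) -> merged.merge(k, v, Integer::sum));

        // Only 4 combined records crossed the "network" instead of all 5 raw ones;
        // the gap grows with the number of duplicate keys per partition.
        System.out.println(merged);  // {1=9, 2=6}
    }

    static Map<Integer, Integer> combine(List<int[]> partition) {
        Map<Integer, Integer> acc = new HashMap<>();
        for (int[] kv : partition) acc.merge(kv[0], kv[1], Integer::sum);
        return acc;
    }
}
```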
reduceByKey and aggregateByKey do run a combiner. If you want a separate function for each key, you can build a key-to-closure map, broadcast it, and look up the right closure inside reduceByKey, provided you can access the key there. I have not needed to access the key inside reduceByKey/aggregateByKey yet, but there should be a way...

On Tue, Nov 18, 2014 at 7:24 AM, Yanbo <yanboha...@gmail.com> wrote:
> First use groupByKey(), which gives you a pair RDD of
> (key: K, value: ArrayBuffer[V]).
> Then call map() on this RDD with a function whose behavior differs
> depending on the key, which acts as a parameter of that function.
>
> > On Nov 18, 2014, at 8:59 PM, jelgh <johannes.e...@gmail.com> wrote:
> >
> > Hello everyone,
> >
> > I'm new to Spark and I have the following problem:
> >
> > I have a large JavaRDD<MyClass> collection, which I group by
> > creating a hashcode from some fields in MyClass:
> >
> > JavaRDD<MyClass> collection = ...;
> > JavaPairRDD<Integer, Iterable<MyClass>> grouped =
> >     collection.groupBy(...); // the group function just creates a
> >     hashcode from some fields in MyClass.
> >
> > Now I want to reduce the variable grouped. However, I want to reduce it
> > with different functions depending on the key in the JavaPairRDD. So
> > basically a reduceByKey but with multiple functions.
> >
> > The only solution I've come up with is filtering grouped once per reduce
> > function and applying each function to its filtered subset. This feels
> > kind of hackish, though.
> >
> > Is there a better way?
> >
> > Best regards,
> > Johannes
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/ReduceByKey-but-with-different-functions-depending-on-key-tp19177.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
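The broadcast-a-map-of-closures idea above can be sketched without Spark (hypothetical keys, values, and reduce functions): keep a Map from key to a BinaryOperator and look up the right reducer for each key while folding, which is exactly what the reduce side of reduceByKey would do with the broadcast map.

```java
import java.util.*;
import java.util.function.BinaryOperator;

public class PerKeyReduce {
    public static void main(String[] args) {
        // Hypothetical per-key reduce functions: sum for key 1, max for key 2.
        // In Spark this map would be broadcast to the executors.
        Map<Integer, BinaryOperator<Integer>> fns = new HashMap<>();
        fns.put(1, Integer::sum);
        fns.put(2, Math::max);

        // Hypothetical (key, value) records.
        List<int[]> records = Arrays.asList(
            new int[]{1, 2}, new int[]{1, 3}, new int[]{2, 5}, new int[]{2, 1});

        // reduceByKey-style fold, choosing the reduce function by key.
        Map<Integer, Integer> result = new HashMap<>();
        for (int[] kv : records) {
            result.merge(kv[0], kv[1], fns.get(kv[0]));
        }
        System.out.println(result);  // {1=5, 2=5}: 2+3 summed, max(5, 1) taken
    }
}
```

One fold pass replaces the one-filter-per-function approach from the original question, which would scan the data once per reduce function.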