I am trying to create new RDDs from a given PairRDD. The PairRDD has only a
few keys, but each key has a large number of values (about 100k). I want to
somehow repartition and turn each `Iterable<V>` into an RDD[V] so that I can
further apply map, reduce, sortBy, etc. effectively on those values. I sense
that flatMapValues is my friend, but I wanted to check with other Spark users.
This is for a real-time Spark app. I have already tried collect() and
computing all measures in-memory on the app server, but I am trying to
improve on that.
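One improvement I am considering over collect(): compute the per-key measures on the executors with aggregateByKey, so only one small record per key ever reaches the driver. A minimal sketch; the String/Integer types and the sum-and-count measure are placeholders, not my real schema:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;

public class AggregateSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("agg").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("a", 3), new Tuple2<>("a", 1), new Tuple2<>("b", 2)));
            // aggregateByKey folds values into a (sum, count) accumulator per key
            // on the executors, then merges accumulators across partitions.
            JavaPairRDD<String, double[]> stats = pairs.aggregateByKey(
                    new double[]{0.0, 0.0},
                    (acc, v) -> { acc[0] += v; acc[1] += 1; return acc; },
                    (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; });
            stats.collectAsMap().forEach((k, s) ->
                    System.out.println(k + " sum=" + s[0] + " count=" + s[1]));
        }
    }
}
```

This avoids materializing the 100k values per key anywhere; only the tiny accumulators are shuffled.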
This is what I tried (pseudocode):

class ComputeMetrices {
  transient JavaSparkContext sparkContext;

  public Map<String, V> computeMetrices(JavaPairRDD<String, V> javaPairRdd) {
    javaPairRdd.groupByKey(10).mapValues(itr -> {
      // NullPointerException here, probably on sparkContext: it is transient
      // and only exists on the driver, not inside this executor-side lambda
      sparkContext.parallelize(list(itr));
    });
  }
}
I want to create an RDD out of that Iterable from the groupByKey result so
that I can use further Spark transformations on it.
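One workaround I am considering, since SparkContext cannot be used inside a transformation: with only a few keys, collect the distinct keys on the driver and filter the original pair RDD once per key, giving one value RDD per key. A sketch with placeholder String/Integer types, not my real schema:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerKeyRdd {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("perKey").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("a", 3), new Tuple2<>("a", 1), new Tuple2<>("b", 2)));
            // Collect only the (few) distinct keys on the driver; this is cheap
            // when the key count is small, even if each key has ~100k values.
            List<String> keys = pairs.keys().distinct().collect();
            Map<String, JavaRDD<Integer>> perKey = new HashMap<>();
            for (String k : keys) {
                perKey.put(k, pairs.filter(t -> t._1().equals(k)).values());
            }
            // Each value RDD now supports map, reduce, sortBy, etc. in parallel.
            System.out.println(perKey.get("a").reduce(Integer::sum)); // prints 4
        }
    }
}
```

The trade-off is one full pass over the data per key, so this only pays off when the key count really is small.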
Thanks
Nir