The operator you’re looking for is .flatMap. It flattens nested collections of
results: a map over a source element can return zero or more target elements,
and flatMap concatenates all of them into a single RDD.
I’m not very familiar with the Java API, but in Scala it would go like this
(keeping the type annotations only as documentation):

def toBson(bean: ProductBean): BSONObject = { … }

// groupBy yields RDD[(K, Iterable[V])], not Seq
val customerBeans: RDD[(Long, Iterable[ProductBean])] =
  allBeans.groupBy(_.customerId)

// flatMap flattens each customer's beans into one RDD of BSONObjects
val mongoObjects: RDD[BSONObject] =
  customerBeans.flatMap { case (id, beans) => beans.map(toBson) }
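
For completeness, a rough, untested sketch of what the same thing might look
like with the Java API, assuming Spark 1.x (where FlatMapFunction.call returns
an Iterable) and a serializable toBson helper like the Scala one above:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import org.bson.BSONObject;
import scala.Tuple2;

// Flatten the grouped values so each ProductBean becomes its own element
JavaRDD<ProductBean> beans = custGroupRDD.values().flatMap(
    new FlatMapFunction<Iterable<ProductBean>, ProductBean>() {
        @Override
        public Iterable<ProductBean> call(Iterable<ProductBean> group) {
            return group; // emit every bean in the group individually
        }
    });

// Pair each bean with a null key (the customerId is no longer needed)
JavaPairRDD<Object, BSONObject> mongoObjects = beans.mapToPair(
    new PairFunction<ProductBean, Object, BSONObject>() {
        @Override
        public Tuple2<Object, BSONObject> call(ProductBean bean) {
            return new Tuple2<Object, BSONObject>(null, toBson(bean));
        }
    });

Note this uses two steps (flatMap, then mapToPair) instead of the single
Scala flatMap, since the Java pair RDD types are easier to keep straight
that way.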

Hope this helps,
-adrian

From: Shams ul Haque
Date: Tuesday, October 27, 2015 at 12:50 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>"
Subject: Separate all values from Iterable

Hi,


I have grouped all my customers in JavaPairRDD<Long, Iterable<ProductBean>> by
their customerId (of Long type). This means every customerId has a List of
ProductBean.

Now I want to save all ProductBean to the DB irrespective of customerId. I got
all the values by using the method
JavaRDD<Iterable<ProductBean>> values = custGroupRDD.values();

Now I want to convert JavaRDD<Iterable<ProductBean>> to JavaPairRDD<Object,
BSONObject> so that I can save it to Mongo. Remember, every BSONObject is made
from a single ProductBean.

I am not able to figure out how to do this in Spark, i.e. which Spark
transformation should be used for this job. I think this task amounts to
separating all the values from the Iterable. Please let me know how this is
possible. Any hints in Scala or Python are also welcome.


Thanks

Shams
