For the record, the solution I was suggesting was about like this:

inputRDD.flatMap { input =>
  val tokens = input.split(',')
  val id = tokens(0)
  val keyValuePairs = tokens.tail.grouped(2)
  val keys = keyValuePairs.map(_(0))
  keys.map(key => (id, key))
}

This is much more efficient.

On Wed, Dec 31, 2014 at 3:46 PM, Sean Owen <so...@cloudera.com> wrote:
> From the clarification below, the problem is that you are calling
> flatMapValues, which is only available on an RDD of key-value tuples.
> Your map function returns a tuple in one case but a String in the
> other, so your RDD is a bunch of Any, which is not at all what you
> want. You need to return a tuple in both cases, which is what Kapil
> pointed out.
>
> However it's still not quite what you want. Your input is basically
> [key value1 value2 value3] so you want to flatMap that to (key,value1)
> (key,value2) (key,value3). flatMapValues does not come into play.
>
> On Wed, Dec 31, 2014 at 3:25 PM, Sanjay Subramanian
> <sanjaysubraman...@yahoo.com> wrote:
>> My understanding is as follows
>>
>> STEP 1 (This would create a pair RDD)
>> =======
>>
>> reacRdd.map(line => line.split(',')).map(fields => {
>>   if (fields.length >= 11 && !fields(0).contains("VAERS_ID")) {
>>
>>     (fields(0),(fields(1)+"\t"+fields(3)+"\t"+fields(5)+"\t"+fields(7)+"\t"+fields(9)))
>>   }
>>   else {
>>     ""
>>   }
>> })
>>
>> STEP 2
>> =======
>> Since previous step created a pair RDD, I thought flatMapValues method will
>> be applicable.
>> But the code does not even compile saying that flatMapValues is not
>> applicable to RDD :-(
>>
>>
>> reacRdd.map(line => line.split(',')).map(fields => {
>>   if (fields.length >= 11 && !fields(0).contains("VAERS_ID")) {
>>
>>     (fields(0),(fields(1)+"\t"+fields(3)+"\t"+fields(5)+"\t"+fields(7)+"\t"+fields(9)))
>>   }
>>   else {
>>     ""
>>   }
>> }).flatMapValues(skus =>
>> skus.split('\t')).saveAsTextFile("/data/vaers/msfx/reac/" + outFile)
>>
>>
>> SUMMARY
>> =======
>> when a dataset looks like the following
>>
>> 1,red,blue,green
>> 2,yellow,violet,pink
>>
>> I want to output the following and I am asking how do I do that ? Perhaps my
>> code is 100% wrong.
>> Please correct me and educate me :-)
>>
>> 1,red
>> 1,blue
>> 1,green
>> 2,yellow
>> 2,violet
>> 2,pink
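
To make the flatMap idea concrete on the simplified dataset from the summary (1,red,blue,green), here is a minimal sketch that runs without a cluster. Plain Scala collections stand in for the RDD, and the names FlatMapSketch and toPairs are made up for illustration; they are not from the thread. Unlike the real VAERS layout, this version assumes every field after the id is a value, so it does not skip alternate columns or filter the header row.

```scala
// Sketch only: a Seq[String] stands in for an RDD[String].
// FlatMapSketch and toPairs are hypothetical names, not part of the thread.
object FlatMapSketch {

  // Turn "id,v1,v2,..." into (id, v1), (id, v2), ...
  // Every branch returns a Seq of tuples, so the result type stays
  // Seq[(String, String)] instead of widening to Any.
  def toPairs(line: String): Seq[(String, String)] = {
    val tokens = line.split(',')
    if (tokens.length < 2) {
      Seq.empty // a line with no values contributes nothing to the output
    } else {
      val id = tokens.head
      tokens.tail.toSeq.map(value => (id, value))
    }
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1,red,blue,green", "2,yellow,violet,pink")
    val pairs = lines.flatMap(toPairs)
    // Print in the "id,value" form requested in the summary.
    pairs.foreach { case (id, value) => println(s"$id,$value") }
  }
}
```

Returning an empty Seq for lines to be skipped is what replaces the else "" branch: flatMap simply drops them, so no second pass is needed to filter out placeholders. Since RDD.flatMap accepts a function returning a collection, the same function should be usable as inputRDD.flatMap(FlatMapSketch.toPairs) on an RDD[String], though I have only written it against local collections here.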