Hi everybody,

I have an RDD that I want to group by key, but it just doesn't work. I am a Scala beginner. I have the following:

langs: List[String]

rdd: RDD[WikipediaArticle]

val meinVal = rdd.flatMap(article => langs.map(lang => if (article.mentionsLanguage(lang)) Tuple2(lang, article) else None)).filter(_ != None)

meinVal.collect.foreach(println) gives:

(Scala,WikipediaArticle(2,Scala and Java run on the JVM))
(Java,WikipediaArticle(2,Scala and Java run on the JVM))
(Scala,WikipediaArticle(3,Scala is not purely functional))

I have two questions:

1) Why can't I apply the groupByKey function? The RDD contains tuples, and the first entry of each tuple is the key.

2) I don't see how to apply groupby either. I thought I could do meinVal.groupby(x=> x._1), but that throws an error.

I notice that when I hover over "meinVal" in an IDE, its type is shown as RDD[Object], whereas it should be RDD[(String,WikipediaArticle)]. I do not know how to get this information without the IDE. So it seems that the RDD contains just one big object. I just don't see why that is.
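
A minimal sketch of what may be going on, using plain Scala collections instead of Spark so that it runs without a cluster. When one branch of an if/else yields a tuple and the other yields None, the compiler can only infer a common supertype for the elements, which is probably why the IDE reports Object. The WikipediaArticle class below is a hypothetical stand-in (its field names and mentionsLanguage body are assumptions), and the names GroupByLanguage, articles, pairs and grouped are made up for illustration. Writing the expected type on the val is also one way to see the problem without an IDE, since the compiler then reports a type mismatch.

```scala
// Hypothetical stand-in for the real class; field names and the body of
// mentionsLanguage are assumptions made for this sketch.
case class WikipediaArticle(id: Int, text: String) {
  def mentionsLanguage(lang: String): Boolean = text.split(' ').contains(lang)
}

object GroupByLanguage {
  val langs: List[String] = List("Scala", "Java")

  val articles: List[WikipediaArticle] = List(
    WikipediaArticle(2, "Scala and Java run on the JVM"),
    WikipediaArticle(3, "Scala is not purely functional")
  )

  // Filter the languages first, then build the tuple. Every element is now a
  // (String, WikipediaArticle), so no None placeholder is needed and the
  // explicit type annotation compiles.
  val pairs: List[(String, WikipediaArticle)] =
    articles.flatMap { article =>
      langs
        .filter(lang => article.mentionsLanguage(lang))
        .map(lang => (lang, article))
    }

  // Plain-collection analogue of grouping by key: language -> its articles.
  val grouped: Map[String, List[WikipediaArticle]] =
    pairs.groupBy(_._1).map { case (lang, ps) => (lang, ps.map(_._2)) }

  def main(args: Array[String]): Unit = {
    pairs.foreach(println)
    grouped.foreach(println)
  }
}
```

With an RDD in place of the List of articles, the same flatMap body should yield an RDD[(String, WikipediaArticle)], and groupByKey is then available because the element type is a pair.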

Anyone? Please?

