Hi everybody,
I have an RDD that I want to group by some key, but it just
doesn't work. I am a Scala beginner. I have the following:
langs: List[String]
rdd: RDD[WikipediaArticle]
val meinVal = rdd.flatMap(article => langs.map(lang => {
  if (article.mentionsLanguage(lang)) { Tuple2(lang, article) }
  else { None }
})).filter(_ != None)
meinVal.collect.foreach(println) gives:
(Scala,WikipediaArticle(2,Scala and Java run on the JVM))
(Java,WikipediaArticle(2,Scala and Java run on the JVM))
(Scala,WikipediaArticle(3,Scala is not purely functional))
I have two questions:
1) Why can I not apply the groupByKey function? It is an RDD of
tuples, and the first entry of each tuple is the key.
2) I don't see how to apply groupBy either. I thought I could do
meinVal.groupBy(x => x._1), but that throws an error.
I notice that when I use an IDE and hover over "meinVal", it shows the
type as RDD[Object], whereas it should be RDD[(String, WikipediaArticle)].
I do not know how to get this information without the IDE. So it seems
that the RDD contains just one big object. I just don't see why that is.
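
My best guess at the cause, as a minimal sketch: one branch of the if/else returns a Tuple2 and the other returns None, so the element type may be widened to a common supertype that no longer has _1. The sketch uses plain Scala Lists instead of an RDD so that it runs without Spark. The WikipediaArticle fields, the body of mentionsLanguage, the extra "Groovy" entry and the names GroupSketch, widened, pairs and grouped are assumptions made up for illustration. As far as I understand, groupByKey on an RDD is only offered when the element type is a pair, so the same reasoning should carry over.

```scala
// Plain Scala collections stand in for the RDD here; no Spark is needed.
// Assumption: WikipediaArticle and mentionsLanguage look roughly like this.
case class WikipediaArticle(id: Int, text: String) {
  def mentionsLanguage(lang: String): Boolean = text.split(' ').contains(lang)
}

object GroupSketch {
  val langs: List[String] = List("Scala", "Java", "Groovy")
  val articles: List[WikipediaArticle] = List(
    WikipediaArticle(2, "Scala and Java run on the JVM"),
    WikipediaArticle(3, "Scala is not purely functional")
  )

  // One branch yields a tuple and the other yields None, so the compiler
  // infers a common supertype and the elements are no longer typed as pairs.
  val widened = articles.flatMap(article => langs.map(lang =>
    if (article.mentionsLanguage(lang)) (lang, article) else None))

  // Filtering the languages first means every element is a real pair.
  val pairs: List[(String, WikipediaArticle)] =
    articles.flatMap(article =>
      langs.filter(article.mentionsLanguage).map(lang => (lang, article)))

  // With a proper pair type, grouping by the first entry compiles.
  val grouped: Map[String, List[WikipediaArticle]] =
    pairs.groupBy(_._1).map { case (lang, ps) => (lang, ps.map(_._2)) }

  def main(args: Array[String]): Unit = grouped.foreach(println)
}
```

Without an IDE, the Scala REPL prints the inferred type whenever a val is defined, so pasting the definitions there should show whether the element type is a pair.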
Anyone? Please?
Irene