The result of groupByKey is an RDD of (K, Iterable[V]) pairs. The values may in fact be an ArrayBuffer[V] underneath, but the API does not guarantee that. Printing one would show an ArrayBuffer, but as far as the static types are concerned it is only an Iterable, and distinct is not a method of Iterable. That is why you get the error.
You can call Iterable.toSet instead (note that toSet is defined without parentheses in Scala). You can also simplify this with a mapValues call; see the sketch after the quoted message below.

On Mon, Aug 18, 2014 at 9:09 PM, SK <skrishna...@gmail.com> wrote:
> Hi,
>
> I have a piece of code in which the result of a groupByKey operation is as
> follows:
>
> (2013-04, ArrayBuffer(s1, s2, s3, s1, s2, s4))
>
> The first element is a String value representing a date, and the ArrayBuffer
> consists of (non-unique) strings. I want to extract the unique elements of
> the ArrayBuffer, so I am expecting the result to be:
>
> (2013-04, ArrayBuffer(s1, s2, s3, s4))
>
> I tried the following:
>
>     .groupByKey
>     .map(g => (g._1, g._2.distinct))
>
> But I get the following error:
>
>     value distinct is not a member of Iterable[String]
>     [error] .map(g => (g._1, g._2.distinct))
>
> I also tried g._2.distinct(), but got the same error.
>
> I looked at the Scala ArrayBuffer documentation and it supports the distinct
> and count operations. I am using Spark 1.0.1 and Scala 2.10.4. I would
> like to know how to extract the unique elements of the ArrayBuffer above.
>
> thanks
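For reference, a minimal sketch of both fixes. The pairs RDD and its sample data are hypothetical, just to make the snippet self-contained; this assumes spark-shell on Spark 1.0.x, where sc already exists:

    // Implicit conversions that add pair-RDD methods (needed on Spark 1.0.x).
    import org.apache.spark.SparkContext._

    // Hypothetical sample data matching the question.
    val pairs = sc.parallelize(Seq(
      ("2013-04", "s1"), ("2013-04", "s2"), ("2013-04", "s3"),
      ("2013-04", "s1"), ("2013-04", "s2"), ("2013-04", "s4")))

    // Option 1: mapValues keeps the keys and dedupes each group's values.
    val unique1 = pairs.groupByKey().mapValues(_.toSet)

    // Option 2: the same thing in the map form from the original code.
    val unique2 = pairs.groupByKey().map(g => (g._1, g._2.toSet))

    unique1.collect().foreach(println)
    // prints: (2013-04,Set(s1, s2, s3, s4))

If you care about preserving the original order of the values, g._2.toSeq.distinct should also work, since distinct is defined on Seq (which is why ArrayBuffer has it but Iterable does not).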