The result of groupByKey is an RDD of (K, Iterable[V]) pairs. The values
may in fact be an ArrayBuffer[V], but the API does not guarantee that.
Printing one as a String will show an ArrayBuffer, but as far as the
static types are concerned you only have an Iterable, and distinct is
not a method of Iterable, which is why you get the error.
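
For reference, in Spark 1.0.x the signature is roughly:

  def groupByKey(): RDD[(K, Iterable[V])]

so the static type of the values is Iterable[V], whatever concrete
collection happens to back it at runtime.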

You can call .toSet on the Iterable instead.

You can also simplify this by using mapValues instead of map.
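
Something like this (untested sketch; "pairs" stands for your RDD of
(date, string) pairs):

  // group the (date, string) pairs; values are statically Iterable[String]
  val grouped = pairs.groupByKey()           // RDD[(String, Iterable[String])]

  // keep only the unique values per key
  val unique = grouped.mapValues(_.toSet)    // RDD[(String, Set[String])]

  // or convert back to a Seq if the rest of your code expects one
  val uniqueSeq = grouped.mapValues(_.toSet.toSeq)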

On Mon, Aug 18, 2014 at 9:09 PM, SK <skrishna...@gmail.com> wrote:
> Hi,
>
> I have a piece of code in which the result of a  groupByKey operation is as
> follows:
>
> (2013-04, ArrayBuffer(s1, s2, s3, s1, s2, s4))
>
> The first element is a String value representing a date and the ArrayBuffer
> consists of (non-unique) strings. I want to extract the unique elements of
> the ArrayBuffer. So I am expecting the result to be:
>
> (2013-04, ArrayBuffer(s1, s2, s3, s4))
>
> I tried the following:
>   .groupByKey
>   .map(g => (g._1, g._2.distinct))
>
> But I get the following compile error:
> value distinct is not a member of Iterable[String]
> [error]                    .map(g=> (g._1, g._2.distinct))
>
> I also tried g._2.distinct(), but got the same error.
>
>
> I looked at the Scala ArrayBuffer documentation and it supports distinct()
> and count() operations. I am using Spark 1.0.1 and Scala 2.10.4. I would
> like to know how to extract the unique elements of the ArrayBuffer above.
>
> thanks
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
