println(parts(0)) does not solve the problem; it does not work.

On Mon, Aug 25, 2014 at 1:30 PM, Sean Owen <so...@cloudera.com> wrote:

> On Mon, Aug 25, 2014 at 7:18 AM, Deep Pradhan <pradhandeep1...@gmail.com>
> wrote:
> > When I add
> >
> > parts(0).collect().foreach(println)
> >
> > parts(1).collect().foreach(println), for printing parts, I get the
> > following error:
> >
> > not enough arguments for method collect: (pf:
> > PartialFunction[Char,B])(implicit
> > bf:scala.collection.generic.CanBuildFrom[String,B,That])That.
> > Unspecified value parameter pf.
> > parts(0).collect().foreach(println)
>
> >>>     val links = lines.map{ s =>
> >>>       val parts = s.split("\\s+")
> >>>       (parts(0), parts(1))  /*I want to print this "parts"*/
> >>>     }.distinct().groupByKey().cache()
>
>
> Within this code, you are working in a simple Scala function. parts is
> an Array[String]. parts(0) is a String. You can just
> println(parts(0)). You are not calling RDD.collect() there, but
> collect() on a String, which is a sequence of Char.
>
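A minimal plain-Scala sketch (no Spark needed) of the point above: inside the map closure, `parts(0)` is an ordinary `String`, and `String.collect` is the collections method that takes a `PartialFunction[Char, B]`. Calling it with no argument is what produces "not enough arguments for method collect". The object name `StringCollectDemo` and the sample line `"foo  bar baz"` are hypothetical.

```scala
object StringCollectDemo {
  // Hypothetical input line standing in for one element of `lines`.
  val line: String = "foo  bar baz"

  // Same split as in the thread: one or more whitespace characters.
  val parts: Array[String] = line.split("\\s+")

  // parts(0) is just a String, not an RDD, so println(parts(0)) is all you need.
  val first: String = parts(0)

  // String.collect requires a partial function over Char; here it keeps
  // letters and upper-cases them. collect() with no argument does not compile.
  val upper: String = first.collect { case c if c.isLetter => c.toUpper }

  def main(args: Array[String]): Unit = {
    println(first)
    println(upper)
  }
}
```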
> However, note that this will print the String on the worker that
> executes this function, not on the driver, so the output ends up in that
> executor's stdout log rather than in your console.
>
> Maybe you want to print the result right after this map function? Then
> break this into two statements and print the result of the first. You
> are already doing that in your code. A better pattern, though, is
> take(10) rather than collect(), since collect() pulls the entire RDD
> back to the driver, which is a problem if the RDD is huge.
>
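A minimal sketch of the two-statement approach with take(10), assuming spark-core on the classpath and Spark 1.3 or later (on older releases, add `import org.apache.spark.SparkContext._` so that `groupByKey` is available). The local master `local[2]`, the object name `PrintPartsOnDriver`, and the sample input are hypothetical stand-ins for the original program's own `SparkContext` and `lines`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PrintPartsOnDriver {
  // Builds a local context, prints a bounded sample on the driver, and
  // returns that sample so it can be inspected.
  def run(): Array[(String, String)] = {
    // Hypothetical local-mode setup; use your own SparkContext in practice.
    val conf = new SparkConf().setAppName("print-parts").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample input standing in for the thread's `lines`.
      val lines: RDD[String] = sc.parallelize(Seq("a b", "a c", "b c", "a b"))

      // Statement 1: the map from the thread, kept as its own RDD.
      val pairs: RDD[(String, String)] = lines.map { s =>
        val parts = s.split("\\s+")
        (parts(0), parts(1))
      }

      // take(10) returns at most 10 elements to the driver, so printing
      // happens in the driver's console, not in executor logs.
      val sample = pairs.take(10)
      sample.foreach(println)

      // Statement 2: the rest of the original pipeline.
      val links = pairs.distinct().groupByKey().cache()
      links.take(10).foreach(println)

      sample
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run()
}
```

Keeping the map result in its own `val` is what lets you inspect it before `distinct()` and `groupByKey()` run, and bounding the sample with `take` keeps the driver safe when the data set is large.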
