println(parts(0)) does not solve the problem. It does not work.

On Mon, Aug 25, 2014 at 1:30 PM, Sean Owen <so...@cloudera.com> wrote:

> On Mon, Aug 25, 2014 at 7:18 AM, Deep Pradhan <pradhandeep1...@gmail.com>
> wrote:
> > When I add
> >
> > parts(0).collect().foreach(println)
> >
> > parts(1).collect().foreach(println), for printing parts, I get the
> > following error
> >
> > not enough arguments for method collect: (pf:
> > PartialFunction[Char,B])(implicit
> > bf:scala.collection.generic.CanBuildFrom[String,B,That])That.
> > Unspecified value parameter pf.
> > parts(0).collect().foreach(println)
> >
> >>> val links = lines.map{ s =>
> >>>   val parts = s.split("\\s+")
> >>>   (parts(0), parts(1)) /*I want to print this "parts"*/
> >>> }.distinct().groupByKey().cache()
>
> Within this code, you are working in a simple Scala function. parts is
> an Array[String]. parts(0) is a String. You can just
> println(parts(0)). You are not calling RDD.collect() there, but
> collect() on a String, which is a sequence of Char.
>
> However, note that this will print the String on the worker that
> executes this, not the driver.
>
> Maybe you want to print the result right after this map function? Then
> break this into two statements and print the result of the first. You
> already are doing that in your code. A good formula is actually
> "take(10)" rather than "collect()" in case the RDD is huge.
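
For reference, here is a minimal sketch of the "break this into two statements" suggestion quoted above. It makes several assumptions: a plain Scala Seq stands in for the RDD so that it runs without a cluster, the sample input and the names PrintPartsSketch, pairs and upper are hypothetical, and groupBy stands in for groupByKey. On a real RDD the analogous call would presumably be pairs.take(10).foreach(println), run on the driver.

```scala
object PrintPartsSketch {
  // Hypothetical sample input standing in for the `lines` RDD.
  val lines: Seq[String] = Seq("a b", "a c", "b c", "a b")

  // Statement 1: only the split and pairing, kept as its own value
  // so that it can be inspected before the rest of the pipeline runs.
  val pairs: Seq[(String, String)] = lines.map { s =>
    val parts = s.split("\\s+") // Array[String]; parts(0) is a String
    (parts(0), parts(1))
  }

  // Statement 2: the remainder of the pipeline. groupBy on a Seq is a
  // stand-in for groupByKey on an RDD (an assumption of this sketch).
  val links: Map[String, Seq[String]] =
    pairs.distinct.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2)) }

  // Why parts(0).collect() failed to compile: collect on a String
  // requires a partial function over Char, as shown here.
  val upper: String = "ab1".collect { case c if c.isLetter => c.toUpper }

  def main(args: Array[String]): Unit = {
    // Print a bounded sample rather than everything.
    pairs.take(10).foreach(println)
    links.foreach(println)
    println(upper)
  }
}
```

Printing the intermediate value, rather than calling println inside the map function, keeps the output in one place, and take(10) bounds how much is printed if the data is large.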