Re: Printing the RDDs in SparkPageRank
println(parts(0)) does not solve the problem. It does not work.

On Mon, Aug 25, 2014 at 1:30 PM, Sean Owen <so...@cloudera.com> wrote:

> On Mon, Aug 25, 2014 at 7:18 AM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
>> When I add parts(0).collect().foreach(println) and
>> parts(1).collect().foreach(println) to print parts, I get the following
>> error:
>>
>>   not enough arguments for method collect: (pf: PartialFunction[Char,B])
>>   (implicit bf: scala.collection.generic.CanBuildFrom[String,B,That])That.
>>   Unspecified value parameter pf.
>>     parts(0).collect().foreach(println)
>>
>>   val links = lines.map { s =>
>>     val parts = s.split("\\s+")
>>     (parts(0), parts(1))  /* I want to print this parts */
>>   }.distinct().groupByKey().cache()
>
> Within this code, you are working in a simple Scala function. parts is an
> Array[String], and parts(0) is a String, so you can just println(parts(0)).
> You are not calling RDD.collect() there, but collect() on a String, a
> sequence of Char. However, note that this will print the String on the
> worker that executes it, not on the driver.
>
> Maybe you want to print the result right after this map function? Then
> break this into two statements and print the result of the first. You
> already are doing that in your code. A good formula is actually take(10)
> rather than collect(), in case the RDD is huge.
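The distinction Sean describes can be checked without Spark at all. The following is a minimal plain-Scala sketch (the input line is made up) showing why println(parts(0)) type-checks while parts(0).collect() resolves to String's PartialFunction-based collect instead of RDD.collect():

```scala
// Plain-Scala sketch (no Spark; the input line "url1 url2" is hypothetical)
// of why println(parts(0)) works while parts(0).collect() does not.
object PartsDemo {
  def main(args: Array[String]): Unit = {
    val s = "url1 url2"            // stand-in for one line of the input file
    val parts = s.split("\\s+")    // Array[String]
    println(parts(0))              // prints "url1", no collect() needed

    // String.collect is the PartialFunction variant, not RDD.collect();
    // calling it with no arguments produces the "not enough arguments" error.
    val digitsOnly = parts(0).collect { case c if c.isDigit => c }
    println(digitsOnly)            // prints "1"
  }
}
```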
Re: Printing the RDDs in SparkPageRank
When I add parts(0).collect().foreach(println) and
parts(1).collect().foreach(println) to print parts, I get the following error:

  not enough arguments for method collect: (pf: PartialFunction[Char,B])
  (implicit bf: scala.collection.generic.CanBuildFrom[String,B,That])That.
  Unspecified value parameter pf.
    parts(0).collect().foreach(println)

And when I add parts.collect().foreach(println), I get the following error:

  not enough arguments for method collect: (pf: PartialFunction[String,B])
  (implicit bf: scala.collection.generic.CanBuildFrom[Array[String],B,That])That.
  Unspecified value parameter pf.
    parts.collect().foreach(println)

On Sun, Aug 24, 2014 at 8:27 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> Hi,
>
> What kind of error do you receive?
>
> Best regards,
> Jörn
>
> On 24 Aug 2014 08:29, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
>> Hi,
>> I was going through the SparkPageRank code and want to see the
>> intermediate steps, like the RDDs formed in the intermediate steps.
>> Here is a part of the code along with the lines that I added in order
>> to print the RDDs. I want to print *parts* in the code (marked by the
>> comment). But when I try to do that, it gives an error. Can someone
>> suggest what I should be doing?
>> Thank You
>>
>> CODE:
>>
>> object SparkPageRank {
>>   def main(args: Array[String]) {
>>     val sparkConf = new SparkConf().setAppName("PageRank")
>>     var iters = args(1).toInt
>>     val ctx = new SparkContext(sparkConf)
>>     val lines = ctx.textFile(args(0), 1)
>>     println("The lines RDD is")
>>     lines.collect().foreach(println)
>>     val links = lines.map { s =>
>>       val parts = s.split("\\s+")
>>       (parts(0), parts(1))  /* I want to print this parts */
>>     }.distinct().groupByKey().cache()
>>     println("The links RDD is")
>>     links.collect().foreach(println)
>>     var ranks = links.mapValues(v => 1.0)
>>     println("The ranks RDD is")
>>     ranks.collect().foreach(println)
>>     for (i <- 1 to iters) {
>>       val contribs = links.join(ranks).values.flatMap { case (urls, rank) =>
>>         val size = urls.size
>>         urls.map(url => (url, rank / size))
>>       }
>>       println("The contribs RDD is")
>>       contribs.collect().foreach(println)
>>       ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
>>     }
>>     println("The second ranks RDD is")
>>     ranks.collect().foreach(println)
>>     val output = ranks.collect()
>>     output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))
>>     ctx.stop()
>>   }
>> }
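The map in the code above can be broken into two statements so the intermediate (parts(0), parts(1)) pairs are inspectable before distinct() and groupByKey(). A minimal plain-Scala sketch of that shape, using a local Seq and made-up input lines in place of the lines RDD so it runs without Spark (on a real RDD, take(10) is safer than collect() when the data is large):

```scala
// Plain-Scala sketch: a local Seq and made-up lines stand in for the
// lines RDD. The same two-step shape applies on an RDD.
object SplitMapDemo {
  def main(args: Array[String]): Unit = {
    val lines = Seq("a b", "b c", "a b")
    val pairs = lines.map { s =>
      val parts = s.split("\\s+")
      (parts(0), parts(1))
    }
    pairs.take(10).foreach(println)   // inspect the intermediate pairs here
    val links = pairs.distinct
      .groupBy(_._1)
      .map { case (src, es) => (src, es.map(_._2)) }   // local groupByKey
    links.foreach(println)
  }
}
```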
Re: Printing the RDDs in SparkPageRank
On Mon, Aug 25, 2014 at 7:18 AM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:

> When I add parts(0).collect().foreach(println) and
> parts(1).collect().foreach(println) to print parts, I get the following
> error:
>
>   not enough arguments for method collect: (pf: PartialFunction[Char,B])
>   (implicit bf: scala.collection.generic.CanBuildFrom[String,B,That])That.
>   Unspecified value parameter pf.
>     parts(0).collect().foreach(println)
>
>   val links = lines.map { s =>
>     val parts = s.split("\\s+")
>     (parts(0), parts(1))  /* I want to print this parts */
>   }.distinct().groupByKey().cache()

Within this code, you are working in a simple Scala function. parts is an
Array[String], and parts(0) is a String, so you can just println(parts(0)).
You are not calling RDD.collect() there, but collect() on a String, a
sequence of Char. However, note that this will print the String on the
worker that executes it, not on the driver.

Maybe you want to print the result right after this map function? Then
break this into two statements and print the result of the first. You
already are doing that in your code. A good formula is actually take(10)
rather than collect(), in case the RDD is huge.
Re: Printing the RDDs in SparkPageRank
Hi,

What kind of error do you receive?

Best regards,
Jörn

On 24 Aug 2014 08:29, Deep Pradhan <pradhandeep1...@gmail.com> wrote:

> Hi,
> I was going through the SparkPageRank code and want to see the
> intermediate steps, like the RDDs formed in the intermediate steps.
> Here is a part of the code along with the lines that I added in order
> to print the RDDs. I want to print *parts* in the code (marked by the
> comment). But when I try to do that, it gives an error. Can someone
> suggest what I should be doing?
>
> Thank You
>
> CODE:
>
> object SparkPageRank {
>   def main(args: Array[String]) {
>     val sparkConf = new SparkConf().setAppName("PageRank")
>     var iters = args(1).toInt
>     val ctx = new SparkContext(sparkConf)
>     val lines = ctx.textFile(args(0), 1)
>     println("The lines RDD is")
>     lines.collect().foreach(println)
>     val links = lines.map { s =>
>       val parts = s.split("\\s+")
>       (parts(0), parts(1))  /* I want to print this parts */
>     }.distinct().groupByKey().cache()
>     println("The links RDD is")
>     links.collect().foreach(println)
>     var ranks = links.mapValues(v => 1.0)
>     println("The ranks RDD is")
>     ranks.collect().foreach(println)
>     for (i <- 1 to iters) {
>       val contribs = links.join(ranks).values.flatMap { case (urls, rank) =>
>         val size = urls.size
>         urls.map(url => (url, rank / size))
>       }
>       println("The contribs RDD is")
>       contribs.collect().foreach(println)
>       ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
>     }
>     println("The second ranks RDD is")
>     ranks.collect().foreach(println)
>     val output = ranks.collect()
>     output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))
>     ctx.stop()
>   }
> }
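The contribution and rank-update step inside the loop above can also be checked locally. A minimal plain-Scala sketch of one iteration, with a made-up two-page link graph and Maps standing in for the links and ranks RDDs (toSeq plus flatMap mirrors join(ranks).values.flatMap, and groupBy plus a sum mirrors reduceByKey):

```scala
// Plain-Scala sketch of one PageRank iteration from the code above.
// The link graph is made up; Maps stand in for the links and ranks RDDs.
object OneIterationDemo {
  def main(args: Array[String]): Unit = {
    val links = Map("a" -> Seq("b", "c"), "b" -> Seq("a"))
    var ranks: Map[String, Double] = links.map { case (url, _) => (url, 1.0) }

    // mirrors: links.join(ranks).values.flatMap { case (urls, rank) => ... }
    val contribs = links.toSeq.flatMap { case (url, urls) =>
      val rank = ranks(url)
      urls.map(dest => (dest, rank / urls.size))
    }

    // mirrors: contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
    ranks = contribs
      .groupBy(_._1)
      .map { case (url, cs) => (url, 0.15 + 0.85 * cs.map(_._2).sum) }

    ranks.foreach(println)   // a -> 1.0, b -> 0.575, c -> 0.575
  }
}
```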