Re: Printing the RDDs in SparkPageRank

2014-08-26 Thread Deep Pradhan
println(parts(0)) does not solve the problem; it does not work.


On Mon, Aug 25, 2014 at 1:30 PM, Sean Owen so...@cloudera.com wrote:

 On Mon, Aug 25, 2014 at 7:18 AM, Deep Pradhan pradhandeep1...@gmail.com
 wrote:
  When I add
 
  parts(0).collect().foreach(println)
 
  parts(1).collect().foreach(println), for printing parts, I get the
 following
  error
 
  not enough arguments for method collect: (pf:
  PartialFunction[Char,B])(implicit bf:
  scala.collection.generic.CanBuildFrom[String,B,That])That.
  Unspecified value parameter pf.
  parts(0).collect().foreach(println)

  val links = lines.map{ s =>
    val parts = s.split("\\s+")
    (parts(0), parts(1))  /* I want to print this parts */
  }.distinct().groupByKey().cache()


 Within this code, you are working in a simple Scala function. parts is
 an Array[String]. parts(0) is a String. You can just
 println(parts(0)). You are not calling RDD.collect() there, but
 collect() on a String, which is a sequence of Char.

 However note that this will print the String on the worker that
 executes this, not the driver.

 Maybe you want to print the result right after this map function? Then
 break this into two statements and print the result of the first. You
 already are doing that in your code. A good formula is actually
 take(10) rather than collect() in case the RDD is huge.
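[A sketch of the change Sean describes, assuming `lines` is the RDD from the original code; the intermediate name `pairs` is made up for illustration:]

```scala
// Break the chained expression in two so the intermediate RDD can be inspected.
val pairs = lines.map { s =>
  val parts = s.split("\\s+")
  (parts(0), parts(1))
}
// take(10) brings at most 10 elements back to the driver,
// unlike collect(), which materializes the whole RDD there.
pairs.take(10).foreach(println)
val links = pairs.distinct().groupByKey().cache()
```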



Re: Printing the RDDs in SparkPageRank

2014-08-25 Thread Deep Pradhan
When I add

parts(0).collect().foreach(println)

parts(1).collect().foreach(println), for printing parts, I get the
following error

*not enough arguments for method collect: (pf:
PartialFunction[Char,B])(implicit bf:
scala.collection.generic.CanBuildFrom[String,B,That])That.
Unspecified value parameter pf.
parts(0).collect().foreach(println)*


 And, when I add
parts.collect().foreach(println), I get the following error

*not enough arguments for method collect: (pf:
PartialFunction[String,B])(implicit bf:
scala.collection.generic.CanBuildFrom[Array[String],B,That])That.
Unspecified value parameter pf.
parts.collect().foreach(println)*
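[Since `parts` is an Array[String], the `collect` the compiler resolves here is the Scala collections method, which expects a PartialFunction argument. A minimal plain-Scala illustration, with made-up values, no Spark involved:]

```scala
val parts = "url1 url2".split("\\s+")        // Array("url1", "url2")
// Scala collections' collect filters and maps in one pass
// via a PartialFunction:
val kept = parts.collect { case p if p.startsWith("url") => p }
println(kept.mkString(","))                  // url1,url2
// parts.collect() with no argument produces the
// "not enough arguments for method collect" error above.
```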


On Sun, Aug 24, 2014 at 8:27 PM, Jörn Franke jornfra...@gmail.com wrote:

 Hi,

 What kind of error do you receive?

 Best regards,

 Jörn
 On 24 Aug 2014 at 08:29, Deep Pradhan pradhandeep1...@gmail.com wrote:

 Hi,
 I was going through the SparkPageRank code and want to see the RDDs
 formed in the intermediate steps.
 Here is a part of the code along with the lines that I added in order to
 print the RDDs.
 I want to print the *parts* in the code (denoted by the comment in
 bold). But when I try to do the same thing there, it gives an error.
 Can someone suggest what I should be doing?
 Can someone suggest what I should be doing?
 Thank You

 CODE:

 object SparkPageRank {
   def main(args: Array[String]) {
     val sparkConf = new SparkConf().setAppName("PageRank")
     var iters = args(1).toInt
     val ctx = new SparkContext(sparkConf)
     val lines = ctx.textFile(args(0), 1)
     println("The lines RDD is")
     lines.collect().foreach(println)
     val links = lines.map{ s =>
       val parts = s.split("\\s+")
       (parts(0), parts(1))  /* I want to print this parts */
     }.distinct().groupByKey().cache()
     println("The links RDD is")
     links.collect().foreach(println)
     var ranks = links.mapValues(v => 1.0)
     println("The ranks RDD is")
     ranks.collect().foreach(println)
     for (i <- 1 to iters) {
       val contribs = links.join(ranks).values.flatMap{ case (urls, rank) =>
         val size = urls.size
         urls.map(url => (url, rank / size))
       }
       println("The contribs RDD is")
       contribs.collect().foreach(println)
       ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
     }
     println("The second ranks RDD is")
     ranks.collect().foreach(println)

     val output = ranks.collect()
     output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))

     ctx.stop()
   }
 }






Re: Printing the RDDs in SparkPageRank

2014-08-25 Thread Sean Owen
On Mon, Aug 25, 2014 at 7:18 AM, Deep Pradhan pradhandeep1...@gmail.com wrote:
 When I add

 parts(0).collect().foreach(println)

 parts(1).collect().foreach(println), for printing parts, I get the following
 error

 not enough arguments for method collect: (pf:
 PartialFunction[Char,B])(implicit bf:
 scala.collection.generic.CanBuildFrom[String,B,That])That.
 Unspecified value parameter pf.
 parts(0).collect().foreach(println)

 val links = lines.map{ s =>
   val parts = s.split("\\s+")
   (parts(0), parts(1))  /* I want to print this parts */
 }.distinct().groupByKey().cache()


Within this code, you are working in a simple Scala function. parts is
an Array[String]. parts(0) is a String. You can just
println(parts(0)). You are not calling RDD.collect() there, but
collect() on a String, which is a sequence of Char.
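[For reference, `collect` on a String in the Scala collections sense takes a PartialFunction[Char, B]; a minimal standalone illustration with a made-up value:]

```scala
val s = "rank42"
// collect over the Chars of the String, keeping only digits:
val digits = s.collect { case c if c.isDigit => c }
println(digits)  // 42
// s.collect() with no argument fails to compile with the
// "not enough arguments" error quoted in this thread.
```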

However note that this will print the String on the worker that
executes this, not the driver.

Maybe you want to print the result right after this map function? Then
break this into two statements and print the result of the first. You
already are doing that in your code. A good formula is actually
take(10) rather than collect() in case the RDD is huge.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Printing the RDDs in SparkPageRank

2014-08-24 Thread Jörn Franke
Hi,

What kind of error do you receive?

Best regards,

Jörn
On 24 Aug 2014 at 08:29, Deep Pradhan pradhandeep1...@gmail.com wrote:

 Hi,
 I was going through the SparkPageRank code and want to see the RDDs
 formed in the intermediate steps.
 Here is a part of the code along with the lines that I added in order to
 print the RDDs.
 I want to print the *parts* in the code (denoted by the comment in
 bold). But when I try to do the same thing there, it gives an error.
 Can someone suggest what I should be doing?
 Thank You

 CODE:

 object SparkPageRank {
   def main(args: Array[String]) {
     val sparkConf = new SparkConf().setAppName("PageRank")
     var iters = args(1).toInt
     val ctx = new SparkContext(sparkConf)
     val lines = ctx.textFile(args(0), 1)
     println("The lines RDD is")
     lines.collect().foreach(println)
     val links = lines.map{ s =>
       val parts = s.split("\\s+")
       (parts(0), parts(1))  /* I want to print this parts */
     }.distinct().groupByKey().cache()
     println("The links RDD is")
     links.collect().foreach(println)
     var ranks = links.mapValues(v => 1.0)
     println("The ranks RDD is")
     ranks.collect().foreach(println)
     for (i <- 1 to iters) {
       val contribs = links.join(ranks).values.flatMap{ case (urls, rank) =>
         val size = urls.size
         urls.map(url => (url, rank / size))
       }
       println("The contribs RDD is")
       contribs.collect().foreach(println)
       ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
     }
     println("The second ranks RDD is")
     ranks.collect().foreach(println)

     val output = ranks.collect()
     output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))

     ctx.stop()
   }
 }