RE: How to get progress information of an RDD operation

2016-02-24 Thread Wang, Ningjun (LNG-NPV)
? Ningjun

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, February 23, 2016 2:30 PM
To: Kevin Mellott
Cc: Wang, Ningjun (LNG-NPV); user@spark.apache.org
Subject: Re: How to get progress information of an RDD operation

I think Ningjun was looking for programmatic way of tracking progress. I took

Re: How to get progress information of an RDD operation

2016-02-23 Thread Ted Yu
I think Ningjun was looking for a programmatic way of tracking progress. I took a look at:

./core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala

but there don't seem to be any fine-grained events that directly reflect what Ningjun is looking for.

On Tue, Feb 23, 2016 at 11:24 AM, Kevin
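A minimal sketch of the listener route Ted mentions, assuming Spark's developer API `SparkListener` (its `onStageSubmitted` and `onTaskEnd` callbacks) and `SparkContext.addSparkListener`. The names `TaskProgressListener`, `ListenerDemo` and the no-op `handleLine` are hypothetical. Consistent with Ted's point, it can only count finished tasks (one per partition), not individual lines, and the output is printed on the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted, SparkListenerTaskEnd}
import scala.collection.mutable

// Hypothetical listener: prints "finished/total" tasks for each stage on the driver.
// Granularity is one task (one partition), not one input line.
class TaskProgressListener extends SparkListener {
  // stageId -> (successfully finished tasks, total tasks in the stage)
  private val progress = mutable.Map.empty[Int, (Int, Int)]

  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = synchronized {
    // A resubmitted stage attempt resets its counter to zero.
    progress(event.stageInfo.stageId) = (0, event.stageInfo.numTasks)
  }

  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = synchronized {
    // onTaskEnd also fires for failed tasks; only successful ones are counted.
    if (event.taskInfo != null && event.taskInfo.successful) {
      progress.get(event.stageId).foreach { case (done, total) =>
        val nowDone = done + 1
        progress(event.stageId) = (nowDone, total)
        println(f"Stage ${event.stageId}: $nowDone/$total tasks (${100.0 * nowDone / total}%.1f%%)")
      }
    }
  }

  // Snapshot for callers that prefer polling over reading stdout.
  def snapshot: Map[Int, (Int, Int)] = synchronized { progress.toMap }
}

object ListenerDemo {
  // Hypothetical stand-in for the per-line work from the question.
  def handleLine(line: String): Unit = ()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("listener-progress").setMaster("local[4]"))
    try {
      sc.addSparkListener(new TaskProgressListener)
      sc.textFile("c:/temp/input.txt").foreach(line => handleLine(line))
    } finally {
      sc.stop()
    }
  }
}
```

Listener events are delivered asynchronously on Spark's listener bus, so the printed counts can lag slightly behind the executors.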

Re: How to get progress information of an RDD operation

2016-02-23 Thread Kevin Mellott
Have you considered using the Spark Web UI to view progress on your job? It does a very good job of showing the progress of the overall job, and it also lets you drill into the individual tasks and server activity.

On Tue, Feb 23, 2016 at 12:53 PM, Wang, Ningjun (LNG-NPV) <

How to get progress information of an RDD operation

2016-02-23 Thread Wang, Ningjun (LNG-NPV)
How can I get progress information of an RDD operation? For example:

    val lines = sc.textFile("c:/temp/input.txt") // an RDD of millions of lines
    lines.foreach(line => {
      handleLine(line)
    })

The input.txt contains millions of lines. The entire operation takes 6 hours. I want to print out
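One programmatic option, sketched under assumptions: run the long `foreach` in a `Future` and have the driver poll `sc.statusTracker` (Spark's `SparkStatusTracker`, with `getActiveStageIds()` and `getStageInfo(stageId)`). `StatusTrackerDemo`, `reportProgress` and the no-op `handleLine` are hypothetical names; the input path is the one from the question. As with a listener, progress is measured in completed tasks (partitions), not lines.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object StatusTrackerDemo {
  // Hypothetical stand-in for the per-line work from the question.
  def handleLine(line: String): Unit = ()

  // Prints completed/total tasks for every active stage until `job` finishes.
  def reportProgress(sc: SparkContext, job: Future[_], intervalMs: Long): Unit = {
    val tracker = sc.statusTracker
    while (!job.isCompleted) {
      for (stageId <- tracker.getActiveStageIds(); info <- tracker.getStageInfo(stageId)) {
        println(s"Stage $stageId: ${info.numCompletedTasks()}/${info.numTasks()} tasks done, " +
          s"${info.numActiveTasks()} running")
      }
      Thread.sleep(intervalMs)
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("status-progress").setMaster("local[4]"))
    try {
      val lines = sc.textFile("c:/temp/input.txt")
      // Run the action on another thread so the driver thread is free to poll.
      val job = Future { lines.foreach(line => handleLine(line)) }
      reportProgress(sc, job, intervalMs = 10000)
      Await.result(job, Duration.Inf) // rethrows if the Spark job failed
    } finally {
      sc.stop()
    }
  }
}
```

Because each task processes one whole partition, a file read into only a few partitions will report progress in large jumps; `sc.textFile(path, minPartitions)` can be used to request more, smaller partitions and therefore finer-grained updates.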