I think Ningjun was looking for a programmatic way of tracking progress. I took a look at: ./core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala
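The closest thing the listener API offers is task-level granularity. A minimal sketch (assuming a live SparkContext `sc`; `onTaskEnd` fires once per completed task, not per record, and the `totalTasks` parameter here is something the caller would have to supply):

```scala
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch: counts *tasks* (not lines) as they finish.
// Register with: sc.addSparkListener(new TaskProgressListener(totalTasks))
class TaskProgressListener(totalTasks: Long) extends SparkListener {
  private val finished = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val done = finished.incrementAndGet()
    println(s"progress: $done / $totalTasks tasks finished")
  }
}
```

So you can learn "k of n partitions done", but not how many individual lines have been handled inside a running task.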
but there don't seem to be fine-grained, record-level events directly reflecting what Ningjun is looking for.

On Tue, Feb 23, 2016 at 11:24 AM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:

> Have you considered using the Spark Web UI to view progress on your job?
> It does a very good job of showing the progress of the overall job, and it
> also lets you drill into the individual tasks and server activity.
>
> On Tue, Feb 23, 2016 at 12:53 PM, Wang, Ningjun (LNG-NPV)
> <ningjun.w...@lexisnexis.com> wrote:
>
>> How can I get progress information for an RDD operation? For example:
>>
>> val lines = sc.textFile("c:/temp/input.txt") // an RDD of millions of lines
>> lines.foreach(line => {
>>   handleLine(line)
>> })
>>
>> The input.txt contains millions of lines. The entire operation takes 6
>> hours. I want to print out how many lines have been processed every minute
>> so the user knows the progress. How can I do that?
>>
>> One way I am thinking of is to use an accumulator, e.g.
>>
>> val lines = sc.textFile("c:/temp/input.txt")
>> val acCount = sc.accumulator(0L)
>> lines.foreach(line => {
>>   handleLine(line)
>>   acCount += 1
>> })
>>
>> However, how can I print out the count every minute?
>>
>> Ningjun
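For the accumulator approach specifically, one way to print the count every minute is to poll `acCount.value` on the driver from a background thread while the (blocking) foreach action runs. A sketch, assuming a live SparkContext `sc` and Ningjun's `handleLine` function; note that accumulator updates reach the driver as tasks complete, so the printed count advances in task-sized jumps rather than line by line:

```scala
import java.util.concurrent.{Executors, TimeUnit}

val lines = sc.textFile("c:/temp/input.txt")
val acCount = sc.accumulator(0L)

// Background thread on the driver: print the accumulator's current
// value once a minute while the action below is still running.
val poller = Executors.newSingleThreadScheduledExecutor()
poller.scheduleAtFixedRate(new Runnable {
  def run(): Unit = println(s"lines processed so far: ${acCount.value}")
}, 1, 1, TimeUnit.MINUTES)

// Blocking action; acCount is incremented on the executors.
lines.foreach { line =>
  handleLine(line)
  acCount += 1
}

poller.shutdown()
println(s"done, total lines: ${acCount.value}")
```

The polling thread must run on the driver, since only the driver can read an accumulator's value; the tasks themselves can only add to it.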