How can I get progress information for an RDD operation? For example:

val lines = sc.textFile("c:/temp/input.txt")  // an RDD of millions of lines
lines.foreach(line => {
    handleLine(line)
})

input.txt contains millions of lines, and the entire operation takes 6 hours. I
want to print out how many lines have been processed every minute so the user
knows the progress. How can I do that?

One way I am thinking of is to use an accumulator, e.g.



val lines = sc.textFile("c:/temp/input.txt")
val acCount = sc.accumulator(0L)  // counts processed lines
lines.foreach(line => {
    handleLine(line)
    acCount += 1
})


However, how can I print out acCount every minute?
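
To make the idea concrete, here is a rough sketch of what polling the
accumulator from the driver might look like. It is untested and rests on
several assumptions: that reading acCount.value on the driver while the job is
still running is acceptable, that a plain java.util.concurrent scheduler on the
driver is good enough as a timer, and that the job runs with a local master.
The names ProgressSketch and timer, and the empty handleLine body, are
placeholders for illustration only.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.{SparkConf, SparkContext}

object ProgressSketch {
  // Hypothetical stand-in for the real per-line work.
  def handleLine(line: String): Unit = ()

  def main(args: Array[String]): Unit = {
    // Assumption: local master; adjust for a real cluster.
    val conf = new SparkConf().setAppName("progress-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val lines = sc.textFile("c:/temp/input.txt")
    val acCount = sc.accumulator(0L)

    // Driver-side timer thread that prints the current count once a minute.
    val timer = Executors.newSingleThreadScheduledExecutor()
    timer.scheduleAtFixedRate(new Runnable {
      def run(): Unit = println(s"Lines processed so far: ${acCount.value}")
    }, 1, 1, TimeUnit.MINUTES)

    try {
      // foreach blocks the calling thread until the job finishes.
      lines.foreach { line =>
        handleLine(line)
        acCount += 1L
      }
    } finally {
      timer.shutdownNow() // stop the timer whether or not the job succeeded
    }

    println(s"Done, total lines: ${acCount.value}")
    sc.stop()
  }
}
```

One caveat with this approach: accumulator updates are sent back to the driver
when a task completes, so the printed count would advance in partition-sized
steps rather than line by line.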



Ningjun
