How can I get progress information for an RDD operation? For example:

    val lines = sc.textFile("c:/temp/input.txt") // an RDD of millions of lines
    lines.foreach(line => {
      handleLine(line)
    })

input.txt contains millions of lines, and the entire operation takes 6 hours. I want to print how many lines have been processed once every minute so that the user can see the progress. How can I do that?

One way I am thinking of is to use an accumulator, e.g.

    val lines = sc.textFile("c:/temp/input.txt")
    val acCount = sc.accumulator(0L)
    lines.foreach(line => {
      handleLine(line)
      acCount += 1
    })

However, how can I print out acCount every minute?

Ningjun
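
For reference, here is a rough sketch of one way the accumulator idea might be wired up. It is not a confirmed answer. It assumes the Spark 1.x accumulator API used above, that the master is supplied by spark-submit, and that reading acCount.value from a second driver thread is acceptable. ProgressDemo, progressMessage, and the empty handleLine body are hypothetical names added for illustration. A background timer on the driver prints the accumulator once a minute while foreach runs in the main thread.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.{SparkConf, SparkContext}

object ProgressDemo {
  // Hypothetical stand-in for the real per-line work.
  def handleLine(line: String): Unit = ()

  // Hypothetical helper that formats the progress line.
  def progressMessage(count: Long): String = s"Processed $count lines so far"

  def main(args: Array[String]): Unit = {
    // Assumption: the master URL is provided externally (e.g. by spark-submit).
    val sc = new SparkContext(new SparkConf().setAppName("progress-demo"))
    val lines = sc.textFile("c:/temp/input.txt")
    val acCount = sc.accumulator(0L)

    // Driver-side timer: print the current accumulator value every minute.
    val reporter = Executors.newSingleThreadScheduledExecutor()
    reporter.scheduleAtFixedRate(new Runnable {
      def run(): Unit = println(progressMessage(acCount.value))
    }, 1, 1, TimeUnit.MINUTES)

    try {
      // Blocks the main thread until the whole job has finished.
      lines.foreach { line =>
        handleLine(line)
        acCount += 1L
      }
    } finally {
      reporter.shutdownNow()                    // stop the timer
      println(progressMessage(acCount.value))   // final count
      sc.stop()
    }
  }
}
```

One caveat: as far as I understand, the driver only sees accumulator updates as tasks complete, so the printed count would advance in steps of roughly one partition rather than line by line. If the file ends up in only a few partitions, the count may stay unchanged for long stretches.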