I think Ningjun was looking for a programmatic way of tracking progress.

I took a look at:
./core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala

but there don't seem to be fine-grained events that directly reflect what
Ningjun is looking for.
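
The closest approximation I can think of with that API is counting task
completions via onTaskEnd, which is much coarser than a per-line count.
A minimal sketch of the idea (the class name TaskProgressListener is mine):

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Counts finished tasks; tasks are the closest proxy to progress that
// the listener API exposes for a job like this.
class TaskProgressListener extends SparkListener {
  private val finishedTasks = new AtomicLong(0)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    println(s"Tasks finished so far: ${finishedTasks.incrementAndGet()}")
  }
}

// Register it on the SparkContext before running the job:
// sc.addSparkListener(new TaskProgressListener)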

On Tue, Feb 23, 2016 at 11:24 AM, Kevin Mellott <kevin.r.mell...@gmail.com>
wrote:

> Have you considered using the Spark Web UI to view progress on your job?
> It does a very good job of showing the progress of the overall job, as well
> as allowing you to drill into the individual tasks and server activity.
>
> On Tue, Feb 23, 2016 at 12:53 PM, Wang, Ningjun (LNG-NPV) <
> ningjun.w...@lexisnexis.com> wrote:
>
>> How can I get progress information about an RDD operation? For example:
>>
>>
>>
>> val lines = sc.textFile("c:/temp/input.txt")  // an RDD with millions of lines
>> lines.foreach(line => {
>>   handleLine(line)
>> })
>>
>> The input.txt contains millions of lines. The entire operation takes 6
>> hours. I want to print out how many lines have been processed every minute
>> so the user knows the progress. How can I do that?
>>
>>
>>
>> One way I am thinking of is to use an accumulator, e.g.
>>
>>
>>
>>
>>
>> val lines = sc.textFile("c:/temp/input.txt")
>> val acCount = sc.accumulator(0L)
>> lines.foreach(line => {
>>   handleLine(line)
>>   acCount += 1
>> })
>>
>> However, how can I print out acCount every minute?
>>
>>
>>
>>
>>
>> Ningjun
>>
>>
>>
>
>
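
As for printing the accumulator every minute: the foreach runs on the
executors, so the printing has to happen on the driver. One option, assuming
the Spark 1.x accumulator API from the quoted code (handleLine is the same
user function), is a background thread on the driver that polls acCount.value
while the action runs. A rough, untested sketch:

val lines = sc.textFile("c:/temp/input.txt")
val acCount = sc.accumulator(0L)

// Sketch only: driver-side thread that reports the accumulator value
// once a minute until it is interrupted.
val reporter = new Thread {
  override def run(): Unit = {
    try {
      while (true) {
        println(s"Lines processed so far: ${acCount.value}")
        Thread.sleep(60 * 1000)
      }
    } catch {
      case _: InterruptedException => // stop quietly once the job is done
    }
  }
}
reporter.setDaemon(true)
reporter.start()

lines.foreach { line =>
  handleLine(line)
  acCount += 1
}

reporter.interrupt()  // the action has finished, stop reporting

The value the driver sees typically advances as tasks finish, so treat it as
a coarse running total rather than an exact live count.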
