Thanks Akhil and Mark. I can of course count events (assuming I can deduce
the shuffle boundaries), but like I said the program isn't simple and I'd
have to do this manually every time I change the code. So I'd rather find a
way of doing this automatically if possible.
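
One way this could be automated, as a rough sketch rather than a tested recipe: run the job on small test data with spark.eventLog.enabled = true, then count stages from the event log file. The sketch assumes the log is one JSON object per line, that job-start events carry a "Stage IDs" list, and that stage-completed events carry "Stage Info" with a "Stage ID". Those field names are from memory and should be checked against a real log from your Spark version. The function name count_stages and the sample lines are made up for illustration.

```python
import json


def count_stages(event_log_lines):
    """Return (planned, completed) stage counts from event-log lines.

    Assumes one JSON object per line with an "Event" field (an assumption
    about the log format; verify against a real log).
    """
    planned = set()
    completed = set()
    for line in event_log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("Event")
        if kind == "SparkListenerJobStart":
            # Stages the scheduler planned for this job (assumed field name).
            planned.update(event.get("Stage IDs", []))
        elif kind == "SparkListenerStageCompleted":
            # Stages that actually finished (assumed field names).
            completed.add(event["Stage Info"]["Stage ID"])
    return len(planned), len(completed)


# Hand-written sample lines for illustration, not real Spark output.
sample = [
    '{"Event": "SparkListenerJobStart", "Job ID": 0, "Stage IDs": [0, 1]}',
    '{"Event": "SparkListenerStageCompleted", "Stage Info": {"Stage ID": 0}}',
    '{"Event": "SparkListenerStageCompleted", "Stage Info": {"Stage ID": 1}}',
    '{"Event": "SparkListenerJobStart", "Job ID": 1, "Stage IDs": [2]}',
]
planned, completed = count_stages(sample)
print(planned, "stages planned,", completed, "completed")
```

The planned count is best read as an upper bound, since stages whose output is already available may be skipped rather than run. Note also that the event log is written to the directory named by spark.eventLog.dir, not to STDOUT.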

On 4 February 2015 at 19:41, Mark Hamstra <m...@clearstorydata.com> wrote:

> But there isn't a 1-1 mapping from operations to stages since multiple
> operations will be pipelined into a single stage if no shuffle is
> required.  To determine the number of stages in a job you really need to be
> looking for shuffle boundaries.
>
> On Wed, Feb 4, 2015 at 11:27 AM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> You can easily understand the flow by looking at the number of operations
>> in your program (like map, groupBy, join etc.). First, list the
>> operations happening in your application; then, from the web UI, you
>> will be able to see how many operations have happened so far.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Feb 4, 2015 at 4:33 PM, Joe Wass <jw...@crossref.org> wrote:
>>
>>> I'm sitting here looking at my application crunching gigabytes of data
>>> on a cluster and I have no idea if it's an hour away from completion or a
>>> minute. The web UI shows progress through each stage, but not how many
>>> stages remain. How can I work out how many stages my program will take
>>> automatically?
>>>
>>> My application has a slightly interesting DAG (re-use of functions that
>>> contain Spark transformations, persistent RDDs). Not that complex, but not
>>> 'step 1, step 2, step 3'.
>>>
>>> I'm guessing that if the driver program runs sequentially sending
>>> messages to Spark, then Spark has no knowledge of the structure of the
>>> driver program. Therefore it's necessary to execute it on a small test
>>> dataset and see how many stages result?
>>>
>>> When I set spark.eventLog.enabled = true and run on (very small) test
>>> data I don't get any stage messages in my STDOUT or in the log file. This
>>> is on a `local` instance.
>>>
>>> Did I miss something obvious?
>>>
>>> Thanks!
>>>
>>> Joe
>>>
>>
>>
>

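Mark's point about pipelining can be illustrated with a toy model (plain Python, not Spark). It treats a program as a linear list of operation names and starts a new stage at every operation assumed to shuffle. The WIDE_OPS set and the function split_into_stages are hypothetical simplifications: in real Spark a join on co-partitioned RDDs may not shuffle at all, and a non-linear DAG needs more than a flat list.

```python
# Hypothetical set of operations treated as always shuffling (a simplification).
WIDE_OPS = {"groupBy", "groupByKey", "reduceByKey", "join", "sortByKey", "distinct"}


def split_into_stages(operations):
    """Group a linear chain of operations into stages at shuffle boundaries."""
    stages = [[]]
    for op in operations:
        # A shuffling operation closes the current stage and opens a new one.
        if op in WIDE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return [stage for stage in stages if stage]


pipeline = ["textFile", "map", "filter", "groupByKey",
            "map", "reduceByKey", "saveAsTextFile"]
stages = split_into_stages(pipeline)
print(len(pipeline), "operations ->", len(stages), "stages")
for number, stage in enumerate(stages):
    print(number, stage)
```

Here seven operations collapse into three stages, which is why counting operations in the source overestimates what the web UI will show.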