Thanks Akhil and Mark. I can of course count events (assuming I can deduce the shuffle boundaries), but like I said the program isn't simple and I'd have to do this manually every time I change the code. So I'd rather find a way of doing this automatically if possible.
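One way this might be automated, as a rough sketch rather than a tested recipe: run the job once on a small test dataset with spark.eventLog.enabled = true, then count the stages recorded in the resulting event log. The sketch assumes the log is written as one JSON object per line and that the field names "Event", "Stage IDs", "Stage Info" and "Stage ID" match what your Spark version emits, so check them against a real log first. The function name count_stages is made up for illustration.

```python
import json


def count_stages(lines):
    """Return (planned, completed) stage counts from Spark event-log lines.

    Assumption: each line is a JSON object, job-start events carry a
    "Stage IDs" list, and stage-completed events carry a "Stage Info"
    object with a "Stage ID". Verify these names against your own log.
    """
    planned = set()    # stage IDs announced when a job starts
    completed = set()  # stage IDs that actually finished
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip anything that is not valid JSON
        if not isinstance(event, dict):
            continue
        kind = event.get("Event")
        if kind == "SparkListenerJobStart":
            planned.update(event.get("Stage IDs", []))
        elif kind == "SparkListenerStageCompleted":
            stage_id = event.get("Stage Info", {}).get("Stage ID")
            if stage_id is not None:
                completed.add(stage_id)
    return len(planned), len(completed)
```

To use it, pass an open file handle for the event log, for example count_stages(open(path)), where path is wherever your event log directory setting points. The planned count can exceed the completed count, since a job may list stages that are later skipped (e.g. when a persisted RDD is reused), so treat the total from a test run as an estimate for the full run.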
On 4 February 2015 at 19:41, Mark Hamstra <m...@clearstorydata.com> wrote:

> But there isn't a 1-1 mapping from operations to stages since multiple
> operations will be pipelined into a single stage if no shuffle is
> required. To determine the number of stages in a job you really need to be
> looking for shuffle boundaries.
>
> On Wed, Feb 4, 2015 at 11:27 AM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> You can easily understand the flow by looking at the number of operations
>> in your program (like map, groupBy, join etc.), first of all you list out
>> the number of operations happening in your application and then from the
>> webui you will be able to see how many operations have happened so far.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Feb 4, 2015 at 4:33 PM, Joe Wass <jw...@crossref.org> wrote:
>>
>>> I'm sitting here looking at my application crunching gigabytes of data
>>> on a cluster and I have no idea if it's an hour away from completion or a
>>> minute. The web UI shows progress through each stage, but not how many
>>> stages remaining. How can I work out how many stages my program will take
>>> automatically?
>>>
>>> My application has a slightly interesting DAG (re-use of functions that
>>> contain Spark transformations, persistent RDDs). Not that complex, but not
>>> 'step 1, step 2, step 3'.
>>>
>>> I'm guessing that if the driver program runs sequentially sending
>>> messages to Spark, then Spark has no knowledge of the structure of the
>>> driver program. Therefore it's necessary to execute it on a small test
>>> dataset and see how many stages result?
>>>
>>> When I set spark.eventLog.enabled = true and run on (very small) test
>>> data I don't get any stage messages in my STDOUT or in the log file. This
>>> is on a `local` instance.
>>>
>>> Did I miss something obvious?
>>>
>>> Thanks!
>>>
>>> Joe