Re: How many stages in my application?

2015-02-05 Thread Kostas Sakellis
Yes, there is currently no way to find out automatically how many stages a job will generate. Like Mark said, RDD#toDebugString will give you some info about the RDD DAG, and from that you can determine, based on the dependency types (wide vs. narrow), where the stage boundaries are. On Thu, Feb 5, 2015
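
[Editor's sketch] The RDD#toDebugString suggestion can be roughed out in a few lines of Python. This assumes the lineage text marks each shuffle (wide) dependency with a `+-` prefix on the first line of the parent branch; check the output of your own Spark version before relying on that. The function name and the sample lineage are hypothetical and written by hand for illustration, not captured from a real job. The count is only an estimate: lineage shared between branches may be counted more than once, and stages skipped because of cached data are not accounted for.

```python
import re


def estimate_stage_count(debug_string: str) -> int:
    """Estimate stages as 1 + the number of shuffle-boundary markers.

    Assumption: every wide dependency is printed as a line whose first
    non-space characters are "+-". Narrow dependencies use "|" instead.
    """
    boundaries = sum(
        1 for line in debug_string.splitlines()
        if re.match(r"^\s*\+-", line)
    )
    return boundaries + 1


# Hypothetical lineage text, typed by hand for illustration only.
SAMPLE = """\
(2) ShuffledRDD[4] at reduceByKey at app.py:12 []
 +-(2) MapPartitionsRDD[3] at map at app.py:11 []
    |  MapPartitionsRDD[2] at flatMap at app.py:10 []
    |  input.txt HadoopRDD[0] at textFile at app.py:9 []
"""

print(estimate_stage_count(SAMPLE))
```

Running this on the text returned by toDebugString for the final RDD of each job would give a rough per-job figure that updates whenever the code changes, without counting operations by hand.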

Re: How many stages in my application?

2015-02-05 Thread Joe Wass
Thanks Akhil and Mark. I can of course count events (assuming I can deduce the shuffle boundaries), but like I said the program isn't simple and I'd have to do this manually every time I change the code. So I'd rather find a way of doing this automatically if possible. On 4 February 2015 at 19:41,

Re: How many stages in my application?

2015-02-05 Thread Mark Hamstra
RDD#toDebugString will help. On Thu, Feb 5, 2015 at 1:15 AM, Joe Wass jw...@crossref.org wrote: Thanks Akhil and Mark. I can of course count events (assuming I can deduce the shuffle boundaries), but like I said the program isn't simple and I'd have to do this manually every time I change the

Re: How many stages in my application?

2015-02-05 Thread Mark Hamstra
And the Job page of the web UI will give you an idea of stages completed out of the total number of stages for the job. That same information is also available as JSON. Statically determining how many stages a job logically comprises is one thing, but dynamically determining how many stages

Re: How many stages in my application?

2015-02-04 Thread Akhil Das
You can easily understand the flow by looking at the number of operations in your program (like map, groupBy, join, etc.). First, list the operations happening in your application; then, from the web UI, you will be able to see how many operations have happened so far.

Re: How many stages in my application?

2015-02-04 Thread Mark Hamstra
But there isn't a 1-1 mapping from operations to stages since multiple operations will be pipelined into a single stage if no shuffle is required. To determine the number of stages in a job you really need to be looking for shuffle boundaries. On Wed, Feb 4, 2015 at 11:27 AM, Akhil Das
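
[Editor's sketch] The point about pipelining can be illustrated with a toy model in plain Python. This is not Spark's scheduler and uses no Spark API: the `Node` class and `count_stages` function are hypothetical names, and the model simply assumes one final stage plus one extra stage per shuffle (wide) dependency reachable from the final dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dataset in a toy lineage graph (hypothetical, not a Spark class)."""
    name: str
    # Each entry is (parent, is_shuffle); is_shuffle=True models a wide
    # dependency, False a narrow one that can be pipelined.
    parents: list = field(default_factory=list)


def count_stages(final: Node) -> int:
    """Return 1 + the number of shuffle edges reachable from `final`."""
    seen = set()
    shuffle_edges = 0
    stack = [final]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # shared lineage is only walked once
        seen.add(id(node))
        for parent, is_shuffle in node.parents:
            if is_shuffle:
                shuffle_edges += 1
            stack.append(parent)
    return 1 + shuffle_edges


# Five operations, but only one of them needs a shuffle.
source = Node("textFile")
words = Node("flatMap", [(source, False)])
pairs = Node("map", [(words, False)])
counts = Node("reduceByKey", [(pairs, True)])
result = Node("map", [(counts, False)])

print(count_stages(result))
```

In this model the five operations collapse into two stages, because everything on either side of the single shuffle is pipelined together.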

How many stages in my application?

2015-02-04 Thread Joe Wass
I'm sitting here looking at my application crunching gigabytes of data on a cluster and I have no idea if it's an hour away from completion or a minute. The web UI shows progress through each stage, but not how many stages remain. How can I work out how many stages my program will take