You appear to be misunderstanding the nature of a Stage.  Individual
transformation steps such as `map` do not define the boundaries of Stages.
Rather, a sequence of transformations in which there is only a
NarrowDependency between each of the transformations will be pipelined into
a single Stage.  It is only when there is a ShuffleDependency that a new
Stage will be defined -- i.e. shuffle boundaries define Stage boundaries.
With whole-stage code generation in Spark 2.0, there will be even less
opportunity to treat individual transformations within a sequence of narrow
dependencies as though they were discrete, separable entities.

The Failed Stages portion of the Web UI will tell you which Stage in a Job
failed, and the accompanying error log message will generally also give you
some idea of which Tasks failed and why.  Tracing the error back further, at
a different level of abstraction, to lay blame on a particular transformation
wouldn't be particularly easy.
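To make the stage-boundary rule concrete, here is a toy Python model. It is not Spark's DAGScheduler; the `Step` class and `split_into_stages` function are hypothetical names used only for illustration. The model assumes a simple linear lineage where each step is flagged by whether it has a ShuffleDependency on its parent. Steps joined by narrow dependencies are pipelined into one stage, and a shuffle starts a new one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for one transformation in a linear lineage."""
    name: str
    shuffle: bool = False  # True if this step has a ShuffleDependency on its parent


def split_into_stages(steps: List[Step]) -> List[List[str]]:
    """Group a linear chain of transformations into stages.

    Narrow dependencies are pipelined into the current stage; a shuffle
    dependency closes the current stage and starts a new one.
    """
    stages: List[List[str]] = []
    current: List[str] = []
    for step in steps:
        if step.shuffle and current:
            # Shuffle boundary: everything before it forms its own stage.
            stages.append(current)
            current = []
        current.append(step.name)
    if current:
        stages.append(current)
    return stages


# The chain from the question: map -> shuffle -> map -> map, with collect as the action.
chain = [
    Step("textFile"),
    Step("map"),
    Step("reduceByKey", shuffle=True),
    Step("map"),
    Step("map"),
]
print(split_into_stages(chain))
```

Under this model the chain becomes two stages, `[['textFile', 'map'], ['reduceByKey', 'map', 'map']]`. The second is the result stage that the UI names after the `collect` call, and the maps after the shuffle run inside it. In a real job, `rdd.toDebugString` prints the RDD lineage with the indentation changing at each shuffle boundary, which is a quick way to see where the stage boundaries fall.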

On Wed, May 25, 2016 at 5:28 PM, Nirav Patel <npa...@xactlycorp.com> wrote:

> It's great that the Spark scheduler does optimized DAG processing and
> evaluates lazily, executing only when some action is performed or a shuffle
> dependency is encountered. Sometimes it goes further past a shuffle
> dependency before executing anything. E.g., if there are map steps after a
> shuffle, then it doesn't stop at the shuffle to execute anything but
> continues to the next map steps until it finds a reason (a Spark action) to
> execute. As a result, the stage that Spark is running can internally be a
> series of (map -> shuffle -> map -> map -> collect), and the Spark UI just
> shows its currently running 'collect' stage. So if the job fails at that
> point, the Spark UI just says collect failed, but in fact it could be any
> stage in that lazy chain of evaluation. Looking at executor logs gives some
> insight, but that's not always straightforward.
> Correct me if I am wrong here, but I think we need more visibility into
> what's happening underneath so we can easily troubleshoot as well as
> optimize our DAG.
>
> Thanks