You appear to be misunderstanding the nature of a Stage. Individual transformation steps such as `map` do not define Stage boundaries. Rather, a sequence of transformations connected only by NarrowDependencies is pipelined into a single Stage. A new Stage is defined only where there is a ShuffleDependency -- i.e. shuffle boundaries define Stage boundaries. With whole-stage code generation in Spark 2.0, there will be even less opportunity to treat individual transformations within a sequence of narrow dependencies as though they were discrete, separable entities.

The Failed Stages portion of the Web UI will tell you which Stage in a Job failed, and the accompanying error log message will generally also give you some idea of which Tasks failed and why. Tracing the error back further, at a different level of abstraction, to lay blame on a particular transformation would not be particularly easy.
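To make the boundary rule concrete, here is a rough sketch in plain Python. It is a toy model, not Spark's actual scheduler code: the `Step` class, the `split_into_stages` function, and the choice of `reduceByKey` as the shuffling step are all hypothetical names picked for illustration. It only shows the idea that narrow steps accumulate in the current Stage and a shuffle dependency starts a new one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One transformation in a linear lineage (hypothetical toy type)."""
    name: str
    shuffle: bool  # True if this step depends on its parent through a shuffle


def split_into_stages(steps: List[Step]) -> List[List[str]]:
    """Group steps into stages, cutting only at shuffle dependencies."""
    stages: List[List[str]] = []
    current: List[str] = []
    for step in steps:
        # A shuffle dependency closes the stage built so far.
        if step.shuffle and current:
            stages.append(current)
            current = []
        # Narrow steps are simply pipelined into the current stage.
        current.append(step.name)
    if current:
        stages.append(current)
    return stages


# Roughly the chain from the question: map -> shuffle -> map -> map.
# reduceByKey is an assumed stand-in for "some step that shuffles".
lineage = [
    Step("map", shuffle=False),
    Step("reduceByKey", shuffle=True),
    Step("map", shuffle=False),
    Step("map", shuffle=False),
]

stages = split_into_stages(lineage)
for i, names in enumerate(stages):
    print(f"Stage {i}: {' -> '.join(names)}")
```

In this model the chain yields two stages, not four: the two trailing `map` steps are pipelined together with the step that reads the shuffle output. On a real RDD, `toDebugString` prints the lineage with indentation changes at shuffle boundaries, which is probably the quickest way to see where Spark itself will cut.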
On Wed, May 25, 2016 at 5:28 PM, Nirav Patel <npa...@xactlycorp.com> wrote:

> It's great that the Spark scheduler does optimized DAG processing and only
> does lazy evaluation when some action is performed or a shuffle dependency
> is encountered. Sometimes it goes further after a shuffle dependency before
> executing anything. E.g. if there are map steps after a shuffle, then it
> doesn't stop at the shuffle to execute anything but goes on to the next map
> steps until it finds a reason (a Spark action) to execute. As a result, the
> stage that Spark is running can internally be a series of
> (map -> shuffle -> map -> map -> collect), and the Spark UI just shows its
> currently running 'collect' stage. So if the job fails at that point, the
> Spark UI just says Collect failed, but in fact it could be any stage in
> that lazy chain of evaluation. Looking at executor logs gives some
> insight, but that's not always straightforward.
>
> Correct me if I am wrong here, but I think we need more visibility into
> what's happening underneath so we can easily troubleshoot as well as
> optimize our DAG.
>
> Thanks