[ 
https://issues.apache.org/jira/browse/TEZ-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000346#comment-14000346
 ] 

Bikas Saha commented on TEZ-1117:
---------------------------------

Please bear with me while I play the devil's advocate. What do admins (or folks 
who monitor the health of the cluster) do when a bunch of jobs show up as 
killed or failed on the aggregate YARN job status page when things are actually 
much better than that? In my previous team, there was a watchdog that would 
fire alerts when it saw 5 back-to-back job failures.
It's interesting that Pig does things this way. Does Pig specifically not prefer 
the fire-and-forget model? How do folks run regularly scheduled (cron-like) 
jobs? By keeping a client alive? What use case is that targeting? Is that by 
design or a compromise? E.g. in my previous project, the users could choose to 
get an email when their job completed. So they did not have to poll or block on 
anything.

I can see supporting the thing that you are not asking for :) That is, 
terminating the session when a DAG fails. If a pipeline of DAGs is being 
submitted to the session, then the session fails if a member of that pipeline 
fails, so a session maps to a pipeline. That is not an intended use case for 
sessions, but it is not that far off.

Btw, in which cases does Pig need to create a pipeline of DAGs and submit them 
instead of just creating a single DAG? Is there something we can do in Tez to 
make that work? What kind of dependency are we not able to express?

FYI, we are going to merge the session and non-session submission code paths in 
TEZ-692, so that users can write the same code and choose session mode or 
non-session mode as appropriate.

> Option to make YARN application failed on dag failure
> -----------------------------------------------------
>
>                 Key: TEZ-1117
>                 URL: https://issues.apache.org/jira/browse/TEZ-1117
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>
> Can we have a configuration to make the Application status FAILED on 
> termination if one of the DAGs fails? It is very confusing for users to see 
> the application SUCCEEDED.


