[
https://issues.apache.org/jira/browse/TEZ-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000346#comment-14000346
]
Bikas Saha commented on TEZ-1117:
---------------------------------
Please bear with me while I play devil's advocate. What do admins (or folks
who monitor cluster health) do when a bunch of jobs show up as killed or
failed on the aggregate YARN application status page, when things are actually
much better than that? In my previous team, there was a watchdog that would
fire alerts when it saw 5 back-to-back job failures.
It's interesting that Pig does things this way. Does Pig deliberately avoid
the fire-and-forget model? How do folks run regularly scheduled (cron-like)
jobs? By keeping a client alive? What use case is that targeting? Is that by
design or a compromise? E.g., in my previous project, users could choose to
get an email when their job completed, so they did not have to poll or block on
anything.
I can see supporting the thing that you are not asking for :) which is
terminating the session when a DAG fails. If a pipeline of DAGs is being
submitted to the session, then the session fails as soon as any member of that
pipeline fails, so a session maps to a pipeline. That is not an intended use
case for sessions, but it is not that far off.
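A minimal sketch of that fail-fast idea, assuming a session runs a pipeline of DAGs strictly in order and is torn down at the first DAG that does not succeed. `PipelineSession`, `DagState`, `SessionState` and `runPipeline` are hypothetical names for illustration only, not Tez APIs; each DAG is modeled as a `Supplier` that "runs" it and reports a final state.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model, not the Tez API: a session mapped to a pipeline of DAGs.
public class PipelineSession {

    enum DagState { SUCCEEDED, FAILED, KILLED }

    enum SessionState { SUCCEEDED, FAILED }

    // Runs the DAGs in order. The first DAG that does not succeed
    // terminates the session, so the remaining DAGs are never submitted.
    static SessionState runPipeline(List<Supplier<DagState>> dags) {
        for (Supplier<DagState> dag : dags) {
            if (dag.get() != DagState.SUCCEEDED) {
                return SessionState.FAILED;
            }
        }
        return SessionState.SUCCEEDED;
    }

    public static void main(String[] args) {
        int[] submitted = {0}; // counts how many DAGs were actually run
        List<Supplier<DagState>> dags = List.of(
            () -> { submitted[0]++; return DagState.SUCCEEDED; },
            () -> { submitted[0]++; return DagState.FAILED; },
            () -> { submitted[0]++; return DagState.SUCCEEDED; });
        SessionState result = runPipeline(dags);
        System.out.println(result + " after " + submitted[0] + " DAGs");
    }
}
```

Running `main` prints `FAILED after 2 DAGs`: the third DAG is never submitted. Under this model the session's final state directly reflects the pipeline's outcome, which is what would let the YARN application show up as failed.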
Btw, in which cases does Pig need to create a pipeline of DAGs and submit them
instead of just creating a single DAG? Is there something we can do in Tez to
make that work? What kind of dependency are we not able to express?
FYI, we are going to merge the session and non-session submission code paths in
TEZ-692, so users can write the same code and choose session mode or
non-session mode as appropriate.
> Option to make YARN application failed on dag failure
> -----------------------------------------------------
>
> Key: TEZ-1117
> URL: https://issues.apache.org/jira/browse/TEZ-1117
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
>
> Can we have a configuration to make the Application status FAILED on
> termination if one of the DAGs fails? It is very confusing for users to see
> the application SUCCEEDED.
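A minimal sketch of the decision the requested option would control, assuming the final status is computed once, when the session terminates, from the states of the DAGs it ran. The flag `failAppOnDagFailure` and the types below are hypothetical; they are not an existing Tez configuration property or API. Counting only FAILED (not KILLED) DAGs is likewise an assumption left open here.

```java
import java.util.List;

// Hypothetical model of the option requested in TEZ-1117. Neither the
// flag nor these types exist in Tez; they only illustrate the decision.
public class FinalStatus {

    enum DagState { SUCCEEDED, FAILED, KILLED }

    enum AppStatus { SUCCEEDED, FAILED }

    // Decides the final status reported to YARN when the session ends.
    // Flag off: the application is reported SUCCEEDED even if a DAG
    // failed (the behavior the issue describes as confusing).
    // Flag on: any FAILED DAG makes the application FAILED. KILLED DAGs
    // are deliberately not counted here; that is an open design choice.
    static AppStatus finalStatus(List<DagState> completedDags,
                                 boolean failAppOnDagFailure) {
        if (failAppOnDagFailure && completedDags.contains(DagState.FAILED)) {
            return AppStatus.FAILED;
        }
        return AppStatus.SUCCEEDED;
    }

    public static void main(String[] args) {
        List<DagState> dags = List.of(DagState.SUCCEEDED, DagState.FAILED);
        System.out.println("flag off: " + finalStatus(dags, false));
        System.out.println("flag on:  " + finalStatus(dags, true));
    }
}
```

Unlike the fail-fast session idea, this keeps the session running to the end and only changes what is reported at termination.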