[
https://issues.apache.org/jira/browse/OOZIE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916675#comment-13916675
]
Robert Kanter commented on OOZIE-1722:
--------------------------------------
{quote}
Looks good. When killing the previous attempt jobs, one thing I wanted to do
was to use the same tag for both the launcher and action configs. And while
killing the previous attempt jobs, exclude the "re"launched job. This will help
in the case of uber mode also when we enable it.
{quote}
If we put the tag in the job itself, then the launcher would try to kill itself
unless we add additional logic to exclude it, but I'm not sure we can do that
easily. I'm also not sure what we'd gain from this; the launcher doesn't need
to find itself and the Oozie server already has the job ID for the launcher.
I hadn't considered the effect of this on uber mode, but I don't think it makes
a difference here: the launcher and action jobs are still separate jobs, so
they should have different tags, right?
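
To make the self-kill concern concrete, here is a minimal sketch of the extra
exclusion logic that a shared tag would need. It is only an illustration and is
not taken from the patch: RunningApp, TagKillSketch, selectJobsToKill, and the
IDs and tag strings are all hypothetical, and a plain in-memory list stands in
for whatever the Yarn client would return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a running Yarn application; not a Hadoop class.
final class RunningApp {
    final String id;
    final Set<String> tags;

    RunningApp(String id, Set<String> tags) {
        this.id = id;
        this.tags = tags;
    }
}

final class TagKillSketch {
    // Returns the IDs of all apps carrying `tag`, skipping the caller's own ID.
    // Without the selfId check, a launcher sharing the tag would select itself.
    static List<String> selectJobsToKill(List<RunningApp> apps, String tag, String selfId) {
        List<String> toKill = new ArrayList<>();
        for (RunningApp app : apps) {
            if (app.tags.contains(tag) && !app.id.equals(selfId)) {
                toKill.add(app.id);
            }
        }
        return toKill;
    }

    public static void main(String[] args) {
        // Assumed scenario: the launcher and its child job share one tag.
        List<RunningApp> apps = List.of(
            new RunningApp("launcher_1", Set.of("oozie-action-42")),
            new RunningApp("child_1", Set.of("oozie-action-42")),
            new RunningApp("other_1", Set.of("unrelated")));
        System.out.println(selectJobsToKill(apps, "oozie-action-42", "launcher_1"));
    }
}
```

With separate tags for the launcher and the action jobs, the selfId comparison
would not be needed, since the launcher would never match the tag it searches
for.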
----
I'll look into the test failures and update the patch; it's probably something
trivial with how TestLauncher works.
> When an ApplicationMaster restarts, it restarts the launcher job
> ----------------------------------------------------------------
>
> Key: OOZIE-1722
> URL: https://issues.apache.org/jira/browse/OOZIE-1722
> Project: Oozie
> Issue Type: Improvement
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: OOZIE-1722.patch
>
>
> When using Yarn, there are some situations in which the ApplicationMaster can
> be restarted (e.g. RM failover, the AM dies and another attempt is made,
> etc).
> When this happens, it starts the launcher job again, which will start over.
> So, if that launcher has already launched a job, we'll end up with two
> instances of the same job, which can be problematic. For example, if you
> have a Pig action, the Pig client might run a job, but then the launcher gets
> restarted by an AM restart and launches that same job again.
> We don't have a way of "re-attaching" to previously launched jobs; however,
> with YARN-1461 and MAPREDUCE-5699, we can use yarn tags to find any jobs the
> launcher previously launched that are still running and kill them. We still have to
> start over, but at least we're not running two instances of a job at the same
> time.
> Here's what we can do for each action type:
> - Pig, Sqoop, Hive
> -- Kill previously launched jobs and start over
> - MapReduce (different because of the optimization)
> -- Exit launcher if a previously launched job already exists
> - Java, Shell
> -- No out-of-the-box support for this
> -- Like with other things, the Java action can take advantage of this like
> Pig, Sqoop, and Hive if the user adds some code
> - DistCp
> -- Not supported
> - SSH, Email
> -- N/A
> The yarn tags won't be available until Hadoop 2.4.0, but they are in the nightly
> (i.e. Hadoop 3.0.0-SNAPSHOT), and they're obviously not in Hadoop 1.x. To be
> able to use the Yarn methods and the new methods for tagging, we can add a
> new type of Hadooplib called "Hadoop Utils" where we can put classes that are
> tied to a particular version of Hadoop; the other implementations can have
> dummy versions. For example, in the Hadoop-2 Hadoop Utils, we can put a
> method foo() that calls some yarn stuff but in the Hadoop-1 Hadoop Utils, the
> foo() method would either do the equivalent in MR1 or a no-op. So for now, I
> put some methods in the Hadoop-3 Hadoop Utils that use the tags and the
> Hadoop-1, Hadoop-2, and Hadoop-23 Hadoop Utils all have dummy implementations
> that don't do anything (so the existing behavior is preserved). The Hadoop
> Utils modules will allow us to take advantage of Hadoop 2-only features in
> the future, while still being able to compile against Hadoop 1; so it's not
> just limited to this feature.
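
The "Hadoop Utils" idea in the quoted description could look roughly like the
sketch below. This is an assumption-laden illustration, not the code in the
attached patch: the interface and class names, the killJobsWithTag method
(standing in for the placeholder foo()), and the in-memory map that replaces
real Yarn calls are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common interface that every Hadoop Utils variant would provide.
interface HadoopUtilsSketch {
    // Kills jobs previously launched under `tag` and returns the IDs it killed.
    List<String> killJobsWithTag(String tag);
}

// Dummy variant for Hadoop versions without yarn tags: does nothing, so the
// existing behavior is preserved.
final class NoOpHadoopUtils implements HadoopUtilsSketch {
    @Override
    public List<String> killJobsWithTag(String tag) {
        return new ArrayList<>();
    }
}

// Tag-aware variant. The map of tag -> running job IDs is a stand-in for the
// real Yarn lookups, which are deliberately not shown here.
final class TagAwareHadoopUtils implements HadoopUtilsSketch {
    private final Map<String, List<String>> runningByTag;

    TagAwareHadoopUtils(Map<String, List<String>> runningByTag) {
        this.runningByTag = new HashMap<>(runningByTag);
    }

    @Override
    public List<String> killJobsWithTag(String tag) {
        // "Killing" is modeled as removing the jobs from the running set.
        List<String> killed = runningByTag.remove(tag);
        return killed == null ? new ArrayList<>() : killed;
    }
}

final class HadoopUtilsDemo {
    public static void main(String[] args) {
        // Caller code is identical no matter which variant is on the classpath.
        HadoopUtilsSketch utils = new TagAwareHadoopUtils(
            Map.of("oozie-action-42", List.of("job_1", "job_2")));
        System.out.println(utils.killJobsWithTag("oozie-action-42"));
    }
}
```

The point of the design is that launcher code depends only on the common
interface, so it compiles against every Hadoop version while only the
tag-capable variant does real work.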