[ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504702#comment-13504702 ]
Alejandro Abdelnur commented on OOZIE-1089:
-------------------------------------------
Mohammad,
Oozie does not add any duplicate entries to the distributed cache (DC).
The problem lies in how YARN handles the distributed cache, combined with a
duplicate check introduced in MRApps.
Let me explain MAPREDUCE-4820 again, using Oozie terminology:
* Oozie ActionExecutor creates the jobconfs for both the launcher job and the
action job
* Both jobconfs are configured with the corresponding DistributedCache entries
* The DistributedCache entries are identical in both
* The DistributedCache entries are required by both (by the launcher to
submit the action job, by the action job to run)
* Because of the way YARN works (and this changed from Hadoop 1), all JARs in the
distributed cache are symlinked into the task's running directory.
* Because of the way MRApps works (for job submission), it injects into the
distributed cache all JARs in the current directory and in the lib/ directory.
* Because the launcher job runs MRApps again (to submit the action job), the
duplication happens between the entries in the distributed cache and the JARs
in the task's current directory.
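
The chain of steps above can be modelled in a few lines of plain Java. This is only a sketch of the mechanism, not the actual Oozie or MRApps code: the class, the method names, and the JAR names are all hypothetical, and entries are compared by file name only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateCacheSketch {

    // Hypothetical stand-in for job submission: start from the entries already
    // configured in the jobconf, then inject the JARs found in the current dir.
    static List<String> entriesAtSubmission(List<String> configured,
                                            List<String> jarsInCurrentDir) {
        List<String> all = new ArrayList<>(configured);
        all.addAll(jarsInCurrentDir);
        return all;
    }

    // Hypothetical stand-in for the duplicate check: fail on a repeated name.
    static void failOnDuplicates(List<String> entries) {
        Set<String> seen = new HashSet<>();
        for (String name : entries) {
            if (!seen.add(name)) {
                throw new IllegalStateException(
                        "Duplicate distributed cache entry: " + name);
            }
        }
    }

    public static void main(String[] args) {
        // Cache entries configured for the action job (made-up names).
        List<String> actionCache = Arrays.asList("action-lib.jar", "udf.jar");
        // The launcher has identical cache entries, and in Hadoop 2 they are
        // symlinked into the launcher task's current directory.
        List<String> launcherCurrentDir = new ArrayList<>(actionCache);

        List<String> submitted = entriesAtSubmission(actionCache, launcherCurrentDir);
        try {
            failOnDuplicates(submitted);
            System.out.println("submitted without duplicates");
        } catch (IllegalStateException e) {
            System.out.println("submission failed: " + e.getMessage());
        }
    }
}
```

In this model every JAR shows up twice at submission time, once from the action jobconf and once from the launcher's current directory, so the check fails even though neither jobconf contains a duplicate on its own.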
The workaround flushes the distributed cache entries of the action jobconf,
rightfully assuming (in the case of Hadoop 2) that they'll be in the current dir
of the launcher task and thus added implicitly to the distributed cache of the
action jobconf.
Because of this, there is nothing to be done by Oozie other than the workaround.
I think the correct fix for MAPREDUCE-4820 is to dedup instead of failing. I'll
be posting a patch momentarily, but until a Hadoop 2 release that includes the
fix is out, we need the workaround.
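
To make the two options concrete, here is a rough sketch of the workaround next to a dedup at submission time. It is an illustration under assumptions, not the Oozie patch or the eventual MAPREDUCE-4820 fix: the class, the helper names, and the JAR names are hypothetical, and entries are compared by file name only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DedupSketch {

    // Workaround (hypothetical helper): flush the action jobconf's cache
    // entries and rely on submission to re-add the JARs it finds in the
    // launcher task's current directory.
    static List<String> flushThenSubmit(List<String> actionCache,
                                        List<String> jarsInCurrentDir) {
        // actionCache is deliberately ignored: that is the "flush".
        return new ArrayList<>(jarsInCurrentDir);
    }

    // Dedup instead of fail (hypothetical helper): merge both sources and
    // drop repeated names, keeping first-seen order.
    static List<String> dedupThenSubmit(List<String> actionCache,
                                        List<String> jarsInCurrentDir) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(actionCache);
        merged.addAll(jarsInCurrentDir);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> cache = Arrays.asList("action-lib.jar", "udf.jar");
        List<String> currentDir = Arrays.asList("action-lib.jar", "udf.jar");
        System.out.println(flushThenSubmit(cache, currentDir));
        System.out.println(dedupThenSubmit(cache, currentDir));
    }
}
```

Both paths end with each JAR listed once. The difference is that the workaround only holds while the launcher's current directory really contains every cached JAR, whereas the dedup makes no such assumption.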
Hope this explains things clearly.
> DistributedCache workaround for Hadoop 2.0.2-alpha
> --------------------------------------------------
>
> Key: OOZIE-1089
> URL: https://issues.apache.org/jira/browse/OOZIE-1089
> Project: Oozie
> Issue Type: Bug
> Components: workflow
> Affects Versions: 3.3.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Fix For: 3.3.0
>
> Attachments: OOZIE-1089.patch, OOZIE-1089-trunk.patch
>
>
> As explained in MAPREDUCE-4820, Hadoop 2.0.2-alpha introduced a duplicate
> check that exposes a change of behavior in how the distributed cache works
> in Hadoop 2 (as opposed to Hadoop 1).