[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194976#comment-16194976
 ] 

Chris Trezzo commented on MAPREDUCE-5951:
-----------------------------------------

Thanks for the comment [~mingma]!

bq. Should any code be moved from MR to YARN to make it easier for other YARN 
applications to use shared cache? For example, maybe other applications can 
benefit from part of LocalResourceBuilder or the special care when dealing with 
fragment.

I have thought about this a fair amount. Originally we started pushing more of 
the fragment code down into the YARN layer (see YARN-3637), but later I 
realized that the code dealing with fragments is purely at the MapReduce layer. 
YARN's API does not use fragments. Instead, the ContainerLaunchContext expects a 
Map<String, LocalResource> localResources, where the strings are the 
destination file names (i.e. symlinks). We wound up pulling the fragment 
portion back out of YARN (see YARN-7250) because it was not consistent with the 
rest of the YARN API. Additionally, I think that the way MapReduce uses 
fragments right now is very brittle and prone to bugs. Within MapReduce, 
resources with fragments are converted between Paths, URIs and URLs multiple 
times throughout the code, and each of these three classes supports fragments 
in a different way. If you are not very careful, you can easily drop a fragment.
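
To make the "easily drop a fragment" point concrete, here is a minimal sketch 
that uses only java.net.URI. It is not MapReduce code: the class name, the 
helper names and the example hdfs location are all made up for illustration. 
It only shows how a conversion that rebuilds a resource from its components 
loses the fragment unless the fragment is carried across explicitly.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: class, helpers and the example URI are hypothetical.
public class FragmentDropDemo {

    // Rebuilds a URI from scheme, authority and path only.
    // The query and fragment are not passed on, so they are silently lost.
    static URI lossyCopy(URI source) throws URISyntaxException {
        return new URI(source.getScheme(), source.getAuthority(),
                source.getPath(), null, null);
    }

    // Rebuilds the same URI, passing the query and fragment through as well.
    static URI faithfulCopy(URI source) throws URISyntaxException {
        return new URI(source.getScheme(), source.getAuthority(),
                source.getPath(), source.getQuery(), source.getFragment());
    }

    public static void main(String[] args) throws URISyntaxException {
        URI original = URI.create("hdfs://nn:8020/libs/foo.jar#foo-link.jar");
        System.out.println(original.getFragment());
        System.out.println(lossyCopy(original));
        System.out.println(faithfulCopy(original));
    }
}
```

Nothing fails in the lossy case; the fragment is simply gone, which is why 
this kind of bug is easy to miss.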

I also thought about moving LocalResourceBuilder to YARN, but it has a fair 
amount of MapReduce-specific logic that would need to change. For example:
# All of the parameters are array-based due to how MapReduce currently handles 
resources. We could change this, but that would require additional 
refactoring at the MapReduce level.
# Components from the MapReduce wildcard feature are in this class. We would 
need to figure out whether that makes sense at the YARN layer.
# LocalResourceBuilder currently handles fragments. We would also need to 
figure out whether that makes sense at the YARN layer.

In short, moving LocalResourceBuilder is not as simple as dropping it into 
YARN and being done. We would have to think about it more. It does seem 
like something YARN could benefit from, along with a resource uploader. I can 
file another jira to cover these topics, but I think it is probably out of 
scope for this jira.

I think in reality the complexity in this jira is due to the way MapReduce 
itself handles resources and the above-mentioned issues with fragments. If we 
wanted to implement a generic YARN resource uploader, I think it could be much 
simpler. For example, this is a slightly simplified version of the code devoted 
to using something in the shared cache:
{noformat}
// The checksum of the local file is the key used to look it up in the cache.
String localPathChecksum = sharedCacheClient.getFileChecksum(localPath);
URL cachedResource = sharedCacheClient.use(appId, localPathChecksum);
LocalResource resource = LocalResource.newInstance(cachedResource,
      LocalResourceType.FILE, LocalResourceVisibility.PUBLIC,
      size, timestamp, null, true);
{noformat}

That LocalResource can then be passed directly to the ContainerLaunchContext 
where a symlink can be specified as a String. As you can see, there is no 
innate need for fragments at the YARN layer.
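
As a rough sketch of that last point, the fragment can be resolved into a link 
name once, at the edge, and everything after that can work with a plain 
link-name-to-resource map. The code below uses java.net.URI as a stand-in for 
LocalResource so that it is self-contained. The class, the helper names and the 
rule "use the fragment if present, otherwise the file name" are assumptions for 
illustration, not the actual MapReduce or YARN code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: URI stands in for LocalResource; all names are hypothetical.
public class LinkNameDemo {

    // Assumed rule: the fragment is the link name when present,
    // otherwise the last component of the path is used.
    static String linkNameFor(URI resource) {
        String fragment = resource.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = resource.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Returns the URI text up to, but not including, the '#' delimiter.
    static URI withoutFragment(URI resource) {
        String text = resource.toString();
        int hash = text.indexOf('#');
        return hash < 0 ? resource : URI.create(text.substring(0, hash));
    }

    // Builds a link name -> location map; stored locations carry no fragment.
    static Map<String, URI> toLocalResources(URI... resources) {
        Map<String, URI> result = new LinkedHashMap<>();
        for (URI resource : resources) {
            String linkName = linkNameFor(resource);
            if (result.putIfAbsent(linkName, withoutFragment(resource)) != null) {
                throw new IllegalArgumentException(
                        "Duplicate link name: " + linkName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, URI> localResources = toLocalResources(
                URI.create("hdfs://nn:8020/libs/foo.jar#foo-link.jar"),
                URI.create("hdfs://nn:8020/libs/bar.jar"));
        localResources.forEach((name, uri) ->
                System.out.println(name + " -> " + uri));
    }
}
```

Once the map is built, nothing downstream has to parse or preserve a fragment; 
the symlink name lives only in the map key.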

Please let me know if that makes sense or if I have missed something! Thanks.

> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-Overview.001.pdf, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, 
> MAPREDUCE-5951-trunk-020.patch, MAPREDUCE-5951-trunk-021.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, 
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, 
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, 
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, 
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
