[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989545#comment-15989545 ] Chris Trezzo commented on MAPREDUCE-5951: - [~xkrogen] bq. Is this so that the uploading to SCM can be done by the NM, which is a privileged user, to have more secure control over it? Yes exactly. We wanted to ensure that only trusted entities (i.e. the SCM and the node manager) were modifying the shared cached directories in HDFS. Additionally, we wanted to make sure that the checksum used when adding a resource to the cache was computed by a trusted entity as well. > Add support for the YARN Shared Cache > - > > Key: MAPREDUCE-5951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5951-Overview.001.pdf, > MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, > MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, > MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, > MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, > MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, > MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, > MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, > MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, > MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, > MAPREDUCE-5951-trunk-v9.patch > > > Implement the necessary changes so that the MapReduce application can > leverage the new YARN shared cache (i.e. YARN-1492). > Specifically, allow per-job configuration so that MapReduce jobs can specify > which set of resources they would like to cache (i.e. jobjar, libjars, > archives, files). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989035#comment-15989035 ] Erik Krogen commented on MAPREDUCE-5951: Ah, excellent point, [~jlowe]... I actually would love to hear the reasoning behind the current strategy of AM downloads resource -> AM uploads resource to SCM> rather than the seemingly more obvious/simpler . Is this so that the uploading to SCM can be done by the NM, which is a privileged user, to have more secure control over it? [~ctrezzo], first off thanks for getting back so quickly! And for the pointer to YARN-5727; that's an interesting issue. The public visibility solution is certainly simpler from the YARN side and seems pretty reasonable from a point of expectation of burden on an application ("you want a publicly shared resource? put it somewhere public"). It doesn't add _too_ much complexity on the MR side, though having a separate staging directory just for public resources is a bit cumbersome. It also means that other application developers will have to build the same type of logic - in general I would lean towards more logic pushed into the YARN level so that it is easy for application devs to support. I don't have good insight into how difficult your initially proposed solution in YARN-5727 would be to implement, though. > Add support for the YARN Shared Cache > - > > Key: MAPREDUCE-5951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5951-Overview.001.pdf, > MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, > MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, > MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, > MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, > MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, > MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, > MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, > MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, > MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, > MAPREDUCE-5951-trunk-v9.patch > > > Implement the necessary changes so that the MapReduce application can > leverage the new YARN shared cache (i.e. YARN-1492). > Specifically, allow per-job configuration so that MapReduce jobs can specify > which set of resources they would like to cache (i.e. jobjar, libjars, > archives, files). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988744#comment-15988744 ] Jason Lowe commented on MAPREDUCE-5951: --- I don't think it really matters whether the jar resource uploaded by the client is public or private. In both cases the HDFS path to which the client posts the resource will be removed when the job completes. If any subsequent jobs come along and figure out via the SCM that they can avoid uploading their own, redundant copy of the same resource then they will receive a resource path within the SCM area which is a _different_ path than the one used by the first job. That means the resource is going to get downloaded to the node again because it's in a different location than the first job's resource. Even if the first job's client uploads the resource to a public directory, no other job is going to ask for that resource under the same path. It will be uploaded to a public staging directory which is specific to that app and whose path exists only as long as the app. The problem with having jobs try to share resources automatically just from the job client is knowing when the resource can be removed, otherwise we could yank it just as another app tries to localize it or never clean it up. That's why the SCM does the necessary ref counting to know what's being used and when resources can be freed safely. If we want to avoid the double-download of the resource then the job client will need to upload the resource to the SCM directly and then submit the job _after_ it has received the public resource path from the SCM. > Add support for the YARN Shared Cache > - > > Key: MAPREDUCE-5951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5951-Overview.001.pdf, > MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, > MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, > MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, > MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, > MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, > MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, > MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, > MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, > MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, > MAPREDUCE-5951-trunk-v9.patch > > > Implement the necessary changes so that the MapReduce application can > leverage the new YARN shared cache (i.e. YARN-1492). > Specifically, allow per-job configuration so that MapReduce jobs can specify > which set of resources they would like to cache (i.e. jobjar, libjars, > archives, files). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org