[ https://issues.apache.org/jira/browse/MAPREDUCE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308934#comment-15308934 ]

Chris Trezzo commented on MAPREDUCE-6690:
-----------------------------------------

Thanks for the review [~jlowe]!

bq. Is this intended to apply to all distributed cache items or only those that 
need to be uploaded during job submission?

It is intended to apply to all distributed cache items, not only those that
need to be uploaded during job submission. Good catch! I will add the DC items
to the check. As a side note, the reasoning for including DC items is that even
though they already reside in an accessible location, they can still cause a
significant amount of localization into the YARN local cache. How much
localization actually occurs depends on the local cache size and the cache hit
rate, but I chose the most conservative approach and count every DC item
against the limits.
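To make the conservative approach concrete, here is a minimal client-side sketch in Java. It assumes each resource's size is already known (for DC items that would mean looking the size up on the shared filesystem), and every name in it (`LocalizationLimitCheck`, `Resource`, `Limits`, `allResources`, `check`) is hypothetical, not taken from the attached patches or from the Hadoop API. It counts uploaded resources and DC items together, then checks the three dimensions listed in the issue description: total size, number of resources, and size of an individual resource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these class, method, or limit names come from
// the MAPREDUCE-6690 patches or from the Hadoop API.
final class LocalizationLimitCheck {

    /** A resource to be localized: its URI and its size in bytes. */
    static final class Resource {
        final String uri;
        final long sizeBytes;

        Resource(String uri, long sizeBytes) {
            this.uri = uri;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Per-job limits; in this sketch a value <= 0 disables that check. */
    static final class Limits {
        final int maxResources;
        final long maxTotalBytes;
        final long maxSingleResourceBytes;

        Limits(int maxResources, long maxTotalBytes, long maxSingleResourceBytes) {
            this.maxResources = maxResources;
            this.maxTotalBytes = maxTotalBytes;
            this.maxSingleResourceBytes = maxSingleResourceBytes;
        }
    }

    /**
     * Conservative approach: resources uploaded at submission time and
     * distributed cache items already on a shared filesystem are all
     * counted, since any of them may have to be localized.
     */
    static List<Resource> allResources(List<Resource> uploaded, List<Resource> distributedCache) {
        List<Resource> all = new ArrayList<>(uploaded);
        all.addAll(distributedCache);
        return all;
    }

    /** Returns the violated limits; an empty list means the job may be submitted. */
    static List<String> check(List<Resource> resources, Limits limits) {
        List<String> violations = new ArrayList<>();

        // 1. Total size of all resources.
        long totalBytes = 0;
        for (Resource r : resources) {
            totalBytes += r.sizeBytes;
        }
        if (limits.maxTotalBytes > 0 && totalBytes > limits.maxTotalBytes) {
            violations.add("total size too large: " + totalBytes + " > " + limits.maxTotalBytes);
        }

        // 2. Number of resources.
        if (limits.maxResources > 0 && resources.size() > limits.maxResources) {
            violations.add("too many resources: " + resources.size() + " > " + limits.maxResources);
        }

        // 3. Size of each individual resource (e.g. a large fat jar).
        if (limits.maxSingleResourceBytes > 0) {
            for (Resource r : resources) {
                if (r.sizeBytes > limits.maxSingleResourceBytes) {
                    violations.add("resource too large: " + r.uri + " (" + r.sizeBytes + " bytes)");
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Resource> uploaded = new ArrayList<>();
        uploaded.add(new Resource("file:///tmp/job.jar", 40L << 20));
        List<Resource> cached = new ArrayList<>();
        cached.add(new Resource("hdfs:///shared/lib/a.jar", 10L << 20));
        cached.add(new Resource("hdfs:///shared/lib/b.jar", 10L << 20));

        // Limits: 2 resources, 100 MiB total, 32 MiB per resource.
        Limits limits = new Limits(2, 100L << 20, 32L << 20);
        for (String violation : check(allResources(uploaded, cached), limits)) {
            System.out.println(violation);
        }
    }
}
```

With the sample data in `main`, the resource-count limit is exceeded only because the two DC jars are counted alongside the uploaded job jar (three resources against a limit of two), the 40 MiB job jar exceeds the 32 MiB per-resource limit, and the 60 MiB total stays under 100 MiB. Returning all violations, rather than failing on the first one, lets a client report every problem in a single submission attempt.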

I will also address your other comments.

> Limit the number of resources a single map reduce job can submit for 
> localization
> ---------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6690
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6690
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: MAPREDUCE-6690-trunk-v1.patch, 
> MAPREDUCE-6690-trunk-v2.patch
>
>
> Users will sometimes submit a large number of resources to be localized as
> part of a single map reduce job. This can cause issues with YARN localization
> that destabilize the cluster and potentially impact other users' jobs. These
> resources are specified via the files, libjars, archives and jobjar command
> line arguments or directly through the configuration (i.e. the distributed
> cache API). The specified resources can be too large along multiple dimensions:
> # Total size
> # Number of files
> # Size of an individual resource (e.g. a large fat jar)
> We would like to encourage good behavior on the client side by providing the
> option of enforcing resource limits along the above dimensions.
> There should be a separate effort to enforce limits at the YARN layer on the
> server side, but this JIRA only covers the MapReduce layer on the client
> side. In practice, these client-side limits will get us a long way toward
> preventing these localization anti-patterns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
