[jira] [Resolved] (MAPREDUCE-2011) Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup

Eric Payne (JIRA) Mon, 11 Jan 2016 09:19:03 -0800

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Eric Payne resolved MAPREDUCE-2011.
-----------------------------------
    Resolution: Won't Fix

[~knoguchi], here are [~jlowe]'s comments from an offline discussion:
I think the distributed cache already behaves the way you desire, at least in 
YARN. When a resource request arrives at the nodemanager, it tries to look up 
the local resource info based on that request. If it finds it (i.e., a hit in 
the cache), it just increments the refcount of the resource. I don't see any 
attempt to stat HDFS to verify that the resource is still there. The only 
time I see the timestamp of the request compared with HDFS is when it tries 
to download the resource from HDFS.
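
As a rough illustration of the behavior described above, here is a minimal Java sketch. It is not the nodemanager's actual code: the class LocalResourceCacheSketch, its methods, the "path@timestamp" key, and the example path are all hypothetical, and the remoteStat function merely stands in for a getFileStatus call against HDFS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of a refcounted local resource cache; not Hadoop code. */
class LocalResourceCacheSketch {
    // Refcounts keyed by "path@timestamp" (an assumed key format for this sketch).
    private final Map<String, Integer> refCounts = new HashMap<>();
    // Stand-in for a remote getFileStatus call returning a modification time.
    private final ToLongFunction<String> remoteStat;
    // Number of times the remote file system was consulted.
    int statCalls = 0;

    LocalResourceCacheSketch(ToLongFunction<String> remoteStat) {
        this.remoteStat = remoteStat;
    }

    /** Returns true on a cache hit, false if the resource had to be fetched. */
    boolean acquire(String path, long requestTimestamp) {
        String key = path + "@" + requestTimestamp;
        Integer count = refCounts.get(key);
        if (count != null) {
            // Hit: only bump the refcount; the remote file system is not touched.
            refCounts.put(key, count + 1);
            return true;
        }
        // Miss: the "download" path is the only place the timestamp is checked.
        statCalls++;
        long actual = remoteStat.applyAsLong(path);
        if (actual != requestTimestamp) {
            throw new IllegalStateException("Resource changed on remote FS: " + path);
        }
        refCounts.put(key, 1);
        return false;
    }

    int refCount(String path, long requestTimestamp) {
        return refCounts.getOrDefault(path + "@" + requestTimestamp, 0);
    }

    public static void main(String[] args) {
        LocalResourceCacheSketch cache = new LocalResourceCacheSketch(p -> 42L);
        for (int i = 0; i < 3; i++) {
            cache.acquire("hdfs:///apps/lib.jar", 42L);
        }
        System.out.println("stat calls: " + cache.statCalls
                + ", refcount: " + cache.refCount("hdfs:///apps/lib.jar", 42L));
    }
}
```

In this sketch, three requests for the same resource on one node cause a single remote stat; the later two are served from the cache by incrementing the refcount.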

> Reduce number of getFileStatus call made from every 
> task(TaskDistributedCache) setup
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>            Reporter: Koji Noguchi
>
> On our cluster, we had jobs with 20 distributed cache entries and very 
> short-lived tasks. With 500 map tasks launched per second, this resulted in 
> 10,000 getFileStatus calls per second to the namenode (20 x 500). The 
> namenode can handle this, but we are asking whether it can be reduced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
