[ https://issues.apache.org/jira/browse/MAPREDUCE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne resolved MAPREDUCE-2011.
-----------------------------------
    Resolution: Won't Fix

[~knoguchi], here are [~jlowe]'s comments from an offline discussion:

I think the distributed cache already behaves the way you desire, at least in YARN. When a resource request arrives at the nodemanager, it tries to look up the local resource info based on that request. If it finds it (i.e., a hit in the cache), then it just increments the refcount of the resource. I don't see any attempt to stat HDFS to verify it's still there in HDFS. The only time I see the timestamp of the request compared with HDFS is when it tries to download the resource from HDFS.

> Reduce number of getFileStatus call made from every
> task(TaskDistributedCache) setup
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>            Reporter: Koji Noguchi
>
> On our cluster, we had jobs with 20 dist cache and very short-lived tasks
> resulting in 500 map tasks launched per second resulting in 10,000
> getFileStatus calls to the namenode. Namenode can handle this but asking to
> see if we can reduce this somehow.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
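
For readers who want the cache-hit behavior described in the resolution comment spelled out, here is a minimal sketch in Java. It is not the NodeManager's code: the class `LocalResourceCacheSketch`, the `RemoteStore` interface, and the choice to key entries on path plus timestamp are all hypothetical, made up for illustration. It only models the two claims in the comment: a hit bumps a reference count without touching the remote filesystem, and the timestamp is compared only on the download path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a node-local resource cache; names are illustrative only.
class LocalResourceCacheSketch {

    /** Stand-in for the remote filesystem (assumption: one metadata call per lookup). */
    interface RemoteStore {
        long modificationTime(String path);
    }

    /** One cached resource with its reference count. */
    static final class Entry {
        final long timestamp;
        int refCount;

        Entry(long timestamp) {
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final RemoteStore store;

    LocalResourceCacheSketch(RemoteStore store) {
        this.store = store;
    }

    /** Handles one resource request and returns the resulting reference count. */
    int request(String path, long expectedTimestamp) {
        // Assumption: the request identity is path plus timestamp.
        String key = path + "@" + expectedTimestamp;
        Entry entry = entries.get(key);
        if (entry == null) {
            // Cache miss: this download path is the only place the remote store is consulted.
            long actual = store.modificationTime(path);
            if (actual != expectedTimestamp) {
                throw new IllegalStateException("Resource " + path + " changed on remote store");
            }
            entry = new Entry(actual);
            entries.put(key, entry);
        }
        // Cache hit (or fresh download): just increment the refcount, no remote call.
        entry.refCount++;
        return entry.refCount;
    }
}
```

Under this model, the 20 cache files per task from the issue description would cost one remote metadata call each on the first request at a node, and none on later requests for the same entries.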