[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer updated MAPREDUCE-5969:
----------------------------------------
    Labels: BB2015-05-TBR  (was: )

> Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5969
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch
>
>
> The size of private non-archive files is added twice in the Distributed Cache
> directory size calculation. Private non-archive files are the ones passed in
> with the "-files" command-line option. The Distributed Cache directory size is
> used to check whether the total size of the cache files exceeds the cache size
> limit; the default limit is 10G.
> I added logging in addCacheInfoUpdate and setSize in
> TrackerDistributedCacheManager.java and used the following command to test:
> hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out
> This adds two files to the distributed cache: WordCount.java and wordcount.jar.
> WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total
> should be 6260 bytes.
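As a quick check of the expected figure, the following Java sketch counts each of the two file lengths once and compares the result with the cache limit. It is a hypothetical illustration, not Hadoop code: the class and method names are invented, "10G" is assumed to mean 10 GiB (10737418240 bytes), and the limit is assumed to be exceeded only when the accounted size is strictly greater than it.

```java
// Hypothetical sketch, not Hadoop code: expected per-base-directory total for
// the two "-files" entries in this report, counting each file exactly once.
public final class ExpectedCacheSize {
    // Assumption: the report's "10G" default limit is read as 10 GiB.
    static final long DEFAULT_CACHE_LIMIT_BYTES = 10L * 1024 * 1024 * 1024;

    /** Sums file lengths, adding each file's length exactly once. */
    static long expectedTotal(long... fileLengths) {
        long total = 0;
        for (long len : fileLengths) {
            total += len;
        }
        return total;
    }

    /** Assumed check: true only when the accounted size is strictly over the limit. */
    static boolean exceedsLimit(long accountedBytes, long limitBytes) {
        return accountedBytes > limitBytes;
    }

    public static void main(String[] args) {
        long[] lengths = {2395, 3865}; // WordCount.java, wordcount.jar (bytes)
        long total = expectedTotal(lengths);
        System.out.println("expected size: " + total + " bytes, files: " + lengths.length);
        // prints "expected size: 6260 bytes, files: 2"
        System.out.println("over limit: " + exceedsLimit(total, DEFAULT_CACHE_LIMIT_BYTES));
        // prints "over limit: false"
    }
}
```

The accounted directory size should match this single-count total; the log that follows shows it does not.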
> The log shows that the sizes of these files are added twice: once before they
> are downloaded to the local node and a second time after the download, so the
> file count becomes 4 instead of 2:
> addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
> addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
> addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
> In the code, for a private non-archive file, the size is first added in
> getLocalCache:
> {code}
> if (!isArchive) {
>   //for private archives, the lengths come over RPC from the
>   //JobLocalizer since the JobLocalizer is the one who expands
>   //archives and gets the total length
>   lcacheStatus.size = fileStatus.getLen();
>   LOG.info("getLocalCache:" + localizedPath + " size = "
>       + lcacheStatus.size);
>   // Increase the size and sub directory count of the cache
>   // from baseDirSize and baseDirNumberSubDir.
>   baseDirManager.addCacheInfoUpdate(lcacheStatus);
> }
> {code}
> The size is added a second time in setSize:
> {code}
> synchronized (status) {
>   status.size = size;
>   baseDirManager.addCacheInfoUpdate(status);
> }
> {code}
> The fix is to not add the size of a private non-archive file again after the
> download (downloadCacheObject).
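The double counting and the proposed fix can be modeled in a few lines of Java. This is a hypothetical, simplified sketch, not TrackerDistributedCacheManager or the attached patch: `BaseDirTotals`, `beforeDownload`, `afterDownload`, `run` and the `fixed` flag are invented names. `beforeDownload` stands in for the getLocalCache path and `afterDownload` for setSize, and each update is assumed to add the entry's length and increment the count by one, as the logged `num` values suggest.

```java
// Hypothetical, simplified model of the accounting described in MAPREDUCE-5969.
// It is NOT the TrackerDistributedCacheManager code or the attached patch.
public final class PrivateFileAccounting {

    /** Stand-in for the size and entry count kept per base directory. */
    static final class BaseDirTotals {
        long size;
        int entries;

        void addCacheInfoUpdate(long bytes) {
            size += bytes;
            entries += 1;
        }
    }

    /** getLocalCache stand-in: non-archive files are counted before download. */
    static void beforeDownload(BaseDirTotals totals, long length, boolean isArchive) {
        if (!isArchive) {
            totals.addCacheInfoUpdate(length);
        }
    }

    /**
     * setSize stand-in, called after download. Archives are counted here because
     * their expanded size is only known after download. Without the fix,
     * non-archive files are counted a second time as well.
     */
    static void afterDownload(BaseDirTotals totals, long length, boolean isArchive, boolean fixed) {
        if (isArchive || !fixed) {
            totals.addCacheInfoUpdate(length);
        }
    }

    /** Localizes private non-archive files: count, "download", then setSize. */
    static BaseDirTotals run(boolean fixed, long... lengths) {
        BaseDirTotals totals = new BaseDirTotals();
        for (long len : lengths) {
            beforeDownload(totals, len, false);
        }
        // ... the files would be downloaded here (downloadCacheObject) ...
        for (long len : lengths) {
            afterDownload(totals, len, false, fixed);
        }
        return totals;
    }

    public static void main(String[] args) {
        BaseDirTotals buggy = run(false, 2395, 3865);
        BaseDirTotals fixed = run(true, 2395, 3865);
        System.out.println("without fix: size=" + buggy.size + " num=" + buggy.entries);
        // prints "without fix: size=12520 num=4"
        System.out.println("with fix: size=" + fixed.size + " num=" + fixed.entries);
        // prints "with fix: size=6260 num=2"
    }
}
```

The model doubles the HDFS lengths exactly (12520 bytes), whereas the logged total is 12588 because the second-phase increments in the log were 2423 and 3905 bytes; this sketch does not model that difference. Either way, the property the fix restores is that each private non-archive file contributes once to both the size and the count.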