[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062903#comment-14062903 ]

zhihai xu commented on MAPREDUCE-5969:
--------------------------------------

I updated the patch. The new patch removes the old file size before adding 
the new file size for private non-archive files.
With the new patch, the calculation works even if the size of the same file 
changes multiple times.
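
As a rough illustration of the idea (not the patch itself), the sketch below 
contrasts an update that always adds with one that removes the old 
contribution first. CacheSizeSketch, BaseDirTracker, CacheStatus, addNaive and 
addOrReplace are hypothetical names and are not taken from 
TrackerDistributedCacheManager.java. The file sizes are the ones from the 
issue description, and the second add is assumed to report the same sizes as 
the first.

```java
/** Hypothetical stand-in for per-base-directory accounting; not the Hadoop class. */
public class CacheSizeSketch {

    /** Minimal cache entry holding only what the accounting needs. */
    static final class CacheStatus {
        final String path;
        long size;        // size currently counted for this entry
        boolean counted;  // true once this entry is part of the totals

        CacheStatus(String path) {
            this.path = path;
        }
    }

    /** Tracks total bytes and entry count for one base directory. */
    static final class BaseDirTracker {
        private long totalSize;
        private long entryCount;

        /** Always adds, so calling it twice for one file double-counts it. */
        synchronized void addNaive(CacheStatus status, long newSize) {
            status.size = newSize;
            totalSize += newSize;
            entryCount++;
        }

        /** Removes the old contribution (if any) before adding the new one. */
        synchronized void addOrReplace(CacheStatus status, long newSize) {
            if (status.counted) {
                totalSize -= status.size;
                entryCount--;
            }
            status.size = newSize;
            status.counted = true;
            totalSize += newSize;
            entryCount++;
        }

        synchronized long totalSize() {
            return totalSize;
        }

        synchronized long entryCount() {
            return entryCount;
        }
    }

    public static void main(String[] args) {
        // Naive accounting: one add before download, one add after download.
        BaseDirTracker naive = new BaseDirTracker();
        CacheStatus javaFile = new CacheStatus("WordCount.java");
        CacheStatus jarFile = new CacheStatus("wordcount.jar");
        naive.addNaive(javaFile, 2395);
        naive.addNaive(jarFile, 3865);
        naive.addNaive(javaFile, 2395);
        naive.addNaive(jarFile, 3865);
        System.out.println("naive size: " + naive.totalSize()
            + " num: " + naive.entryCount());   // naive size: 12520 num: 4

        // Replace-style accounting: the second update does not inflate the totals.
        BaseDirTracker fixed = new BaseDirTracker();
        CacheStatus javaFile2 = new CacheStatus("WordCount.java");
        CacheStatus jarFile2 = new CacheStatus("wordcount.jar");
        fixed.addOrReplace(javaFile2, 2395);
        fixed.addOrReplace(jarFile2, 3865);
        fixed.addOrReplace(javaFile2, 2395);
        fixed.addOrReplace(jarFile2, 3865);
        System.out.println("fixed size: " + fixed.totalSize()
            + " num: " + fixed.entryCount());   // fixed size: 6260 num: 2
    }
}
```

In this sketch the replace-style update keeps the count at 2 however many 
times a size is reported for the same entry, which is the property the new 
patch is aiming for.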

> Private non-Archive Files' size add twice in Distributed Cache directory size 
> calculation.
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5969
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-5969.branch1.patch
>
>
> The size of private non-archive files is added twice in the Distributed Cache 
> directory size calculation. The list of private non-archive files is passed in 
> with the "-files" command line option. The Distributed Cache directory size is 
> used to check whether the total size of the cache files exceeds the cache size 
> limit; the default limit is 10G.
> I added logging to addCacheInfoUpdate and setSize in 
> TrackerDistributedCacheManager.java.
> I used the following command to test:
> hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
> hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
>  /tmp/zxu/test_in/ /tmp/zxu/test_out
> This adds two files to the distributed cache: WordCount.java and wordcount.jar.
> WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total 
> should be 6260.
> The log shows that the sizes of these files are added twice: 
> once before the download to the local node and a second time after the download, 
> so the total file count becomes 4 instead of 2:
> addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
> addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
> addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
> In the code, for a private non-archive file, the file size is first added in 
> getLocalCache:
>             if (!isArchive) {
>               //for private archives, the lengths come over RPC from the 
>               //JobLocalizer since the JobLocalizer is the one who expands
>               //archives and gets the total length
>               lcacheStatus.size = fileStatus.getLen();
>               LOG.info("getLocalCache:" + localizedPath + " size = "
>                   + lcacheStatus.size);
>               // Increase the size and sub directory count of the cache
>               // from baseDirSize and baseDirNumberSubDir.
>               baseDirManager.addCacheInfoUpdate(lcacheStatus);
>             }
> The file size is added a second time in 
> setSize:
>       synchronized (status) {
>         status.size = size;
>         baseDirManager.addCacheInfoUpdate(status);
>       }
> The fix is not to add the file size for a private non-archive file after the 
> download (downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
