[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5969: Labels: BB2015-05-TBR (was: ) Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Labels: BB2015-05-TBR Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: {code} if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } {code} The second time we add file size is at setSize: {code} synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } {code} The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Attachment: MAPREDUCE-5969.branch1.1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: {code} if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } {code} The second time we add file size is at setSize: {code} synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } {code} The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Attachment: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: {code} if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } {code} The second time we add file size is at setSize: {code} synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } {code} The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Attachment: (was: MAPREDUCE-5969.branch1.patch) Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: {code} if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } {code} The second time we add file size is at setSize: {code} synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } {code} The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Description: Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: {code} if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } {code} The second time we add file size is at setSize: {code} synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } {code} The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). was: Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } The second time we add file size is at setSize: synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter:
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Attachment: (was: MAPREDUCE-5969.branch1.patch) Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } The second time we add file size is at setSize: synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Attachment: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } The second time we add file size is at setSize: synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Status: Patch Available (was: Open) Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } The second time we add file size is at setSize: synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Attachment: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. Private non-Archive Files list is passed in by -files command line option. The Distributed Cache directory size is used to check whether the total cache files size exceed the cache size limitation, the default cache size limitation is 10G. I add log in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java. I use the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out to add two files into distributed cache:WordCount.java and wordcount.jar. WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 bytes. The total should be 6260. The log show these files size added twice: add one time before download to local node and add second time after download to local node, so total file number becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for Private non-Archive File, the first time we add file size is at getLocalCache: if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info(getLocalCache: + localizedPath + size = + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } The second time we add file size is at setSize: synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } The fix is not to add the file size for for Private non-Archive File after download(downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.2#6252)