[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5969:
----------------------------------------
    Labels: BB2015-05-TBR  (was: )

> Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5969
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch
>
>
> The sizes of private non-archive files are added twice in the Distributed Cache
> directory size calculation. The list of private non-archive files is passed in
> with the "-files" command-line option. The Distributed Cache directory size is
> used to check whether the total size of the cached files exceeds the cache size
> limit; the default limit is 10G.
> I added logging to addCacheInfoUpdate and setSize in
> TrackerDistributedCacheManager.java.
> I used the following command to test:
> hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out
> This adds two files to the distributed cache: WordCount.java and wordcount.jar.
> WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total
> should be 6260.
> The log shows that the size of each file is added twice, once before the
> download to the local node and once after it, so the total file count becomes
> 4 instead of 2:
> addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
> addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
> addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
> In the code, for a private non-archive file, the size is first added in
> getLocalCache:
> {code}
> if (!isArchive) {
>   //for private archives, the lengths come over RPC from the
>   //JobLocalizer since the JobLocalizer is the one who expands
>   //archives and gets the total length
>   lcacheStatus.size = fileStatus.getLen();
>   LOG.info("getLocalCache:" + localizedPath + " size = "
>       + lcacheStatus.size);
>   // Increase the size and sub directory count of the cache
>   // from baseDirSize and baseDirNumberSubDir.
>   baseDirManager.addCacheInfoUpdate(lcacheStatus);
> }
> {code}
> The size is added a second time in setSize:
> {code}
> synchronized (status) {
>   status.size = size;
>   baseDirManager.addCacheInfoUpdate(status);
> }
> {code}
> The fix is not to add the file size for a private non-archive file after
> download (downloadCacheObject).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
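The double counting described in the report can be reproduced with a small stand-alone model. The sketch below is not Hadoop code: the class `CacheSizeAccountingSketch`, the `BaseDirTotals` type, and the `localize` method are hypothetical names chosen for illustration, and only the numbers come from the report. The post-download increments (2423 and 3905) are read off the quoted log by subtraction (8683 - 6260 and 12588 - 8683); they are slightly larger than the HDFS lengths, and the report does not say why.

```java
/** Toy model of the per-directory accounting in the report; not Hadoop code. */
class CacheSizeAccountingSketch {

    /** Hypothetical stand-in for the base-directory bookkeeping. */
    static final class BaseDirTotals {
        long size; // running total of bytes charged to the directory
        int num;   // running count of cache entries charged

        // Mirrors the role of addCacheInfoUpdate: every call charges one more entry.
        void addCacheInfoUpdate(long entrySize) {
            size += entrySize;
            num++;
        }
    }

    /**
     * Replays the accounting for a set of private non-archive files.
     * hdfsLens holds the lengths charged before download; localLens holds
     * the sizes reported after download. When chargeAfterDownload is false,
     * the second charge is skipped, which is the fix the report proposes.
     */
    static BaseDirTotals localize(long[] hdfsLens, long[] localLens,
                                  boolean chargeAfterDownload) {
        BaseDirTotals totals = new BaseDirTotals();
        // First charge: before download, from the file status length.
        for (long len : hdfsLens) {
            totals.addCacheInfoUpdate(len);
        }
        // Second charge: after download, from the size passed to setSize.
        if (chargeAfterDownload) {
            for (long len : localLens) {
                totals.addCacheInfoUpdate(len);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        long[] hdfsLens = {2395, 3865};                  // sizes given in the report
        long[] localLens = {8683 - 6260, 12588 - 8683};  // increments seen in the log

        BaseDirTotals buggy = localize(hdfsLens, localLens, true);
        BaseDirTotals fixed = localize(hdfsLens, localLens, false);

        System.out.println("charged twice: size=" + buggy.size + " num=" + buggy.num);
        System.out.println("charged once:  size=" + fixed.size + " num=" + fixed.num);
    }
}
```

With the second charge enabled the model ends at the same totals as the last log line (size 12588, num 4); with it disabled the totals stay at 6260 bytes and 2 entries, which is what the report expects.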
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------
    Attachment: MAPREDUCE-5969.branch1.1.patch
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------
    Attachment:     (was: MAPREDUCE-5969.branch1.patch)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------
    Attachment: MAPREDUCE-5969.branch1.patch
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------
    Description: the getLocalCache and setSize excerpts are now wrapped in {code} markup; the description text is otherwise identical to the previous version.
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------
    Attachment: MAPREDUCE-5969.branch1.patch
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------
    Attachment:     (was: MAPREDUCE-5969.branch1.patch)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------
    Attachment: MAPREDUCE-5969.branch1.patch
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------
    Status: Patch Available  (was: Open)