[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5969:

Labels: BB2015-05-TBR  (was: )

> Private non-Archive Files' size add twice in Distributed Cache directory size 
> calculation.
> --
>
>                 Key: MAPREDUCE-5969
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch
>
>
> The size of private non-archive files is added twice in the Distributed Cache
> directory size calculation. The list of private non-archive files is passed in
> with the "-files" command-line option. The Distributed Cache directory size is
> used to check whether the total size of the cached files exceeds the cache
> size limit; the default limit is 10G.
> I added logging to addCacheInfoUpdate and setSize in
> TrackerDistributedCacheManager.java.
> I used the following command to test:
> hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
> hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
>  /tmp/zxu/test_in/ /tmp/zxu/test_out
> This adds two files to the distributed cache: WordCount.java and wordcount.jar.
> WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total
> should be 6260 bytes.
> The log shows that the file sizes are added twice, once before the download
> to the local node and once after it, so the total file count becomes 4
> instead of 2:
> addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
> addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
> addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
> In the code, for a private non-archive file, the file size is first added in
> getLocalCache:
> {code}
> if (!isArchive) {
>   //for private archives, the lengths come over RPC from the 
>   //JobLocalizer since the JobLocalizer is the one who expands
>   //archives and gets the total length
>   lcacheStatus.size = fileStatus.getLen();
>   LOG.info("getLocalCache:" + localizedPath + " size = "
>   + lcacheStatus.size);
>   // Increase the size and sub directory count of the cache
>   // from baseDirSize and baseDirNumberSubDir.
>   baseDirManager.addCacheInfoUpdate(lcacheStatus);
> }
> {code}
> The file size is added a second time in setSize:
> {code}
>   synchronized (status) {
>     status.size = size;
>     baseDirManager.addCacheInfoUpdate(status);
>   }
> {code}
> The fix is to not add the file size for private non-archive files after the
> download (downloadCacheObject).
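
The double counting described above can be illustrated with a small, self-contained sketch. It is not the Hadoop code: CacheStatus and BaseDirManager here are simplified stand-ins, the `guarded` flag and the `run` helper are hypothetical, and the guard is only one way the proposed fix might look. The sketch also assumes the same length is reported before and after the download, so its totals need not match the log above byte for byte.

```java
// A simplified, hypothetical model of the accounting described in the report.
// These are NOT the real Hadoop classes; only the names getLocalCache,
// setSize and addCacheInfoUpdate are borrowed from the issue text.
class DistributedCacheSizeSketch {

  /** Stand-in for a cache entry's status. */
  static final class CacheStatus {
    final boolean isArchive;
    long size;

    CacheStatus(boolean isArchive) {
      this.isArchive = isArchive;
    }
  }

  /** Stand-in for the per-directory bookkeeping. */
  static final class BaseDirManager {
    long size; // accumulated bytes
    int num;   // accumulated entry count

    void addCacheInfoUpdate(CacheStatus status) {
      size += status.size;
      num++;
    }
  }

  /** First add: before the download, for non-archive files only. */
  static void getLocalCache(BaseDirManager mgr, CacheStatus status, long len) {
    if (!status.isArchive) {
      status.size = len;
      mgr.addCacheInfoUpdate(status);
    }
  }

  /** Second add: after the download. The guard is the assumed fix. */
  static void setSize(BaseDirManager mgr, CacheStatus status, long size,
                      boolean guarded) {
    synchronized (status) {
      status.size = size;
      // Unguarded: always add again. Guarded: add only for archives,
      // because non-archive files were already counted in getLocalCache.
      if (!guarded || status.isArchive) {
        mgr.addCacheInfoUpdate(status);
      }
    }
  }

  /** Runs both steps for each file length and returns the bookkeeping. */
  static BaseDirManager run(long[] lengths, boolean guarded) {
    BaseDirManager mgr = new BaseDirManager();
    for (long len : lengths) {
      CacheStatus status = new CacheStatus(false); // private non-archive file
      getLocalCache(mgr, status, len);
      setSize(mgr, status, len, guarded);
    }
    return mgr;
  }

  public static void main(String[] args) {
    long[] lengths = {2395L, 3865L}; // file sizes quoted in the report
    BaseDirManager unguarded = run(lengths, false);
    BaseDirManager guarded = run(lengths, true);
    System.out.println("unguarded: size=" + unguarded.size + " num=" + unguarded.num);
    System.out.println("guarded:   size=" + guarded.size + " num=" + guarded.num);
  }
}
```

With the two file sizes from the report, the unguarded path counts four entries and the guarded path counts two, with a total of 6260 bytes.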



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-10-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-
Attachment: MAPREDUCE-5969.branch1.1.patch



[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: (was: MAPREDUCE-5969.branch1.patch)



[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: MAPREDUCE-5969.branch1.patch



[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Description: 
The size of private non-archive files is added twice in the Distributed Cache
directory size calculation. The list of private non-archive files is passed in
with the "-files" command-line option. The Distributed Cache directory size is
used to check whether the total size of the cached files exceeds the cache
size limit; the default limit is 10G.
I added logging to addCacheInfoUpdate and setSize in
TrackerDistributedCacheManager.java.
I used the following command to test:
hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar 
/tmp/zxu/test_in/ /tmp/zxu/test_out
This adds two files to the distributed cache: WordCount.java and wordcount.jar.
WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total
should be 6260 bytes.
The log shows that the file sizes are added twice, once before the download
to the local node and once after it, so the total file count becomes 4
instead of 2:
addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
In the code, for a private non-archive file, the file size is first added in
getLocalCache:
{code}
if (!isArchive) {
  //for private archives, the lengths come over RPC from the 
  //JobLocalizer since the JobLocalizer is the one who expands
  //archives and gets the total length
  lcacheStatus.size = fileStatus.getLen();

  LOG.info("getLocalCache:" + localizedPath + " size = "
      + lcacheStatus.size);
  // Increase the size and sub directory count of the cache
  // from baseDirSize and baseDirNumberSubDir.
  baseDirManager.addCacheInfoUpdate(lcacheStatus);
}
{code}
The file size is added a second time in setSize:
{code}
  synchronized (status) {
    status.size = size;
    baseDirManager.addCacheInfoUpdate(status);
  }
{code}
The fix is to not add the file size for private non-archive files after the
download (downloadCacheObject).


  was:
The size of private non-archive files is added twice in the Distributed Cache
directory size calculation. The list of private non-archive files is passed in
with the "-files" command-line option. The Distributed Cache directory size is
used to check whether the total size of the cached files exceeds the cache
size limit; the default limit is 10G.
I added logging to addCacheInfoUpdate and setSize in
TrackerDistributedCacheManager.java.
I used the following command to test:
hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar 
/tmp/zxu/test_in/ /tmp/zxu/test_out
This adds two files to the distributed cache: WordCount.java and wordcount.jar.
WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total
should be 6260 bytes.
The log shows that the file sizes are added twice, once before the download
to the local node and once after it, so the total file count becomes 4
instead of 2:
addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
In the code, for a private non-archive file, the file size is first added in
getLocalCache:
if (!isArchive) {
  //for private archives, the lengths come over RPC from the 
  //JobLocalizer since the JobLocalizer is the one who expands
  //archives and gets the total length
  lcacheStatus.size = fileStatus.getLen();

  LOG.info("getLocalCache:" + localizedPath + " size = "
      + lcacheStatus.size);
  // Increase the size and sub directory count of the cache
  // from baseDirSize and baseDirNumberSubDir.
  baseDirManager.addCacheInfoUpdate(lcacheStatus);
}
The file size is added a second time in setSize:
  synchronized (status) {
    status.size = size;
    baseDirManager.addCacheInfoUpdate(status);
  }
The fix is to not add the file size for private non-archive files after the
download (downloadCacheObject).




[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: MAPREDUCE-5969.branch1.patch



[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: (was: MAPREDUCE-5969.branch1.patch)



[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: MAPREDUCE-5969.branch1.patch



[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Status: Patch Available  (was: Open)
