[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5969:

Labels: BB2015-05-TBR  (was: )

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-5969.branch1.1.patch, 
 MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 {code}
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 {code}
 The second time we add file size is at 
 setSize:
 {code}
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 {code}
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-10-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-
Attachment: MAPREDUCE-5969.branch1.1.patch

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.1.patch, 
 MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 {code}
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 {code}
 The second time we add file size is at 
 setSize:
 {code}
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 {code}
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: MAPREDUCE-5969.branch1.patch

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 {code}
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 {code}
 The second time we add file size is at 
 setSize:
 {code}
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 {code}
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: (was: MAPREDUCE-5969.branch1.patch)

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 {code}
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 {code}
 The second time we add file size is at 
 setSize:
 {code}
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 {code}
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Description: 
Private non-Archive Files' size add twice in Distributed Cache directory size 
calculation. Private non-Archive Files list is passed in by -files command 
line option. The Distributed Cache directory size is used to check whether the 
total cache files size exceed the cache size limitation,  the default cache 
size limitation is 10G.
I add log in addCacheInfoUpdate and setSize in 
TrackerDistributedCacheManager.java.
I use the following command to test:
hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar 
/tmp/zxu/test_in/ /tmp/zxu/test_out
to add two files into distributed cache:WordCount.java and wordcount.jar.
WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
bytes. The total should be 6260.
The log show these files size added twice:
add one time before download to local node and add second time after download 
to local node, so total file number becomes 4 instead of 2:
addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
In the code, for Private non-Archive File, the first time we add file size is 
at 
getLocalCache:
{code}
if (!isArchive) {
  //for private archives, the lengths come over RPC from the 
  //JobLocalizer since the JobLocalizer is the one who expands
  //archives and gets the total length
  lcacheStatus.size = fileStatus.getLen();

  LOG.info(getLocalCache: + localizedPath +  size = 
  + lcacheStatus.size);
  // Increase the size and sub directory count of the cache
  // from baseDirSize and baseDirNumberSubDir.
  baseDirManager.addCacheInfoUpdate(lcacheStatus);
}
{code}
The second time we add file size is at 
setSize:
{code}
  synchronized (status) {
status.size = size;
baseDirManager.addCacheInfoUpdate(status);
  }
{code}
The fix is not to add the file size for for Private non-Archive File after 
download(downloadCacheObject).


  was:
Private non-Archive Files' size add twice in Distributed Cache directory size 
calculation. Private non-Archive Files list is passed in by -files command 
line option. The Distributed Cache directory size is used to check whether the 
total cache files size exceed the cache size limitation,  the default cache 
size limitation is 10G.
I add log in addCacheInfoUpdate and setSize in 
TrackerDistributedCacheManager.java.
I use the following command to test:
hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar 
/tmp/zxu/test_in/ /tmp/zxu/test_out
to add two files into distributed cache:WordCount.java and wordcount.jar.
WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
bytes. The total should be 6260.
The log show these files size added twice:
add one time before download to local node and add second time after download 
to local node, so total file number becomes 4 instead of 2:
addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
In the code, for Private non-Archive File, the first time we add file size is 
at 
getLocalCache:
if (!isArchive) {
  //for private archives, the lengths come over RPC from the 
  //JobLocalizer since the JobLocalizer is the one who expands
  //archives and gets the total length
  lcacheStatus.size = fileStatus.getLen();

  LOG.info(getLocalCache: + localizedPath +  size = 
  + lcacheStatus.size);
  // Increase the size and sub directory count of the cache
  // from baseDirSize and baseDirNumberSubDir.
  baseDirManager.addCacheInfoUpdate(lcacheStatus);
}
The second time we add file size is at 
setSize:
  synchronized (status) {
status.size = size;
baseDirManager.addCacheInfoUpdate(status);
  }
The fix is not to add the file size for for Private non-Archive File after 
download(downloadCacheObject).



 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: 

[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: (was: MAPREDUCE-5969.branch1.patch)

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 The second time we add file size is at 
 setSize:
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: MAPREDUCE-5969.branch1.patch

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 The second time we add file size is at 
 setSize:
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Status: Patch Available  (was: Open)

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 The second time we add file size is at 
 setSize:
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2014-07-14 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: MAPREDUCE-5969.branch1.patch

 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size add twice in Distributed Cache directory size 
 calculation. Private non-Archive Files list is passed in by -files command 
 line option. The Distributed Cache directory size is used to check whether 
 the total cache files size exceed the cache size limitation,  the default 
 cache size limitation is 10G.
 I add log in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I use the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 to add two files into distributed cache:WordCount.java and wordcount.jar.
 WordCount.java file size is 2395 byes and wordcount.jar file size is 3865 
 bytes. The total should be 6260.
 The log show these files size added twice:
 add one time before download to local node and add second time after download 
 to local node, so total file number becomes 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for Private non-Archive File, the first time we add file size is 
 at 
 getLocalCache:
 if (!isArchive) {
   //for private archives, the lengths come over RPC from the 
   //JobLocalizer since the JobLocalizer is the one who expands
   //archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info(getLocalCache: + localizedPath +  size = 
   + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 The second time we add file size is at 
 setSize:
   synchronized (status) {
 status.size = size;
 baseDirManager.addCacheInfoUpdate(status);
   }
 The fix is not to add the file size for for Private non-Archive File after 
 download(downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)