[jira] [Commented] (MAPREDUCE-6005) native-task: fix some valgrind errors

2014-07-31 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080546#comment-14080546
 ] 

Sean Zhong commented on MAPREDUCE-6005:
---

{quote}
In the merger, all MergeEntryPtrs are owned by Merger::_entries and are deleted 
in ~Merger at the end, so they don't require additional care.
{quote}

You are right, +1.

 native-task: fix some valgrind errors 
 --

 Key: MAPREDUCE-6005
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6005
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: task
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: MAPREDUCE-6005.v1.patch, MAPREDUCE-6005.v2.patch, 
 MAPREDUCE-6005.v3.patch


 Running tests with valgrind shows there are some bugs; this jira tries to fix 
 them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6005) native-task: fix some valgrind errors

2014-07-31 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080567#comment-14080567
 ] 

Sean Zhong commented on MAPREDUCE-6005:
---

One more:

Can you also fix:
{code}
string StringUtil::ToString(int32_t v) {
  char tmp[32];
  snprintf(tmp, 32, "%d", v);
  return tmp;
}

string StringUtil::ToString(uint32_t v) {
  char tmp[32];
  snprintf(tmp, 32, "%u", v);
  return tmp;
}

string StringUtil::ToString(int64_t v) {
  char tmp[32];
  snprintf(tmp, 32, "%lld", (long long int)v);
  return tmp;
}

string StringUtil::ToString(int64_t v, char pad, int64_t len) {
  char tmp[32];
  snprintf(tmp, 32, "%%%c%lldlld", pad, len);
  return Format(tmp, v);
}

string StringUtil::ToString(uint64_t v) {
  char tmp[32];
  snprintf(tmp, 32, "%llu", (long long unsigned int)v);
  return tmp;
}

string StringUtil::ToString(bool v) {
  if (v) {
    return "true";
  } else {
    return "false";
  }
}

string StringUtil::ToString(float v) {
  char tmp[32];
  snprintf(tmp, 32, "%f", v);
  return tmp;
}

string StringUtil::ToString(double v) {
  char tmp[32];
  snprintf(tmp, 32, "%lf", v);
  return tmp;
}
{code}

1) It is not safe to convert a char array to a string like this: return tmp; 
invokes the string(const char*) constructor, which copies the buffer. But per 
http://www.cplusplus.com/reference/string/string/string/, 
{quote}
string (const char* s);
the string needs to be null-terminated.
{quote}

2) The behavior of snprintf(tmp, 32, "%lf", v) is platform-dependent when the 
formatted length of v reaches the size "32": it may truncate the raw data, or 
may omit the null terminator. Per http://linux.die.net/man/3/snprintf, 
{quote}
The functions snprintf() and vsnprintf() write at most size bytes (including 
the terminating null byte ('\0')) to str.
{quote}
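
For illustration, here is a minimal sketch of one defensive pattern (not the 
patch itself; the helper name ToStringSafe is made up): check snprintf's 
return value, which reports the length the output would need, and construct 
the std::string with an explicit length.

{code}
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch, not the actual patch. snprintf returns the number
// of characters that would have been written (excluding the trailing
// '\0'), so a return value >= the buffer size signals truncation.
std::string ToStringSafe(double v) {
  char tmp[32];
  int n = snprintf(tmp, sizeof(tmp), "%lf", v);
  if (n < 0) {
    return std::string();  // formatting error
  }
  if (static_cast<size_t>(n) >= sizeof(tmp)) {
    // Buffer was too small: allocate exactly what is needed and retry.
    std::vector<char> buf(n + 1);
    snprintf(buf.data(), buf.size(), "%lf", v);
    return std::string(buf.data(), n);
  }
  return std::string(tmp, n);  // length passed explicitly
}
{code}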

 native-task: fix some valgrind errors 
 --

 Key: MAPREDUCE-6005
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6005
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: task
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: MAPREDUCE-6005.v1.patch, MAPREDUCE-6005.v2.patch, 
 MAPREDUCE-6005.v3.patch


 Running tests with valgrind shows there are some bugs; this jira tries to fix 
 them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-6005) native-task: fix some valgrind errors

2014-07-31 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated MAPREDUCE-6005:
-

Attachment: MAPREDUCE-6005.v4.patch

Thanks for the comments, Sean. I changed toString(double) and toString(float) 
to use Format, which is safe. For the other fixed-width integer types, a buffer 
size of 32 should be sufficient.
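
As a sketch of why a Format-style conversion avoids truncation (the actual 
StringUtil::Format in the patch may be implemented differently; FormatSketch 
below is a made-up name), a vsnprintf-based helper can measure the required 
length first and then allocate:

{code}
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Illustrative sketch only; the real StringUtil::Format may differ.
// vsnprintf(nullptr, 0, ...) returns the length the output would need,
// so the final buffer can never be too small.
std::string FormatSketch(const char * fmt, ...) {
  va_list args1;
  va_start(args1, fmt);
  va_list args2;
  va_copy(args2, args1);  // each vsnprintf pass consumes its va_list
  int n = vsnprintf(nullptr, 0, fmt, args1);
  va_end(args1);
  if (n < 0) {
    va_end(args2);
    return std::string();
  }
  std::vector<char> buf(n + 1);
  vsnprintf(buf.data(), buf.size(), fmt, args2);
  va_end(args2);
  return std::string(buf.data(), n);
}

// toString(double) then reduces to: return FormatSketch("%lf", v);
{code}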


 native-task: fix some valgrind errors 
 --

 Key: MAPREDUCE-6005
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6005
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: task
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: MAPREDUCE-6005.v1.patch, MAPREDUCE-6005.v2.patch, 
 MAPREDUCE-6005.v3.patch, MAPREDUCE-6005.v4.patch


 Running tests with valgrind shows there are some bugs; this jira tries to fix 
 them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5968) Work directory is not deleted in DistCache if an Exception happens in downloadCacheObject.

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5968:
-

Attachment: (was: MAPREDUCE-5968.branch1.patch)

 Work directory is not deleted in DistCache if an Exception happens in 
 downloadCacheObject.
 ---

 Key: MAPREDUCE-5968
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5968
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.2.1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5968.branch1.patch


 The work directory is not deleted in DistCache if an Exception happens in 
 downloadCacheObject. In downloadCacheObject, the cache file is first copied 
 to a temporary work directory, and the work directory is then renamed to the 
 final directory. If an IOException happens during the copy, the work 
 directory is not deleted, leaving garbage data in the local disk cache. For 
 example, if an MR application uses the Distributed Cache to send a very 
 large archive/file (50G) and the disk fills up during the copy, an 
 IOException is triggered, the work directory is neither deleted nor renamed, 
 and it occupies a big chunk of disk space.
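
The usual remedy for this class of bug is to guarantee cleanup on the failure 
path. As an illustration of that pattern only (the real DistCache code is 
Java; all names here are hypothetical), a C++ scope guard that removes the 
work directory unless the rename to the final directory committed:

{code}
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical sketch: delete the temporary work directory on any exit
// path that did not publish it, so a failed copy cannot leak disk space.
class WorkDirGuard {
  fs::path dir_;
  bool committed_ = false;

 public:
  explicit WorkDirGuard(fs::path dir) : dir_(std::move(dir)) {}
  void Commit() { committed_ = true; }
  ~WorkDirGuard() {
    if (!committed_) {
      std::error_code ec;  // the destructor must not throw
      fs::remove_all(dir_, ec);
    }
  }
};

void downloadCacheObject(const fs::path & work, const fs::path & finalDir) {
  WorkDirGuard guard(work);
  // ... copy the cache file into `work`; this may throw on a full disk ...
  fs::rename(work, finalDir);  // publish the completed directory
  guard.Commit();              // success: keep it
}
{code}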



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5968) Work directory is not deleted in DistCache if an Exception happens in downloadCacheObject.

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5968:
-

Attachment: MAPREDUCE-5968.branch1.patch

 Work directory is not deleted in DistCache if an Exception happens in 
 downloadCacheObject.
 ---

 Key: MAPREDUCE-5968
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5968
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.2.1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5968.branch1.patch


 The work directory is not deleted in DistCache if an Exception happens in 
 downloadCacheObject. In downloadCacheObject, the cache file is first copied 
 to a temporary work directory, and the work directory is then renamed to the 
 final directory. If an IOException happens during the copy, the work 
 directory is not deleted, leaving garbage data in the local disk cache. For 
 example, if an MR application uses the Distributed Cache to send a very 
 large archive/file (50G) and the disk fills up during the copy, an 
 IOException is triggered, the work directory is neither deleted nor renamed, 
 and it occupies a big chunk of disk space.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6005) native-task: fix some valgrind errors

2014-07-31 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080651#comment-14080651
 ] 

Sean Zhong commented on MAPREDUCE-6005:
---

Thanks. +1

 native-task: fix some valgrind errors 
 --

 Key: MAPREDUCE-6005
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6005
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: task
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: MAPREDUCE-6005.v1.patch, MAPREDUCE-6005.v2.patch, 
 MAPREDUCE-6005.v3.patch, MAPREDUCE-6005.v4.patch


 Running tests with valgrind shows there are some bugs; this jira tries to fix 
 them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5968) Work directory is not deleted in DistCache if an Exception happens in downloadCacheObject.

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080670#comment-14080670
 ] 

Hadoop QA commented on MAPREDUCE-5968:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12658867/MAPREDUCE-5968.branch1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4780//console

This message is automatically generated.

 Work directory is not deleted in DistCache if an Exception happens in 
 downloadCacheObject.
 ---

 Key: MAPREDUCE-5968
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5968
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.2.1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5968.branch1.patch


 The work directory is not deleted in DistCache if an Exception happens in 
 downloadCacheObject. In downloadCacheObject, the cache file is first copied 
 to a temporary work directory, and the work directory is then renamed to the 
 final directory. If an IOException happens during the copy, the work 
 directory is not deleted, leaving garbage data in the local disk cache. For 
 example, if an MR application uses the Distributed Cache to send a very 
 large archive/file (50G) and the disk fills up during the copy, an 
 IOException is triggered, the work directory is neither deleted nor renamed, 
 and it occupies a big chunk of disk space.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-6017) hadoop yarn mapreduce skip failed records doesn't work

2014-07-31 Thread Jakub Stransky (JIRA)
Jakub Stransky created MAPREDUCE-6017:
-

 Summary: hadoop yarn mapreduce skip failed records doesn't work
 Key: MAPREDUCE-6017
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6017
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.2.0
Reporter: Jakub Stransky
Priority: Minor


I am trying to use the skip-failed-records map-reduce functionality during the 
map phase. I created a special testing file with 8 corrupted records. I am 
using TextInputFormat, and during processing of a record the map function 
fails with an unhandled exception (while parsing the record into the expected 
structure). The job is using the old mapred api.

My job settings for enabling the skip-failed-records feature:

<property>
  <name>mapred.skip.mode.enabled</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.maxattempts</name>
  <value>10</value>
</property>
<property>
  <name>mapreduce.task.skip.start.attempts</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.map.skip.maxrecords</name>
  <value>1</value>
</property>

I verified that those properties are propagated by checking job.xml. 
I am using hadoop 2.2.0 (HDP 2.0). The job is still failing after 10 attempts.

UPDATE:
- obviously the job is not entering skip-record mode

Q: Does this feature work at the RecordReader level only? Hadoop: The 
Definitive Guide (which covers v1) describes this feature at the level of the 
map/reduce function.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-6016) hadoop yarn mapreduce skip failed records doesn't work

2014-07-31 Thread Jakub Stransky (JIRA)
Jakub Stransky created MAPREDUCE-6016:
-

 Summary: hadoop yarn mapreduce skip failed records doesn't work
 Key: MAPREDUCE-6016
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6016
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.2.0
Reporter: Jakub Stransky
Priority: Minor


I am trying to use the skip-failed-records map-reduce functionality during the 
map phase. I created a special testing file with 8 corrupted records. I am 
using TextInputFormat, and during processing of a record the map function 
fails with an unhandled exception (while parsing the record into the expected 
structure). The job is using the old mapred api.

My job settings for enabling the skip-failed-records feature:

<property>
  <name>mapred.skip.mode.enabled</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.maxattempts</name>
  <value>10</value>
</property>
<property>
  <name>mapreduce.task.skip.start.attempts</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.map.skip.maxrecords</name>
  <value>1</value>
</property>

I verified that those properties are propagated by checking job.xml. 
I am using hadoop 2.2.0 (HDP 2.0). The job is still failing after 10 attempts.

UPDATE:
- obviously the job is not entering skip-record mode

Q: Does this feature work at the RecordReader level only? Hadoop: The 
Definitive Guide (which covers v1) describes this feature at the level of the 
map/reduce function.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MAPREDUCE-6016) hadoop yarn mapreduce skip failed records doesn't work

2014-07-31 Thread Jakub Stransky (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakub Stransky resolved MAPREDUCE-6016.
---

Resolution: Duplicate

Not sure what happened, but the issue report was duplicated as MAPREDUCE-6017.

 hadoop yarn mapreduce skip failed records doesn't work
 --

 Key: MAPREDUCE-6016
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6016
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.2.0
Reporter: Jakub Stransky
Priority: Minor

 I am trying to use the skip-failed-records map-reduce functionality during the 
 map phase. I created a special testing file with 8 corrupted records. I am 
 using TextInputFormat, and during processing of a record the map function 
 fails with an unhandled exception (while parsing the record into the expected 
 structure). The job is using the old mapred api.
 My job settings for enabling the skip-failed-records feature:
 <property>
   <name>mapred.skip.mode.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>mapreduce.map.maxattempts</name>
   <value>10</value>
 </property>
 <property>
   <name>mapreduce.task.skip.start.attempts</name>
   <value>1</value>
 </property>
 <property>
   <name>mapreduce.map.skip.maxrecords</name>
   <value>1</value>
 </property>
 I verified that those properties are propagated by checking job.xml. 
 I am using hadoop 2.2.0 (HDP 2.0). The job is still failing after 10 attempts.
 UPDATE:
 - obviously the job is not entering skip-record mode
 Q: Does this feature work at the RecordReader level only? Hadoop: The 
 Definitive Guide (which covers v1) describes this feature at the level of the 
 map/reduce function.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6015) Make MR ApplicationMaster disable loading user's jars first

2014-07-31 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081023#comment-14081023
 ] 

Sangjin Lee commented on MAPREDUCE-6015:


I understand the nature of the issue, but it also makes me wonder how effective 
the proposed solution would be. There are cases like uberized jobs where the 
proposed config probably cannot be used. Even in the case of non-uberized jobs, 
the MR app master loads several user classes and those may still need 
user.classpath.first for their operation.

I'm curious: have you used mapreduce.job.classloader? To me that should be a 
much more robust solution, including for issues in the MR app master (with the 
fix that was recently made in MAPREDUCE-5957).
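
For reference, and assuming the standard boolean form of that setting, 
enabling it in the job configuration would look like:

<property>
  <name>mapreduce.job.classloader</name>
  <value>true</value>
</property>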

 Make MR ApplicationMaster disable loading user's jars first 
 --

 Key: MAPREDUCE-6015
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6015
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Affects Versions: 2.4.1
Reporter: Bing Jiang

 In most cases, we want to use -Dmapreduce.user.classpath.first=true to pick 
 user's jars ahead of hadoop system's jars, so that tasks run against the 
 customized environment when the hadoop system's default library contains a 
 different version of a dependent jar.
 However, using -Dmapreduce.user.classpath.first=true can cause the 
 ApplicationMaster to fail to launch due to conflicting classes.
 In most cases, if users do not customize the ApplicationMaster for the 
 MapReduce framework, I believe we can treat MRAppMaster differently from 
 MapTask/ReduceTask with respect to loading user's jars in the classloader. 
 I believe a property such as '-Dmapreduce.am.user.classpath.first=false' 
 could be provided to disable loading user's jars first. 
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-6018) Create a framework specific config to enable timeline server

2014-07-31 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created MAPREDUCE-6018:
--

 Summary: Create a framework specific config to enable timeline 
server
 Key: MAPREDUCE-6018
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6018
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Jonathan Eagles






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-6019) MapReduce changes for exposing YARN/MR endpoints on multiple interfaces.

2014-07-31 Thread Xuan Gong (JIRA)
Xuan Gong created MAPREDUCE-6019:


 Summary: MapReduce changes for exposing YARN/MR endpoints on 
multiple interfaces.
 Key: MAPREDUCE-6019
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6019
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Xuan Gong
Assignee: Craig Welch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size is added twice in Distributed Cache directory size calculation.

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: MAPREDUCE-5969.branch1.patch

 Private non-Archive Files' size is added twice in Distributed Cache directory 
 size calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size is added twice in the Distributed Cache 
 directory size calculation. The Private non-Archive Files list is passed in 
 via the -files command line option. The Distributed Cache directory size is 
 used to check whether the total cache file size exceeds the cache size 
 limitation; the default limitation is 10G.
 I added logging in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I used the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 which adds two files into the distributed cache: WordCount.java and 
 wordcount.jar. WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, 
 so the total should be 6260.
 The log shows these file sizes are added twice: once before download to the 
 local node and a second time after download, so the total file number becomes 
 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for a Private non-Archive File, the first time we add the file 
 size is in getLocalCache:
 {code}
 if (!isArchive) {
   // for private archives, the lengths come over RPC from the
   // JobLocalizer since the JobLocalizer is the one who expands
   // archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info("getLocalCache: " + localizedPath + " size = "
       + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 {code}
 The second time is in setSize:
 {code}
 synchronized (status) {
   status.size = size;
   baseDirManager.addCacheInfoUpdate(status);
 }
 {code}
 The fix is to not add the file size for Private non-Archive Files after 
 download (downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size is added twice in Distributed Cache directory size calculation.

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5969:
-

Attachment: (was: MAPREDUCE-5969.branch1.patch)

 Private non-Archive Files' size is added twice in Distributed Cache directory 
 size calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size is added twice in the Distributed Cache 
 directory size calculation. The Private non-Archive Files list is passed in 
 via the -files command line option. The Distributed Cache directory size is 
 used to check whether the total cache file size exceeds the cache size 
 limitation; the default limitation is 10G.
 I added logging in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I used the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 which adds two files into the distributed cache: WordCount.java and 
 wordcount.jar. WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, 
 so the total should be 6260.
 The log shows these file sizes are added twice: once before download to the 
 local node and a second time after download, so the total file number becomes 
 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for a Private non-Archive File, the first time we add the file 
 size is in getLocalCache:
 {code}
 if (!isArchive) {
   // for private archives, the lengths come over RPC from the
   // JobLocalizer since the JobLocalizer is the one who expands
   // archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info("getLocalCache: " + localizedPath + " size = "
       + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 {code}
 The second time is in setSize:
 {code}
 synchronized (status) {
   status.size = size;
   baseDirManager.addCacheInfoUpdate(status);
 }
 {code}
 The fix is to not add the file size for Private non-Archive Files after 
 download (downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5969) Private non-Archive Files' size is added twice in Distributed Cache directory size calculation.

2014-07-31 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081388#comment-14081388
 ] 

zhihai xu commented on MAPREDUCE-5969:
--

I added a unit test, testFileNotDoubleCounted, to cover this error case. 
Without the fix, the test fails.

 Private non-Archive Files' size is added twice in Distributed Cache directory 
 size calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size is added twice in the Distributed Cache 
 directory size calculation. The Private non-Archive Files list is passed in 
 via the -files command line option. The Distributed Cache directory size is 
 used to check whether the total cache file size exceeds the cache size 
 limitation; the default limitation is 10G.
 I added logging in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I used the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 which adds two files into the distributed cache: WordCount.java and 
 wordcount.jar. WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, 
 so the total should be 6260.
 The log shows these file sizes are added twice: once before download to the 
 local node and a second time after download, so the total file number becomes 
 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for a Private non-Archive File, the first time we add the file 
 size is in getLocalCache:
 {code}
 if (!isArchive) {
   // for private archives, the lengths come over RPC from the
   // JobLocalizer since the JobLocalizer is the one who expands
   // archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info("getLocalCache: " + localizedPath + " size = "
       + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 {code}
 The second time is in setSize:
 {code}
 synchronized (status) {
   status.size = size;
   baseDirManager.addCacheInfoUpdate(status);
 }
 {code}
 The fix is to not add the file size for Private non-Archive Files after 
 download (downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MAPREDUCE-6019) MapReduce changes for exposing YARN/MR endpoints on multiple interfaces.

2014-07-31 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved MAPREDUCE-6019.
--

   Resolution: Fixed
Fix Version/s: 2.6.0

Committed to trunk and branch-2 with YARN-1994

 MapReduce changes for exposing YARN/MR endpoints on multiple interfaces.
 

 Key: MAPREDUCE-6019
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6019
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Xuan Gong
Assignee: Craig Welch
 Fix For: 2.6.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved

2014-07-31 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated MAPREDUCE-6007:


Attachment: MAPREDUCE-6007.001.patch

A patch is attached which adds a "d" option to distcp -p; "d" disables raw.* 
namespace xattr preservation. Refer to the document attached to HDFS-6509 for 
more details on the motivation for this.

I've also taken the liberty of creating a new DistCpTestUtils file to hold 
common methods for distcp tests.

Please have a look.

 Create a new option for distcp -p which causes raw.* namespace extended 
 attributes to not be preserved
 --

 Key: MAPREDUCE-6007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Affects Versions: fs-encryption
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: MAPREDUCE-6007.001.patch


 As part of the Data at Rest Encryption work (HDFS-6134), we need to create a 
 new option for distcp which causes raw.* namespace extended attributes to not 
 be preserved. See the doc in HDFS-6509 for details. The default for this 
 option will be to preserve raw.* xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before accessing the per-job counter

2014-07-31 Thread zhihai xu (JIRA)
zhihai xu created MAPREDUCE-6020:


 Summary: Too many threads blocking on the global JobTracker lock 
from getJobCounters; optimize getJobCounters to release the global JobTracker 
lock before accessing the per-job counter in JobInProgress
 Key: MAPREDUCE-6020
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.10
Reporter: zhihai xu
Assignee: zhihai xu


Too many threads block on the global JobTracker lock in getJobCounters; 
optimize getJobCounters to release the global JobTracker lock before accessing 
the per-job counter in JobInProgress. Many JobClients may call getJobCounters 
on the JobTracker at the same time, and the current code locks the JobTracker, 
blocking all threads that get counters from JobInProgress. It is better to 
release the JobTracker lock while getting counters from JobInProgress 
(job.getCounters(counters)), so all threads can run in parallel, each 
accessing its own job's counters.
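
To illustrate the proposed lock-narrowing (a sketch only: the actual 
JobTracker is Java, and all names below are hypothetical), hold the global 
lock just long enough to look up the per-job object, then gather the counters 
outside it:

{code}
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical sketch of the lock-narrowing idea, not JobTracker code.
struct Job {
  std::string name;
  // Stand-in for the potentially slow per-job counter aggregation.
  std::string collectCounters() const { return name + ": counters"; }
};

class Tracker {
  std::mutex globalLock_;  // plays the role of the JobTracker lock
  std::map<std::string, std::shared_ptr<Job>> jobs_;

 public:
  std::string getJobCounters(const std::string & jobId) {
    std::shared_ptr<Job> job;
    {
      // Hold the global lock only for the lookup...
      std::lock_guard<std::mutex> guard(globalLock_);
      auto it = jobs_.find(jobId);
      if (it == jobs_.end()) return std::string();
      job = it->second;
    }
    // ...and do the per-job work outside it, so callers for different
    // jobs no longer serialize on the global lock.
    return job->collectCounters();
  }
};
{code}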



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before accessing the per-job counter

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6020:
-

Status: Patch Available  (was: Open)

 Too many threads blocking on the global JobTracker lock from getJobCounters, 
 optimize getJobCounters to release the global JobTracker lock before 
 accessing the per-job counter in JobInProgress
 -

 Key: MAPREDUCE-6020
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.10
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6020.branch1.patch


 Too many threads block on the global JobTracker lock in getJobCounters; 
 optimize getJobCounters to release the global JobTracker lock before 
 accessing the per-job counter in JobInProgress. Many JobClients may call 
 getJobCounters on the JobTracker at the same time, and the current code locks 
 the JobTracker, blocking all threads that get counters from JobInProgress. It 
 is better to release the JobTracker lock while getting counters from 
 JobInProgress (job.getCounters(counters)), so all threads can run in 
 parallel, each accessing its own job's counters.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before accessing the per-job counter

2014-07-31 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6020:
-

Attachment: MAPREDUCE-6020.branch1.patch

 Too many threads blocking on the global JobTracker lock from getJobCounters, 
 optimize getJobCounters to release the global JobTracker lock before 
 accessing the per-job counter in JobInProgress
 -

 Key: MAPREDUCE-6020
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.10
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6020.branch1.patch


 Too many threads block on the global JobTracker lock in getJobCounters; 
 optimize getJobCounters to release the global JobTracker lock before 
 accessing the per-job counter in JobInProgress. Many JobClients may call 
 getJobCounters on the JobTracker at the same time, and the current code locks 
 the JobTracker, blocking all threads that get counters from JobInProgress. It 
 is better to release the JobTracker lock while getting counters from 
 JobInProgress (job.getCounters(counters)), so all threads can run in 
 parallel, each accessing its own job's counters.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before accessing the per-job counter

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081827#comment-14081827
 ] 

Hadoop QA commented on MAPREDUCE-6020:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12659032/MAPREDUCE-6020.branch1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4781//console

This message is automatically generated.

 Too many threads blocking on the global JobTracker lock from getJobCounters, 
 optimize getJobCounters to release the global JobTracker lock before 
 accessing the per-job counter in JobInProgress
 -

 Key: MAPREDUCE-6020
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.10
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6020.branch1.patch


 Too many threads block on the global JobTracker lock in getJobCounters; 
 optimize getJobCounters to release the global JobTracker lock before 
 accessing the per-job counter in JobInProgress. Many JobClients may call 
 getJobCounters on the JobTracker at the same time, and the current code locks 
 the JobTracker, blocking all threads that get counters from JobInProgress. It 
 is better to release the JobTracker lock while getting counters from 
 JobInProgress (job.getCounters(counters)), so all threads can run in 
 parallel, each accessing its own job's counters.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5969) Private non-Archive Files' size is added twice in Distributed Cache directory size calculation.

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081860#comment-14081860
 ] 

Hadoop QA commented on MAPREDUCE-5969:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12658993/MAPREDUCE-5969.branch1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4782//console

This message is automatically generated.

 Private non-Archive Files' size is added twice in Distributed Cache directory 
 size calculation.
 --

 Key: MAPREDUCE-5969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-5969.branch1.patch


 Private non-Archive Files' size is added twice in the Distributed Cache 
 directory size calculation. The Private non-Archive Files list is passed in 
 via the -files command line option. The Distributed Cache directory size is 
 used to check whether the total cache file size exceeds the cache size 
 limitation; the default limitation is 10G.
 I added logging in addCacheInfoUpdate and setSize in 
 TrackerDistributedCacheManager.java.
 I used the following command to test:
 hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
 hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
  /tmp/zxu/test_in/ /tmp/zxu/test_out
 which adds two files into the distributed cache: WordCount.java and 
 wordcount.jar. WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, 
 so the total should be 6260.
 The log shows these file sizes are added twice: once before download to the 
 local node and a second time after download, so the total file number becomes 
 4 instead of 2:
 addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
 addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
 addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
 In the code, for a Private non-Archive File, the first time we add the file 
 size is in getLocalCache:
 {code}
 if (!isArchive) {
   // for private archives, the lengths come over RPC from the
   // JobLocalizer since the JobLocalizer is the one who expands
   // archives and gets the total length
   lcacheStatus.size = fileStatus.getLen();
   LOG.info("getLocalCache: " + localizedPath + " size = "
       + lcacheStatus.size);
   // Increase the size and sub directory count of the cache
   // from baseDirSize and baseDirNumberSubDir.
   baseDirManager.addCacheInfoUpdate(lcacheStatus);
 }
 {code}
 The second time is in setSize:
 {code}
 synchronized (status) {
   status.size = size;
   baseDirManager.addCacheInfoUpdate(status);
 }
 {code}
 The fix is to not add the file size for Private non-Archive Files after 
 download (downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.2#6252)