[jira] [Commented] (MAPREDUCE-6005) native-task: fix some valgrind errors
[ https://issues.apache.org/jira/browse/MAPREDUCE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080546#comment-14080546 ] Sean Zhong commented on MAPREDUCE-6005: --- {quote} In the merger, every MergeEntryPtr is owned by Merger::_entries and is deleted in ~Merger at the end, so it doesn't require additional care. {quote} You are right, +1. native-task: fix some valgrind errors -- Key: MAPREDUCE-6005 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6005 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Binglin Chang Assignee: Binglin Chang Attachments: MAPREDUCE-6005.v1.patch, MAPREDUCE-6005.v2.patch, MAPREDUCE-6005.v3.patch Running the tests under valgrind reveals some bugs; this JIRA tries to fix them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6005) native-task: fix some valgrind errors
[ https://issues.apache.org/jira/browse/MAPREDUCE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080567#comment-14080567 ] Sean Zhong commented on MAPREDUCE-6005: --- One more: can you also fix
{quote}
string StringUtil::ToString(int32_t v) { char tmp[32]; snprintf(tmp, 32, "%d", v); return tmp; }
string StringUtil::ToString(uint32_t v) { char tmp[32]; snprintf(tmp, 32, "%u", v); return tmp; }
string StringUtil::ToString(int64_t v) { char tmp[32]; snprintf(tmp, 32, "%lld", (long long int)v); return tmp; }
string StringUtil::ToString(int64_t v, char pad, int64_t len) { char tmp[32]; snprintf(tmp, 32, "%%%c%lldlld", pad, len); return Format(tmp, v); }
string StringUtil::ToString(uint64_t v) { char tmp[32]; snprintf(tmp, 32, "%llu", (long long unsigned int)v); return tmp; }
string StringUtil::ToString(bool v) { if (v) { return "true"; } else { return "false"; } }
string StringUtil::ToString(float v) { char tmp[32]; snprintf(tmp, 32, "%f", v); return tmp; }
string StringUtil::ToString(double v) { char tmp[32]; snprintf(tmp, 32, "%lf", v); return tmp; }
{quote}
1) It is not safe to convert a char array to a string like this. It triggers the string constructor that takes a const char*, and according to http://www.cplusplus.com/reference/string/string/string/,
{quote} string (const char* s); the string needs to be null-terminated. {quote}
2) The behavior of snprintf(tmp, 32, "%lf", v) is platform dependent when the formatted length of v reaches the size 32. It may truncate the raw data, or it may omit the null terminator. From http://linux.die.net/man/3/snprintf,
{quote} The functions snprintf() and vsnprintf() write at most size bytes (including the terminating null byte ('\0')) to str.
{quote}
[jira] [Updated] (MAPREDUCE-6005) native-task: fix some valgrind errors
[ https://issues.apache.org/jira/browse/MAPREDUCE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated MAPREDUCE-6005: - Attachment: MAPREDUCE-6005.v4.patch Thanks for the comments, Sean. I changed ToString(double) and ToString(float) to use Format, which is safe. For the other fixed-width integer types, a buffer size of 32 should be sufficient.
[jira] [Updated] (MAPREDUCE-5968) Work directory is not deleted in DistCache if Exception happen in downloadCacheObject.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5968: - Attachment: (was: MAPREDUCE-5968.branch1.patch) Work directory is not deleted in DistCache if Exception happen in downloadCacheObject. --- Key: MAPREDUCE-5968 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5968 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5968.branch1.patch The work directory is not deleted in DistCache if an exception happens in downloadCacheObject. In downloadCacheObject, the cache file is first copied to a temporary work directory, and the work directory is then renamed to the final directory. If an IOException happens during the copy, the work directory is not deleted, which leaves garbage data in the local disk cache. For example, if an MR application uses the Distributed Cache to send a very large archive or file (50G) and the disk fills up during the copy, an IOException is triggered, the work directory is neither deleted nor renamed, and it keeps occupying a large chunk of disk space. -- This message was sent by Atlassian JIRA (v6.2#6252)
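The failure mode described in this report (copy into a work directory, rename on success, nothing cleaning up on failure) can be illustrated with a small standalone sketch. This is not the attached patch and does not use Hadoop's file system API; it assumes plain java.nio.file, and the names WorkDirCleanup, CopyStep, and localize are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WorkDirCleanup {
    /** Hypothetical stand-in for the step that copies the cache file. */
    interface CopyStep {
        void copyInto(Path workDir) throws IOException;
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Copies into workDir, renames to finalDir, and removes workDir on any failure. */
    static Path localize(Path workDir, Path finalDir, CopyStep copy) throws IOException {
        boolean renamed = false;
        try {
            Files.createDirectories(workDir);
            copy.copyInto(workDir);
            Files.move(workDir, finalDir);
            renamed = true;
            return finalDir;
        } finally {
            if (!renamed) {
                // Without this cleanup the partial copy would stay on disk.
                deleteRecursively(workDir);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("distcache-demo");
        Path work = base.resolve("work");
        Path dest = base.resolve("final");
        try {
            localize(work, dest, dir -> {
                Files.write(dir.resolve("partial.bin"), new byte[16]);
                throw new IOException("simulated: disk full");
            });
        } catch (IOException expected) {
            System.out.println("copy failed: " + expected.getMessage());
        }
        System.out.println("work dir still exists: " + Files.exists(work));
        deleteRecursively(base);
    }
}
```

Running main simulates a disk-full error halfway through the copy; the finally block removes the partial work directory, so the last line reports that it no longer exists. A production version would probably also guard against the cleanup itself throwing and masking the original exception.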
[jira] [Updated] (MAPREDUCE-5968) Work directory is not deleted in DistCache if Exception happen in downloadCacheObject.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5968: - Attachment: MAPREDUCE-5968.branch1.patch
[jira] [Commented] (MAPREDUCE-6005) native-task: fix some valgrind errors
[ https://issues.apache.org/jira/browse/MAPREDUCE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080651#comment-14080651 ] Sean Zhong commented on MAPREDUCE-6005: --- Thanks. +1
[jira] [Commented] (MAPREDUCE-5968) Work directory is not deleted in DistCache if Exception happen in downloadCacheObject.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080670#comment-14080670 ] Hadoop QA commented on MAPREDUCE-5968: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658867/MAPREDUCE-5968.branch1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4780//console This message is automatically generated.
[jira] [Created] (MAPREDUCE-6017) hadoop yarn mapreduce skip failed records doesn't work
Jakub Stransky created MAPREDUCE-6017: - Summary: hadoop yarn mapreduce skip failed records doesn't work Key: MAPREDUCE-6017 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6017 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.2.0 Reporter: Jakub Stransky Priority: Minor
I am trying to use the skip-failed-records MapReduce functionality during the map phase. I created a special test file with 8 corrupted records. I am using TextInputFormat, and during the processing of a record the map function fails with an unhandled exception (raised while parsing the record into the expected structure). The job uses the old mapred API. My job settings for enabling the skip-failed-records feature:
<property> <name>mapred.skip.mode.enabled</name> <value>true</value> </property>
<property> <name>mapreduce.map.maxattempts</name> <value>10</value> </property>
<property> <name>mapreduce.task.skip.start.attempts</name> <value>1</value> </property>
<property> <name>mapreduce.map.skip.maxrecords</name> <value>1</value> </property>
I verified in job.xml that these properties are propagated. I am using Hadoop 2.2.0 (HDP 2.0). The job still fails after 10 attempts. UPDATE: the job is evidently not entering skip-record mode. Q: Does this feature work at the RecordReader level only? Hadoop: The Definitive Guide (which covers v1) describes this feature at the level of the map/reduce function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6016) hadoop yarn mapreduce skip failed records doesn't work
Jakub Stransky created MAPREDUCE-6016: - Summary: hadoop yarn mapreduce skip failed records doesn't work Key: MAPREDUCE-6016 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6016 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.2.0 Reporter: Jakub Stransky Priority: Minor
I am trying to use the skip-failed-records MapReduce functionality during the map phase. I created a special test file with 8 corrupted records. I am using TextInputFormat, and during the processing of a record the map function fails with an unhandled exception (raised while parsing the record into the expected structure). The job uses the old mapred API. My job settings for enabling the skip-failed-records feature:
<property> <name>mapred.skip.mode.enabled</name> <value>true</value> </property>
<property> <name>mapreduce.map.maxattempts</name> <value>10</value> </property>
<property> <name>mapreduce.task.skip.start.attempts</name> <value>1</value> </property>
<property> <name>mapreduce.map.skip.maxrecords</name> <value>1</value> </property>
I verified in job.xml that these properties are propagated. I am using Hadoop 2.2.0 (HDP 2.0). The job still fails after 10 attempts. UPDATE: the job is evidently not entering skip-record mode. Q: Does this feature work at the RecordReader level only? Hadoop: The Definitive Guide (which covers v1) describes this feature at the level of the map/reduce function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-6016) hadoop yarn mapreduce skip failed records doesn't work
[ https://issues.apache.org/jira/browse/MAPREDUCE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Stransky resolved MAPREDUCE-6016. --- Resolution: Duplicate Not sure what happened, but the issue report was duplicated; see MAPREDUCE-6017.
[jira] [Commented] (MAPREDUCE-6015) Make MR ApplicationMaster disable loading user's jars firstly
[ https://issues.apache.org/jira/browse/MAPREDUCE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081023#comment-14081023 ] Sangjin Lee commented on MAPREDUCE-6015: I understand the nature of the issue, but it also makes me wonder how effective the proposed solution would be. There are cases like uberized jobs where the proposed config probably cannot be used. Even in the case of non-uberized jobs, the MR app master loads several user classes and those may still need user.classpath.first for their operation. Curious, have you used mapreduce.job.classloader? To me that should be a much more robust solution, including for issues in the MR app master (with the fix that was recently made in MAPREDUCE-5957). Make MR ApplicationMaster disable loading user's jars firstly -- Key: MAPREDUCE-6015 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6015 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster Affects Versions: 2.4.1 Reporter: Bing Jiang In most cases, we want to use -Dmapreduce.user.classpath.first=true to pick up the user's jars ahead of the Hadoop system jars, so that tasks run in the customized environment when the Hadoop default libraries contain a different version of a dependency. However, using -Dmapreduce.user.classpath.first=true causes the ApplicationMaster to fail to launch because of conflicting classes. In most cases users do not customize the ApplicationMaster for the MapReduce framework, so I believe MRAppMaster can be treated differently from MapTask/ReduceTask when it comes to loading the user's jars in the classloader. A property such as '-Dmapreduce.am.user.classpath.first=false' could disable loading the user's jars first. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6018) Create a framework specific config to enable timeline server
Jonathan Eagles created MAPREDUCE-6018: -- Summary: Create a framework specific config to enable timeline server Key: MAPREDUCE-6018 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6018 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6019) MapReduce changes for exposing YARN/MR endpoints on multiple interfaces.
Xuan Gong created MAPREDUCE-6019: Summary: MapReduce changes for exposing YARN/MR endpoints on multiple interfaces. Key: MAPREDUCE-6019 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6019 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: Xuan Gong Assignee: Craig Welch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Attachment: MAPREDUCE-5969.branch1.patch Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5969.branch1.patch
The size of private non-archive files is added twice in the Distributed Cache directory size calculation. The list of private non-archive files is passed in with the -files command line option. The Distributed Cache directory size is used to check whether the total size of the cache files exceeds the cache size limit; the default limit is 10G. I added logging in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java, and used the following command to test:
hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out
This adds two files to the distributed cache: WordCount.java and wordcount.jar. WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total should be 6260.
The log shows that these file sizes are added twice, once before the download to the local node and once after it, so the total file count becomes 4 instead of 2:
addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
In the code, for a private non-archive file, the first time we add the file size is in getLocalCache:
{code}
if (!isArchive) {
  //for private archives, the lengths come over RPC from the
  //JobLocalizer since the JobLocalizer is the one who expands
  //archives and gets the total length
  lcacheStatus.size = fileStatus.getLen();
  LOG.info("getLocalCache: " + localizedPath + " size = " + lcacheStatus.size);
  // Increase the size and sub directory count of the cache
  // from baseDirSize and baseDirNumberSubDir.
  baseDirManager.addCacheInfoUpdate(lcacheStatus);
}
{code}
The second time we add the file size is in setSize:
{code}
synchronized (status) {
  status.size = size;
  baseDirManager.addCacheInfoUpdate(status);
}
{code}
The fix is to not add the file size again for a private non-archive file after the download (downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.2#6252)
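The double count can be reproduced in miniature with a standalone model. The sketch below is not the code from TrackerDistributedCacheManager or from the attached patch: CacheAccounting, Status, onLocalize, setSizeBuggy, and setSizeFixed are hypothetical names, and the model assumes the size reported after the download equals the original file length. The totals logged in the report (8683 and 12588) differ slightly from this simplified arithmetic.

```java
public class CacheAccounting {
    /** Hypothetical per-file status, loosely modelled on the cache status in the report. */
    static final class Status {
        long size;
        boolean counted;
    }

    long baseDirSize;
    int baseDirEntries;

    /** Adds one entry to the running directory totals. */
    void addCacheInfoUpdate(Status s) {
        baseDirSize += s.size;
        baseDirEntries++;
        s.counted = true;
    }

    /** First accounting point: before the download. */
    void onLocalize(Status s, long length) {
        s.size = length;
        addCacheInfoUpdate(s);
    }

    /** Second accounting point as described in the report: always adds again. */
    void setSizeBuggy(Status s, long size) {
        s.size = size;
        addCacheInfoUpdate(s);
    }

    /** Guarded variant: skips the update when the file was already counted. */
    void setSizeFixed(Status s, long size) {
        s.size = size;
        if (!s.counted) {
            addCacheInfoUpdate(s);
        }
    }

    public static void main(String[] args) {
        long[] lengths = {2395, 3865}; // WordCount.java and wordcount.jar

        CacheAccounting buggy = new CacheAccounting();
        CacheAccounting fixed = new CacheAccounting();
        for (long len : lengths) {
            Status a = new Status();
            buggy.onLocalize(a, len);
            buggy.setSizeBuggy(a, len);

            Status b = new Status();
            fixed.onLocalize(b, len);
            fixed.setSizeFixed(b, len);
        }
        // prints "buggy: size=12520 entries=4"
        System.out.println("buggy: size=" + buggy.baseDirSize + " entries=" + buggy.baseDirEntries);
        // prints "fixed: size=6260 entries=2"
        System.out.println("fixed: size=" + fixed.baseDirSize + " entries=" + fixed.baseDirEntries);
    }
}
```

The guard flag is only one way to express "do not add again after download"; the report itself describes the fix as skipping the second addition for private non-archive files.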
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5969: - Attachment: (was: MAPREDUCE-5969.branch1.patch)
[jira] [Commented] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081388#comment-14081388 ] zhihai xu commented on MAPREDUCE-5969: -- I added a unit test, testFileNotDoubleCounted, to cover this error case. Without the fix, the test fails.
[jira] [Resolved] (MAPREDUCE-6019) MapReduce changes for exposing YARN/MR endpoints on multiple interfaces.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved MAPREDUCE-6019. -- Resolution: Fixed Fix Version/s: 2.6.0 Committed to trunk and branch-2 with YARN-1994 MapReduce changes for exposing YARN/MR endpoints on multiple interfaces. Key: MAPREDUCE-6019 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6019 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: Xuan Gong Assignee: Craig Welch Fix For: 2.6.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated MAPREDUCE-6007: Attachment: MAPREDUCE-6007.001.patch A patch is attached which adds a d option to distcp -p. d is Disable raw.* namespace xattr preservation. Refer to the document attached to HDFS-6509 for more details on the motivation for this. I've also taken the liberty to create a new DistCpTestUtils file to hold common methods for distcp tests. Please have a look. Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved -- Key: MAPREDUCE-6007 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 Project: Hadoop Map/Reduce Issue Type: New Feature Components: distcp Affects Versions: fs-encryption Reporter: Charles Lamb Assignee: Charles Lamb Attachments: MAPREDUCE-6007.001.patch As part of the Data at Rest Encryption work (HDFS-6134), we need to create a new option for distcp which causes raw.* namespace extended attributes to not be preserved. See the doc in HDFS-6509 for details. The default for this option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job count
zhihai xu created MAPREDUCE-6020: Summary: Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in JobInProgress Key: MAPREDUCE-6020 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.10 Reporter: zhihai xu Assignee: zhihai xu Too many threads block on the global JobTracker lock in getJobCounters; getJobCounters should be optimized to release the global JobTracker lock before accessing the per-job counters in JobInProgress. Many JobClients may call getJobCounters on the JobTracker at the same time. The current code locks the JobTracker, which blocks all the threads that are fetching counters from JobInProgress. It is better to release the JobTracker lock while getting the counters from JobInProgress (job.getCounters(counters)), so that all the threads can run in parallel, each accessing its own job's counters. -- This message was sent by Atlassian JIRA (v6.2#6252)
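The locking change proposed here can be sketched with a toy model. The classes below are hypothetical stand-ins, not the real JobTracker or JobInProgress, and the counters are reduced to a single long; the sketch only contrasts holding the global lock for the whole call with holding it just long enough to look up the job.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterLookup {
    /** Hypothetical stand-in for a per-job object with its own lock. */
    static final class Job {
        private long records;

        synchronized void increment() {
            records++;
        }

        synchronized long getCounters() {
            return records;
        }
    }

    private final Map<String, Job> jobs = new HashMap<>();

    synchronized void addJob(String id, Job job) {
        jobs.put(id, job);
    }

    /** Coarse version: the global lock is held while the per-job counters are read. */
    synchronized long getJobCountersCoarse(String id) {
        Job job = jobs.get(id);
        return job == null ? -1 : job.getCounters();
    }

    /** Narrow version: the global lock covers only the map lookup. */
    long getJobCountersNarrow(String id) {
        Job job;
        synchronized (this) {
            job = jobs.get(id);
        }
        // The per-job read runs under the job's own lock only, so callers
        // asking about different jobs no longer serialize on the tracker.
        return job == null ? -1 : job.getCounters();
    }

    public static void main(String[] args) {
        CounterLookup tracker = new CounterLookup();
        Job job = new Job();
        tracker.addJob("job_1", job);
        job.increment();
        job.increment();
        job.increment();
        System.out.println(tracker.getJobCountersCoarse("job_1")); // prints 3
        System.out.println(tracker.getJobCountersNarrow("job_1")); // prints 3
    }
}
```

Both methods return the same value; the difference is only in how long the tracker-wide monitor is held. The narrow version relies on the per-job object being safe to read under its own lock, which is an assumption of this sketch.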
[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job count
[ https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6020: - Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job count
[ https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6020: - Attachment: MAPREDUCE-6020.branch1.patch
[jira] [Commented] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job cou
[ https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081827#comment-14081827 ] Hadoop QA commented on MAPREDUCE-6020: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659032/MAPREDUCE-6020.branch1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4781//console This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081860#comment-14081860 ]

Hadoop QA commented on MAPREDUCE-5969:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12658993/MAPREDUCE-5969.branch1.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4782//console

This message is automatically generated.

Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
--
Key: MAPREDUCE-5969
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
Attachments: MAPREDUCE-5969.branch1.patch

The size of private non-archive files is added twice in the Distributed Cache directory size calculation. The list of private non-archive files is passed in with the -files command-line option. The Distributed Cache directory size is used to check whether the total size of the cached files exceeds the cache size limit; the default limit is 10G.

I added logging to addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java and used the following command to test:

hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out

This adds two files to the distributed cache: WordCount.java and wordcount.jar. WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total should be 6260.
The log shows that the sizes of these files are added twice, once before the download to the local node and a second time after the download, so the total file count becomes 4 instead of 2:

addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local

In the code, for a private non-archive file, the file size is first added in getLocalCache:
{code}
if (!isArchive) {
  //for private archives, the lengths come over RPC from the
  //JobLocalizer since the JobLocalizer is the one who expands
  //archives and gets the total length
  lcacheStatus.size = fileStatus.getLen();
  LOG.info("getLocalCache: " + localizedPath + " size = " + lcacheStatus.size);
  // Increase the size and sub directory count of the cache
  // from baseDirSize and baseDirNumberSubDir.
  baseDirManager.addCacheInfoUpdate(lcacheStatus);
}
{code}
The file size is added a second time in setSize:
{code}
synchronized (status) {
  status.size = size;
  baseDirManager.addCacheInfoUpdate(status);
}
{code}
The fix is to not add the file size again for private non-archive files after the download (downloadCacheObject).

--
This message was sent by Atlassian JIRA (v6.2#6252)
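The double counting described in MAPREDUCE-5969 can be reproduced with a small model. This is a sketch under stated assumptions, not the TrackerDistributedCacheManager source and not the attached patch: the classes CacheStatusSketch, BaseDirSketch and DistributedCacheSketch are hypothetical, and the isArchive parameter on setSizeFixed is just one possible way to skip the second update for non-archive files.

```java
// Hypothetical model of the cache bookkeeping; names echo the issue text
// but none of this is the real Hadoop code.
class CacheStatusSketch {
    long size;
}

class BaseDirSketch {
    long totalSize;
    int count;

    // Adds one cache entry to the running directory totals.
    void addCacheInfoUpdate(CacheStatusSketch status) {
        totalSize += status.size;
        count++;
    }
}

class DistributedCacheSketch {
    final BaseDirSketch baseDir = new BaseDirSketch();

    // First accounting point: non-archive files are counted up front.
    CacheStatusSketch getLocalCache(long fileLen, boolean isArchive) {
        CacheStatusSketch status = new CacheStatusSketch();
        if (!isArchive) {
            status.size = fileLen;
            baseDir.addCacheInfoUpdate(status);
        }
        return status;
    }

    // Buggy second accounting point: counts every file again after download.
    void setSizeBuggy(CacheStatusSketch status, long size) {
        synchronized (status) {
            status.size = size;
            baseDir.addCacheInfoUpdate(status);
        }
    }

    // Assumed fix: only archives, which were not counted in
    // getLocalCache, are added to the totals here.
    void setSizeFixed(CacheStatusSketch status, long size, boolean isArchive) {
        synchronized (status) {
            status.size = size;
            if (isArchive) {
                baseDir.addCacheInfoUpdate(status);
            }
        }
    }
}
```

With the two files from the report (2395 and 3865 bytes), the buggy path in this model ends with four counted entries and the fixed path with two entries totalling 6260 bytes.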