[jira] [Created] (MAPREDUCE-6193) Hadoop 2.x MapReduce Job Counter Data Local Maps Lower than Hadoop 1.x
Xu Chen created MAPREDUCE-6193: -- Summary: Hadoop 2.x MapReduce Job Counter Data Local Maps Lower than Hadoop 1.x Key: MAPREDUCE-6193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6193 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.4.0 Reporter: Xu Chen Assignee: Xu Chen I ran MapReduce jobs on a 400+ node cluster based on Hadoop 2.4.0, with the FairScheduler as the configured scheduler, and noticed that the Data-Local job counter is much lower than under Hadoop 1.x. For example: Hadoop 1.x: Data-Local 99%, Rack-Local 1%; Hadoop 2.4.0: Data-Local 75%, Rack-Local 25%. So I looked at the source code of the Hadoop 2.4.0 MRAppMaster and the YARN FairScheduler, and there are some situations that may lead to this kind of problem. The MRAppMaster builds a map of Priority -> ResourceName -> Capability -> RemoteRequest -> NumContainers, and too many containers are assigned to the MRAppMaster by the FairScheduler. The MRAppMaster's addContainerReq() and assignContainer() change NumContainers, which is sent to the FairScheduler in a RemoteRequest; the FairScheduler resets its value of NumContainers from the MRAppMaster's heartbeat, but it also sets NumContainers itself when handling NODE_UPDATE events. So if the NumContainers in the MRAppMaster's next heartbeat is bigger than the FairScheduler's NumContainers, the extra containers are redundant for the MRAppMaster, and the MRAppMaster will assign them Rack-Local because no task needs data on those containers' hosts anymore. Besides, when one task requests more than one host, it can also cause this problem. 
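The heartbeat race described above can be sketched with a toy model; the class and field names below are simplified stand-ins for illustration, not the real MRAppMaster or FairScheduler types:

```java
// Toy model of the race the report describes: the AM decrements its container
// demand locally as containers are assigned, but the scheduler only learns the
// new demand at the next heartbeat. In the window between the two, the
// scheduler can hand out containers the AM no longer needs, and those surplus
// containers end up placed Rack-Local.
class NumContainerRaceSketch {
    int amDemand;        // AM-side NumContainers for one (priority, host)
    int schedulerView;   // scheduler's copy, refreshed only on heartbeat

    NumContainerRaceSketch(int initialDemand) {
        amDemand = initialDemand;
        schedulerView = initialDemand;
    }

    // AM satisfies a task with an assigned container, lowering real demand.
    void amAssignsContainer() { amDemand--; }

    // Heartbeat resynchronizes the scheduler with the AM's current demand.
    void heartbeat() { schedulerView = amDemand; }

    // Containers the scheduler may still allocate beyond real demand.
    int surplusBeforeHeartbeat() { return schedulerView - amDemand; }
}
```

Any allocation the scheduler makes while surplusBeforeHeartbeat() is positive is, from the AM's point of view, a container for which no data-local task remains.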
So the conclusion is that the FairScheduler's NumContainers is reset both by the MRAppMaster's heartbeat and by NODE_UPDATE event handling, and the two are asynchronous. Among the FairScheduler's configuration properties I found yarn.scheduler.fair.locality.threshold.node and yarn.scheduler.fair.locality.threshold.rack, and I think the FairScheduler's assignContainer() should invoke app.addSchedulingOpportunity(priority) after the NODE_LOCAL assignment logic, but currently it is the opposite: the scheduling-opportunity count is incremented every time the application gets a chance to be assigned a container, so by the time the application actually misses a NODE_LOCAL node, the opportunity count is usually already greater than locality.threshold.node, which makes those properties useless for me. Additionally, if the AppMaster sends no request at RMNode.ANY for a given priority, the scheduler will get an NPE and the ResourceManager will exit immediately; see: {code} public synchronized int getTotalRequiredResources(Priority priority) { return getResourceRequest(priority, RMNode.ANY).getNumContainers(); } {code} If anyone has ideas on these issues, please comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
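The NPE above is easy to reproduce against a simplified stand-in for the scheduler's per-application request table; the null-guarded variant shown alongside is an assumption about how it could be fixed, not the actual Hadoop patch:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the scheduler's per-application request table.
class AppSchedulingSketch {
    static final String ANY = "*"; // plays the role of RMNode.ANY

    // priority -> resourceName -> numContainers
    private final Map<Integer, Map<String, Integer>> requests = new HashMap<>();

    void addRequest(int priority, String resourceName, int numContainers) {
        requests.computeIfAbsent(priority, p -> new HashMap<>())
                .put(resourceName, numContainers);
    }

    // Mirrors the snippet in the report: throws NullPointerException when the
    // AM never sent a request at ANY for this priority (either the priority
    // map lookup or the Integer unboxing fails).
    int getTotalRequiredResourcesUnsafe(int priority) {
        return requests.get(priority).get(ANY);
    }

    // Guarded variant: treat a missing ANY request as zero required containers
    // instead of crashing the ResourceManager.
    int getTotalRequiredResources(int priority) {
        Map<String, Integer> byName = requests.get(priority);
        if (byName == null) {
            return 0;
        }
        Integer any = byName.get(ANY);
        return any == null ? 0 : any;
    }
}
```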
[jira] [Commented] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245131#comment-14245131 ] Hadoop QA commented on MAPREDUCE-4879: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687005/MAPREDUCE-4879.004.patch against trunk revision 0e37bbc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-examples. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5078//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5078//console This message is automatically generated. 
> TeraOutputFormat may overwrite an existing output directory > --- > > Key: MAPREDUCE-4879 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 1.2.1, 2.6.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: MAPREDUCE-4879-trunk-rev1.patch, > MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch, MAPREDUCE-4879.004.patch > > > Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs > from writing into an existing directory, and potentially overwriting previous > runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-4879: - Attachment: MAPREDUCE-4879.004.patch Fixing javac warnings in 004. > TeraOutputFormat may overwrite an existing output directory > --- > > Key: MAPREDUCE-4879 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 1.2.1, 2.6.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: MAPREDUCE-4879-trunk-rev1.patch, > MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch, MAPREDUCE-4879.004.patch > > > Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs > from writing into an existing directory, and potentially overwriting previous > runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6133) Rumen is not generating json for .hist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Kestelyn updated MAPREDUCE-6133: --- Assignee: (was: Justin Kestelyn) > Rumen is not generating json for .hist file > --- > > Key: MAPREDUCE-6133 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6133 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tools/rumen >Affects Versions: 2.4.1 >Reporter: ARPAN BHANDARI > Attachments: > job_1413468876912_0002-1413563166476-impadmin-word+count-1413563185844-1-1-SUCCEEDED-default-1413563171610.jhist > > > Rumen is creating a blank json for the .hist file. Please find below the command that > is being run: > java -cp > hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar:hadoop-2.4.1/share/hadoop/tools/lib/hadoop-rumen-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-cli-1.2.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-configuration-1.6.jar:hadoop-2.4.1/share/hadoop/common/lib/commons-lang-2.6.jar:hadoop-2.4.1/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:hadoop-2.4.1/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:hadoop-2.4.1/share/hadoop/tools/lib/guava-11.0.2.jar:hadoop-2.4.1/share/hadoop/tools/lib/commons-collections-3.2.1.jar:hadoop-2.4.1/share/hadoop/common/lib/hadoop-auth-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/slf4j-api-1.7.5.jar:hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:hadoop-2.4.1/share/hadoop/common/lib/log4j-1.2.17.jar > org.apache.hadoop.tools.rumen.TraceBuilder file://job-trace.json > file://topology > file://job_1413468876912_0002-1413563166476-impadmin-word+count-1413563185844-1-1-SUCCEEDED-default-1413563171610.jhist -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244708#comment-14244708 ] Hadoop QA commented on MAPREDUCE-4879: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686922/MAPREDUCE-4879.003.patch against trunk revision 3681de2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1222 javac compiler warnings (more than the trunk's current 1221 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-examples. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5077//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5077//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5077//console This message is automatically generated. 
> TeraOutputFormat may overwrite an existing output directory > --- > > Key: MAPREDUCE-4879 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 1.2.1, 2.6.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: MAPREDUCE-4879-trunk-rev1.patch, > MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch > > > Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs > from writing into an existing directory, and potentially overwriting previous > runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml
Ray Chiang created MAPREDUCE-6192: - Summary: Create unit test to automatically compare MR related classes and mapred-default.xml Key: MAPREDUCE-6192 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Create a unit test that will automatically compare the fields in the various MapReduce related classes and mapred-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
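One way such a test might work is to collect the String constants of a config class by reflection and diff them against the <name> entries of mapred-default.xml. This is a hypothetical sketch, not the eventual Hadoop test; DemoConfig and its property names are illustrative stand-ins for classes like MRJobConfig:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfigFieldChecker {

    // Tiny stand-in for an MR configuration class such as MRJobConfig.
    static class DemoConfig {
        static final String MAP_MEMORY_MB = "mapreduce.map.memory.mb";
        static final String REDUCE_MEMORY_MB = "mapreduce.reduce.memory.mb";
    }

    // Collect every static String constant declared in the class.
    static Set<String> classProperties(Class<?> clazz) {
        Set<String> props = new HashSet<>();
        for (Field f : clazz.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
                try {
                    f.setAccessible(true);
                    props.add((String) f.get(null));
                } catch (IllegalAccessException ignored) {
                    // Skip fields the runtime will not let us read.
                }
            }
        }
        return props;
    }

    // Collect every <name> entry from the default-config XML. A regex keeps
    // the sketch dependency-free; a real test would parse with Configuration.
    static Set<String> xmlProperties(String xml) {
        Set<String> props = new HashSet<>();
        Matcher m = Pattern.compile("<name>([^<]+)</name>").matcher(xml);
        while (m.find()) {
            props.add(m.group(1));
        }
        return props;
    }

    // Properties present in `left` but missing from `right`; run it both ways
    // to flag omissions on either side, as the ticket asks.
    static Set<String> diff(Set<String> left, Set<String> right) {
        Set<String> missing = new HashSet<>(left);
        missing.removeAll(right);
        return missing;
    }
}
```

A JUnit wrapper would then fail the build whenever either diff is non-empty.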
[jira] [Updated] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-4879: - Target Version/s: 2.7.0 Affects Version/s: (was: trunk) 1.2.1 2.6.0 Status: Patch Available (was: Open) > TeraOutputFormat may overwrite an existing output directory > --- > > Key: MAPREDUCE-4879 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 2.6.0, 1.2.1 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: MAPREDUCE-4879-trunk-rev1.patch, > MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch > > > Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs > from writing into an existing directory, and potentially overwriting previous > runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-4879: - Attachment: MAPREDUCE-4879.003.patch Thanks for picking up the review [~cnauroth]! Please check MAPREDUCE-4879.003.patch > TeraOutputFormat may overwrite an existing output directory > --- > > Key: MAPREDUCE-4879 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: trunk >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: MAPREDUCE-4879-trunk-rev1.patch, > MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879.003.patch > > > Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs > from writing into an existing directory, and potentially overwriting previous > runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6149) Document override log4j.properties in MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244428#comment-14244428 ] Hadoop QA commented on MAPREDUCE-6149: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686862/MAPREDUCE-6149.patch against trunk revision bda748a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5076//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5076//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5076//console This message is automatically generated. 
> Document override log4j.properties in MR job > > > Key: MAPREDUCE-6149 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6149 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: documentation >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-6149.patch > > > This new feature comes from MAPREDUCE-6052, some documentation requirements > from Vinod below: > Document the new config in mapred-default.xml > Mention in that documentation that if no-scheme is given in the path, it > defaults to a log4j file on the local FS. > Modify the documentation of log-level configs to say that if you override > to have your own log4j.properties file, the log-level configs may not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6149) Document override log4j.properties in MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6149: -- Status: Patch Available (was: Open) > Document override log4j.properties in MR job > > > Key: MAPREDUCE-6149 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6149 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: documentation >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-6149.patch > > > This new feature comes from MAPREDUCE-6052, some documentation requirements > from Vinod below: > Document the new config in mapred-default.xml > Mention in that documentation that if no-scheme is given in the path, it > defaults to a log4j file on the local FS. > Modify the documentation of log-level configs to say that if you override > to have your own log4j.properties file, the log-level configs may not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6149) Document override log4j.properties in MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6149: -- Attachment: MAPREDUCE-6149.patch > Document override log4j.properties in MR job > > > Key: MAPREDUCE-6149 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6149 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: documentation >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-6149.patch > > > This new feature comes from MAPREDUCE-6052, some documentation requirements > from Vinod below: > Document the new config in mapred-default.xml > Mention in that documentation that if no-scheme is given in the path, it > defaults to a log4j file on the local FS. > Modify the documentation of log-level configs to say that if you override > to have your own log4j.properties file, the log-level configs may not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244275#comment-14244275 ] Hudson commented on MAPREDUCE-6046: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1990 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1990/]) MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 0bd022911013629a8c9e7357fae8cf4399d7a1e3) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java * hadoop-mapreduce-project/CHANGES.txt > Change the class name for logs in RMCommunicator.java > - > > Key: MAPREDUCE-6046 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6046 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am >Affects Versions: 3.0.0 >Reporter: Devaraj K >Assignee: Sahil Takiar >Priority: Minor > Labels: newbie > Fix For: 2.7.0 > > Attachments: MAPREDUCE-6046-01.patch > > > It is little confusing when the logs gets generated with the class name as > RMContainerAllocator and not present in RMContainerAllocator.java. > {code:title=RMCommunicator.java|borderStyle=solid} > private static final Log LOG = > LogFactory.getLog(RMContainerAllocator.class); > {code} > In the above RMContainerAllocator.class needs to be changed to > RMCommunicator.class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
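The one-word fix is easy to see in isolation. The snippet below uses java.util.logging in place of commons-logging (an assumption made to keep the sketch dependency-free), but the point is the same: the logger's name comes from whatever class literal is passed to the factory, so a copy-pasted literal stamps every message with the wrong source class:

```java
import java.util.logging.Logger;

class RMCommunicatorDemo {
    // Wrong: inside RMCommunicator.java this names the logger after another
    // class, so its log lines appear to come from RMContainerAllocator.
    static final Logger WRONG = Logger.getLogger("RMContainerAllocator");

    // Right: name the logger after the enclosing class itself.
    static final Logger LOG = Logger.getLogger(RMCommunicatorDemo.class.getName());
}
```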
[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244248#comment-14244248 ] Hudson commented on MAPREDUCE-6046: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #40 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/40/]) MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 0bd022911013629a8c9e7357fae8cf4399d7a1e3) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java > Change the class name for logs in RMCommunicator.java > - > > Key: MAPREDUCE-6046 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6046 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am >Affects Versions: 3.0.0 >Reporter: Devaraj K >Assignee: Sahil Takiar >Priority: Minor > Labels: newbie > Fix For: 2.7.0 > > Attachments: MAPREDUCE-6046-01.patch > > > It is little confusing when the logs gets generated with the class name as > RMContainerAllocator and not present in RMContainerAllocator.java. > {code:title=RMCommunicator.java|borderStyle=solid} > private static final Log LOG = > LogFactory.getLog(RMContainerAllocator.class); > {code} > In the above RMContainerAllocator.class needs to be changed to > RMCommunicator.class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244205#comment-14244205 ] Hudson commented on MAPREDUCE-6046: --- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #36 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/36/]) MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 0bd022911013629a8c9e7357fae8cf4399d7a1e3) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java > Change the class name for logs in RMCommunicator.java > - > > Key: MAPREDUCE-6046 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6046 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am >Affects Versions: 3.0.0 >Reporter: Devaraj K >Assignee: Sahil Takiar >Priority: Minor > Labels: newbie > Fix For: 2.7.0 > > Attachments: MAPREDUCE-6046-01.patch > > > It is little confusing when the logs gets generated with the class name as > RMContainerAllocator and not present in RMContainerAllocator.java. > {code:title=RMCommunicator.java|borderStyle=solid} > private static final Log LOG = > LogFactory.getLog(RMContainerAllocator.class); > {code} > In the above RMContainerAllocator.class needs to be changed to > RMCommunicator.class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244196#comment-14244196 ] Hudson commented on MAPREDUCE-6046: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1970 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1970/]) MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 0bd022911013629a8c9e7357fae8cf4399d7a1e3) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java > Change the class name for logs in RMCommunicator.java > - > > Key: MAPREDUCE-6046 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6046 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am >Affects Versions: 3.0.0 >Reporter: Devaraj K >Assignee: Sahil Takiar >Priority: Minor > Labels: newbie > Fix For: 2.7.0 > > Attachments: MAPREDUCE-6046-01.patch > > > It is little confusing when the logs gets generated with the class name as > RMContainerAllocator and not present in RMContainerAllocator.java. > {code:title=RMCommunicator.java|borderStyle=solid} > private static final Log LOG = > LogFactory.getLog(RMContainerAllocator.class); > {code} > In the above RMContainerAllocator.class needs to be changed to > RMCommunicator.class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5767) Data corruption when single value exceeds map buffer size (io.sort.mb)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244169#comment-14244169 ] Ben Roling commented on MAPREDUCE-5767: --- Thanks for the feedback [~tomdeleu] -- I'm glad you found the post helpful. It was quite tedious to get to the root cause when we encountered the issue, and I figured there would probably be others who would run into it. > Data corruption when single value exceeds map buffer size (io.sort.mb) > -- > > Key: MAPREDUCE-5767 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5767 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Affects Versions: 0.20.1 >Reporter: Ben Roling > > There is an issue in org.apache.hadoop.mapred.MapTask in 0.20 that can cause > data corruption when the size of a single value produced by the mapper > exceeds the size of the map output buffer (roughly io.sort.mb). > I experienced this issue in CDH4.2.1, but am logging the issue here for > greater visibility in case anyone else might run across the issue. > The issue does not exist in 0.21 and beyond due to the implementation of > MAPREDUCE-64. That JIRA significantly changes the way the map output > buffering is done and it looks like the issue has been resolved by those > changes. > I expect this bug will likely be closed / won't fix due to the fact that 0.20 > is obsolete. As stated previously, I am just logging this issue for > visibility in case anyone else is still running something based on 0.20 and > encounters the same problem. > In my situation the issue manifested as an ArrayIndexOutOfBoundsException in > the reduce phase when deserializing a key -- causing the job to fail. > However, I think the problem could manifest in a more dangerous fashion where > the affected job succeeds, but produces corrupt output. 
The stack trace I > saw was: > 2014-02-13 01:07:34,690 WARN org.apache.hadoop.mapred.Child: Error running > child > java.lang.ArrayIndexOutOfBoundsException: 24 > at > org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364) > at > org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) > at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) > at > org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) > at > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135) > at > org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:86) > at > org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:70) > at > org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:135) > at > org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:114) > at > org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:291) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:163) > at > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444) > at org.apache.hadoop.mapred.Child$4.run(Child.java:268) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) > at org.apache.hadoop.mapred.Child.main(Child.java:262) > The problem appears to me to be in > org.apache.hadoop.mapred.MapTask.MapOutputBuffer.Buffer.write(byte[], int, > 
int). The sequence of events that leads up to the issue is: > * some complete records (cumulative size less than total buffer size) written > to buffer > * large (over io.sort.mb) record starts writing > * [soft buffer limit > exceeded|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1030] > - spill starts > * write of large record continues > * buffer becomes > [full|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1012] > * > [wrap|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1013] > evaluates to true, suggesting the buffer can be safely wrapped > * writing the large record continues until
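The wrap hazard in the sequence above can be modeled with a toy circular buffer, a deliberately simplified stand-in for MapOutputBuffer with none of its spill machinery: once a single record is at least as long as the whole buffer, the write position laps the record's own start and silently destroys bytes already written.

```java
// Toy model of the corruption: a circular buffer that wraps at the end.
// When one record is longer than the buffer can hold, the write position
// comes back around to the record's start, and further writes overwrite
// the record's own head -- the data corruption this issue reports.
class CircularBufferSketch {
    private final byte[] buf;
    private int pos = 0;
    private int recordStart = -1;
    private boolean wrappedOverRecordStart = false;

    CircularBufferSketch(int size) { buf = new byte[size]; }

    // Mark where the current record begins in the buffer.
    void startRecord() { recordStart = pos; wrappedOverRecordStart = false; }

    void write(byte b) {
        buf[pos] = b;
        pos = (pos + 1) % buf.length;
        if (pos == recordStart) {
            // The write position has lapped (or reached) the start of the
            // current record: from here on, already-written bytes are lost.
            wrappedOverRecordStart = true;
        }
    }

    boolean corrupted() { return wrappedOverRecordStart; }
}
```

In the real MapTask the wrap check mistakenly concludes the wrap is safe while a spill of the same record's earlier bytes is still pending, which is why the corruption surfaces only for values larger than roughly io.sort.mb.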
[jira] [Commented] (MAPREDUCE-5767) Data corruption when single value exceeds map buffer size (io.sort.mb)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244062#comment-14244062 ] Tom De Leu commented on MAPREDUCE-5767: --- Thanks a lot for this analysis! We had the exact same problem at work, running CDH3.5, with Crunch 0.8.4 and Avro 1.7.7. It was only after a couple of days spent trying to find the cause of our problem, without success, that I came across this issue via a lucky Google search. I can confirm that increasing *io.sort.mb* solved our problem. Thank you for saving us probably weeks of investigation :) > Data corruption when single value exceeds map buffer size (io.sort.mb) > -- > > Key: MAPREDUCE-5767 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5767 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Affects Versions: 0.20.1 >Reporter: Ben Roling > > There is an issue in org.apache.hadoop.mapred.MapTask in 0.20 that can cause > data corruption when the size of a single value produced by the mapper > exceeds the size of the map output buffer (roughly io.sort.mb). > I experienced this issue in CDH4.2.1, but am logging the issue here for > greater visibility in case anyone else might run across the issue. > The issue does not exist in 0.21 and beyond due to the implementation of > MAPREDUCE-64. That JIRA significantly changes the way the map output > buffering is done and it looks like the issue has been resolved by those > changes. > I expect this bug will likely be closed / won't fix due to the fact that 0.20 > is obsolete. As stated previously, I am just logging this issue for > visibility in case anyone else is still running something based on 0.20 and > encounters the same problem. > In my situation the issue manifested as an ArrayIndexOutOfBoundsException in > the reduce phase when deserializing a key -- causing the job to fail. 
> However, I think the problem could manifest in a more dangerous fashion where
> the affected job succeeds, but produces corrupt output. The stack trace I
> saw was:
>
> 2014-02-13 01:07:34,690 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.ArrayIndexOutOfBoundsException: 24
>         at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>         at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>         at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:173)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
>         at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:86)
>         at org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperDeserializer.deserialize(SafeAvroSerialization.java:70)
>         at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:135)
>         at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:114)
>         at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:291)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:163)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
> The problem appears to me to be in
> org.apache.hadoop.mapred.MapTask.MapOutputBuffer.Buffer.write(byte[], int, int).
> The sequence of events that leads up to the issue is:
> * some complete records (cumulative size less than total buffer size) written to buffer
> * large (over io.sort.mb) record starts writing
> * [soft buffer limit exceeded|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1030] - spill starts
> * write of large record continues
> * buffer becomes [full|https://github.com/apache/hadoop-common/blob/release-0.20.1/src/mapred/org/apache/hadoop/mapred/MapTask.java#L1012]
> * [wrap|https://github.com/apache/hadoop-common/blob/re
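The wrap-around failure in the sequence above can be illustrated with a tiny model (hypothetical names and sizes, not Hadoop's actual MapOutputBuffer code): a circular buffer whose writer indexes modulo the buffer length lets a single record larger than the buffer silently clobber its own earlier bytes, which is exactly the kind of corruption the Avro deserializer later trips over.

```java
import java.util.Arrays;

// Simplified, hypothetical model of a circular map-output buffer.
public class CircularBufferDemo {
    static byte[] writeRecord(byte[] buf, byte[] record) {
        // Naive wrap: index modulo buffer length, with no check against
        // data that has not been spilled yet -- mirrors the failure mode
        // described in the bullet list above.
        for (int i = 0; i < record.length; i++) {
            buf[i % buf.length] = record[i];
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];       // tiny stand-in for io.sort.mb
        byte[] record = new byte[12];   // a record larger than the buffer
        for (int i = 0; i < record.length; i++) record[i] = (byte) i;

        writeRecord(buf, record);

        // Bytes 0..3 of the record were overwritten by bytes 8..11, so a
        // later reader deserializes garbage.
        System.out.println(Arrays.toString(buf)); // [8, 9, 10, 11, 4, 5, 6, 7]
    }
}
```

This also matches the workaround reported in the comment above: raising io.sort.mb above the largest serialized record keeps the wrap from ever happening.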
[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244026#comment-14244026 ]

Hudson commented on MAPREDUCE-6046:
-----------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #773 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/773/])
MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 0bd022911013629a8c9e7357fae8cf4399d7a1e3)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
* hadoop-mapreduce-project/CHANGES.txt

> Change the class name for logs in RMCommunicator.java
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-6046
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6046
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 3.0.0
>            Reporter: Devaraj K
>            Assignee: Sahil Takiar
>            Priority: Minor
>              Labels: newbie
>             Fix For: 2.7.0
>
>         Attachments: MAPREDUCE-6046-01.patch
>
>
> It is a little confusing when logs get generated with the class name
> RMContainerAllocator even though they do not come from RMContainerAllocator.java.
> {code:title=RMCommunicator.java|borderStyle=solid}
> private static final Log LOG =
>     LogFactory.getLog(RMContainerAllocator.class);
> {code}
> In the above, RMContainerAllocator.class needs to be changed to
> RMCommunicator.class.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
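The committed fix is the one-line change quoted above (RMContainerAllocator.class becomes RMCommunicator.class). As an aside, one common way to make this class of copy-paste bug impossible, sketched here with java.util.logging rather than the commons-logging API the file actually uses, is to derive the logger name from the declaring class instead of hard-coding a class literal:

```java
import java.lang.invoke.MethodHandles;
import java.util.logging.Logger;

public class RMCommunicatorLogDemo {
    // MethodHandles.lookup() is evaluated in the class that contains this
    // line, so copy-pasting the declaration into another file automatically
    // picks up the new class name -- the stale-name mismatch fixed by
    // MAPREDUCE-6046 cannot occur.
    static final Logger LOG =
        Logger.getLogger(MethodHandles.lookup().lookupClass().getName());

    static String loggerName() {
        return LOG.getName();
    }

    public static void main(String[] args) {
        System.out.println(loggerName()); // prints "RMCommunicatorLogDemo"
    }
}
```

The trade-off is a slightly less obvious declaration; a hard-coded class literal is fine as long as reviews catch copy-paste drift, which is exactly what this JIRA did.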
[jira] [Commented] (MAPREDUCE-6046) Change the class name for logs in RMCommunicator.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243970#comment-14243970 ]

Hudson commented on MAPREDUCE-6046:
-----------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #38 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/38/])
MAPREDUCE-6046. Change the class name for logs in RMCommunicator. (devaraj: rev 0bd022911013629a8c9e7357fae8cf4399d7a1e3)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java

> Change the class name for logs in RMCommunicator.java
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-6046
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6046
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 3.0.0
>            Reporter: Devaraj K
>            Assignee: Sahil Takiar
>            Priority: Minor
>              Labels: newbie
>             Fix For: 2.7.0
>
>         Attachments: MAPREDUCE-6046-01.patch
>
>
> It is a little confusing when logs get generated with the class name
> RMContainerAllocator even though they do not come from RMContainerAllocator.java.
> {code:title=RMCommunicator.java|borderStyle=solid}
> private static final Log LOG =
>     LogFactory.getLog(RMContainerAllocator.class);
> {code}
> In the above, RMContainerAllocator.class needs to be changed to
> RMCommunicator.class.