[jira] [Commented] (MAPREDUCE-5542) Killing a job just as it finishes can generate an NPE in client
[ https://issues.apache.org/jira/browse/MAPREDUCE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173361#comment-14173361 ] Rohith commented on MAPREDUCE-5542: --- bq. the client would then loop until the full timeout before returning. I see. Agree. This can be improved. > Killing a job just as it finishes can generate an NPE in client > --- > > Key: MAPREDUCE-5542 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5542 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, mrv2 >Affects Versions: 2.1.0-beta, 0.23.9 >Reporter: Jason Lowe >Assignee: Rohith > Attachments: MAPREDUCE-5542.1.patch, MAPREDUCE-5542.2.patch, > MAPREDUCE-5542.3.patch, MAPREDUCE-5542.4.patch > > > If a client tries to kill a job just as the job is finishing then the client > can crash with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173230#comment-14173230 ] Hadoop QA commented on MAPREDUCE-4818: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675155/MAPREDUCE-4818.v5.patch against trunk revision 466f087. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.sls.TestSLSRunner org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4967//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4967//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4967//console This message is automatically generated. > Easier identification of tasks that timeout during localization > --- > > Key: MAPREDUCE-4818 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am >Affects Versions: 0.23.3, 2.0.3-alpha >Reporter: Jason Lowe >Assignee: Siqi Li > Labels: usability > Attachments: MAPREDUCE-4818.v1.patch, MAPREDUCE-4818.v2.patch, > MAPREDUCE-4818.v3.patch, MAPREDUCE-4818.v4.patch, MAPREDUCE-4818.v5.patch > > > When a task is taking too long to localize and is killed by the AM due to > task timeout, the job UI/history is not very helpful. The attempt simply > lists a diagnostic stating it was killed due to timeout, but there are no > logs for the attempt since it never actually got started. There are log > messages on the NM that show the container never made it past localization by > the time it was killed, but users often do not have access to those logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated MAPREDUCE-6129: Attachment: MAPREDUCE-6129.diff > Job failed due to counter out of limited in MRAppMaster > --- > > Key: MAPREDUCE-6129 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6129 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster >Affects Versions: 3.0.0, 2.3.0, 2.5.0, 2.4.1, 2.5.1 >Reporter: Min Zhou > Attachments: MAPREDUCE-6129.diff > > > Many of our cluster's jobs use more than 120 counters; those kinds of jobs > failed with an exception like the one below > {noformat} > 2014-10-15 22:55:43,742 WARN [Socket Reader #1 for port 45673] > org.apache.hadoop.ipc.Server: Unable to read call parameters for client > 10.180.216.12 on connection protocol > org.apache.hadoop.mapred.TaskUmbilicalProtocol for rpcKind RPC_WRITABLE > org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many > counters: 121 max=120 > at > org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:103) > at > org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:110) > at > org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:175) > at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:324) > at > org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:314) > at org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:489) > at > org.apache.hadoop.mapred.ReduceTaskStatus.readFields(ReduceTaskStatus.java:140) > at > org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:157) > at > org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1802) > at > org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1734) > at >
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1494) > at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:732) > at > org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:606) > at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:577) > {noformat} > The class org.apache.hadoop.mapreduce.counters.Limits loads mapred-site.xml > on the NodeManager node for its JobConf if it hasn't been initialized. > If mapred-site.xml does not exist on the NodeManager node, or > mapreduce.job.counters.max hasn't been defined in that file, > org.apache.hadoop.mapreduce.counters.Limits will just use the default value > 120. > Instead, we should read the user job's conf file rather than the config files on the > NodeManager when checking counter limits. > I will submit a patch later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
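The limit check that fails above, and the fix direction the reporter proposes (taking the maximum from the submitted job's configuration instead of the NodeManager's local mapred-site.xml), can be sketched roughly as follows. `CounterLimits` and its methods are illustrative names, not the real org.apache.hadoop.mapreduce.counters.Limits API:

```java
// Illustrative sketch only. The real Limits class lazily loads a
// JobConf (and thus whatever mapred-site.xml is on the local node);
// the proposal is to seed the limit from the user job's configuration.
public class CounterLimits {
    public static final int DEFAULT_MAX_COUNTERS = 120;

    private final int maxCounters;

    // maxFromJobConf would come from the submitted job's
    // mapreduce.job.counters.max, not the NodeManager's local config.
    public CounterLimits(int maxFromJobConf) {
        this.maxCounters = maxFromJobConf > 0 ? maxFromJobConf
                                              : DEFAULT_MAX_COUNTERS;
    }

    public boolean withinLimit(int numCounters) {
        return numCounters <= maxCounters;
    }

    public void checkCounters(int numCounters) {
        if (!withinLimit(numCounters)) {
            throw new IllegalStateException(
                "Too many counters: " + numCounters + " max=" + maxCounters);
        }
    }
}
```

With a job-supplied limit of 200, the 121-counter status in the stack trace above would deserialize without tripping the check.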
[jira] [Updated] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated MAPREDUCE-6129: Affects Version/s: 3.0.0 2.3.0 2.5.0 2.4.1 2.5.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster
Min Zhou created MAPREDUCE-6129: --- Summary: Job failed due to counter out of limited in MRAppMaster Key: MAPREDUCE-6129 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6129 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Reporter: Min Zhou -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-4818: --- Attachment: MAPREDUCE-4818.v5.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172967#comment-14172967 ] Nathan Roberts commented on MAPREDUCE-2841: --- {quote} Let's let this bake in trunk for a little while and consider a backport to branch-2 down the road if there is demand. Marking the issue as resolved for now. {quote} Nice work! Not sure how much baking really happens on trunk ;) Looking forward to this getting onto branch-2. > Task level native optimization > -- > > Key: MAPREDUCE-2841 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task > Environment: x86-64 Linux/Unix >Reporter: Binglin Chang >Assignee: Sean Zhong > Fix For: 3.0.0 > > Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, > MAPREDUCE-2841.v2.patch, MR-2841benchmarks.pdf, dualpivot-0.patch, > dualpivotv20-0.patch, fb-shuffle.patch, > hadoop-3.0-mapreduce-2841-2014-7-17.patch, micro-benchmark.txt, > mr-2841-merge-2.txt, mr-2841-merge-3.patch, mr-2841-merge-4.patch, > mr-2841-merge.txt > > > I've recently been working on native optimization for MapTask based on JNI. > The basic idea is to add a NativeMapOutputCollector to handle k/v pairs > emitted by the mapper, so that sort, spill, and IFile serialization can all be done > in native code. A preliminary test (on Xeon E5410, jdk6u24) showed promising > results: > 1. Sort is about 3x-10x as fast as Java (only binary string compare is > supported) > 2. IFile serialization speed is about 3x that of Java, about 500MB/s; if hardware > CRC32C is used, things can get much faster(1G/ > 3.
Merge code is not completed yet, so the test uses enough io.sort.mb to > prevent mid-spill. > This leads to a total speedup of 2x~3x for the whole MapTask if > IdentityMapper (a mapper that does nothing) is used. > There are limitations, of course: currently only Text and BytesWritable are > supported, and I have not thought through many things yet, such as how to > support map-side combine. I had some discussion with somebody familiar with > Hive, and it seems these limitations won't be much of a problem for Hive to > benefit from these optimizations, at least. Advice or discussion about > improving compatibility is most welcome:) > Currently NativeMapOutputCollector has a static method called canEnable(), > which checks whether the key/value types, comparator type, and combiner are all compatible, > so MapTask can choose to enable NativeMapOutputCollector. > This is only a preliminary test; more work needs to be done. I expect better > final results, and I believe similar optimizations can be adopted for the reduce task > and shuffle too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
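The canEnable() gate described in the message above can be pictured with a small sketch; the class name, signature, and supported-type checks are hypothetical simplifications of the MapTask logic the description outlines:

```java
// Hypothetical sketch of the capability check: the native collector
// is only enabled when the job's key/value types, comparator, and
// combiner are all within what the native code supports (per the
// description: Text/BytesWritable keys and values, binary string
// compare only, no map-side combine yet).
public class NativeCollectorGate {
    public static boolean canEnable(String keyClass, String valueClass,
                                    boolean hasCustomComparator,
                                    boolean hasCombiner) {
        boolean keyOk = keyClass.equals("Text")
            || keyClass.equals("BytesWritable");
        boolean valueOk = valueClass.equals("Text")
            || valueClass.equals("BytesWritable");
        return keyOk && valueOk && !hasCustomComparator && !hasCombiner;
    }
}
```

A check of this shape lets MapTask fall back silently to the Java collector for any job the native path cannot handle.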
[jira] [Commented] (MAPREDUCE-6083) Map/Reduce dangerously adds Guava @Beta class to CryptoUtils
[ https://issues.apache.org/jira/browse/MAPREDUCE-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172896#comment-14172896 ] Christopher Tubbs commented on MAPREDUCE-6083: -- Would this be more likely to be accepted for 2.6.0 if it were provided as a copied/re-implemented version of LimitInputStream instead of a dependency version change? > Map/Reduce dangerously adds Guava @Beta class to CryptoUtils > > > Key: MAPREDUCE-6083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6083 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Christopher Tubbs > Labels: beta, deprecated, guava > Attachments: > 0001-MAPREDUCE-6083-Avoid-client-use-of-deprecated-LimitI.patch > > > See HDFS-7040 for more background/details. > In recent 2.6.0-SNAPSHOTs, a use of LimitInputStream was added to > CryptoUtils. This is part of Hadoop's API components, which severely > impacts users running newer versions of Guava, from which the @Beta and > @Deprecated class LimitInputStream has been removed (in version 15 and later), beyond the impact already experienced in 2.4.0 as identified in > HDFS-7040. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
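The "copied/re-implemented version of LimitInputStream" floated above could look roughly like the minimal wrapper below. This is a sketch of the idea only, not the actual Guava source nor the class Hadoop eventually shipped:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal re-implementation sketch: cap how many bytes can be read
// from the wrapped stream, reporting end-of-stream once the limit is
// reached (the core behavior Guava's removed LimitInputStream gave).
public class BoundedInputStream extends FilterInputStream {
    private long remaining;

    public BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;  // limit reached
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n != -1) {
            remaining -= n;
        }
        return n;
    }

    // Helper used in the checks below: count the bytes readable from
    // a stream, wrapping the checked IOException.
    public static int drain(InputStream in) {
        int count = 0;
        try {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return count;
    }
}
```

Copying a small wrapper like this removes the hard dependency on a Guava version that still contains the @Beta class.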
[jira] [Commented] (MAPREDUCE-6117) Hadoop ignores yarn.nodemanager.hostname for RPC listeners
[ https://issues.apache.org/jira/browse/MAPREDUCE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172821#comment-14172821 ] Hadoop QA commented on MAPREDUCE-6117: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675078/MapReduce-534.patch against trunk revision f19771a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4966//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4966//console This message is automatically generated. > Hadoop ignores yarn.nodemanager.hostname for RPC listeners > -- > > Key: MAPREDUCE-6117 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6117 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, task >Affects Versions: 2.2.1, 2.4.1, 2.5.1 > Environment: Any mapreduce example with standard cluster. In our > case each node has four networks. It is important that all internode > communication be done on a specific network. 
>Reporter: Waldyn Benbenek >Assignee: Waldyn Benbenek > Fix For: 2.5.1 > > Attachments: MapReduce-534.patch > > Original Estimate: 48h > Time Spent: 384h > Remaining Estimate: 0h > > The RPC listeners for an application are using the hostname of the node as > the binding address of the listener; they ignore yarn.nodemanager.hostname > for this. In our setup we want all communication between nodes to be done > via the network addresses we specify in yarn.nodemanager.hostname on each > node. > TaskAttemptListenerImpl.java and MRClientService.java are two places I have > found where the default address is used rather than NM_HOST. The > NodeManager hostname should be used for all communication between nodes, including > the RPC listeners. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
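The fix direction described in this report (binding RPC listeners to the configured NodeManager hostname instead of the node's default hostname) can be sketched as a tiny selection helper; the class and method names are hypothetical, and per the later patch discussion the real change reads the NM_HOST environment variable:

```java
import java.util.Map;

// Hypothetical sketch: prefer the NM_HOST value from the container's
// environment when it is set; otherwise fall back to the node's
// default hostname, which is today's behavior.
public class BindAddressSketch {
    public static String chooseBindHost(Map<String, String> env,
                                        String defaultHostname) {
        String nmHost = env.get("NM_HOST");
        return (nmHost != null && !nmHost.isEmpty()) ? nmHost
                                                     : defaultHostname;
    }
}
```

On a multi-homed node this keeps the TaskAttempt listener and client service on the administrator-chosen network rather than whichever one the default hostname resolves to.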
[jira] [Updated] (MAPREDUCE-6117) Hadoop ignores yarn.nodemanager.hostname for RPC listeners
[ https://issues.apache.org/jira/browse/MAPREDUCE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Waldyn Benbenek updated MAPREDUCE-6117: --- Release Note: This patch has few new tests, for the following reasons: TestTaskAttemptListenerImpl does not test or even perform the service start where the change is made, because that would require starting a new process. TestMRClientService already checks the NM_HOST, which this change does affect. The change pulls NM_HOST from the environment; this needs to be passed to a spawned process, which none of the tests do. In general, it would be better if NM_HOST were more pervasive, that is, if the property were passed to all the parts of the application, in particular the parts that deal with RPC. Since that is not the case, I have chosen to pull it from the environment, where one can depend on its being present. I have tested it in clusters with multiple networks where the NM host is configured and in those where it is not. It works as designed: if the NM host is configured on the node, the TaskAttempt Listener and the Client Service listen on the given NM host; otherwise they listen on the node's "hostname". was: This patch has no new tests, for the following reasons: TestTaskAttemptListenerImpl does not test or even perform the service start where the change is made, because that would require starting a new process. TestMRClientService already checks the NM_HOST, which this change does affect. The change pulls NM_HOST from the environment; this needs to be passed to a spawned process, which none of the tests do. In general, it would be better if NM_HOST were more pervasive, that is, if the property were passed to all the parts of the application, in particular the parts that deal with RPC. Since that is not the case, I have chosen to pull it from the environment, where one can depend on its being present. I have tested it in clusters with multiple networks where the NM host is configured and in those where it is not. It works as designed: if the NM host is configured on the node, the TaskAttempt Listener and the Client Service listen on the given NM host; otherwise they listen on the node's "hostname". Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6117) Hadoop ignores yarn.nodemanager.hostname for RPC listeners
[ https://issues.apache.org/jira/browse/MAPREDUCE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Waldyn Benbenek updated MAPREDUCE-6117: --- Attachment: MapReduce-534.patch Same patch with test update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5542) Killing a job just as it finishes can generate an NPE in client
[ https://issues.apache.org/jira/browse/MAPREDUCE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172724#comment-14172724 ] Jason Lowe commented on MAPREDUCE-5542: --- Thanks for updating the patch. Looks better, but I noticed something I should have before. After sending the kill directive via the MR client, the code then loops as long as the job state isn't KILLED. However, if the job is finishing just as we send the kill, then the job may finish in a non-killed state. I think the client would then loop until the full timeout before returning. Instead of checking for not KILLED, we should be checking for a non-terminal state (e.g. != KILLED, SUCCEEDED, or FAILED). We can make an EnumSet of the terminal job states and check if the status is not in that set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
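The terminal-state check suggested in this comment can be sketched as follows; this is a hedged illustration with a simplified local enum, not the actual org.apache.hadoop.mapreduce JobStatus.State API:

```java
import java.util.EnumSet;

// Sketch of the suggested fix: loop while the job is in a
// non-terminal state, rather than while it is "not KILLED", so a job
// that races the kill and finishes as SUCCEEDED or FAILED does not
// make the client wait out the full timeout. The enum here is a
// simplified stand-in for the real job state enum.
public class KillWaitSketch {
    enum JobState { PREP, RUNNING, KILLED, SUCCEEDED, FAILED }

    static final EnumSet<JobState> TERMINAL_STATES =
        EnumSet.of(JobState.KILLED, JobState.SUCCEEDED, JobState.FAILED);

    // Replaces the old "state != KILLED" loop condition.
    static boolean shouldKeepWaiting(JobState state) {
        return !TERMINAL_STATES.contains(state);
    }
}
```

With this condition the wait loop exits as soon as the job reaches any terminal state, killed or not.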
[jira] [Commented] (MAPREDUCE-5970) Provide a boolean switch to enable MR-AM profiling
[ https://issues.apache.org/jira/browse/MAPREDUCE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172691#comment-14172691 ] Hudson commented on MAPREDUCE-5970: --- FAILURE: Integrated in Hadoop-trunk-Commit #6266 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6266/]) MAPREDUCE-5970. Provide a boolean switch to enable MR-AM profiling. Contributed by Gera Shegalov (jlowe: rev f19771a24c2f90982cf6dec35889836a6146c968) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobConf.java > Provide a boolean switch to enable MR-AM profiling > -- > > Key: MAPREDUCE-5970 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5970 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster, client >Affects Versions: 2.4.1 >Reporter: Gera Shegalov >Assignee: Gera Shegalov >Priority: Minor > Fix For: 2.6.0 > > Attachments: MAPREDUCE-5970.v01.patch, MAPREDUCE-5970.v02.patch, > MAPREDUCE-5970.v03.patch > > > MR task profiling can be enabled with a simple switch > {{mapreduce.task.profile=true}}. We can analogously have > {{yarn.app.mapreduce.am.profile}} for MR-AM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5970) Provide a boolean switch to enable MR-AM profiling
[ https://issues.apache.org/jira/browse/MAPREDUCE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5970: -- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Gera! I committed this to trunk, branch-2, and branch-2.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5970) Provide a boolean switch to enable MR-AM profiling
[ https://issues.apache.org/jira/browse/MAPREDUCE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172662#comment-14172662 ] Jason Lowe commented on MAPREDUCE-5970: --- +1 lgtm. Committing this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5873) Shuffle bandwidth computation includes time spent waiting for maps
[ https://issues.apache.org/jira/browse/MAPREDUCE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172512#comment-14172512 ] Hudson commented on MAPREDUCE-5873: --- SUCCESS: Integrated in Hadoop-trunk-Commit #6264 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6264/]) MAPREDUCE-5873. Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java > Shuffle bandwidth computation includes time spent waiting for maps > -- > > Key: MAPREDUCE-5873 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5873 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.6.0 > > Attachments: MAPREDUCE-5873.v1.patch, MAPREDUCE-5873.v2.patch, > MAPREDUCE-5873.v3.patch, MAPREDUCE-5873.v4.patch, MAPREDUCE-5873.v5.patch, > MAPREDUCE-5873.v6.patch, MAPREDUCE-5873.v9.patch > > > Currently the ShuffleScheduler in the ReduceTask JVM status displays bandwidth. Its > definition, however, is confusing because it captures time where there is > no copying, due to the pause between when a new wave of map outputs becomes > available.
> Current bw is defined as (bytes copied so far) / (total time in the copy > phase so far). It would be more useful > 1) to measure the bandwidth of a single copy call, and > 2) to display the aggregated bw only as long as at least one fetcher is in the > copy call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
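A small worked example of why the old definition is misleading: dividing bytes copied by the whole copy-phase wall clock, including time spent waiting for new map outputs, understates the bandwidth the fetchers actually achieve. The numbers below are illustrative only:

```java
public class ShuffleBw {
    // Bandwidth in MB/s for a given byte count and elapsed milliseconds
    static double bwMBps(long bytes, long millis) {
        return (bytes / 1024.0 / 1024.0) / (millis / 1000.0);
    }

    public static void main(String[] args) {
        long bytes = 100L * 1024 * 1024; // 100 MB copied
        long copyMillis = 2_000;         // 2 s actually fetching
        long waitMillis = 8_000;         // 8 s idle, waiting for maps

        // Old definition: wait time is charged against the bandwidth
        double naive = bwMBps(bytes, copyMillis + waitMillis);
        // Fixed definition: only time with an active fetcher counts
        double active = bwMBps(bytes, copyMillis);

        assert naive == 10.0;  // 100 MB / 10 s
        assert active == 50.0; // 100 MB / 2 s
    }
}
```

With an 80% idle copy phase, the reported bandwidth is a fifth of the real fetch rate, which is exactly the confusion the patch removes.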
[jira] [Updated] (MAPREDUCE-5873) Shuffle bandwidth computation includes time spent waiting for maps
[ https://issues.apache.org/jira/browse/MAPREDUCE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5873: -- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Siqi! I committed this to trunk, branch-2, and branch-2.6. > Shuffle bandwidth computation includes time spent waiting for maps > -- > > Key: MAPREDUCE-5873 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5873 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.6.0 > > Attachments: MAPREDUCE-5873.v1.patch, MAPREDUCE-5873.v2.patch, > MAPREDUCE-5873.v3.patch, MAPREDUCE-5873.v4.patch, MAPREDUCE-5873.v5.patch, > MAPREDUCE-5873.v6.patch, MAPREDUCE-5873.v9.patch > > > Currently the ShuffleScheduler in the ReduceTask JVM status displays bandwidth. Its > definition, however, is confusing because it captures time where there is > no copying, due to the pause between when a new wave of map outputs becomes > available. > Current bw is defined as (bytes copied so far) / (total time in the copy > phase so far). It would be more useful > 1) to measure the bandwidth of a single copy call, and > 2) to display the aggregated bw only as long as at least one fetcher is in the > copy call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5873) Shuffle bandwidth computation includes time spent waiting for maps
[ https://issues.apache.org/jira/browse/MAPREDUCE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5873: -- Summary: Shuffle bandwidth computation includes time spent waiting for maps (was: Measure bw of a single copy call and display the correct aggregated bw) > Shuffle bandwidth computation includes time spent waiting for maps > -- > > Key: MAPREDUCE-5873 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5873 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: MAPREDUCE-5873.v1.patch, MAPREDUCE-5873.v2.patch, > MAPREDUCE-5873.v3.patch, MAPREDUCE-5873.v4.patch, MAPREDUCE-5873.v5.patch, > MAPREDUCE-5873.v6.patch, MAPREDUCE-5873.v9.patch > > > Currently the ShuffleScheduler in the ReduceTask JVM status displays bandwidth. Its > definition, however, is confusing because it captures time where there is > no copying, due to the pause between when a new wave of map outputs becomes > available. > Current bw is defined as (bytes copied so far) / (total time in the copy > phase so far). It would be more useful > 1) to measure the bandwidth of a single copy call, and > 2) to display the aggregated bw only as long as at least one fetcher is in the > copy call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5873) Measure bw of a single copy call and display the correct aggregated bw
[ https://issues.apache.org/jira/browse/MAPREDUCE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172488#comment-14172488 ] Jason Lowe commented on MAPREDUCE-5873: --- +1 lgtm. Committing this. > Measure bw of a single copy call and display the correct aggregated bw > -- > > Key: MAPREDUCE-5873 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5873 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: MAPREDUCE-5873.v1.patch, MAPREDUCE-5873.v2.patch, > MAPREDUCE-5873.v3.patch, MAPREDUCE-5873.v4.patch, MAPREDUCE-5873.v5.patch, > MAPREDUCE-5873.v6.patch, MAPREDUCE-5873.v9.patch > > > Currently the ShuffleScheduler in the ReduceTask JVM status displays bandwidth. Its > definition, however, is confusing because it captures time where there is > no copying, due to the pause between when a new wave of map outputs becomes > available. > Current bw is defined as (bytes copied so far) / (total time in the copy > phase so far). It would be more useful > 1) to measure the bandwidth of a single copy call, and > 2) to display the aggregated bw only as long as at least one fetcher is in the > copy call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6117) Hadoop ignores yarn.nodemanager.hostname for RPC listeners
[ https://issues.apache.org/jira/browse/MAPREDUCE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Waldyn Benbenek updated MAPREDUCE-6117: --- Attachment: (was: MapReduce-325.patch) > Hadoop ignores yarn.nodemanager.hostname for RPC listeners > -- > > Key: MAPREDUCE-6117 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6117 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, task >Affects Versions: 2.2.1, 2.4.1, 2.5.1 > Environment: Any mapreduce example with standard cluster. In our > case each node has four networks. It is important that all internode > communication be done on a specific network. >Reporter: Waldyn Benbenek >Assignee: Waldyn Benbenek > Fix For: 2.5.1 > > Original Estimate: 48h > Time Spent: 384h > Remaining Estimate: 0h > > The RPC listeners for an application are using the hostname of the node as > the binding address of the listener; they ignore yarn.nodemanager.hostname > for this. In our setup we want all communication between nodes to be done > via the network addresses we specify in yarn.nodemanager.hostname on each > node. > TaskAttemptListenerImpl.java and MRClientService.java are two places I have > found where the default address is used rather than NM_host. The NodeManager > hostname should be used for all communication between nodes, including > the RPC listeners. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
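The fix the issue asks for amounts to one decision: when building an RPC listener's bind address, prefer the host configured in yarn.nodemanager.hostname over the node's default hostname. A minimal sketch of that decision, using a plain map instead of Hadoop's `Configuration` — `chooseBindHost` is a hypothetical helper, not the actual Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

public class BindHost {
    static final String NM_HOSTNAME = "yarn.nodemanager.hostname";

    // Pick the listener bind host: the configured NM hostname when it is
    // set to something specific, otherwise the node's default hostname.
    static String chooseBindHost(Map<String, String> conf, String defaultHost) {
        String configured = conf.get(NM_HOSTNAME);
        // "0.0.0.0" is the conventional wildcard default, so it does not
        // count as a specific address to bind to
        if (configured == null || configured.isEmpty()
                || "0.0.0.0".equals(configured)) {
            return defaultHost;
        }
        return configured;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        assert chooseBindHost(conf, "node1.example.com").equals("node1.example.com");
        // With four networks per node, pinning to one address keeps all
        // internode traffic on the intended network
        conf.put(NM_HOSTNAME, "10.0.1.5");
        assert chooseBindHost(conf, "10.0.1.5").equals("10.0.1.5");
    }
}
```

The multi-homed setup in the issue's Environment field is exactly the case where falling back to the default hostname routes traffic onto the wrong network.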
[jira] [Updated] (MAPREDUCE-6117) Hadoop ignores yarn.nodemanager.hostname for RPC listeners
[ https://issues.apache.org/jira/browse/MAPREDUCE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Waldyn Benbenek updated MAPREDUCE-6117: --- Status: Open (was: Patch Available) Replacing patch with one including test change > Hadoop ignores yarn.nodemanager.hostname for RPC listeners > -- > > Key: MAPREDUCE-6117 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6117 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, task >Affects Versions: 2.5.1, 2.4.1, 2.2.1 > Environment: Any mapreduce example with standard cluster. In our > case each node has four networks. It is important that all internode > communication be done on a specific network. >Reporter: Waldyn Benbenek >Assignee: Waldyn Benbenek > Fix For: 2.5.1 > > Attachments: MapReduce-325.patch > > Original Estimate: 48h > Time Spent: 384h > Remaining Estimate: 0h > > The RPC listeners for an application are using the hostname of the node as > the binding address of the listener; they ignore yarn.nodemanager.hostname > for this. In our setup we want all communication between nodes to be done > via the network addresses we specify in yarn.nodemanager.hostname on each > node. > TaskAttemptListenerImpl.java and MRClientService.java are two places I have > found where the default address is used rather than NM_host. The NodeManager > hostname should be used for all communication between nodes, including > the RPC listeners. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5269) Preemption of Reducer (and Shuffle) via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172257#comment-14172257 ] Hadoop QA commented on MAPREDUCE-5269: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674970/MAPREDUCE-5269.4.patch against trunk revision 128ace1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4965//console This message is automatically generated. > Preemption of Reducer (and Shuffle) via checkpointing > - > > Key: MAPREDUCE-5269 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5269 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: MAPREDUCE-5269.2.patch, MAPREDUCE-5269.3.patch, > MAPREDUCE-5269.4.patch, MAPREDUCE-5269.patch > > > This patch tracks the changes in the task runtime (shuffle, reducer context, > etc.) that are required to implement checkpoint-based preemption of reducer > tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5269) Preemption of Reducer (and Shuffle) via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Augusto Souza updated MAPREDUCE-5269: - Attachment: MAPREDUCE-5269.4.patch Fixing MAPREDUCE-5269.3 by removing prefixes in files > Preemption of Reducer (and Shuffle) via checkpointing > - > > Key: MAPREDUCE-5269 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5269 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: MAPREDUCE-5269.2.patch, MAPREDUCE-5269.3.patch, > MAPREDUCE-5269.4.patch, MAPREDUCE-5269.patch > > > This patch tracks the changes in the task runtime (shuffle, reducer context, > etc.) that are required to implement checkpoint-based preemption of reducer > tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6128) Automatic addition of bundled jars to distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-6128: - Attachment: MAPREDUCE-6128.v01.patch v01 to illustrate the idea. > Automatic addition of bundled jars to distributed cache > > > Key: MAPREDUCE-6128 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.5.1 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: MAPREDUCE-6128.v01.patch > > > On the client side, the JDK adds Class-Path elements from the job jar manifest > to the classpath. In theory there could be many bundled jars in many > directories, making it cumbersome to add them manually to task classpaths via > libjars or similar means. If this property is enabled, the same jars are added > to the task classpaths automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6128) Automatic addition of bundled jars to distributed cache
Gera Shegalov created MAPREDUCE-6128: Summary: Automatic addition of bundled jars to distributed cache Key: MAPREDUCE-6128 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.5.1 Reporter: Gera Shegalov Assignee: Gera Shegalov On the client side, the JDK adds Class-Path elements from the job jar manifest to the classpath. In theory there could be many bundled jars in many directories, making it cumbersome to add them manually to task classpaths via libjars or similar means. If this property is enabled, the same jars are added to the task classpaths automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
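The feature hinges on the Class-Path main attribute of the job jar's manifest — the same entries the JDK already honors on the client side would be collected and shipped to the distributed cache. A small sketch of reading those entries with only the JDK's `java.util.jar` API; the `classPathEntries` helper is illustrative, not part of the attached patch:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Manifest;

public class ManifestClassPath {
    // Class-Path is a whitespace-separated list of relative jar references
    static List<String> classPathEntries(Manifest mf) {
        String cp = mf.getMainAttributes().getValue("Class-Path");
        return cp == null ? List.of() : Arrays.asList(cp.trim().split("\\s+"));
    }

    public static void main(String[] args) throws IOException {
        // Minimal manifest text; a real job jar would be opened with JarFile
        String text = "Manifest-Version: 1.0\r\n"
                + "Class-Path: lib/dep1.jar lib/dep2.jar\r\n\r\n";
        Manifest mf = new Manifest(new ByteArrayInputStream(text.getBytes()));
        assert classPathEntries(mf)
                .equals(List.of("lib/dep1.jar", "lib/dep2.jar"));
        // No Class-Path attribute means nothing extra to localize
        assert classPathEntries(new Manifest()).isEmpty();
    }
}
```

Each returned entry would then be resolved relative to the job jar and registered with the distributed cache, sparing the user a long -libjars list.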