[jira] [Updated] (MAPREDUCE-3597) Provide a way to access other info of history file from Rumen tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-3597: Attachment: 3597.branch-1.v1.patch Attaching patch for branch-1. Provide a way to access other info of history file from Rumen tool - Key: MAPREDUCE-3597 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3597 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tools/rumen Affects Versions: 0.24.0 Reporter: Ravi Gummadi Assignee: Ravi Gummadi Fix For: 0.24.0 Attachments: 3597.branch-1.v1.patch, 3597.v0.patch, 3597.v1.patch Because the trace file generated by Rumen's TraceBuilder skips some information, such as job counters and task counters, we need a way to access the other information available in the history file that is not dumped to the trace file. This is useful for components that want to parse history files and extract information: they can directly use/leverage Rumen's parsing of history files across Hadoop releases and get history info in a consistent way for further analysis/processing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
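[Editor's note] To make the use case above concrete, here is a rough sketch of how a downstream component might walk the raw history events through Rumen rather than relying on the generated trace file. It is not the attached patch; it assumes Rumen's JobHistoryParserFactory, JobHistoryParser, JobBuilder and LoggedJob classes roughly as they exist in this era of the codebase, so treat the exact names and signatures as assumptions to verify against the source.
{code}
// Editorial sketch only: iterate raw history events via Rumen so that counters
// and other fields skipped by TraceBuilder remain accessible.
// Assumed API: JobHistoryParserFactory.getParser(RewindableInputStream),
// JobHistoryParser.nextEvent(), JobBuilder.process()/build(), LoggedJob.
// In trunk, HistoryEvent may live in org.apache.hadoop.mapreduce.jobhistory;
// adjust imports for the branch you are on.
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.rumen.*;

public class HistoryInfoDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path historyFile = new Path(args[0]);   // a job history file (placeholder argument)
    String jobId = args[1];                 // e.g. a job_..._0001 id (placeholder argument)
    FileSystem fs = historyFile.getFileSystem(conf);
    InputStream in = fs.open(historyFile);
    JobHistoryParser parser =
        JobHistoryParserFactory.getParser(new RewindableInputStream(in));
    JobBuilder builder = new JobBuilder(jobId);
    HistoryEvent event;
    while ((event = parser.nextEvent()) != null) {
      // Every raw event (job/task/attempt updates, counters, ...) is visible here,
      // not just the subset that ends up in the generated trace.
      builder.process(event);
    }
    LoggedJob job = builder.build();        // consolidated view for further analysis
    System.out.println("Parsed job: " + job.getJobID());
    parser.close();
  }
}
{code}
The point of the sketch is the access pattern the issue asks for: consumers reuse Rumen's version-aware parsing instead of re-implementing history-file parsing per Hadoop release.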
[jira] [Commented] (MAPREDUCE-3664) HDFS Federation Documentation has incorrect configuration example
[ https://issues.apache.org/jira/browse/MAPREDUCE-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185569#comment-13185569 ] Hudson commented on MAPREDUCE-3664: --- Integrated in Hadoop-Hdfs-trunk #924 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/924/]) MAPREDUCE-3664. Federation Documentation has incorrect configuration example. Contributed by Brandon Li. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230708 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/Federation.apt.vm HDFS Federation Documentation has incorrect configuration example - Key: MAPREDUCE-3664 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3664 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 0.23.0, 0.24.0 Reporter: praveen sripati Priority: Minor Attachments: HDFS-2778.txt, HDFS-2778.txt HDFS Federation documentation example (1) has the following property: <property> <name>dfs.namenode.rpc-address.ns1</name> <value>hdfs://nn-host1:rpc-port</value> </property> dfs.namenode.rpc-address.* should be set to hostname:port, hdfs:// should not be there. (1) - http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
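[Editor's note] For clarity, a minimal sketch of the corrected setting supplied programmatically. The key name comes from the issue; "nn-host1" and port 8020 are placeholder values, not taken from the Federation documentation.
{code}
// Editorial sketch: the corrected form of the Federation RPC address setting.
// Placeholder host and port; only the key name is from the issue report.
import org.apache.hadoop.conf.Configuration;

public class FederationAddressExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Correct: plain hostname:port, no scheme.
    conf.set("dfs.namenode.rpc-address.ns1", "nn-host1:8020");
    // Incorrect (what the documentation showed): "hdfs://nn-host1:8020"
    // -- dfs.namenode.rpc-address.* expects a bare host:port authority.
    System.out.println(conf.get("dfs.namenode.rpc-address.ns1"));
  }
}
{code}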
[jira] [Commented] (MAPREDUCE-3545) Remove Avro RPC
[ https://issues.apache.org/jira/browse/MAPREDUCE-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185568#comment-13185568 ] Hudson commented on MAPREDUCE-3545: --- Integrated in Hadoop-Hdfs-trunk #924 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/924/]) Remove the empty avro directories for MAPREDUCE-3545. szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1230886 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/avro * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/avro * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/avro * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/avro Remove Avro RPC --- Key: MAPREDUCE-3545 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3545 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.1, 0.24.0 Attachments: MR-3545.txt Please see the discussion in HDFS-2660 for more details. I have created a branch HADOOP-6659 to save the Avro work, if in the future some one wants to use the work that existed to add support for Avro RPC. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3597) Provide a way to access other info of history file from Rumen tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185576#comment-13185576 ] Amar Kamat commented on MAPREDUCE-3597: --- The patch looks good to me. It seems that branch-1 Rumen is aware of both pre-0.21 and post-0.21 changes. We need to be sure of the implications. Provide a way to access other info of history file from Rumen tool - Key: MAPREDUCE-3597 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3597 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tools/rumen Affects Versions: 0.24.0 Reporter: Ravi Gummadi Assignee: Ravi Gummadi Fix For: 0.24.0 Attachments: 3597.branch-1.v1.patch, 3597.v0.patch, 3597.v1.patch Because the trace file generated by Rumen's TraceBuilder skips some information, such as job counters and task counters, we need a way to access the other information available in the history file that is not dumped to the trace file. This is useful for components that want to parse history files and extract information: they can directly use/leverage Rumen's parsing of history files across Hadoop releases and get history info in a consistent way for further analysis/processing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3545) Remove Avro RPC
[ https://issues.apache.org/jira/browse/MAPREDUCE-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185578#comment-13185578 ] Hudson commented on MAPREDUCE-3545: --- Integrated in Hadoop-Mapreduce-trunk #957 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/957/]) Remove the empty avro directories for MAPREDUCE-3545. szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1230886 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/avro * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/avro * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/avro * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/avro Remove Avro RPC --- Key: MAPREDUCE-3545 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3545 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.1, 0.24.0 Attachments: MR-3545.txt Please see the discussion in HDFS-2660 for more details. I have created a branch HADOOP-6659 to save the Avro work, if in the future some one wants to use the work that existed to add support for Avro RPC. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3664) HDFS Federation Documentation has incorrect configuration example
[ https://issues.apache.org/jira/browse/MAPREDUCE-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185579#comment-13185579 ] Hudson commented on MAPREDUCE-3664: --- Integrated in Hadoop-Mapreduce-trunk #957 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/957/]) MAPREDUCE-3664. Federation Documentation has incorrect configuration example. Contributed by Brandon Li. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230708 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/Federation.apt.vm HDFS Federation Documentation has incorrect configuration example - Key: MAPREDUCE-3664 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3664 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 0.23.0, 0.24.0 Reporter: praveen sripati Priority: Minor Attachments: HDFS-2778.txt, HDFS-2778.txt HDFS Federation documentation example (1) has the following property: <property> <name>dfs.namenode.rpc-address.ns1</name> <value>hdfs://nn-host1:rpc-port</value> </property> dfs.namenode.rpc-address.* should be set to hostname:port, hdfs:// should not be there. (1) - http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3667) Gridmix jobs are failing with OOM in reduce shuffle phase.
Gridmix jobs are failing with OOM in reduce shuffle phase. -- Key: MAPREDUCE-3667 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3667 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Amol Kekre Priority: Blocker Fix For: 0.23.1 Roll up bug for gridmix3 benchmark -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3656: -- Status: Open (was: Patch Available) Cancelling patch. Needs another fix. Sort is completing with this patch + MR3596, but there are random map task failures. TaskAttemptListener should be returning a null JvmTask instead of JvmTask.task=null. Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
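[Editor's note] A minimal illustration of the two return conventions mentioned in the comment above: a non-null JvmTask whose embedded task is null versus returning null outright. The surrounding class and methods are hypothetical, not the actual TaskAttemptListener code; the JvmTask (Task, boolean) constructor is assumed from the MapReduce sources of this era.
{code}
// Editorial illustration; only JvmTask/Task are real classes here, and the
// (Task, boolean) constructor signature is an assumption to verify.
import org.apache.hadoop.mapred.JvmTask;
import org.apache.hadoop.mapred.Task;

public class JvmTaskConventions {
  // Convention the comment argues against: a wrapper whose task field is null.
  static JvmTask wrapperWithNullTask() {
    return new JvmTask(null, false);  // caller must remember to check getTask() == null
  }

  // Convention the comment asks for: no task available, so return null outright.
  static JvmTask noTaskAvailable() {
    return null;                      // caller checks the JvmTask reference itself
  }
}
{code}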
[jira] [Updated] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3656: -- Attachment: MR3656.txt Yet another patch. Hopefully this one has everything resolved. Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3656: -- Status: Patch Available (was: Open) Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3668) AccessControlException when running mapred job -list command
AccessControlException when running mapred job -list command Key: MAPREDUCE-3668 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3668 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2, security Affects Versions: 0.23.1 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker If a user tries to examine the status of all jobs running on a secure cluster the mapred client can fail with an AccessControlException. For example, submitting two jobs each from a different user then trying to query the status as the second user can fail like this: $ mapred job -list all 12/01/12 20:01:12 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used Total jobs:2 JobId State StartTime UserNameQueue PriorityMaps Reduces UsedContainers RsvdContainers UsedMem RsvdMem NeededMem AM info 12/01/12 20:01:14 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server job_1326396427223_0002 SUCCEEDED 1326398424244 user2default NORMAL 2 2 0 0 0M 0M 0M hostremoved:8088/proxy/application_1326396427223_0002/jobhistory/job/job_1326396427223_2_2 12/01/12 20:01:14 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 12/01/12 20:01:14 WARN mapred.ClientServiceDelegate: Error from remote end: User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001 Exception in thread main RemoteTrace: java.security.AccessControlException: User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001 at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.checkAccess(HistoryClientService.java:293) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:184) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getJobReport(HistoryClientService.java:200) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:106) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:187) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:344) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1490) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1486) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1484) at Local Trace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001 at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:151) at $Proxy10.getJobReport(Unknown Source) at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:328) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:405) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:431) at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:186) at org.apache.hadoop.mapreduce.tools.CLI.displayJobList(CLI.java:571) at org.apache.hadoop.mapreduce.tools.CLI.listAllJobs(CLI.java:500) at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:298) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1209) The information provided by the command is similar to what is presented on the ResourceManager web UI, and that page has no security. Marking this as a blocker since many of our automated acceptance tests use this command to obtain the status of jobs
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185805#comment-13185805 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-3656: Glad you ran it on a cluster before commit. +1 for the latest fix. Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185808#comment-13185808 ] Hadoop QA commented on MAPREDUCE-3656: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510512/MR3656.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1610//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1610//console This message is automatically generated. Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3669) Getting a lot of PriviledgedActionException / SaslException when running a job
Getting a lot of PriviledgedActionException / SaslException when running a job -- Key: MAPREDUCE-3669 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3669 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Thomas Graves Priority: Blocker On a secure cluster, when running a job we are seeing a lot of PriviledgedActionException / SaslExceptions. The job runs fine, its just the jobclient can't connect to the AM to get the progress information. Its in a very tight loop retrying while getting the exceptions. snip of the client log is: 12/01/13 15:33:45 INFO security.SecurityUtil: Acquired token Ident: 00 1c 68 61 64 6f 6f 70 71 61 40 44 45 56 2e 59 47 52 49 44 2e 59 41 48 4f 4f 2e 43 4f 4d 08 6d 61 70 72 65 64 71 61 00 8a 01 34 d7 b3 ff f5 8a 01 34 fb c0 83 f5 08 02, Kind: HDFS_DELEGATION_TOKEN, Service: 10.10.10.10:8020 12/01/13 15:33:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 8 for user1 on 10.10.10.10:8020 12/01/13 15:33:45 INFO security.TokenCache: Got dt for hdfs://host1.domain.com:8020;uri=10.10.10.10:8020;t.service=10.10.10.10:8020 12/01/13 15:33:45 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used 12/01/13 15:33:45 INFO mapreduce.JobSubmitter: number of splits:2 12/01/13 15:33:45 INFO mapred.ResourceMgrDelegate: Submitted application application_1326410042859_0008 to ResourceManager at rmhost.domain/10.10.10.11:8040 12/01/13 15:33:45 INFO mapreduce.Job: Running job: job_1326410042859_0008 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job: rmhost.domain:8088/proxy/application_1326410042859_0008/ 12/01/13 15:33:52 ERROR security.UserGroupInformation: PriviledgedActionException as:us...@dev.ygrid.yahoo.com (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Fail ed to find any Kerberos tgt)] 12/01/13 15:33:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 12/01/13 15:33:52 ERROR security.UserGroupInformation: PriviledgedActionException as:us...@dev.ygrid.yahoo.com (auth:SIMPLE) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided ( Mechanism level: Failed to find any Kerberos tgt)] 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job: rmhost.domain:8088/proxy/application_1326410042859_0008/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3668) AccessControlException when running mapred job -list command
[ https://issues.apache.org/jira/browse/MAPREDUCE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185828#comment-13185828 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-3668: A quick fix that comes to mind is to catch and ignore AccessControlExceptions on the client side, but there is a bigger underlying issue. job -list going to each and every AM is not going to scale. As part of MAPREDUCE-3476, I am moving all the per-AM information to job -status. I am going to work on MAPREDUCE-3476 soon, but if that gets late, we can push the quick fix in. AccessControlException when running mapred job -list command Key: MAPREDUCE-3668 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3668 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2, security Affects Versions: 0.23.1 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker If a user tries to examine the status of all jobs running on a secure cluster the mapred client can fail with an AccessControlException. For example, submitting two jobs each from a different user then trying to query the status as the second user can fail like this: $ mapred job -list all 12/01/12 20:01:12 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used Total jobs:2 JobId State StartTime UserNameQueue PriorityMaps Reduces UsedContainers RsvdContainers UsedMem RsvdMem NeededMem AM info 12/01/12 20:01:14 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server job_1326396427223_0002 SUCCEEDED 1326398424244 user2default NORMAL 2 2 0 0 0M 0M 0M hostremoved:8088/proxy/application_1326396427223_0002/jobhistory/job/job_1326396427223_2_2 12/01/12 20:01:14 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED.
Redirecting to job history server 12/01/12 20:01:14 WARN mapred.ClientServiceDelegate: Error from remote end: User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001 Exception in thread main RemoteTrace: java.security.AccessControlException: User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001 at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.checkAccess(HistoryClientService.java:293) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:184) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getJobReport(HistoryClientService.java:200) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:106) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:187) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:344) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1490) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1486) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1484) at Local Trace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001 at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:151) at $Proxy10.getJobReport(Unknown Source) at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:328) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:405) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:431) at
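[Editor's note] A rough sketch of the "catch and ignore" quick-fix direction described in the comment above: when listing all jobs, skip entries the caller is not allowed to view instead of letting the whole command fail. This is an editorial illustration of the idea, not the committed patch; it assumes the org.apache.hadoop.mapreduce.Cluster client API, and in practice the remote ACL error may surface wrapped in a different exception type than the one caught here.
{code}
// Editorial sketch of the quick-fix idea, not the actual fix for this JIRA.
// Assumes the Cluster/JobStatus client API; the skip-on-ACL-failure policy is
// the point, not the exact call sites.
import java.io.IOException;
import java.security.AccessControlException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;

public class ListJobsSkippingForbidden {
  public static void main(String[] args) throws IOException, InterruptedException {
    Cluster cluster = new Cluster(new Configuration());
    for (JobStatus status : cluster.getAllJobStatuses()) {
      try {
        // Fetching the full Job may go to the AM or history server, where the
        // VIEW_JOB ACL check can reject the caller.
        Job job = cluster.getJob(status.getJobID());
        if (job != null) {
          System.out.println(status.getJobID() + "\t" + job.getJobState());
        }
      } catch (AccessControlException e) {
        // Quick-fix behaviour: note the job we may not view and keep going,
        // rather than aborting the whole -list output.
        System.err.println("Skipping " + status.getJobID() + ": " + e.getMessage());
      }
    }
  }
}
{code}
As the comment notes, this only papers over the symptom; the longer-term direction is MAPREDUCE-3476, which moves the per-AM detail out of job -list entirely.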
[jira] [Assigned] (MAPREDUCE-3669) Getting a lot of PriviledgedActionException / SaslException when running a job
[ https://issues.apache.org/jira/browse/MAPREDUCE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned MAPREDUCE-3669: -- Assignee: Mahadev konar Mahadev, can you please look at it? This is most likely related to MAPREDUCE-3380. Getting a lot of PriviledgedActionException / SaslException when running a job -- Key: MAPREDUCE-3669 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3669 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Thomas Graves Assignee: Mahadev konar Priority: Blocker On a secure cluster, when running a job we are seeing a lot of PriviledgedActionException / SaslExceptions. The job runs fine, its just the jobclient can't connect to the AM to get the progress information. Its in a very tight loop retrying while getting the exceptions. snip of the client log is: 12/01/13 15:33:45 INFO security.SecurityUtil: Acquired token Ident: 00 1c 68 61 64 6f 6f 70 71 61 40 44 45 56 2e 59 47 52 49 44 2e 59 41 48 4f 4f 2e 43 4f 4d 08 6d 61 70 72 65 64 71 61 00 8a 01 34 d7 b3 ff f5 8a 01 34 fb c0 83 f5 08 02, Kind: HDFS_DELEGATION_TOKEN, Service: 10.10.10.10:8020 12/01/13 15:33:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 8 for user1 on 10.10.10.10:8020 12/01/13 15:33:45 INFO security.TokenCache: Got dt for hdfs://host1.domain.com:8020;uri=10.10.10.10:8020;t.service=10.10.10.10:8020 12/01/13 15:33:45 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used 12/01/13 15:33:45 INFO mapreduce.JobSubmitter: number of splits:2 12/01/13 15:33:45 INFO mapred.ResourceMgrDelegate: Submitted application application_1326410042859_0008 to ResourceManager at rmhost.domain/10.10.10.11:8040 12/01/13 15:33:45 INFO mapreduce.Job: Running job: job_1326410042859_0008 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job: rmhost.domain:8088/proxy/application_1326410042859_0008/ 12/01/13 15:33:52 ERROR security.UserGroupInformation: PriviledgedActionException as:us...@dev.ygrid.yahoo.com (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Fail ed to find any Kerberos tgt)] 12/01/13 15:33:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 12/01/13 15:33:52 ERROR security.UserGroupInformation: PriviledgedActionException as:us...@dev.ygrid.yahoo.com (auth:SIMPLE) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided ( Mechanism level: Failed to find any Kerberos tgt)] 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job: rmhost.domain:8088/proxy/application_1326410042859_0008/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3628) DFSIO read throughput is decreased by 16% in 0.23 than Hadoop-0.20.204 on 350 nodes size cluster.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-3628: --- Issue Type: Sub-task (was: Task) Parent: MAPREDUCE-3561 DFSIO read throughput is decreased by 16% in 0.23 than Hadoop-0.20.204 on 350 nodes size cluster. - Key: MAPREDUCE-3628 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3628 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: mrv2 Affects Versions: 0.23.0 Reporter: Amol Kekre Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 DFSIO read throughput is decreased by 16% in 0.23 than Hadoop-0.20.204 on 350 nodes size cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185850#comment-13185850 ] Siddharth Seth commented on MAPREDUCE-3596: --- +1. Patch looks good. Also ran a couple of runs of sort with this patch and MAPREDUCE-3656 - completed without running into either issue. Sort benchmark got hang after completion of 99% map phase - Key: MAPREDUCE-3596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE-3596-20120111.1.txt, MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2 Courtesy [~vinaythota] {quote} Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks. Cluster size is 350 nodes. Build Details: == Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common ResourceManager version:revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011 {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185852#comment-13185852 ] Siddharth Seth commented on MAPREDUCE-3656: --- Ran sort again with this patch and MAPREDUCE-3596. Completed without either error. Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3596: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-0.23. Thanks Vinod Sort benchmark got hang after completion of 99% map phase - Key: MAPREDUCE-3596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE-3596-20120111.1.txt, MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2 Courtesy [~vinaythota] {quote} Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks. Cluster size is 350 nodes. Build Details: == Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common ResourceManager version:revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011 {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185868#comment-13185868 ] Hudson commented on MAPREDUCE-3596: --- Integrated in Hadoop-Hdfs-0.23-Commit #363 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/363/]) merge MAPREDUCE-3596 from trunk. Fix scheduler to handle cleaned up containers, which NMs may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli) sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231303 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java Sort benchmark got hang after completion of 99% map phase - Key: MAPREDUCE-3596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596 Project: Hadoop Map/Reduce 
Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE-3596-20120111.1.txt, MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2 Courtesy [~vinaythota] {quote} Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks. Cluster size is 350 nodes. Build Details: == Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common ResourceManager version:revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011 {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on
[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185869#comment-13185869 ] Hudson commented on MAPREDUCE-3596: --- Integrated in Hadoop-Hdfs-trunk-Commit #1612 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1612/]) MAPREDUCE-3596. Fix scheduler to handle cleaned up containers, which NMs may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli) sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231297 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java Sort benchmark got hang after completion of 99% map phase - Key: MAPREDUCE-3596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE-3596-20120111.1.txt, 
MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2 Courtesy [~vinaythota] {quote} Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks. Cluster size is 350 nodes. Build Details: == Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common ResourceManager version:revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011 {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185870#comment-13185870 ] Hudson commented on MAPREDUCE-3596: --- Integrated in Hadoop-Common-trunk-Commit #1539 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1539/]) MAPREDUCE-3596. Fix scheduler to handle cleaned up containers, which NMs may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli) sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231297 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java Sort benchmark got hang after completion of 99% map phase - Key: MAPREDUCE-3596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE-3596-20120111.1.txt, 
MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2 Courtesy [~vinaythota] {quote} Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks. Cluster size is 350 nodes. Build Details: == Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common ResourceManager version:revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011 {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185872#comment-13185872 ] Hudson commented on MAPREDUCE-3596: --- Integrated in Hadoop-Common-0.23-Commit #373 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/373/]) merge MAPREDUCE-3596 from trunk. Fix scheduler to handle cleaned up containers, which NMs may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli) sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231303 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java Sort benchmark got hang after completion of 99% map phase - Key: MAPREDUCE-3596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596 Project: Hadoop 
Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE-3596-20120111.1.txt, MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2 Courtesy [~vinaythota] {quote} Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks. Cluster size is 350 nodes. Build Details: == Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common ResourceManager version:revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011 {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information
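To illustrate the scheduler-side fix described in the commit message above, here is a rough sketch (hypothetical classes, not the actual YARN scheduler or RMNode code): once the RM has asked an NM to clean up a container, later NM heartbeats may still list that container as RUNNING for a while, and the scheduler should ignore such stale reports instead of treating the container as live again.
{code}
// Hypothetical illustration only (not the actual YARN code): tracking containers the
// RM has already released so stale RUNNING reports from the NM are ignored.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainerStatusReport {
    final String containerId;
    final String state; // e.g. "RUNNING", "COMPLETE"
    ContainerStatusReport(String containerId, String state) {
        this.containerId = containerId;
        this.state = state;
    }
}

public class CleanedContainerTracker {
    // Containers the RM has already asked the NM to clean up.
    private final Set<String> cleanedUp = new HashSet<>();

    public void markCleanedUp(String containerId) {
        cleanedUp.add(containerId);
    }

    // Process an NM heartbeat: drop reports for containers already cleaned up, and
    // forget a container once the NM finally reports it COMPLETE.
    public void onNodeHeartbeat(List<ContainerStatusReport> reports) {
        for (ContainerStatusReport r : reports) {
            if (cleanedUp.contains(r.containerId)) {
                if ("COMPLETE".equals(r.state)) {
                    cleanedUp.remove(r.containerId); // cleanup acknowledged by the NM
                }
                continue; // stale RUNNING report for a released container: ignore
            }
            // ... normal scheduler handling for live containers would go here ...
        }
    }
}
{code}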
[jira] [Commented] (MAPREDUCE-3669) Getting a lot of PriviledgedActionException / SaslException when running a job
[ https://issues.apache.org/jira/browse/MAPREDUCE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185871#comment-13185871 ] Mahadev konar commented on MAPREDUCE-3669: -- Ok, I think I have figured out what the issue is, even though I cannot reproduce it. The service classloading that we do is causing the issue. For MRClientProtocol, we have two SecurityInfo classes, HSSecurityInfo and MRClientSecurityInfo. Depending on which class is loaded first, something will break: either talking to the HS or talking to the AM. This was working until now because HSSecurityInfo worked only for Kerberos and MRClientSecurityInfo only for tokens. After I added tokens to HSSecurityInfo, this became an issue. Getting a lot of PriviledgedActionException / SaslException when running a job -- Key: MAPREDUCE-3669 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3669 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Thomas Graves Assignee: Mahadev konar Priority: Blocker On a secure cluster, when running a job we are seeing a lot of PriviledgedActionException / SaslExceptions. The job runs fine; it's just that the job client can't connect to the AM to get the progress information. It is in a very tight loop, retrying while getting the exceptions. A snippet of the client log is: 12/01/13 15:33:45 INFO security.SecurityUtil: Acquired token Ident: 00 1c 68 61 64 6f 6f 70 71 61 40 44 45 56 2e 59 47 52 49 44 2e 59 41 48 4f 4f 2e 43 4f 4d 08 6d 61 70 72 65 64 71 61 00 8a 01 34 d7 b3 ff f5 8a 01 34 fb c0 83 f5 08 02, Kind: HDFS_DELEGATION_TOKEN, Service: 10.10.10.10:8020 12/01/13 15:33:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 8 for user1 on 10.10.10.10:8020 12/01/13 15:33:45 INFO security.TokenCache: Got dt for hdfs://host1.domain.com:8020;uri=10.10.10.10:8020;t.service=10.10.10.10:8020 12/01/13 15:33:45 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used 12/01/13 15:33:45 INFO mapreduce.JobSubmitter: number of splits:2 12/01/13 15:33:45 INFO mapred.ResourceMgrDelegate: Submitted application application_1326410042859_0008 to ResourceManager at rmhost.domain/10.10.10.11:8040 12/01/13 15:33:45 INFO mapreduce.Job: Running job: job_1326410042859_0008 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job: rmhost.domain:8088/proxy/application_1326410042859_0008/ 12/01/13 15:33:52 ERROR security.UserGroupInformation: PriviledgedActionException as:us...@dev.ygrid.yahoo.com (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 12/01/13 15:33:52 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 12/01/13 15:33:52 ERROR security.UserGroupInformation: PriviledgedActionException as:us...@dev.ygrid.yahoo.com (auth:SIMPLE) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job: rmhost.domain:8088/proxy/application_1326410042859_0008/ -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
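To make the classloading-order problem described in the comment above concrete, here is a minimal hypothetical sketch (the interface and lookup are illustrative inventions, not Hadoop's actual SecurityInfo API): with two providers registered for the same protocol, whichever one is discovered first wins, so either the HS side or the AM side ends up with the wrong security metadata.
{code}
// Hypothetical sketch only: why provider discovery order matters when two
// security-info providers both claim the same protocol.
import java.util.ServiceLoader;

interface ProtocolSecurityInfo {
    boolean supports(Class<?> protocol);   // e.g. an MRClientProtocol-like interface
    String tokenKind(Class<?> protocol);   // token kind, or null if Kerberos-only
}

public class SecurityInfoLookup {
    public static ProtocolSecurityInfo lookup(Class<?> protocol) {
        // Returns the FIRST matching provider; with two providers registered for one
        // protocol the winner depends purely on load/registration order, which is the
        // fragility described in the comment above.
        for (ProtocolSecurityInfo info : ServiceLoader.load(ProtocolSecurityInfo.class)) {
            if (info.supports(protocol)) {
                return info;
            }
        }
        return null;
    }
}
{code}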
[jira] [Updated] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-3656: --- Resolution: Fixed Release Note: Fixed a race condition in MR AM which is failing the sort benchmark consistently. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this to trunk and merged it into branch-0.23. Thanks Sid! Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
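For readers unfamiliar with the InvalidStateTransitonException messages quoted in this issue, the following minimal sketch (not Hadoop's actual StateMachineFactory) shows how an event that has no registered transition for the current state surfaces as an "Invalid event: <event> at <state>" error of the kind seen in the failed AM logs.
{code}
// Minimal illustrative state machine, not the real YARN implementation.
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

enum AttemptState { ASSIGNED, RUNNING, SUCCEEDED, ERROR }
enum AttemptEvent { TA_ASSIGNED, TA_UPDATE, TA_DONE, JOB_COUNTER_UPDATE }

public class TinyStateMachine {
    private final Map<AttemptState, EnumSet<AttemptEvent>> legal =
        new EnumMap<>(AttemptState.class);
    private AttemptState current = AttemptState.ASSIGNED;

    public TinyStateMachine() {
        // TA_UPDATE is deliberately not legal in ASSIGNED, mirroring the quoted log.
        legal.put(AttemptState.ASSIGNED, EnumSet.of(AttemptEvent.TA_ASSIGNED));
        legal.put(AttemptState.RUNNING,
            EnumSet.of(AttemptEvent.TA_UPDATE, AttemptEvent.TA_DONE));
    }

    public void handle(AttemptEvent event) {
        EnumSet<AttemptEvent> allowed =
            legal.getOrDefault(current, EnumSet.noneOf(AttemptEvent.class));
        if (!allowed.contains(event)) {
            // A race that delivers an event in the wrong state ends up here.
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        // ... apply the registered transition here ...
    }
}
{code}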
[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185881#comment-13185881 ] Hudson commented on MAPREDUCE-3596: --- Integrated in Hadoop-Mapreduce-0.23-Commit #385 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/385/]) merge MAPREDUCE-3596 from trunk. Fix scheduler to handle cleaned up containers, which NMs may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli) sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231303 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java Sort benchmark got hang after completion of 99% map phase - Key: MAPREDUCE-3596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596 Project: Hadoop 
Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE-3596-20120111.1.txt, MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2 Courtesy [~vinaythota] {quote} Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks. Cluster size is 350 nodes. Build Details: == Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common ResourceManager version:revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011 {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185884#comment-13185884 ] Hudson commented on MAPREDUCE-3656: --- Integrated in Hadoop-Hdfs-0.23-Commit #364 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/364/]) MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort benchmark consistently. Contributed by Siddarth Seth. svn merge --ignore-ancestry -c 1231314 ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231316 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185886#comment-13185886 ] Hudson commented on MAPREDUCE-3656: --- Integrated in Hadoop-Common-trunk-Commit #1540 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1540/]) MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort benchmark consistently. Contributed by Siddarth Seth. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231314 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185885#comment-13185885 ] Hudson commented on MAPREDUCE-3656: --- Integrated in Hadoop-Common-0.23-Commit #374 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/374/]) MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort benchmark consistently. Contributed by Siddarth Seth. svn merge --ignore-ancestry -c 1231314 ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231316 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185887#comment-13185887 ] Hudson commented on MAPREDUCE-3656: --- Integrated in Hadoop-Hdfs-trunk-Commit #1613 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1613/]) MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort benchmark consistently. Contributed by Siddarth Seth. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231314 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned MAPREDUCE-3532: -- Assignee: Bhallamudi Venkata Siva Kamesh When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as number in yarn.nodemanager.webapp.address. NM instantiate WebServer as 0 piort e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that WebServer pick up some random port e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} And NM WebServer responds correctly but RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} Whereas NM:0 is not clickable. Seems even NM's webserver pick random port but it never gets updated and so NM report 0 as HTTP port to RM causing NM Hyperlinks un-clickable But verified that MR job runs successfully with random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand
[ https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185893#comment-13185893 ] Siddharth Seth commented on MAPREDUCE-3614: --- For the job history related changes - do we want SIGTERM jobs to show up as KILLED ? In that case the proposed change to the shutdown hook will be required. Otherwise another possibility would be to ensure JobHistoryEventHandler.stop() calls / has already called closeEventWriter() - which is what takes care of moving the history file to the correct location. finalState UNDEFINED if AM is killed by hand Key: MAPREDUCE-3614 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: MAPREDUCE-3614.branch-0.23.patch Courtesy [~dcapwell] {quote} If the AM is running and you kill the process (sudo kill #pid), the State in Yarn would be FINISHED and FinalStatus is UNDEFINED. The Tracking UI would say History and point to the proxy url (which will redirect to the history server). The state should be more descriptive that the job failed and the tracker url shouldn't point to the history server. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
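A rough sketch of the shutdown-hook option mentioned in the comment above (HistoryWriter is an invented stand-in, not JobHistoryEventHandler's real API): a hook installed on the AM's JVM runs on SIGTERM, records a terminal state and closes the history file, so a hand-killed AM does not leave the job with an UNDEFINED final status.
{code}
// Hypothetical sketch only; the interface below is illustrative.
public class HistoryShutdownHookExample {

    interface HistoryWriter {
        void writeFinalState(String state); // e.g. "KILLED"
        void close();                       // flush and move the history file to its final location
    }

    public static void installHook(final HistoryWriter writer) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                // Runs on normal exit and on SIGTERM (but not on SIGKILL), so a
                // hand-killed AM still leaves a well-formed history file behind.
                writer.writeFinalState("KILLED");
                writer.close();
            }
        });
    }
}
{code}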
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185900#comment-13185900 ] Hudson commented on MAPREDUCE-3656: --- Integrated in Hadoop-Mapreduce-0.23-Commit #386 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/386/]) MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort benchmark consistently. Contributed by Siddarth Seth. svn merge --ignore-ancestry -c 1231314 ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231316 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185901#comment-13185901 ] Hudson commented on MAPREDUCE-3596: --- Integrated in Hadoop-Mapreduce-trunk-Commit #1557 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1557/]) MAPREDUCE-3596. Fix scheduler to handle cleaned up containers, which NMs may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli) sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231297 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java Sort benchmark got hang after completion of 99% map phase - Key: MAPREDUCE-3596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE-3596-20120111.1.txt, 
MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2 Courtesy [~vinaythota] {quote} Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks. Cluster size is 350 nodes. Build Details: == Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common ResourceManager version:revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011 {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-3532: --- Fix Version/s: 0.23.1 Status: Open (was: Patch Available) I looked through the patch. Looks good. +1. When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as number in yarn.nodemanager.webapp.address. NM instantiate WebServer as 0 piort e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that WebServer pick up some random port e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} And NM WebServer responds correctly but RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} Whereas NM:0 is not clickable. Seems even NM's webserver pick random port but it never gets updated and so NM report 0 as HTTP port to RM causing NM Hyperlinks un-clickable But verified that MR job runs successfully with random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved MAPREDUCE-3532. Resolution: Fixed Release Note: Modified NM to report correct http address when an ephemeral web port is configured. Hadoop Flags: Reviewed I just committed this to trunk and branch-0.23. Thanks Kamesh! When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as number in yarn.nodemanager.webapp.address. NM instantiate WebServer as 0 piort e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that WebServer pick up some random port e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} And NM WebServer responds correctly but RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} Whereas NM:0 is not clickable. Seems even NM's webserver pick random port but it never gets updated and so NM report 0 as HTTP port to RM causing NM Hyperlinks un-clickable But verified that MR job runs successfully with random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
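The general pattern behind this fix, sketched with plain java.net classes rather than the actual NM WebServer code: when the configured port is 0 the OS assigns an ephemeral port, so the component has to read the bound port back after starting and report that value instead of the configured 0.
{code}
// Illustrative only; not the NM WebServer patch itself.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class EphemeralPortExample {
    public static void main(String[] args) throws IOException {
        int configuredPort = 0; // as in yarn.nodemanager.webapp.address=0.0.0.0:0
        try (ServerSocket server = new ServerSocket()) {
            server.bind(new InetSocketAddress("0.0.0.0", configuredPort));
            // Report the real port (e.g. 36272), not the configured 0, to the RM.
            int actualPort = server.getLocalPort();
            System.out.println("Bound to ephemeral port " + actualPort);
        }
    }
}
{code}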
[jira] [Created] (MAPREDUCE-3670) TaskAttemptListener should respond with errors to unregistered tasks
TaskAttemptListener should respond with errors to unregistered tasks Key: MAPREDUCE-3670 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3670 Project: Hadoop Map/Reduce Issue Type: Task Components: mr-am, mrv2 Affects Versions: 0.23.0 Reporter: Siddharth Seth The TaskAttemptListener currently accepts TaskUmbilical calls from tasks which may have already been unregistered and processes updates. It should just send back Exceptions so that the tasks die. This isn't critical though - since the task/container would eventually be killed by the AM (via a call to NM stopContainer). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
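A hypothetical sketch of the behaviour requested here (the class and method names are illustrative, not the real TaskUmbilicalProtocol): the listener keeps a set of registered attempt IDs and answers calls from unknown or already unregistered attempts with an exception, so the task dies instead of having its updates silently processed.
{code}
// Illustrative registry; not the actual TaskAttemptListener implementation.
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AttemptRegistry {
    private final Set<String> registeredAttempts = ConcurrentHashMap.newKeySet();

    public void register(String attemptId)   { registeredAttempts.add(attemptId); }
    public void unregister(String attemptId) { registeredAttempts.remove(attemptId); }

    // Would be called at the start of every umbilical method (statusUpdate, done, ...).
    public void checkRegistered(String attemptId) throws IOException {
        if (!registeredAttempts.contains(attemptId)) {
            throw new IOException("Unknown or unregistered task attempt: " + attemptId);
        }
    }
}
{code}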
[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185928#comment-13185928 ] Hudson commented on MAPREDUCE-3532: --- Integrated in Hadoop-Common-trunk-Commit #1541 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1541/]) MAPREDUCE-3532. Modified NM to report correct http address when an ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231342 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as number in yarn.nodemanager.webapp.address. NM instantiate WebServer as 0 piort e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that WebServer pick up some random port e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} And NM WebServer responds correctly but RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} Whereas NM:0 is not clickable. Seems even NM's webserver pick random port but it never gets updated and so NM report 0 as HTTP port to RM causing NM Hyperlinks un-clickable But verified that MR job runs successfully with random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185927#comment-13185927 ] Hudson commented on MAPREDUCE-3532: --- Integrated in Hadoop-Common-0.23-Commit #375 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/375/]) MAPREDUCE-3532. Modified NM to report correct http address when an ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh. svn merge --ignore-ancestry -c 1231342 ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231344 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as number in yarn.nodemanager.webapp.address. NM instantiate WebServer as 0 piort e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that WebServer pick up some random port e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} And NM WebServer responds correctly but RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} Whereas NM:0 is not clickable. Seems even NM's webserver pick random port but it never gets updated and so NM report 0 as HTTP port to RM causing NM Hyperlinks un-clickable But verified that MR job runs successfully with random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3671) AM-NM RPC calls occasionally takes a long time to respond
AM-NM RPC calls occasionally takes a long time to respond - Key: MAPREDUCE-3671 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3671 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.0 Reporter: Siddharth Seth Observed while looking at MAPREDUCE-3596 and MAPREDUCE-3656. startContainer took over a minute in some cases, and 15 seconds otherwise. Both were observed soon after reduce tasks started. Network congestion? Needs more investigation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
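As a small illustrative aid for investigating delays like this (not part of YARN itself), a caller could wrap suspect RPCs such as startContainer with a timing helper and log calls that exceed a threshold.
{code}
// Illustrative helper only.
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class SlowCallLogger {
    public static <T> T timed(String name, long warnMillis, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMs > warnMillis) {
                // Record slow calls so the occasional minute-long outlier is visible.
                System.err.println(name + " took " + elapsedMs + " ms");
            }
        }
    }
}
{code}
Usage might look like timed("startContainer", 15000, () -> proxy.startContainer(request)), where proxy and request are hypothetical names for whatever the caller already holds.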
[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185931#comment-13185931 ] Hudson commented on MAPREDUCE-3532: --- Integrated in Hadoop-Hdfs-0.23-Commit #365 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/365/]) MAPREDUCE-3532. Modified NM to report correct http address when an ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh. svn merge --ignore-ancestry -c 1231342 ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231344 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as number in yarn.nodemanager.webapp.address. NM instantiate WebServer as 0 piort e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that WebServer pick up some random port e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} And NM WebServer responds correctly but RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} Whereas NM:0 is not clickable. Seems even NM's webserver pick random port but it never gets updated and so NM report 0 as HTTP port to RM causing NM Hyperlinks un-clickable But verified that MR job runs successfully with random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185932#comment-13185932 ] Hudson commented on MAPREDUCE-3532: --- Integrated in Hadoop-Hdfs-trunk-Commit #1614 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1614/]) MAPREDUCE-3532. Modified NM to report correct http address when an ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231342 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as number in yarn.nodemanager.webapp.address. NM instantiate WebServer as 0 piort e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that WebServer pick up some random port e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} And NM WebServer responds correctly but RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} Whereas NM:0 is not clickable. Seems even NM's webserver pick random port but it never gets updated and so NM report 0 as HTTP port to RM causing NM Hyperlinks un-clickable But verified that MR job runs successfully with random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code
[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185938#comment-13185938 ] Hudson commented on MAPREDUCE-3656: --- Integrated in Hadoop-Mapreduce-trunk-Commit #1558 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1558/]) MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort benchmark consistently. Contributed by Siddarth Seth. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231314 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java Sort job on 350 scale is consistently failing with latest MRV2 code Key: MAPREDUCE-3656 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Siddharth Seth Priority: Blocker Fix For: 0.23.1 Attachments: MR3656.txt, MR3656.txt, MR3656.txt With the code checked out on last two days. Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs When around 50% of maps are completed, suddenly job jumps to failed state. On looking at NM log, found RM sent Stop Container Request to NM for AM container. But at INFO level from RM log not able find why RM is killing AM when job is not killed manually. One thing found common on failed AM logs is -: org.apache.hadoop.yarn.state.InvalidStateTransitonException With with different. For e.g. One log says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED {code} Whereas other logs says -: {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185945#comment-13185945 ] Hudson commented on MAPREDUCE-3532: --- Integrated in Hadoop-Mapreduce-0.23-Commit #387 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/387/]) MAPREDUCE-3532. Modified NM to report correct http address when an ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh. svn merge --ignore-ancestry -c 1231342 ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231344 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried the following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as the port number in yarn.nodemanager.webapp.address, the NM instantiates the WebServer with port 0, e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that the WebServer picks up some random port, e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} The NM WebServer then responds correctly, but the RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} and NM:0 is not clickable. It seems that even though the NM's webserver picks a random port, the reported port never gets updated, so the NM reports 0 as the HTTP port to the RM, leaving the NM hyperlinks un-clickable. Verified, though, that the MR job runs successfully with the random port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185979#comment-13185979 ] Hudson commented on MAPREDUCE-3532: --- Integrated in Hadoop-Mapreduce-trunk-Commit #1559 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1559/]) MAPREDUCE-3532. Modified NM to report correct http address when an ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231342 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM -- Key: MAPREDUCE-3532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1 Reporter: Karam Singh Assignee: Bhallamudi Venkata Siva Kamesh Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch I tried the following -: yarn.nodemanager.address=0.0.0.0:0 yarn.nodemanager.webapp.address=0.0.0.0:0 yarn.nodemanager.localizer.address=0.0.0.0:0 mapreduce.shuffle.port=0 When 0 is provided as the port number in yarn.nodemanager.webapp.address, the NM instantiates the WebServer with port 0, e.g. {code} 2011-12-08 11:33:02,467 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:0 {code} After that the WebServer picks up some random port, e.g. {code} 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 36272 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:36272 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 36272 {code} The NM WebServer then responds correctly, but the RM's cluster/Nodes page shows the following -: {code} /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB {code} and NM:0 is not clickable. It seems that even though the NM's webserver picks a random port, the reported port never gets updated, so the NM reports 0 as the HTTP port to the RM, leaving the NM hyperlinks un-clickable. Verified, though, that the MR job runs successfully with the random port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false
[ https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185986#comment-13185986 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-3404: bq. How do we make sure that if mapreduce.job.maps.speculative=false and mapreduce.job.reduce.speculative=true, the maps don't get speculated and the reduces do get speculated? The speculator handles map and reduce speculation separately. I just looked at the patch, and it achieves the above by not sending any map events to the speculator when map-speculation is disabled. The speculator doesn't find any maps to speculate (as it doesn't know about any maps at all) and so only speculates reduces. It works in (IMO) a convoluted way, but I can live with that. +1 for the patch. Pushing this in. Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false --- Key: MAPREDUCE-3404 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, mrv2 Affects Versions: 0.23.0 Environment: Hadoop version is: Hadoop 0.23.0.1110031628 10 node test cluster Reporter: patrick white Assignee: Eric Payne Priority: Critical Fix For: 0.23.0, 0.23.1, 0.24.0 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt When forcing a mapper to take significantly longer than other map tasks, speculative map tasks are launched even if the mapreduce.job.maps.speculative.execution parameter is set to 'false'. Testcase: ran default WordCount job with spec execution set to false for both map and reduce but still saw a fifth mapper task launch, ran job as follows: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=false -Dmapreduce.job.reduces.speculative.execution=false /tmp/test_file_of_words* /tmp/file_of_words.out Input data was 4 text files hdfs blocksize, with same word pattern plus one diff text line in each file, fourth file was 4 times as large as others: hadoop --config config fs -ls /tmp Found 5 items drwxr-xr-x - user hdfs 0 2011-10-20 16:17 /tmp/file_of_words.out -rw-r--r-- 3 user hdfs 62800021 2011-10-20 14:45 /tmp/test_file_of_words1 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words2 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words3 -rw-r--r-- 3 user hdfs 271708312 2011-10-20 15:50 /tmp/test_file_of_words4 Job launched 5 mappers despite spec exec set to false, output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=273540 SLOTS_MILLIS_REDUCES=212876 Reran same case as above only set both spec exec params to 'true', same results only this time the fifth task being launched is expected since spec exec = true. job run: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=true -Dmapreduce.job.reduces.speculative.execution=true /tmp/test_file_of_words* /tmp/file_of_words.out output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=279653 SLOTS_MILLIS_REDUCES=212876 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
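The gating described in the comment above can be sketched as follows; the class and method names here are illustrative stand-ins, not the real MR AM speculator or dispatcher types. When speculation is off for a task kind, events for that kind are simply never forwarded, so the speculator has nothing to speculate on that side.
{code}
// Sketch of gating events to the speculator: the dispatcher forwards map-side
// events only when map speculation is enabled, and reduce-side events only when
// reduce speculation is enabled. All names here are illustrative.
public class SpeculatorGateSketch {
    enum TaskKind { MAP, REDUCE }

    interface Speculator { void handle(TaskKind kind, String attemptId); }

    private final boolean mapSpeculationEnabled;
    private final boolean reduceSpeculationEnabled;
    private final Speculator speculator;

    SpeculatorGateSketch(boolean maps, boolean reduces, Speculator speculator) {
        this.mapSpeculationEnabled = maps;
        this.reduceSpeculationEnabled = reduces;
        this.speculator = speculator;
    }

    // Forward an attempt status update only if speculation is on for that task kind.
    void onAttemptStatusUpdate(TaskKind kind, String attemptId) {
        boolean enabled = (kind == TaskKind.MAP) ? mapSpeculationEnabled : reduceSpeculationEnabled;
        if (enabled) {
            speculator.handle(kind, attemptId);
        }
        // When disabled, the speculator never learns about these attempts,
        // so it has nothing to speculate for that kind.
    }
}
{code}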
[jira] [Updated] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false
[ https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-3404: --- Resolution: Fixed Fix Version/s: (was: 0.23.0) Release Note: Corrected MR AM to honor speculative configuration and enable speculating either maps or reduces. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) ..oh, and the tests look good too. Just committed this to trunk and branch-0.23. Thanks Eric! On a side note, not caused by this patch, it is not correct that we increment the num_failed_maps counter when the speculation kills a task. Instead we should have a num_killed_maps. Separate issue, will file a ticket. Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false --- Key: MAPREDUCE-3404 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, mrv2 Affects Versions: 0.23.0 Environment: Hadoop version is: Hadoop 0.23.0.1110031628 10 node test cluster Reporter: patrick white Assignee: Eric Payne Priority: Critical Fix For: 0.23.1, 0.24.0 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt When forcing a mapper to take significantly longer than other map tasks, speculative map tasks are launched even if the mapreduce.job.maps.speculative.execution parameter is set to 'false'. Testcase: ran default WordCount job with spec execution set to false for both map and reduce but still saw a fifth mapper task launch, ran job as follows: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=false -Dmapreduce.job.reduces.speculative.execution=false /tmp/test_file_of_words* /tmp/file_of_words.out Input data was 4 text files hdfs blocksize, with same word pattern plus one diff text line in each file, fourth file was 4 times as large as others: hadoop --config config fs -ls /tmp Found 5 items drwxr-xr-x - user hdfs 0 2011-10-20 16:17 /tmp/file_of_words.out -rw-r--r-- 3 user hdfs 62800021 2011-10-20 14:45 /tmp/test_file_of_words1 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words2 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words3 -rw-r--r-- 3 user hdfs 271708312 2011-10-20 15:50 /tmp/test_file_of_words4 Job launched 5 mappers despite spec exec set to false, output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=273540 SLOTS_MILLIS_REDUCES=212876 Reran same case as above only set both spec exec params to 'true', same results only this time the fifth task being launched is expected since spec exec = true. job run: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=true -Dmapreduce.job.reduces.speculative.execution=true /tmp/test_file_of_words* /tmp/file_of_words.out output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=279653 SLOTS_MILLIS_REDUCES=211474 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
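For jobs submitted programmatically, the same behaviour can be requested through the Job API rather than -D properties, which sidesteps the property-name differences visible between this issue's title and its description. A small sketch, assuming the standard org.apache.hadoop.mapreduce.Job setters; the job name is illustrative.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: turn speculation off for maps and on for reduces via the Job API,
// so the caller does not have to remember the exact property names.
public class SpeculationConfigSketch {
    public static Job newJob() throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("wordcount-no-map-speculation"); // illustrative name
        job.setMapSpeculativeExecution(false);   // maps must not be speculated
        job.setReduceSpeculativeExecution(true); // reduces may be speculated
        return job;
    }
}
{code}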
[jira] [Updated] (MAPREDUCE-3672) Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS
[ https://issues.apache.org/jira/browse/MAPREDUCE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-3672: --- Description: We count maps that are killed, say by speculator, towards JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for killed maps. Same with reduces too. was: We counted maps that are killed, say by speculator, towards JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for killed maps. Same with reduces too. Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS --- Key: MAPREDUCE-3672 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3672 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Fix For: 0.23.1 We count maps that are killed, say by speculator, towards JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for killed maps. Same with reduces too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3672) Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS
Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS --- Key: MAPREDUCE-3672 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3672 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Fix For: 0.23.1 We counted maps that are killed, say by speculator, towards JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for killed maps. Same with reduces too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
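The proposed split between failed and killed attempts amounts to routing the two completion states to two different counters. A toy sketch of that accounting follows; the counter names are illustrative, since a NUM_KILLED_MAPS entry did not exist in JobCounter when this issue was filed.
{code}
import java.util.EnumMap;
import java.util.Map;

// Sketch of the accounting change proposed in MAPREDUCE-3672: attempts killed
// (e.g. by the speculator) bump a "killed" counter rather than the "failed" one.
// Counter names are illustrative, not the actual JobCounter enum.
public class AttemptCounterSketch {
    enum CompletionState { SUCCEEDED, FAILED, KILLED }
    enum Counter { NUM_FAILED_MAPS, NUM_KILLED_MAPS }

    private final Map<Counter, Long> counters = new EnumMap<>(Counter.class);

    void onMapAttemptCompleted(CompletionState state) {
        if (state == CompletionState.FAILED) {
            increment(Counter.NUM_FAILED_MAPS);   // genuine failures only
        } else if (state == CompletionState.KILLED) {
            increment(Counter.NUM_KILLED_MAPS);   // speculative kills land here
        }
        // Successful attempts touch neither counter.
    }

    private void increment(Counter c) {
        Long current = counters.get(c);
        counters.put(c, current == null ? 1L : current + 1L);
    }
}
{code}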
[jira] [Commented] (MAPREDUCE-3671) AM-NM RPC calls occasionally takes a long time to respond
[ https://issues.apache.org/jira/browse/MAPREDUCE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185993#comment-13185993 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-3671: .. And this has performance implications for benchmarks. It hurts execution time really badly when there is no speculation for jobs with small tasks, which is the default. AM-NM RPC calls occasionally takes a long time to respond - Key: MAPREDUCE-3671 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3671 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.0 Reporter: Siddharth Seth Observed while looking at MAPREDUCE-3596 and MAPREDUCE-3656. startContainer took over a minute in some cases, otherwise 15 seconds. Both were observed soon after reduce tasks started. Network congestion? Needs more looking into. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
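One way to gather data on the intermittent slow calls described here is to time the blocking RPC on the caller side and log anything above a threshold. A generic sketch follows; no real YARN client API is used, and the RPC is represented abstractly as a Callable.
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Sketch: wrap a blocking RPC (such as the AM's startContainer call) with a
// timer and flag calls that exceed a threshold, to quantify the intermittent
// long responses described in this issue.
public class RpcLatencyProbe {
    private static final long SLOW_THRESHOLD_MS = 15000; // from the observations above

    public static <T> T timed(String name, Callable<T> rpc) throws Exception {
        long start = System.nanoTime();
        try {
            return rpc.call();
        } finally {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMs >= SLOW_THRESHOLD_MS) {
                System.err.println(name + " took " + elapsedMs + " ms (slow)");
            }
        }
    }
}
{code}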
[jira] [Updated] (MAPREDUCE-3671) AM-NM RPC calls occasionally takes a long time to respond
[ https://issues.apache.org/jira/browse/MAPREDUCE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-3671: --- Issue Type: Sub-task (was: Bug) Parent: MAPREDUCE-3561 AM-NM RPC calls occasionally takes a long time to respond - Key: MAPREDUCE-3671 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3671 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: mrv2, nodemanager Affects Versions: 0.23.0 Reporter: Siddharth Seth Observed while looking at MAPREDUCE-3596 and MAPREDUCE-3656. startContainer taking over a minute in some cases, otherwise 15 seconds. Both were observed soon after reduce tasks started. Network congestion ? Need more looking into. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false
[ https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185996#comment-13185996 ] Hudson commented on MAPREDUCE-3404: --- Integrated in Hadoop-Hdfs-0.23-Commit #366 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/366/]) MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and enable speculating either maps or reduces. Contributed by Eric Payne. svn merge --ignore-ancestry -c 1231395 ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231397 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false --- Key: MAPREDUCE-3404 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, mrv2 Affects Versions: 0.23.0 Environment: Hadoop version is: Hadoop 0.23.0.1110031628 10 node test cluster Reporter: patrick white Assignee: Eric Payne Priority: Critical Fix For: 0.23.1, 0.24.0 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt When forcing a mapper to take significantly longer than other map tasks, speculative map tasks are launched even if the mapreduce.job.maps.speculative.execution parameter is set to 'false'. Testcase: ran default WordCount job with spec execution set to false for both map and reduce but still saw a fifth mapper task launch, ran job as follows: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=false -Dmapreduce.job.reduces.speculative.execution=false /tmp/test_file_of_words* /tmp/file_of_words.out Input data was 4 text files hdfs blocksize, with same word pattern plus one diff text line in each file, fourth file was 4 times as large as others: hadoop --config config fs -ls /tmp Found 5 items drwxr-xr-x - user hdfs 0 2011-10-20 16:17 /tmp/file_of_words.out -rw-r--r-- 3 user hdfs 62800021 2011-10-20 14:45 /tmp/test_file_of_words1 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words2 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words3 -rw-r--r-- 3 user hdfs 271708312 2011-10-20 15:50 /tmp/test_file_of_words4 Job launched 5 mappers despite spec exec set to false, output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=273540 SLOTS_MILLIS_REDUCES=212876 Reran same case as above only set both spec exec params to 'true', same results only this time the fifth task being launched is expected since spec exec = true. job run: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=true -Dmapreduce.job.reduces.speculative.execution=true /tmp/test_file_of_words* /tmp/file_of_words.out output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=279653 SLOTS_MILLIS_REDUCES=211474 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false
[ https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185997#comment-13185997 ] Hudson commented on MAPREDUCE-3404: --- Integrated in Hadoop-Hdfs-trunk-Commit #1615 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1615/]) MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and enable speculating either maps or reduces. Contributed by Eric Payne. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231395 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false --- Key: MAPREDUCE-3404 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, mrv2 Affects Versions: 0.23.0 Environment: Hadoop version is: Hadoop 0.23.0.1110031628 10 node test cluster Reporter: patrick white Assignee: Eric Payne Priority: Critical Fix For: 0.23.1, 0.24.0 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt When forcing a mapper to take significantly longer than other map tasks, speculative map tasks are launched even if the mapreduce.job.maps.speculative.execution parameter is set to 'false'. Testcase: ran default WordCount job with spec execution set to false for both map and reduce but still saw a fifth mapper task launch, ran job as follows: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=false -Dmapreduce.job.reduces.speculative.execution=false /tmp/test_file_of_words* /tmp/file_of_words.out Input data was 4 text files hdfs blocksize, with same word pattern plus one diff text line in each file, fourth file was 4 times as large as others: hadoop --config config fs -ls /tmp Found 5 items drwxr-xr-x - user hdfs 0 2011-10-20 16:17 /tmp/file_of_words.out -rw-r--r-- 3 user hdfs 62800021 2011-10-20 14:45 /tmp/test_file_of_words1 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words2 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words3 -rw-r--r-- 3 user hdfs 271708312 2011-10-20 15:50 /tmp/test_file_of_words4 Job launched 5 mappers despite spec exec set to false, output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=273540 SLOTS_MILLIS_REDUCES=212876 Reran same case as above only set both spec exec params to 'true', same results only this time the fifth task being launched is expected since spec exec = true. job run: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=true -Dmapreduce.job.reduces.speculative.execution=true /tmp/test_file_of_words* /tmp/file_of_words.out output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=279653 SLOTS_MILLIS_REDUCES=211474 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false
[ https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185999#comment-13185999 ] Hudson commented on MAPREDUCE-3404: --- Integrated in Hadoop-Common-trunk-Commit #1542 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1542/]) MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and enable speculating either maps or reduces. Contributed by Eric Payne. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231395 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false --- Key: MAPREDUCE-3404 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, mrv2 Affects Versions: 0.23.0 Environment: Hadoop version is: Hadoop 0.23.0.1110031628 10 node test cluster Reporter: patrick white Assignee: Eric Payne Priority: Critical Fix For: 0.23.1, 0.24.0 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt When forcing a mapper to take significantly longer than other map tasks, speculative map tasks are launched even if the mapreduce.job.maps.speculative.execution parameter is set to 'false'. Testcase: ran default WordCount job with spec execution set to false for both map and reduce but still saw a fifth mapper task launch, ran job as follows: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=false -Dmapreduce.job.reduces.speculative.execution=false /tmp/test_file_of_words* /tmp/file_of_words.out Input data was 4 text files hdfs blocksize, with same word pattern plus one diff text line in each file, fourth file was 4 times as large as others: hadoop --config config fs -ls /tmp Found 5 items drwxr-xr-x - user hdfs 0 2011-10-20 16:17 /tmp/file_of_words.out -rw-r--r-- 3 user hdfs 62800021 2011-10-20 14:45 /tmp/test_file_of_words1 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words2 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words3 -rw-r--r-- 3 user hdfs 271708312 2011-10-20 15:50 /tmp/test_file_of_words4 Job launched 5 mappers despite spec exec set to false, output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=273540 SLOTS_MILLIS_REDUCES=212876 Reran same case as above only set both spec exec params to 'true', same results only this time the fifth task being launched is expected since spec exec = true. job run: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=true -Dmapreduce.job.reduces.speculative.execution=true /tmp/test_file_of_words* /tmp/file_of_words.out output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=279653 SLOTS_MILLIS_REDUCES=211474 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false
[ https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186011#comment-13186011 ] Hudson commented on MAPREDUCE-3404: --- Integrated in Hadoop-Mapreduce-0.23-Commit #388 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/388/]) MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and enable speculating either maps or reduces. Contributed by Eric Payne. svn merge --ignore-ancestry -c 1231395 ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231397 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false --- Key: MAPREDUCE-3404 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, mrv2 Affects Versions: 0.23.0 Environment: Hadoop version is: Hadoop 0.23.0.1110031628 10 node test cluster Reporter: patrick white Assignee: Eric Payne Priority: Critical Fix For: 0.23.1, 0.24.0 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt When forcing a mapper to take significantly longer than other map tasks, speculative map tasks are launched even if the mapreduce.job.maps.speculative.execution parameter is set to 'false'. Testcase: ran default WordCount job with spec execution set to false for both map and reduce but still saw a fifth mapper task launch, ran job as follows: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=false -Dmapreduce.job.reduces.speculative.execution=false /tmp/test_file_of_words* /tmp/file_of_words.out Input data was 4 text files hdfs blocksize, with same word pattern plus one diff text line in each file, fourth file was 4 times as large as others: hadoop --config config fs -ls /tmp Found 5 items drwxr-xr-x - user hdfs 0 2011-10-20 16:17 /tmp/file_of_words.out -rw-r--r-- 3 user hdfs 62800021 2011-10-20 14:45 /tmp/test_file_of_words1 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words2 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words3 -rw-r--r-- 3 user hdfs 271708312 2011-10-20 15:50 /tmp/test_file_of_words4 Job launched 5 mappers despite spec exec set to false, output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=273540 SLOTS_MILLIS_REDUCES=212876 Reran same case as above only set both spec exec params to 'true', same results only this time the fifth task being launched is expected since spec exec = true. job run: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=true -Dmapreduce.job.reduces.speculative.execution=true /tmp/test_file_of_words* /tmp/file_of_words.out output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=279653 SLOTS_MILLIS_REDUCES=211474 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false
[ https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186024#comment-13186024 ] Hudson commented on MAPREDUCE-3404: --- Integrated in Hadoop-Mapreduce-trunk-Commit #1560 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1560/]) MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and enable speculating either maps or reduces. Contributed by Eric Payne. vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1231395 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false --- Key: MAPREDUCE-3404 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, mrv2 Affects Versions: 0.23.0 Environment: Hadoop version is: Hadoop 0.23.0.1110031628 10 node test cluster Reporter: patrick white Assignee: Eric Payne Priority: Critical Fix For: 0.23.1, 0.24.0 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt When forcing a mapper to take significantly longer than other map tasks, speculative map tasks are launched even if the mapreduce.job.maps.speculative.execution parameter is set to 'false'. Testcase: ran default WordCount job with spec execution set to false for both map and reduce but still saw a fifth mapper task launch, ran job as follows: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=false -Dmapreduce.job.reduces.speculative.execution=false /tmp/test_file_of_words* /tmp/file_of_words.out Input data was 4 text files hdfs blocksize, with same word pattern plus one diff text line in each file, fourth file was 4 times as large as others: hadoop --config config fs -ls /tmp Found 5 items drwxr-xr-x - user hdfs 0 2011-10-20 16:17 /tmp/file_of_words.out -rw-r--r-- 3 user hdfs 62800021 2011-10-20 14:45 /tmp/test_file_of_words1 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words2 -rw-r--r-- 3 user hdfs 62800024 2011-10-20 14:46 /tmp/test_file_of_words3 -rw-r--r-- 3 user hdfs 271708312 2011-10-20 15:50 /tmp/test_file_of_words4 Job launched 5 mappers despite spec exec set to false, output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=273540 SLOTS_MILLIS_REDUCES=212876 Reran same case as above only set both spec exec params to 'true', same results only this time the fifth task being launched is expected since spec exec = true. job run: hadoop --config config jar /tmp/testphw/wordcount.jar WordCount -Dmapreduce.job.maps.speculative.execution=true -Dmapreduce.job.reduces.speculative.execution=true /tmp/test_file_of_words* /tmp/file_of_words.out output snippet: org.apache.hadoop.mapreduce.JobCounter NUM_FAILED_MAPS=1 TOTAL_LAUNCHED_MAPS=5 TOTAL_LAUNCHED_REDUCES=1 RACK_LOCAL_MAPS=5 SLOTS_MILLIS_MAPS=279653 SLOTS_MILLIS_REDUCES=211474 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira