[jira] [Updated] (MAPREDUCE-3597) Provide a way to access other info of history file from Rumentool

2012-01-13 Thread Ravi Gummadi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-3597:


Attachment: 3597.branch-1.v1.patch

Attaching patch for branch-1.

 Provide a way to access other info of history file from Rumentool
 -

 Key: MAPREDUCE-3597
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3597
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.24.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.24.0

 Attachments: 3597.branch-1.v1.patch, 3597.v0.patch, 3597.v1.patch


 Since the trace file generated by Rumen's TraceBuilder skips some of the 
 information, such as job counters and task counters, we need a way to access the 
 other information available in a history file that is not dumped to the trace 
 file. This is useful for components that want to parse history files and extract 
 information: such components can directly leverage Rumen's parsing of history 
 files across Hadoop releases and obtain history information in a consistent way 
 for further analysis/processing.
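
 A minimal sketch of the kind of access being asked for, shown here against the 
 stock org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser API rather than 
 Rumen itself (the history file path and the specific getters used are assumptions; 
 this is not Rumen's API and only illustrates the counters/task info the trace omits):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
 import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;

 public class HistoryInfoSketch {
   public static void main(String[] args) throws Exception {
     // Hypothetical history file location; in practice this comes from the job history dir.
     Path historyFile = new Path(args[0]);
     FileSystem fs = historyFile.getFileSystem(new Configuration());

     // Parse the full history file, including the counters that the Rumen trace drops.
     JobHistoryParser parser = new JobHistoryParser(fs, historyFile);
     JobInfo info = parser.parse();

     System.out.println("Job: " + info.getJobId());
     System.out.println("Job counters: " + info.getTotalCounters());
     System.out.println("Tasks parsed: " + info.getAllTasks().size());
   }
 }
 {code}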

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3664) HDFS Federation Documentation has incorrect configuration example

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185569#comment-13185569
 ] 

Hudson commented on MAPREDUCE-3664:
---

Integrated in Hadoop-Hdfs-trunk #924 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/924/])
MAPREDUCE-3664. Federation Documentation has incorrect configuration 
example. Contributed by Brandon Li.

jitendra : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230708
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/Federation.apt.vm


 HDFS Federation Documentation has incorrect configuration example
 -

 Key: MAPREDUCE-3664
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3664
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.23.0, 0.24.0
Reporter: praveen sripati
Priority: Minor
 Attachments: HDFS-2778.txt, HDFS-2778.txt


 The HDFS Federation documentation example (1) has the following:
 <property>
   <name>dfs.namenode.rpc-address.ns1</name>
   <value>hdfs://nn-host1:rpc-port</value>
 </property>
 dfs.namenode.rpc-address.* should be set to hostname:port; hdfs:// should not 
 be there.
 (1) - 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html
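
 For reference, the corrected property would presumably look like this (host and 
 port placeholders as in the documentation example):
 {code}
 <property>
   <name>dfs.namenode.rpc-address.ns1</name>
   <value>nn-host1:rpc-port</value>
 </property>
 {code}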

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3545) Remove Avro RPC

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185568#comment-13185568
 ] 

Hudson commented on MAPREDUCE-3545:
---

Integrated in Hadoop-Hdfs-trunk #924 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/924/])
Remove the empty avro directories for MAPREDUCE-3545.

szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230886
Files : 
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/avro
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/avro
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/avro
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/avro


 Remove Avro RPC
 ---

 Key: MAPREDUCE-3545
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3545
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.1
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.1, 0.24.0

 Attachments: MR-3545.txt


 Please see the discussion in HDFS-2660 for more details. I have created a 
 branch, HADOOP-6659, to preserve the Avro work in case someone wants to use the 
 existing work in the future to add support for Avro RPC.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3597) Provide a way to access other info of history file from Rumentool

2012-01-13 Thread Amar Kamat (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185576#comment-13185576
 ] 

Amar Kamat commented on MAPREDUCE-3597:
---

The patch looks good to me. It seems that branch-1 Rumen is aware of both pre- and 
post-0.21 changes. We need to be sure of the implications.

 Provide a way to access other info of history file from Rumentool
 -

 Key: MAPREDUCE-3597
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3597
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.24.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.24.0

 Attachments: 3597.branch-1.v1.patch, 3597.v0.patch, 3597.v1.patch


 Since the trace file generated by Rumen's TraceBuilder skips some of the 
 information, such as job counters and task counters, we need a way to access the 
 other information available in a history file that is not dumped to the trace 
 file. This is useful for components that want to parse history files and extract 
 information: such components can directly leverage Rumen's parsing of history 
 files across Hadoop releases and obtain history information in a consistent way 
 for further analysis/processing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3545) Remove Avro RPC

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185578#comment-13185578
 ] 

Hudson commented on MAPREDUCE-3545:
---

Integrated in Hadoop-Mapreduce-trunk #957 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/957/])
Remove the empty avro directories for MAPREDUCE-3545.

szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230886
Files : 
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/avro
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/avro
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/avro
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/avro


 Remove Avro RPC
 ---

 Key: MAPREDUCE-3545
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3545
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.1
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.1, 0.24.0

 Attachments: MR-3545.txt


 Please see the discussion in HDFS-2660 for more details. I have created a 
 branch, HADOOP-6659, to preserve the Avro work in case someone wants to use the 
 existing work in the future to add support for Avro RPC.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3664) HDFS Federation Documentation has incorrect configuration example

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185579#comment-13185579
 ] 

Hudson commented on MAPREDUCE-3664:
---

Integrated in Hadoop-Mapreduce-trunk #957 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/957/])
MAPREDUCE-3664. Federation Documentation has incorrect configuration 
example. Contributed by Brandon Li.

jitendra : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230708
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/Federation.apt.vm


 HDFS Federation Documentation has incorrect configuration example
 -

 Key: MAPREDUCE-3664
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3664
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.23.0, 0.24.0
Reporter: praveen sripati
Priority: Minor
 Attachments: HDFS-2778.txt, HDFS-2778.txt


 The HDFS Federation documentation example (1) has the following:
 <property>
   <name>dfs.namenode.rpc-address.ns1</name>
   <value>hdfs://nn-host1:rpc-port</value>
 </property>
 dfs.namenode.rpc-address.* should be set to hostname:port; hdfs:// should not 
 be there.
 (1) - 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3667) Gridmix jobs are failing with OOM in reduce shuffle phase.

2012-01-13 Thread Amol Kekre (Created) (JIRA)
Gridmix jobs are failing with OOM in reduce shuffle phase.
--

 Key: MAPREDUCE-3667
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3667
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Amol Kekre
Priority: Blocker
 Fix For: 0.23.1


Roll-up bug for the gridmix3 benchmark.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Siddharth Seth (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3656:
--

Status: Open  (was: Patch Available)

Cancelling the patch; it needs another fix. Sort completes with this patch + 
MR3596, but there are random map task failures. TaskAttemptListener should 
return a null JvmTask instead of a JvmTask with task=null.
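
For context, a rough sketch of the two return styles being contrasted, assuming the 
org.apache.hadoop.mapred.JvmTask(Task, boolean) constructor; the method shape is 
illustrative only and is not the actual TaskAttemptListener code:
{code}
import org.apache.hadoop.mapred.JvmTask;
import org.apache.hadoop.mapred.Task;

class JvmTaskReturnSketch {
  // What the comment says happens today: hand back a JvmTask whose wrapped task is null.
  JvmTask current(Task assigned) {
    return new JvmTask(assigned, false); // assigned may be null here
  }

  // What the comment asks for instead: return null when there is no task to hand out.
  JvmTask suggested(Task assigned) {
    return assigned == null ? null : new JvmTask(assigned, false);
  }
}
{code}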

 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt


 With the code checked out over the last two days, the Sort job at 350-node scale 
 with 16800 maps and 680 reduces has been failing consistently for around the 
 last 6 runs.
 When around 50% of the maps are completed, the job suddenly jumps to the failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to the 
 NM for the AM container.
 But at INFO level, the RM log does not show why the RM is killing the AM when the 
 job was not killed manually.
 One thing common to the failed AM logs is an
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, with different 
 events in each case.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Siddharth Seth (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3656:
--

Attachment: MR3656.txt

Yet another patch. Hopefully this one has everything resolved.

 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, the Sort job at 350-node scale 
 with 16800 maps and 680 reduces has been failing consistently for around the 
 last 6 runs.
 When around 50% of the maps are completed, the job suddenly jumps to the failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to the 
 NM for the AM container.
 But at INFO level, the RM log does not show why the RM is killing the AM when the 
 job was not killed manually.
 One thing common to the failed AM logs is an
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, with different 
 events in each case.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Siddharth Seth (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3656:
--

Status: Patch Available  (was: Open)

 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, the Sort job at 350-node scale 
 with 16800 maps and 680 reduces has been failing consistently for around the 
 last 6 runs.
 When around 50% of the maps are completed, the job suddenly jumps to the failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to the 
 NM for the AM container.
 But at INFO level, the RM log does not show why the RM is killing the AM when the 
 job was not killed manually.
 One thing common to the failed AM logs is an
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, with different 
 events in each case.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3668) AccessControlException when running mapred job -list command

2012-01-13 Thread Jason Lowe (Created) (JIRA)
AccessControlException when running mapred job -list command


 Key: MAPREDUCE-3668
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3668
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2, security
Affects Versions: 0.23.1
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker


If a user tries to examine the status of all jobs running on a secure cluster, 
the mapred client can fail with an AccessControlException. For example, 
submitting two jobs, each from a different user, and then trying to query the 
status as the second user can fail like this:

$ mapred job -list all
12/01/12 20:01:12 WARN conf.Configuration: mapred.used.genericoptionsparser is 
deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
Total jobs:2
JobId   State   StartTime   UserNameQueue   PriorityMaps
Reduces UsedContainers  RsvdContainers UsedMem RsvdMem NeededMem   AM info
12/01/12 20:01:14 INFO mapred.ClientServiceDelegate: Application state is 
completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
job_1326396427223_0002  SUCCEEDED   1326398424244   user2default 
NORMAL  2   2   0   0  0M  0M  0M 
hostremoved:8088/proxy/application_1326396427223_0002/jobhistory/job/job_1326396427223_2_2
12/01/12 20:01:14 INFO mapred.ClientServiceDelegate: Application state is 
completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
12/01/12 20:01:14 WARN mapred.ClientServiceDelegate: Error from remote end: 
User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001
Exception in thread "main" RemoteTrace: 
java.security.AccessControlException: User user2 cannot perform operation 
VIEW_JOB on job_1326396427223_0001
at 
org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.checkAccess(HistoryClientService.java:293)
at 
org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:184)
at 
org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getJobReport(HistoryClientService.java:200)
at 
org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:106)
at 
org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:187)
at 
org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:344)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1490)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1486)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1484)
 at Local Trace: 
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001
at 
org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:151)
at $Proxy10.getJobReport(Unknown Source)
at 
org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:328)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:405)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:431)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:186)
at org.apache.hadoop.mapreduce.tools.CLI.displayJobList(CLI.java:571)
at org.apache.hadoop.mapreduce.tools.CLI.listAllJobs(CLI.java:500)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:298)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1209)


The information provided by the command is similar to what is presented on the 
ResourceManager web UI, and that page has no security.

Marking this as a blocker since many of our automated acceptance tests use this 
command to obtain the status of jobs.

[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185805#comment-13185805
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3656:


Glad you ran it on a cluster before commit. +1 for the latest fix.

 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, the Sort job at 350-node scale 
 with 16800 maps and 680 reduces has been failing consistently for around the 
 last 6 runs.
 When around 50% of the maps are completed, the job suddenly jumps to the failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to the 
 NM for the AM container.
 But at INFO level, the RM log does not show why the RM is killing the AM when the 
 job was not killed manually.
 One thing common to the failed AM logs is an
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, with different 
 events in each case.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185808#comment-13185808
 ] 

Hadoop QA commented on MAPREDUCE-3656:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510512/MR3656.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1610//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1610//console

This message is automatically generated.

 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, the Sort job at 350-node scale 
 with 16800 maps and 680 reduces has been failing consistently for around the 
 last 6 runs.
 When around 50% of the maps are completed, the job suddenly jumps to the failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to the 
 NM for the AM container.
 But at INFO level, the RM log does not show why the RM is killing the AM when the 
 job was not killed manually.
 One thing common to the failed AM logs is an
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, with different 
 events in each case.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3669) Getting a lot of PriviledgedActionException / SaslException when running a job

2012-01-13 Thread Thomas Graves (Created) (JIRA)
Getting a lot of PriviledgedActionException / SaslException when running a job
--

 Key: MAPREDUCE-3669
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3669
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Priority: Blocker


On a secure cluster, when running a job we are seeing a lot of 
PriviledgedActionException / SaslExceptions. The job runs fine; it's just that the 
jobclient can't connect to the AM to get the progress information.

It's in a very tight loop, retrying while getting the exceptions.

A snippet of the client log:
12/01/13 15:33:45 INFO security.SecurityUtil: Acquired token Ident: 00 1c 68 61 
64 6f 6f 70 71 61 40 44 45 56 2e 59 47
52 49 44 2e 59 41 48 4f 4f 2e 43 4f 4d 08 6d 61 70 72 65 64 71 61 00 8a 01 34 
d7 b3 ff f5 8a 01 34 fb c0 83 f5 08 02,
Kind: HDFS_DELEGATION_TOKEN, Service: 10.10.10.10:8020
12/01/13 15:33:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 8 
for user1 on 10.10.10.10:8020
12/01/13 15:33:45 INFO security.TokenCache: Got dt for
hdfs://host1.domain.com:8020;uri=10.10.10.10:8020;t.service=10.10.10.10:8020
12/01/13 15:33:45 WARN conf.Configuration: mapred.used.genericoptionsparser is 
deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
12/01/13 15:33:45 INFO mapreduce.JobSubmitter: number of splits:2
12/01/13 15:33:45 INFO mapred.ResourceMgrDelegate: Submitted application 
application_1326410042859_0008 to
ResourceManager at rmhost.domain/10.10.10.11:8040
12/01/13 15:33:45 INFO mapreduce.Job: Running job: job_1326410042859_0008
12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job:
rmhost.domain:8088/proxy/application_1326410042859_0008/
12/01/13 15:33:52 ERROR security.UserGroupInformation: 
PriviledgedActionException as:us...@dev.ygrid.yahoo.com
(auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed 
[Caused by GSSException: No valid credentials provided (Mechanism level: Fail
ed to find any
Kerberos tgt)]
12/01/13 15:33:52 WARN ipc.Client: Exception encountered while connecting to 
the server :
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided
(Mechanism level: Failed to find any Kerberos tgt)]
12/01/13 15:33:52 ERROR security.UserGroupInformation: 
PriviledgedActionException as:us...@dev.ygrid.yahoo.com
(auth:SIMPLE) cause:java.io.IOException: javax.security.sasl.SaslException: GSS 
initiate failed [Caused by GSSException: No valid credentials provided (
Mechanism level:
Failed to find any Kerberos tgt)]
12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job:
rmhost.domain:8088/proxy/application_1326410042859_0008/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3668) AccessControlException when running mapred job -list command

2012-01-13 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185828#comment-13185828
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3668:


A quick fix that comes to mind is to catch and ignore AccessControlExceptions 
on the client side, but there is a bigger underlying issue.

job -list going to each and every AM is not going to scale. As part of 
MAPREDUCE-3476, I am moving all the per-AM information to job -status.

I am going to work on MAPREDUCE-3476 soon, but if that gets late, we can push 
the quick fix in.
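
A rough sketch of that quick fix as it might look on the client side (the 
Cluster-based loop is an assumption about where the listing happens, not the actual 
CLI code):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;

public class ListJobsIgnoringAcls {
  public static void main(String[] args) throws Exception {
    Cluster cluster = new Cluster(new Configuration());
    for (JobStatus status : cluster.getAllJobStatuses()) {
      try {
        // Fetching the full Job may redirect to the AM or history server and hit ACL checks.
        Job job = cluster.getJob(status.getJobID());
        if (job != null) {
          System.out.println(job.getJobID() + "\t" + job.getJobState());
        }
      } catch (java.security.AccessControlException e) {
        // The quick fix: skip jobs this user cannot VIEW_JOB instead of failing the listing.
        System.err.println("Skipping " + status.getJobID() + ": " + e.getMessage());
      } catch (IOException e) {
        // ACL failures from the remote end may also surface wrapped in an IOException.
        System.err.println("Skipping " + status.getJobID() + ": " + e.getMessage());
      }
    }
  }
}
{code}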

 AccessControlException when running mapred job -list command
 

 Key: MAPREDUCE-3668
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3668
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2, security
Affects Versions: 0.23.1
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker

 If a user tries to examine the status of all jobs running on a secure cluster, 
 the mapred client can fail with an AccessControlException. For example, 
 submitting two jobs, each from a different user, and then trying to query the 
 status as the second user can fail like this:
 $ mapred job -list all
 12/01/12 20:01:12 WARN conf.Configuration: mapred.used.genericoptionsparser 
 is deprecated. Instead, use
 mapreduce.client.genericoptionsparser.used
 Total jobs:2
 JobId   State   StartTime   UserNameQueue   PriorityMaps  
   Reduces UsedContainers  RsvdContainers UsedMem RsvdMem NeededMem   AM 
 info
 12/01/12 20:01:14 INFO mapred.ClientServiceDelegate: Application state is 
 completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
 job_1326396427223_0002  SUCCEEDED   1326398424244   user2default 
 NORMAL  2   2   0   0  0M  0M  0M 
 hostremoved:8088/proxy/application_1326396427223_0002/jobhistory/job/job_1326396427223_2_2
 12/01/12 20:01:14 INFO mapred.ClientServiceDelegate: Application state is 
 completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
 12/01/12 20:01:14 WARN mapred.ClientServiceDelegate: Error from remote end: 
 User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001
 Exception in thread "main" RemoteTrace: 
 java.security.AccessControlException: User user2 cannot perform operation 
 VIEW_JOB on job_1326396427223_0001
 at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.checkAccess(HistoryClientService.java:293)
 at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:184)
 at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getJobReport(HistoryClientService.java:200)
 at 
 org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:106)
 at 
 org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:187)
 at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:344)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1490)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1486)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1484)
  at Local Trace: 
 org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
 User user2 cannot perform operation VIEW_JOB on job_1326396427223_0001
 at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:151)
 at $Proxy10.getJobReport(Unknown Source)
 at 
 org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:104)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:328)
 at 
 org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:405)
 at 
 org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:431)
 at 

[jira] [Assigned] (MAPREDUCE-3669) Getting a lot of PriviledgedActionException / SaslException when running a job

2012-01-13 Thread Vinod Kumar Vavilapalli (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned MAPREDUCE-3669:
--

Assignee: Mahadev konar

Mahadev, can you please look at it? This is most likely related to 
MAPREDUCE-3380.

 Getting a lot of PriviledgedActionException / SaslException when running a job
 --

 Key: MAPREDUCE-3669
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3669
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Mahadev konar
Priority: Blocker

 On a secure cluster, when running a job we are seeing a lot of 
 PriviledgedActionException / SaslExceptions. The job runs fine; it's just that the 
 jobclient can't connect to the AM to get the progress information.
 It's in a very tight loop, retrying while getting the exceptions.
 A snippet of the client log:
 12/01/13 15:33:45 INFO security.SecurityUtil: Acquired token Ident: 00 1c 68 
 61 64 6f 6f 70 71 61 40 44 45 56 2e 59 47
 52 49 44 2e 59 41 48 4f 4f 2e 43 4f 4d 08 6d 61 70 72 65 64 71 61 00 8a 01 34 
 d7 b3 ff f5 8a 01 34 fb c0 83 f5 08 02,
 Kind: HDFS_DELEGATION_TOKEN, Service: 10.10.10.10:8020
 12/01/13 15:33:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 8 
 for user1 on 10.10.10.10:8020
 12/01/13 15:33:45 INFO security.TokenCache: Got dt for
 hdfs://host1.domain.com:8020;uri=10.10.10.10:8020;t.service=10.10.10.10:8020
 12/01/13 15:33:45 WARN conf.Configuration: mapred.used.genericoptionsparser 
 is deprecated. Instead, use
 mapreduce.client.genericoptionsparser.used
 12/01/13 15:33:45 INFO mapreduce.JobSubmitter: number of splits:2
 12/01/13 15:33:45 INFO mapred.ResourceMgrDelegate: Submitted application 
 application_1326410042859_0008 to
 ResourceManager at rmhost.domain/10.10.10.11:8040
 12/01/13 15:33:45 INFO mapreduce.Job: Running job: job_1326410042859_0008
 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job:
 rmhost.domain:8088/proxy/application_1326410042859_0008/
 12/01/13 15:33:52 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:us...@dev.ygrid.yahoo.com
 (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed 
 [Caused by GSSException: No valid credentials provided (Mechanism level: Fail
 ed to find any
 Kerberos tgt)]
 12/01/13 15:33:52 WARN ipc.Client: Exception encountered while connecting to 
 the server :
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided
 (Mechanism level: Failed to find any Kerberos tgt)]
 12/01/13 15:33:52 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:us...@dev.ygrid.yahoo.com
 (auth:SIMPLE) cause:java.io.IOException: javax.security.sasl.SaslException: 
 GSS initiate failed [Caused by GSSException: No valid credentials provided (
 Mechanism level:
 Failed to find any Kerberos tgt)]
 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job:
 rmhost.domain:8088/proxy/application_1326410042859_0008/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3628) DFSIO read throughput is decreased by 16% in 0.23 than Hadoop-0.20.204 on 350 nodes size cluster.

2012-01-13 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3628:
---

Issue Type: Sub-task  (was: Task)
Parent: MAPREDUCE-3561

 DFSIO read throughput is decreased by 16% in 0.23 than Hadoop-0.20.204 on 350 
 nodes size cluster.
 -

 Key: MAPREDUCE-3628
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3628
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Amol Kekre
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1


 DFSIO read throughput is decreased by 16% in 0.23 than Hadoop-0.20.204 on 350 
 nodes size cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-13 Thread Siddharth Seth (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185850#comment-13185850
 ] 

Siddharth Seth commented on MAPREDUCE-3596:
---

+1. Patch looks good. Also ran a couple of sort runs with this patch and 
MAPREDUCE-3656; they completed without running into either issue.

 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3596-20120111.1.txt, 
 MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, 
 MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung after 
 completing 99% of the map phase. Some map tasks failed, and some of the pending 
 map tasks were not scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Siddharth Seth (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185852#comment-13185852
 ] 

Siddharth Seth commented on MAPREDUCE-3656:
---

Ran sort again with this patch and MAPREDUCE-3596. Completed without either 
error.

 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, the Sort job at 350-node scale 
 with 16800 maps and 680 reduces has been failing consistently for around the 
 last 6 runs.
 When around 50% of the maps are completed, the job suddenly jumps to the failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to the 
 NM for the AM container.
 But at INFO level, the RM log does not show why the RM is killing the AM when the 
 job was not killed manually.
 One thing common to the failed AM logs is an
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, with different 
 events in each case.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-13 Thread Siddharth Seth (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3596:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.23. Thanks Vinod

 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3596-20120111.1.txt, 
 MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, 
 MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung after 
 completing 99% of the map phase. Some map tasks failed, and some of the pending 
 map tasks were not scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185868#comment-13185868
 ] 

Hudson commented on MAPREDUCE-3596:
---

Integrated in Hadoop-Hdfs-0.23-Commit #363 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/363/])
merge MAPREDUCE-3596 from trunk. Fix scheduler to handle cleaned up 
containers, which NMs may subsequently report as running. (Contributed by Vinod 
Kumar Vavilapalli)

sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231303
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3596-20120111.1.txt, 
 MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, 
 MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung after 
 completing 99% of the map phase. Some map tasks failed, and some of the pending 
 map tasks were not scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on 

[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185869#comment-13185869
 ] 

Hudson commented on MAPREDUCE-3596:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1612 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1612/])
MAPREDUCE-3596. Fix scheduler to handle cleaned up containers, which NMs 
may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli)

sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231297
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3596-20120111.1.txt, 
 MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, 
 MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung after 
 completing 99% of the map phase. Some map tasks failed, and some of the pending 
 map tasks were not scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185870#comment-13185870
 ] 

Hudson commented on MAPREDUCE-3596:
---

Integrated in Hadoop-Common-trunk-Commit #1539 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1539/])
MAPREDUCE-3596. Fix scheduler to handle cleaned up containers, which NMs 
may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli)

sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231297
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
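
As a rough illustration of the idea in the commit message above (hypothetical CleanedContainerTracker class and plain String container ids, not the actual RMNodeImpl/scheduler code), the RM can remember which containers it has already asked the NM to clean up and re-issue the cleanup instead of treating a stale heartbeat report as a live container:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CleanedContainerTracker {
  private final Set<String> cleanedUp = new HashSet<String>();

  /** The RM decided this container is done and asked the NM to clean it up. */
  void markCleanedUp(String containerId) {
    cleanedUp.add(containerId);
  }

  /** Handle the list of containers a node heartbeat claims are still running. */
  void onHeartbeat(List<String> reportedRunning, List<String> cleanupToResend) {
    for (String id : reportedRunning) {
      if (cleanedUp.contains(id)) {
        // Stale report: do not hand the container back to the scheduler,
        // just ask the NM to clean it up again.
        cleanupToResend.add(id);
      }
      // else: normal path, update scheduler state for a live container
    }
  }

  public static void main(String[] args) {
    CleanedContainerTracker tracker = new CleanedContainerTracker();
    tracker.markCleanedUp("container_0001_01_000042");
    List<String> resend = new ArrayList<String>();
    tracker.onHeartbeat(Arrays.asList("container_0001_01_000042"), resend);
    System.out.println("re-request cleanup for: " + resend);
  }
}
{code}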


 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3596-20120111.1.txt, 
 MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, 
 MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung after 
 completing 99% of the map phase. Some map tasks failed, and some of the 
 pending map tasks were never scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185872#comment-13185872
 ] 

Hudson commented on MAPREDUCE-3596:
---

Integrated in Hadoop-Common-0.23-Commit #373 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/373/])
merge MAPREDUCE-3596 from trunk. Fix scheduler to handle cleaned up 
containers, which NMs may subsequently report as running. (Contributed by Vinod 
Kumar Vavilapalli)

sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231303
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3596-20120111.1.txt, 
 MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, 
 MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung after 
 completing 99% of the map phase. Some map tasks failed, and some of the 
 pending map tasks were never scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3669) Getting a lot of PriviledgedActionException / SaslException when running a job

2012-01-13 Thread Mahadev konar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185871#comment-13185871
 ] 

Mahadev konar commented on MAPREDUCE-3669:
--

Ok,
 I think I figured out what the issue is, given that I cannot reproduce this. 
The service classloading that we do is causing the issue. For 
MRClientProtocol, we have two SecurityInfo implementations, HSSecurityInfo and 
MRClientSecurityInfo. Depending on which class is loaded first, either talking 
to the HS or talking to the AM will break. This was working until now because 
HSSecurityInfo handled only Kerberos and MRClientSecurityInfo handled only 
tokens. After I added token support to HSSecurityInfo, this became an issue.
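
As an illustration of the order dependence described above, here is a minimal, self-contained Java sketch (hypothetical ProtocolSecurityInfo interface and FirstMatchLookup class, not Hadoop's actual SecurityInfo API): whichever provider the ServiceLoader happens to yield first decides the answer.
{code}
import java.util.ServiceLoader;

/** Hypothetical stand-in for a per-protocol security descriptor; NOT Hadoop's real API. */
interface ProtocolSecurityInfo {
  /** Return security metadata for the protocol, or null if this provider does not handle it. */
  String infoFor(Class<?> protocol);
}

public class FirstMatchLookup {
  /**
   * Returns the answer from the FIRST provider that claims the protocol. If two
   * providers both claim the same protocol (as with MRClientProtocol above), the
   * result depends purely on the order in which the ServiceLoader discovers them.
   */
  static String lookup(Class<?> protocol) {
    for (ProtocolSecurityInfo info : ServiceLoader.load(ProtocolSecurityInfo.class)) {
      String answer = info.infoFor(protocol);
      if (answer != null) {
        return answer; // first match wins
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // With no providers registered under META-INF/services this prints "null";
    // the point is only the loop structure, where discovery order decides the outcome.
    System.out.println(lookup(Runnable.class));
  }
}
{code}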


 Getting a lot of PriviledgedActionException / SaslException when running a job
 --

 Key: MAPREDUCE-3669
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3669
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Mahadev konar
Priority: Blocker

 On a secure cluster, when running a job we are seeing a lot of 
 PriviledgedActionExceptions / SaslExceptions.  The job runs fine; it's just that 
 the job client can't connect to the AM to get the progress information.
 It's in a very tight loop, retrying while getting the exceptions.
 A snippet of the client log:
 12/01/13 15:33:45 INFO security.SecurityUtil: Acquired token Ident: 00 1c 68 
 61 64 6f 6f 70 71 61 40 44 45 56 2e 59 47
 52 49 44 2e 59 41 48 4f 4f 2e 43 4f 4d 08 6d 61 70 72 65 64 71 61 00 8a 01 34 
 d7 b3 ff f5 8a 01 34 fb c0 83 f5 08 02,
 Kind: HDFS_DELEGATION_TOKEN, Service: 10.10.10.10:8020
 12/01/13 15:33:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 8 
 for user1 on 10.10.10.10:8020
 12/01/13 15:33:45 INFO security.TokenCache: Got dt for
 hdfs://host1.domain.com:8020;uri=10.10.10.10:8020;t.service=10.10.10.10:8020
 12/01/13 15:33:45 WARN conf.Configuration: mapred.used.genericoptionsparser 
 is deprecated. Instead, use
 mapreduce.client.genericoptionsparser.used
 12/01/13 15:33:45 INFO mapreduce.JobSubmitter: number of splits:2
 12/01/13 15:33:45 INFO mapred.ResourceMgrDelegate: Submitted application 
 application_1326410042859_0008 to
 ResourceManager at rmhost.domain/10.10.10.11:8040
 12/01/13 15:33:45 INFO mapreduce.Job: Running job: job_1326410042859_0008
 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job:
 rmhost.domain:8088/proxy/application_1326410042859_0008/
 12/01/13 15:33:52 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:us...@dev.ygrid.yahoo.com
 (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed 
 [Caused by GSSException: No valid credentials provided (Mechanism level: Fail
 ed to find any
 Kerberos tgt)]
 12/01/13 15:33:52 WARN ipc.Client: Exception encountered while connecting to 
 the server :
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided
 (Mechanism level: Failed to find any Kerberos tgt)]
 12/01/13 15:33:52 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:us...@dev.ygrid.yahoo.com
 (auth:SIMPLE) cause:java.io.IOException: javax.security.sasl.SaslException: 
 GSS initiate failed [Caused by GSSException: No valid credentials provided (
 Mechanism level:
 Failed to find any Kerberos tgt)]
 12/01/13 15:33:52 INFO mapred.ClientServiceDelegate: The url to track the job:
 rmhost.domain:8088/proxy/application_1326410042859_0008/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3656:
---

  Resolution: Fixed
Release Note: Fixed a race condition in MR AM which is failing the sort 
benchmark consistently.
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I just committed this to trunk and merged it into branch-0.23. Thanks Sid!

 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, a sort job at 350-node 
 scale with 16800 maps and 680 reduces has been failing consistently for 
 roughly the last 6 runs.
 When around 50% of the maps are complete, the job suddenly jumps to the 
 failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to 
 the NM for the AM container, but at INFO level the RM log does not show why 
 the RM is killing the AM when the job was not killed manually.
 One thing common to all failed AM logs is 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, though with 
 different events.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}
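
The two InvalidStateTransitonException messages above both mean the same thing: an event arrived while the attempt or job was in a state for which no transition is registered. A tiny self-contained sketch of that failure mode (illustrative enums only, not YARN's actual StateMachineFactory):
{code}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

public class StateMachineSketch {
  enum State { ASSIGNED, RUNNING }
  enum Event { CONTAINER_LAUNCHED, TA_UPDATE }

  // For each state, the events it knows how to handle.
  private static final Map<State, EnumSet<Event>> LEGAL =
      new EnumMap<State, EnumSet<Event>>(State.class);
  static {
    LEGAL.put(State.ASSIGNED, EnumSet.of(Event.CONTAINER_LAUNCHED));
    LEGAL.put(State.RUNNING, EnumSet.of(Event.TA_UPDATE));
  }

  static void handle(State current, Event event) {
    if (!LEGAL.get(current).contains(event)) {
      // Same shape as the errors above: the event raced ahead of the state change.
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    // ... apply the registered transition ...
  }

  public static void main(String[] args) {
    handle(State.RUNNING, Event.TA_UPDATE);  // fine
    handle(State.ASSIGNED, Event.TA_UPDATE); // throws, mirroring "TA_UPDATE at ASSIGNED"
  }
}
{code}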

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185881#comment-13185881
 ] 

Hudson commented on MAPREDUCE-3596:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #385 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/385/])
merge MAPREDUCE-3596 from trunk. Fix scheduler to handle cleaned up 
containers, which NMs may subsequently report as running. (Contributed by Vinod 
Kumar Vavilapalli)

sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231303
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3596-20120111.1.txt, 
 MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, 
 MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung after 
 completing 99% of the map phase. Some map tasks failed, and some of the 
 pending map tasks were never scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185884#comment-13185884
 ] 

Hudson commented on MAPREDUCE-3656:
---

Integrated in Hadoop-Hdfs-0.23-Commit #364 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/364/])
MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort 
benchmark consistently. Contributed by Siddarth Seth.
svn merge --ignore-ancestry -c 1231314 ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231316
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, a sort job at 350-node 
 scale with 16800 maps and 680 reduces has been failing consistently for 
 roughly the last 6 runs.
 When around 50% of the maps are complete, the job suddenly jumps to the 
 failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to 
 the NM for the AM container, but at INFO level the RM log does not show why 
 the RM is killing the AM when the job was not killed manually.
 One thing common to all failed AM logs is 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, though with 
 different events.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185886#comment-13185886
 ] 

Hudson commented on MAPREDUCE-3656:
---

Integrated in Hadoop-Common-trunk-Commit #1540 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1540/])
MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort 
benchmark consistently. Contributed by Siddarth Seth.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231314
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, a sort job at 350-node 
 scale with 16800 maps and 680 reduces has been failing consistently for 
 roughly the last 6 runs.
 When around 50% of the maps are complete, the job suddenly jumps to the 
 failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to 
 the NM for the AM container, but at INFO level the RM log does not show why 
 the RM is killing the AM when the job was not killed manually.
 One thing common to all failed AM logs is 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, though with 
 different events.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185885#comment-13185885
 ] 

Hudson commented on MAPREDUCE-3656:
---

Integrated in Hadoop-Common-0.23-Commit #374 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/374/])
MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort 
benchmark consistently. Contributed by Siddarth Seth.
svn merge --ignore-ancestry -c 1231314 ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231316
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, a sort job at 350-node 
 scale with 16800 maps and 680 reduces has been failing consistently for 
 roughly the last 6 runs.
 When around 50% of the maps are complete, the job suddenly jumps to the 
 failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to 
 the NM for the AM container, but at INFO level the RM log does not show why 
 the RM is killing the AM when the job was not killed manually.
 One thing common to all failed AM logs is 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, though with 
 different events.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185887#comment-13185887
 ] 

Hudson commented on MAPREDUCE-3656:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1613 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1613/])
MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort 
benchmark consistently. Contributed by Siddarth Seth.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231314
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, a sort job at 350-node 
 scale with 16800 maps and 680 reduces has been failing consistently for 
 roughly the last 6 runs.
 When around 50% of the maps are complete, the job suddenly jumps to the 
 failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to 
 the NM for the AM container, but at INFO level the RM log does not show why 
 the RM is killing the AM when the job was not killed manually.
 One thing common to all failed AM logs is 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, though with 
 different events.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Vinod Kumar Vavilapalli (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned MAPREDUCE-3532:
--

Assignee: Bhallamudi Venkata Siva Kamesh

 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address, 
 the NM instantiates its WebServer with port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer responds correctly, but the RM's cluster/Nodes page shows 
 the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 However, NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port is 
 never updated, so the NM reports 0 as its HTTP port to the RM, which makes the 
 NM hyperlinks un-clickable.
 But I verified that an MR job runs successfully with the random port.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand

2012-01-13 Thread Siddharth Seth (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185893#comment-13185893
 ] 

Siddharth Seth commented on MAPREDUCE-3614:
---

For the job history related changes - do we want SIGTERM'd jobs to show up as 
KILLED? In that case the proposed change to the shutdown hook will be required.
Otherwise, another possibility would be to ensure that JobHistoryEventHandler.stop() 
calls (or has already called) closeEventWriter(), which is what takes care of 
moving the history file to its correct location.
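
A minimal sketch of the first option, assuming a hypothetical HistoryEventHandler with an idempotent stop() (not the real JobHistoryEventHandler): a shutdown hook makes sure the event writer is closed, and therefore the history file is moved, even when the AM is terminated by a signal.
{code}
public class ShutdownFlushSketch {

  static class HistoryEventHandler {
    private boolean stopped = false;

    synchronized void stop() {
      if (stopped) {
        return;             // idempotent: harmless if stop() already ran
      }
      stopped = true;
      closeEventWriter();   // flush and move the history file to its final location
    }

    private void closeEventWriter() {
      System.out.println("history file closed and moved");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    final HistoryEventHandler handler = new HistoryEventHandler();

    // On SIGTERM/SIGINT the JVM runs shutdown hooks, so the handler is stopped
    // cleanly instead of leaving the history file behind in the staging directory.
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      public void run() {
        handler.stop();
      }
    }));

    Thread.sleep(100);      // stand-in for the AM doing real work
    handler.stop();         // normal path: explicit stop during service shutdown
  }
}
{code}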

 finalState UNDEFINED if AM is killed by hand
 

 Key: MAPREDUCE-3614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: MAPREDUCE-3614.branch-0.23.patch


 Courtesy [~dcapwell]
 {quote}
 If the AM is running and you kill the process (sudo kill #pid), the State in 
 Yarn would be FINISHED and FinalStatus is UNDEFINED.  The Tracking UI would 
 say History and point to the proxy url (which will redirect to the history 
 server).
 The state should make it clear that the job failed, and the tracking URL 
 shouldn't point to the history server.
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185900#comment-13185900
 ] 

Hudson commented on MAPREDUCE-3656:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #386 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/386/])
MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort 
benchmark consistently. Contributed by Siddarth Seth.
svn merge --ignore-ancestry -c 1231314 ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231316
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, a sort job at 350-node 
 scale with 16800 maps and 680 reduces has been failing consistently for 
 roughly the last 6 runs.
 When around 50% of the maps are complete, the job suddenly jumps to the 
 failed state.
 Looking at the NM log, we found that the RM sent a Stop Container request to 
 the NM for the AM container, but at INFO level the RM log does not show why 
 the RM is killing the AM when the job was not killed manually.
 One thing common to all failed AM logs is 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, though with 
 different events.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas another log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185901#comment-13185901
 ] 

Hudson commented on MAPREDUCE-3596:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1557 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1557/])
MAPREDUCE-3596. Fix scheduler to handle cleaned up containers, which NMs 
may subsequently report as running. (Contributed by Vinod Kumar Vavilapalli)

sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231297
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3596-20120111.1.txt, 
 MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.1.txt, 
 MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung after 
 completing 99% of the map phase. Some map tasks failed, and some of the 
 pending map tasks were never scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3532:
---

Fix Version/s: 0.23.1
   Status: Open  (was: Patch Available)

I looked through the patch. Looks good. +1.

 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address, 
 the NM instantiates its WebServer with port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer responds correctly, but the RM's cluster/Nodes page shows 
 the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 However, NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port is 
 never updated, so the NM reports 0 as its HTTP port to the RM, which makes the 
 NM hyperlinks un-clickable.
 But I verified that an MR job runs successfully with the random port.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Vinod Kumar Vavilapalli (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-3532.


  Resolution: Fixed
Release Note: Modified NM to report correct http address when an ephemeral 
web port is configured.
Hadoop Flags: Reviewed

I just committed this to trunk and branch-0.23. Thanks Kamesh!
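
The gist of the fix, sketched with a plain ServerSocket rather than the NM's actual HttpServer/WebServer classes (so the names below are illustrative only): after binding to port 0, read back the port the OS actually assigned and report that value instead of the configured 0.
{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class EphemeralPortSketch {
  public static void main(String[] args) throws IOException {
    ServerSocket server = new ServerSocket();
    try {
      server.bind(new InetSocketAddress("0.0.0.0", 0)); // 0 = let the OS choose

      int boundPort = server.getLocalPort();            // the real port, e.g. 36272
      String reportedAddress = "0.0.0.0:" + boundPort;

      // Reporting the configured "0" instead of boundPort is exactly the bug
      // described below: the RM renders an unusable NM:0 hyperlink.
      System.out.println("listening on " + reportedAddress);
    } finally {
      server.close();
    }
  }
}
{code}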

 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address, 
 the NM instantiates its WebServer with port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer responds correctly, but the RM's cluster/Nodes page shows 
 the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 However, NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port is 
 never updated, so the NM reports 0 as its HTTP port to the RM, which makes the 
 NM hyperlinks un-clickable.
 But I verified that an MR job runs successfully with the random port.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3670) TaskAttemptListener should respond with errors to unregistered tasks

2012-01-13 Thread Siddharth Seth (Created) (JIRA)
TaskAttemptListener should respond with errors to unregistered tasks


 Key: MAPREDUCE-3670
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3670
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth


The TaskAttemptListener currently accepts TaskUmbilical calls from tasks that 
may have already been unregistered, and still processes their updates. It should 
instead send back exceptions so that those tasks die.
This isn't critical though, since the task/container would eventually be 
killed by the AM (via a call to the NM's stopContainer).
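
A hedged sketch of the idea only (hypothetical UmbilicalSketch class and plain String attempt ids, not the real TaskAttemptListenerImpl): track which attempts are still registered and answer anything else with an exception, so a stale task JVM exits on its own rather than waiting for the AM to stop its container.
{code}
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class UmbilicalSketch {
  private final Set<String> registered =
      Collections.synchronizedSet(new HashSet<String>());

  void register(String attemptId)   { registered.add(attemptId); }
  void unregister(String attemptId) { registered.remove(attemptId); }

  /** Called by the task over the umbilical protocol with a progress/status update. */
  void statusUpdate(String attemptId) throws IOException {
    if (!registered.contains(attemptId)) {
      // Responding with an error makes the orphaned task exit promptly.
      throw new IOException("Attempt " + attemptId + " is not registered; exiting");
    }
    // ... otherwise process the update as usual ...
  }

  public static void main(String[] args) {
    UmbilicalSketch listener = new UmbilicalSketch();
    listener.register("attempt_0001_m_000001_0");
    listener.unregister("attempt_0001_m_000001_0");
    try {
      listener.statusUpdate("attempt_0001_m_000001_0");
    } catch (IOException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}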

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185928#comment-13185928
 ] 

Hudson commented on MAPREDUCE-3532:
---

Integrated in Hadoop-Common-trunk-Commit #1541 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1541/])
MAPREDUCE-3532. Modified NM to report correct http address when an 
ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231342
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java


 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address, 
 the NM instantiates its WebServer with port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer responds correctly, but the RM's cluster/Nodes page shows 
 the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 However, NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port is 
 never updated, so the NM reports 0 as its HTTP port to the RM, which makes the 
 NM hyperlinks un-clickable.
 But I verified that an MR job runs successfully with the random port.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185927#comment-13185927
 ] 

Hudson commented on MAPREDUCE-3532:
---

Integrated in Hadoop-Common-0.23-Commit #375 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/375/])
MAPREDUCE-3532. Modified NM to report correct http address when an 
ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh.
svn merge --ignore-ancestry -c 1231342 ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231344
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java


 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address, 
 the NM instantiates its WebServer with port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer responds correctly, but the RM's cluster/Nodes page shows 
 the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 Here NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port never
 gets updated, so the NM reports 0 as the HTTP port to the RM, leaving the NM
 hyperlinks un-clickable. However, verified that the MR job still runs successfully
 with random ports.
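 The general pattern behind the eventual fix is to bind to port 0, then read the
 actual port back from the bound socket and report that value instead of the
 configured 0. A minimal plain-Java sketch of that pattern (illustrative only, not
 the actual NM WebServer code):
 {code}
 import java.net.InetSocketAddress;
 import java.net.ServerSocket;

 public class EphemeralPortExample {
   public static void main(String[] args) throws Exception {
     // Bind to port 0 so the OS assigns an ephemeral port.
     ServerSocket server = new ServerSocket();
     server.bind(new InetSocketAddress("0.0.0.0", 0));

     // Read back the port the OS actually assigned; this is the value that
     // should be reported onwards (e.g. to the RM), not the configured 0.
     int actualPort = server.getLocalPort();
     System.out.println("Configured port: 0, actual bound port: " + actualPort);

     server.close();
   }
 }
 {code}
 The attached patch presumably does the equivalent against the Jetty-backed
 HttpServer before the NM registers its HTTP address with the RM.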

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3671) AM-NM RPC calls occasionally takes a long time to respond

2012-01-13 Thread Siddharth Seth (Created) (JIRA)
AM-NM RPC calls occasionally takes a long time to respond
-

 Key: MAPREDUCE-3671
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3671
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Siddharth Seth


Observed while looking at MAPREDUCE-3596 and MAPREDUCE-3656.
startContainer takes over a minute in some cases, otherwise around 15 seconds. Both
were observed soon after reduce tasks started. Network congestion? Needs more
investigation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185931#comment-13185931
 ] 

Hudson commented on MAPREDUCE-3532:
---

Integrated in Hadoop-Hdfs-0.23-Commit #365 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/365/])
MAPREDUCE-3532. Modified NM to report correct http address when an 
ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh.
svn merge --ignore-ancestry -c 1231342 ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231344
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java


 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address,
 the NM instantiates the WebServer at port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer itself responds correctly, but
 the RM's cluster/Nodes page shows the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 Here NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port never
 gets updated, so the NM reports 0 as the HTTP port to the RM, leaving the NM
 hyperlinks un-clickable. However, verified that the MR job still runs successfully
 with random ports.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185932#comment-13185932
 ] 

Hudson commented on MAPREDUCE-3532:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1614 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1614/])
MAPREDUCE-3532. Modified NM to report correct http address when an 
ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231342
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java


 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address,
 the NM instantiates the WebServer at port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer itself responds correctly, but
 the RM's cluster/Nodes page shows the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 Here NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port never
 gets updated, so the NM reports 0 as the HTTP port to the RM, leaving the NM
 hyperlinks un-clickable. However, verified that the MR job still runs successfully
 with random ports.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3656) Sort job on 350 scale is consistently failing with latest MRV2 code

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185938#comment-13185938
 ] 

Hudson commented on MAPREDUCE-3656:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1558 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1558/])
MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort 
benchmark consistently. Contributed by Siddarth Seth.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231314
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 Sort job on 350 scale is consistently failing with latest MRV2 code 
 

 Key: MAPREDUCE-3656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 0.23.1

 Attachments: MR3656.txt, MR3656.txt, MR3656.txt


 With the code checked out over the last two days, the Sort job at 350-node scale
 with 16800 maps and 680 reduces has been consistently failing for around the last
 6 runs.
 When around 50% of the maps are completed, the job suddenly jumps to the failed state.
 Looking at the NM log, found that the RM sent a Stop Container request to the NM
 for the AM container.
 But at INFO level in the RM log, not able to find why the RM is killing the AM when
 the job was not killed manually.
 One thing found common in the failed AM logs is
 org.apache.hadoop.yarn.state.InvalidStateTransitonException, with different events
 in each case.
 For example, one log says:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_UPDATE at ASSIGNED 
 {code}
 Whereas other logs say:
 {code}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 JOB_COUNTER_UPDATE at ERROR
 {code}
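 For readers unfamiliar with these log lines: the YARN state machines raise this
 exception when an event arrives in a state that has no registered transition for
 it. A simplified, generic sketch of that pattern (illustrative only, not the actual
 org.apache.hadoop.yarn.state implementation):
 {code}
 import java.util.HashMap;
 import java.util.Map;

 // A tiny state machine: handling an event with no registered transition for
 // the current state fails, analogous to the InvalidStateTransitonException
 // messages quoted above.
 public class TinyStateMachine<S, E> {
   private final Map<S, Map<E, S>> transitions = new HashMap<>();
   private S currentState;

   public TinyStateMachine(S initialState) {
     this.currentState = initialState;
   }

   public TinyStateMachine<S, E> addTransition(S from, E event, S to) {
     transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
     return this;
   }

   public S handle(E event) {
     Map<E, S> fromCurrent = transitions.get(currentState);
     if (fromCurrent == null || !fromCurrent.containsKey(event)) {
       throw new IllegalStateException("Invalid event: " + event + " at " + currentState);
     }
     currentState = fromCurrent.get(event);
     return currentState;
   }
 }
 {code}
 The race fixed by the attached patch apparently results in an event (such as
 TA_UPDATE) arriving while the attempt is still in a state (such as ASSIGNED) that
 has no transition for it.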

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185945#comment-13185945
 ] 

Hudson commented on MAPREDUCE-3532:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #387 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/387/])
MAPREDUCE-3532. Modified NM to report correct http address when an 
ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh.
svn merge --ignore-ancestry -c 1231342 ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231344
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java


 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address,
 the NM instantiates the WebServer at port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer itself responds correctly, but
 the RM's cluster/Nodes page shows the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 Here NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port never
 gets updated, so the NM reports 0 as the HTTP port to the RM, leaving the NM
 hyperlinks un-clickable. However, verified that the MR job still runs successfully
 with random ports.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3532) When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185979#comment-13185979
 ] 

Hudson commented on MAPREDUCE-3532:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1559 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1559/])
MAPREDUCE-3532. Modified NM to report correct http address when an 
ephemeral web port is configured. Contributed by Bhallamudi Venkata Siva Kamesh.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231342
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java


 When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs 
 webserver component picks up random port, NM keeps on Reporting 0 port to RM
 --

 Key: MAPREDUCE-3532
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3532
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Bhallamudi Venkata Siva Kamesh
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3532-1.patch, MAPREDUCE-3532.patch


 I tried the following:
 yarn.nodemanager.address=0.0.0.0:0
 yarn.nodemanager.webapp.address=0.0.0.0:0
 yarn.nodemanager.localizer.address=0.0.0.0:0
 mapreduce.shuffle.port=0
 When 0 is provided as the port number in yarn.nodemanager.webapp.address,
 the NM instantiates the WebServer at port 0, e.g.
 {code}
 2011-12-08 11:33:02,467 INFO 
 org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating 
 NMWebApp at 0.0.0.0:0
 {code}
 After that, the WebServer picks up a random port, e.g.
 {code}
 2011-12-08 11:33:02,562 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 36272
 2011-12-08 11:33:02,562 INFO org.mortbay.log: jetty-6.1.26
 2011-12-08 11:33:02,831 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:36272
 2011-12-08 11:33:02,831 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app 
 /node started at 36272
 {code}
 The NM WebServer itself responds correctly, but
 the RM's cluster/Nodes page shows the following:
 {code}
 /Rack RUNNING NM:57963 NM:0 Healthy 8-Dec-2011 11:33:01 Healthy 8 12 GB 0 KB
 {code}
 Here NM:0 is not clickable.
 It seems that even though the NM's webserver picks a random port, the port never
 gets updated, so the NM reports 0 as the HTTP port to the RM, leaving the NM
 hyperlinks un-clickable. However, verified that the MR job still runs successfully
 with random ports.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false

2012-01-13 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185986#comment-13185986
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3404:


bq. How do we make sure that if mapreduce.job.maps.speculative=false and
mapreduce.job.reduce.speculative=true, the maps don't get speculated and the
reduces get speculated?
The speculator handles map and reduce speculation separately. I just looked at
the patch, and it achieves the above by not sending any map events to the
speculator when map-speculation is disabled. The speculator then doesn't
find any maps to speculate (as it doesn't know about any maps at all) and so
only speculates reduces. Works in (IMO) a convoluted way, but I can live with that.

+1 for the patch. Pushing this in.
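
A rough sketch of the event-routing behaviour described above, with hypothetical
class and method names (not the real MR AM types): when map speculation is
disabled, map-attempt updates are simply never forwarded to the speculator, so it
only ever sees, and speculates, reduces.
{code}
// Hypothetical sketch; names do not match the actual MR AM classes.
public class SpeculationEventRouter {
  public enum TaskType { MAP, REDUCE }
  public interface AttemptStatus { }
  public interface Speculator { void handle(AttemptStatus status); }

  private final boolean mapSpeculationEnabled;
  private final boolean reduceSpeculationEnabled;
  private final Speculator speculator;

  public SpeculationEventRouter(boolean mapSpeculationEnabled,
                                boolean reduceSpeculationEnabled,
                                Speculator speculator) {
    this.mapSpeculationEnabled = mapSpeculationEnabled;
    this.reduceSpeculationEnabled = reduceSpeculationEnabled;
    this.speculator = speculator;
  }

  /** Forward a task-attempt status update only if speculation is enabled for that task type. */
  public void handleAttemptUpdate(TaskType type, AttemptStatus status) {
    if (type == TaskType.MAP && !mapSpeculationEnabled) {
      return; // the speculator never learns about maps, so it cannot speculate them
    }
    if (type == TaskType.REDUCE && !reduceSpeculationEnabled) {
      return;
    }
    speculator.handle(status);
  }
}
{code}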

 Speculative Execution: speculative map tasks launched even if 
 -Dmapreduce.map.speculative=false
 ---

 Key: MAPREDUCE-3404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, mrv2
Affects Versions: 0.23.0
 Environment: Hadoop version is: Hadoop 0.23.0.1110031628
 10 node test cluster
Reporter: patrick white
Assignee: Eric Payne
Priority: Critical
 Fix For: 0.23.0, 0.23.1, 0.24.0

 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt


 When forcing a mapper to take significantly longer than other map tasks, 
 speculative map tasks are
 launched even if the mapreduce.job.maps.speculative.execution parameter is 
 set to 'false'.
 Testcase: ran the default WordCount job with speculative execution set to false
 for both map and reduce, but still saw a fifth mapper task launch. Ran the job
 as follows:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=false  
 -Dmapreduce.job.reduces.speculative.execution=false 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 Input data was 4 text files hdfs blocksize, with same word pattern plus one 
 diff text line in each file, fourth
 file was 4 times as large as others:
 hadoop --config config  fs -ls  /tmp
 Found 5 items
 drwxr-xr-x   - user hdfs  0 2011-10-20 16:17 /tmp/file_of_words.out
 -rw-r--r--   3 user hdfs   62800021 2011-10-20 14:45 /tmp/test_file_of_words1
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words2
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words3
 -rw-r--r--   3 user hdfs  271708312 2011-10-20 15:50 /tmp/test_file_of_words4
 Job launched 5 mappers despite spec exec set to false, output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=273540
 SLOTS_MILLIS_REDUCES=212876
 Reran the same case as above, only with both spec exec params set to 'true';
 same results, only this time the fifth task being launched is expected since
 spec exec = true.
 job run:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=true  
 -Dmapreduce.job.reduces.speculative.execution=true 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=279653
 SLOTS_MILLIS_REDUCES=211474

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false

2012-01-13 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3404:
---

   Resolution: Fixed
Fix Version/s: (was: 0.23.0)
 Release Note: Corrected MR AM to honor speculative configuration and 
enable speculating either maps or reduces.
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

..oh, and the tests look good too.

Just committed this to trunk and branch-0.23. Thanks Eric!

On a side note (not caused by this patch), it is not correct that we increment
the num_failed_maps counter when speculation kills a task. Instead we should
have a num_killed_maps counter. Separate issue; will file a ticket.

 Speculative Execution: speculative map tasks launched even if 
 -Dmapreduce.map.speculative=false
 ---

 Key: MAPREDUCE-3404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, mrv2
Affects Versions: 0.23.0
 Environment: Hadoop version is: Hadoop 0.23.0.1110031628
 10 node test cluster
Reporter: patrick white
Assignee: Eric Payne
Priority: Critical
 Fix For: 0.23.1, 0.24.0

 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt


 When forcing a mapper to take significantly longer than other map tasks, 
 speculative map tasks are
 launched even if the mapreduce.job.maps.speculative.execution parameter is 
 set to 'false'.
 Testcase: ran the default WordCount job with speculative execution set to false
 for both map and reduce, but still saw a fifth mapper task launch. Ran the job
 as follows:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=false  
 -Dmapreduce.job.reduces.speculative.execution=false 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 Input data was 4 text files hdfs blocksize, with same word pattern plus one 
 diff text line in each file, fourth
 file was 4 times as large as others:
 hadoop --config config  fs -ls  /tmp
 Found 5 items
 drwxr-xr-x   - user hdfs  0 2011-10-20 16:17 /tmp/file_of_words.out
 -rw-r--r--   3 user hdfs   62800021 2011-10-20 14:45 /tmp/test_file_of_words1
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words2
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words3
 -rw-r--r--   3 user hdfs  271708312 2011-10-20 15:50 /tmp/test_file_of_words4
 Job launched 5 mappers despite spec exec set to false, output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=273540
 SLOTS_MILLIS_REDUCES=212876
 Reran the same case as above, only with both spec exec params set to 'true';
 same results, only this time the fifth task being launched is expected since
 spec exec = true.
 job run:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=true  
 -Dmapreduce.job.reduces.speculative.execution=true 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=279653
 SLOTS_MILLIS_REDUCES=211474

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3672) Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS

2012-01-13 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3672:
---

Description: 
We count maps that are killed, say by speculator, towards 
JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for 
killed maps.

Same with reduces too.

  was:
We counted maps that are killed, say by speculator, towards 
JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for 
killed maps.

Same with reduces too.


 Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS
 ---

 Key: MAPREDUCE-3672
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3672
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


 We count maps that are killed, say by speculator, towards 
 JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for 
 killed maps.
 Same with reduces too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3672) Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS

2012-01-13 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS
---

 Key: MAPREDUCE-3672
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3672
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


We counted maps that are killed, say by speculator, towards 
JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for 
killed maps.

Same with reduces too.
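
One way to picture the proposed change, with illustrative names only (the real
JobCounter values and AM bookkeeping will differ): attribute a finished attempt to
a failed or a killed counter based on its final state, instead of folding kills
into NUM_FAILED_MAPS.
{code}
// Illustrative sketch; not the actual JobCounter or MR AM code.
public class MapCompletionCounters {
  public enum Outcome { SUCCEEDED, FAILED, KILLED }

  private long numFailedMaps;
  private long numKilledMaps;

  /** Credit a finished map attempt to the counter matching its final outcome. */
  public void recordMapCompletion(Outcome outcome) {
    switch (outcome) {
      case FAILED:
        numFailedMaps++;   // genuine failures only
        break;
      case KILLED:
        numKilledMaps++;   // e.g. killed by the speculator: the proposed separate counter
        break;
      default:
        break;             // successes are tracked elsewhere
    }
  }

  public long getNumFailedMaps() { return numFailedMaps; }
  public long getNumKilledMaps() { return numKilledMaps; }
}
{code}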

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3671) AM-NM RPC calls occasionally takes a long time to respond

2012-01-13 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185993#comment-13185993
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3671:


... And this has performance implications for benchmarks. It hurts execution time
badly when there is no speculation for jobs with small tasks, which is the
default.

 AM-NM RPC calls occasionally takes a long time to respond
 -

 Key: MAPREDUCE-3671
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3671
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Siddharth Seth

 Observed while looking at MAPREDUCE-3596 and MAPREDUCE-3656.
 startContainer takes over a minute in some cases, otherwise around 15 seconds.
 Both were observed soon after reduce tasks started. Network congestion? Needs
 more investigation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3671) AM-NM RPC calls occasionally takes a long time to respond

2012-01-13 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3671:
---

Issue Type: Sub-task  (was: Bug)
Parent: MAPREDUCE-3561

 AM-NM RPC calls occasionally takes a long time to respond
 -

 Key: MAPREDUCE-3671
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3671
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Siddharth Seth

 Observed while looking at MAPREDUCE-3596 and MAPREDUCE-3656.
 startContainer takes over a minute in some cases, otherwise around 15 seconds.
 Both were observed soon after reduce tasks started. Network congestion? Needs
 more investigation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185996#comment-13185996
 ] 

Hudson commented on MAPREDUCE-3404:
---

Integrated in Hadoop-Hdfs-0.23-Commit #366 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/366/])
MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and 
enable speculating either maps or reduces. Contributed by Eric Payne.
svn merge --ignore-ancestry -c 1231395 ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231397
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java


 Speculative Execution: speculative map tasks launched even if 
 -Dmapreduce.map.speculative=false
 ---

 Key: MAPREDUCE-3404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, mrv2
Affects Versions: 0.23.0
 Environment: Hadoop version is: Hadoop 0.23.0.1110031628
 10 node test cluster
Reporter: patrick white
Assignee: Eric Payne
Priority: Critical
 Fix For: 0.23.1, 0.24.0

 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt


 When forcing a mapper to take significantly longer than other map tasks, 
 speculative map tasks are
 launched even if the mapreduce.job.maps.speculative.execution parameter is 
 set to 'false'.
 Testcase: ran the default WordCount job with speculative execution set to false
 for both map and reduce, but still saw a fifth mapper task launch. Ran the job
 as follows:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=false  
 -Dmapreduce.job.reduces.speculative.execution=false 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 Input data was 4 text files hdfs blocksize, with same word pattern plus one 
 diff text line in each file, fourth
 file was 4 times as large as others:
 hadoop --config config  fs -ls  /tmp
 Found 5 items
 drwxr-xr-x   - user hdfs  0 2011-10-20 16:17 /tmp/file_of_words.out
 -rw-r--r--   3 user hdfs   62800021 2011-10-20 14:45 /tmp/test_file_of_words1
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words2
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words3
 -rw-r--r--   3 user hdfs  271708312 2011-10-20 15:50 /tmp/test_file_of_words4
 Job launched 5 mappers despite spec exec set to false, output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=273540
 SLOTS_MILLIS_REDUCES=212876
 Reran the same case as above, only with both spec exec params set to 'true';
 same results, only this time the fifth task being launched is expected since
 spec exec = true.
 job run:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=true  
 -Dmapreduce.job.reduces.speculative.execution=true 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=279653
 SLOTS_MILLIS_REDUCES=211474

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185997#comment-13185997
 ] 

Hudson commented on MAPREDUCE-3404:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1615 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1615/])
MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and 
enable speculating either maps or reduces. Contributed by Eric Payne.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231395
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java


 Speculative Execution: speculative map tasks launched even if 
 -Dmapreduce.map.speculative=false
 ---

 Key: MAPREDUCE-3404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, mrv2
Affects Versions: 0.23.0
 Environment: Hadoop version is: Hadoop 0.23.0.1110031628
 10 node test cluster
Reporter: patrick white
Assignee: Eric Payne
Priority: Critical
 Fix For: 0.23.1, 0.24.0

 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt


 When forcing a mapper to take significantly longer than other map tasks, 
 speculative map tasks are
 launched even if the mapreduce.job.maps.speculative.execution parameter is 
 set to 'false'.
 Testcase: ran the default WordCount job with speculative execution set to false
 for both map and reduce, but still saw a fifth mapper task launch. Ran the job
 as follows:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=false  
 -Dmapreduce.job.reduces.speculative.execution=false 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 Input data was 4 text files hdfs blocksize, with same word pattern plus one 
 diff text line in each file, fourth
 file was 4 times as large as others:
 hadoop --config config  fs -ls  /tmp
 Found 5 items
 drwxr-xr-x   - user hdfs  0 2011-10-20 16:17 /tmp/file_of_words.out
 -rw-r--r--   3 user hdfs   62800021 2011-10-20 14:45 /tmp/test_file_of_words1
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words2
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words3
 -rw-r--r--   3 user hdfs  271708312 2011-10-20 15:50 /tmp/test_file_of_words4
 Job launched 5 mappers despite spec exec set to false, output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=273540
 SLOTS_MILLIS_REDUCES=212876
 Reran the same case as above, only with both spec exec params set to 'true';
 same results, only this time the fifth task being launched is expected since
 spec exec = true.
 job run:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=true  
 -Dmapreduce.job.reduces.speculative.execution=true 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=279653
 SLOTS_MILLIS_REDUCES=211474

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185999#comment-13185999
 ] 

Hudson commented on MAPREDUCE-3404:
---

Integrated in Hadoop-Common-trunk-Commit #1542 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1542/])
MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and 
enable speculating either maps or reduces. Contributed by Eric Payne.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231395
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java


 Speculative Execution: speculative map tasks launched even if 
 -Dmapreduce.map.speculative=false
 ---

 Key: MAPREDUCE-3404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, mrv2
Affects Versions: 0.23.0
 Environment: Hadoop version is: Hadoop 0.23.0.1110031628
 10 node test cluster
Reporter: patrick white
Assignee: Eric Payne
Priority: Critical
 Fix For: 0.23.1, 0.24.0

 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt


 When forcing a mapper to take significantly longer than other map tasks, 
 speculative map tasks are
 launched even if the mapreduce.job.maps.speculative.execution parameter is 
 set to 'false'.
 Testcase: ran the default WordCount job with speculative execution set to false
 for both map and reduce, but still saw a fifth mapper task launch. Ran the job
 as follows:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=false  
 -Dmapreduce.job.reduces.speculative.execution=false 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 Input data was 4 text files hdfs blocksize, with same word pattern plus one 
 diff text line in each file, fourth
 file was 4 times as large as others:
 hadoop --config config  fs -ls  /tmp
 Found 5 items
 drwxr-xr-x   - user hdfs  0 2011-10-20 16:17 /tmp/file_of_words.out
 -rw-r--r--   3 user hdfs   62800021 2011-10-20 14:45 /tmp/test_file_of_words1
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words2
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words3
 -rw-r--r--   3 user hdfs  271708312 2011-10-20 15:50 /tmp/test_file_of_words4
 Job launched 5 mappers despite spec exec set to false, output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=273540
 SLOTS_MILLIS_REDUCES=212876
 Reran the same case as above, only with both spec exec params set to 'true';
 same results, only this time the fifth task being launched is expected since
 spec exec = true.
 job run:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=true  
 -Dmapreduce.job.reduces.speculative.execution=true 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=279653
 SLOTS_MILLIS_REDUCES=211474

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186011#comment-13186011
 ] 

Hudson commented on MAPREDUCE-3404:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #388 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/388/])
MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and 
enable speculating either maps or reduces. Contributed by Eric Payne.
svn merge --ignore-ancestry -c 1231395 ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231397
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java


 Speculative Execution: speculative map tasks launched even if 
 -Dmapreduce.map.speculative=false
 ---

 Key: MAPREDUCE-3404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, mrv2
Affects Versions: 0.23.0
 Environment: Hadoop version is: Hadoop 0.23.0.1110031628
 10 node test cluster
Reporter: patrick white
Assignee: Eric Payne
Priority: Critical
 Fix For: 0.23.1, 0.24.0

 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt


 When forcing a mapper to take significantly longer than other map tasks, 
 speculative map tasks are
 launched even if the mapreduce.job.maps.speculative.execution parameter is 
 set to 'false'.
 Testcase: ran the default WordCount job with speculative execution set to false
 for both map and reduce, but still saw a fifth mapper task launch. Ran the job
 as follows:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=false  
 -Dmapreduce.job.reduces.speculative.execution=false 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 Input data was 4 text files hdfs blocksize, with same word pattern plus one 
 diff text line in each file, fourth
 file was 4 times as large as others:
 hadoop --config config  fs -ls  /tmp
 Found 5 items
 drwxr-xr-x   - user hdfs  0 2011-10-20 16:17 /tmp/file_of_words.out
 -rw-r--r--   3 user hdfs   62800021 2011-10-20 14:45 /tmp/test_file_of_words1
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words2
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words3
 -rw-r--r--   3 user hdfs  271708312 2011-10-20 15:50 /tmp/test_file_of_words4
 Job launched 5 mappers despite spec exec set to false, output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=273540
 SLOTS_MILLIS_REDUCES=212876
 Reran the same case as above, only with both spec exec params set to 'true';
 same results, only this time the fifth task being launched is expected since
 spec exec = true.
 job run:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=true  
 -Dmapreduce.job.reduces.speculative.execution=true 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=279653
 SLOTS_MILLIS_REDUCES=211474

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3404) Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false

2012-01-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186024#comment-13186024
 ] 

Hudson commented on MAPREDUCE-3404:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1560 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1560/])
MAPREDUCE-3404. Corrected MR AM to honor speculative configuration and 
enable speculating either maps or reduces. Contributed by Eric Payne.

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231395
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestSpeculativeExecution.java


 Speculative Execution: speculative map tasks launched even if 
 -Dmapreduce.map.speculative=false
 ---

 Key: MAPREDUCE-3404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, mrv2
Affects Versions: 0.23.0
 Environment: Hadoop version is: Hadoop 0.23.0.1110031628
 10 node test cluster
Reporter: patrick white
Assignee: Eric Payne
Priority: Critical
 Fix For: 0.23.1, 0.24.0

 Attachments: MAPREDUCE-3404.1.txt, MAPREDUCE-3404.2.txt


 When forcing a mapper to take significantly longer than other map tasks, 
 speculative map tasks are
 launched even if the mapreduce.job.maps.speculative.execution parameter is 
 set to 'false'.
 Testcase: ran the default WordCount job with speculative execution set to false
 for both map and reduce, but still saw a fifth mapper task launch. Ran the job
 as follows:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=false  
 -Dmapreduce.job.reduces.speculative.execution=false 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 Input data was 4 text files hdfs blocksize, with same word pattern plus one 
 diff text line in each file, fourth
 file was 4 times as large as others:
 hadoop --config config  fs -ls  /tmp
 Found 5 items
 drwxr-xr-x   - user hdfs  0 2011-10-20 16:17 /tmp/file_of_words.out
 -rw-r--r--   3 user hdfs   62800021 2011-10-20 14:45 /tmp/test_file_of_words1
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words2
 -rw-r--r--   3 user hdfs   62800024 2011-10-20 14:46 /tmp/test_file_of_words3
 -rw-r--r--   3 user hdfs  271708312 2011-10-20 15:50 /tmp/test_file_of_words4
 Job launched 5 mappers despite spec exec set to false, output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=273540
 SLOTS_MILLIS_REDUCES=212876
 Reran the same case as above, only with both spec exec params set to 'true';
 same results, only this time the fifth task being launched is expected since
 spec exec = true.
 job run:
 hadoop --config config  jar   /tmp/testphw/wordcount.jar   WordCount  
 -Dmapreduce.job.maps.speculative.execution=true  
 -Dmapreduce.job.reduces.speculative.execution=true 
 /tmp/test_file_of_words* /tmp/file_of_words.out
 output snippet:
 org.apache.hadoop.mapreduce.JobCounter
 NUM_FAILED_MAPS=1
 TOTAL_LAUNCHED_MAPS=5
 TOTAL_LAUNCHED_REDUCES=1
 RACK_LOCAL_MAPS=5
 SLOTS_MILLIS_MAPS=279653
 SLOTS_MILLIS_REDUCES=211474

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira