date:20120214

[jira] [Commented] (MAPREDUCE-3859) CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory jobs

2012-02-14 Thread Sergey Tryuber (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207611#comment-13207611
 ] 

Sergey Tryuber commented on MAPREDUCE-3859:
---

Sorry, I don't know exactly which version of Hadoop the CDH3u1 distribution is 
based on. It has the following lines in the *assignSlotsToJob* method of 
*CapacitySchedulerQueue.java*:
{code}
int queueSlotsOccupied = getNumSlotsOccupied(taskType);
int currentCapacity;

if (queueSlotsOccupied < queueCapacity) {
  currentCapacity = queueCapacity;
}
else {
  currentCapacity = queueSlotsOccupied + numSlotsRequested;
}
{code}

Imagine a job with 1 slot per task, a queue with a configured capacity of 10, 
and 9 occupied slots (assume a large maximum capacity and plenty of free slots 
on the cluster). Then _currentCapacity=10_ and the task is scheduled properly. 
Later, when 10 slots are occupied, _currentCapacity=11_ and everything is fine 
too. And so on...

Now imagine a job with 3 slots per task, the same queue with a configured 
capacity of 10, and 9 occupied slots. Then _currentCapacity=10_, but that's 
not enough to schedule the new task, which needs 9 + 3 = 12 slots! So this job 
will never use more than 9 slots!

I've fixed this problem by changing:
{code}
if (queueSlotsOccupied < queueCapacity) {
{code}
to
{code}
if (queueSlotsOccupied + numSlotsRequested <= queueCapacity) {
{code}
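Applying the same change to a small model shows what the patched condition does. Again this is a sketch under assumed rules (the hypothetical admission check `occupied + requested <= currentCapacity` and a 30-slot maximum), not the actual patch: once a request no longer fits inside the configured capacity, currentCapacity stretches to exactly what the request needs, so multi-slot tasks behave like 1-slot tasks and only the assumed maximum capacity limits growth.

```java
/**
 * Sketch of the patched capacity computation. The admission rule and the
 * maxCapacity bound are assumptions for illustration only.
 */
class FixedCapacityCheck {

    // Patched condition: keep currentCapacity at queueCapacity only while the
    // request still fits inside it; otherwise stretch to what it needs.
    static int currentCapacity(int occupied, int requested, int queueCapacity) {
        if (occupied + requested <= queueCapacity) {
            return queueCapacity;
        }
        return occupied + requested;
    }

    // Hypothetical admission rule, same as in the unpatched model.
    static boolean canAssign(int occupied, int requested, int queueCapacity, int maxCapacity) {
        int needed = occupied + requested;
        return needed <= currentCapacity(occupied, requested, queueCapacity)
                && needed <= maxCapacity;
    }

    // Assign tasks of slotsPerTask slots until the check refuses.
    static int fillQueue(int slotsPerTask, int queueCapacity, int maxCapacity) {
        int occupied = 0;
        while (canAssign(occupied, slotsPerTask, queueCapacity, maxCapacity)) {
            occupied += slotsPerTask;
        }
        return occupied;
    }

    public static void main(String[] args) {
        // Queue with capacity 10 and an assumed maximum of 30 slots.
        System.out.println("1-slot tasks: " + fillQueue(1, 10, 30) + " slots");
        System.out.println("3-slot tasks: " + fillQueue(3, 10, 30) + " slots");
    }
}
```

With the patched condition both job types reach the assumed 30-slot maximum.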

I rebuilt CDH3u1 from source, deployed the jar to the cluster, and the 
CapacityScheduler now works well for me.

I also checked out the current Hadoop trunk. Unfortunately, the 
CapacityScheduler sources have changed dramatically, but I found similar lines 
in the *computeUserLimit* method of *LeafQueue.java*:
{code}
final int currentCapacity = 
  (consumed < queueCapacity) ? 
  queueCapacity : (consumed + required.getMemory());
{code}
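To see the analogous arithmetic in the trunk code, here is the quoted expression evaluated in plain megabytes, with `required.getMemory()` replaced by an int. This is only a sketch: the 10240 MB capacity, 9216 MB consumed and 3072 MB request are made-up numbers, and the "patched" variant is a hypothetical transplant of the 1.x fix, not code from the Hadoop tree. Whether the unpatched value actually blocks scheduling depends on how computeUserLimit's result is used, which this sketch does not model.

```java
/**
 * The currentCapacity expression quoted from LeafQueue.computeUserLimit,
 * with Resource replaced by plain megabytes. The "patched" variant is a
 * hypothetical port of the 1.x fix, not code from the Hadoop tree.
 */
class LeafQueueCapacityExample {

    // As quoted: only stretch once consumption has reached the queue capacity.
    static int currentCapacityQuoted(int consumedMb, int requiredMb, int queueCapacityMb) {
        return (consumedMb < queueCapacityMb) ? queueCapacityMb : (consumedMb + requiredMb);
    }

    // Hypothetical: stretch as soon as the request would not fit.
    static int currentCapacityPatched(int consumedMb, int requiredMb, int queueCapacityMb) {
        int needed = consumedMb + requiredMb;
        return (needed <= queueCapacityMb) ? queueCapacityMb : needed;
    }

    public static void main(String[] args) {
        int queueCapacityMb = 10240; // made-up queue capacity
        int consumedMb = 9216;       // made-up current consumption
        int requiredMb = 3072;       // made-up "high-memory" request
        System.out.println("quoted:  " + currentCapacityQuoted(consumedMb, requiredMb, queueCapacityMb));
        System.out.println("patched: " + currentCapacityPatched(consumedMb, requiredMb, queueCapacityMb));
    }
}
```

The quoted form reports 10240 MB even though consumption plus the request would be 12288 MB; the hypothetical patched form reports 12288 MB.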
So it seems to me that this bug also affects the latest CapacityScheduler.

 CapacityScheduler incorrectly utilizes extra-resources of queue for 
 high-memory jobs
 

 Key: MAPREDUCE-3859
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3859
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/capacity-sched
 Environment: CDH3u1
Reporter: Sergey Tryuber

 Imagine we have a queue A with a capacity of 10 slots and an extra capacity 
 of 20 slots. Jobs whose tasks each use 3 map slots will never consume more 
 than 9 slots, regardless of how many free slots there are on the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (MAPREDUCE-3859) CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory jobs

2012-02-14 Thread Sergey Tryuber (Created) (JIRA)

CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory 
jobs


 Key: MAPREDUCE-3859
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3859
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/capacity-sched
 Environment: CDH3u1
Reporter: Sergey Tryuber


Imagine we have a queue A with a capacity of 10 slots and an extra capacity of 
20 slots. Jobs whose tasks each use 3 map slots will never consume more than 9 
slots, regardless of how many free slots there are on the cluster.


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207663#comment-13207663
 ] 

Hudson commented on MAPREDUCE-3837:
---

Integrated in Hadoop-Hdfs-trunk #955 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/955/])
MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. 
Contributed by Mayank Bansal. (Revision 1243695)

 Result = FAILURE
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243695
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java


 Hadoop 22 Job tracker is not able to recover job in case of crash and after 
 that no user can submit job.
 

 Key: MAPREDUCE-3837
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 0.24.0, 0.22.1, 0.23.2

 Attachments: PATCH-MAPREDUCE-3837.patch, 
 PATCH-TRUNK-MAPREDUCE-3837.patch


 If the JobTracker crashes while some jobs are running, and the JobTracker 
 property mapreduce.jobtracker.restart.recover is true, then it should 
 recover those jobs.
 However, the current behavior is as follows: the JobTracker tries to 
 restore the jobs but cannot, and after that it closes its handle to HDFS, 
 so nobody else can submit jobs. 
 Thanks,
 Mayank


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207664#comment-13207664
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Hdfs-trunk #955 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/955/])
MAPREDUCE-3846. Addressed MR AM hanging issues during AM restart and then 
the recovery. (vinodkv) (Revision 1243752)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243752
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/Recovery.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java


 Restarted+Recovered AM hangs in some corner cases
 -

 Key: MAPREDUCE-3846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3846-20120210.txt, 
 MAPREDUCE-3846-20120210.txt, MAPREDUCE-3846-20120213.txt


 [~karams] found this while testing the AM restart/recovery feature. After the 
 first-generation AM crashes (manually killed with kill -9), the 
 second-generation AM starts but hangs after a while.


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207670#comment-13207670
 ] 

Hudson commented on MAPREDUCE-3837:
---

Integrated in Hadoop-Hdfs-0.23-Build #168 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/168/])
MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. 
Contributed by Mayank Bansal. (Revision 1243698)

 Result = FAILURE
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243698
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java


 Hadoop 22 Job tracker is not able to recover job in case of crash and after 
 that no user can submit job.
 

 Key: MAPREDUCE-3837
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 0.24.0, 0.22.1, 0.23.2

 Attachments: PATCH-MAPREDUCE-3837.patch, 
 PATCH-TRUNK-MAPREDUCE-3837.patch


 If the JobTracker crashes while some jobs are running, and the JobTracker 
 property mapreduce.jobtracker.restart.recover is true, then it should 
 recover those jobs.
 However, the current behavior is as follows: the JobTracker tries to 
 restore the jobs but cannot, and after that it closes its handle to HDFS, 
 so nobody else can submit jobs. 
 Thanks,
 Mayank


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207671#comment-13207671
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Hdfs-0.23-Build #168 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/168/])
MAPREDUCE-3846. Addressed MR AM hanging issues during AM restart and then 
the recovery. (vinodkv)
svn merge --ignore-ancestry -c 1243752 ../../trunk/ (Revision 1243755)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243755
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/Recovery.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java


 Restarted+Recovered AM hangs in some corner cases
 -

 Key: MAPREDUCE-3846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3846-20120210.txt, 
 MAPREDUCE-3846-20120210.txt, MAPREDUCE-3846-20120213.txt


 [~karams] found this while testing the AM restart/recovery feature. After the 
 first-generation AM crashes (manually killed with kill -9), the 
 second-generation AM starts but hangs after a while.


[jira] [Updated] (MAPREDUCE-3859) CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory jobs

2012-02-14 Thread Harsh J (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-3859:
---

 Target Version/s: 1.1.0
Affects Version/s: 1.0.0

This affects 1.x as well, judging by a comparison of the commits to the 
CapacityScheduler (CS) in CDH3 and in Apache 1.x.

 CapacityScheduler incorrectly utilizes extra-resources of queue for 
 high-memory jobs
 

 Key: MAPREDUCE-3859
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3859
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/capacity-sched
Affects Versions: 1.0.0
 Environment: CDH3u1
Reporter: Sergey Tryuber

 Imagine we have a queue A with a capacity of 10 slots and an extra capacity 
 of 20 slots. Jobs whose tasks each use 3 map slots will never consume more 
 than 9 slots, regardless of how many free slots there are on the cluster.


[jira] [Commented] (MAPREDUCE-3859) CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory jobs

2012-02-14 Thread Harsh J (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207680#comment-13207680
 ] 

Harsh J commented on MAPREDUCE-3859:


Sergey,

Thanks for digging into the code and coming up with a fix! Would you be 
interested in posting a patch for this as well? We'd also need a test case 
that fails without the fix.

Let us know if that's possible. Thanks again!

 CapacityScheduler incorrectly utilizes extra-resources of queue for 
 high-memory jobs
 

 Key: MAPREDUCE-3859
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3859
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/capacity-sched
Affects Versions: 1.0.0
 Environment: CDH3u1
Reporter: Sergey Tryuber

 Imagine we have a queue A with a capacity of 10 slots and an extra capacity 
 of 20 slots. Jobs whose tasks each use 3 map slots will never consume more 
 than 9 slots, regardless of how many free slots there are on the cluster.


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207711#comment-13207711
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Mapreduce-0.23-Build #196 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/196/])
MAPREDUCE-3846. Addressed MR AM hanging issues during AM restart and then 
the recovery. (vinodkv)
svn merge --ignore-ancestry -c 1243752 ../../trunk/ (Revision 1243755)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243755
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/Recovery.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java


 Restarted+Recovered AM hangs in some corner cases
 -

 Key: MAPREDUCE-3846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3846-20120210.txt, 
 MAPREDUCE-3846-20120210.txt, MAPREDUCE-3846-20120213.txt


 [~karams] found this while testing the AM restart/recovery feature. After the 
 first-generation AM crashes (manually killed with kill -9), the 
 second-generation AM starts but hangs after a while.


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207716#comment-13207716
 ] 

Hudson commented on MAPREDUCE-3837:
---

Integrated in Hadoop-Mapreduce-trunk #990 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/990/])
MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. 
Contributed by Mayank Bansal. (Revision 1243695)

 Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243695
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java


 Hadoop 22 Job tracker is not able to recover job in case of crash and after 
 that no user can submit job.
 

 Key: MAPREDUCE-3837
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 0.24.0, 0.22.1, 0.23.2

 Attachments: PATCH-MAPREDUCE-3837.patch, 
 PATCH-TRUNK-MAPREDUCE-3837.patch


 If the JobTracker crashes while some jobs are running, and the JobTracker 
 property mapreduce.jobtracker.restart.recover is true, then it should 
 recover those jobs.
 However, the current behavior is as follows: the JobTracker tries to 
 restore the jobs but cannot, and after that it closes its handle to HDFS, 
 so nobody else can submit jobs. 
 Thanks,
 Mayank


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207717#comment-13207717
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Mapreduce-trunk #990 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/990/])
MAPREDUCE-3846. Addressed MR AM hanging issues during AM restart and then 
the recovery. (vinodkv) (Revision 1243752)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243752
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/Recovery.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java


 Restarted+Recovered AM hangs in some corner cases
 -

 Key: MAPREDUCE-3846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3846-20120210.txt, 
 MAPREDUCE-3846-20120210.txt, MAPREDUCE-3846-20120213.txt


 [~karams] found this while testing the AM restart/recovery feature. After the 
 first-generation AM crashes (manually killed with kill -9), the 
 second-generation AM starts but hangs after a while.


[jira] [Commented] (MAPREDUCE-3825) MR should not be getting duplicate tokens for a MR Job.

2012-02-14 Thread Daryn Sharp (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207767#comment-13207767
]

Daryn Sharp commented on MAPREDUCE-3825:

If there are external multi-token filesystems that currently work, then they
have implemented {{getDelegationTokens(renewer, creds)}}. Those filesystems
will continue to work as long as {{getDelegationTokens(renewer, creds)}} isn't
marked {{final}}, which was also proposed. Without {{final}}, the proposal is
completely backwards compatible.
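The idea under discussion is that token collection should be keyed by the services a filesystem actually spans, skipping any service the credentials already hold, and that leaving the collecting method non-final lets existing multi-token filesystems keep their overrides. A minimal sketch of that idea, assuming a hypothetical `SketchFs` class and a plain `Map` in place of Hadoop's `FileSystem` and `Credentials`; it is not the HADOOP-7967 or MAPREDUCE-3825 patch, and the service names are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TokenDedupDemo {

    /**
     * Hypothetical stand-in for a filesystem that can issue delegation
     * tokens. NOT Hadoop's FileSystem/Credentials API; a Map keyed by
     * service name plays the role of the credentials.
     */
    static class SketchFs {
        final String service;          // null for a pure mount-table fs (viewfs-like)
        final List<SketchFs> children; // mounted/embedded filesystems

        SketchFs(String service, List<SketchFs> children) {
            this.service = service;
            this.children = children;
        }

        // Deliberately not final: a subclass that already overrides this
        // (an existing multi-token fs) keeps working unchanged.
        void addTokens(String renewer, Map<String, String> creds) {
            // Fetch a token only if this fs has its own service and the
            // credentials do not already hold one for it.
            if (service != null && !creds.containsKey(service)) {
                creds.put(service, "token(" + service + ", renewer=" + renewer + ")");
            }
            // Recurse into embedded filesystems; they dedupe the same way.
            for (SketchFs child : children) {
                child.addTokens(renewer, creds);
            }
        }
    }

    static Map<String, String> collect(List<SketchFs> fileSystems, String renewer) {
        Map<String, String> creds = new LinkedHashMap<>();
        for (SketchFs fs : fileSystems) {
            fs.addTokens(renewer, creds);
        }
        return creds;
    }

    public static void main(String[] args) {
        SketchFs nn1 = new SketchFs("nn1:8020", List.of());
        SketchFs nn2 = new SketchFs("nn2:8020", List.of());
        SketchFs view = new SketchFs(null, List.of(nn1, nn2));
        // Input via the view, output on nn1, default fs nn2:
        // three filesystem references but only two real token services.
        Map<String, String> creds = collect(List.of(view, nn1, nn2), "yarn");
        System.out.println(creds.keySet());
    }
}
```

Here the viewfs-like wrapper never asks for a token under its own (non-existent) service key; it delegates to its mounted filesystems, and the second and third references to the same namenodes add nothing.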

MR should not be getting duplicate tokens for a MR Job.
---

Key: MAPREDUCE-3825
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3825
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: security
Affects Versions: 0.23.1, 0.24.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Attachments: MAPREDUCE-3825.patch, TokenCache.pdf

This is the counterpart to HADOOP-7967.
MR gets tokens for all input and output filesystems and the default filesystem
when an MR job is submitted.
The APIs in FileSystem make it challenging to avoid duplicate tokens when
there are filesystems that have embedded filesystems.
Here is the original description that Daryn wrote:
The token cache currently tries to assume a filesystem's token service key.
The assumption generally worked while there was a one-to-one mapping of
filesystem to token. With the advent of multi-token filesystems like viewfs,
the token cache will try to use a service key (i.e. for viewfs) that will
never exist (because it really gets the mounted filesystems' tokens).
The descriop


[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

2012-02-14 Thread Robert Joseph Evans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207811#comment-13207811
 ] 

Robert Joseph Evans commented on MAPREDUCE-3802:


+1, you are correct. I manually verified that 0.23.2 no longer shows this 
problem, so I assume it was MAPREDUCE-3846 that fixed it.

 If an MR AM dies twice  it looks like the process freezes
 -

 Key: MAPREDUCE-3802
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3802
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mrv2
Affects Versions: 0.23.1, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3802-20120213.txt, 
 MAPREDUCE-3802-20120213.txt, syslog


 It looks like recovering from an MR AM dying works very well on a single 
 failure. But if it fails multiple times, we appear to get into a livelock 
 situation.
 {noformat}
 yarn jar 
 hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*-SNAPSHOT.jar 
 wordcount -Dyarn.app.mapreduce.am.log.level=DEBUG -Dmapreduce.job.reduces=30 
 input output
 12/02/03 21:06:57 WARN conf.Configuration: fs.default.name is deprecated. 
 Instead, use fs.defaultFS
 12/02/03 21:06:57 WARN conf.Configuration: mapred.used.genericoptionsparser 
 is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
 12/02/03 21:06:57 INFO input.FileInputFormat: Total input paths to process : 
 17
 12/02/03 21:06:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
 12/02/03 21:06:57 WARN snappy.LoadSnappy: Snappy native library not loaded
 12/02/03 21:06:57 INFO mapreduce.JobSubmitter: number of splits:17
 12/02/03 21:06:57 INFO mapred.ResourceMgrDelegate: Submitted application 
 application_1328302034486_0003 to ResourceManager at HOST/IP:8040
 12/02/03 21:06:57 INFO mapreduce.Job: The url to track the job: 
 http://HOST:8088/proxy/application_1328302034486_0003/
 12/02/03 21:06:57 INFO mapreduce.Job: Running job: job_1328302034486_0003
 12/02/03 21:07:03 INFO mapreduce.Job: Job job_1328302034486_0003 running in 
 uber mode : false
 12/02/03 21:07:03 INFO mapreduce.Job:  map 0% reduce 0%
 12/02/03 21:07:09 INFO mapreduce.Job:  map 5% reduce 0%
 12/02/03 21:07:10 INFO mapreduce.Job:  map 17% reduce 0%
 #KILLED AM with kill -9 here
 12/02/03 21:07:16 INFO mapreduce.Job:  map 29% reduce 0%
 12/02/03 21:07:17 INFO mapreduce.Job:  map 35% reduce 0%
 12/02/03 21:07:30 INFO mapreduce.Job:  map 52% reduce 0%
 12/02/03 21:07:35 INFO mapreduce.Job:  map 58% reduce 0%
 12/02/03 21:07:37 INFO mapreduce.Job:  map 70% reduce 0%
 12/02/03 21:07:41 INFO mapreduce.Job:  map 76% reduce 0%
 12/02/03 21:07:43 INFO mapreduce.Job:  map 82% reduce 0%
 12/02/03 21:07:44 INFO mapreduce.Job:  map 88% reduce 0%
 12/02/03 21:07:47 INFO mapreduce.Job:  map 94% reduce 0%
 12/02/03 21:07:49 INFO mapreduce.Job:  map 100% reduce 0%
 12/02/03 21:07:53 INFO mapreduce.Job:  map 100% reduce 3%
 12/02/03 21:08:00 INFO mapreduce.Job:  map 100% reduce 6%
 12/02/03 21:08:06 INFO mapreduce.Job:  map 100% reduce 10%
 12/02/03 21:08:12 INFO mapreduce.Job:  map 100% reduce 13%
 12/02/03 21:08:18 INFO mapreduce.Job:  map 100% reduce 16%
 #killed AM with kill -9 here
 12/02/03 21:08:20 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
 Already tried 0 time(s).
 12/02/03 21:08:21 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
 Already tried 1 time(s).
 12/02/03 21:08:22 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
 Already tried 2 time(s).
 12/02/03 21:08:26 INFO mapreduce.Job:  map 64% reduce 16%
 #It never makes any more progress...
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-3348) mapred job -status fails to give info even if the job is present in History

2012-02-14 Thread Devaraj K (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-3348:
-

Attachment: MAPREDUCE-3348.patch

 mapred job -status fails to give info even if the job is present in History
 ---

 Key: MAPREDUCE-3348
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3348
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.24.0
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: MAPREDUCE-3348.patch


 The client tries to get the application report for the job from the RM. When 
 the RM does not find the application it throws an exception, and the client 
 rethrows that same exception instead of trying the History Server.
 {code}
 11/11/03 08:47:27 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
 11/11/03 08:47:28 WARN mapred.ClientServiceDelegate: Exception thrown by remote end.
 RemoteTrace:
  at LocalTrace:
 org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Trying to get information for an absent application application_1320278804241_0002
 at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:142)
 at $Proxy6.getApplicationReport(Unknown Source)
 at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getApplicationReport(ClientRMProtocolPBClientImpl.java:111)
 at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:321)
 at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:137)
 at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:273)
 at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:353)
 at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:429)
 at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:186)
 at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:240)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1106)
 Exception in thread "main" RemoteTrace:
  at Local Trace:
 org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Trying to get information for an absent application application_1320278804241_0002
 at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:142)
 at $Proxy6.getApplicationReport(Unknown Source)
 at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getApplicationReport(ClientRMProtocolPBClientImpl.java:111)
 at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:321)
 at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:137)
 at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:273)
 at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:353)
 at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:429)
 at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:186)
 at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:240)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1106)
 {code}
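The behaviour the report asks for is a fallback: if the RM no longer knows the application, ask the History Server instead of rethrowing the RM's exception. The following is a minimal sketch of that control flow only. `JobStatusLookup`, `StatusSource`, `MapSource` and `ApplicationNotFoundException` are hypothetical stand-ins, not the real `ClientServiceDelegate` API and not the attached MAPREDUCE-3348.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: RM first, History Server as fallback.
public class JobStatusLookup {

    /** Hypothetical: thrown when a source has no record of the job. */
    static class ApplicationNotFoundException extends Exception {
        ApplicationNotFoundException(String jobId) {
            super("Trying to get information for an absent application " + jobId);
        }
    }

    /** Hypothetical abstraction over the RM and the History Server. */
    interface StatusSource {
        String getJobStatus(String jobId) throws ApplicationNotFoundException;
    }

    /** In-memory source, used only to make the sketch runnable. */
    static class MapSource implements StatusSource {
        private final Map<String, String> statuses = new HashMap<>();

        MapSource put(String jobId, String status) {
            statuses.put(jobId, status);
            return this;
        }

        @Override
        public String getJobStatus(String jobId) throws ApplicationNotFoundException {
            String status = statuses.get(jobId);
            if (status == null) {
                throw new ApplicationNotFoundException(jobId);
            }
            return status;
        }
    }

    private final StatusSource resourceManager;
    private final StatusSource historyServer;

    JobStatusLookup(StatusSource resourceManager, StatusSource historyServer) {
        this.resourceManager = resourceManager;
        this.historyServer = historyServer;
    }

    /** Ask the RM first; only if it has no record, ask the History Server. */
    String getJobStatus(String jobId) throws ApplicationNotFoundException {
        try {
            return resourceManager.getJobStatus(jobId);
        } catch (ApplicationNotFoundException notInRm) {
            // A job the RM no longer tracks may still be recorded by the
            // History Server, so try there before giving up.
            return historyServer.getJobStatus(jobId);
        }
    }

    public static void main(String[] args) throws Exception {
        StatusSource rm = new MapSource(); // RM has no record of the job
        StatusSource history = new MapSource().put("job_1320278804241_0002", "SUCCEEDED");
        JobStatusLookup lookup = new JobStatusLookup(rm, history);
        System.out.println(lookup.getJobStatus("job_1320278804241_0002")); // SUCCEEDED
    }
}
```

The exception is still propagated when neither source knows the job, so a genuinely unknown job ID fails as before.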


[jira] [Updated] (MAPREDUCE-3348) mapred job -status fails to give info even if the job is present in History

2012-02-14 Thread Devaraj K (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-3348:
-

Status: Patch Available  (was: Open)

 mapred job -status fails to give info even if the job is present in History
 ---

 Key: MAPREDUCE-3348
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3348
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.24.0
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: MAPREDUCE-3348.patch


 The client tries to get the application report for the job from the RM. When 
 the RM does not find the application it throws an exception, and the client 
 rethrows that same exception instead of trying the History Server.


[jira] [Updated] (MAPREDUCE-3348) mapred job -status fails to give info even if the job is present in History

2012-02-14 Thread Devaraj K (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-3348:
-

Component/s: mrv2

 mapred job -status fails to give info even if the job is present in History
 ---

 Key: MAPREDUCE-3348
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3348
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.24.0
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: MAPREDUCE-3348.patch


 The client tries to get the application report for the job from the RM. When 
 the RM does not find the application it throws an exception, and the client 
 rethrows that same exception instead of trying the History Server.


[jira] [Commented] (MAPREDUCE-3348) mapred job -status fails to give info even if the job is present in History

2012-02-14 Thread Hadoop QA (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207851#comment-13207851
 ] 

Hadoop QA commented on MAPREDUCE-3348:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514501/MAPREDUCE-3348.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1853//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1853//console

This message is automatically generated.

 mapred job -status fails to give info even if the job is present in History
 ---

 Key: MAPREDUCE-3348
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3348
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.24.0
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: MAPREDUCE-3348.patch


 The client tries to get the application report for the job from the RM. When 
 the RM does not find the application it throws an exception, and the client 
 rethrows that same exception instead of trying the History Server.

[jira] [Commented] (MAPREDUCE-3858) Task attempt failure during commit results in task never completing

2012-02-14 Thread Aaron T. Myers (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207883#comment-13207883
 ] 

Aaron T. Myers commented on MAPREDUCE-3858:
---

+1 (non-binding). I tested this patch on a cluster overnight and didn't 
experience this issue at all, whereas without this patch I hit this problem 
twice in a 12-hour period under the same load.

 Task attempt failure during commit results in task never completing
 ---

 Key: MAPREDUCE-3858
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3858
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Tom White
Assignee: Tom White
Priority: Critical
 Attachments: MAPREDUCE-3858.patch


 On a terasort job, a task attempt failed during the commit phase. Another 
 attempt was rescheduled, but when it tried to commit, it failed.
 {noformat}
 attempt_1329019187148_0083_r_000586_0 already given a go for committing the 
 task output, so killing attempt_1329019187148_0083_r_000586_1
 {noformat}
 The job hung as new attempts kept getting scheduled only to fail during 
 commit.
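The log line above suggests the commit go-ahead is granted to exactly one attempt per task and is never revoked when that attempt dies, so every later attempt is refused. A minimal sketch of such a coordinator, with one possible remedy (clearing the grant when the granted attempt fails), follows. `CommitCoordinator`, `canCommit` and `attemptFailed` are hypothetical names; this is not the MRv2 `TaskImpl` code and not the attached MAPREDUCE-3858.patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-task commit coordinator: at most one attempt of a task
 * holds permission to commit its output at any time.
 */
public class CommitCoordinator {
    // taskId -> attempt that has been given the go-ahead to commit
    private final Map<String, String> committingAttempt = new HashMap<>();

    /** Returns true if attemptId may commit; false means it should be killed. */
    public synchronized boolean canCommit(String taskId, String attemptId) {
        String granted = committingAttempt.get(taskId);
        if (granted == null) {
            committingAttempt.put(taskId, attemptId);
            return true;
        }
        // Only the attempt already holding the grant may (re)confirm it.
        return granted.equals(attemptId);
    }

    /**
     * Called when an attempt fails. Without this removal, a grant held by a
     * dead attempt blocks every later attempt forever (the hang above).
     * remove(key, value) only clears the grant if this attempt holds it.
     */
    public synchronized void attemptFailed(String taskId, String attemptId) {
        committingAttempt.remove(taskId, attemptId);
    }

    public static void main(String[] args) {
        CommitCoordinator coordinator = new CommitCoordinator();
        String task = "task_1329019187148_0083_r_000586";
        String first = "attempt_1329019187148_0083_r_000586_0";
        String second = "attempt_1329019187148_0083_r_000586_1";
        System.out.println(coordinator.canCommit(task, first));  // true
        System.out.println(coordinator.canCommit(task, second)); // false
        coordinator.attemptFailed(task, first);                  // first attempt dies
        System.out.println(coordinator.canCommit(task, second)); // true
    }
}
```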


[jira] [Commented] (MAPREDUCE-3858) Task attempt failure during commit results in task never completing

2012-02-14 Thread Mahadev konar (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207903#comment-13207903
 ] 

Mahadev konar commented on MAPREDUCE-3858:
--

Good patch to go into 0.23.1 as well. I'll go ahead and commit this to all three 
branches.

 Task attempt failure during commit results in task never completing
 ---

 Key: MAPREDUCE-3858
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3858
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Tom White
Assignee: Tom White
Priority: Critical
 Attachments: MAPREDUCE-3858.patch


 On a terasort job, a task attempt failed during the commit phase. Another 
 attempt was rescheduled, but when it tried to commit, it failed.
 {noformat}
 attempt_1329019187148_0083_r_000586_0 already given a go for committing the 
 task output, so killing attempt_1329019187148_0083_r_000586_1
 {noformat}
 The job hung as new attempts kept getting scheduled only to fail during 
 commit.


[jira] [Updated] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3802:
---

  Resolution: Fixed
Release Note: Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846.
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the review/verification, Robert!

I just committed this to trunk, 0.23 and 0.23.1.

 If an MR AM dies twice  it looks like the process freezes
 -

 Key: MAPREDUCE-3802
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3802
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mrv2
Affects Versions: 0.23.1, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3802-20120213.txt, 
 MAPREDUCE-3802-20120213.txt, syslog


 It looks like recovery works very well when an MR AM dies once. But if it 
 fails multiple times, we appear to get into a livelock situation.


[jira] [Commented] (MAPREDUCE-3838) MapReduce job submission time has increased in 0.23 when compared to 0.20.206

2012-02-14 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207943#comment-13207943
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3838:


Amar, I need more information on this. How exactly did you measure the job 
submission time?

 MapReduce job submission time has increased in 0.23 when compared to 0.20.206
 -

 Key: MAPREDUCE-3838
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3838
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.24.0
Reporter: Amar Kamat
  Labels: gridmix, job-submit-time, yarn
 Fix For: 0.23.1, 0.24.0


 While running Gridmix on 0.23, we found that the job submission time has 
 increased when compared to 0.20.206. 
 Here are some stats:
 ||Submit-Time||Total number of jobs in YARN|| Total number of jobs in FRED||
 | 25secs|3   |1  |
 | 20secs| 6  | 2 |
 | 15secs| 14 | 4 |
 | 10secs| 24 | 4 |
 | 5secs | 67 | 28|
 Note that Gridmix was run using the same trace.


[jira] [Updated] (MAPREDUCE-3838) MapReduce job submission time has increased in 0.23 when compared to 0.20.206

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3838:
---

 Target Version/s: 0.23.2  (was: 0.24.0)
Affects Version/s: (was: 0.24.0)
   0.23.0
Fix Version/s: (was: 0.23.1)
   (was: 0.24.0)

Setting a tentative target version of 0.23.2.

 MapReduce job submission time has increased in 0.23 when compared to 0.20.206
 -

 Key: MAPREDUCE-3838
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3838
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.23.0
Reporter: Amar Kamat
  Labels: gridmix, job-submit-time, yarn

 While running Gridmix on 0.23, we found that the job submission time has 
 increased when compared to 0.20.206. 
 Here are some stats:
 ||Submit-Time||Total number of jobs in YARN|| Total number of jobs in FRED||
 | 25secs|3   |1  |
 | 20secs| 6  | 2 |
 | 15secs| 14 | 4 |
 | 10secs| 24 | 4 |
 | 5secs | 67 | 28|
 Note that Gridmix was run using the same trace.


[jira] [Updated] (MAPREDUCE-3838) MapReduce job submission time has increased in 0.23 when compared to 0.20.206

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3838:
---

Issue Type: Sub-task  (was: Bug)
Parent: MAPREDUCE-3561

 MapReduce job submission time has increased in 0.23 when compared to 0.20.206
 -

 Key: MAPREDUCE-3838
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3838
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: client
Affects Versions: 0.23.0
Reporter: Amar Kamat
  Labels: gridmix, job-submit-time, yarn

 While running Gridmix on 0.23, we found that the job submission time has 
 increased when compared to 0.20.206. 
 Here are some stats:
 ||Submit-Time||Total number of jobs in YARN|| Total number of jobs in FRED||
 | 25secs|3   |1  |
 | 20secs| 6  | 2 |
 | 15secs| 14 | 4 |
 | 10secs| 24 | 4 |
 | 5secs | 67 | 28|
 Note that Gridmix was run using the same trace.


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207953#comment-13207953
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Hdfs-0.23-Commit #537 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/537/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv)
svn merge --ignore-ancestry -c 1244178 ../../trunk/ (Revision 1244180)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244180
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


 Restarted+Recovered AM hangs in some corner cases
 -

 Key: MAPREDUCE-3846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3846-20120210.txt, 
 MAPREDUCE-3846-20120210.txt, MAPREDUCE-3846-20120213.txt


 [~karams] found this while testing the AM restart/recovery feature. After the 
 first-generation AM crashes (manually killed with kill -9), the 
 second-generation AM starts but hangs after a while.


[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207954#comment-13207954
 ] 

Hudson commented on MAPREDUCE-3802:
---

Integrated in Hadoop-Hdfs-0.23-Commit #537 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/537/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv)
svn merge --ignore-ancestry -c 1244178 ../../trunk/ (Revision 1244180)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244180
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


 If an MR AM dies twice it looks like the process freezes
 -

 Key: MAPREDUCE-3802
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3802
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mrv2
Affects Versions: 0.23.1, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3802-20120213.txt, 
 MAPREDUCE-3802-20120213.txt, syslog


 It looks like recovering from an MR AM dying works very well on a single 
 failure. But if it fails multiple times, we appear to get into a livelock 
 situation.
 {noformat}
 yarn jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*-SNAPSHOT.jar wordcount -Dyarn.app.mapreduce.am.log.level=DEBUG -Dmapreduce.job.reduces=30 input output
 12/02/03 21:06:57 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
 12/02/03 21:06:57 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
 12/02/03 21:06:57 INFO input.FileInputFormat: Total input paths to process : 17
 12/02/03 21:06:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
 12/02/03 21:06:57 WARN snappy.LoadSnappy: Snappy native library not loaded
 12/02/03 21:06:57 INFO mapreduce.JobSubmitter: number of splits:17
 12/02/03 21:06:57 INFO mapred.ResourceMgrDelegate: Submitted application application_1328302034486_0003 to ResourceManager at HOST/IP:8040
 12/02/03 21:06:57 INFO mapreduce.Job: The url to track the job: http://HOST:8088/proxy/application_1328302034486_0003/
 12/02/03 21:06:57 INFO mapreduce.Job: Running job: job_1328302034486_0003
 12/02/03 21:07:03 INFO mapreduce.Job: Job job_1328302034486_0003 running in uber mode : false
 12/02/03 21:07:03 INFO mapreduce.Job:  map 0% reduce 0%
 12/02/03 21:07:09 INFO mapreduce.Job:  map 5% reduce 0%
 12/02/03 21:07:10 INFO mapreduce.Job:  map 17% reduce 0%
 #KILLED AM with kill -9 here
 12/02/03 21:07:16 INFO mapreduce.Job:  map 29% reduce 0%
 12/02/03 21:07:17 INFO mapreduce.Job:  map 35% reduce 0%
 12/02/03 21:07:30 INFO mapreduce.Job:  map 52% reduce 0%
 12/02/03 21:07:35 INFO mapreduce.Job:  map 58% reduce 0%
 12/02/03 21:07:37 INFO mapreduce.Job:  map 70% reduce 0%
 12/02/03 21:07:41 INFO mapreduce.Job:  map 76% reduce 0%
 12/02/03 21:07:43 INFO mapreduce.Job:  map 82% reduce 0%
 12/02/03 21:07:44 INFO mapreduce.Job:  map 88% reduce 0%
 12/02/03 21:07:47 INFO mapreduce.Job:  map 94% reduce 0%
 12/02/03 21:07:49 INFO mapreduce.Job:  map 100% reduce 0%
 12/02/03 21:07:53 INFO mapreduce.Job:  map 100% reduce 3%
 12/02/03 21:08:00 INFO mapreduce.Job:  map 100% reduce 6%
 12/02/03 21:08:06 INFO mapreduce.Job:  map 100% reduce 10%
 12/02/03 21:08:12 INFO mapreduce.Job:  map 100% reduce 13%
 12/02/03 21:08:18 INFO mapreduce.Job:  map 100% reduce 16%
 #killed AM with kill -9 here
 12/02/03 21:08:20 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 0 time(s).
 12/02/03 21:08:21 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 1 time(s).
 12/02/03 21:08:22 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 2 time(s).
 12/02/03 21:08:26 INFO mapreduce.Job:  map 64% reduce 16%
 #It never makes any more progress...
 {noformat}


[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207959#comment-13207959
 ] 

Hudson commented on MAPREDUCE-3802:
---

Integrated in Hadoop-Common-0.23-Commit #549 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/549/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv)
svn merge --ignore-ancestry -c 1244178 ../../trunk/ (Revision 1244180)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244180
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207958#comment-13207958
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Common-0.23-Commit #549 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/549/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv)
svn merge --ignore-ancestry -c 1244178 ../../trunk/ (Revision 1244180)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244180
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207963#comment-13207963
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1800 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1800/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv) (Revision 
1244178)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244178
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207966#comment-13207966
 ] 

Hudson commented on MAPREDUCE-3802:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1800 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1800/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv) (Revision 
1244178)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244178
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207970#comment-13207970
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Common-trunk-Commit #1726 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1726/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv) (Revision 
1244178)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244178
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207974#comment-13207974
 ] 

Hudson commented on MAPREDUCE-3802:
---

Integrated in Hadoop-Common-trunk-Commit #1726 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1726/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv) (Revision 
1244178)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244178
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207987#comment-13207987
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #553 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/553/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv)
svn merge --ignore-ancestry -c 1244178 ../../trunk/ (Revision 1244180)

 Result = ABORTED
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244180
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207989#comment-13207989
 ] 

Hudson commented on MAPREDUCE-3802:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #553 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/553/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv)
svn merge --ignore-ancestry -c 1244178 ../../trunk/ (Revision 1244180)

 Result = ABORTED
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244180
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207992#comment-13207992
 ] 

Hudson commented on MAPREDUCE-3802:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1737 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1737/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv) (Revision 
1244178)

 Result = ABORTED
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244178
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


 If an MR AM dies twice it looks like the process freezes
 -

 Key: MAPREDUCE-3802
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3802
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mrv2
Affects Versions: 0.23.1, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3802-20120213.txt, 
 MAPREDUCE-3802-20120213.txt, syslog


 It looks like recovering from an MR AM dying works very well on a single 
 failure. But if it fails multiple times, we appear to get into a livelock 
 situation.
 {noformat}
 yarn jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*-SNAPSHOT.jar wordcount -Dyarn.app.mapreduce.am.log.level=DEBUG -Dmapreduce.job.reduces=30 input output
 12/02/03 21:06:57 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
 12/02/03 21:06:57 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
 12/02/03 21:06:57 INFO input.FileInputFormat: Total input paths to process : 17
 12/02/03 21:06:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
 12/02/03 21:06:57 WARN snappy.LoadSnappy: Snappy native library not loaded
 12/02/03 21:06:57 INFO mapreduce.JobSubmitter: number of splits:17
 12/02/03 21:06:57 INFO mapred.ResourceMgrDelegate: Submitted application application_1328302034486_0003 to ResourceManager at HOST/IP:8040
 12/02/03 21:06:57 INFO mapreduce.Job: The url to track the job: http://HOST:8088/proxy/application_1328302034486_0003/
 12/02/03 21:06:57 INFO mapreduce.Job: Running job: job_1328302034486_0003
 12/02/03 21:07:03 INFO mapreduce.Job: Job job_1328302034486_0003 running in uber mode : false
 12/02/03 21:07:03 INFO mapreduce.Job:  map 0% reduce 0%
 12/02/03 21:07:09 INFO mapreduce.Job:  map 5% reduce 0%
 12/02/03 21:07:10 INFO mapreduce.Job:  map 17% reduce 0%
 #KILLED AM with kill -9 here
 12/02/03 21:07:16 INFO mapreduce.Job:  map 29% reduce 0%
 12/02/03 21:07:17 INFO mapreduce.Job:  map 35% reduce 0%
 12/02/03 21:07:30 INFO mapreduce.Job:  map 52% reduce 0%
 12/02/03 21:07:35 INFO mapreduce.Job:  map 58% reduce 0%
 12/02/03 21:07:37 INFO mapreduce.Job:  map 70% reduce 0%
 12/02/03 21:07:41 INFO mapreduce.Job:  map 76% reduce 0%
 12/02/03 21:07:43 INFO mapreduce.Job:  map 82% reduce 0%
 12/02/03 21:07:44 INFO mapreduce.Job:  map 88% reduce 0%
 12/02/03 21:07:47 INFO mapreduce.Job:  map 94% reduce 0%
 12/02/03 21:07:49 INFO mapreduce.Job:  map 100% reduce 0%
 12/02/03 21:07:53 INFO mapreduce.Job:  map 100% reduce 3%
 12/02/03 21:08:00 INFO mapreduce.Job:  map 100% reduce 6%
 12/02/03 21:08:06 INFO mapreduce.Job:  map 100% reduce 10%
 12/02/03 21:08:12 INFO mapreduce.Job:  map 100% reduce 13%
 12/02/03 21:08:18 INFO mapreduce.Job:  map 100% reduce 16%
 #killed AM with kill -9 here
 12/02/03 21:08:20 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
 Already tried 0 time(s).
 12/02/03 21:08:21 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
 Already tried 1 time(s).
 12/02/03 21:08:22 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
 Already tried 2 time(s).
 12/02/03 21:08:26 INFO mapreduce.Job:  map 64% reduce 16%
 #It never makes any more progress...
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-14 Thread Tsz Wo (Nicholas), SZE (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-3583:
--

Hadoop Flags: Reviewed

+1 patch looks good.  Thanks a lot!

 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, 
 mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583.txt


 HBase PreCommit builds frequently gave us a NumberFormatException.
 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
 {code}
 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
 set.  User classes may not be found. See JobConf(Class) or 
 JobConf#setJar(String).
 java.lang.NumberFormatException: For input string: 18446743988060683582
   at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:422)
   at java.lang.Long.parseLong(Long.java:468)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
   at 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 From the Hadoop 0.20.205 source code, it looks like the ppid was 
 18446743988060683582, causing the NFE:
 {code}
 // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
 {code}
 You can find information on the OS at the beginning of 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
 {code}
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 20
 file size   (blocks, -f) unlimited
 pending signals (-i) 16382
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 6
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 2048
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 6
 Running in Jenkins mode
 {code}
 From Nicholas Sze:
 {noformat}
 It looks like the ppid is a 64-bit positive integer, but Java's long is 
 signed and so only works with 63-bit positive integers.  In your case,
   2^64 > 18446743988060683582 > 2^63.
 Therefore, there is an NFE. 
 {noformat}
 I propose changing allProcessInfo to Map<String, ProcessInfo> so that we 
 avoid this problem by not parsing large integers at all.
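A minimal Java sketch of the overflow described above, assuming Java 8 or later (for {{Long.parseUnsignedLong}}, which did not exist in the JDKs Hadoop targeted at the time). It checks that 18446743988060683582 lies between 2^63 and 2^64, shows that {{Long.parseLong}} rejects it, and shows that keying by the raw string, as proposed, needs no numeric parsing. The {{ProcessInfo}} class below is a hypothetical placeholder, not Hadoop's class.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

public class UnsignedProcfsField {
    // Hypothetical stand-in for Hadoop's ProcessInfo; it only holds the raw pid string.
    static final class ProcessInfo {
        final String pid;
        ProcessInfo(String pid) { this.pid = pid; }
    }

    // True if the decimal string fits in a signed 64-bit Java long.
    static boolean fitsInSignedLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if 2^63 <= value < 2^64: valid as unsigned 64-bit, but not as a signed long.
    static boolean needsUnsigned64(String s) {
        BigInteger v = new BigInteger(s);
        return v.compareTo(BigInteger.ONE.shiftLeft(63)) >= 0
            && v.compareTo(BigInteger.ONE.shiftLeft(64)) < 0;
    }

    public static void main(String[] args) {
        String field = "18446743988060683582"; // value from the stack trace above
        System.out.println(needsUnsigned64(field));  // true
        System.out.println(fitsInSignedLong(field)); // false

        // Java 8+: keeps the same 64 bits in a long, which reads as negative when signed.
        long raw = Long.parseUnsignedLong(field);
        System.out.println(raw);                     // -85648868034

        // Keying by the string, as proposed, avoids numeric parsing altogether.
        Map<String, ProcessInfo> allProcessInfo = new HashMap<>();
        allProcessInfo.put(field, new ProcessInfo(field));
        System.out.println(allProcessInfo.containsKey(field)); // true
    }
}
```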


[jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207990#comment-13207990
 ] 

Hudson commented on MAPREDUCE-3846:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1737 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1737/])
MAPREDUCE-3802. Added test to validate that AM can crash multiple times and 
still can recover successfully after MAPREDUCE-3846. (vinodkv) (Revision 
1244178)

 Result = ABORTED
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244178
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java


 Restarted+Recovered AM hangs in some corner cases
 -

 Key: MAPREDUCE-3846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3846-20120210.txt, 
 MAPREDUCE-3846-20120210.txt, MAPREDUCE-3846-20120213.txt


 [~karams] found this while testing the AM restart/recovery feature. After the 
 first-generation AM crashes (manually killed by kill -9), the second-generation 
 AM starts, but hangs after a while.


[jira] [Updated] (MAPREDUCE-3761) AM info in job -list does not reflect the actual AM hostname

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3761:
---

Attachment: MAPREDUCE-3761-20120214.1.txt

Fixing the test.

 AM info in job -list does not reflect the actual AM hostname
 

 Key: MAPREDUCE-3761
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3761
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3761-20120202.txt, 
 MAPREDUCE-3761-20120214.1.txt


 The AM info field in the output of bin/mapred job -list currently has the value 
 <resourcemanager hostname>:8088/proxy/<appID>. This info is not useful unless 
 it shows where the AM was actually launched. This needs to be fixed to show 
 the AM host details.


[jira] [Updated] (MAPREDUCE-3761) AM info in job -list does not reflect the actual AM hostname

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3761:
---

Status: Patch Available  (was: Open)


[jira] [Updated] (MAPREDUCE-3812) Change default memory slot sizes to be 1.5GB

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3812:
---

Fix Version/s: (was: 0.23.1)
   0.23.2
   Status: Open  (was: Patch Available)

This patch needs more work. Moving to 0.23.2.

 Change default memory slot sizes to be 1.5GB
 

 Key: MAPREDUCE-3812
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3812
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2, performance
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.2

 Attachments: MAPREDUCE-3812-20120205.txt, 
 MAPREDUCE-3812-20120206.1.txt, MAPREDUCE-3812-20120206.txt, 
 MAPREDUCE-3812.patch, MAPREDUCE-3812.patch


 After a few performance improvements tracked at MAPREDUCE-3561, like 
 MAPREDUCE-3511 and MAPREDUCE-3567, even a job with 100K maps can run within 
 1GB of vmem. We earlier increased the AM slot size from one slot to two slots 
 to work around issues with the AM heap. Now that those are fixed, we should 
 go back to 1GB.
 This is just a configuration change.


[jira] [Updated] (MAPREDUCE-3838) MapReduce job submission time has increased in 0.23 when compared to 0.20.206

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3838:
---

Fix Version/s: 0.23.2

 MapReduce job submission time has increased in 0.23 when compared to 0.20.206
 -

 Key: MAPREDUCE-3838
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3838
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: client
Affects Versions: 0.23.0
Reporter: Amar Kamat
  Labels: gridmix, job-submit-time, yarn
 Fix For: 0.23.2


 While running Gridmix on 0.23, we found that the job submission time has 
 increased when compared to 0.20.206. 
 Here are some stats:
 ||Submit-Time||Total number of jobs in YARN||Total number of jobs in FRED||
 |25 secs|3|1|
 |20 secs|6|2|
 |15 secs|14|4|
 |10 secs|24|4|
 |5 secs|67|28|
 Note that Gridmix was run using the same trace.


[jira] [Commented] (MAPREDUCE-3854) Reinstate environment variable tests in TestMiniMRChildTask

2012-02-14 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208031#comment-13208031
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3854:


Looks good. +1. Pushing this in.

 Reinstate environment variable tests in TestMiniMRChildTask
 ---

 Key: MAPREDUCE-3854
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3854
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: mrv2
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-3854.patch, MAPREDUCE-3854.patch


 MAPREDUCE-3716 reinstated one of the tests in TestMiniMRChildTask, but there 
 are two more which should be run.


[jira] [Updated] (MAPREDUCE-3849) Change TokenCache's reading of the binary token file

2012-02-14 Thread Daryn Sharp (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated MAPREDUCE-3849:
---

Attachment: MAPREDUCE-3849-2.patch

Add more tests.

 Change TokenCache's reading of the binary token file
 

 Key: MAPREDUCE-3849
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3849
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: security
Affects Versions: 0.23.1, 0.24.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: MAPREDUCE-3849-2.patch, MAPREDUCE-3849.patch


 When obtaining the tokens for a {{FileSystem}}, the {{TokenCache}} will read 
 the binary token file if a token is not already in the {{Credentials}}.  
 However, if even a single token is missing, it will overwrite any existing 
 tokens in the {{Credentials}} with the contents of the binary token file.  
 This may cause new tokens to be replaced with invalid/cancelled tokens from 
 the binary file.  The new tokens will not be cancelled, and thus leak in the 
 namenode until they expire.
 The binary tokens should be merged with, but not replace, existing tokens in 
 the {{Credentials}}.
 The code that reads the binary token file is prefaced with:
 {code}
 //TODO: Need to come up with a better place to put
 //this block of code to do with reading the file
 {code}
 Also, loading the binary token file is the only reason that the 
 {{TokenCache}} has to use {{getCanonicalService}}.  If this linkage can be 
 broken, then the 1-to-1 coupling between filesystems and token services may 
 be removed, and the use of {{getCanonicalService}} can be removed in a 
 subsequent jira.
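To make the difference between overwriting and merging concrete, here is a small Java sketch. It models the {{Credentials}} as a plain map from a token service (e.g. a hypothetical {{nn1:8020}}) to a token identifier; this is an illustrative model only, not Hadoop's {{Credentials}} API. {{overwrite}} mirrors the behavior described above, where entries from the binary file win, while {{merge}} only adds tokens that are missing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: service name -> token identifier. NOT Hadoop's Credentials class.
public class TokenMergeSketch {
    // Behavior described in the issue: entries from the binary file replace existing ones.
    static Map<String, String> overwrite(Map<String, String> existing, Map<String, String> fromFile) {
        Map<String, String> result = new HashMap<>(existing);
        result.putAll(fromFile); // file entries win on conflict
        return result;
    }

    // Proposed behavior: only tokens missing from the existing set are added.
    static Map<String, String> merge(Map<String, String> existing, Map<String, String> fromFile) {
        Map<String, String> result = new HashMap<>(existing);
        for (Map.Entry<String, String> e : fromFile.entrySet()) {
            result.putIfAbsent(e.getKey(), e.getValue()); // existing tokens are kept
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("nn1:8020", "fresh-token");
        Map<String, String> fromFile = new HashMap<>();
        fromFile.put("nn1:8020", "cancelled-token");
        fromFile.put("nn2:8020", "file-token");

        System.out.println(overwrite(existing, fromFile).get("nn1:8020")); // cancelled-token
        System.out.println(merge(existing, fromFile).get("nn1:8020"));     // fresh-token
        System.out.println(merge(existing, fromFile).get("nn2:8020"));     // file-token
    }
}
```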


[jira] [Updated] (MAPREDUCE-3854) Reinstate environment variable tests in TestMiniMRChildTask

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3854:
---

   Resolution: Fixed
Fix Version/s: 0.23.1
 Release Note: Fixed and reenabled tests related to MR child JVM's 
environmental variables in TestMiniMRChildTask.
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch-0.23. Thanks Tom!

 Reinstate environment variable tests in TestMiniMRChildTask
 ---

 Key: MAPREDUCE-3854
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3854
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: mrv2
Reporter: Tom White
Assignee: Tom White
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3854.patch, MAPREDUCE-3854.patch


 MAPREDUCE-3716 reinstated one of the tests in TestMiniMRChildTask, but there 
 are two more which should be run.


[jira] [Commented] (MAPREDUCE-3736) Variable substitution depth too large for fs.default.name causes jobs to fail

2012-02-14 Thread Alejandro Abdelnur (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208049#comment-13208049
]

Alejandro Abdelnur commented on MAPREDUCE-3736:
---

+1. I'll be committing this later today.

Variable substitution depth too large for fs.default.name causes jobs to fail
-

Key: MAPREDUCE-3736
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3736
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Eli Collins
Assignee: Ahmed Radwan
Priority: Blocker
Attachments: MAPREDUCE-3736.patch, MAPREDUCE-3736_rev2.patch

I'm seeing the same failure as MAPREDUCE-3462 in downstream projects running
against a recent build of branch-23. MR-3462 modified the tests rather than
fixing the framework. In that jira Ravi mentioned "I'm still ignorant of the
change which made the tests start to fail. I should probably understand
better the reasons for that change before proposing a more generalized fix."
Let's figure out the general fix (rather than require all projects to set
mapreduce.job.hdfs-servers in their conf, we should fix this in the
framework). Perhaps we should not default this config to ${fs.default.name}?
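The issue title refers to a limit on nested ${...} expansion. Below is a simplified Java model of such bounded expansion. It is loosely modeled on the way Hadoop's {{Configuration}} expands variables, but the class, the limit of 20, and the self-referencing example are assumptions for illustration; the issue itself does not establish that a cycle is the cause here. A chain such as mapreduce.job.hdfs-servers -> ${fs.default.name} -> a URI resolves in two steps, while a value that refers back to itself never terminates and trips the limit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of bounded ${var} expansion; not Hadoop's Configuration class.
public class BoundedSubstitution {
    static final int MAX_SUBST = 20; // assumed limit, for illustration only
    private static final Pattern VAR = Pattern.compile("\\$\\{[^\\}\\$ ]+\\}");

    // Replaces one ${name} per iteration and gives up after MAX_SUBST iterations.
    static String expand(Map<String, String> props, String expr) {
        String current = expr;
        for (int i = 0; i < MAX_SUBST; i++) {
            Matcher m = VAR.matcher(current);
            if (!m.find()) {
                return current; // nothing left to expand
            }
            String var = m.group();
            String name = var.substring(2, var.length() - 1); // strip "${" and "}"
            String value = props.get(name);
            if (value == null) {
                return current; // unknown variables are left unexpanded
            }
            current = current.substring(0, m.start()) + value + current.substring(m.end());
        }
        throw new IllegalStateException(
            "Variable substitution depth too large: " + MAX_SUBST + " " + expr);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.default.name", "hdfs://nn:8020"); // hypothetical URI
        conf.put("mapreduce.job.hdfs-servers", "${fs.default.name}");
        System.out.println(expand(conf, "${mapreduce.job.hdfs-servers}")); // hdfs://nn:8020

        conf.put("fs.default.name", "${fs.default.name}"); // hypothetical self-reference
        try {
            expand(conf, "${mapreduce.job.hdfs-servers}");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```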

[jira] [Commented] (MAPREDUCE-3824) Distributed caches are not removed properly

2012-02-14 Thread Thomas Graves (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208050#comment-13208050
 ] 

Thomas Graves commented on MAPREDUCE-3824:
--

After some debugging, it appears that the size isn't being calculated properly 
(it is set to 0) if the user specifies a directory to go into the distributed 
cache. It only appears to happen if it's a private cached directory. I'm working 
on a patch.

Allen, can you confirm that your users were specifying a directory to be cached 
and not a file?
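The patch itself is not shown here, so the following is only a hedged Java sketch of the general idea behind the comment above: size a cache entry as a file's length, or as the sum of every regular file beneath it when the entry is a directory. The class and method names are hypothetical, and the sketch uses java.nio rather than Hadoop's {{FileSystem}} API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class CacheEntrySize {
    // Length of a single file, or the total length of all regular files under a directory.
    static long sizeOf(Path entry) {
        try {
            if (Files.isRegularFile(entry)) {
                return Files.size(entry);
            }
            try (Stream<Path> paths = Files.walk(entry)) { // walks nested subdirectories too
                return paths.filter(Files::isRegularFile)
                            .mapToLong(CacheEntrySize::fileSize)
                            .sum();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Files.size throws a checked exception, so wrap it for use inside the stream.
    private static long fileSize(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cache-entry");
        Files.write(dir.resolve("a.txt"), new byte[100]);
        Path sub = Files.createDirectory(dir.resolve("sub"));
        Files.write(sub.resolve("b.bin"), new byte[50]);
        System.out.println(sizeOf(dir)); // 150, including the nested file
    }
}
```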

 Distributed caches are not removed properly
 ---

 Key: MAPREDUCE-3824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3824
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1.0.0
Reporter: Allen Wittenauer
Priority: Critical
 Attachments: MAPREDUCE-3824-branch-1.0.txt


 Distributed caches are not being properly removed by the TaskTracker when 
 they are expected to expire. 


[jira] [Assigned] (MAPREDUCE-3824) Distributed caches are not removed properly

2012-02-14 Thread Thomas Graves (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned MAPREDUCE-3824:


Assignee: Thomas Graves

 Distributed caches are not removed properly
 ---

 Key: MAPREDUCE-3824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3824
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1.0.0
Reporter: Allen Wittenauer
Assignee: Thomas Graves
Priority: Critical
 Attachments: MAPREDUCE-3824-branch-1.0.txt


 Distributed caches are not being properly removed by the TaskTracker when 
 they are expected to expire. 


[jira] [Commented] (MAPREDUCE-3854) Reinstate environment variable tests in TestMiniMRChildTask

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208081#comment-13208081
 ] 

Hudson commented on MAPREDUCE-3854:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1739 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1739/])
MAPREDUCE-3854. Fixed and reenabled tests related to MR child JVM's 
environmental variables in TestMiniMRChildTask. (Tom White via vinodkv) 
(Revision 1244223)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244223
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRChildTask.java



[jira] [Commented] (MAPREDUCE-3854) Reinstate environment variable tests in TestMiniMRChildTask

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208082#comment-13208082
 ] 

Hudson commented on MAPREDUCE-3854:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #555 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/555/])
MAPREDUCE-3854. Fixed and reenabled tests related to MR child JVM's 
environmental variables in TestMiniMRChildTask. (Tom White via vinodkv)
svn merge --ignore-ancestry -c 1244223 ../../trunk/ (Revision 1244224)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244224
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRChildTask.java



[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-14 Thread Zhihong Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208083#comment-13208083
 ] 

Zhihong Yu commented on MAPREDUCE-3583:
---

For TRUNK, should both of the following be included in the patch?
{code}
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ProcfsBasedProcessTree.java
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
{code}


[jira] [Updated] (MAPREDUCE-3736) Variable substitution depth too large for fs.default.name causes jobs to fail

2012-02-14 Thread Ahmed Radwan (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-3736:


Attachment: MAPREDUCE-3736_rev3.patch

Updated patch, since the location of yarn-default.xml has changed.

 Variable substitution depth too large for fs.default.name causes jobs to fail
 -

 Key: MAPREDUCE-3736
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3736
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Eli Collins
Assignee: Ahmed Radwan
Priority: Blocker
 Attachments: MAPREDUCE-3736.patch, MAPREDUCE-3736_rev2.patch, 
 MAPREDUCE-3736_rev3.patch


 I'm seeing the same failure as MAPREDUCE-3462 in downstream projects running 
 against a recent build of branch-23. MR-3462 modified the tests rather than 
 fixing the framework. In that jira Ravi mentioned "I'm still ignorant of the 
 change which made the tests start to fail. I should probably understand 
 better the reasons for that change before proposing a more generalized fix." 
 Let's figure out the general fix: rather than require all projects to set 
 mapreduce.job.hdfs-servers in their conf, we should fix this in the 
 framework. Perhaps we should not default this config to ${fs.default.name}?
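Hadoop configuration values can reference other properties with ${name}, and expansion is cut off after a fixed number of steps; that cutoff is what the "Variable substitution depth too large" error in the title reports. The Java below is a simplified, hypothetical resolver (not Hadoop's Configuration class) with the cap assumed to be 20. It shows one way a default such as mapreduce.job.hdfs-servers = ${fs.default.name} can exhaust the cap instead of resolving: if the keys end up referring to each other. The namenode URI and the cyclic setting are made up for illustration and are not claimed to be the actual cause in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstitutionSketch {
    // Assumed cap on the number of expansions (hypothetical value for this sketch).
    static final int MAX_SUBST = 20;
    // Matches ${name}; the name may not contain '}', '$' or spaces.
    static final Pattern VAR = Pattern.compile("\\$\\{([^}$ ]+)\\}");

    // Expands the first ${name} reference repeatedly, up to MAX_SUBST times.
    static String expand(Map<String, String> conf, String value) {
        String expr = value;
        for (int i = 0; i < MAX_SUBST; i++) {
            Matcher m = VAR.matcher(expr);
            if (!m.find()) {
                return expr; // nothing left to expand
            }
            String replacement = conf.get(m.group(1));
            if (replacement == null) {
                return expr; // undefined variable: left unexpanded in this sketch
            }
            expr = expr.substring(0, m.start()) + replacement + expr.substring(m.end());
        }
        if (!VAR.matcher(expr).find()) {
            return expr;
        }
        throw new IllegalStateException(
            "Variable substitution depth too large: " + MAX_SUBST + " " + value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.default.name", "hdfs://namenode:8020"); // hypothetical URI
        conf.put("mapreduce.job.hdfs-servers", "${fs.default.name}");
        System.out.println(expand(conf, conf.get("mapreduce.job.hdfs-servers")));

        // Hypothetical misconfiguration: the two keys now refer to each other.
        conf.put("fs.default.name", "${mapreduce.job.hdfs-servers}");
        try {
            expand(conf, conf.get("mapreduce.job.hdfs-servers"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```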

[jira] [Commented] (MAPREDUCE-3824) Distributed caches are not removed properly

2012-02-14 Thread Allen Wittenauer (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208090#comment-13208090
 ] 

Allen Wittenauer commented on MAPREDUCE-3824:
-

Yes, that corresponds to what I was seeing as well.  Sorry, I forgot to mention 
the directory thing.  I've been running the patch for so long, and so 
successfully, that I forgot about that detail.

 Distributed caches are not removed properly
 ---

 Key: MAPREDUCE-3824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3824
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1.0.0
Reporter: Allen Wittenauer
Assignee: Thomas Graves
Priority: Critical
 Attachments: MAPREDUCE-3824-branch-1.0.txt


 Distributed caches are not being properly removed by the TaskTracker when 
 they are expected to expire. 

[jira] [Updated] (MAPREDUCE-3858) Task attempt failure during commit results in task never completing

2012-02-14 Thread Mahadev konar (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-3858:
-

  Resolution: Fixed
   Fix Version/s: 0.23.1
Target Version/s: 0.23.1  (was: 0.23.2)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I just committed this to all three branches: 0.23.1, 0.23 and trunk. Thanks Tom!

 Task attempt failure during commit results in task never completing
 ---

 Key: MAPREDUCE-3858
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3858
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Tom White
Assignee: Tom White
Priority: Critical
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3858.patch


 On a terasort job, a task attempt failed during the commit phase. Another 
 attempt was rescheduled, but it failed when it tried to commit.
 {noformat}
 attempt_1329019187148_0083_r_000586_0 already given a go for committing the 
 task output, so killing attempt_1329019187148_0083_r_000586_1
 {noformat}
 The job hung as new attempts kept getting scheduled only to fail during 
 commit.
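One way to picture the hang: the task remembers which attempt was given the go-ahead to commit, and if that attempt fails without the task clearing it, every later attempt is refused. Below is a hypothetical Java sketch (not the actual TaskImpl change committed for this issue) in which a failure of the committing attempt releases the commit slot so a retry can commit. The class and method names are invented for illustration; the attempt IDs come from the log above.

```java
import java.util.Objects;

public class CommitArbiterSketch {
    // Attempt currently allowed to commit, or null if none.
    private String committingAttempt;

    // Grants commit permission to the first attempt that asks; refuses others while it is held.
    synchronized boolean canCommit(String attemptId) {
        if (committingAttempt == null) {
            committingAttempt = attemptId;
            return true;
        }
        return committingAttempt.equals(attemptId);
    }

    // Without this reset, a failed committer would block every later attempt forever.
    synchronized void attemptFailed(String attemptId) {
        if (Objects.equals(committingAttempt, attemptId)) {
            committingAttempt = null;
        }
    }

    public static void main(String[] args) {
        CommitArbiterSketch task = new CommitArbiterSketch();
        String first = "attempt_1329019187148_0083_r_000586_0";
        String second = "attempt_1329019187148_0083_r_000586_1";
        System.out.println(task.canCommit(first));  // true: first attempt gets the go
        task.attemptFailed(first);                  // it fails during commit
        System.out.println(task.canCommit(second)); // true: the retry may now commit
    }
}
```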

[jira] [Commented] (MAPREDUCE-3736) Variable substitution depth too large for fs.default.name causes jobs to fail

2012-02-14 Thread Alejandro Abdelnur (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208100#comment-13208100
 ] 

Alejandro Abdelnur commented on MAPREDUCE-3736:
---

+1

[jira] [Updated] (MAPREDUCE-3736) Variable substitution depth too large for fs.default.name causes jobs to fail

2012-02-14 Thread Alejandro Abdelnur (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-3736:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks Ahmed. Committed to trunk and branch-0.23

[jira] [Commented] (MAPREDUCE-3858) Task attempt failure during commit results in task never completing

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208118#comment-13208118
 ] 

Hudson commented on MAPREDUCE-3858:
---

Integrated in Hadoop-Common-trunk-Commit #1727 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1727/])
MAPREDUCE-3858. Task attempt failure during commit results in task never 
completing. (Tom White via mahadev) (Revision 1244254)

 Result = SUCCESS
mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244254
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java


[jira] [Commented] (MAPREDUCE-3854) Reinstate environment variable tests in TestMiniMRChildTask

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208116#comment-13208116
 ] 

Hudson commented on MAPREDUCE-3854:
---

Integrated in Hadoop-Common-trunk-Commit #1727 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1727/])
MAPREDUCE-3854. Fixed and reenabled tests related to MR child JVM's 
environmental variables in TestMiniMRChildTask. (Tom White via vinodkv) 
(Revision 1244223)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244223
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRChildTask.java


 Reinstate environment variable tests in TestMiniMRChildTask
 ---

 Key: MAPREDUCE-3854
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3854
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: mrv2
Reporter: Tom White
Assignee: Tom White
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3854.patch, MAPREDUCE-3854.patch


 MAPREDUCE-3716 reinstated one of the tests in TestMiniMRChildTask, but there 
 are two more which should be run.

[jira] [Updated] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-14 Thread Zhihong Yu (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated MAPREDUCE-3583:
--

Attachment: mapreduce-3583-trunk.txt

Patch for TRUNK.

All tests under 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
passed.

TestProcfsBasedProcessTree passed as well.

 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, 
 mapreduce-3583-v3.txt, mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, 
 mapreduce-3583.txt


 HBase PreCommit builds frequently gave us NumberFormatException.
 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
 {code}
 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
 set.  User classes may not be found. See JobConf(Class) or 
 JobConf#setJar(String).
 java.lang.NumberFormatException: For input string: "18446743988060683582"
   at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:422)
   at java.lang.Long.parseLong(Long.java:468)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
   at 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 From the Hadoop 0.20.205 source code, it looks like the ppid was 
 18446743988060683582, causing the NFE:
 {code}
 // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
 {code}
 You can find information on the OS at the beginning of 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
 {code}
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 20
 file size   (blocks, -f) unlimited
 pending signals (-i) 16382
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 6
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 2048
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 6
 Running in Jenkins mode
 {code}
 From Nicolas Sze:
 {noformat}
 It looks like the ppid is a 64-bit positive integer, but Java long is 
 signed and so only works with 63-bit positive integers.  In your case,
   2^64 > 18446743988060683582 > 2^63.
 Therefore, there is an NFE. 
 {noformat}
 I propose changing allProcessInfo to Map<String, ProcessInfo> so that we 
 avoid this problem by not parsing large integers.
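A rough Java sketch of the proposal, assuming a much-reduced, hypothetical ProcessInfo that keeps the pid and ppid as the strings read from /proc/<pid>/stat instead of parsing them into numbers. Parent/child links are then built by string lookup, so an out-of-range value can no longer throw NumberFormatException. This is an illustration of the idea only, not the patch attached to this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProcessTreeSketch {
    // Hypothetical, cut-down stand-in for ProcfsBasedProcessTree's ProcessInfo.
    static class ProcessInfo {
        final String pid;
        final String ppid;
        final List<ProcessInfo> children = new ArrayList<>();

        ProcessInfo(String pid, String ppid) {
            this.pid = pid;
            this.ppid = ppid;
        }
    }

    // Keyed by the pid string, so no Integer/Long parsing is needed.
    final Map<String, ProcessInfo> allProcessInfo = new HashMap<>();

    void add(String pid, String ppid) {
        allProcessInfo.put(pid, new ProcessInfo(pid, ppid));
    }

    // Links each process to its parent when the parent is in the map.
    void linkChildren() {
        for (ProcessInfo p : allProcessInfo.values()) {
            ProcessInfo parent = allProcessInfo.get(p.ppid);
            if (parent != null) {
                parent.children.add(p);
            }
        }
    }

    public static void main(String[] args) {
        ProcessTreeSketch tree = new ProcessTreeSketch();
        tree.add("100", "1");
        tree.add("200", "100");
        // A ppid too large for a signed long is just another string key here.
        tree.add("300", "18446743988060683582");
        tree.linkChildren();
        System.out.println(tree.allProcessInfo.get("100").children.size()); // 1
    }
}
```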

[jira] [Commented] (MAPREDUCE-3858) Task attempt failure during commit results in task never completing

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208136#comment-13208136
 ] 

Hudson commented on MAPREDUCE-3858:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1802 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1802/])
MAPREDUCE-3858. Task attempt failure during commit results in task never 
completing. (Tom White via mahadev) (Revision 1244254)

 Result = SUCCESS
mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244254
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java


[jira] [Commented] (MAPREDUCE-3854) Reinstate environment variable tests in TestMiniMRChildTask

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208135#comment-13208135
 ] 

Hudson commented on MAPREDUCE-3854:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1802 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1802/])
MAPREDUCE-3854. Fixed and reenabled tests related to MR child JVM's 
environmental variables in TestMiniMRChildTask. (Tom White via vinodkv) 
(Revision 1244223)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244223
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRChildTask.java


[jira] [Commented] (MAPREDUCE-3736) Variable substitution depth too large for fs.default.name causes jobs to fail

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208134#comment-13208134
 ] 

Hudson commented on MAPREDUCE-3736:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1802 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1802/])
MAPREDUCE-3736. Variable substitution depth too large for fs.default.name 
causes jobs to fail (ahmed via tucu) (Revision 1244264)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244264
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestMRWithDistributedCache.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/conf/TestNoDefaultsJobConf.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/JHLogAnalyzer.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/io/FileBench.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineFileInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestConcatenatedCompressedInput.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestTextInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestMapCollection.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestMRKeyValueTextInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


[jira] [Commented] (MAPREDUCE-3736) Variable substitution depth too large for fs.default.name causes jobs to fail

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208147#comment-13208147
 ] 

Hudson commented on MAPREDUCE-3736:
---

Integrated in Hadoop-Hdfs-0.23-Commit #539 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/539/])
Merge -r 1244263:1244264 from trunk to branch. FIXES: MAPREDUCE-3736 
(Revision 1244265)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244265
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestMRWithDistributedCache.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/conf/TestNoDefaultsJobConf.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/JHLogAnalyzer.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/io/FileBench.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineFileInputFormat.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestConcatenatedCompressedInput.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestTextInputFormat.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestMapCollection.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestMRKeyValueTextInputFormat.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


[jira] [Commented] (MAPREDUCE-3854) Reinstate environment variable tests in TestMiniMRChildTask

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208148#comment-13208148
 ] 

Hudson commented on MAPREDUCE-3854:
---

Integrated in Hadoop-Hdfs-0.23-Commit #539 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/539/])
MAPREDUCE-3854. Fixed and reenabled tests related to MR child JVM's 
environmental variables in TestMiniMRChildTask. (Tom White via vinodkv)
svn merge --ignore-ancestry -c 1244223 ../../trunk/ (Revision 1244224)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244224
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRChildTask.java


[jira] [Commented] (MAPREDUCE-3858) Task attempt failure during commit results in task never completing

2012-02-14 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208149#comment-13208149
 ] 

Hudson commented on MAPREDUCE-3858:
---

Integrated in Hadoop-Hdfs-0.23-Commit #539 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/539/])
MAPREDUCE-3858. Task attempt failure during commit results in task never 
completing. (Tom White via mahadev) - Merging r1244254 from trunk. (Revision 
1244255)

 Result = SUCCESS
mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244255
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java


[jira] [Updated] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs

2012-02-14 Thread Bikas Saha (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated MAPREDUCE-2793:
--

Attachment: MAPREDUCE-2793-branch-0.23.patch

Adding patch with code and test fixes.

 [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs 
 --

 Key: MAPREDUCE-2793
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Bikas Saha
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793.patch


 appIDs, jobIDs and attempt/container IDs are not consistently named in the 
 logs, console and UI. For consistency, they should all follow a 
 common naming convention.
 Currently, 
 For appID
 =
 On the RM UI: app_1308259676864_5 
 On the JHS UI: No appID 
 Console/logs: No appID
 mapred-local dirs are named as: application_1308259676864_0005
 For jobID
 =
 On the RM UI: job_1308259676864_5_5 
 JHS UI: job_1308259676864_5_5 
 Console/logs: job_1308259676864_0005
 mapred-local dirs are named as: No jobID
 For attemptID
 
 On the RM UI: attempt_1308259676864_5_5_m_24_0
 JHS attempt_1308259676864_5_5_m_24_0
 Console/logs: attempt_1308259676864_0005_m_24_0
 mapred-local dirs are named as: container_1308259676864_0005_24
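To illustrate what a common convention could look like, here is a sketch that derives every identifier from the same cluster timestamp and zero-padded application number, mirroring the form already used by the console/logs and local directories (`application_1308259676864_0005`, `job_1308259676864_0005`). The `IdNaming` class is hypothetical, and the padding widths (4 digits for the application/job number, 6 for the task number, following the MRv1 `TaskID` style) are assumptions, not the format the attached patches settled on.

```java
import java.util.Locale;

/**
 * Hypothetical helper (illustration only): renders every ID from one
 * (clusterTimestamp, appNumber) pair so that the RM UI, JHS UI, console and
 * local dirs could all show the same strings.
 */
final class IdNaming {
    private IdNaming() {}

    static String appId(long clusterTimestamp, int appNumber) {
        // Locale.ROOT keeps ASCII digits regardless of the default locale.
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, appNumber);
    }

    static String jobId(long clusterTimestamp, int appNumber) {
        // Assumption: the job number reuses the application number, so the two IDs line up.
        return String.format(Locale.ROOT, "job_%d_%04d", clusterTimestamp, appNumber);
    }

    static String attemptId(long clusterTimestamp, int appNumber, char taskType,
                            int taskNumber, int attemptNumber) {
        // Assumed widths: 4-digit job number, 6-digit task number.
        return String.format(Locale.ROOT, "attempt_%d_%04d_%c_%06d_%d",
                clusterTimestamp, appNumber, taskType, taskNumber, attemptNumber);
    }
}
```

Under these assumptions, the attempt listed above would render as `attempt_1308259676864_0005_m_000024_0` everywhere, and its prefix matches the job ID.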


[jira] [Commented] (MAPREDUCE-3859) CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory jobs

2012-02-14 Thread Arun C Murthy (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208204#comment-13208204
 ] 

Arun C Murthy commented on MAPREDUCE-3859:
--

Sergey, I'm pretty sure the reason you are hitting this is that you have a 
single user in your queue.

By default, a single user can't exceed the queue's capacity (10 in this case). 
You can use 'user-limit-factor' to bump that up: 
http://hadoop.apache.org/common/docs/r1.0.0/capacity_scheduler.html#Configuration
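A simplified model of the limit described above follows. It assumes that a single user's ceiling is the queue capacity multiplied by `user-limit-factor` (default 1), capped by the queue's maximum capacity, and that a high-memory task must get all of its slots at once. Under those assumptions it reproduces the 9-slot figure from the report: with a ceiling of 10, only three 3-slot tasks fit. `UserLimitModel` and its methods are hypothetical and are not the CapacityScheduler's code.

```java
/**
 * Hypothetical, simplified model of the CapacityScheduler per-user limit
 * (illustration only, not the scheduler's implementation).
 */
final class UserLimitModel {
    private UserLimitModel() {}

    /** Most slots one user may hold: capacity * userLimitFactor, capped by the queue's maximum capacity. */
    static int userSlotCeiling(int queueCapacity, double userLimitFactor, int queueMaxCapacity) {
        int scaled = (int) Math.floor(queueCapacity * userLimitFactor);
        return Math.min(scaled, queueMaxCapacity);
    }

    /** Slots actually occupied when each task needs slotsPerTask slots and tasks cannot be split. */
    static int occupiedSlots(int ceiling, int slotsPerTask) {
        if (slotsPerTask <= 0) {
            throw new IllegalArgumentException("slotsPerTask must be positive");
        }
        // Whole tasks only: e.g. a ceiling of 10 fits three 3-slot tasks, i.e. 9 slots.
        return (ceiling / slotsPerTask) * slotsPerTask;
    }
}
```

For example, `occupiedSlots(userSlotCeiling(10, 1.0, 30), 3)` gives 9, while a factor of 3.0 gives 30. This assumes a maximum capacity of 30 slots (the configured 10 plus the 20 of extra capacity in the report); how "extra-capacity" maps to maximum capacity is itself an assumption.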


 CapacityScheduler incorrectly utilizes extra-resources of queue for 
 high-memory jobs
 

 Key: MAPREDUCE-3859
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3859
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/capacity-sched
Affects Versions: 1.0.0
 Environment: CDH3u1
Reporter: Sergey Tryuber

 Imagine we have a queue A with a capacity of 10 slots and 20 slots of extra 
 capacity. Jobs whose tasks each use 3 map slots will never consume more than 
 9 slots, regardless of how many free slots there are on the cluster.


[jira] [Updated] (MAPREDUCE-3634) All daemons should crash instead of hanging around when their EventHandlers get exceptions

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3634:
---

Status: Patch Available  (was: Open)

 All daemons should crash instead of hanging around when their EventHandlers 
 get exceptions
 --

 Key: MAPREDUCE-3634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3634
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.2

 Attachments: MAPREDUCE-3634-20120118.1.txt, 
 MAPREDUCE-3634-20120119.txt, MAPREDUCE-3634-20120214.txt


 We should make sure that the daemons crash in case the dispatchers get 
 exceptions and stop processing. That way we will be debugging RM/NM/AM 
 crashes instead of hard-to-track hanging jobs. 
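The fail-fast idea can be sketched as an asynchronous dispatcher that escalates a handler exception to a fatal-error callback instead of letting its thread die quietly while the daemon stays up. `FailFastDispatcher` and its methods are hypothetical names for illustration, not the YARN dispatcher's API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical fail-fast event dispatcher (illustration only): an exception from
 * any handler is passed to onFatal instead of silently stopping event processing.
 */
final class FailFastDispatcher {
    private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
    private final Consumer<Throwable> onFatal;
    private final Thread worker;

    FailFastDispatcher(Consumer<Throwable> onFatal) {
        this.onFatal = onFatal;
        this.worker = new Thread(this::loop, "event-dispatcher");
        this.worker.setDaemon(true);
    }

    void start() {
        worker.start();
    }

    void dispatch(Runnable handler) {
        events.add(handler);
    }

    /** Waits up to millis for the dispatcher thread to stop; returns true if it has stopped. */
    boolean awaitTermination(long millis) {
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    private void loop() {
        while (true) {
            Runnable handler;
            try {
                handler = events.take();
            } catch (InterruptedException e) {
                return; // interrupted: treat as an orderly shutdown
            }
            try {
                handler.run();
            } catch (Throwable t) {
                // Escalate rather than leave a live daemon with a dead dispatcher.
                onFatal.accept(t);
                return;
            }
        }
    }
}
```

In a real daemon, the callback would log the error and then call `System.exit` with a non-zero status, so an RM/NM/AM failure shows up as a crash rather than a hanging job.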


[jira] [Updated] (MAPREDUCE-3634) All daemons should crash instead of hanging around when their EventHandlers get exceptions

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3634:
---

Fix Version/s: (was: 0.23.1)
   0.23.2



[jira] [Updated] (MAPREDUCE-3634) All daemons should crash instead of hanging around when their EventHandlers get exceptions

2012-02-14 Thread Vinod Kumar Vavilapalli (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3634:
---

Attachment: MAPREDUCE-3634-20120214.txt

Addressing Sharad's and Sid's comments + updating to the latest code.



[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-14 Thread Mahadev konar (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208261#comment-13208261
 ] 

Mahadev konar commented on MAPREDUCE-3583:
--

Looks like Jenkins is down. Will run the trunk patch through Hudson as soon as 
the build machines are up!

 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, 
 mapreduce-3583-v3.txt, mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, 
 mapreduce-3583.txt


 HBase PreCommit builds frequently hit a NumberFormatException.
 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
 {code}
 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
 set.  User classes may not be found. See JobConf(Class) or 
 JobConf#setJar(String).
 java.lang.NumberFormatException: For input string: 18446743988060683582
   at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:422)
   at java.lang.Long.parseLong(Long.java:468)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
   at 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 From the Hadoop 0.20.205 source code, it looks like the ppid was 
 18446743988060683582, causing the NFE:
 {code}
 // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
 {code}
 You can find information on the OS at the beginning of 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
 {code}
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 20
 file size   (blocks, -f) unlimited
 pending signals (-i) 16382
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 6
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 2048
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 6
 Running in Jenkins mode
 {code}
 From Nicolas Sze:
 {noformat}
 It looks like that the ppid is a 64-bit positive integer but Java long is 
 signed and so only works with 63-bit positive integers.  In your case,
   2^64 > 18446743988060683582 > 2^63.
 Therefore, there is a NFE. 
 {noformat}
 I propose changing allProcessInfo to Map<String, ProcessInfo> so that we 
 don't encounter this problem, since large integers no longer need to be parsed.
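The overflow is easy to reproduce. The sketch below uses hypothetical helper names. It checks the reported ppid against the signed 64-bit range, shows that `Long.parseLong` rejects it, and illustrates the proposed direction of keeping pids as `String` keys so that no numeric parsing is needed. Storing only a pid-to-ppid string map is a simplification of the real `ProcessInfo` bookkeeping.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helpers (illustration only) around the NumberFormatException above. */
final class PidParsing {
    private PidParsing() {}

    /** True if the decimal string fits in a signed 64-bit long (Long.MIN_VALUE..Long.MAX_VALUE). */
    static boolean fitsInLong(String decimal) {
        BigInteger v = new BigInteger(decimal);
        return v.compareTo(BigInteger.valueOf(Long.MIN_VALUE)) >= 0
                && v.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) <= 0;
    }

    /** Mirrors the failing call: Long.parseLong throws for values above Long.MAX_VALUE (2^63 - 1). */
    static boolean parseLongThrows(String decimal) {
        try {
            Long.parseLong(decimal);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    /** Proposed direction: keep "pid ppid" fields as strings, so oversized values are never parsed. */
    static Map<String, String> ppidByPid(String... pidPpidPairs) {
        Map<String, String> result = new HashMap<>();
        for (String pair : pidPpidPairs) {
            String[] fields = pair.trim().split("\\s+");
            result.put(fields[0], fields[1]);
        }
        return result;
    }
}
```

Because the map is keyed by the original text, a ppid such as 18446743988060683582 can still be looked up and compared for equality, which is all that building a process tree requires.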

