[jira] Updated: (MAPREDUCE-732) node health check script should not log UNHEALTHY status for every heartbeat in INFO mode

2009-10-26 Thread Sreekanth Ramakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreekanth Ramakrishnan updated MAPREDUCE-732:
-

Release Note: Changed the log level of the "Adding blacklisted reason" message 
in the JobTracker log from INFO to DEBUG.

 node health check script should not log UNHEALTHY status for every 
 heartbeat in INFO mode
 ---

 Key: MAPREDUCE-732
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-732
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Ramya R
Assignee: Sreekanth Ramakrishnan
Priority: Minor
 Fix For: 0.21.0

 Attachments: MAPRED-732-ydist.patch, mapreduce-732-1.patch, 
 MAPREDUCE-732-2.patch, mapreduce-732.patch


 Currently, when a TT is blacklisted by the node health check script, for 
 every heartbeat a message such as the following is being logged.
 {noformat}
 date time INFO org.apache.hadoop.mapred.JobTracker: Adding blacklisted 
 reason for tracker : blacklisted TT Reason for blacklisting is : 
 NODE_UNHEALTHY
 {noformat}
 Due to this, the JT logs fill up rapidly, clogging the logdirs. Hence this 
 message should be logged in DEBUG mode instead of INFO mode.
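
A minimal sketch of the requested change, assuming java.util.logging as a stand-in for the JobTracker's logger (FINE plays the role of DEBUG here). The class and method names are hypothetical and are not taken from the attached patches:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class BlacklistLogSketch {
    // Hypothetical logger; the real JobTracker logger is not shown in this thread.
    static final Logger LOG = Logger.getLogger("BlacklistLogSketch");

    /**
     * Emits the per-heartbeat blacklist message only when FINE (the rough
     * equivalent of DEBUG) is enabled. Returns true if a message was logged.
     */
    static boolean logBlacklistReason(String tracker, String reason) {
        if (!LOG.isLoggable(Level.FINE)) {
            return false; // at INFO, the heartbeat path stays silent
        }
        LOG.fine("Adding blacklisted reason for tracker : " + tracker
            + " Reason for blacklisting is : " + reason);
        return true;
    }
}
```

With the logger at INFO the call returns false and writes nothing, so a blacklisted tracker no longer adds a line per heartbeat.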

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-947) OutputCommitter should have an abortJob method

2009-10-26 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-947:
-

Release Note: Introduced abortJob() method in OutputCommitter which will be 
invoked when the job fails or is killed. By default it invokes 
OutputCommitter.cleanupJob(). Deprecated OutputCommitter.cleanupJob() and 
introduced OutputCommitter.commitJob() method which will be invoked for 
successful jobs. Also a _SUCCESS file is created in the output folder for 
successful jobs.  (was: Introduced abortJob() method in OutputCommitter which 
will be invoked when the job fails or is killed. Also a _done file is created 
in the output folder for successful jobs while _abort is created for 
failed/killed jobs.)
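
The hook ordering in the release note can be sketched roughly as follows. This is a hypothetical in-memory stand-in, not the Hadoop OutputCommitter class; the "_temporary" entry and the choice to have commitJob() call cleanupJob() first are assumptions made for illustration:

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for an output committer; not the Hadoop class. */
class CommitterSketch {
    // Names of entries in a pretend output folder.
    final Set<String> outputDir = new TreeSet<>();

    CommitterSketch() {
        outputDir.add("_temporary"); // assumed scratch space a job writes into
    }

    /** Old single hook: only removes scratch data. */
    @Deprecated
    void cleanupJob() {
        outputDir.remove("_temporary");
    }

    /** Called for successful jobs: clean up, then leave a _SUCCESS marker. */
    void commitJob() {
        cleanupJob();
        outputDir.add("_SUCCESS");
    }

    /** Called for failed or killed jobs: by default delegates to cleanupJob(). */
    void abortJob() {
        cleanupJob();
    }
}
```

Only the successful path leaves a _SUCCESS entry, which is what lets downstream code tell the two outcomes apart.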

 OutputCommitter should have an abortJob method
 --

 Key: MAPREDUCE-947
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-947
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Amar Kamat
 Fix For: 0.22.0

 Attachments: mapred-948-v1.12-branch-0.20-internal.patch, 
 mapred-948-v1.12.patch, mapred-948-v1.13-branch-0.20-internal.patch, 
 mapred-948-v1.2.patch, mapred-948-v1.3.patch, mapred-948-v1.4.patch, 
 mapred-948-v1.7.patch, mapred-948-v2.1-branch-0.20.patch, 
 mapred-948-v2.3-branch-0.20.patch, mapred-948-v2.3.patch, 
 mapred-948-v3.1.patch, mapred-948-v3.2.patch, mapred-948-v3.4.patch, 
 mr-947-trunk-new.patch, mr-947-trunk-new.patch, mr-947-trunk.patch, 
 mr-947-trunk.patch, mr-947-trunk.patch, mr-947-y20-new.patch, 
 mr-947-y20.patch, mr-947-y20.patch


 The OutputCommitter needs an abortJob method to clean up from failed jobs. 
 Currently there is no way to distinguish between failed or succeeded jobs, 
 making it impossible to write output promotion code.




[jira] Commented: (MAPREDUCE-217) Tasks to run on a different jvm version than the TaskTracker

2009-10-26 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769911#action_12769911
 ] 

Amar Kamat commented on MAPREDUCE-217:
--

Had a discussion with Sharad on this. As he rightly pointed out, giving 
preference to user-defined classpath entries over the TT's inherited classpath 
entries can lead to security issues, since a malicious user could define their 
own Task.java or ReduceTask.java. I think we should keep the classpath ordering 
as is.

bq. At least it also needs to set the new classpath for the native libraries 
and probably there's more that I'm missing.
Koji, as of today users can add their own libraries, which are given preference 
over the inherited ones.

Currently this is what is done
child.jvm : tt.jvm
child.libraries : user-defined-libraries+tt.libraries
child.classpath : 
tt.classpath+job-jar.classpath+dist-cache-entries+current.work.dir+user-defined.classpath

Changes are 
child.jvm : user-defined.jvm else tt.jvm
// since the user is specifying the jvm, the user is responsible for adding the 
required libs too
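
A minimal sketch of the proposed "user-defined.jvm else tt.jvm" rule. The method name and the way the user-supplied Java home reaches the code are assumptions, not taken from the attached patch:

```java
import java.io.File;

class ChildJvmSketch {
    /**
     * Picks the java executable for the child task: the user-supplied Java
     * home if one is given, otherwise the JVM the parent is running on.
     */
    static File chooseJvm(String userJavaHome) {
        String home = (userJavaHome == null || userJavaHome.isEmpty())
            ? System.getProperty("java.home")   // same jvm as the parent
            : userJavaHome;                      // hypothetical user setting
        return new File(new File(home, "bin"), "java");
    }
}
```

The sketch does not validate the path; a real change would also have to check that the executable exists and, as noted above, leave library setup to the user.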


 Tasks to run on a different jvm version than the TaskTracker
 

 Key: MAPREDUCE-217
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-217
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
 Environment: linux
Reporter: Koji Noguchi
Assignee: Amar Kamat
 Attachments: mapreduce-217-v1.0.patch


 We use 32-bit jvm for TaskTrackers. 
 Sometimes our users want to call 64-bit JNI libraries from their tasks.
 This requires tasks to be running on 64-bit jvm.
 On Solaris, you can simply use -d32/-d64 to choose, but on Linux, it's on a 
 completely different package.
 So far, tasks run on the same jvm version as the TaskTracker.
 {noformat}
 // use same jvm as parent
 File jvm = new File(new File(System.getProperty("java.home"), "bin"), 
 "java");
 {noformat}
 Is it possible to let users provide a java home path 
 or let them choose from a pre-selected list of paths?




[jira] Updated: (MAPREDUCE-1098) Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks.

2009-10-26 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1098:
---

Release Note: Fixed the distributed cache's localizeCache to lock only the 
uri it is localizing. 

 Incorrect synchronization in DistributedCache causes TaskTrackers to freeze 
 up during localization of Cache for tasks.
 --

 Key: MAPREDUCE-1098
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1098
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Sreekanth Ramakrishnan
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1098.patch, MAPREDUCE-1098.patch, 
 MAPREDUCE-1098.patch, patch-1098-0.20.txt, patch-1098-1.txt, 
 patch-1098-2.txt, patch-1098-3.txt, patch-1098-4.txt, patch-1098-5.txt, 
 patch-1098-6.txt, patch-1098-7.txt, patch-1098-7.txt, patch-1098-ydist.txt, 
 patch-1098-ydist.txt, patch-1098-ydist.txt, patch-1098.txt


 Currently {{org.apache.hadoop.filecache.DistributedCache.getLocalCache(URI, 
 Configuration, Path, FileStatus, boolean, long, Path, boolean)}} allows only 
 one {{TaskRunner}} thread in TT to localize {{DistributedCache}} across jobs. 
 The current synchronization is across baseDir; this has to be changed to 
 lock on the same baseDir.
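
One way to picture the narrower locking described in the release note (lock only the URI being localized) is a per-key lock map. This is an illustrative sketch with hypothetical names, not the code from the patch:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class PerUriLockSketch {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, String> localized = new ConcurrentHashMap<>();
    final AtomicInteger downloads = new AtomicInteger();

    /** One lock object per cache URI, created on first use. */
    Object lockFor(String uri) {
        return locks.computeIfAbsent(uri, k -> new Object());
    }

    /** Localizes a URI while holding only that URI's lock. */
    String localize(String uri) {
        synchronized (lockFor(uri)) {
            String path = localized.get(uri);
            if (path == null) {
                downloads.incrementAndGet();    // stands in for the slow copy
                path = "local-copy-of:" + uri;  // placeholder, not a real path
                localized.put(uri, path);
            }
            return path;
        }
    }
}
```

Threads localizing different URIs take different locks and do not block each other, while two threads asking for the same URI still trigger only one copy.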




[jira] Updated: (MAPREDUCE-1048) Show total slot usage in cluster summary on jobtracker webui

2009-10-26 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1048:
---

Release Note: Added occupied map/reduce slots and reserved map/reduce slots 
to the Cluster Summary table on jobtracker web ui.

 Show total slot usage in cluster summary on jobtracker webui
 

 Key: MAPREDUCE-1048
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1048
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Amar Kamat
Assignee: Amareshwari Sriramadasu
 Fix For: 0.22.0

 Attachments: mapred-1048-v1.0.patch, mapred-1048-v1.1.patch, 
 MAPREDUCE-1048-20.patch, MAPREDUCE-1048.patch, patch-1048-0.20.txt, 
 patch-1048-1.txt, patch-1048-2.txt, patch-1048-3.txt, patch-1048-4.txt, 
 patch-1048-5.txt, patch-1048-6.txt, patch-1048-ydist.txt, patch-1048.txt


 With High-Ram jobs coming into the picture, it's important to also show the 
 slot usage in cluster summary since total-running-maps != 
 total-slots-occupied. 
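
A small worked example of why the two numbers diverge, assuming a high-RAM task is modelled as holding more than one slot. The class and the per-task slot array are hypothetical:

```java
class SlotUsageSketch {
    /** Number of running tasks: one array entry per task. */
    static int runningTasks(int[] slotsPerTask) {
        return slotsPerTask.length;
    }

    /** Slots occupied: a high-RAM task may hold more than one slot. */
    static int occupiedSlots(int[] slotsPerTask) {
        int total = 0;
        for (int slots : slotsPerTask) {
            total += slots;
        }
        return total;
    }
}
```

For two ordinary map tasks and one task holding two slots, the array {1, 1, 2} gives 3 running tasks but 4 occupied slots.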




[jira] Updated: (MAPREDUCE-1152) JobTrackerInstrumentation.killed{Map/Reduce} is never called

2009-10-26 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1152:
--

Attachment: 1152.patch

Patch fixing fail and kill task metrics.

 JobTrackerInstrumentation.killed{Map/Reduce} is never called
 

 Key: MAPREDUCE-1152
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1152
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Sharad Agarwal
 Fix For: 0.22.0

 Attachments: 1152.patch, 1152.patch


 JobTrackerInstrumentation.killed{Map/Reduce} metrics added as part of 
 MAPREDUCE-1103 are not captured




[jira] Assigned: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.

2009-10-26 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal reassigned MAPREDUCE-1153:
-

Assignee: Sharad Agarwal

 Metrics counting tasktrackers and blacklisted tasktrackers are not updated 
 when trackers are decommissioned.
 

 Key: MAPREDUCE-1153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Hemanth Yamijala
Assignee: Sharad Agarwal

 MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of 
 actual, blacklisted and decommissioned tasktrackers. When a tracker is 
 decommissioned, the tasktracker count or the blacklisted tracker count is not 
 decremented.




[jira] Updated: (MAPREDUCE-1152) JobTrackerInstrumentation.killed{Map/Reduce} is never called

2009-10-26 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1152:
---

Status: Patch Available  (was: Open)

Patch looks fine to me.
Submitting for hudson

 JobTrackerInstrumentation.killed{Map/Reduce} is never called
 

 Key: MAPREDUCE-1152
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1152
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Sharad Agarwal
 Fix For: 0.22.0

 Attachments: 1152.patch, 1152.patch


 JobTrackerInstrumentation.killed{Map/Reduce} metrics added as part of 
 MAPREDUCE-1103 are not captured




[jira] Updated: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.

2009-10-26 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1153:
--

Attachment: 1153.patch

Moved common code into a single method - removeTracker(TaskTracker)

 Metrics counting tasktrackers and blacklisted tasktrackers are not updated 
 when trackers are decommissioned.
 

 Key: MAPREDUCE-1153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Hemanth Yamijala
Assignee: Sharad Agarwal
 Attachments: 1153.patch


 MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of 
 actual, blacklisted and decommissioned tasktrackers. When a tracker is 
 decommissioned, the tasktracker count or the blacklisted tracker count is not 
 decremented.
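
The bookkeeping problem can be sketched with a single removal path that every way of dropping a tracker goes through, in the spirit of the removeTracker(TaskTracker) method mentioned in the update. Everything below is a hypothetical model using tracker names, not the JobTracker code:

```java
import java.util.HashSet;
import java.util.Set;

class TrackerCountsSketch {
    private final Set<String> trackers = new HashSet<>();
    private final Set<String> blacklisted = new HashSet<>();
    private final Set<String> decommissioned = new HashSet<>();

    void addTracker(String name) {
        trackers.add(name);
    }

    void blacklist(String name) {
        if (trackers.contains(name)) {
            blacklisted.add(name);
        }
    }

    /** Single removal path, so both counts shrink together. */
    private void removeTracker(String name) {
        trackers.remove(name);
        blacklisted.remove(name);
    }

    void decommission(String name) {
        if (trackers.contains(name)) {
            removeTracker(name);
            decommissioned.add(name);
        }
    }

    int totalTrackers() { return trackers.size(); }
    int blacklistedTrackers() { return blacklisted.size(); }
    int decommissionedTrackers() { return decommissioned.size(); }
}
```

Decommissioning a blacklisted tracker here lowers both the total and the blacklisted counts and raises the decommissioned count by one.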




[jira] Assigned: (MAPREDUCE-1102) Job gets killed even when the cleanup completes

2009-10-26 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat reassigned MAPREDUCE-1102:
-

Assignee: Amar Kamat

 Job gets killed even when the cleanup completes
 ---

 Key: MAPREDUCE-1102
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1102
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0


 When the cleanup completes at the tasktracker and the job is killed by the 
 user, the cleanup runs to completion but the job fails. Ideally if the 
 cleanup is completed then the job should not be killed. 




[jira] Commented: (MAPREDUCE-1102) Job gets killed even when the cleanup completes

2009-10-26 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769979#action_12769979
 ] 

Amar Kamat commented on MAPREDUCE-1102:
---

One simple approach would be not to honour 'kill-job' when the cleanup is 
launched in which case the job can either move to FAILED or SUCCESSFUL state. 
The job can fail (after cleanup is launched) if all the cleanup attempts fail. 
The only corner case we need to take care of is the case where the 
FileOutputCommitter.commitJob() creates _SUCCESS and fails. In such a case the 
job will fail with a _SUCCESS file. 
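
The "do not honour kill-job once cleanup is launched" idea can be sketched as a small state machine. The state names and methods are hypothetical and only illustrate the proposal in this comment:

```java
class KillAfterCleanupSketch {
    enum State { RUNNING, CLEANUP_LAUNCHED, SUCCEEDED, FAILED, KILLED }

    private State state = State.RUNNING;

    State state() {
        return state;
    }

    void launchCleanup() {
        if (state == State.RUNNING) {
            state = State.CLEANUP_LAUNCHED;
        }
    }

    /** Returns true if the kill request was honoured. */
    boolean killJob() {
        if (state != State.RUNNING) {
            return false; // too late: the cleanup outcome decides the job state
        }
        state = State.KILLED;
        return true;
    }

    /** The cleanup result moves the job to SUCCEEDED or FAILED. */
    void cleanupFinished(boolean allAttemptsFailed) {
        if (state == State.CLEANUP_LAUNCHED) {
            state = allAttemptsFailed ? State.FAILED : State.SUCCEEDED;
        }
    }
}
```

A kill that arrives after launchCleanup() is ignored, so the job ends as SUCCEEDED or FAILED depending only on the cleanup attempts.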

 Job gets killed even when the cleanup completes
 ---

 Key: MAPREDUCE-1102
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1102
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0


 When the cleanup completes at the tasktracker and the job is killed by the 
 user, the cleanup runs to completion but the job fails. Ideally if the 
 cleanup is completed then the job should not be killed. 




[jira] Commented: (MAPREDUCE-171) TestJobTrackerRestartWithLostTracker sometimes fails while validating history.

2009-10-26 Thread Suman Sehgal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769984#action_12769984
 ] 

Suman Sehgal commented on MAPREDUCE-171:


Yeah, it's failing on 0.20.1 also!

 TestJobTrackerRestartWithLostTracker sometimes fails while validating history.
 --

 Key: MAPREDUCE-171
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-171
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amareshwari Sriramadasu
 Attachments: 
 TEST-org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.txt


 TestJobTrackerRestartWithLostTracker fails with following error
 Duplicate START_TIME seen for task task_200906151249_0001_m_01 in history 
 file at line 54
 junit.framework.AssertionFailedError: Duplicate START_TIME seen for task 
 task_200906151249_0001_m_01 in history file at line 54
   at 
 org.apache.hadoop.mapred.TestJobHistory$TestListener.handle(TestJobHistory.java:161)
   at org.apache.hadoop.mapred.JobHistory.parseLine(JobHistory.java:335)
   at 
 org.apache.hadoop.mapred.JobHistory.parseHistoryFromFS(JobHistory.java:299)
   at 
 org.apache.hadoop.mapred.TestJobHistory.validateJobHistoryFileFormat(TestJobHistory.java:478)
   at 
 org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRecoveryWithLostTracker(TestJobTrackerRestartWithLostTracker.java:116)
   at 
 org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRestartWithLostTracker(TestJobTrackerRestartWithLostTracker.java:162)




[jira] Commented: (MAPREDUCE-1152) JobTrackerInstrumentation.killed{Map/Reduce} is never called

2009-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769992#action_12769992
 ] 

Hadoop QA commented on MAPREDUCE-1152:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423170/1152.patch
  against trunk revision 829529.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/212/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/212/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/212/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/212/console

This message is automatically generated.

 JobTrackerInstrumentation.killed{Map/Reduce} is never called
 

 Key: MAPREDUCE-1152
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1152
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Sharad Agarwal
 Fix For: 0.22.0

 Attachments: 1152.patch, 1152.patch


 JobTrackerInstrumentation.killed{Map/Reduce} metrics added as part of 
 MAPREDUCE-1103 are not captured




[jira] Updated: (MAPREDUCE-947) OutputCommitter should have an abortJob method

2009-10-26 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-947:
---

Attachment: yhadoop20-bug-fix-947.patch

The Y! 20 patch has a bug that caused TestJobHistory to fail. A patch with the 
fix for the Y! 20 distribution is attached now.

Running unit tests with the fix now.

 OutputCommitter should have an abortJob method
 --

 Key: MAPREDUCE-947
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-947
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Amar Kamat
 Fix For: 0.22.0

 Attachments: mapred-948-v1.12-branch-0.20-internal.patch, 
 mapred-948-v1.12.patch, mapred-948-v1.13-branch-0.20-internal.patch, 
 mapred-948-v1.2.patch, mapred-948-v1.3.patch, mapred-948-v1.4.patch, 
 mapred-948-v1.7.patch, mapred-948-v2.1-branch-0.20.patch, 
 mapred-948-v2.3-branch-0.20.patch, mapred-948-v2.3.patch, 
 mapred-948-v3.1.patch, mapred-948-v3.2.patch, mapred-948-v3.4.patch, 
 mr-947-trunk-new.patch, mr-947-trunk-new.patch, mr-947-trunk.patch, 
 mr-947-trunk.patch, mr-947-trunk.patch, mr-947-y20-new.patch, 
 mr-947-y20.patch, mr-947-y20.patch, yhadoop20-bug-fix-947.patch


 The OutputCommitter needs an abortJob method to clean up from failed jobs. 
 Currently there is no way to distinguish between failed or succeeded jobs, 
 making it impossible to write output promotion code.




[jira] Commented: (MAPREDUCE-1102) Job gets killed even when the cleanup completes

2009-10-26 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770071#action_12770071
 ] 

Owen O'Malley commented on MAPREDUCE-1102:
--

Rather than blocking kill-job, I think we are better off guaranteeing that if 
the job fails, we will always call abortJob. Even if commitJob has started (or 
finished). We should also make the FileOutputFormat abortJob delete _SUCCESS to 
handle this case. This would also handle the case where the job commit task 
fails part way through.

I agree that all output committers may not be able to unroll their commit. 
However, I think that we need to give them the ability to do the right thing.
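
A rough sketch of this alternative, in which a failed job always gets abortJob() and abortJob() removes the _SUCCESS marker even if commitJob() already ran. The class is a hypothetical in-memory model, not FileOutputCommitter:

```java
import java.util.HashSet;
import java.util.Set;

class AbortAfterCommitSketch {
    // Names of entries in a pretend output folder.
    final Set<String> outputDir = new HashSet<>();

    void commitJob() {
        outputDir.add("_SUCCESS");
    }

    /** May run after commitJob() has started or finished, so it undoes the marker. */
    void abortJob() {
        outputDir.remove("_SUCCESS");
    }

    /** Final step: a failed job always gets abortJob(), whatever ran before. */
    void finish(boolean jobFailed) {
        if (jobFailed) {
            abortJob();
        }
    }
}
```

Under this rule a job can never end as failed while a _SUCCESS entry is still present in its output folder.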

 Job gets killed even when the cleanup completes
 ---

 Key: MAPREDUCE-1102
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1102
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0


 When the cleanup completes at the tasktracker and the job is killed by the 
 user, the cleanup runs to completion but the job fails. Ideally if the 
 cleanup is completed then the job should not be killed. 




[jira] Commented: (MAPREDUCE-1102) Job gets killed even when the cleanup completes

2009-10-26 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770079#action_12770079
 ] 

Owen O'Malley commented on MAPREDUCE-1102:
--

Naturally, the fact that it may be invoked after the commitJob method has been 
called should be called out in the JavaDoc for the abortJob method. 

 Job gets killed even when the cleanup completes
 ---

 Key: MAPREDUCE-1102
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1102
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0


 When the cleanup completes at the tasktracker and the job is killed by the 
 user, the cleanup runs to completion but the job fails. Ideally if the 
 cleanup is completed then the job should not be killed. 




[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-10-26 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770121#action_12770121
 ] 

Todd Lipcon commented on MAPREDUCE-967:
---

One note about this JIRA - it will need some fix for Streaming as well. The 
common way that people ship scripts for streaming is using the -file foo.py 
argument. This just includes foo.py in the job jar and assumes it will be 
unpacked on the other side. With this patch, it won't unpack those and breaks 
the -file argument's primary use case.

Two options to fix this issue:
# We could change -file to use DistributedCache instead. The fact that -file 
and -files do different things is confusing in the first place, but changing 
the behavior is potentially a breaking change, I think.
# We could change Streaming to add all of the -file paths to the new 
configuration parameter such that the existing behavior is preserved.

If no one else has a preference I'll go for option #2 above.
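
Option 2 could look roughly like the sketch below. The comment does not name the new configuration parameter, so this assumes it holds a regular expression of jar entries to unpack; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SelectiveUnjarSketch {
    /** Builds a pattern matching exactly the given jar entry names. */
    static Pattern patternFor(List<String> fileNames) {
        String alternation = fileNames.stream()
            .map(Pattern::quote)               // treat names literally, not as regex
            .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    /** Decides whether a jar entry should be unpacked on the TaskTracker. */
    static boolean shouldUnpack(String entryName, Pattern unpackPattern) {
        return unpackPattern.matcher(entryName).matches();
    }
}
```

Streaming would pass every -file path through patternFor, so scripts such as foo.py are still unpacked while the bulk of the class files stay inside the jar.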

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided




[jira] Created: (MAPREDUCE-1155) Streaming TestMultipleArchiveFiles swallows exceptions

2009-10-26 Thread Todd Lipcon (JIRA)
Streaming TestMultipleArchiveFiles swallows exceptions
--

 Key: MAPREDUCE-1155
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1155
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


TestMultipleArchiveFiles catches exceptions and prints their stack trace rather 
than failing the job. This means that tests do not fail even when the job fails.
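
The difference between swallowing and propagating a failure can be illustrated with a small sketch. The names are hypothetical and the code is not taken from TestMultipleArchiveFiles:

```java
class SwallowedFailureSketch {
    /** Mirrors the reported problem: the failure is printed, then ignored. */
    static boolean runSwallowing(Runnable job) {
        try {
            job.run();
        } catch (RuntimeException e) {
            System.err.println("job failed: " + e); // only visible in the log
        }
        return true; // the test still "passes"
    }

    /** Lets the failure escape so the test harness can report it. */
    static boolean runPropagating(Runnable job) {
        job.run();
        return true;
    }
}
```

With the first form a failing job only leaves a message in the output; with the second the exception reaches the test framework and the test is marked as failed.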




[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

2009-10-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1114:
---

Status: Patch Available  (was: Open)

 Speed up ivy resolution in builds with clever caching
 -

 Key: MAPREDUCE-1114
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: mapreduce-1114.txt, mapreduce-1114.txt


 An awful lot of time is spent in the ivy:resolve parts of the build, even 
 when all of the dependencies have been fetched and cached. Profiling showed 
 this was in XML parsing. I have a sort-of-ugly hack which speeds up 
 incremental compiles (and more importantly ant test) significantly using 
 some ant macros to cache the resolved classpaths.




[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

2009-10-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1114:
---

Attachment: mapreduce-1114.txt

Attaching up to date patch.

 Speed up ivy resolution in builds with clever caching
 -

 Key: MAPREDUCE-1114
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: mapreduce-1114.txt, mapreduce-1114.txt


 An awful lot of time is spent in the ivy:resolve parts of the build, even 
 when all of the dependencies have been fetched and cached. Profiling showed 
 this was in XML parsing. I have a sort-of-ugly hack which speeds up 
 incremental compiles (and more importantly ant test) significantly using 
 some ant macros to cache the resolved classpaths.




[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

2009-10-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1114:
---

Attachment: mapreduce-1114.txt

Forgot to include build-macros.xml in previous patch

 Speed up ivy resolution in builds with clever caching
 -

 Key: MAPREDUCE-1114
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: mapreduce-1114.txt, mapreduce-1114.txt, 
 mapreduce-1114.txt


 An awful lot of time is spent in the ivy:resolve parts of the build, even 
 when all of the dependencies have been fetched and cached. Profiling showed 
 this was in XML parsing. I have a sort-of-ugly hack which speeds up 
 incremental compiles (and more importantly ant test) significantly using 
 some ant macros to cache the resolved classpaths.




[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

2009-10-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1114:
---

Status: Patch Available  (was: Open)

 Speed up ivy resolution in builds with clever caching
 -

 Key: MAPREDUCE-1114
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: mapreduce-1114.txt, mapreduce-1114.txt, 
 mapreduce-1114.txt


 An awful lot of time is spent in the ivy:resolve parts of the build, even 
 when all of the dependencies have been fetched and cached. Profiling showed 
 this was in XML parsing. I have a sort-of-ugly hack which speeds up 
 incremental compiles (and more importantly ant test) significantly using 
 some ant macros to cache the resolved classpaths.




[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

2009-10-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1114:
---

Status: Open  (was: Patch Available)

 Speed up ivy resolution in builds with clever caching
 -

 Key: MAPREDUCE-1114
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: mapreduce-1114.txt, mapreduce-1114.txt, 
 mapreduce-1114.txt


 An awful lot of time is spent in the ivy:resolve parts of the build, even 
 when all of the dependencies have been fetched and cached. Profiling showed 
 this was in XML parsing. I have a sort-of-ugly hack which speeds up 
 incremental compiles (and more importantly ant test) significantly using 
 some ant macros to cache the resolved classpaths.




[jira] Created: (MAPREDUCE-1156) Caching localized counter names in mapred.Counters

2009-10-26 Thread Hong Tang (JIRA)
Caching localized counter names in mapred.Counters
--

 Key: MAPREDUCE-1156
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1156
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Hong Tang


Profiling Mumak with YourKit, we found that MissingResourceException was 
thrown and caught 1.6 million times in Counters.Group.localize over several 
hundred jobs. The resource bundle lookup and the costly exception processing 
can easily be avoided if we keep a global cache of localized counter names.
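
A minimal sketch of such a cache, written against the plain JDK rather than 
the actual mapred.Counters code. The class name LocalizedNameCache and its 
methods are hypothetical, and the sketch uses lambdas, so it assumes a newer 
Java than Hadoop targeted at the time. Each (bundle, key) pair is looked up 
at most once; a missing resource is cached as the supplied default, so the 
exception is paid only on the first miss.

```java
import java.util.MissingResourceException;
import java.util.ResourceBundle;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: a process-wide cache of localized names. */
final class LocalizedNameCache {
    private static final ConcurrentMap<String, String> CACHE =
        new ConcurrentHashMap<>();

    private LocalizedNameCache() {}

    /** Returns the localized name, consulting the bundle only on a cache miss. */
    static String localize(String bundleName, String key, String defaultValue) {
        // NUL cannot appear in a bundle name, so the combined key is unambiguous.
        String cacheKey = bundleName + '\u0000' + key;
        return CACHE.computeIfAbsent(
            cacheKey, k -> lookup(bundleName, key, defaultValue));
    }

    /** The slow path: may throw and catch MissingResourceException once. */
    private static String lookup(String bundleName, String key, String defaultValue) {
        try {
            return ResourceBundle.getBundle(bundleName).getString(key);
        } catch (MissingResourceException e) {
            return defaultValue;
        }
    }

    /** Number of cached entries, exposed for inspection. */
    static int size() {
        return CACHE.size();
    }
}
```

One consequence of this design is that the first default value seen for a 
key is the one that stays cached, which is acceptable only if callers always 
pass the same default for a given counter.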




[jira] Updated: (MAPREDUCE-1103) Additional JobTracker metrics

2009-10-26 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1103:
--

Release Note: 
Add the following additional job tracker metrics: 
Reserved{Map, Reduce}Slots
Occupied{Map, Reduce}Slots
Running{Map, Reduce}Tasks
Killed{Map, Reduce}Tasks

FailedJobs
KilledJobs
PrepJobs
RunningJobs

TotalTrackers
BlacklistedTrackers
DecommissionedTrackers

 Additional JobTracker metrics
 -

 Key: MAPREDUCE-1103
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1103
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Sharad Agarwal
 Fix For: 0.22.0

 Attachments: 1103.patch, 1103.patch, 1103_v1.patch, 1103_v2.patch, 
 1103_v3.patch, 1103_v4.patch, 1103_v5.patch, 1103_v5_yahoo_1.patch


 It would be useful to track the following additional JobTracker metrics:
 running{map|reduce}tasks
 busy{map|reduce}slots
 reserved{map|reduce}slots




[jira] Updated: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.

2009-10-26 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1153:
---

Status: Patch Available  (was: Open)

Changes look fine. Submitting to Hudson.

 Metrics counting tasktrackers and blacklisted tasktrackers are not updated 
 when trackers are decommissioned.
 

 Key: MAPREDUCE-1153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Hemanth Yamijala
Assignee: Sharad Agarwal
 Attachments: 1153.patch


 MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of 
 actual, blacklisted and decommissioned tasktrackers. When a tracker is 
 decommissioned, neither the tasktracker count nor the blacklisted tracker 
 count is decremented.
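
 The intended bookkeeping can be sketched as follows. This is an 
 illustration only: TrackerCounts and its methods are hypothetical names, 
 not the JobTracker instrumentation added by MAPREDUCE-1103. The counts are 
 derived from sets, so decommissioning a tracker lowers the total and 
 blacklisted counts as a side effect of removing it.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical bookkeeping class; not taken from the JobTracker source. */
final class TrackerCounts {
    private final Set<String> active = new HashSet<>();
    private final Set<String> blacklisted = new HashSet<>();
    private final Set<String> decommissioned = new HashSet<>();

    synchronized void addTracker(String host) {
        active.add(host);
    }

    synchronized void blacklist(String host) {
        // Only a tracker that is currently part of the cluster can be blacklisted.
        if (active.contains(host)) {
            blacklisted.add(host);
        }
    }

    synchronized void decommission(String host) {
        // A decommissioned tracker must stop counting as active or blacklisted.
        active.remove(host);
        blacklisted.remove(host);
        decommissioned.add(host);
    }

    synchronized int totalTrackers() {
        return active.size();
    }

    synchronized int blacklistedTrackers() {
        return blacklisted.size();
    }

    synchronized int decommissionedTrackers() {
        return decommissioned.size();
    }
}
```

 With two trackers registered, one of them blacklisted and then 
 decommissioned, the sketch reports one total, zero blacklisted and one 
 decommissioned tracker.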




[jira] Commented: (MAPREDUCE-1144) JT should not hold lock while writing user history logs to DFS

2009-10-26 Thread Sharad Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770383#action_12770383
 ] 

Sharad Agarwal commented on MAPREDUCE-1144:
---

Since MAPREDUCE-814 adds the capability to keep job logs in HDFS, there is not 
much utility in enabling the user logs. Users can access the logs directly from 
the HDFS done folder location. In fact, in 0.21 the user log was removed as 
part of the job history format/API refactoring (MAPREDUCE-157).

 JT should not hold lock while writing user history logs to DFS
 --

 Key: MAPREDUCE-1144
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1144
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Todd Lipcon

 I've seen behavior a few times now where the DFS is being slow for one reason 
 or another, and the JT essentially locks up waiting on it while one thread 
 tries for a long time to write history files out. The stack trace blocking 
 everything is:
 Thread 210 (IPC Server handler 10 on 7277):
   State: WAITING
   Blocked count: 171424
   Waited count: 1209604
   Waiting on java.util.linkedl...@407dd154
   Stack:
     java.lang.Object.wait(Native Method)
     java.lang.Object.wait(Object.java:485)
     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3122)
     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3202)
     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3151)
     org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:67)
     org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
     sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:301)
     sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
     java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
     java.io.BufferedWriter.close(BufferedWriter.java:248)
     java.io.PrintWriter.close(PrintWriter.java:295)
     org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1349)
     org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2167)
     org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2111)
     org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:873)
     org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3598)
     org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2792)
     org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2581)
     sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)

 We should try not to do external IO while holding the JT lock, and instead 
 write the data to an in-memory buffer, drop the lock, and then write.
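
 A rough sketch of that buffer-then-write pattern, assuming a private lock 
 that stands in for the JT lock and a plain OutputStream that stands in for 
 the DFS stream. BufferedHistoryLogger and its methods are hypothetical and 
 are not part of the JobHistory API. Events are appended to memory under the 
 state lock; the slow write happens only after that lock has been released.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not the JobHistory API. */
final class BufferedHistoryLogger {
    private final Object stateLock = new Object(); // stands in for the JT lock
    private final Object ioLock = new Object();    // serializes the slow writes
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final OutputStream sink;               // stands in for the DFS stream

    BufferedHistoryLogger(OutputStream sink) {
        this.sink = sink;
    }

    /** Cheap: appends to memory only, so the state lock is held briefly. */
    void logEvent(String line) {
        byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        synchronized (stateLock) {
            pending.write(bytes, 0, bytes.length);
        }
    }

    /** Slow: takes a snapshot under the state lock, then writes without it. */
    void flush() {
        synchronized (ioLock) { // keeps concurrent flushes in order
            byte[] snapshot;
            synchronized (stateLock) {
                snapshot = pending.toByteArray();
                pending.reset();
            }
            try {
                // The state lock is no longer held while the sink blocks.
                sink.write(snapshot);
                sink.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

 The separate I/O lock keeps flushed data in order without blocking callers 
 of logEvent. A real implementation would also need to decide what happens 
 to a snapshot whose write fails, which this sketch simply drops.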
