[jira] [Updated] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated MAPREDUCE-4843:
    Attachment: MAPREDUCE-4843-branch-1.1.patch

Updated the patch.

When using DefaultTaskController, JobLocalizer not thread safe
Key: MAPREDUCE-4843
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: tasktracker
Affects Versions: 1.1.1
Reporter: zhaoyunjiong
Priority: Critical
Attachments: MAPREDUCE-4843-branch-1.1.patch

In our cluster, jobs sometimes fail with the following exception:

2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201212031626_1115_r_23_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
    at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1175)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1058)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2213)

The root cause is that JobLocalizer is not thread safe. The DefaultTaskController.initializeJob method creates it with:

JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid);

but JobLocalizer simply keeps a reference to the conf. When two TaskLauncher threads (mapLauncher and reduceLauncher) call initializeJob at the same time, there are two JobLocalizer instances but only one conf instance. As a result, ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) can overwrite the value set for the previous job, and the previous job's job.xml is then stored under another user's directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
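The race described in MAPREDUCE-4843 can be shown in miniature. The following is a minimal sketch, not Hadoop code: `FakeConf`, `SketchLocalizer`, and the key string are hypothetical stand-ins for `JobConf`, `JobLocalizer`, and `JOB_LOCAL_CTXT`, and the interleaving of the two launcher threads is simulated sequentially so the outcome is deterministic. Copying the configuration per localizer is only one possible remedy, assumed here for illustration; it is not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mutable configuration object (not Hadoop's JobConf).
class FakeConf {
    private final Map<String, String> props = new HashMap<>();

    FakeConf() {}

    // Copy constructor: the new object shares no state with the original.
    FakeConf(FakeConf other) { props.putAll(other.props); }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// Hypothetical localizer that records its user's local directory in the conf it is given.
class SketchLocalizer {
    static final String JOB_LOCAL_CTXT = "job.local.ctxt"; // key name assumed for illustration

    private final FakeConf conf;

    SketchLocalizer(FakeConf conf, String user) {
        this.conf = conf; // keeps a reference, does not copy
        conf.set(JOB_LOCAL_CTXT, "/local/taskTracker/" + user);
    }

    String jobXmlDir() { return conf.get(JOB_LOCAL_CTXT); }
}

class SharedConfDemo {
    // Both localizers hold the same conf, so the second write wins.
    static String sharedConfDirForAlice() {
        FakeConf ttConf = new FakeConf();
        SketchLocalizer alice = new SketchLocalizer(ttConf, "alice");
        new SketchLocalizer(ttConf, "bob"); // stands in for the other launcher thread
        return alice.jobXmlDir();
    }

    // Each localizer gets a private copy, so the writes cannot interfere.
    static String copiedConfDirForAlice() {
        FakeConf ttConf = new FakeConf();
        SketchLocalizer alice = new SketchLocalizer(new FakeConf(ttConf), "alice");
        new SketchLocalizer(new FakeConf(ttConf), "bob");
        return alice.jobXmlDir();
    }
}
```

With the shared configuration, the first localizer ends up reading the second user's directory, which matches the symptom in the report (job.xml stored under another user's directory).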
[jira] [Updated] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated MAPREDUCE-4843:
    Status: Patch Available (was: Open)

Testing patch
[jira] [Created] (MAPREDUCE-4849) TaskSelector not used in FairScheduler
Vincent Behar created MAPREDUCE-4849:

Summary: TaskSelector not used in FairScheduler
Key: MAPREDUCE-4849
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4849
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/fair-share
Affects Versions: 1.1.1, 1.0.4
Reporter: Vincent Behar

The documentation (http://hadoop.apache.org/docs/r1.0.4/fair_scheduler.html) describes the mapred.fairscheduler.taskselector parameter as an extension point, but while the FairScheduler does instantiate the custom TaskSelector provided this way, it does not call any of its methods (obtainNewMapTask, obtainNewReduceTask, neededSpeculativeMaps or neededSpeculativeReduces).

We should either update the FairScheduler to use the TaskSelector when scheduling a task, or completely remove the TaskSelector and update the documentation.
[jira] [Updated] (MAPREDUCE-4732) testcase testJobRetire fails using IBM JAVA
[ https://issues.apache.org/jira/browse/MAPREDUCE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amir Sanjar updated MAPREDUCE-4732:
    Summary: testcase testJobRetire fails using IBM JAVA (was: testcase testJobRetire fails using IBM JAVA 7)

testcase testJobRetire fails using IBM JAVA
Key: MAPREDUCE-4732
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4732
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 1.0.3
Environment: RHEL 6.2 with IBM JAVA 7 on a x86_64 system
Reporter: Amir Sanjar

Testcase: testJobRetire took 53.352 sec
Testcase: testJobRetireWithUnreportedTasks took 41.173 sec
    FAILED
Job did not retire
junit.framework.AssertionFailedError: Job did not retire
    at org.apache.hadoop.mapred.TestJobRetire.waitTillRetire(TestJobRetire.java:130)
    at org.apache.hadoop.mapred.TestJobRetire.testJobRetireWithUnreportedTasks(TestJobRetire.java:229)
Testcase: testJobRemoval took 1.073 sec
[jira] [Commented] (MAPREDUCE-4732) testcase testJobRetire fails using IBM JAVA
[ https://issues.apache.org/jira/browse/MAPREDUCE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510486#comment-13510486 ]

Amir Sanjar commented on MAPREDUCE-4732:

Was able to reproduce on IBM JAVA 6; updating the abstract.
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4842:
    Status: Open (was: Patch Available)

Shuffle race can hang reducer
Key: MAPREDUCE-4842
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Arun C Murthy
Priority: Blocker
Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch

Saw an instance where the shuffle caused multiple reducers in a job to hang. It looked similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told to WAIT by the MergeManager but no merge was taking place.
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4842:
    Attachment: MAPREDUCE-4842.patch

Jason, nice unit test! Thanks!

I've modified it a little to have 2 barriers (mergeStart and mergeComplete) rather than use the same one 4 times (that confused me a lot when I was reviewing it). Other than that, it looks great. +1

Also, if you don't mind, I'll assign the jira to you, since you've done all the heavy lifting and deserve way more credit than I do. Thanks again!
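The review comment on MAPREDUCE-4842 about using two named barriers instead of reusing one can be illustrated as follows. This is a minimal sketch built on `java.util.concurrent.CyclicBarrier`, not the unit test from the patch: the class `TwoBarrierDemo`, the logged strings, and the stand-in "merge" thread are all hypothetical. Only the barrier names `mergeStart` and `mergeComplete` come from the comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CyclicBarrier;

class TwoBarrierDemo {
    // Runs a fake merge thread that the caller synchronizes with at two named points.
    static List<String> run() {
        final CyclicBarrier mergeStart = new CyclicBarrier(2);    // both threads meet before the merge
        final CyclicBarrier mergeComplete = new CyclicBarrier(2); // both threads meet after the merge
        final List<String> log = Collections.synchronizedList(new ArrayList<String>());

        Thread merger = new Thread(() -> {
            try {
                mergeStart.await();      // wait until the test thread allows the merge to begin
                log.add("merging");
                mergeComplete.await();   // signal that the merge has finished
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        try {
            merger.start();
            log.add("before merge");     // assertions about pre-merge state would go here
            mergeStart.await();
            mergeComplete.await();
            log.add("after merge");      // assertions about post-merge state would go here
            merger.join();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }
}
```

Giving each synchronization point its own barrier makes it clear at every `await()` call which phase the test is waiting on, which is the readability concern raised in the review.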
[jira] [Assigned] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned MAPREDUCE-4842:
    Assignee: Jason Lowe (was: Arun C Murthy)
[jira] [Created] (MAPREDUCE-4850) Job recovery may fail if staging directory has been deleted
Tom White created MAPREDUCE-4850:

Summary: Job recovery may fail if staging directory has been deleted
Key: MAPREDUCE-4850
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4850
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 1.1.1
Reporter: Tom White
Assignee: Tom White

The job staging directory is deleted in the job cleanup task, which happens before the job-info file is deleted from the system directory (by the JobInProgress garbageCollect() method). If the JT shuts down between these two operations, then when the JT restarts and tries to recover the job, it fails since the job.xml and splits are no longer available.
[jira] [Updated] (MAPREDUCE-4850) Job recovery may fail if staging directory has been deleted
[ https://issues.apache.org/jira/browse/MAPREDUCE-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4850:
    Attachment: MAPREDUCE-4850.patch

A patch that deletes the staging directory after the system directory. Manual testing showed that with this patch I couldn't get a recovery failure in the scenario in the description. It would be nice to add a unit test, but I'm still trying to figure out how to write one for this.
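The ordering problem in MAPREDUCE-4850 can be modelled with two flags. The sketch below is a toy model, not JobTracker code: the class and method names are hypothetical, and it assumes that a restart attempts recovery exactly when the job-info file is still present, and that recovery needs the staging directory.

```java
class CleanupOrderDemo {
    /**
     * Simulates a crash after `stepsCompleted` of the two delete operations
     * (0, 1 or 2) and reports whether a restart finds a consistent state:
     * either nothing to recover, or everything needed for recovery.
     */
    static boolean consistentAfterCrash(boolean stagingFirst, int stepsCompleted) {
        boolean stagingDirExists = true;
        boolean jobInfoExists = true;
        for (int step = 0; step < stepsCompleted; step++) {
            // Decide which resource this step deletes, based on the chosen order.
            boolean deleteStaging = stagingFirst ? (step == 0) : (step == 1);
            if (deleteStaging) {
                stagingDirExists = false;
            } else {
                jobInfoExists = false;
            }
        }
        // Assumed rule: recovery is attempted only if job-info exists,
        // and it succeeds only if the staging directory is still there.
        return !jobInfoExists || stagingDirExists;
    }
}
```

In this model, `consistentAfterCrash(true, 1)` is the failing case from the description (staging directory gone, job-info still present), while deleting the job-info file first leaves every crash point consistent.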
[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510664#comment-13510664 ]

Alejandro Abdelnur commented on MAPREDUCE-4842:

One minor nit: the scope of the exceptionReporter instance variable has been changed from private to protected for testing purposes. It should be package private instead. Preferably, we should add a package-private getter method instead (it could be annotated with Guava's @VisibleForTesting annotation). Other than that, it looks good to me.
[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4842:
    Attachment: MAPREDUCE-4842.patch

Thanks for the reviews, Alejandro and Arun. I updated the patch to address Alejandro's comment and also added a comment clarifying why the merge callback occurs outside of the lock and after inProgress is cleared per a side discussion with Arun.
[jira] [Commented] (MAPREDUCE-4696) TestMRServerPorts throws NullReferenceException
[ https://issues.apache.org/jira/browse/MAPREDUCE-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510832#comment-13510832 ]

Siddharth Seth commented on MAPREDUCE-4696:

+1. Simple enough patch. Will commit this shortly.

TestMRServerPorts throws NullReferenceException
Key: MAPREDUCE-4696
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4696
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Attachments: mapreduce-4696-2.patch, mapreduce-4696.patch

TestMRServerPorts throws
{code}
java.lang.NullPointerException
    at org.apache.hadoop.mapred.TestMRServerPorts.canStartJobTracker(TestMRServerPorts.java:99)
    at org.apache.hadoop.mapred.TestMRServerPorts.testJobTrackerPorts(TestMRServerPorts.java:152)
{code}
[jira] [Commented] (MAPREDUCE-4697) TestMapredHeartbeat fails assertion on HeartbeatInterval
[ https://issues.apache.org/jira/browse/MAPREDUCE-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510833#comment-13510833 ]

Siddharth Seth commented on MAPREDUCE-4697:

+1. Will commit shortly.

TestMapredHeartbeat fails assertion on HeartbeatInterval
Key: MAPREDUCE-4697
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4697
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Attachments: mapreduce-4697.patch

TestMapredHeartbeat fails test on heart beat interval
{code}
    FAILED
expected:<300> but was:<500>
junit.framework.AssertionFailedError: expected:<300> but was:<500>
    at org.apache.hadoop.mapred.TestMapredHeartbeat.testJobDirCleanup(TestMapredHeartbeat.java:68)
{code}
[jira] [Updated] (MAPREDUCE-4699) TestFairScheduler TestCapacityScheduler fails due to JobHistory exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-4699:
    Attachment: MAPREDUCE4699.txt

The current patch looks good for the CapacityScheduler test. Updating the patch with similar changes for TestFairScheduler - and committing.

TestFairScheduler TestCapacityScheduler fails due to JobHistory exception
Key: MAPREDUCE-4699
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4699
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Attachments: mapreduce-4699.patch, MAPREDUCE4699.txt

TestFairScheduler fails due to exception from mapred.JobHistory
{code}
null
java.lang.NullPointerException
    at org.apache.hadoop.mapred.JobHistory$JobInfo.logJobPriority(JobHistory.java:1975)
    at org.apache.hadoop.mapred.JobInProgress.setPriority(JobInProgress.java:895)
    at org.apache.hadoop.mapred.TestFairScheduler.testFifoPool(TestFairScheduler.java:2617)
{code}

TestCapacityScheduler fails due to
{code}
java.lang.NullPointerException
    at org.apache.hadoop.mapred.JobHistory$JobInfo.logJobPriority(JobHistory.java:1976)
    at org.apache.hadoop.mapred.JobInProgress.setPriority(JobInProgress.java:895)
    at org.apache.hadoop.mapred.TestCapacityScheduler$FakeTaskTrackerManager.setPriority(TestCapacityScheduler.java:653)
    at org.apache.hadoop.mapred.TestCapacityScheduler.testHighPriorityJobInitialization(TestCapacityScheduler.java:2666)
{code}
[jira] [Updated] (MAPREDUCE-4696) TestMRServerPorts throws NullReferenceException
[ https://issues.apache.org/jira/browse/MAPREDUCE-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-4696:
    Resolution: Fixed
    Fix Version/s: 1.1.2
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed. Thanks Gopal!
[jira] [Updated] (MAPREDUCE-4697) TestMapredHeartbeat fails assertion on HeartbeatInterval
[ https://issues.apache.org/jira/browse/MAPREDUCE-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-4697:
    Resolution: Fixed
    Fix Version/s: 1.1.2
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed. Thanks Gopal!
[jira] [Resolved] (MAPREDUCE-4699) TestFairScheduler TestCapacityScheduler fails due to JobHistory exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth resolved MAPREDUCE-4699.
    Resolution: Fixed
    Fix Version/s: 1.1.2
    Hadoop Flags: Reviewed

Committed. Thanks Gopal!
[jira] [Commented] (MAPREDUCE-4845) ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2
[ https://issues.apache.org/jira/browse/MAPREDUCE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510896#comment-13510896 ]

Hadoop QA commented on MAPREDUCE-4845:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12556024/MAPREDUCE-4845-branch-1.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3094//console

This message is automatically generated.

ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2
Key: MAPREDUCE-4845
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4845
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: client
Affects Versions: 1.1.1, 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: MAPREDUCE-4845-branch-1.patch, MAPREDUCE-4845.patch

For backwards compatibility, these methods should exist in both MR1 and MR2. Confusingly, these methods return the max memory and used memory of the jobtracker, not the entire cluster. I'd propose to add them to MR2 and return -1, and deprecate them in both MR1 and MR2. Alternatively, I could add plumbing to get the resource manager memory stats.
[jira] [Commented] (MAPREDUCE-4839) TextPartioner for hashing Text with good hashing function to get better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510901#comment-13510901 ]

Hadoop QA commented on MAPREDUCE-4839:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12555646/textpartitioner1.txt
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3096//console

TextPartioner for hashing Text with good hashing function to get better distribution
Key: MAPREDUCE-4839
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4839
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Radim Kolar
Attachments: textpartitioner1.txt

A partitioner for Text keys that uses the util.Hash framework for its hashing function.
[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510899#comment-13510899 ]

Hadoop QA commented on MAPREDUCE-4843:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12556081/MAPREDUCE-4843-branch-1.1.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3095//console
[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510911#comment-13510911 ]

Hadoop QA commented on MAPREDUCE-4827:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12555191/betterhash1.txt
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3097//console

Increase hash quality of HashPartitioner
Key: MAPREDUCE-4827
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Radim Kolar
Attachments: betterhash1.txt

The hash partitioner uses object.hashCode() to split keys into partitions. This results in bad distributions because hashCode() quality is poor. These hashCode() functions are sometimes written by hand (very poor quality) and sometimes generated by commons lang code (poor quality). Applying some transformation on top of hashCode() provides better distribution.
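The idea in MAPREDUCE-4827 of applying a transformation on top of hashCode() can be sketched as follows. This is an illustration only and is not taken from betterhash1.txt: the mixing step used here is the 32-bit finalizer from MurmurHash3, chosen as an assumed example of such a transformation, and the class and method names are hypothetical. The unmixed variant mirrors the usual `(hashCode & Integer.MAX_VALUE) % numPartitions` scheme.

```java
import java.util.HashSet;
import java.util.Set;

class HashMixDemo {
    // Bit-mixing step (MurmurHash3 32-bit finalizer), assumed here as the transformation.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Partition chosen directly from hashCode().
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Partition chosen after mixing the bits of hashCode().
    static int mixedPartition(Object key, int numPartitions) {
        return (mix(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    // Counts how many distinct partitions are hit by the keys 0, 8, 16, ...
    static int distinctPartitions(boolean mixed, int keyCount, int numPartitions) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < keyCount; i++) {
            Integer key = i * 8; // Integer.hashCode() is the value itself
            seen.add(mixed ? mixedPartition(key, numPartitions)
                           : plainPartition(key, numPartitions));
        }
        return seen.size();
    }
}
```

Keys whose hash codes are all multiples of 8 land in a single partition out of 8 without mixing; with the mixing step the low bits depend on the whole hash code, so the same keys spread over more than one partition.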
[jira] [Commented] (MAPREDUCE-4594) Add init/shutdown methods to mapreduce Partitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510922#comment-13510922 ] Hadoop QA commented on MAPREDUCE-4594: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12556006/partitioner1.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3098//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3098//console
This message is automatically generated.
Add init/shutdown methods to mapreduce Partitioner -- Key: MAPREDUCE-4594 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4594 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: trunk Reporter: Radim Kolar Attachments: partitioner1.txt The Partitioner supports only the Configurable API, which can be used for basic initialization in setConf(). The problem is that there is no shutdown function. I propose adding the standard setup() and cleanup() functions, as in Mapper and Reducer. My use case is that I need to start and stop a Spring context and a datagrid client.
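A rough sketch of the lifecycle proposed in MAPREDUCE-4594 follows. It does not use the real Hadoop classes: `LifecyclePartitioner`, `RecordingPartitioner`, and `runTask` are hypothetical names, and the `Map<String, String>` argument merely stands in for whatever configuration or context object the final API would pass. The point is only the call order: setup once, partition many times, cleanup once even if partitioning fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class PartitionerLifecycleSketch {
    // Hypothetical partitioner base class with the proposed lifecycle hooks.
    abstract static class LifecyclePartitioner<K, V> {
        void setup(Map<String, String> conf) {}   // acquire resources here
        abstract int getPartition(K key, V value, int numPartitions);
        void cleanup() {}                         // release resources here
    }

    // Example implementation that records the order of calls it receives.
    static class RecordingPartitioner extends LifecyclePartitioner<String, String> {
        final List<String> events = new ArrayList<>();

        @Override void setup(Map<String, String> conf) { events.add("setup"); }

        @Override int getPartition(String key, String value, int numPartitions) {
            events.add("partition:" + key);
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }

        @Override void cleanup() { events.add("cleanup"); }
    }

    // What a task runner might do: setup, partition every key, always cleanup.
    static <K, V> int[] runTask(LifecyclePartitioner<K, V> partitioner,
                                Map<String, String> conf, List<K> keys, int numPartitions) {
        int[] result = new int[keys.size()];
        partitioner.setup(conf);
        try {
            for (int i = 0; i < keys.size(); i++) {
                result[i] = partitioner.getPartition(keys.get(i), null, numPartitions);
            }
        } finally {
            partitioner.cleanup(); // runs even if getPartition throws
        }
        return result;
    }

    public static void main(String[] args) {
        RecordingPartitioner p = new RecordingPartitioner();
        int[] parts = runTask(p, Collections.<String, String>emptyMap(),
                              Arrays.asList("a", "b"), 4);
        System.out.println(Arrays.toString(parts) + " " + p.events);
    }
}
```

Putting `cleanup()` in a `finally` block is a design choice of this sketch; it is what would let a Spring context or datagrid client be shut down reliably.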
[jira] [Updated] (MAPREDUCE-4839) TextPartioner for hashing Text with good hashing function to get better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated MAPREDUCE-4839: --- Attachment: textpartitioner2.txt TextPartioner for hashing Text with good hashing function to get better distribution Key: MAPREDUCE-4839 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4839 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Radim Kolar Attachments: textpartitioner1.txt, textpartitioner2.txt A partitioner for Text keys that uses the util.Hash framework for its hashing function.
[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated MAPREDUCE-4827: --- Attachment: betterhash2.txt Changed it for the old mapred API as well.
[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511003#comment-13511003 ] Karthik Kambatla commented on MAPREDUCE-4843: - [~zhaoyunjiong] The patch looks good. Can you post a patch against trunk so that QA can apply it? Also, I was wondering whether it would be possible to add a test.
[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511019#comment-13511019 ] zhaoyunjiong commented on MAPREDUCE-4843: - There is no need for a trunk patch; the problem doesn't exist in Hadoop 2.0. A thread-safety problem is very difficult to test: even when the code is not thread safe, a test will pass in most cases.
[jira] [Commented] (MAPREDUCE-4594) Add init/shutdown methods to mapreduce Partitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511028#comment-13511028 ] Harsh J commented on MAPREDUCE-4594: I notice that no objects (such as an attempt context object) are passed into the setup and cleanup methods you wish to introduce here. Without that, how is this helpful? In my mind I was viewing your proposal as a step over writing extends Configurable for new API partitioner implementations, when one needs at least the Configuration object instance to pull values out from. Plus, the ordering of these calls matters, so tests are absolutely necessary if we do not want to regress by accident in the future.
[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511029#comment-13511029 ] Karthik Kambatla commented on MAPREDUCE-4843: - My bad, I misread the branch name. I applied the patch locally and verified that the tests that directly use {{DefaultTaskController}} pass: TestTaskTrackerLocalization, TestJvmManager, TestTaskEnvironment. +1
[jira] [Commented] (MAPREDUCE-4847) Command Parsing in Hadoop Streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511031#comment-13511031 ] Peng Lei commented on MAPREDUCE-4847: - Thank you for your comment! As a workaround I put the command in a script file, and it works. But in this case the command is not complex enough to justify a dedicated script file, and generating scripts on the fly is a bit tricky (at least for the maintainer). It seems Hadoop can't run on Windows without Cygwin. Another solution might be to add a new option that instructs streaming to use an alternative command invoker, such as: -command_invoker sh -c This would solve the issue without breaking existing hadoop-streaming applications. -Peng Command Parsing in Hadoop Streaming --- Key: MAPREDUCE-4847 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4847 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/streaming Reporter: Peng Lei Labels: features Original Estimate: 4h Remaining Estimate: 4h Hadoop streaming parses the mapper and reducer commands by itself. This is not a good choice: when I write a complex mapper/reducer script inline, such as 'perl -ne ...', it doesn't work. An alternative is to send the command to the shell by simply creating a new process (sh -c command_and_args). This not only simplifies the streaming code but also improves its capability!
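The difference between parsing a command yourself and handing it to the shell can be sketched as follows. This is not streaming's actual parser: `naiveSplit` is a deliberately simple whitespace splitter used only to show how quoting gets lost, and `shellArgv` shows the `sh -c` form proposed in the issue. The class and method names are hypothetical, and the sample command is a made-up illustration; running it requires a POSIX `sh` on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class ShellInvokerSketch {
    // Naive parsing: split on whitespace, ignoring quotes entirely.
    static List<String> naiveSplit(String command) {
        return Arrays.asList(command.trim().split("\\s+"));
    }

    // Shell delegation: the whole command stays one argument to `sh -c`,
    // so quoting, pipes, and redirects are interpreted by the shell.
    static List<String> shellArgv(String command) {
        return Arrays.asList("sh", "-c", command);
    }

    // Runs the command through the shell and returns its exit code.
    static int run(String command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(shellArgv(command)).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        String command = "perl -ne 'print if /a b/'";
        System.out.println("naive argv: " + naiveSplit(command));
        System.out.println("shell argv: " + shellArgv(command));
        System.out.println("exit code:  " + run("echo hello | tr a-z A-Z"));
    }
}
```

The naive splitter breaks the quoted Perl program into four separate tokens, while the shell form passes it through untouched. The trade-off, as the comment thread notes, is a dependency on a shell being available on the task node.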
[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511039#comment-13511039 ] Mariappan Asokan commented on MAPREDUCE-4842: -
Hi Jason, Arun, and Alejandro,
I came up with a simpler solution to this nasty problem. Instead of a single list {{inputs}} in {{MergeThread}}, we can keep a FIFO list of these lists. This makes sure that more than one merge can be pending. The {{run()}} method in {{MergeThread}} will keep pulling the map output lists out of the FIFO list and merging them (this is a typical producer-consumer scenario). I will outline the changes below:
In {{MergeThread}},
* A {{LinkedList<List<T>>}} type member ({{pendingToBeMerged}}) is added and the member {{inputs}} is removed.
* The {{isInProgress()}} method is removed.
* The {{startMerge()}} method will no longer be {{synchronized}}. It will add the passed list to the tail of {{pendingToBeMerged}} and call {{notifyAll()}} on the monitor of {{pendingToBeMerged}}.
* The {{run()}} method will sit in a tight loop. As long as there is an item (a list of map outputs) to be consumed, it will consume (merge) the item and remove it from {{pendingToBeMerged}}. If {{pendingToBeMerged}} has no more items, it will call {{notifyAll()}} on the object's monitor after setting {{inProgress}} to {{false}}.
In {{MergeManager}},
* All calls to {{isInProgress()}} are removed.
* Unnecessary {{synchronized}} clauses on merge thread objects are removed, since the methods that contain them are themselves {{synchronized}}.
I created a patch with the above changes and tested it on my laptop. The mapreduce tests seem to run without any problem. However, I do not claim that it is completely tested. It has to go through the rigorous testing that Jason did. If you are interested in taking a look at the patch, I will post it to this Jira. I welcome your questions and suggestions on the idea of the patch.
-- Asokan
Shuffle race can hang reducer - Key: MAPREDUCE-4842 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch Saw an instance where the shuffle caused multiple reducers in a job to hang. It looked similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told to WAIT by the MergeManager but no merge was taking place.
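The FIFO-of-lists idea from Asokan's comment can be sketched as a small producer-consumer class. This is a simplified model under stated assumptions, not the patch itself: the class below is a hypothetical stand-in for the real MergeThread, "merging" just records each batch, the consumer blocks in `wait()` rather than using an `inProgress` flag, and the `close()` method is an addition of this sketch so the example can terminate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class PendingMergeQueueSketch {
    static class MergeThread<T> extends Thread {
        // FIFO of pending merges; its monitor guards all fields below.
        private final LinkedList<List<T>> pendingToBeMerged = new LinkedList<>();
        private final List<List<T>> merged = new ArrayList<>();
        private boolean closed = false;

        // Producer side: queue a batch and wake the consumer. Never blocks on a merge.
        void startMerge(List<T> inputs) {
            synchronized (pendingToBeMerged) {
                pendingToBeMerged.addLast(new ArrayList<>(inputs));
                pendingToBeMerged.notifyAll();
            }
        }

        // Sketch-only addition: lets the consumer exit once the queue is drained.
        void close() {
            synchronized (pendingToBeMerged) {
                closed = true;
                pendingToBeMerged.notifyAll();
            }
        }

        @Override public void run() {
            while (true) {
                List<T> batch;
                synchronized (pendingToBeMerged) {
                    while (pendingToBeMerged.isEmpty() && !closed) {
                        try {
                            pendingToBeMerged.wait();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                    if (pendingToBeMerged.isEmpty()) {
                        return; // closed and fully drained
                    }
                    batch = pendingToBeMerged.removeFirst();
                }
                merge(batch); // done outside the lock so producers are not stalled
            }
        }

        // Placeholder for the real merge: just remember the batch.
        private void merge(List<T> batch) {
            synchronized (pendingToBeMerged) {
                merged.add(batch);
            }
        }

        List<List<T>> mergedBatches() {
            synchronized (pendingToBeMerged) {
                return new ArrayList<>(merged);
            }
        }
    }

    // Queues three batches back to back and returns them in the order merged.
    static List<List<Integer>> demo() {
        MergeThread<Integer> thread = new MergeThread<>();
        thread.start();
        thread.startMerge(Arrays.asList(1, 2));
        thread.startMerge(Arrays.asList(3));
        thread.startMerge(Arrays.asList(4, 5, 6));
        thread.close();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return thread.mergedBatches();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because batches are queued rather than overwriting a single `inputs` field, a second `startMerge()` that arrives while a merge is running is not lost, which is the property the comment is after.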
[jira] [Commented] (MAPREDUCE-4839) TextPartioner for hashing Text with good hashing function to get better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511041#comment-13511041 ] Hadoop QA commented on MAPREDUCE-4839: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12556180/textpartitioner2.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 2014 javac compiler warnings (more than the trunk's current 2013 warnings).
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3099//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3099//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3099//console
This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4812) Create reduce input merger plugin in ReduceTask.java and pass it to Shuffle
[ https://issues.apache.org/jira/browse/MAPREDUCE-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511045#comment-13511045 ] Mariappan Asokan commented on MAPREDUCE-4812: - Hi Arun, I have some ideas to fix the problem in MAPREDUCE-4842. I posted my comments there. Please take a look. Thanks. -- Asokan Create reduce input merger plugin in ReduceTask.java and pass it to Shuffle --- Key: MAPREDUCE-4812 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4812 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 2.0.2-alpha Reporter: Mariappan Asokan Assignee: Mariappan Asokan Fix For: 2.0.3-alpha Attachments: COMBO-mapreduce-4809-4812.patch, COMBO-mapreduce-4809-4812.patch, mapreduce-4812.patch, mapreduce-4812.patch, mapreduce-4812.patch, mapreduce-4812.patch, mapreduce-4812.patch This is part of MAPREDUCE-2454. This further breaks down MAPREDUCE-4808.
[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511046#comment-13511046 ] Hadoop QA commented on MAPREDUCE-4827: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12556183/betterhash2.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3100//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3100//console
This message is automatically generated.