[jira] [Commented] (MAPREDUCE-2384) Can MR make error response Immediately?
[ https://issues.apache.org/jira/browse/MAPREDUCE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039150#comment-13039150 ]

Harsh J Chouraria commented on MAPREDUCE-2384:
----------------------------------------------

Justification: existing test cases already cover submissions, so the change does not require a new one, IMO.

> Can MR make error response Immediately?
> ---------------------------------------
>
>                 Key: MAPREDUCE-2384
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2384
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>    Affects Versions: 0.21.0
>            Reporter: Denny Ye
>            Assignee: Harsh J Chouraria
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-2384.r1.diff
>
> While reading the MapReduce source code in Hadoop 0.21.0, I was sometimes confused by how errors are reported. For example:
>
> 1. JobSubmitter checks the output for each job. MapReduce enforces the rule that a job's output must not already exist, to avoid overwriting it by accident. In my opinion, MR should verify the output at the moment the client submits the job. In reality, it first copies the related files to the specified target and only then performs the verification.
>
> 2. JobTracker. Once a job has been submitted to the JobTracker, the JT first creates a JIP (JobInProgress) object, which is very large. Only then does the JT verify the job queue authorization and memory requirements. Normally one would validate the client input first and respond immediately on any failure, performing the regular logic only once all inputs have passed.
>
> This code seems hard to follow. Is it only my personal opinion? I would appreciate it if someone could explain the details. Thanks!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-1347:
-----------------------------------------

    Attachment: MAPREDUCE-1347.r2.diff

Patch that uses Guava's MapMaker.makeComputingMap.

> Missing synchronization in MultipleOutputFormat
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1347
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1347
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Harsh J Chouraria
>         Attachments: MAPREDUCE-1347.r2.diff, mapreduce.1347.r1.diff
>
> MultipleOutputFormat's RecordWriter implementation doesn't use synchronization when accessing the recordWriters member. When using multithreaded mappers or reducers, this can result in two threads both trying to create the same file, causing AlreadyBeingCreatedException. Doing this at a finer grain than synchronizing the whole method is probably a good idea, so that multithreaded mappers can actually achieve parallelism when writing to separate output streams. From what I can tell, the new API's MultipleOutputs does not have this issue.
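The idea behind the patch is to build each RecordWriter inside a computing map, which guarantees the writer for a given output name is created exactly once even under concurrent access. A minimal sketch of the same idiom using only the JDK (`ConcurrentHashMap.computeIfAbsent` is the modern equivalent of Guava's `MapMaker.makeComputingMap`); `Writer` and `getWriter` below are illustrative stand-ins, not Hadoop or patch API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the computing-map idiom: the cache itself guarantees that the
// value for a given key is created exactly once, even when several threads
// ask for it concurrently.
public class WriterCache {
    static final class Writer {          // stand-in for a RecordWriter
        final String path;
        Writer(String path) { this.path = path; }
    }

    private final ConcurrentMap<String, Writer> writers = new ConcurrentHashMap<>();

    // computeIfAbsent runs the factory at most once per key; concurrent
    // callers for the same key block until the single created value is ready.
    Writer getWriter(String name) {
        return writers.computeIfAbsent(name, Writer::new);
    }
}
```

Pre-JDK-8, Guava's makeComputingMap provided this same once-per-key guarantee, which is presumably why the patch pulls it in.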
[jira] [Commented] (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038623#comment-13038623 ]

Harsh J Chouraria commented on MAPREDUCE-1347:
----------------------------------------------

Note: Guava has been added as a dependency. The thread where this was agreed upon can be found here: http://search-hadoop.com/m/NrnW72tdHRD1/guavasubj=Add+Guava+as+a+dependency+
[jira] [Updated] (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-1347:
-----------------------------------------

    Release Note: Fix missing synchronization in MultipleOutputFormat.
          Status: Patch Available  (was: Open)

Tests added for multi-threaded execution of MultipleTextOutputFormat.
[jira] [Updated] (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-1347:
-----------------------------------------

    Attachment: MAPREDUCE-1347.r3.diff

The last patch's test case data may not have been sufficient to exercise the changes. Only the test data contents have been changed in this new patch.
[jira] [Updated] (MAPREDUCE-2384) Can MR make error response Immediately?
[ https://issues.apache.org/jira/browse/MAPREDUCE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2384:
-----------------------------------------

    Attachment: MAPREDUCE-2384.r1.diff
[jira] [Updated] (MAPREDUCE-2384) Can MR make error response Immediately?
[ https://issues.apache.org/jira/browse/MAPREDUCE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2384:
-----------------------------------------

    Fix Version/s: 0.23.0
     Release Note: Submitter should fail on errors early, before transferring files.
           Status: Patch Available  (was: Open)

As before, I do not think refactoring (2) is a good idea maintenance-wise. Here's a patch for just the reordering of (1). Some simple job submissions pass with the change -- I believe existing test cases already cover it, but let me know if not.
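The reordering in the patch amounts to running the output-specification check before any files are staged. A rough sketch of that ordering, assuming hypothetical `checkOutputSpec`/`stageJobFiles` stand-ins for the real JobSubmitter steps (not Hadoop API):

```java
import java.io.File;
import java.io.IOException;

// Illustration of the reordering: validate the job's output specification
// *before* staging any files, so a bad submission fails fast instead of
// copying files to the staging area first and only then being rejected.
public class EarlySubmitCheck {
    static void checkOutputSpec(File outDir) throws IOException {
        if (outDir.exists()) {
            throw new IOException("Output directory " + outDir + " already exists");
        }
    }

    static void stageJobFiles(File stagingDir) {
        stagingDir.mkdirs();  // stands in for the expensive copy step
    }

    static void submit(File outDir, File stagingDir) throws IOException {
        checkOutputSpec(outDir);   // fail fast, before any copying
        stageJobFiles(stagingDir);
    }
}
```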
[jira] [Updated] (MAPREDUCE-2328) memory-related configurations missing from mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2328:
-----------------------------------------

    Status: Open  (was: Patch Available)

Thanks Todd. I'll put up a fixed patch shortly.

> memory-related configurations missing from mapred-default.xml
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-2328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2328
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Harsh J Chouraria
>              Labels: newbie
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-2328.r1.diff
>
> HADOOP-5881 added new configuration parameters for memory-based scheduling, but they weren't added to mapred-default.xml.
[jira] [Updated] (MAPREDUCE-2328) memory-related configurations missing from mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2328:
-----------------------------------------

    Attachment: MAPREDUCE-2328.r1.diff
[jira] [Assigned] (MAPREDUCE-2328) memory-related configurations missing from mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria reassigned MAPREDUCE-2328:
--------------------------------------------

    Assignee: Harsh J Chouraria
[jira] [Updated] (MAPREDUCE-2328) memory-related configurations missing from mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2328:
-----------------------------------------

    Status: Patch Available  (was: Open)

Patch with a brief description of the options, in mapred-default.xml.
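For illustration, the kind of entries such a patch adds would look like the following. The property names are the memory-based scheduling keys of the HADOOP-5881 era, but the authoritative names, defaults, and wording are those in the attached diff:

```xml
<!-- Illustrative mapred-default.xml entries; see the attached patch for the
     exact set. A value of -1 disables memory-based scheduling. -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>-1</value>
  <description>The size, in megabytes, of the memory allotted to each map
  slot on a tasktracker. -1 indicates that memory-based scheduling is
  disabled.</description>
</property>

<property>
  <name>mapred.job.map.memory.mb</name>
  <value>-1</value>
  <description>The amount of memory, in megabytes, that a single map task
  of the job may request. -1 indicates no limit is enforced.</description>
</property>
```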
[jira] [Assigned] (MAPREDUCE-2410) document multiple keys per reducer oddity in hadoop streaming FAQ
[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria reassigned MAPREDUCE-2410:
--------------------------------------------

    Assignee: Harsh J Chouraria

> document multiple keys per reducer oddity in hadoop streaming FAQ
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-2410
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2410
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/streaming, documentation
>    Affects Versions: 0.20.2
>            Reporter: Dieter Plaetinck
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-2410.r1.diff
>   Original Estimate: 40m
>  Remaining Estimate: 40m
>
> Hi, for a newcomer to Hadoop streaming it comes as a surprise that the reducer receives arbitrary keys, unlike the regular Hadoop API, where a reducer invocation works on a single key. An explanation for this is at http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/browser -- I suggest adding this to the FAQ of Hadoop streaming.
[jira] [Updated] (MAPREDUCE-2410) document multiple keys per reducer oddity in hadoop streaming FAQ
[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2410:
-----------------------------------------

    Fix Version/s: 0.23.0
Affects Version/s: 0.20.2
           Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-2383) Improve documentation of DistributedCache methods
[ https://issues.apache.org/jira/browse/MAPREDUCE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2383:
-----------------------------------------

    Attachment: MAPREDUCE-2383.r1.diff

Patch that updates Job's javadocs to reflect the difference.

> Improve documentation of DistributedCache methods
> -------------------------------------------------
>
>                 Key: MAPREDUCE-2383
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2383
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: distributed-cache, documentation
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>              Labels: newbie
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-2383.r1.diff
>
> Users find the various methods in DistributedCache confusing - it's not clearly documented what the difference is between addArchiveToClassPath and addFileToClassPath. We should improve the docs to clarify this and perhaps add an example that uses the DistributedCache.
[jira] [Assigned] (MAPREDUCE-2383) Improve documentation of DistributedCache methods
[ https://issues.apache.org/jira/browse/MAPREDUCE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria reassigned MAPREDUCE-2383:
--------------------------------------------

    Assignee: Harsh J Chouraria
[jira] [Updated] (MAPREDUCE-2383) Improve documentation of DistributedCache methods
[ https://issues.apache.org/jira/browse/MAPREDUCE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2383:
-----------------------------------------

    Fix Version/s: 0.23.0
           Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-2410) document multiple keys per reducer oddity in hadoop streaming FAQ
[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2410:
-----------------------------------------

    Status: Open  (was: Patch Available)
[jira] [Updated] (MAPREDUCE-2410) document multiple keys per reducer oddity in hadoop streaming FAQ
[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2410:
-----------------------------------------

    Attachment: MAPREDUCE-2410.r2.diff

Incorporated suggestion (#1) from Dieter. Would adding a link to an external site be a good idea from the ASF POV? I think it's better if that goes to the wiki pages instead. I may be being too paranoid, so let me know :P
[jira] [Commented] (MAPREDUCE-2410) document multiple keys per reducer oddity in hadoop streaming FAQ
[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031845#comment-13031845 ]

Harsh J Chouraria commented on MAPREDUCE-2410:
----------------------------------------------

Dieter - I've added some info here: http://wiki.apache.org/hadoop/HadoopStreaming/AlternativeInterfaces
[jira] [Updated] (MAPREDUCE-2410) document multiple keys per reducer oddity in hadoop streaming FAQ
[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2410:
-----------------------------------------

    Attachment: MAPREDUCE-2410.r3.diff

Add a wiki page link.
[jira] [Updated] (MAPREDUCE-2410) document multiple keys per reducer oddity in hadoop streaming FAQ
[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2410:
-----------------------------------------

    Release Note: Add an FAQ entry regarding the differences between Java API and Streaming development of MR programs.
          Status: Patch Available  (was: Open)
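The oddity being documented: the streaming framework still sorts map output by key, but the reducer script receives every sorted `key<TAB>value` line on stdin and must detect key boundaries itself, instead of being handed one key per `reduce()` call as in the Java API. A sketch of that grouping contract (in Java for illustration; real streaming reducers are usually scripts, but the logic is the same):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// What a streaming reducer must do by hand: its input is the sorted stream
// of "key<TAB>value" lines for *all* keys assigned to it, so the script
// detects each key change itself and processes the accumulated group.
public class StreamingGrouper {
    static Map<String, List<String>> group(List<String> sortedLines) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : sortedLines) {
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            String value = tab < 0 ? "" : line.substring(tab + 1);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }
}
```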
[jira] [Commented] (MAPREDUCE-2384) Can MR make error response Immediately?
[ https://issues.apache.org/jira/browse/MAPREDUCE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030471#comment-13030471 ]

Harsh J Chouraria commented on MAPREDUCE-2384:
----------------------------------------------

1. is an easy one to fix (basically moving the job spec check a step up). I have a patch for this in the pipeline.

2. as per the OP, is to not build the JIP until after the config checks. I think it is alright the way it is now, since checking before the JIP is constructed would still require one extra lookup (the config properties being checked are also used elsewhere later). Besides, it's easier to read and maintain with the JIP methods, and I don't think the construction time (a few property loads, some array declarations) amounts to much.

What are your thoughts on 2.? Would we benefit enough to refactor those parts to not use JIP (and construct it only after validity is verified)?
[jira] [Commented] (MAPREDUCE-2423) Monitoring the job tracker ui of hadoop using other open source monitoring tools like Nagios
[ https://issues.apache.org/jira/browse/MAPREDUCE-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030473#comment-13030473 ]

Harsh J Chouraria commented on MAPREDUCE-2423:
----------------------------------------------

Saurabh - You can monitor the JMX metrics Hadoop pushes out via a plugin like check_jmx in Nagios, as Allen pointed out: http://exchange.nagios.org/directory/Plugins/Java-Applications-and-Servers/check_jmx/details

Would that work for you?

> Monitoring the job tracker ui of hadoop using other open source monitoring tools like Nagios
> --------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2423
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2423
>             Project: Hadoop Map/Reduce
>          Issue Type: Wish
>          Components: jobtracker
>            Reporter: Saurabh Mishra
>            Priority: Trivial
>
> I just wish there were a way I could write monitors to check my Hadoop JobTracker UI using my existing Nagios infrastructure, as this would help me keep everything centrally located and hence within manageable limits.
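As a sketch of how this wiring might look, a Nagios command definition polling the JobTracker JVM's heap over JMX could be written as below. The command name and thresholds are hypothetical, and the `-U`/`-O`/`-A` flags follow the check_jmx plugin's typical documented usage but should be verified against the installed version:

```
# Hypothetical Nagios command definition using the check_jmx plugin.
# $ARG1$ = JMX port of the JobTracker JVM, $ARG2$/$ARG3$ = warn/crit thresholds.
define command {
    command_name    check_jobtracker_heap
    command_line    $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:$ARG1$/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -w $ARG2$ -c $ARG3$
}
```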
[jira] [Commented] (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030378#comment-13030378 ]

Harsh J Chouraria commented on MAPREDUCE-1347:
----------------------------------------------

Hrm, I agree and apologize, that was _silly_. Would using a synchronized map solve this?
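On its own, a synchronized map would only make the individual get/put calls atomic; the check-then-create-then-put sequence remains a race, which is why the eventual fix uses a computing map instead. A small pure-JDK demonstration (the barrier deterministically forces the interleaving that MultipleOutputFormat could hit by chance):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// A synchronized map makes each get/put atomic, but not the compound
// "absent? then create then put" sequence: two threads can both pass the
// check and both create a writer for the same key. computeIfAbsent makes
// the whole sequence atomic.
public class CheckThenActRace {
    static int naiveCreations() throws Exception {
        Map<String, Object> map = Collections.synchronizedMap(new HashMap<>());
        AtomicInteger created = new AtomicInteger();
        CyclicBarrier bothChecked = new CyclicBarrier(2);
        Runnable task = () -> {
            try {
                if (!map.containsKey("k")) {   // atomic on its own...
                    bothChecked.await();       // ...but another thread interleaves here
                    created.incrementAndGet();
                    map.put("k", new Object());
                }
            } catch (Exception e) { throw new RuntimeException(e); }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start(); a.join(); b.join();
        return created.get();   // both threads created a writer
    }

    static int atomicCreations() throws Exception {
        ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();
        AtomicInteger created = new AtomicInteger();
        Runnable task = () -> map.computeIfAbsent("k", k -> {
            created.incrementAndGet();
            return new Object();
        });
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start(); a.join(); b.join();
        return created.get();   // factory runs once per key
    }
}
```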
[jira] [Assigned] (MAPREDUCE-2384) Can MR make error response Immediately?
[ https://issues.apache.org/jira/browse/MAPREDUCE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria reassigned MAPREDUCE-2384:
--------------------------------------------

    Assignee: Harsh J Chouraria
[jira] [Created] (MAPREDUCE-2474) Add docs to the new API Partitioner on how to access Job Configuration data
Add docs to the new API Partitioner on how to access Job Configuration data --- Key: MAPREDUCE-2474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2474 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation Affects Versions: 0.21.0 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Minor Fix For: 0.22.0, 0.23.0 The new API's Partitioner class, unlike the old one, does not extend a Configurable type, and thus needs to carry a tip on how to implement a custom partitioner that must read Job Configuration data to work. Attaching a patch that adds the javadoc fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
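The pattern the javadoc tip describes can be sketched without Hadoop on the classpath. Everything below is a hypothetical, minimal mirror of the real types (`Configuration`, `Configurable`, `Partitioner`), and the `my.partition.salt` key is invented for the example; in real Hadoop it is `ReflectionUtils.newInstance` that calls `setConf` right after instantiating a class that implements `Configurable`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal mirrors of the Hadoop types, illustration only.
class Configuration {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    int getInt(String key, int def) {
        String v = props.get(key);
        return v == null ? def : Integer.parseInt(v);
    }
}

interface Configurable {
    void setConf(Configuration conf);
    Configuration getConf();
}

abstract class Partitioner<K, V> {
    abstract int getPartition(K key, V value, int numPartitions);
}

// A partitioner that needs a setting from the job configuration: it
// implements Configurable so the framework hands it the Configuration
// right after construction.
public class SaltedPartitioner extends Partitioner<String, String> implements Configurable {
    private Configuration conf;
    private int salt;  // read from the (invented) "my.partition.salt" key

    public void setConf(Configuration c) {
        conf = c;
        salt = c.getInt("my.partition.salt", 0);
    }

    public Configuration getConf() { return conf; }

    int getPartition(String key, String value, int numPartitions) {
        // Mask the sign bit so the modulo result is never negative.
        return ((key.hashCode() + salt) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("my.partition.salt", "7");
        SaltedPartitioner p = new SaltedPartitioner();
        p.setConf(conf);  // what the framework would do for us
        int part = p.getPartition("some-key", "some-value", 10);
        System.out.println(part >= 0 && part < 10); // prints true
    }
}
```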
[jira] [Updated] (MAPREDUCE-2474) Add docs to the new API Partitioner on how to access Job Configuration data
[ https://issues.apache.org/jira/browse/MAPREDUCE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2474: - Status: Patch Available (was: Open) Marking as P-A. Add docs to the new API Partitioner on how to access Job Configuration data --- Key: MAPREDUCE-2474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2474 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation Affects Versions: 0.21.0 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Minor Labels: documentation, partitioners Fix For: 0.22.0, 0.23.0 Attachments: MAPREDUCE-2474.r1.diff Original Estimate: 1m Remaining Estimate: 1m The new API's Partitioner class, unlike the old one, does not extend a Configurable type, and thus needs to carry a tip on how to implement a custom partitioner that must read Job Configuration data to work. Attaching a patch that adds the javadoc fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2474) Add docs to the new API Partitioner on how to access Job Configuration data
[ https://issues.apache.org/jira/browse/MAPREDUCE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2474: - Attachment: MAPREDUCE-2474.r1.diff Doc-fix patch. Add docs to the new API Partitioner on how to access Job Configuration data --- Key: MAPREDUCE-2474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2474 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation Affects Versions: 0.21.0 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Minor Labels: documentation, partitioners Fix For: 0.22.0, 0.23.0 Attachments: MAPREDUCE-2474.r1.diff Original Estimate: 1m Remaining Estimate: 1m The new API's Partitioner class, unlike the old one, does not extend a Configurable type, and thus needs to carry a tip on how to implement a custom partitioner that must read Job Configuration data to work. Attaching a patch that adds the javadoc fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2170) Send out last-minute load averages in TaskTrackerStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2170: - Fix Version/s: 0.23.0 Issue Type: New Feature (was: Improvement) Send out last-minute load averages in TaskTrackerStatus --- Key: MAPREDUCE-2170 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2170 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Affects Versions: 0.22.0 Environment: GNU/Linux Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Critical Fix For: 0.23.0 Attachments: mapreduce.loadaverage.r3.diff, mapreduce.loadaverage.r4.diff, mapreduce.loadaverage.r5.diff Original Estimate: 20m Remaining Estimate: 20m Load averages could be useful in scheduling. This patch looks to extend the existing Linux resource plugin (via /proc/loadavg file) to allow transmitting load averages of the last one minute via the TaskTrackerStatus. Patch is up for review, with test cases added, at: https://reviews.apache.org/r/20/ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
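The one-minute figure the patch transmits is the first field of Linux's /proc/loadavg. A self-contained sketch of reading it (this is not the patch's plugin code; `OneMinuteLoad` is an invented name, and the MX-bean call is a portable approximation that happens to be backed by the same source on Linux):

```java
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OneMinuteLoad {
    public static void main(String[] args) throws Exception {
        // Portable JVM view of the 1-minute load average; the spec says a
        // negative value is returned where it is unavailable.
        double viaMxBean = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        System.out.println("1-min load (MX bean): " + viaMxBean);

        // On Linux, the first whitespace-separated field of /proc/loadavg is
        // the same 1-minute figure a resource plugin would parse.
        Path p = Paths.get("/proc/loadavg");
        if (Files.exists(p)) {
            String first = new String(Files.readAllBytes(p)).trim().split("\\s+")[0];
            System.out.println("1-min load (/proc): " + Double.parseDouble(first));
        }
    }
}
```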
[jira] [Commented] (MAPREDUCE-1720) 'Killed' jobs and 'Failed' jobs should be displayed separately in JobTracker UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027582#comment-13027582 ] Harsh J Chouraria commented on MAPREDUCE-1720: -- Amar, it would seem that what you ask for may be present in NG's MAPREDUCE-2399. I'm not sure how that ticket affects this and all other pending UI issues, but it looks like a welcome change from the usual JSP pages. 'Killed' jobs and 'Failed' jobs should be displayed separately in JobTracker UI Key: MAPREDUCE-1720 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1720 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Environment: all Reporter: Subramaniam Krishnan Assignee: Harsh J Chouraria Fix For: 0.23.0 Attachments: mapred.failed.killed.difference.png, mapreduce.unsuccessfuljobs.ui.r1.diff The JobTracker UI shows both Failed/Killed Jobs as Failed. The Killed job status has been separated from Failed as part of HADOOP-3924, so the UI needs to be updated to reflect the same. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2236) No task may execute due to an Integer overflow possibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2236: - Status: Patch Available (was: Open) No task may execute due to an Integer overflow possibility -- Key: MAPREDUCE-2236 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Environment: Linux, Hadoop 0.20.2 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Critical Fix For: 0.23.0 Attachments: MAPREDUCE-2236.r1.diff If the maximum attempts value is configured as Integer.MAX_VALUE, an overflow occurs inside TaskInProgress, whereby no task is attempted by the cluster and the map tasks stay in the pending state forever. For example, here's a job driver that causes this:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

@SuppressWarnings("deprecation")
public class IntegerOverflow {
  /**
   * @param args
   * @throws IOException
   */
  @SuppressWarnings("deprecation")
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf();
    Path inputPath = new Path("ignore");
    FileSystem fs = FileSystem.get(conf);
    if (!fs.exists(inputPath)) {
      FSDataOutputStream out = fs.create(inputPath);
      out.writeChars("Test");
      out.close();
    }
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(NullOutputFormat.class);
    FileInputFormat.addInputPath(conf, inputPath);
    conf.setMapperClass(IdentityMapper.class);
    conf.setNumMapTasks(1);
    // Problem inducing line follows.
    conf.setMaxMapAttempts(Integer.MAX_VALUE);
    // No reducer in this test, although setMaxReduceAttempts leads to the same problem.
    conf.setNumReduceTasks(0);
    JobClient.runJob(conf);
  }
}
{code}
The above code will not let any map task run. Additionally, an entry would appear in the JobTracker logs that clearly shows the overflow:
{code}
2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_00'
{code}
The issue lies inside the TaskInProgress class (/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), part of the getTaskToRun(String taskTracker) method:
{code}
public Task getTaskToRun(String taskTracker) throws IOException {
  // Create the 'taskid'; do not count the 'killed' tasks against the job!
  TaskAttemptID taskid = null;
  /* THIS LINE v== */
  if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) {
  /* THIS LINE ^== */
    // Make sure that the attempts are unique across restarts
    int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId;
    taskid = new TaskAttemptID(id, attemptId);
    ++nextTaskId;
  } else {
    LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) +
             " (plus " + numKilledTasks + " killed) attempts for the tip '" +
             getTIPId() + "'");
    return null;
  }
{code}
Since all three variables being added are ints, setting one of them to Integer.MAX_VALUE makes the sum overflow to a negative value, so the condition always fails, logging the warning above and returning null. One solution would be to make one of these variables a long, so the addition does not overflow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
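The overflow and the long-widening fix can be reproduced without a cluster. This is a sketch, not the actual TaskInProgress code: the method names are invented, and `MAX_TASK_EXECS = 1` is assumed only to mirror the shape of the guard (its exact value is immaterial to the bug).

```java
public class OverflowDemo {
    // Assumed small constant, mirroring the shape of the guard above.
    static final int MAX_TASK_EXECS = 1;

    // The guard as written: the int addition wraps to a negative number
    // when maxTaskAttempts is Integer.MAX_VALUE.
    static boolean mayAttemptInt(int nextTaskId, int maxTaskAttempts, int numKilledTasks) {
        return nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks);
    }

    // Widening one operand to long promotes the whole sum to long, so it
    // cannot overflow for int inputs.
    static boolean mayAttemptLong(int nextTaskId, int maxTaskAttempts, int numKilledTasks) {
        return nextTaskId < ((long) MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks);
    }

    public static void main(String[] args) {
        // Even the very first attempt (nextTaskId == 0) is rejected by the
        // int version, which is exactly the "no task may execute" symptom:
        System.out.println(mayAttemptInt(0, Integer.MAX_VALUE, 0));  // prints false
        System.out.println(mayAttemptLong(0, Integer.MAX_VALUE, 0)); // prints true
    }
}
```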
[jira] [Updated] (MAPREDUCE-2236) No task may execute due to an Integer overflow possibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2236: - Attachment: MAPREDUCE-2236.r1.diff Patch that caps maximum attempts at 100. No task may execute due to an Integer overflow possibility -- Key: MAPREDUCE-2236 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Environment: Linux, Hadoop 0.20.2 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Critical Fix For: 0.23.0 Attachments: MAPREDUCE-2236.r1.diff If the maximum attempts value is configured as Integer.MAX_VALUE, an overflow occurs inside TaskInProgress, whereby no task is attempted by the cluster and the map tasks stay in the pending state forever. For example, here's a job driver that causes this:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

@SuppressWarnings("deprecation")
public class IntegerOverflow {
  /**
   * @param args
   * @throws IOException
   */
  @SuppressWarnings("deprecation")
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf();
    Path inputPath = new Path("ignore");
    FileSystem fs = FileSystem.get(conf);
    if (!fs.exists(inputPath)) {
      FSDataOutputStream out = fs.create(inputPath);
      out.writeChars("Test");
      out.close();
    }
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(NullOutputFormat.class);
    FileInputFormat.addInputPath(conf, inputPath);
    conf.setMapperClass(IdentityMapper.class);
    conf.setNumMapTasks(1);
    // Problem inducing line follows.
    conf.setMaxMapAttempts(Integer.MAX_VALUE);
    // No reducer in this test, although setMaxReduceAttempts leads to the same problem.
    conf.setNumReduceTasks(0);
    JobClient.runJob(conf);
  }
}
{code}
The above code will not let any map task run. Additionally, an entry would appear in the JobTracker logs that clearly shows the overflow:
{code}
2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_00'
{code}
The issue lies inside the TaskInProgress class (/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), part of the getTaskToRun(String taskTracker) method:
{code}
public Task getTaskToRun(String taskTracker) throws IOException {
  // Create the 'taskid'; do not count the 'killed' tasks against the job!
  TaskAttemptID taskid = null;
  /* THIS LINE v== */
  if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) {
  /* THIS LINE ^== */
    // Make sure that the attempts are unique across restarts
    int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId;
    taskid = new TaskAttemptID(id, attemptId);
    ++nextTaskId;
  } else {
    LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) +
             " (plus " + numKilledTasks + " killed) attempts for the tip '" +
             getTIPId() + "'");
    return null;
  }
{code}
Since all three variables being added are ints, setting one of them to Integer.MAX_VALUE makes the sum overflow to a negative value, so the condition always fails, logging the warning above and returning null. One solution would be to make one of these variables a long, so the addition does not overflow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2397) Allow user to sort jobs in different sections (Completed, Failed, etc.) by the various columns available
[ https://issues.apache.org/jira/browse/MAPREDUCE-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017906#comment-13017906 ] Harsh J Chouraria commented on MAPREDUCE-2397: -- MAPREDUCE-2399 (Part of the MAPREDUCE-279 tree) appears to have this as part of its revamped interface. Allow user to sort jobs in different sections (Completed, Failed, etc.) by the various columns available Key: MAPREDUCE-2397 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2397 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Stephen Tunney Priority: Trivial Labels: interface, jsp, page, user, web It would be nice (IMHO) to be able to sort the tables on the jobtracker.jsp page by any column (jobID would be most logical at first) so that one could eliminate scrolling all of the time. Perhaps also have the page save the user's sorting preferences per table too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (MAPREDUCE-486) JobTracker web UI counts COMMIT_PENDING tasks as Running
[ https://issues.apache.org/jira/browse/MAPREDUCE-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-486: --- Assignee: Harsh J Chouraria JobTracker web UI counts COMMIT_PENDING tasks as Running Key: MAPREDUCE-486 URL: https://issues.apache.org/jira/browse/MAPREDUCE-486 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Todd Lipcon Assignee: Harsh J Chouraria Priority: Minor In jobdetails.jsp, tasks in COMMIT_PENDING state are listed as Running. I propose creating another column in this table for COMMIT_PENDING tasks, since users find it confusing that a given job can have more tasks Running than their total cluster capacity. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (MAPREDUCE-2390) JobTracker and TaskTrackers fail with a misleading error if one of the mapreduce.cluster.dir has unusable permissions / is unavailable.
JobTracker and TaskTrackers fail with a misleading error if one of the mapreduce.cluster.dir has unusable permissions / is unavailable. --- Key: MAPREDUCE-2390 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2390 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker, tasktracker Affects Versions: 0.20.2 Environment: CDH3 and Apache 0.20 || Linux Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria To reproduce, have a mapred.local.dir property set to a few directories. Before starting up the JT, set one of these directories' permission as 'd-', and then start the JT/TT. The JT, although it tries to ignore this directory, fails with an odd and misleading message claiming that its configured address is in use. Fixing the permission clears this issue! This was also reported in the mailing lists by Ted Yu, quite a few months ago. But I had forgotten about filing a bug for it here. Still seems to happen. A log is attached below. {code} 2011-03-17 00:40:32,321 WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: java.io.IOException: Cannot create toBeDeleted in /home/hack/.tmplocalz/2 at org.apache.hadoop.util.MRAsyncDiskService.init(MRAsyncDiskService.java:86) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2189) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2022) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:276) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:268) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4712) 2011-03-17 00:40:33,322 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens 2011-03-17 00:40:33,322 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s) 2011-03-17 00:40:33,322 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens 2011-03-17 00:40:33,322 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1) 2011-03-17 00:40:33,322 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2011-03-17 00:40:33,350 INFO org.apache.hadoop.mapred.JobTracker: Starting jobtracker with owner as hack 2011-03-17 00:40:33,351 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to localhost/127.0.0.1:8021 : Address already in use at org.apache.hadoop.ipc.Server.bind(Server.java:227) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:314) at org.apache.hadoop.ipc.Server.init(Server.java:1411) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:510) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:471) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2112) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2022) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:276) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:268) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4712) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server.bind(Server.java:225) ... 
9 more 2011-03-17 00:40:33,352 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down JobTracker at QDuo/127.0.0.1 / {code} The list conversation in context, at {{search-hadoop.com}}: http://search-hadoop.com/m/FzN7iqreL/problem+starting+cdh3b2+jobtrackersubj=problem+starting+cdh3b2+jobtracker I'll try to investigate and post the exact problem / solution soon. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-2236) No task may execute due to an Integer overflow possibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007091#comment-13007091 ] Harsh J Chouraria commented on MAPREDUCE-2236: -- I'm wondering how to cap this. Would it be best capped at the set level, or checked and capped at the get level? I'm thinking 'get' is better. No task may execute due to an Integer overflow possibility -- Key: MAPREDUCE-2236 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Environment: Linux, Hadoop 0.20.2 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Critical Fix For: 0.23.0 If the maximum attempts value is configured as Integer.MAX_VALUE, an overflow occurs inside TaskInProgress, whereby no task is attempted by the cluster and the map tasks stay in the pending state forever. For example, here's a job driver that causes this:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

@SuppressWarnings("deprecation")
public class IntegerOverflow {
  /**
   * @param args
   * @throws IOException
   */
  @SuppressWarnings("deprecation")
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf();
    Path inputPath = new Path("ignore");
    FileSystem fs = FileSystem.get(conf);
    if (!fs.exists(inputPath)) {
      FSDataOutputStream out = fs.create(inputPath);
      out.writeChars("Test");
      out.close();
    }
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(NullOutputFormat.class);
    FileInputFormat.addInputPath(conf, inputPath);
    conf.setMapperClass(IdentityMapper.class);
    conf.setNumMapTasks(1);
    // Problem inducing line follows.
    conf.setMaxMapAttempts(Integer.MAX_VALUE);
    // No reducer in this test, although setMaxReduceAttempts leads to the same problem.
    conf.setNumReduceTasks(0);
    JobClient.runJob(conf);
  }
}
{code}
The above code will not let any map task run. Additionally, an entry would appear in the JobTracker logs that clearly shows the overflow:
{code}
2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_00'
{code}
The issue lies inside the TaskInProgress class (/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), part of the getTaskToRun(String taskTracker) method:
{code}
public Task getTaskToRun(String taskTracker) throws IOException {
  // Create the 'taskid'; do not count the 'killed' tasks against the job!
  TaskAttemptID taskid = null;
  /* THIS LINE v== */
  if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) {
  /* THIS LINE ^== */
    // Make sure that the attempts are unique across restarts
    int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId;
    taskid = new TaskAttemptID(id, attemptId);
    ++nextTaskId;
  } else {
    LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) +
             " (plus " + numKilledTasks + " killed) attempts for the tip '" +
             getTIPId() + "'");
    return null;
  }
{code}
Since all three variables being added are ints, setting one of them to Integer.MAX_VALUE makes the sum overflow to a negative value, so the condition always fails, logging the warning above and returning null. One solution would be to make one of these variables a long, so the addition does not overflow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
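The "cap at the get level" option floated in the comment above can be sketched as a small clamping helper. This is hypothetical illustration code, not the attached patch: the method name is invented, and the cap of 100 is taken only from the note accompanying the r1 attachment.

```java
public class AttemptCap {
    // Hypothetical get-level cap: whatever the user configured, the value
    // handed to the scheduler never exceeds the cap and never drops below
    // one attempt, so the later int addition cannot overflow.
    static int cappedMaxAttempts(int configured) {
        final int CAP = 100;  // value assumed from the r1 patch note
        if (configured < 1) {
            return 1;  // a nonsensical setting still allows one attempt
        }
        return Math.min(configured, CAP);
    }

    public static void main(String[] args) {
        System.out.println(cappedMaxAttempts(4));                 // prints 4
        System.out.println(cappedMaxAttempts(Integer.MAX_VALUE)); // prints 100
        System.out.println(cappedMaxAttempts(0));                 // prints 1
    }
}
```

Capping in the getter has the advantage that values already persisted in old job configurations are neutralized too, not only those set through the API going forward.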
[jira] Updated: (MAPREDUCE-2272) Job ACL file should not be executable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2272: - Tags: acls, job Fix Version/s: 0.23.0 Release Note: Job ACL files now have permissions set to 600 (previously 700). Status: Patch Available (was: Open) Job ACL file should not be executable - Key: MAPREDUCE-2272 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2272 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Harsh J Chouraria Priority: Trivial Fix For: 0.23.0 Attachments: mapreduce.2272.r1.diff For some reason the job ACL file is localized with permissions 700. This doesn't make sense, since it's not executable. It should be 600. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2272) Job ACL file should not be executable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2272: - Attachment: mapreduce.2272.r2.diff Ah, I did not know about that test case (Tried finding one earlier, but apparently this one is testing it rather differently and I did not go through these classes carefully enough). Attaching a new patch with a test case change. Job ACL file should not be executable - Key: MAPREDUCE-2272 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2272 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Harsh J Chouraria Priority: Trivial Fix For: 0.23.0 Attachments: mapreduce.2272.r1.diff, mapreduce.2272.r2.diff For some reason the job ACL file is localized with permissions 700. This doesn't make sense, since it's not executable. It should be 600. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
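The 700-versus-600 distinction is easy to demonstrate with the standard library. This sketch is not the patch itself (which goes through Hadoop's localization code); it just shows, on a throwaway temp file, that "rw-------" carries no execute bit. POSIX systems only.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class AclFilePerms {
    public static void main(String[] args) throws Exception {
        // Stand-in for the localized job ACL file.
        Path aclFile = Files.createTempFile("job-acls", ".xml");
        // 600: owner read/write, and crucially no execute bit anywhere.
        Files.setPosixFilePermissions(aclFile, PosixFilePermissions.fromString("rw-------"));
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(aclFile);
        System.out.println(perms.contains(PosixFilePermission.OWNER_EXECUTE)); // prints false
        System.out.println(perms.contains(PosixFilePermission.OWNER_READ));    // prints true
        Files.delete(aclFile);
    }
}
```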
[jira] Commented: (MAPREDUCE-993) bin/hadoop job -events jobid from-event-# #-of-events help message is confusing
[ https://issues.apache.org/jira/browse/MAPREDUCE-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000919#comment-13000919 ] Harsh J Chouraria commented on MAPREDUCE-993: - A change in a help doc-string does not require a new/modified test case. bin/hadoop job -events jobid from-event-# #-of-events help message is confusing - Key: MAPREDUCE-993 URL: https://issues.apache.org/jira/browse/MAPREDUCE-993 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0 Reporter: Iyappan Srinivasan Assignee: Harsh J Chouraria Priority: Minor Fix For: 0.23.0 Attachments: mapreduce.993.r1.diff More explanation needs to be there, like: a) events always start from 1; b) the message could read from-event-number to-event-number, where from-event-number starts from 1. This will give the end user an idea of what to enter. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1242) Chain APIs error misleading
[ https://issues.apache.org/jira/browse/MAPREDUCE-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000405#comment-13000405 ] Harsh J Chouraria commented on MAPREDUCE-1242: -- Just a change of strings in an exception message. No test cases should be required for that. Chain APIs error misleading --- Key: MAPREDUCE-1242 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1242 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Reporter: Amogh Vasekar Assignee: Harsh J Chouraria Priority: Trivial Fix For: 0.22.0 Attachments: MAPREDUCE-1242.patch, MAPREDUCE-1242.r2.patch Hi, I was using the Chain[Mapper/Reducer] APIs, and in class Chain, line 207, the error thrown is: The Mapper output key class does not match the previous Mapper input key class. Shouldn't this be: The Mapper *input* key class does not match the previous Mapper *output* key class? Sort of misleads :) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2225) MultipleOutputs should not require the use of 'Writable'
[ https://issues.apache.org/jira/browse/MAPREDUCE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2225: - Attachment: multipleoutputs.nowritables.r2.diff Sure, it should avoid a raw types warning I think? Here's a patch for that update. MultipleOutputs should not require the use of 'Writable' Key: MAPREDUCE-2225 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2225 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.20.1 Environment: Linux Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0 Attachments: multipleoutputs.nowritables.r1.diff, multipleoutputs.nowritables.r2.diff, multipleoutputs.nowritables.r2.diff Original Estimate: 1m Remaining Estimate: 1m MultipleOutputs right now requires for Key/Value classes to utilize the Writable and WritableComparable interfaces, and fails if the associated key/value classes aren't doing so. With support for alternates like Avro serialization, using Writables isn't necessary and thus the MO class must not strictly check for them. And since comparators may be given separately, key class doesn't need to be checked for implementing a comparable (although it is good design if the key class does implement Comparable at least). Am not sure if this brings about an incompatible change (does Java have BIC? No idea). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-1932) record skipping doesn't work with the new map/reduce api
[ https://issues.apache.org/jira/browse/MAPREDUCE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1932: - Attachment: mapreduce.1932.skippingreader.r1.diff Here's the first attempt at this to get it rolling (smells like a regression!). Will add a test case for this soon and up a fresh patch post-verification. record skipping doesn't work with the new map/reduce api Key: MAPREDUCE-1932 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1932 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.20.1 Reporter: Owen O'Malley Assignee: Harsh J Chouraria Attachments: mapreduce.1932.skippingreader.r1.diff The new HADOOP-1230 map/reduce api doesn't support the record skipping features. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (MAPREDUCE-1932) record skipping doesn't work with the new map/reduce api
[ https://issues.apache.org/jira/browse/MAPREDUCE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-1932: Assignee: Harsh J Chouraria record skipping doesn't work with the new map/reduce api Key: MAPREDUCE-1932 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1932 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.20.1 Reporter: Owen O'Malley Assignee: Harsh J Chouraria Attachments: mapreduce.1932.skippingreader.r1.diff The new HADOOP-1230 map/reduce api doesn't support the record skipping features. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2251) Remove mapreduce.job.userhistorylocation config
[ https://issues.apache.org/jira/browse/MAPREDUCE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2251: - Attachment: mapreduce.2251.jobhistorylocremove.r1.diff Patch that removes all refs of the said property in Map/Reduce. HadoopArchives was using it to disable generation of history, but that's not possible now and hence removed from that class as well. Remove mapreduce.job.userhistorylocation config --- Key: MAPREDUCE-2251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2251 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0 Reporter: Todd Lipcon Attachments: mapreduce.2251.jobhistorylocremove.r1.diff Best I can tell, this config parameter is no longer used as of MAPREDUCE-157 but still exists in the code and in mapred-default.xml. We should remove it to avoid user confusion. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (MAPREDUCE-2251) Remove mapreduce.job.userhistorylocation config
[ https://issues.apache.org/jira/browse/MAPREDUCE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-2251: Assignee: Harsh J Chouraria Remove mapreduce.job.userhistorylocation config --- Key: MAPREDUCE-2251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2251 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Harsh J Chouraria Attachments: mapreduce.2251.jobhistorylocremove.r1.diff Best I can tell, this config parameter is no longer used as of MAPREDUCE-157 but still exists in the code and in mapred-default.xml. We should remove it to avoid user confusion. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2251) Remove mapreduce.job.userhistorylocation config
[ https://issues.apache.org/jira/browse/MAPREDUCE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2251: - Tags: jobhistory Fix Version/s: 0.23.0 Release Note: Remove the now defunct property `mapreduce.job.userhistorylocation`. Status: Patch Available (was: Open) Marking as PA. Remove mapreduce.job.userhistorylocation config --- Key: MAPREDUCE-2251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2251 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Harsh J Chouraria Fix For: 0.23.0 Attachments: mapreduce.2251.jobhistorylocremove.r1.diff Best I can tell, this config parameter is no longer used as of MAPREDUCE-157 but still exists in the code and in mapred-default.xml. We should remove it to avoid user confusion. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Attachment: mapreduce.trunk.findbugs.r5.diff Oops, looks like my diff failed to contain the Holder file. Forgot to add it to svn before diff. Uploading a proper patch and setting to 'In Progress'. 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff, mapreduce.trunk.findbugs.r4.diff, mapreduce.trunk.findbugs.r5.diff, mapreduce.trunk.findbugs.r5.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Status: In Progress (was: Patch Available) 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff, mapreduce.trunk.findbugs.r4.diff, mapreduce.trunk.findbugs.r5.diff, mapreduce.trunk.findbugs.r5.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Attachment: mapreduce.trunk.findbugs.r6.diff 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff, mapreduce.trunk.findbugs.r4.diff, mapreduce.trunk.findbugs.r5.diff, mapreduce.trunk.findbugs.r5.diff, mapreduce.trunk.findbugs.r6.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Attachment: mapreduce.trunk.findbugs.r5.diff New patch (it still leaves a couple of sync warnings, as introduced by MAPREDUCE-2026). @Chris - I've made the changes you suggested. Though even with those changes, is it required for {{getTasks(TaskType)}} to be sync'd? I've sync'd it in this patch, but let me know if it can be removed and the warning ignored (possibly inaccurate, as admitted by Findbugs). @Todd - Shall I open a new JIRA for the other places {{Holder<T>}} may be used? The two additional findbugs warnings generated by the issue Priyo pointed out (MAPREDUCE-2026 - interesting that it does not show up in test-patch?) need to be ignored/re-reviewed as well ({{getMapCounters}}/{{getReduceCounters}} are left synchronized - why?). 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff, mapreduce.trunk.findbugs.r4.diff, mapreduce.trunk.findbugs.r5.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Attachment: hadoop-findbugs-report.html Fresh Findbugs report for previous patch. 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff, mapreduce.trunk.findbugs.r4.diff, mapreduce.trunk.findbugs.r5.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-1811) Job.monitorAndPrintJob() should print status of the job at completion
[ https://issues.apache.org/jira/browse/MAPREDUCE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1811: - Tags: job Release Note: Print the resultant status of a Job on completion instead of simply saying 'Complete'. Status: Patch Available (was: Open) Setting as available. Job.monitorAndPrintJob() should print status of the job at completion - Key: MAPREDUCE-1811 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1811 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.20.1 Reporter: Amareshwari Sriramadasu Assignee: Harsh J Chouraria Priority: Minor Attachments: mapreduce.job.monitor.status.r1.diff Job.monitorAndPrintJob() just prints Job Complete at the end of the job. It should print the state, i.e. whether the job SUCCEEDED/FAILED/KILLED. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
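The gist of the fix can be sketched in plain Java. This is only an illustration of the behaviour the issue asks for, not Hadoop's actual `Job` class; the `State` enum and `buildCompletionMessage` name are stand-ins invented for this sketch.

```java
// Minimal sketch: report the job's resultant state on completion instead
// of a bare "Job Complete" line. State and buildCompletionMessage are
// illustrative stand-ins, not the real Hadoop API.
public class JobCompletionMessage {
    public enum State { SUCCEEDED, FAILED, KILLED }

    // The line a monitorAndPrintJob-style loop would print at the end.
    public static String buildCompletionMessage(String jobId, State state) {
        return "Job " + jobId + " completed with status: " + state;
    }

    public static void main(String[] args) {
        System.out.println(buildCompletionMessage("job_201101010000_0001", State.SUCCEEDED));
    }
}
```

With this, a caller immediately sees whether the job SUCCEEDED, FAILED, or was KILLED.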
[jira] Updated: (MAPREDUCE-993) bin/hadoop job -events jobid from-event-# #-of-events help message is confusing
[ https://issues.apache.org/jira/browse/MAPREDUCE-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-993: Tags: events Fix Version/s: 0.23.0 Release Note: Added a helpful description message to the `mapred job -events` command. Status: Patch Available (was: Open) Marking as patch-available. bin/hadoop job -events jobid from-event-# #-of-events help message is confusing - Key: MAPREDUCE-993 URL: https://issues.apache.org/jira/browse/MAPREDUCE-993 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0 Reporter: Iyappan Srinivasan Assignee: Harsh J Chouraria Priority: Minor Fix For: 0.23.0 Attachments: mapreduce.993.r1.diff More explanation needs to be there, like: a) events always start from 1; b) the message could be like from-event-number to-event-number, where from-event-number starts from 1. This will give the end user an idea of what to enter. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-1347: Assignee: Harsh J Chouraria Missing synchronization in MultipleOutputFormat --- Key: MAPREDUCE-1347 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1347 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Todd Lipcon Assignee: Harsh J Chouraria Attachments: mapreduce.1347.r1.diff MultipleOutputFormat's RecordWriter implementation doesn't use synchronization when accessing the recordWriters member. When using multithreaded mappers or reducers, this can result in problems where two threads will both try to create the same file, causing AlreadyBeingCreatedException. Doing this more fine-grained than just synchronizing the whole method is probably a good idea, so that multithreaded mappers can actually achieve parallelism writing into separate output streams. From what I can tell, the new API's MultipleOutputs seems not to have this issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1347: - Attachment: mapreduce.1347.r1.diff Here's a patch that attempts to solve this issue. Missing synchronization in MultipleOutputFormat --- Key: MAPREDUCE-1347 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1347 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Todd Lipcon Attachments: mapreduce.1347.r1.diff MultipleOutputFormat's RecordWriter implementation doesn't use synchronization when accessing the recordWriters member. When using multithreaded mappers or reducers, this can result in problems where two threads will both try to create the same file, causing AlreadyBeingCreatedException. Doing this more fine-grained than just synchronizing the whole method is probably a good idea, so that multithreaded mappers can actually achieve parallelism writing into separate output streams. From what I can tell, the new API's MultipleOutputs seems not to have this issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
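The race described above, and the simplest way to close it, can be sketched in plain Java. The `WriterCache`/`StubWriter` names are stand-ins for MultipleOutputFormat's `recordWriters` map and its per-file RecordWriter; per the r2 patch note, the committed fix instead uses Guava's `MapMaker.makeComputingMap`.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix: guard the lookup-or-create step on the writer cache
// so two threads asking for the same output name can no longer both try
// to create the same file (the AlreadyBeingCreatedException scenario).
// WriterCache and StubWriter are illustrative stand-ins, not Hadoop APIs.
public class WriterCache {
    public static class StubWriter { /* stands in for a RecordWriter */ }

    private final Map<String, StubWriter> writers = new HashMap<>();

    // Coarse-grained: one lock for the whole cache. A finer-grained scheme
    // (as the issue suggests) would lock per output name, so multithreaded
    // mappers writing to distinct streams stay parallel.
    public synchronized StubWriter getWriter(String name) {
        return writers.computeIfAbsent(name, n -> new StubWriter());
    }

    public static void main(String[] args) {
        WriterCache cache = new WriterCache();
        // The same output name always maps to the same writer instance.
        System.out.println(cache.getWriter("even") == cache.getWriter("even"));
    }
}
```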
[jira] Commented: (MAPREDUCE-955) CombineFileRecordReader should pass a InputSplit in the constructor instead of CombineFileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12999726#comment-12999726 ] Harsh J Chouraria commented on MAPREDUCE-955: - Why does it need to be so in the Old API? The new API uses the default {{initialize()}} signature with an InputSplit. CombineFileRecordReader should pass a InputSplit in the constructor instead of CombineFileSplit --- Key: MAPREDUCE-955 URL: https://issues.apache.org/jira/browse/MAPREDUCE-955 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Namit Jain The specific reader can always cast the class as needed. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1360) Record skipping should work with more serializations
[ https://issues.apache.org/jira/browse/MAPREDUCE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12999728#comment-12999728 ] Harsh J Chouraria commented on MAPREDUCE-1360: -- This sounded interesting to work on, so here are some questions after a brief investigation: I looked at what goes on inside the SkippingRecordReader but couldn't find any area that is Writable-limited. The sequence files for skipped records are created with the Key/Value classes, which are then used to load their acceptable serialization classes. Which part of the skipping framework is Writable-limited now? Record skipping should work with more serializations Key: MAPREDUCE-1360 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1360 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Aaron Kimball Record skipping currently supports WritableSerialization, but cannot handle non-class-based serialization systems (e.g., AvroSerialization). The record skipping mechanism should be made compatible with the metadata-based serialization configuration. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2272) Job ACL file should not be executable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2272: - Attachment: mapreduce.2272.r1.diff Patch that performs this trivial change. ant test-patch results: {code} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. {code} Job ACL file should not be executable - Key: MAPREDUCE-2272 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2272 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Trivial Attachments: mapreduce.2272.r1.diff For some reason the job ACL file is localized with permissions 700. This doesn't make sense, since it's not executable. It should be 600. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
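The 700-vs-600 distinction the issue turns on can be shown with plain `java.nio` permission sets rather than Hadoop's `FsPermission` (which is what the real patch would touch); the class name here is invented for illustration.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Sketch of the permission change: a data file that is only read back
// (like the localized job ACL file) needs rw for the owner, not rwx.
// Uses java.nio stand-ins, not Hadoop's FsPermission.
public class AclFilePerms {
    // 700: what the ACL file is localized with today - includes execute.
    public static final Set<PosixFilePermission> CURRENT =
            PosixFilePermissions.fromString("rwx------");
    // 600: what a non-executable, owner-private data file should get.
    public static final Set<PosixFilePermission> PROPOSED =
            PosixFilePermissions.fromString("rw-------");

    public static void main(String[] args) {
        System.out.println(PosixFilePermissions.toString(PROPOSED));
    }
}
```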
[jira] Assigned: (MAPREDUCE-2272) Job ACL file should not be executable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-2272: Assignee: Harsh J Chouraria Job ACL file should not be executable - Key: MAPREDUCE-2272 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2272 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Harsh J Chouraria Priority: Trivial Attachments: mapreduce.2272.r1.diff For some reason the job ACL file is localized with permissions 700. This doesn't make sense, since it's not executable. It should be 600. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1159) Limit Job name on jobtracker.jsp to be 80 char long
[ https://issues.apache.org/jira/browse/MAPREDUCE-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12993583#comment-12993583 ] Harsh J Chouraria commented on MAPREDUCE-1159: -- I can add in a test case for the shortener, but I'm stumped as to why the TestClusterMapReduceTestCase (Age 1) would fail for this moderately trivial patch. Limit Job name on jobtracker.jsp to be 80 char long --- Key: MAPREDUCE-1159 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1159 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Priority: Trivial Fix For: 0.22.0 Attachments: MAPREDUCE-1159.r1.patch, MAPREDUCE-1159.r2.patch, MAPREDUCE-1159.trunk.patch Sometimes a user submits a job with a very long job name. That made jobtracker.jsp very hard to read. We should limit the size of the job name. User can see the full name when they click on the job. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12993622#comment-12993622 ] Harsh J Chouraria commented on MAPREDUCE-2193: -- It may be ignored if it is acceptable to do so, but via the findbugs filter XML, there's no way of doing selective-ignore without ignoring the whole field itself (no line number sub-filter either). I don't think ignoring the fields forever would be a good idea for changes that are yet to come in that class. 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff, mapreduce.trunk.findbugs.r4.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
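For context on the filter limitation described in that comment, a findbugs exclude-filter entry is scoped by class, field, and bug pattern - there is no line-number sub-filter. A sketch of such an entry (the class and field names below are made up for illustration):

```xml
<FindBugsFilter>
  <!-- Hypothetical example: this silences every IS2_INCONSISTENT_SYNC
       warning on the one field, now and for any future code touching
       it - which is the drawback described in the comment above. -->
  <Match>
    <Class name="org.apache.hadoop.mapred.SomeJobClass" />
    <Field name="someSharedField" />
    <Bug pattern="IS2_INCONSISTENT_SYNC" />
  </Match>
</FindBugsFilter>
```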
[jira] Commented: (MAPREDUCE-2309) While querying the Job Statics from the command-line, if we give wrong status name then there is no warning or response.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992563#comment-12992563 ] Harsh J Chouraria commented on MAPREDUCE-2309: -- Which sub-command of {{mapred}} is this exactly? :) While querying the Job Statics from the command-line, if we give wrong status name then there is no warning or response. Key: MAPREDUCE-2309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2309 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.23.0 Reporter: Devaraj K Priority: Minor If we try to get the jobs information by giving the wrong status name from the command line interface, it is not giving any warning or response. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-2310) If we stop Job Tracker, Task Tracker is also getting stopped.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992634#comment-12992634 ] Harsh J Chouraria commented on MAPREDUCE-2310: -- Do we really have a stop-jobtracker.sh in 0.20? I couldn't see one in the 0.20 branch under common nor in Y!'s 0.20.100; which release has it? If we stop Job Tracker, Task Tracker is also getting stopped. - Key: MAPREDUCE-2310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2310 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.20.2 Reporter: Devaraj K Priority: Minor If we execute stop-jobtracker.sh for stopping Job Tracker, Task Tracker is also stopping. This is not applicable for the latest (trunk) code because stop-jobtracker.sh file is not coming. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-2310) If we stop Job Tracker, Task Tracker is also getting stopped.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992653#comment-12992653 ] Harsh J Chouraria commented on MAPREDUCE-2310: -- None in CDH2/3 either, nor in Yahoo's one at GitHub. There's a stop-jobhistoryserver.sh available in 0.20.100, however. Is that probably it? If we stop Job Tracker, Task Tracker is also getting stopped. - Key: MAPREDUCE-2310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2310 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.20.2 Reporter: Devaraj K Priority: Minor If we execute stop-jobtracker.sh for stopping Job Tracker, Task Tracker is also stopping. This is not applicable for the latest (trunk) code because stop-jobtracker.sh file is not coming. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name
[ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2293: - Attachment: mapreduce.mo.removecheck.r1.diff For extensions to be applied, I think one needs to make a change in the OutputFormat class chosen. MultipleOutputs can only handle the part before the '-X' (partition) numbering in Map/Reduce outputs, not after it. I've posted a patch that removes the check (from both old and new API). I can't post the result of an {{ant test-patch}} since that isn't working for me right now (Mumak build is failing for some reason, in MR trunk). I'll post that when I get it working. This should be marked as an Incompatible Change in my opinion, as it is a removal of a strong validation. People may also be relying on the MultiName_OutputName-Partition syntax via string splits, etc. in the Stable API MO class. Also, I'm curious to see if allowing any character to go in is a good idea 'Path' wise. Does HDFS have any restrictions on Filenames? I've not seen a documentation on it (although I think it is pretty POSIX compliant), but HDFS-13 points out that there may be some trouble, any thoughts on that? Enhance MultipleOutputs to allow additional characters in the named output name --- Key: MAPREDUCE-2293 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: David Rosenstrauch Assignee: Harsh J Chouraria Priority: Minor Attachments: mapreduce.mo.removecheck.r1.diff Currently you are only allowed to use alpha-numeric characters in a named output name in the MultipleOutputs class. This is a bit of an onerous restriction, as it would be extremely convenient to be able to use non alpha-numerics in the name too. (E.g., a '.' character would be very helpful, so that you can use the named output name for holding a file name/extension. Perhaps '-' and a '_' characters as well.) 
The restriction seems to be somewhat arbitrary - it appears to be only enforced in the checkTokenName method. (Though I don't know if there's any downstream impact by loosening this restriction.) Would be extremely helpful/useful to have this fixed though! -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
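The validation being discussed, and the relaxed form the reporter asks for, can be sketched in plain Java. `checkTokenName` is the method the issue names; the method names and the relaxed character set ('.', '-', '_') below follow the reporter's examples and are assumptions for illustration, not the committed behaviour.

```java
// Sketch of the alphanumeric-only named-output check and a relaxed variant.
// Names and the extra character set are illustrative, per the reporter's
// examples, not MultipleOutputs' actual committed code.
public class TokenNameCheck {
    // Current behaviour: named outputs may contain only letters and digits.
    public static boolean isValidStrict(String name) {
        return !name.isEmpty() && name.chars().allMatch(Character::isLetterOrDigit);
    }

    // Relaxed variant: also admit '.', '-' and '_' in named output names.
    public static boolean isValidRelaxed(String name) {
        return !name.isEmpty() && name.chars()
                .allMatch(c -> Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_');
    }

    public static void main(String[] args) {
        System.out.println(isValidStrict("part.ext") + " " + isValidRelaxed("part.ext"));
    }
}
```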
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Attachment: mapreduce.trunk.findbugs.r4.diff Another new patch (Rev 4.) * Moved Holder from o.a.h.mapreduce to o.a.h.mapreduce.util * Added the forgotten-about ASF license and API interface annotations, as per Todd's comments ant clean findbugs -Dfindbugs.home=/opt/findbugs - 0, as before. Waiting for another review for the 'synchronized' change made in the {{TIP[] JobInProgress::getTasks(TaskType)}} method. 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff, mapreduce.trunk.findbugs.r4.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1013) MapReduce Project page does not show 0.20.1 documentation/release information.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987138#action_12987138 ] Harsh J Chouraria commented on MAPREDUCE-1013: -- I believe this issue doesn't apply anymore, ever since 0.20.2 went current and stable. Could this be marked Invalid (Invalid now, after 0.20.2)? MapReduce Project page does not show 0.20.1 documentation/release information. --- Key: MAPREDUCE-1013 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1013 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation Affects Versions: 0.20.1 Reporter: Andy Sautins Attachments: MAPREDUCE-1013.patch The MapReduce Project page shows the documentation for 0.20.0 even though the latest stable release version is 0.20.1. The releases page also shows all the pre 0.20.1 releases, but does not show 0.20.1 even though if you click on the Download a release now! link the mirror links are for hadoop/core. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1159) Limit Job name on jobtracker.jsp to be 80 char long
[ https://issues.apache.org/jira/browse/MAPREDUCE-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1159: - Fix Version/s: 0.22.0 Release Note: Job names on jobtracker.jsp should be 80 characters long at most. Status: Patch Available (was: Open) A patch is available. Forgot to mark as so. Limit Job name on jobtracker.jsp to be 80 char long --- Key: MAPREDUCE-1159 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1159 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Priority: Trivial Fix For: 0.22.0 Attachments: MAPREDUCE-1159.r1.patch, MAPREDUCE-1159.r2.patch, MAPREDUCE-1159.trunk.patch Sometimes a user submits a job with a very long job name. That made jobtracker.jsp very hard to read. We should limit the size of the job name. User can see the full name when they click on the job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1242) Chain APIs error misleading
[ https://issues.apache.org/jira/browse/MAPREDUCE-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-1242: Assignee: Harsh J Chouraria Chain APIs error misleading --- Key: MAPREDUCE-1242 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1242 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amogh Vasekar Assignee: Harsh J Chouraria Priority: Trivial Attachments: MAPREDUCE-1242.patch, MAPREDUCE-1242.r2.patch Hi, I was using the Chain[Mapper/Reducer] APIs , and in Class Chain line 207 the error thrown : The Mapper output key class does not match the previous Mapper input key class Shouldn't this be The Mapper *input* key class does not match the previous Mapper *Output* key class ? Sort of misleads :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1242) Chain APIs error misleading
[ https://issues.apache.org/jira/browse/MAPREDUCE-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1242: - Attachment: MAPREDUCE-1242.r2.patch New patch that adds in the other pointer as requested by Amareshwari. Hope this clears up the doc issue. Chain APIs error misleading --- Key: MAPREDUCE-1242 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1242 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amogh Vasekar Assignee: Harsh J Chouraria Priority: Trivial Attachments: MAPREDUCE-1242.patch, MAPREDUCE-1242.r2.patch Hi, I was using the Chain[Mapper/Reducer] APIs , and in Class Chain line 207 the error thrown : The Mapper output key class does not match the previous Mapper input key class Shouldn't this be The Mapper *input* key class does not match the previous Mapper *Output* key class ? Sort of misleads :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1242) Chain APIs error misleading
[ https://issues.apache.org/jira/browse/MAPREDUCE-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1242: - Fix Version/s: 0.22.0 Affects Version/s: 0.20.2 Release Note: Fix a misleading exception message in case the Chained Mappers have mismatch in input/output Key/Value pairs between them. Status: Patch Available (was: Open) Patch available that fixes this minor docs issue. Chain APIs error misleading --- Key: MAPREDUCE-1242 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1242 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Reporter: Amogh Vasekar Assignee: Harsh J Chouraria Priority: Trivial Fix For: 0.22.0 Attachments: MAPREDUCE-1242.patch, MAPREDUCE-1242.r2.patch Hi, I was using the Chain[Mapper/Reducer] APIs , and in Class Chain line 207 the error thrown : The Mapper output key class does not match the previous Mapper input key class Shouldn't this be The Mapper *input* key class does not match the previous Mapper *Output* key class ? Sort of misleads :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
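The corrected wording can be illustrated with a hedged sketch of the comparison (class and method names here are hypothetical, not the actual Chain internals): the check compares the current Mapper's *input* key class against the previous Mapper's *output* key class, so the message should name them in that order.

```java
public class ChainKeyCheck {
    // Hypothetical stand-in for the validation done in the Chain class,
    // with the message fixed as the issue suggests.
    public static void validateKeyClasses(Class<?> previousOutputKeyClass,
                                          Class<?> inputKeyClass) {
        if (!previousOutputKeyClass.equals(inputKeyClass)) {
            throw new IllegalArgumentException(
                "The Mapper input key class does not match the previous "
                + "Mapper output key class");
        }
    }
}
```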
[jira] Updated: (MAPREDUCE-1996) API: Reducer.reduce() method detail misstatement
[ https://issues.apache.org/jira/browse/MAPREDUCE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1996: - Fix Version/s: (was: 0.20.2) 0.22.0 Release Note: Fix a misleading documentation note about the usage of Reporter objects in Reducers. Status: Patch Available (was: Open) Patch is available for this trivial doc-fix. API: Reducer.reduce() method detail misstatement Key: MAPREDUCE-1996 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1996 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Environment: Seen in Hadoop 0.20.2 API and Hadoop 0.19.x API Reporter: Glynn Durham Priority: Trivial Fix For: 0.22.0 Attachments: MAPREDUCE-1996.r1.diff Original Estimate: 0.08h Remaining Estimate: 0.08h method detail for Reducer.reduce() method has paragraph starting: Applications can use the Reporter provided to report progress or just indicate that they are alive. In scenarios where the application takes an insignificant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task. s/an insignificant amount of time/a significant amount of time/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1996) API: Reducer.reduce() method detail misstatement
[ https://issues.apache.org/jira/browse/MAPREDUCE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1996: - Affects Version/s: 0.20.1 API: Reducer.reduce() method detail misstatement Key: MAPREDUCE-1996 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1996 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 0.20.1 Environment: Seen in Hadoop 0.20.2 API and Hadoop 0.19.x API Reporter: Glynn Durham Assignee: Harsh J Chouraria Priority: Trivial Fix For: 0.22.0 Attachments: MAPREDUCE-1996.r1.diff Original Estimate: 0.08h Remaining Estimate: 0.08h method detail for Reducer.reduce() method has paragraph starting: Applications can use the Reporter provided to report progress or just indicate that they are alive. In scenarios where the application takes an insignificant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task. s/an insignificant amount of time/a significant amount of time/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1996) API: Reducer.reduce() method detail misstatement
[ https://issues.apache.org/jira/browse/MAPREDUCE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-1996: Assignee: Harsh J Chouraria API: Reducer.reduce() method detail misstatement Key: MAPREDUCE-1996 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1996 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 0.20.1 Environment: Seen in Hadoop 0.20.2 API and Hadoop 0.19.x API Reporter: Glynn Durham Assignee: Harsh J Chouraria Priority: Trivial Fix For: 0.22.0 Attachments: MAPREDUCE-1996.r1.diff Original Estimate: 0.08h Remaining Estimate: 0.08h method detail for Reducer.reduce() method has paragraph starting: Applications can use the Reporter provided to report progress or just indicate that they are alive. In scenarios where the application takes an insignificant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task. s/an insignificant amount of time/a significant amount of time/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Status: In Progress (was: Patch Available) Thanks for the review Todd. I've made the getTasks synchronized now and also have removed the ignores. About the Holder, must I write the class myself? And if it is going to be used in other places as well, where must I put it (package/project wise)? 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Status: Patch Available (was: In Progress) 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Attachment: mapreduce.trunk.findbugs.r3.diff New patch that tries to resolve Todd's additional comments. * Adds a new class 'Holder&lt;T&gt;' in the o.a.h.mapreduce package, for use right now only in Localizer.java. Re-added the synchronized block in Localizer.java. * Removed ignores on maps/reduces/etc TIP arrays in JobInProgress.java and made the getTasks() method synchronized. Checkstyle passes on Holder.java with 0 errors/warnings (via the report of `ant checkstyle`). Findbugs reports 0 errors, like the previous attachment. 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff, mapreduce.trunk.findbugs.r3.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
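For reference, the small generic holder class mentioned above can be sketched like this (a hypothetical minimal version, not necessarily the committed Holder class): a trivially simple wrapper whose single field is published from inside a synchronized block, which sidesteps the inconsistent-synchronization pattern Findbugs flags.

```java
// Minimal generic value holder; a thread publishes a value into it while
// holding a lock, and readers take the whole holder reference.
public class Holder<T> {
    public T value;

    public Holder() {}

    public Holder(T value) {
        this.value = value;
    }
}
```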
[jira] Commented: (MAPREDUCE-1720) 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980227#action_12980227 ] Harsh J Chouraria commented on MAPREDUCE-1720: -- I actually like the concept of having all (retained) jobs listed in one page. This way, if you are monitoring one, and it actually fails or is killed, it still remains on the same page; not requiring an inquisitive search for where it really went. With browser search, one can also lookup the ID on a single page, without having to switch any context for the resultant state (be it tab or page). But yes, the representation could use a little work as the pages tend to get longer and longer over load/time. About job history/details, since the current job details page gets pretty large thanks to counters and charts, it wouldn't look good even if it were expanded inline, I think. Although we can have a short summary (defn.?) which can be shown inline when clicked/hovered. 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI Key: MAPREDUCE-1720 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1720 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Environment: all Reporter: Subramaniam Krishnan Assignee: Harsh J Chouraria Fix For: 0.23.0 Attachments: mapred.failed.killed.difference.png, mapreduce.unsuccessfuljobs.ui.r1.diff The JobTracker UI shows both Failed/Killed Jobs as Failed. The Killed job status has been separated from Failed as part of HADOOP-3924, so the UI needs to be updated to reflect the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Attachment: hadoop-findbugs-report.html mapreduce.trunk.findbugs.r2.diff I had a look at JIP's use of those TIP arrays again, and I guess that it is alright to ignore non sync access to them via getTasks() as getTasks() is thread safe. But I may be wrong, so please review. Marking as PA. 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Tags: findbugs Status: Patch Available (was: Open) 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff, mapreduce.trunk.findbugs.r2.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-2236) No task may execute due to an Integer overflow possibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978400#action_12978400 ] Harsh J Chouraria commented on MAPREDUCE-2236: -- 100 sounds reasonable to me; would be a good cap for smaller clusters too. No task may execute due to an Integer overflow possibility -- Key: MAPREDUCE-2236 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Environment: Linux, Hadoop 0.20.2 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Critical Fix For: 0.23.0 If the max attempts value is configured to Integer.MAX_VALUE, an overflow occurs inside TaskInProgress, and thereby no task is attempted by the cluster and the map tasks stay in pending state forever. For example, here's a job driver that causes this: {code} import java.io.IOException; import org.apache.hadoop.fs.FSDataOutputStream; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.TextInputFormat; import org.apache.hadoop.mapred.lib.IdentityMapper; import org.apache.hadoop.mapred.lib.NullOutputFormat; @SuppressWarnings("deprecation") public class IntegerOverflow { /** * @param args * @throws IOException */ @SuppressWarnings("deprecation") public static void main(String[] args) throws IOException { JobConf conf = new JobConf(); Path inputPath = new Path("ignore"); FileSystem fs = FileSystem.get(conf); if (!fs.exists(inputPath)) { FSDataOutputStream out = fs.create(inputPath); out.writeChars("Test"); out.close(); } conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(NullOutputFormat.class); FileInputFormat.addInputPath(conf, inputPath); conf.setMapperClass(IdentityMapper.class); conf.setNumMapTasks(1); // Problem inducing line follows.
conf.setMaxMapAttempts(Integer.MAX_VALUE); // No reducer in this test, although setMaxReduceAttempts leads to the same problem. conf.setNumReduceTasks(0); JobClient.runJob(conf); } } {code} The above code will not let any map task run. Additionally, a log would be created inside JobTracker logs with the following information that clearly shows the overflow: {code} 2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_00' {code} The issue lies inside the TaskInProgress class (/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), part of the getTaskToRun(String taskTracker) method. {code} public Task getTaskToRun(String taskTracker) throws IOException { // Create the 'taskid'; do not count the 'killed' tasks against the job! TaskAttemptID taskid = null; /* THIS LINE v == */ if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) { /* THIS LINE ^== */ // Make sure that the attempts are unique across restarts int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId; taskid = new TaskAttemptID(id, attemptId); ++nextTaskId; } else { LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) + " (plus " + numKilledTasks + " killed) attempts for the tip '" + getTIPId() + "'"); return null; } {code} Since all three variables being added are integers, one of them being Integer.MAX_VALUE makes the condition fail with an overflow, thereby logging and returning null as the result is negative. One solution would be to make one of these variables a long, so the addition does not overflow. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
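The overflowing check and the proposed long-widening fix can be sketched standalone (the constants and their values here are illustrative stand-ins, not the actual TaskInProgress fields):

```java
public class OverflowDemo {
    // Illustrative values mirroring the scenario in the report: a max-attempts
    // setting of Integer.MAX_VALUE makes the int sum wrap around.
    static final int MAX_TASK_EXECS = 1;
    static final int maxTaskAttempts = Integer.MAX_VALUE;
    static final int numKilledTasks = 0;

    // Broken check: the int addition overflows to a negative number,
    // so the comparison is false even for nextTaskId == 0.
    static boolean canScheduleInt(int nextTaskId) {
        return nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks);
    }

    // Proposed fix: widening one operand to long keeps the sum exact.
    static boolean canScheduleLong(int nextTaskId) {
        return nextTaskId < ((long) MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks);
    }

    public static void main(String[] args) {
        System.out.println(canScheduleInt(0));   // false: sum wrapped to -2147483648
        System.out.println(canScheduleLong(0));  // true: sum is 2147483648L
    }
}
```

Widening just one operand is enough: Java's binary numeric promotion then evaluates the whole sum in 64-bit arithmetic.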
[jira] Updated: (MAPREDUCE-2225) MultipleOutputs should not require the use of 'Writable'
[ https://issues.apache.org/jira/browse/MAPREDUCE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2225: - Attachment: multipleoutputs.nowritables.r2.diff New patch, with test cases for testing JavaSerialization with MO. MultipleOutputs should not require the use of 'Writable' Key: MAPREDUCE-2225 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2225 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.20.1 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Fix For: 0.23.0 Attachments: multipleoutputs.nowritables.r1.diff, multipleoutputs.nowritables.r2.diff Original Estimate: 0.02h Remaining Estimate: 0.02h MultipleOutputs right now requires for Key/Value classes to utilize the Writable and WritableComparable interfaces, and fails if the associated key/value classes aren't doing so. With support for alternates like Avro serialization, using Writables isn't necessary and thus the MO class must not strictly check for them. And since comparators may be given separately, key class doesn't need to be checked for implementing a comparable (although it is good design if the key class does implement Comparable at least). Am not sure if this brings about an incompatible change (does Java have BIC? No idea). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2225) MultipleOutputs should not require the use of 'Writable'
[ https://issues.apache.org/jira/browse/MAPREDUCE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2225: - Release Note: MultipleOutputs should not require the use/check of 'Writable' interfaces in key and value classes. (was: MultipleOutputs do not require the use of 'Writable' interfaces.) Status: Patch Available (was: Open) Setting state to PA. 'ant test-patch' passes with: {code} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD SUCCESSFUL Total time: 23 minutes 14 seconds {code} MultipleOutputs should not require the use of 'Writable' Key: MAPREDUCE-2225 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2225 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.20.1 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Fix For: 0.23.0 Attachments: multipleoutputs.nowritables.r1.diff, multipleoutputs.nowritables.r2.diff Original Estimate: 0.02h Remaining Estimate: 0.02h MultipleOutputs right now requires for Key/Value classes to utilize the Writable and WritableComparable interfaces, and fails if the associated key/value classes aren't doing so. 
With support for alternates like Avro serialization, using Writables isn't necessary and thus the MO class must not strictly check for them. And since comparators may be given separately, key class doesn't need to be checked for implementing a comparable (although it is good design if the key class does implement Comparable at least). Am not sure if this brings about an incompatible change (does Java have BIC? No idea). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2225) MultipleOutputs should not require the use of 'Writable'
[ https://issues.apache.org/jira/browse/MAPREDUCE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2225: - Environment: Linux Fix Version/s: (was: 0.23.0) 0.22.0 Perhaps this could go in 0.22 itself, if improvements are still accepted? MultipleOutputs should not require the use of 'Writable' Key: MAPREDUCE-2225 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2225 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.20.1 Environment: Linux Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Fix For: 0.22.0 Attachments: multipleoutputs.nowritables.r1.diff, multipleoutputs.nowritables.r2.diff Original Estimate: 0.02h Remaining Estimate: 0.02h MultipleOutputs right now requires for Key/Value classes to utilize the Writable and WritableComparable interfaces, and fails if the associated key/value classes aren't doing so. With support for alternates like Avro serialization, using Writables isn't necessary and thus the MO class must not strictly check for them. And since comparators may be given separately, key class doesn't need to be checked for implementing a comparable (although it is good design if the key class does implement Comparable at least). Am not sure if this brings about an incompatible change (does Java have BIC? No idea). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2225) MultipleOutputs should not require the use of 'Writable'
[ https://issues.apache.org/jira/browse/MAPREDUCE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2225: - Priority: Blocker (was: Major) Setting to BLOCKER as per Nigel's mail. MultipleOutputs should not require the use of 'Writable' Key: MAPREDUCE-2225 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2225 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.20.1 Environment: Linux Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0 Attachments: multipleoutputs.nowritables.r1.diff, multipleoutputs.nowritables.r2.diff Original Estimate: 0.02h Remaining Estimate: 0.02h MultipleOutputs right now requires for Key/Value classes to utilize the Writable and WritableComparable interfaces, and fails if the associated key/value classes aren't doing so. With support for alternates like Avro serialization, using Writables isn't necessary and thus the MO class must not strictly check for them. And since comparators may be given separately, key class doesn't need to be checked for implementing a comparable (although it is good design if the key class does implement Comparable at least). Am not sure if this brings about an incompatible change (does Java have BIC? No idea). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2170) Send out last-minute load averages in TaskTrackerStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2170: - Priority: Blocker (was: Minor) Setting to BLOCKER as per Nigel's mail. Send out last-minute load averages in TaskTrackerStatus --- Key: MAPREDUCE-2170 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2170 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Environment: GNU/Linux Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0 Attachments: mapreduce.loadaverage.r3.diff, mapreduce.loadaverage.r4.diff, mapreduce.loadaverage.r5.diff Original Estimate: 0.33h Remaining Estimate: 0.33h Load averages could be useful in scheduling. This patch looks to extend the existing Linux resource plugin (via /proc/loadavg file) to allow transmitting load averages of the last one minute via the TaskTrackerStatus. Patch is up for review, with test cases added, at: https://reviews.apache.org/r/20/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1720) 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976656#action_12976656 ] Harsh J Chouraria commented on MAPREDUCE-1720: -- I think that's a good idea. Jobs can be killed for a reason (say, hadoop job -kill JobID reason why we're killing it?), which can be included into the JobStatus data, same with the whois of the killer; but when it comes to failure, how do we deduce the 'reason' of failure -- just task numbers as explained in the MAPREDUCE-343 ticket? I also think that for Unsuccessful Jobs, displaying map and reduce progress percentages is not a good thing, as it is not very indicative of the actual progress (Always shows 100 or 0). We could remove this and claim some good real estate to display reasons (limited characters of it). 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI Key: MAPREDUCE-1720 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1720 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Environment: all Reporter: Subramaniam Krishnan Assignee: Harsh J Chouraria Fix For: 0.23.0 Attachments: mapred.failed.killed.difference.png, mapreduce.unsuccessfuljobs.ui.r1.diff The JobTracker UI shows both Failed/Killed Jobs as Failed. The Killed job status has been separated from Failed as part of HADOOP-3924, so the UI needs to be updated to reflect the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-2193: Assignee: Harsh J Chouraria 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Assignee: Harsh J Chouraria Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2193) 13 Findbugs warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/MAPREDUCE-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2193: - Attachment: hadoop-findbugs-report.html mapreduce.trunk.findbugs.r1.diff Attempting this cause it looks like an interesting task for a newcomer. Attached a patch that filters and fixes some of these warnings. [Also attached the resultant warning HTML page]. Please review :) Regarding the four remaining IS2_INCONSISTENT_SYNC warnings, I'm unable to think of a proper way to exclude them (as I think getTasks() doesn't require to be synchronized). I tried performing a Match on the Method, but findbugs is only interested in the Field as a whole. Ignoring the Fields (maps, reduces, setup, cleanup) for the entire JIP class doesn't look like a good idea to me. Any tips on how to resolve this filtering? 13 Findbugs warnings on trunk and branch-0.22 - Key: MAPREDUCE-2193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2193 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Nigel Daley Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: findbugsWarnings.html, hadoop-findbugs-report.html, mapreduce.trunk.findbugs.r1.diff There are 13 findbugs warnings on trunk. See attached html file. These must be fixed or filtered out to get back to 0 warnings. The OK_FINDBUGS_WARNINGS property in src/test/test-patch.properties should also be set to 0 in the patch that fixes this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1811) Job.monitorAndPrintJob() should print status of the job at completion
[ https://issues.apache.org/jira/browse/MAPREDUCE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1811: - Attachment: mapreduce.job.monitor.status.r1.diff This can be a very helpful addition. Attaching a trivial patch that can spit out the job state in the final logs. I've moved the log call after the counters display; which I think makes more sense as a place for state + ID. Job.monitorAndPrintJob() should print status of the job at completion - Key: MAPREDUCE-1811 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1811 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.20.1 Reporter: Amareshwari Sriramadasu Priority: Minor Fix For: 0.22.0 Attachments: mapreduce.job.monitor.status.r1.diff Job.monitorAndPrintJob() just prints Job Complete at the end of the job. It should print the state whether the job SUCCEEDED/FAILED/KILLED. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1811) Job.monitorAndPrintJob() should print status of the job at completion
[ https://issues.apache.org/jira/browse/MAPREDUCE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-1811: Assignee: Harsh J Chouraria Job.monitorAndPrintJob() should print status of the job at completion - Key: MAPREDUCE-1811 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1811 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.20.1 Reporter: Amareshwari Sriramadasu Assignee: Harsh J Chouraria Priority: Minor Fix For: 0.22.0 Attachments: mapreduce.job.monitor.status.r1.diff Job.monitorAndPrintJob() just prints Job Complete at the end of the job. It should print the state whether the job SUCCEEDED/FAILED/KILLED. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1720) 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-1720: Assignee: Harsh J Chouraria 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI Key: MAPREDUCE-1720 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1720 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Environment: all Reporter: Subramaniam Krishnan Assignee: Harsh J Chouraria Fix For: 0.23.0 The JobTracker UI shows both Failed/Killed Jobs as Failed. The Killed job status has been separated from Failed as part of HADOOP-3924, so the UI needs to be updated to reflect the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1720) 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1720: - Fix Version/s: (was: 0.20.3) 0.23.0 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI Key: MAPREDUCE-1720 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1720 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Environment: all Reporter: Subramaniam Krishnan Assignee: Harsh J Chouraria Fix For: 0.23.0 The JobTracker UI shows both Failed/Killed Jobs as Failed. The Killed job status has been separated from Failed as part of HADOOP-3924, so the UI needs to be updated to reflect the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1720) 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1720: - Attachment: mapred.failed.killed.difference.png mapreduce.unsuccessfuljobs.ui.r1.diff Attaching a patch that changes the Failed jobs section in the UI to Unsuccessful jobs and displays a reason column that clearly indicates whether a job failed on its own or was killed. The attached PNG image shows a screenshot of the same while executing mapred/TestJobKillAndFail !mapred.failed.killed.difference.png|thumbnail! [Also cleaned up the JSPUtil.generateJobTable(...) method as I was modifying it.] 'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI Key: MAPREDUCE-1720 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1720 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Environment: all Reporter: Subramaniam Krishnan Assignee: Harsh J Chouraria Fix For: 0.23.0 Attachments: mapred.failed.killed.difference.png, mapreduce.unsuccessfuljobs.ui.r1.diff The JobTracker UI shows both Failed/Killed Jobs as Failed. The Killed job status has been separated from Failed as part of HADOOP-3924, so the UI needs to be updated to reflect the same. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-2236) No task may execute due to an Integer overflow possibility
No task may execute due to an Integer overflow possibility -- Key: MAPREDUCE-2236 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Environment: Linux, Hadoop 0.20.2 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Critical Fix For: 0.23.0 If the maximum attempts value is configured to Integer.MAX_VALUE, an overflow occurs inside TaskInProgress, whereby no task is attempted by the cluster and the map tasks stay in the pending state forever. For example, here's a job driver that causes this: {code} import java.io.IOException; import org.apache.hadoop.fs.FSDataOutputStream; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.TextInputFormat; import org.apache.hadoop.mapred.lib.IdentityMapper; import org.apache.hadoop.mapred.lib.NullOutputFormat; @SuppressWarnings("deprecation") public class IntegerOverflow { /** * @param args * @throws IOException */ @SuppressWarnings("deprecation") public static void main(String[] args) throws IOException { JobConf conf = new JobConf(); Path inputPath = new Path("ignore"); FileSystem fs = FileSystem.get(conf); if (!fs.exists(inputPath)) { FSDataOutputStream out = fs.create(inputPath); out.writeChars("Test"); out.close(); } conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(NullOutputFormat.class); FileInputFormat.addInputPath(conf, inputPath); conf.setMapperClass(IdentityMapper.class); conf.setNumMapTasks(1); // Problem inducing line follows. conf.setMaxMapAttempts(Integer.MAX_VALUE); // No reducer in this test, although setMaxReduceAttempts leads to the same problem. conf.setNumReduceTasks(0); JobClient.runJob(conf); } } {code} The above code will not let any map task run. 
Additionally, a log message would be written to the JobTracker logs, clearly showing the overflow: {code} 2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_00' {code} The issue lies inside the TaskInProgress class (/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), part of the getTaskToRun(String taskTracker) method. {code} public Task getTaskToRun(String taskTracker) throws IOException { // Create the 'taskid'; do not count the 'killed' tasks against the job! TaskAttemptID taskid = null; /* THIS LINE v == */ if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) { /* THIS LINE ^ == */ // Make sure that the attempts are unique across restarts int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId; taskid = new TaskAttemptID(id, attemptId); ++nextTaskId; } else { LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) + " (plus " + numKilledTasks + " killed)" + " attempts for the tip '" + getTIPId() + "'"); return null; } {code} Since all three variables being added are of type int, one of them being Integer.MAX_VALUE makes the condition fail with an overflow, thereby logging and returning null because the result is negative. One solution would be to make one of these variables a long, so the addition does not overflow? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
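The wraparound described in the report is easy to reproduce outside Hadoop. The sketch below mirrors the failing condition with stand-in constants (MAX_TASK_EXECS = 1 and a killed count of 0 are hypothetical values for illustration, not Hadoop's actual fields) and shows how widening one operand to long keeps the sum in range:

```java
public class OverflowDemo {
    static final int MAX_TASK_EXECS = 1; // stand-in value for illustration
    static final int NUM_KILLED = 0;     // stand-in killed-task count

    // Mirrors the failing condition: all-int arithmetic wraps around
    // when maxTaskAttempts is Integer.MAX_VALUE.
    static boolean canRunInt(int nextTaskId, int maxTaskAttempts) {
        return nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + NUM_KILLED);
    }

    // Widening one operand to long promotes the whole sum to long,
    // so no overflow occurs.
    static boolean canRunLong(int nextTaskId, int maxTaskAttempts) {
        return nextTaskId < (MAX_TASK_EXECS + (long) maxTaskAttempts + NUM_KILLED);
    }

    public static void main(String[] args) {
        // 1 + Integer.MAX_VALUE wraps to Integer.MIN_VALUE (-2147483648),
        // matching the negative limit seen in the JobTracker log above.
        System.out.println(canRunInt(0, Integer.MAX_VALUE));  // false
        System.out.println(canRunLong(0, Integer.MAX_VALUE)); // true
    }
}
```

With the int version, the computed limit is -2147483648, so even taskId 0 fails the check and no attempt is ever scheduled; the long version behaves as intended.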
[jira] Commented: (MAPREDUCE-2236) No task may execute due to an Integer overflow possibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975886#action_12975886 ] Harsh J Chouraria commented on MAPREDUCE-2236: -- We can also set a hard cap on the attempt count when it's being set, although how high that cap should be must be discussed. Integer.MAX_VALUE / 2 should do, I guess? No task may execute due to an Integer overflow possibility -- Key: MAPREDUCE-2236 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Environment: Linux, Hadoop 0.20.2 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Critical Fix For: 0.23.0 If the maximum attempts value is configured to Integer.MAX_VALUE, an overflow occurs inside TaskInProgress, whereby no task is attempted by the cluster and the map tasks stay in the pending state forever. For example, here's a job driver that causes this: {code} import java.io.IOException; import org.apache.hadoop.fs.FSDataOutputStream; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.TextInputFormat; import org.apache.hadoop.mapred.lib.IdentityMapper; import org.apache.hadoop.mapred.lib.NullOutputFormat; @SuppressWarnings("deprecation") public class IntegerOverflow { /** * @param args * @throws IOException */ @SuppressWarnings("deprecation") public static void main(String[] args) throws IOException { JobConf conf = new JobConf(); Path inputPath = new Path("ignore"); FileSystem fs = FileSystem.get(conf); if (!fs.exists(inputPath)) { FSDataOutputStream out = fs.create(inputPath); out.writeChars("Test"); out.close(); } conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(NullOutputFormat.class); FileInputFormat.addInputPath(conf, inputPath); conf.setMapperClass(IdentityMapper.class); 
conf.setNumMapTasks(1); // Problem inducing line follows. conf.setMaxMapAttempts(Integer.MAX_VALUE); // No reducer in this test, although setMaxReduceAttempts leads to the same problem. conf.setNumReduceTasks(0); JobClient.runJob(conf); } } {code} The above code will not let any map task run. Additionally, a log message would be written to the JobTracker logs, clearly showing the overflow: {code} 2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_00' {code} The issue lies inside the TaskInProgress class (/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), part of the getTaskToRun(String taskTracker) method. {code} public Task getTaskToRun(String taskTracker) throws IOException { // Create the 'taskid'; do not count the 'killed' tasks against the job! TaskAttemptID taskid = null; /* THIS LINE v == */ if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) { /* THIS LINE ^ == */ // Make sure that the attempts are unique across restarts int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId; taskid = new TaskAttemptID(id, attemptId); ++nextTaskId; } else { LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) + " (plus " + numKilledTasks + " killed)" + " attempts for the tip '" + getTIPId() + "'"); return null; } {code} Since all three variables being added are of type int, one of them being Integer.MAX_VALUE makes the condition fail with an overflow, thereby logging and returning null because the result is negative. One solution would be to make one of these variables a long, so the addition does not overflow? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
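The hard cap floated in the comment above could be applied defensively at configuration time. The sketch below is a hypothetical clamp, not Hadoop's actual JobConf.setMaxMapAttempts code, using the Integer.MAX_VALUE / 2 bound suggested in the comment:

```java
public class AttemptCap {
    // Suggested hard cap from the comment: half of Integer.MAX_VALUE,
    // which leaves headroom for the additions in TaskInProgress.
    static final int HARD_CAP = Integer.MAX_VALUE / 2;

    // Hypothetical defensive setter logic: clamp the requested
    // attempt count to the cap instead of storing it verbatim.
    static int clampAttempts(int requested) {
        return Math.min(requested, HARD_CAP);
    }

    public static void main(String[] args) {
        System.out.println(clampAttempts(4));                 // 4: normal values pass through
        System.out.println(clampAttempts(Integer.MAX_VALUE)); // 1073741823: capped
    }
}
```

With the cap in place, the int sum MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks can no longer overflow for realistic killed-task counts.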
[jira] Assigned: (MAPREDUCE-1591) Add better javadocs for RawComparator interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned MAPREDUCE-1591: Assignee: Harsh J Chouraria Add better javadocs for RawComparator interface --- Key: MAPREDUCE-1591 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1591 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Harsh J Chouraria Priority: Trivial Attachments: common.rawcomparator.jdoc.r1.diff The RawComparator interface is very important to understand for users implementing their own serialization classes. Right now the javadoc is woefully sparse. We should improve that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1591) Add better javadocs for RawComparator interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-1591: - Attachment: common.rawcomparator.jdoc.r1.diff Attaching a javadoc patch that hopefully adds more details to RawComparator. Please review (and correct me if I am wrong anywhere too!). Add better javadocs for RawComparator interface --- Key: MAPREDUCE-1591 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1591 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Trivial Attachments: common.rawcomparator.jdoc.r1.diff The RawComparator interface is very important to understand for users implementing their own serialization classes. Right now the javadoc is woefully sparse. We should improve that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2170) Send out last-minute load averages in TaskTrackerStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2170: - Status: In Progress (was: Patch Available) Fixing the cause that led to a failing core test. [New patch upped -- r5] Send out last-minute load averages in TaskTrackerStatus --- Key: MAPREDUCE-2170 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2170 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Environment: GNU/Linux Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Minor Fix For: 0.22.0 Attachments: mapreduce.loadaverage.r3.diff, mapreduce.loadaverage.r4.diff, mapreduce.loadaverage.r5.diff Original Estimate: 0.33h Remaining Estimate: 0.33h Load averages could be useful in scheduling. This patch looks to extend the existing Linux resource plugin (via /proc/loadavg file) to allow transmitting load averages of the last one minute via the TaskTrackerStatus. Patch is up for review, with test cases added, at: https://reviews.apache.org/r/20/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
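For context on the /proc/loadavg approach mentioned in the issue, a minimal standalone reader might look like the sketch below. This is an illustrative fragment, not the actual Linux resource plugin code from the patch:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LoadAvgReader {
    // Parses the 1-minute load average from /proc/loadavg (Linux only).
    // Returns -1.0 if the file is unavailable (e.g. on non-Linux systems).
    static double oneMinuteLoad() {
        try (BufferedReader in = new BufferedReader(new FileReader("/proc/loadavg"))) {
            // Line format: "0.42 0.35 0.30 1/123 4567";
            // the first whitespace-separated field is the 1-minute average.
            return Double.parseDouble(in.readLine().split("\\s+")[0]);
        } catch (IOException e) {
            return -1.0;
        }
    }

    public static void main(String[] args) {
        System.out.println("1-min load average: " + oneMinuteLoad());
    }
}
```

The real plugin would carry this value through TaskTrackerStatus on each heartbeat so schedulers can consult it.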
[jira] Updated: (MAPREDUCE-2170) Send out last-minute load averages in TaskTrackerStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated MAPREDUCE-2170: - Status: Patch Available (was: In Progress) This should pass the tests. Had not added the TaskTracker.java diff previously, oops. Send out last-minute load averages in TaskTrackerStatus --- Key: MAPREDUCE-2170 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2170 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Environment: GNU/Linux Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Minor Fix For: 0.22.0 Attachments: mapreduce.loadaverage.r3.diff, mapreduce.loadaverage.r4.diff, mapreduce.loadaverage.r5.diff Original Estimate: 0.33h Remaining Estimate: 0.33h Load averages could be useful in scheduling. This patch looks to extend the existing Linux resource plugin (via /proc/loadavg file) to allow transmitting load averages of the last one minute via the TaskTrackerStatus. Patch is up for review, with test cases added, at: https://reviews.apache.org/r/20/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.