[jira] [Commented] (MAPREDUCE-4993) AM thinks it was killed when an error occurs setting up a task container launch context
[ https://issues.apache.org/jira/browse/MAPREDUCE-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594519#comment-13594519 ] Abhishek Kapoor commented on MAPREDUCE-4993: I am not sure who is handling the cleanup of exception handling in the AM, but for now we can set the diagnostics in the catch statement of the code below. If someone is already dealing with exception handling, I can take up the task :) Let me know. AM thinks it was killed when an error occurs setting up a task container launch context --- Key: MAPREDUCE-4993 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4993 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Abhishek Kapoor If an IOException occurs while setting up a container launch context for a task then the AM exits with a KILLED status and no diagnostics. The job should be marked as FAILED (or maybe ERROR) with a useful diagnostics message indicating the nature of the error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4988) ClusterWithCapacityScheduler and related testcases needs to be ported to JUnit4.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated MAPREDUCE-4988: -- Target Version/s: 1.1.3 Fix Version/s: (was: 1.1.2) Hi Amir, very sorry, but the 1.1.2-rc5 build that passed the release vote was built on January 31 and was in voting consideration when you posted this, so I didn't notice it. You can still submit it to branch-1.1 for a potential 1.1.3 release, if you wish. I won't mark it for 1.2.0 since you said it's already good there. Thanks. ClusterWithCapacityScheduler and related testcases needs to be ported to JUnit4. Key: MAPREDUCE-4988 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4988 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 1.1.1, 1.1.2 Environment: Linux on PPC and x86 Reporter: Amir Sanjar Priority: Minor Attachments: MAPREDUCE-4988-1.1.1.patch The TestJobTrackerRestartWithCS, TestCapacitySchedulerServlet, and TestCapacitySchedulerWithJobTracker test cases could potentially fail when they are built with ant 1.8.4. Solution: port the above test cases and the ClusterWithCapacityScheduler class from JUnit 3 to JUnit 4. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4821) Unit Test: TestJobTrackerRestart fails when it is run with ant-1.8.4
[ https://issues.apache.org/jira/browse/MAPREDUCE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594539#comment-13594539 ] Matt Foley commented on MAPREDUCE-4821: --- Hi Amir, very sorry, but the 1.1.2-rc5 build that passed the release vote was built on January 31 and was in voting consideration when you posted this, so I didn't notice it. Agree with 1.2.0 as the target version. Clearing fixVersion for now, as the patch is not yet committed. Thanks. Unit Test: TestJobTrackerRestart fails when it is run with ant-1.8.4 Key: MAPREDUCE-4821 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4821 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 1.0.3, 1.0.4 Environment: RHEL 6.3 on x86 Reporter: Amir Sanjar Fix For: 1.1.2 Attachments: MAPREDUCE-4821-branch1.patch, MAPREDUCE-4821-release-1.0.3.patch Problem: the JUnit tag @Ignore is not recognized since the test case is JUnit 3, not JUnit 4. Solution: migrate the test case to JUnit 4, including: * Remove extends TestCase * Remove import junit.framework.TestCase; * Add import org.junit.*; * Use appropriate annotations such as @After, @Before, @Test. Uploading a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4821) Unit Test: TestJobTrackerRestart fails when it is run with ant-1.8.4
[ https://issues.apache.org/jira/browse/MAPREDUCE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated MAPREDUCE-4821: -- Fix Version/s: (was: 1.1.2) Unit Test: TestJobTrackerRestart fails when it is run with ant-1.8.4 Key: MAPREDUCE-4821 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4821 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 1.0.3, 1.0.4 Environment: RHEL 6.3 on x86 Reporter: Amir Sanjar Attachments: MAPREDUCE-4821-branch1.patch, MAPREDUCE-4821-release-1.0.3.patch Problem: the JUnit tag @Ignore is not recognized since the test case is JUnit 3, not JUnit 4. Solution: migrate the test case to JUnit 4, including: * Remove extends TestCase * Remove import junit.framework.TestCase; * Add import org.junit.*; * Use appropriate annotations such as @After, @Before, @Test. Uploading a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594598#comment-13594598 ] Hudson commented on MAPREDUCE-5027: --- Integrated in Hadoop-Yarn-trunk #147 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/147/]) MAPREDUCE-5027. Shuffle does not limit number of outstanding connections (Robert Parker via jeagles) (Revision 1453098) Result = SUCCESS jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453098 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java Shuffle does not limit number of outstanding connections Key: MAPREDUCE-5027 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Robert Parker Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch The ShuffleHandler does not have any configurable limit on the number of outstanding connections allowed. Therefore a node with many map outputs, and many reducers in the cluster trying to fetch those outputs, can run a nodemanager out of file descriptors. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
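The fix adds a configurable cap on concurrent shuffle connections. As a rough illustration of the general idea only (the class and names below are hypothetical, not the actual Netty-based ShuffleHandler code), a semaphore-style limiter refuses new connections once the cap is reached and frees capacity as connections finish:

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: cap concurrent shuffle connections with a semaphore.
// The real MAPREDUCE-5027 change lives in ShuffleHandler; this only shows
// the bounded-admission pattern.
public class ConnectionLimiter {
    private final Semaphore permits;

    public ConnectionLimiter(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /** Returns true if the connection may proceed, false if it should be refused. */
    public boolean tryAccept() {
        return permits.tryAcquire();
    }

    /** Called when a connection completes, returning its permit to the pool. */
    public void release() {
        permits.release();
    }
}
```

Refusing (rather than queueing) excess fetchers keeps the nodemanager's file-descriptor usage bounded; reducers already retry failed fetches.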
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594670#comment-13594670 ] Hudson commented on MAPREDUCE-5027: --- Integrated in Hadoop-Hdfs-0.23-Build #545 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/545/]) MAPREDUCE-5027. Shuffle does not limit number of outstanding connections (Robert Parker via jeagles) (Revision 1453100) Result = FAILURE jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453100 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java Shuffle does not limit number of outstanding connections Key: MAPREDUCE-5027 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Robert Parker Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch The ShuffleHandler does not have any configurable limit on the number of outstanding connections allowed. Therefore a node with many map outputs, and many reducers in the cluster trying to fetch those outputs, can run a nodemanager out of file descriptors. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594682#comment-13594682 ] Hudson commented on MAPREDUCE-5027: --- Integrated in Hadoop-Hdfs-trunk #1336 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1336/]) MAPREDUCE-5027. Shuffle does not limit number of outstanding connections (Robert Parker via jeagles) (Revision 1453098) Result = SUCCESS jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453098 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java Shuffle does not limit number of outstanding connections Key: MAPREDUCE-5027 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Robert Parker Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch The ShuffleHandler does not have any configurable limit on the number of outstanding connections allowed. Therefore a node with many map outputs, and many reducers in the cluster trying to fetch those outputs, can run a nodemanager out of file descriptors. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594728#comment-13594728 ] Hudson commented on MAPREDUCE-5027: --- Integrated in Hadoop-Mapreduce-trunk #1364 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1364/]) MAPREDUCE-5027. Shuffle does not limit number of outstanding connections (Robert Parker via jeagles) (Revision 1453098) Result = SUCCESS jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453098 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java Shuffle does not limit number of outstanding connections Key: MAPREDUCE-5027 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Robert Parker Fix For: 3.0.0, 0.23.7, 2.0.4-beta Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch The ShuffleHandler does not have any configurable limit on the number of outstanding connections allowed. Therefore a node with many map outputs, and many reducers in the cluster trying to fetch those outputs, can run a nodemanager out of file descriptors. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3688) Need better Error message if AM is killed/throws exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated MAPREDUCE-3688: Attachment: mapreduce-3688-h0.23-v02.patch Another common error is the ApplicationMaster going out of memory when the number of tasks is large. Adding the error message to stdout so that the OOM shows up. {quote} Diagnostics: Application application_1362579399138_0003 failed 1 times due to AM Container for appattempt_1362579399138_0003_01 exited with exitCode: 255 due to: Error starting MRAppMaster: java.lang.OutOfMemoryError: Java heap space at {quote} Forgot to mention, but surfacing these messages in the UI also means they show up on the job client (console) side as well. Need better Error message if AM is killed/throws exception -- Key: MAPREDUCE-3688 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3688 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 0.23.1 Reporter: David Capwell Assignee: Sandy Ryza Fix For: 0.23.2 Attachments: mapreduce-3688-h0.23-v01.patch, mapreduce-3688-h0.23-v02.patch We need better error messages in the UI if the AM gets killed or throws an exception. If the following error gets thrown: java.lang.NumberFormatException: For input string: 9223372036854775807l // last char is an L then the UI should show this exception. Instead I get the following: Application application_1326504761991_0018 failed 1 times due to AM Container for appattempt_1326504761991_0018_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
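As an aside, the NumberFormatException quoted above is easy to reproduce: a trailing long-suffix 'l' is legal in Java source literals but not in strings given to the number parsers. A minimal, self-contained illustration (the helper class is hypothetical, not AM code):

```java
// Hypothetical helper reproducing the quoted failure: Long.parseLong accepts
// "9223372036854775807" (Long.MAX_VALUE) but rejects the same digits with a
// trailing 'l', which is only valid in source-code literals.
public class ParseDemo {
    public static String tryParse(String s) {
        try {
            return Long.toString(Long.parseLong(s));
        } catch (NumberFormatException e) {
            return "NumberFormatException: " + e.getMessage();
        }
    }
}
```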
[jira] [Updated] (MAPREDUCE-3685) There are some bugs in implementation of MergeManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-3685: - Resolution: Fixed Fix Version/s: 2.0.4-beta 0.23.7 Target Version/s: (was: 3.0.0, 2.0.3-alpha, 0.23.7) Status: Resolved (was: Patch Available) There are some bugs in implementation of MergeManager - Key: MAPREDUCE-3685 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: anty.rao Assignee: anty Priority: Critical Fix For: 0.23.7, 2.0.4-beta Attachments: MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3685) There are some bugs in implementation of MergeManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594761#comment-13594761 ] Arun C Murthy commented on MAPREDUCE-3685: -- I just committed this to trunk, branch-2 and branch-0.23. Thanks Anty and Ravi! There are some bugs in implementation of MergeManager - Key: MAPREDUCE-3685 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: anty.rao Assignee: anty Priority: Critical Fix For: 0.23.7, 2.0.4-beta Attachments: MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3685) There are some bugs in implementation of MergeManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594771#comment-13594771 ] Hudson commented on MAPREDUCE-3685: --- Integrated in Hadoop-trunk-Commit #3422 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3422/]) MAPREDUCE-3685. Fix bugs in MergeManager to ensure compression codec is appropriately used and that on-disk segments are correctly sorted on file-size. Contributed by Anty Rao and Ravi Prakash. (Revision 1453365) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453365 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java There are some bugs in implementation of MergeManager - Key: MAPREDUCE-3685 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: anty.rao Assignee: anty Priority: Critical Fix For: 0.23.7, 2.0.4-beta Attachments: MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.patch, 
MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4993) AM thinks it was killed when an error occurs setting up a task container launch context
[ https://issues.apache.org/jira/browse/MAPREDUCE-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594818#comment-13594818 ] Jason Lowe commented on MAPREDUCE-4993: --- Setting the diagnostics isn't sufficient if we then continue to throw the exception. As mentioned above, the exception ends up bubbling all the way up to the AsyncDispatcher handler thread which forcibly exits the process. That's not good and leads to a misleading status like KILLED. There needs to be better exception handling as well as diagnostics. AM thinks it was killed when an error occurs setting up a task container launch context --- Key: MAPREDUCE-4993 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4993 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Abhishek Kapoor If an IOException occurs while setting up a container launch context for a task then the AM exits with a KILLED status and no diagnostics. The job should be marked as FAILED (or maybe ERROR) with a useful diagnostics message indicating the nature of the error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
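A minimal sketch of the handling Jason describes, with hypothetical class and method names (not the actual MRAppMaster API): catch the IOException locally, record a diagnostic message, and fail the attempt rather than rethrowing into the dispatcher thread:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of letting an IOException from container-launch
// setup bubble up to the AsyncDispatcher (which exits the AM with no
// diagnostics and a misleading KILLED status), record a diagnostic and mark
// the attempt FAILED. All names here are illustrative.
public class LaunchContextSetup {
    public enum State { RUNNING, FAILED }

    private State state = State.RUNNING;
    private final List<String> diagnostics = new ArrayList<>();

    public void setupContainerLaunchContext(boolean simulateIoError) {
        try {
            if (simulateIoError) {
                throw new IOException("cannot localize job.jar");
            }
            // ... build the real ContainerLaunchContext here ...
        } catch (IOException e) {
            // Do NOT rethrow: explain why we failed and fail the attempt.
            diagnostics.add("Container launch setup failed: " + e.getMessage());
            state = State.FAILED;
        }
    }

    public State getState() { return state; }
    public List<String> getDiagnostics() { return diagnostics; }
}
```

In the real AM this would translate into sending a failure event with the diagnostic string attached, so the job ends FAILED with a useful message instead of the process dying.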
[jira] [Updated] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5042: -- Attachment: MAPREDUCE-5042.patch Minor update to the patch to fix some test failures. Reducer unable to fetch for a map task that was recovered - Key: MAPREDUCE-5042 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, security Affects Versions: 0.23.7, 2.0.4-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch If an application attempt fails and is relaunched the AM will try to recover previously completed tasks. If a reducer needs to fetch the output of a map task attempt that was recovered then it will fail with a 401 error like this: {noformat} java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_16_0 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156) {noformat} Looking at the corresponding NM's logs, we see the shuffle failed due to "Verification of the hashReply failed". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
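For context, shuffle fetches are authenticated by an HMAC of the request URL derived from the job token secret: the fetcher sends the hash, the nodemanager recomputes it, and a mismatch produces exactly this "Verification of the hashReply failed" 401. One plausible way recovery breaks this is the reducer and nodemanager ending up with different secrets for the recovered attempt. The sketch below shows only the generic HMAC verify pattern, not the actual SecureShuffleUtils code:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Generic HMAC-over-URL verification pattern (illustrative class name; not
// the real shuffle implementation). If the two sides hold different secrets,
// verification fails and the server would answer 401.
public class ShuffleHashDemo {
    public static String hash(String msg, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret, "HmacSHA1"));
            return Base64.getEncoder()
                    .encodeToString(mac.doFinal(msg.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static boolean verify(String msg, String reply, byte[] secret) {
        return hash(msg, secret).equals(reply);
    }
}
```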
[jira] [Updated] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5042: -- Target Version/s: 0.23.7, 2.0.4-beta Status: Patch Available (was: Open) Reducer unable to fetch for a map task that was recovered - Key: MAPREDUCE-5042 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, security Affects Versions: 0.23.7, 2.0.4-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch If an application attempt fails and is relaunched the AM will try to recover previously completed tasks. If a reducer needs to fetch the output of a map task attempt that was recovered then it will fail with a 401 error like this: {noformat} java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_16_0 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156) {noformat} Looking at the corresponding NM's logs, we see the shuffle failed due to "Verification of the hashReply failed". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595370#comment-13595370 ] Hadoop QA commented on MAPREDUCE-5042: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12572443/MAPREDUCE-5042.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3389//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3389//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3389//console This message is automatically generated. 
Reducer unable to fetch for a map task that was recovered - Key: MAPREDUCE-5042 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, security Affects Versions: 0.23.7, 2.0.4-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch If an application attempt fails and is relaunched the AM will try to recover previously completed tasks. If a reducer needs to fetch the output of a map task attempt that was recovered then it will fail with a 401 error like this: {noformat} java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_16_0 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156) {noformat} Looking at the corresponding NM's logs, we see the shuffle failed due to "Verification of the hashReply failed". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595392#comment-13595392 ] Jason Lowe commented on MAPREDUCE-5042: --- Release audit warnings are unrelated to the patch. Reducer unable to fetch for a map task that was recovered - Key: MAPREDUCE-5042 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, security Affects Versions: 0.23.7, 2.0.4-beta Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch If an application attempt fails and is relaunched the AM will try to recover previously completed tasks. If a reducer needs to fetch the output of a map task attempt that was recovered then it will fail with a 401 error like this: {noformat} java.io.IOException: Server returned HTTP response code: 401 for URL: http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_16_0 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156) {noformat} Looking at the corresponding NM's logs, we see the shuffle failed due to "Verification of the hashReply failed". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated MAPREDUCE-5028: Fix Version/s: 1.2.0 Thanks Alejandro. Given Alejandro has committed this to branch-1, adding 1.2 to the fix version. Maps fail when io.sort.mb is set to high value -- Key: MAPREDUCE-5028 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Fix For: 1.2.0 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-trunk.patch Verified the problem exists on branch-1 with the following configuration: Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, io.sort.mb=1280, dfs.block.size=2147483648. Run teragen to generate 4 GB of data. Maps fail when you run wordcount on this configuration with the following error:
{noformat}
java.io.IOException: Spill failed
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:375)
	at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
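The reproduction settings listed in the report can be sketched as configuration XML. This is a minimal illustration only: the file layout and surrounding pseudo-distributed setup are assumptions, and in 1.x {{dfs.block.size}} belongs in hdfs-site.xml rather than mapred-site.xml.

```xml
<!-- mapred-site.xml: sketch of the repro settings from the report -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>   <!-- 2 GB task heap -->
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>1280</value>        <!-- 1280 MB map-side sort buffer -->
  </property>
</configuration>

<!-- hdfs-site.xml: the 2 GB block size used in the repro -->
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>2147483648</value>
  </property>
</configuration>
```

With this configuration in place, generate roughly 4 GB of input with teragen and run wordcount against it to trigger the spill failure described above.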
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595519#comment-13595519 ] qiangliu commented on MAPREDUCE-2911: - Where can I download the available version? Thanks. Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Ralph H Castain Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI applications on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595526#comment-13595526 ] Milind Bhandarkar commented on MAPREDUCE-2911: -- The community just created a huge issue for me in making this available, by naming us anti-community. So, while I am trying to get this to the community, I now have a few more obstacles to overcome. Please bear with me, or better still, try to get the community to stop their bile-spewing against us, so that we can navigate through this mess. Hamster: Hadoop And Mpi on the same cluSTER --- Key: MAPREDUCE-2911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv2 Affects Versions: 0.23.0 Environment: All Unix-Environments Reporter: Milind Bhandarkar Assignee: Ralph H Castain Original Estimate: 336h Remaining Estimate: 336h MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the past, running MPI applications on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the resource-manager separation from JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application-master. Initial version of this patch will be available soon (hopefully before September 10.) This jira will track the development of Hamster: The application master for MPI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3635) Improve Hadoop subcomponent integration in Hadoop 0.23
[ https://issues.apache.org/jira/browse/MAPREDUCE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595534#comment-13595534 ] Suresh Srinivas commented on MAPREDUCE-3635: Given HADOOP-7939 is marked resolved, is this still needed? Improve Hadoop subcomponent integration in Hadoop 0.23 -- Key: MAPREDUCE-3635 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3635 Project: Hadoop Map/Reduce Issue Type: Improvement Components: build, client Affects Versions: 0.23.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Fix For: 0.24.0 Please see HADOOP-7939 for a complete description and discussion. This JIRA is for patch tracking purposes only. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3685) There are some bugs in implementation of MergeManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595550#comment-13595550 ] Mariappan Asokan commented on MAPREDUCE-3685: - Hi Ravi, I guess I am too late to comment since your patch has been committed already. In any case, since you asked, I have the following comments :)
* In {{closeOnDiskFile()}} the following lines of code
{code}
if (onDiskMapOutputs.size() >= (2 * ioSortFactor - 1)) {
  onDiskMerger.startMerge(onDiskMapOutputs);
}
{code}
can be changed to
{code}
if (onDiskMapOutputs.size() >= ioSortFactor) {
  onDiskMerger.startMerge(onDiskMapOutputs);
}
{code}
Please confirm.
* In the class {{CompressAwarePath}} there is a nit in {{compareTo()}}. The following lines:
{code}
} else if (this.getCompressedSize() > compPath.getCompressedSize()) {
  return 1;
{code}
can be simplified as:
{code}
} else {
  return 1;
{code}
The set will be partially ordered without the additional compare and without executing the line
{code}
return super.compareTo(obj);
{code}
* Since the patch fixes some performance issues, did you have a chance to run some benchmarks that show the improvements? I know this will take some time. I will leave it to you.
-- Asokan There are some bugs in implementation of MergeManager - Key: MAPREDUCE-3685 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: anty.rao Assignee: anty Priority: Critical Fix For: 0.23.7, 2.0.4-beta Attachments: MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
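The {{compareTo()}} simplification suggested in the comment above can be sketched as a standalone class. This is illustrative only: the real {{CompressAwarePath}} extends Hadoop's {{Path}}, and the class and field names here are placeholders.

```java
// Illustrative sketch of the suggested ordering; not the committed Hadoop code.
// In Hadoop, CompressAwarePath extends Path; a plain name field stands in here.
class CompressAwarePathSketch implements Comparable<CompressAwarePathSketch> {
    private final String name;          // stand-in for the Path identity
    private final long compressedSize;  // on-disk size of the map output segment

    CompressAwarePathSketch(String name, long compressedSize) {
        this.name = name;
        this.compressedSize = compressedSize;
    }

    long getCompressedSize() {
        return compressedSize;
    }

    @Override
    public int compareTo(CompressAwarePathSketch other) {
        if (getCompressedSize() < other.getCompressedSize()) {
            return -1;
        }
        // Equal and greater sizes both sort after: the resulting partial
        // order is enough to pick smaller segments first, and it skips the
        // extra comparison and the fall-through to the superclass compareTo.
        return 1;
    }
}
```

One caveat worth noting: an ordering that never returns 0 is inconsistent with equals, which is tolerable when the goal is only to prefer smaller segments for merging, but would be unsafe for lookups or removals in a sorted set.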
[jira] [Commented] (MAPREDUCE-4993) AM thinks it was killed when an error occurs setting up a task container launch context
[ https://issues.apache.org/jira/browse/MAPREDUCE-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595601#comment-13595601 ] Abhishek Kapoor commented on MAPREDUCE-4993: If the jar required for the job gets deleted while the job is running, then the job should fail and the AM for that specific job should die; isn't that the expected behaviour? If it is expected, then we can gracefully kill the AM with appropriate diagnostics. Please correct me if I am wrong. AM thinks it was killed when an error occurs setting up a task container launch context --- Key: MAPREDUCE-4993 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4993 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Abhishek Kapoor If an IOException occurs while setting up a container launch context for a task then the AM exits with a KILLED status and no diagnostics. The job should be marked as FAILED (or maybe ERROR) with a useful diagnostics message indicating the nature of the error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira