[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632626#comment-13632626 ] Karthik Kambatla commented on MAPREDUCE-5110: - Thanks for chiming in, Vinod. My intention was precisely to add an aggressive timeout for task attempt launches, and keeping it job-configurable should be good. We can implement it either on the JT or the TT. Do you think it is okay to implement it on the TT? Please suggest - I'll upload a patch accordingly. If interested, the user should be able to configure this timeout to be shorter than the tracker-expiry-interval to ensure a single attempt. Long task launch delays can lead to multiple parallel attempts of the task -- Key: MAPREDUCE-5110 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 1.1.2 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, mr-5110-tt-only.patch If a task takes too long to launch, the JT expires the task and schedules another attempt. The earlier attempt can start after the later attempt, leading to two parallel attempts running at the same time. This is particularly an issue if the user turns off speculation and expects a single attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
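The timeout discussed in this comment can be sketched as follows. This is an illustrative model only, not Hadoop code: the method and parameter names are hypothetical, and the key point is that the effective launch timeout is clamped below the tracker-expiry-interval so the JT cannot schedule a second attempt while the first might still start.

```java
// Hypothetical sketch of a job-configurable task-launch timeout. Names are
// illustrative; this is not the TaskTracker/JobTracker API.
public class LaunchTimeoutSketch {
    static boolean shouldExpireLaunch(long launchRequestedMs, long nowMs,
                                      long jobLaunchTimeoutMs, long trackerExpiryMs) {
        // Clamp the per-job timeout below the tracker expiry interval: this is
        // what guarantees at most one live attempt of the task at any time.
        long effectiveTimeout = Math.min(jobLaunchTimeoutMs, trackerExpiryMs);
        return (nowMs - launchRequestedMs) > effectiveTimeout;
    }
}
```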
[jira] [Commented] (MAPREDUCE-5151) Update MR App after YARN-444
[ https://issues.apache.org/jira/browse/MAPREDUCE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632741#comment-13632741 ] Hudson commented on MAPREDUCE-5151: --- Integrated in Hadoop-Yarn-trunk #185 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/185/]) YARN-444. Moved special container exit codes from YarnConfiguration to API where they belong. Contributed by Sandy Ryza. MAPREDUCE-5151. Updated MR AM to use standard exit codes from the API after YARN-444. Contributed by Sandy Ryza. (Revision 1468276) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468276 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ContainerExitStatus.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerStatus.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/dao/ContainerInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java Update MR App after YARN-444 Key: MAPREDUCE-5151 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5151 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Sandy Ryza Fix For: 2.0.5-beta Attachments: MAPREDUCE-5151.txt YARN-444 is moving standard exit codes from YarnConfiguration into a separate record; this is a tracking ticket for the MR-only changes.
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632742#comment-13632742 ] Hudson commented on MAPREDUCE-4974: --- Integrated in Hadoop-Yarn-trunk #185 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/185/]) MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1468232) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468232 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java Optimising the LineRecordReader initialize() method --- Key: MAPREDUCE-4974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, mrv2, performance Affects Versions: 2.0.2-alpha, 0.23.5 Environment: Hadoop Linux Reporter: Arun A K Assignee: Gelesh Labels: patch, performance Fix For: trunk, 2.0.5-beta Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch Original Estimate: 1h Remaining Estimate: 1h I found there is scope for optimizing the code in initialize(): instantiate the compressionCodecs codec only if the input is compressed. Meanwhile, Gelesh George Omathil added that we could avoid the null check of key and value. This would save time, since the null check is done for every next key/value generation. The intention is to instantiate only once and avoid an NPE as well. Both goals could be met if key and value are initialized in the initialize() method. We both have worked on it.
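A minimal sketch of the first idea in this description, lazy codec instantiation: build a decompression codec only when the input path indicates compressed data, instead of unconditionally in initialize(). The extension-based lookup and codec names here are illustrative stand-ins, not Hadoop's actual CompressionCodecFactory.

```java
import java.util.Optional;

// Illustrative sketch: the codec object is created only for inputs that look
// compressed; plain-text splits never pay for the instantiation.
public class LazyCodecSketch {
    static Optional<String> codecFor(String path) {
        if (path.endsWith(".gz"))  return Optional.of("GzipCodec");
        if (path.endsWith(".bz2")) return Optional.of("BZip2Codec");
        return Optional.empty();   // uncompressed input: no codec is built
    }
}
```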
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632808#comment-13632808 ] Sonu Prathap commented on MAPREDUCE-4974: - Could somebody kindly check, share, and update the test case with compressed input in TestLineRecordReader, and recheck MAPREDUCE-4974? Please refer to MAPREDUCE-5143.
[jira] [Commented] (MAPREDUCE-5151) Update MR App after YARN-444
[ https://issues.apache.org/jira/browse/MAPREDUCE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632814#comment-13632814 ] Hudson commented on MAPREDUCE-5151: --- Integrated in Hadoop-Hdfs-trunk #1374 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1374/]) YARN-444. Moved special container exit codes from YarnConfiguration to API where they belong. Contributed by Sandy Ryza. MAPREDUCE-5151. Updated MR AM to use standard exit codes from the API after YARN-444. Contributed by Sandy Ryza. (Revision 1468276) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468276
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632815#comment-13632815 ] Hudson commented on MAPREDUCE-4974: --- Integrated in Hadoop-Hdfs-trunk #1374 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1374/]) MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1468232) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468232
[jira] [Commented] (MAPREDUCE-5151) Update MR App after YARN-444
[ https://issues.apache.org/jira/browse/MAPREDUCE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632850#comment-13632850 ] Hudson commented on MAPREDUCE-5151: --- Integrated in Hadoop-Mapreduce-trunk #1401 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1401/]) YARN-444. Moved special container exit codes from YarnConfiguration to API where they belong. Contributed by Sandy Ryza. MAPREDUCE-5151. Updated MR AM to use standard exit codes from the API after YARN-444. Contributed by Sandy Ryza. (Revision 1468276) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468276
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632851#comment-13632851 ] Hudson commented on MAPREDUCE-4974: --- Integrated in Hadoop-Mapreduce-trunk #1401 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1401/]) MAPREDUCE-4974. Optimising the LineRecordReader initialize() method (Gelesh via bobby) (Revision 1468232) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468232
[jira] [Created] (MAPREDUCE-5153) Support for running combiners without reducers
Radim Kolar created MAPREDUCE-5153: -- Summary: Support for running combiners without reducers Key: MAPREDUCE-5153 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Radim Kolar Scenario: Workflow mapper - sort - combiner - HDFS. No API change is needed: if the user sets a combiner class and reducers = 0, then run the combiner and send its output to HDFS. Popular libraries such as Scalding and Cascading offer this functionality, but they cache the entire mapper output in memory.
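The proposed semantics can be sketched as a pure-Java model (illustrative only, not Hadoop's map output path): with zero reducers and a combiner configured, the sorted map output would be combined per key before being written out, instead of being emitted record by record.

```java
import java.util.*;
import java.util.function.BinaryOperator;

// Illustrative sketch of map-side combining with no reduce phase: fold the
// sorted map output down to one value per key, as the feature request proposes.
public class MapSideCombineSketch {
    static <K, V> LinkedHashMap<K, V> combine(List<Map.Entry<K, V>> sortedMapOutput,
                                              BinaryOperator<V> combiner) {
        LinkedHashMap<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : sortedMapOutput) {
            // merge() applies the combiner when the key was already seen.
            out.merge(e.getKey(), e.getValue(), combiner);
        }
        return out;
    }
}
```

For a word-count style job, Integer::sum as the combiner collapses repeated keys exactly as a combiner would between sort and output.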
[jira] [Commented] (MAPREDUCE-5015) Coverage fix for org.apache.hadoop.mapreduce.tools.CLI
[ https://issues.apache.org/jira/browse/MAPREDUCE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632950#comment-13632950 ] Thomas Graves commented on MAPREDUCE-5015: -- +1, looks good. Thanks, Aleksey! Coverage fix for org.apache.hadoop.mapreduce.tools.CLI -- Key: MAPREDUCE-5015 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5015 Project: Hadoop Map/Reduce Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.5 Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: MAPREDUCE-5015-branch-0.23-a.patch, MAPREDUCE-5015-branch-0.23-b.patch, MAPREDUCE-5015-branch-0.23.patch, MAPREDUCE-5015-branch-2-a.patch, MAPREDUCE-5015-branch-2-b.patch, MAPREDUCE-5015-branch-2.patch, MAPREDUCE-5015-trunk-a.patch, MAPREDUCE-5015-trunk-b.patch, MAPREDUCE-5015-trunk.patch Coverage fix for org.apache.hadoop.mapreduce.tools.CLI: MAPREDUCE-5015-trunk.patch is for trunk, MAPREDUCE-5015-branch-2.patch for branch-2, and MAPREDUCE-5015-branch-0.23.patch for branch-0.23.
[jira] [Resolved] (MAPREDUCE-5015) Coverage fix for org.apache.hadoop.mapreduce.tools.CLI
[ https://issues.apache.org/jira/browse/MAPREDUCE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved MAPREDUCE-5015. -- Resolution: Fixed Fix Version/s: 0.23.8, 2.0.5-beta, 3.0.0
[jira] [Commented] (MAPREDUCE-5015) Coverage fix for org.apache.hadoop.mapreduce.tools.CLI
[ https://issues.apache.org/jira/browse/MAPREDUCE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632969#comment-13632969 ] Hudson commented on MAPREDUCE-5015: --- Integrated in Hadoop-trunk-Commit #3617 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3617/]) MAPREDUCE-5015. Coverage fix for org.apache.hadoop.mapreduce.tools.CLI (Aleksey Gorshkov via tgraves) (Revision 1468483) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1468483 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/tools/CLI.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/pom.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestMRJobClient.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/job_1329348432655_0001-10.jhist
[jira] [Updated] (MAPREDUCE-5064) TestRumenJobTraces failing on 1.3.x and 1.2
[ https://issues.apache.org/jira/browse/MAPREDUCE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated MAPREDUCE-5064: -- Summary: TestRumenJobTraces failing on 1.3.x and 1.2 (was: TestRumenJobTraces failing on 1.3.x) TestRumenJobTraces failing on 1.3.x and 1.2 --- Key: MAPREDUCE-5064 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5064 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.3.0 Environment: OS/X, java 1.6.0_41, GMT, home network (no DNS) Reporter: Steve Loughran Priority: Minor {{TestRumenJobTraces.testCurrentJHParser()}} is failing locally, both in a bulk test and standalone
[jira] [Commented] (MAPREDUCE-5064) TestRumenJobTraces failing on 1.3.x
[ https://issues.apache.org/jira/browse/MAPREDUCE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633115#comment-13633115 ] Brandon Li commented on MAPREDUCE-5064: --- The same test failed in branch-1.2. Here is the error in the test log: {noformat} - Standard Error - In LoggedJob, we saw the unknown attribute numberMaps. In LoggedJob, we saw the unknown attribute numberReduces. - --- Testcase: testSmallTrace took 1.207 sec Testcase: testTruncatedTask took 0.177 sec Testcase: testRumenViaDispatch took 0.729 sec Testcase: testBracketedCounters took 0.152 sec Testcase: testHadoop20JHParser took 0 sec Testcase: testJobHistoryFilenameParsing took 0.015 sec Testcase: testProcessInputArgument took 0.053 sec Testcase: testCurrentJHParser took 17.699 sec FAILED Content mismatch expected:[MA]P_ATTEMPT_STARTED but was:[SETU]P_ATTEMPT_STARTED junit.framework.AssertionFailedError: Content mismatch expected:[MA]P_ATTEMPT_STARTED but was:[SETU]P_ATTEMPT_STARTED at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testCurrentJHParser(TestRumenJobTraces.java:779) Testcase: testJobConfigurationParsing took 0.026 sec Testcase: testJobConfigurationParser took 0.02 sec Testcase: testResourceUsageMetrics took 0 sec Testcase: testResourceUsageMetricsWithHadoopLogsAnalyzer took 0.047 sec Testcase: testTopologyBuilder took 0.001 sec {noformat} Updated the JIRA title accordingly.
[jira] [Commented] (MAPREDUCE-5152) MR App is not using Container from RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633210#comment-13633210 ] Bikas Saha commented on MAPREDUCE-5152: --- Maybe save the container object itself instead of saving most of its fields in the TA? And a TA-specific unit test that literally checks that the container object sent to the TA via container-assigned == the container object the TA sends out to ContainerLauncher? I am hoping the container launcher itself is not creating a copy :P MR App is not using Container from RM - Key: MAPREDUCE-5152 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5152 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.5-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: MAPREDUCE-5152-20130415.1.txt, MAPREDUCE-5152-20130415.txt The goal of YARN-486 was to make AMs just pass the information encapsulated in Container along to the NM instead of duplicating it themselves. We still do not do this pass-through as intended: YARN-486 avoided the individual field duplication but failed to avoid the duplication of the container itself.
[jira] [Assigned] (MAPREDUCE-4680) Job history cleaner should only check timestamps of files in old enough directories
[ https://issues.apache.org/jira/browse/MAPREDUCE-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned MAPREDUCE-4680: --- Assignee: Karthik Kambatla Job history cleaner should only check timestamps of files in old enough directories --- Key: MAPREDUCE-4680 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4680 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.0.0-alpha Reporter: Sandy Ryza Assignee: Karthik Kambatla Job history files are stored in /mm/dd folders. Currently, the job history cleaner checks the modification date of each file in every one of these folders to see whether it's past the maximum age. The load on HDFS could be reduced by only checking the ages of files in directories that are old enough, as determined by their name.
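The proposed optimization can be sketched as follows. This is an illustrative model, not the JobHistoryServer code, and it assumes date-named directories of the form yyyy/mm/dd (the description above abbreviates the layout): the cleaner would decide from the directory name alone whether its contents can even be old enough, and skip the per-file HDFS timestamp checks otherwise.

```java
import java.time.LocalDate;

// Illustrative sketch: prune by directory date first, so only directories
// older than the retention window incur per-file timestamp checks on HDFS.
public class HistoryDirAgeSketch {
    static boolean dirOldEnough(String yyyy, String mm, String dd,
                                LocalDate today, int maxAgeDays) {
        LocalDate dirDate = LocalDate.of(Integer.parseInt(yyyy),
                Integer.parseInt(mm), Integer.parseInt(dd));
        // Directories newer than the retention window are skipped entirely.
        return dirDate.isBefore(today.minusDays(maxAgeDays));
    }
}
```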
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633462#comment-13633462 ] Hudson commented on MAPREDUCE-5065: --- Integrated in Hadoop-trunk-Commit #3618 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3618/]) MAPREDUCE-5065. DistCp should skip checksum comparisons if block-sizes are different on source/target. Contributed by Mithun Radhakrishnan. (Revision 1468629) Result = SUCCESS kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1468629 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java DistCp should skip checksum comparisons if block-sizes are different on source/target. -- Key: MAPREDUCE-5065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: MAPREDUCE-5065.branch-0.23.patch, MAPREDUCE-5065.branch-2.patch When copying files between 2 clusters with different default block-sizes, one sees that the copy fails with a checksum-mismatch, even though the files have identical contents. The reason is that on HDFS, a file's checksum is unfortunately a function of the block-size of the file. So you could have 2 different files with identical contents (but different block-sizes) have different checksums. (Thus, it's also possible for DistCp to fail to copy files on the same file-system, if the source-file's block-size differs from HDFS default, and -pb isn't used.) I propose that we skip checksum comparisons under the following conditions: 1. -skipCrc is specified. 2. 
File-size is 0 (in which case the call to the checksum-servlet is moot). 3. source.getBlockSize() != target.getBlockSize(), since the checksums are guaranteed to differ in this case. I have a patch for #3. Edit: I've modified the fix to warn the user (instead of skipping the checksum-check). Skipping parity-checks is unsafe. The code now fails the copy, and suggests that the user either use -pb to preserve block-size, or consider -skipCrc (and forgo copy validation entirely). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
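The three skip conditions enumerated above can be sketched as a single predicate. This is a hypothetical helper, not the actual code in RetriableFileCopyCommand, and note that per the edit at the end of the description, the committed fix ultimately warns and fails the copy rather than silently skipping:

```java
public class ChecksumSkipSketch {
    // Mirrors the three proposed conditions:
    // 1. -skipCrc was specified,
    // 2. the file is empty (the checksum-servlet call would be moot),
    // 3. source and target block sizes differ (an HDFS file checksum is a
    //    function of its block size, so the checksums are guaranteed to differ).
    public static boolean shouldSkipChecksum(boolean skipCrc, long fileSize,
                                             long srcBlockSize, long dstBlockSize) {
        return skipCrc || fileSize == 0 || srcBlockSize != dstBlockSize;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkipChecksum(false, 0L, 128L, 128L));  // empty file: skip
        System.out.println(shouldSkipChecksum(false, 10L, 64L, 128L));  // block sizes differ: skip
        System.out.println(shouldSkipChecksum(false, 10L, 128L, 128L)); // must compare
    }
}
```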
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633470#comment-13633470 ] Kihwal Lee commented on MAPREDUCE-5065: --- I've committed this to trunk, branch-2 and branch-0.23.
[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-5065: -- Resolution: Fixed Fix Version/s: 0.23.8 2.0.5-beta 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)
[jira] [Commented] (MAPREDUCE-5127) MR job succeeds and exits even when unregister with RM fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633508#comment-13633508 ] Jian He commented on MAPREDUCE-5127: Another scenario in which the AM unregister fails: when the RM is restarted beyond the max-attempt limit, after YARN-534 it will remove the application state and not repopulate the attempts. This causes the AM unregister to fail. MR job succeeds and exits even when unregister with RM fails Key: MAPREDUCE-5127 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5127 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, resourcemanager Reporter: Jian He Assignee: Jian He The MR app master cleans the staging dir if the job has already succeeded and is asked to reboot. If the finishApplicationMaster call fails, the RM considers the job unfinished and launches further attempts; those attempts fail because the staging dir has already been cleaned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
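The ordering problem described in MAPREDUCE-5127 can be sketched abstractly: the staging directory must only be deleted once unregistration with the RM is known to have succeeded, otherwise a relaunched attempt finds it missing. These interfaces and names are hypothetical, not the actual MR AM API:

```java
public class UnregisterOrderSketch {
    interface RMClient { boolean finishApplicationMaster(); } // true if the RM accepted
    interface StagingDir { void delete(); }

    /** Delete the staging dir only after a successful unregister; otherwise
     *  keep it so a further attempt launched by the RM can still use it. */
    static boolean shutdown(RMClient rm, StagingDir staging) {
        if (!rm.finishApplicationMaster()) {
            return false; // leave the staging dir in place for the next attempt
        }
        staging.delete();
        return true;
    }

    public static void main(String[] args) {
        boolean[] deleted = {false};
        boolean ok = shutdown(() -> false, () -> deleted[0] = true);
        System.out.println(ok + " " + deleted[0]); // unregister failed, dir preserved
    }
}
```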
[jira] [Updated] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated MAPREDUCE-4443: - Attachment: MAPREDUCE-4443-trunk-1.patch Adding Test Case. Thanks, Mayank MR AM and job history server should be resilient to jobs that exceed counter limits Key: MAPREDUCE-4443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4443 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Rahul Jain Assignee: Mayank Bansal Labels: usability Attachments: am_failed_counter_limits.txt, MAPREDUCE-4443-trunk-1.patch, MAPREDUCE-4443-trunk-draft.patch We saw this problem migrating applications to MapReduceV2: Our applications use hadoop counters extensively (1000+ counters for certain jobs). While this may not be one of recommended best practices in hadoop, the real issue here is reliability of the framework when applications exceed counter limits. The hadoop servers (yarn, history server) were originally brought up with mapreduce.job.counters.max=1000 under core-site.xml We then ran map-reduce job under an application using its own job specific overrides, with mapreduce.job.counters.max=1 All the tasks for the job finished successfully; however the overall job still failed due to AM encountering exceptions as: {code} 2012-07-12 17:31:43,485 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks : 71 2012-07-12 17:31:43,502 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 1001 max=1000 at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:58) at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:65) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:77) at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:94) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:105) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:202) at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:337) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1212) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1198) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1179) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:711) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.checkJobCompleteSuccess(JobImpl.java:737) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.checkJobForCompletion(JobImpl.java:1360) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1340) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1323) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:666) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:113) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:890) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:886) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74) at java.lang.Thread.run(Thread.java:662) 2012-07-12 17:31:43,502 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2012-07-12 17:31:43,503 INFO [Thread-1] org.apache.had {code} The overall job failed, and the job history wasn't accessible either at the end of the job.
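The crash above happens because a LimitExceededException thrown during final counter aggregation escapes into the AM's dispatcher thread. The resilience being asked for can be sketched as defensive aggregation; this is a minimal hypothetical model (MAX stands in for mapreduce.job.counters.max), not the actual JobImpl patch:

```java
import java.util.HashMap;
import java.util.Map;

public class CounterLimitSketch {
    static final int MAX = 3; // stands in for mapreduce.job.counters.max

    static class LimitExceededException extends RuntimeException {
        LimitExceededException(String m) { super(m); }
    }

    final Map<String, Long> counters = new HashMap<>();

    // Throws once a new counter would exceed the limit, like Limits.checkCounters.
    void increment(String name, long by) {
        if (!counters.containsKey(name) && counters.size() >= MAX) {
            throw new LimitExceededException(
                "Too many counters: " + (counters.size() + 1) + " max=" + MAX);
        }
        counters.merge(name, by, Long::sum);
    }

    /** Resilient aggregation: a limit violation is reported to the caller
     *  instead of escaping into (and killing) the dispatcher thread. */
    boolean incrAllCountersSafely(Map<String, Long> other) {
        try {
            for (Map.Entry<String, Long> e : other.entrySet()) {
                increment(e.getKey(), e.getValue());
            }
            return true;
        } catch (LimitExceededException lee) {
            return false; // caller can log a warning and keep the partial counters
        }
    }
}
```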
[jira] [Updated] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated MAPREDUCE-4443: - Status: Patch Available (was: Open)
[jira] [Resolved] (MAPREDUCE-3603) Add Web UI to MR2 Fair Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved MAPREDUCE-3603. --- Resolution: Duplicate Assignee: (was: Patrick Wendell) Add Web UI to MR2 Fair Scheduler Key: MAPREDUCE-3603 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3603 Project: Hadoop Map/Reduce Issue Type: New Feature Components: scheduler Reporter: Patrick Wendell -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633582#comment-13633582 ] Hadoop QA commented on MAPREDUCE-4443: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579050/MAPREDUCE-4443-trunk-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3529//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3529//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3529//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3529//console This message is automatically generated. 
[jira] [Assigned] (MAPREDUCE-4362) If possible, we should get back the feature of propagating task logs back to JobClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned MAPREDUCE-4362: - Assignee: Sandy Ryza If possible, we should get back the feature of propagating task logs back to JobClient -- Key: MAPREDUCE-4362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4362 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha Reporter: Vinod Kumar Vavilapalli Assignee: Sandy Ryza MAPREDUCE-3889 removed the code which was trying to pull from /tasklog. We should see if it is possible to get back the feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5154) staging directory deletion fails because delegation tokens have been cancelled
Sandy Ryza created MAPREDUCE-5154: - Summary: staging directory deletion fails because delegation tokens have been cancelled Key: MAPREDUCE-5154 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5154 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza In a secure setup, the jobtracker needs the job's delegation tokens to delete the staging directory. MAPREDUCE-4850 made job-cleanup staging directory deletion asynchronous so that it could be ordered with system directory deletion. This introduced the issue that a job's delegation tokens can be cancelled before the cleanup thread gets around to the deletion, causing it to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5154) staging directory deletion fails because delegation tokens have been cancelled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5154: -- Affects Version/s: (was: 1.1.2) 1.2.0
[jira] [Updated] (MAPREDUCE-5154) staging directory deletion fails because delegation tokens have been cancelled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5154: -- Affects Version/s: (was: 2.0.3-alpha) 1.1.2
[jira] [Commented] (MAPREDUCE-5154) staging directory deletion fails because delegation tokens have been cancelled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633637#comment-13633637 ] Sandy Ryza commented on MAPREDUCE-5154: --- Uploading a patch that cancels the delegation tokens asynchronously as well. This required modifying CleanupQueue to accept delegation tokens to cancel in addition to files to delete. Both TestJobRecovery and TestJobCleanup pass.
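The change described in the comment above can be sketched abstractly: each cleanup entry carries both the path and the token needed to delete it, and the token is cancelled only after the deletion runs, never before. All types and names here are hypothetical stand-ins, not the branch-1 CleanupQueue API:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class CleanupQueueSketch {
    interface FileSystemOps { boolean delete(String path); } // deletes using the token
    interface TokenCanceller { void cancel(String tokenId); }

    static class PathDeletionContext {
        final String path;
        final String tokenId; // delegation token needed for the deletion
        PathDeletionContext(String path, String tokenId) {
            this.path = path;
            this.tokenId = tokenId;
        }
    }

    final Queue<PathDeletionContext> queue = new ArrayDeque<>();

    void enqueue(PathDeletionContext ctx) { queue.add(ctx); }

    /** Drain the queue: delete first (while the token is still valid), then
     *  cancel the token. Cancelling eagerly, before the asynchronous deletion
     *  has run, is exactly the failure described in this issue. */
    void drain(FileSystemOps fs, TokenCanceller canceller) {
        PathDeletionContext ctx;
        while ((ctx = queue.poll()) != null) {
            fs.delete(ctx.path);
            canceller.cancel(ctx.tokenId);
        }
    }
}
```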
[jira] [Updated] (MAPREDUCE-5154) staging directory deletion fails because delegation tokens have been cancelled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5154: -- Attachment: MAPREDUCE-5154.patch
[jira] [Updated] (MAPREDUCE-5154) staging directory deletion fails because delegation tokens have been cancelled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5154: -- Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-5154) staging directory deletion fails because delegation tokens have been cancelled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633646#comment-13633646 ] Hadoop QA commented on MAPREDUCE-5154: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579073/MAPREDUCE-5154.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3530//console This message is automatically generated.
[jira] [Created] (MAPREDUCE-5155) Race condition in test case TestFetchFailure cause it to fail
nemon lou created MAPREDUCE-5155: Summary: Race condition in test case TestFetchFailure cause it to fail Key: MAPREDUCE-5155 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5155 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.3-alpha Environment: Suse x86_64 GNU/Linux Java(TM) SE Runtime Environment (build 1.6.0_32-b05) Reporter: nemon lou Priority: Minor I ran into this once: testFetchFailureWithRecovery(org.apache.hadoop.mapreduce.v2.app.TestFetchFailure): Num completion events not correct expected:1 but was:0 There is a race condition between job.getTaskAttemptCompletionEvents and the handling of the JOB_TASK_ATTEMPT_COMPLETED event. If job.getTaskAttemptCompletionEvents is invoked because the task is in the SUCCEEDED state, but before the JOB_TASK_ATTEMPT_COMPLETED event has been scheduled, the test case will fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5155) Race condition in test case TestFetchFailure cause it to fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated MAPREDUCE-5155: - Attachment: org.apache.hadoop.mapreduce.v2.app.TestFetchFailure.txt org.apache.hadoop.mapreduce.v2.app.TestFetchFailure-output.txt Logs are uploaded
[jira] [Updated] (MAPREDUCE-5155) Race condition in test case TestFetchFailure causes it to fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated MAPREDUCE-5155: - Description: I ran into this once: testFetchFailureWithRecovery(org.apache.hadoop.mapreduce.v2.app.TestFetchFailure): Num completion events not correct expected:<1> but was:<0> There is a race condition between job.getTaskAttemptCompletionEvents and the handling of the JOB_TASK_ATTEMPT_COMPLETED event. If job.getTaskAttemptCompletionEvents is invoked because the task is in the SUCCEEDED state, but before the JOB_TASK_ATTEMPT_COMPLETED event has been processed, the test case will fail. was: I run into this once: testFetchFailureWithRecovery(org.apache.hadoop.mapreduce.v2.app.TestFetchFailure): Num completion events not correct expected:1 but was:0 There is a race condition between job.getTaskAttemptCompletionEvents and dealing with JOB_TASK_ATTEMPT_COMPLETED event.If job.getTaskAttemptCompletionEvents invoked because of task in SUCCEEDED state ,but before JOB_TASK_ATTEMPT_COMPLETED event scheduled,the test case will fail. Race condition in test case TestFetchFailure causes it to fail - Key: MAPREDUCE-5155 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5155 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.3-alpha Environment: Suse x86_64 GNU/Linux, Java(TM) SE Runtime Environment (build 1.6.0_32-b05) Reporter: nemon lou Priority: Minor Attachments: org.apache.hadoop.mapreduce.v2.app.TestFetchFailure-output.txt, org.apache.hadoop.mapreduce.v2.app.TestFetchFailure.txt I ran into this once: testFetchFailureWithRecovery(org.apache.hadoop.mapreduce.v2.app.TestFetchFailure): Num completion events not correct expected:<1> but was:<0> There is a race condition between job.getTaskAttemptCompletionEvents and the handling of the JOB_TASK_ATTEMPT_COMPLETED event. If job.getTaskAttemptCompletionEvents is invoked because the task is in the SUCCEEDED state, but before the JOB_TASK_ATTEMPT_COMPLETED event has been processed, the test case will fail.
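The race described above is the classic flaky-test pattern: the test asserts on state (the completion-event list) that is only updated after an asynchronous dispatcher thread processes JOB_TASK_ATTEMPT_COMPLETED, so an assertion made immediately after the task reaches SUCCEEDED can still observe zero events. A common remedy is to poll the condition with a timeout instead of asserting once. The following is a minimal, self-contained sketch of that pattern; the helper name `awaitCondition` and the simulated dispatcher thread are illustrative and not part of the actual MRApp test harness.

```java
import java.util.function.BooleanSupplier;

public class AwaitDemo {

    // Poll a condition until it holds or the timeout elapses. A test that
    // asserts on state updated by an asynchronous event (such as
    // JOB_TASK_ATTEMPT_COMPLETED) can wait like this instead of asserting
    // immediately after the triggering state change.
    static boolean awaitCondition(BooleanSupplier condition,
                                  long timeoutMillis,
                                  long pollIntervalMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollIntervalMillis);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate an event handler running on its own dispatcher thread
        // that publishes a completion event shortly after the task
        // transitions to SUCCEEDED.
        final boolean[] eventPublished = {false};
        Thread dispatcher = new Thread(() -> {
            try {
                Thread.sleep(50); // handler lag: the window where the race bites
            } catch (InterruptedException ignored) {
            }
            eventPublished[0] = true;
        });
        dispatcher.start();

        // Asserting immediately here could observe 0 completion events
        // (the reported failure); waiting makes the check deterministic.
        boolean observed = awaitCondition(() -> eventPublished[0], 2000, 10);
        System.out.println(observed ? "event observed" : "timed out");
    }
}
```

With this pattern the test still fails promptly (after the timeout) when the event genuinely never arrives, but no longer fails just because the dispatcher thread was momentarily behind.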