[jira] [Commented] (MAPREDUCE-2037) Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds
[ https://issues.apache.org/jira/browse/MAPREDUCE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084828#comment-13084828 ]

Hudson commented on MAPREDUCE-2037:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/])
MAPREDUCE-2037. Capture intermediate progress, CPU and memory usage for tasks. Contributed by Dick King.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157253
Files :
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/AvroArrayUtils.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/MapTaskAttemptInfo.java
* /hadoop/common/trunk/mapreduce/src/java/mapred-default.xml
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskInProgress.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/Counters.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/Events.avpr
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/TaskAttemptUnsuccessfulCompletionEvent.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/StatePeriodicStats.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/tools/rumen/TestRumenJobTraces.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/ReduceAttemptFinishedEvent.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/server/jobtracker/JTConfig.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestTaskPerformanceSplits.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/ZombieJob.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/ReduceAttempt20LineHistoryEventEmitter.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/TaskAttemptInfo.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEvents.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/CumulativePeriodicStats.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/ReduceTaskAttemptInfo.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/TaskAttempt20LineEventEmitter.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/PeriodicStatsAccumulator.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/JobBuilder.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/ProgressSplitsBlock.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/LoggedTaskAttempt.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/MapAttempt20LineHistoryEventEmitter.java

Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds
-----------------------------------------------------------------------------------------------------------

Key: MAPREDUCE-2037
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2037
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Dick King
Assignee: Dick King
Fix For: 0.23.0
Attachments: MAPREDUCE-2037.patch, MAPREDUCE-2037.patch

We would like to capture the following information at certain progress thresholds as a task runs:
* Time taken so far
* CPU load [either at the time the data are taken, or exponentially smoothed]
* Memory load [also either at the time the data are taken, or exponentially smoothed]

This would be taken at intervals that depend on the task progress plateaus. For example, reducers have three progress ranges -- [0-1/3], (1/3-2/3], and (2/3-3/3] -- where fundamentally different activities happen.
Mappers have different boundaries, I understand, that are not symmetrically placed. Data capture boundaries should coincide with activity boundaries. For the state information capture [CPU and memory] we should average over the covered interval. This data would flow in with the heartbeats. It would be placed in the job history as part of the task attempt completion event, so it could be processed by rumen or some similar tool and could drive a benchmark engine. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
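The capture scheme described in MAPREDUCE-2037 can be illustrated with a small sketch. It is not taken from the attached patch: the class name ProgressSplitSketch and all of its methods are hypothetical, and it assumes each heartbeat delivers one (progress, elapsed time, state sample) triple. Cumulative values such as elapsed time are kept as the last value seen in a progress range, while state values such as memory load are averaged over the range with a plain sample mean rather than exponential smoothing.

```java
// Hypothetical sketch; names are illustrative and not taken from the MAPREDUCE-2037 patch.
final class ProgressSplitSketch {
    private final double[] upperBounds; // inclusive upper bound of each progress range
    private final long[] elapsedMs;     // elapsed time at the last sample seen in each range
    private final double[] stateSum;    // running sum of state samples (e.g. memory load) per range
    private final int[] sampleCount;    // number of samples recorded per range

    ProgressSplitSketch(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.elapsedMs = new long[upperBounds.length];
        this.stateSum = new double[upperBounds.length];
        this.sampleCount = new int[upperBounds.length];
    }

    // Map a progress value in [0, 1] to the index of the range that contains it.
    int rangeOf(double progress) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (progress <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length - 1;
    }

    // Record one heartbeat sample (assumed input shape).
    void record(double progress, long elapsedSoFarMs, double stateSample) {
        int i = rangeOf(progress);
        elapsedMs[i] = elapsedSoFarMs; // cumulative value: keep the latest
        stateSum[i] += stateSample;    // state value: accumulate for averaging
        sampleCount[i]++;
    }

    // Average of the state samples that fell into the given range, or NaN if there were none.
    double averageState(int range) {
        return sampleCount[range] == 0 ? Double.NaN : stateSum[range] / sampleCount[range];
    }

    // Elapsed time at the last sample seen in the given range.
    long elapsedAtEndOf(int range) {
        return elapsedMs[range];
    }
}
```

With the reducer ranges from the description, new ProgressSplitSketch(1.0 / 3, 2.0 / 3, 1.0) gives one bucket per activity phase. A time-weighted average would be a reasonable alternative if heartbeats arrive at uneven intervals.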
[jira] [Commented] (MAPREDUCE-279) Map-Reduce 2.0
[ https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084826#comment-13084826 ]

Hudson commented on MAPREDUCE-279:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk #754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/])
MAPREDUCE-2837. Ported bug fixes from y-merge to prepare for MAPREDUCE-279 merge.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157249
Files :
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/ACLsManager.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/MapTask.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/Job.java
* /hadoop/common/trunk/mapreduce/src/test/mapred-site.xml
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/MapOutputFile.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/ReduceTask.java
* /hadoop/common/trunk/mapreduce/src/webapps/job/jobdetailshistory.jsp
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/security/TestMapredGroupMappingServiceRefresh.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskTracker.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/TaskFinishedEvent.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobACLsManager.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/MROutputFiles.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/Task.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/MRConfig.java
* /hadoop/common/trunk/mapreduce/src/examples/org/apache/hadoop/examples/terasort/TeraInputFormat.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/LocalJobRunner.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMapRed.java

Map-Reduce 2.0
--------------

Key: MAPREDUCE-279
URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv2
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 0.23.0
Attachments: MR-279.patch, MR-279.patch, MR-279.sh, MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, multi-column-stable-sort-default-theme.png, yarn-state-machine.job.dot, yarn-state-machine.job.png, yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, yarn-state-machine.task.dot, yarn-state-machine.task.png

Re-factor MapReduce into a generic resource scheduler and a per-job, user-defined component that manages the application execution.

Check it out by following [the instructions|http://goo.gl/rSJJC].
[jira] [Commented] (MAPREDUCE-2839) MR Jobs fail on a secure cluster with viewfs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084830#comment-13084830 ]

Hudson commented on MAPREDUCE-2839:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/])
MAPREDUCE-2839. Fixed TokenCache to get delegation tokens using both new and old apis. Contributed by Siddharth Seth.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157420
Files :
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java

MR Jobs fail on a secure cluster with viewfs
--------------------------------------------

Key: MAPREDUCE-2839
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2839
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Fix For: 0.23.0
Attachments: MR2839_0.patch, MR2839_279_0.patch, MR2839_trunk_1.patch

TokenCache needs to use the new FileSystem.getDelegationTokens API for it to work with viewfs.
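To make the "both new and old apis" idea concrete, here is a rough sketch. It deliberately does not use the Hadoop classes: FsSketch is a stand-in interface, tokens are plain strings, and the two method signatures are assumptions made for illustration only. The point is just the fallback order: ask for the full list of tokens first (which a mount-table style file system would need, since it fronts several underlying file systems), and fall back to the single-token call when the list is missing or empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for a file system; NOT the Hadoop FileSystem class.
interface FsSketch {
    // Stand-in for the older single-token call (signature assumed for illustration).
    String getDelegationToken(String renewer);

    // Stand-in for the newer multi-token call; may return null or an empty list
    // when the file system does not support it (assumed behavior).
    List<String> getDelegationTokens(String renewer);
}

final class TokenCollectorSketch {
    // Prefer the multi-token call, fall back to the single-token call.
    static List<String> collectTokens(FsSketch fs, String renewer) {
        List<String> tokens = fs.getDelegationTokens(renewer);
        if (tokens != null && !tokens.isEmpty()) {
            return new ArrayList<>(tokens);
        }
        String single = fs.getDelegationToken(renewer);
        if (single == null) {
            return Collections.<String>emptyList();
        }
        return Collections.singletonList(single);
    }
}
```

How the committed TokenCache change orders or combines the two calls is not stated in this message, so treat the ordering above as one plausible reading.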
[jira] [Commented] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084827#comment-13084827 ]

Hudson commented on MAPREDUCE-901:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk #754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/])
Fixed bad commit for MAPREDUCE-901.
MAPREDUCE-901. Efficient framework counters. Contributed by Luke Lu.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157454
Files :
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java.orig

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157290
Files :
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestJobInProgress.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/FileSystemCounter.properties
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskStatus.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskInProgress.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/Counter.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/Counters.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/InterTrackerProtocol.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/LimitExceededException.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/TestCounters.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/JobCounter.properties
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/AbstractCounters.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/Limits.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/FrameworkCounterGroup.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/package-info.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/CounterGroupBase.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/Task.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java.orig
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/CounterGroup.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/FileSystemCounterGroup.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/util/CountersStrings.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMiniMRWithDFS.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestCombineOutputCollector.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/Counters.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/GenericCounter.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/AbstractCounterGroup.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/util/ResourceBundles.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/CounterGroupFactory.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/FileSystemCounter.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskTracker.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestSeveral.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMiniMRDFSSort.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/TaskCounter.properties
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/counters/AbstractCounter.java

Move Framework Counters into a TaskMetric structure
---------------------------------------------------

Key: MAPREDUCE-901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Luke Lu
Fix For: 0.23.0
Attachments: 901_1.patch, 901_1.patch,
[jira] [Commented] (MAPREDUCE-2837) MR-279: Bug fixes ported from y-merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084829#comment-13084829 ]

Hudson commented on MAPREDUCE-2837:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/])
MAPREDUCE-2837. Ported bug fixes from y-merge to prepare for MAPREDUCE-279 merge.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157249
Files :
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/ACLsManager.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/MapTask.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/Job.java
* /hadoop/common/trunk/mapreduce/src/test/mapred-site.xml
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/security/TokenCache.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/MapOutputFile.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/ReduceTask.java
* /hadoop/common/trunk/mapreduce/src/webapps/job/jobdetailshistory.jsp
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/security/TestMapredGroupMappingServiceRefresh.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskTracker.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/TaskFinishedEvent.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobACLsManager.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/MROutputFiles.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/Task.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/MRConfig.java
* /hadoop/common/trunk/mapreduce/src/examples/org/apache/hadoop/examples/terasort/TeraInputFormat.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/LocalJobRunner.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMapRed.java

MR-279: Bug fixes ported from y-merge
-------------------------------------

Key: MAPREDUCE-2837
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2837
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Arun C Murthy
Attachments: rest.patch, rest.patch

Similar to MAPREDUCE-2679.
[jira] [Commented] (MAPREDUCE-2727) MR-279: SleepJob throws divide by zero exception when count = 0
[ https://issues.apache.org/jira/browse/MAPREDUCE-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084831#comment-13084831 ]

Hudson commented on MAPREDUCE-2727:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/])
MAPREDUCE-2727. Fix divide-by-zero error in SleepJob for sleepCount equals 0. Contributed by Jeffrey Naisbitt.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157422
Files :
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/SleepJob.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt

MR-279: SleepJob throws divide by zero exception when count = 0
---------------------------------------------------------------

Key: MAPREDUCE-2727
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2727
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Jeffrey Naisbitt
Assignee: Jeffrey Naisbitt
Fix For: 0.23.0
Attachments: MAPREDUCE-2727-trunk.patch, MAPREDUCE-2727.patch

When the count is 0 for mappers or reducers, a divide-by-zero exception is thrown. There are existing checks to error out when count < 0, which obviously don't handle the 0 case. This is causing the MRReliabilityTest to fail.
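The failure mode in MAPREDUCE-2727 comes down to an integer division by a count that can legitimately be zero. The following is only a sketch of one way to guard it, not the code from the attached patch: SleepSplitSketch and perRecordSleepMs are hypothetical names, and it assumes that a negative count is rejected as an error while a count of 0 simply means there is nothing to sleep for.

```java
// Hypothetical sketch; not the SleepJob code from the MAPREDUCE-2727 patch.
final class SleepSplitSketch {
    // Split a total sleep time evenly across `count` records.
    static long perRecordSleepMs(long totalSleepMs, int count) {
        if (count < 0) {
            // Assumed to mirror the existing "error out" check for negative counts.
            throw new IllegalArgumentException("count must be non-negative: " + count);
        }
        if (count == 0) {
            // Nothing to sleep for; returning early avoids totalSleepMs / 0.
            return 0L;
        }
        return totalSleepMs / count;
    }
}
```

Without the count == 0 branch, perRecordSleepMs(1000, 0) would throw java.lang.ArithmeticException, which is the behavior the issue reports.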
[jira] [Commented] (MAPREDUCE-2541) Race Condition in IndexCache(readIndexFileToCache,removeMap) causes value of totalMemoryUsed corrupt, which may cause TaskTracker continue throw Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084832#comment-13084832 ]

Hudson commented on MAPREDUCE-2541:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/])
MAPREDUCE-2541. Fixed a race condition in IndexCache.removeMap. Contributed by Binglin Chang.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157346
Files :
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestIndexCache.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/IndexCache.java

Race Condition in IndexCache(readIndexFileToCache,removeMap) causes value of totalMemoryUsed corrupt, which may cause TaskTracker continue throw Exception
----------------------------------------------------------------------------------------------------------------------------------------------------------

Key: MAPREDUCE-2541
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2541
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: tasktracker
Affects Versions: 0.20.1, 0.21.0, 0.22.0, 0.23.0
Environment: all
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Critical
Fix For: 0.23.0
Attachments: MAPREDUCE-2541.patch, MAPREDUCE-2541.v2.patch

The race condition goes like this:
Thread1: readIndexFileToCache() totalMemoryUsed.addAndGet(newInd.getSize())
Thread2: removeMap() totalMemoryUsed.addAndGet(-info.getSize());

If the client kills the job while the SpillRecord is still being read from the file system, info.getSize() equals 0, so totalMemoryUsed is not actually reduced. After thread 1 finishes reading the SpillRecord, it adds the real index size to totalMemoryUsed, which leaves the value of totalMemoryUsed too large.

When totalMemoryUsed exceeds totalMemoryAllowed while indexCache has not actually cached anything, freeIndexInformation() will throw exceptions constantly. This usually happens when a very large job with a very large number of reduces is killed by the user, probably because the user set a wrong reduce number by mistake.

A quick fix for this issue is to make removeMap() do nothing and let freeIndexInformation() alone do this job.
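The interleaving described above can be replayed deterministically on a single thread. The sketch below is not the IndexCache code and not the committed fix: IndexCacheSketch, Entry, beginRead and finishRead are hypothetical names, and the "skip entries that are still loading" guard is just one possible remedy chosen for illustration. It shows how removing an entry whose size is still 0 leaves totalMemoryUsed inflated once the read completes, and how leaving such entries alone keeps the accounting consistent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the accounting problem; not the Hadoop IndexCache class.
final class IndexCacheSketch {
    static final class Entry {
        volatile int size;               // stays 0 until the index has been read
        volatile boolean loading = true; // true while the read is in flight
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final AtomicInteger totalMemoryUsed = new AtomicInteger();
    private final boolean skipEntriesStillLoading;

    IndexCacheSketch(boolean skipEntriesStillLoading) {
        this.skipEntriesStillLoading = skipEntriesStillLoading;
    }

    // Thread 1, step 1: register the entry before the index file has been read.
    Entry beginRead(String mapId) {
        Entry e = new Entry();
        cache.put(mapId, e);
        return e;
    }

    // Thread 1, step 2: the read finished, so account for the real size.
    void finishRead(Entry e, int realSize) {
        e.size = realSize;
        e.loading = false;
        totalMemoryUsed.addAndGet(realSize);
    }

    // Thread 2: the job was killed, drop its entry.
    void removeMap(String mapId) {
        Entry e = cache.get(mapId);
        if (e == null) {
            return;
        }
        if (skipEntriesStillLoading && e.loading) {
            return; // leave it for a later removal or the eviction path
        }
        if (cache.remove(mapId, e)) {
            totalMemoryUsed.addAndGet(-e.size); // subtracts 0 if the read is still in flight
        }
    }

    int totalMemoryUsed() {
        return totalMemoryUsed.get();
    }

    int cachedEntries() {
        return cache.size();
    }
}
```

Calling beginRead, removeMap, finishRead in that order with the guard disabled ends with an empty cache but a non-zero totalMemoryUsed, which is the corrupted state from the report. The guard here is a check-then-act on a volatile flag, so a real multi-threaded fix would also need the check and the update to be atomic with respect to the reader.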