[jira] Created: (MAPREDUCE-2177) The wait for spill completion should call Condition.awaitNanos(long nanosTimeout)
The wait for spill completion should call Condition.awaitNanos(long nanosTimeout)

                 Key: MAPREDUCE-2177
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2177
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tasktracker
    Affects Versions: 0.20.2
            Reporter: Ted Yu

We sometimes saw map task timeouts in cdh3b2. Here is the log from one of the map tasks:

2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: bufstart = 119534169; bufend = 59763857; bufvoid = 298844160
2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: kvstart = 438913; kvend = 585320; length = 983040
2010-11-04 10:34:41,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
2010-11-04 10:35:45,352 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59763857; bufend = 298837899; bufvoid = 298844160
2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: kvstart = 585320; kvend = 731585; length = 983040
2010-11-04 10:45:41,289 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4

Note how long the last spill took: nearly ten minutes, from 10:35:45 to 10:45:41.

In MapTask.java, the following code waits for the spill to finish:

    while (kvstart != kvend) {
      reporter.progress();
      spillDone.await();
    }

The code in trunk is similar.

Condition.await() has no timeout mechanism. If the SpillThread takes a long time before calling spillDone.signal(), the task times out.
Condition.awaitNanos(long nanosTimeout) should be called instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
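The change proposed in MAPREDUCE-2177 can be sketched as follows. This is a minimal, standalone illustration and not Hadoop code: the class SpillWaitSketch, its methods, and the 50 ms interval are hypothetical, and a counter stands in for reporter.progress(). Only kvstart, kvend, and spillDone are taken from the report. Whether periodic progress calls are enough to avoid the task timeout depends on how progress reaches the TaskTracker, which this sketch does not model.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the spill wait in MapTask; not Hadoop code. */
class SpillWaitSketch {
    private final ReentrantLock spillLock = new ReentrantLock();
    private final Condition spillDone = spillLock.newCondition();
    private final long waitNanos; // assumed interval between progress reports
    private int kvstart = 0;
    private int kvend = 0;

    SpillWaitSketch(long waitMillis) {
        this.waitNanos = TimeUnit.MILLISECONDS.toNanos(waitMillis);
    }

    /** Marks a spill as in flight by moving kvend ahead of kvstart. */
    void startSpill(int records) {
        spillLock.lock();
        try {
            kvend = kvstart + records;
        } finally {
            spillLock.unlock();
        }
    }

    /** What a spill thread would do when finished: catch up kvstart and signal. */
    void finishSpill() {
        spillLock.lock();
        try {
            kvstart = kvend;
            spillDone.signal();
        } finally {
            spillLock.unlock();
        }
    }

    /**
     * Waits for the spill to finish, waking up at least every waitNanos so that
     * progress can be reported again. Returns how many times progress was reported.
     */
    int waitForSpill() throws InterruptedException {
        spillLock.lock();
        try {
            int progressCalls = 0;
            while (kvstart != kvend) {
                progressCalls++; // stands in for reporter.progress()
                // Returns on signal, timeout, or spurious wakeup; the loop re-checks.
                spillDone.awaitNanos(waitNanos);
            }
            return progressCalls;
        } finally {
            spillLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SpillWaitSketch sketch = new SpillWaitSketch(50);
        sketch.startSpill(10);
        Thread spillThread = new Thread(() -> {
            try {
                Thread.sleep(300); // simulate a slow spill
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sketch.finishSpill();
        });
        spillThread.start();
        System.out.println("progress reports: " + sketch.waitForSpill());
        spillThread.join();
    }
}
```

With a plain await(), the loop body runs once and then blocks until the signal arrives, so progress is reported only once no matter how long the spill takes. With awaitNanos(), the loop wakes up on every timeout and reports progress again while the spill is still running.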
[jira] Commented: (MAPREDUCE-1706) Log RAID recoveries on HDFS
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929242#action_12929242 ]

Ramkumar Vadali commented on MAPREDUCE-1706:

This looks good to me. +1

> Log RAID recoveries on HDFS
> ---
>
>                 Key: MAPREDUCE-1706
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1706
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Rodrigo Schmidt
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1706.txt
>
>
> It would be good to have a way to centralize all the recovery logs, since
> recovery can be executed by any hdfs client. The best place to store this
> information is HDFS itself.
[jira] Commented: (MAPREDUCE-1704) Parity files that are outdated or nonexistent should be immediately disregarded
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929239#action_12929239 ]

Ramkumar Vadali commented on MAPREDUCE-1704:

This is not an issue anymore

> Parity files that are outdated or nonexistent should be immediately
> disregarded
> ---
>
>                 Key: MAPREDUCE-1704
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1704
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/raid
>    Affects Versions: 0.22.0
>            Reporter: Rodrigo Schmidt
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>
> In the current implementation, old or nonexistent parity files are not
> immediately disregarded. Absence will trigger exceptions, but old files could
> lead to bad recoveries and maybe data corruption. This should be fixed.