[jira] Created: (MAPREDUCE-2177) The wait for spill completion should call Condition.awaitNanos(long nanosTimeout)

2010-11-06 Thread Ted Yu (JIRA)
The wait for spill completion should call Condition.awaitNanos(long nanosTimeout)
-

 Key: MAPREDUCE-2177
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2177
 Project: Hadoop Map/Reduce
 Issue Type: Bug
 Components: tasktracker
 Affects Versions: 0.20.2
 Reporter: Ted Yu


We sometimes saw map tasks time out in cdh3b2. Here is the log from one of the 
map tasks:

2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: bufstart = 119534169; bufend = 59763857; bufvoid = 298844160
2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: kvstart = 438913; kvend = 585320; length = 983040
2010-11-04 10:34:41,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
2010-11-04 10:35:45,352 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59763857; bufend = 298837899; bufvoid = 298844160
2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: kvstart = 585320; kvend = 731585; length = 983040
2010-11-04 10:45:41,289 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4

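Note how long the last spill took: spill 4 ran from 10:35:45 to 10:45:41, 
almost 10 minutes, whereas spill 3 finished in about 18 seconds.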

In MapTask.java, the following code waits for the spill to finish:

  while (kvstart != kvend) {
    reporter.progress();
    spillDone.await();
  }

The code in trunk is similar.

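Condition.await() has no timeout. The loop calls reporter.progress() once and 
then blocks, so if the SpillThread takes a long time before calling 
spillDone.signal(), no further progress is reported and the task times out. 
Condition.awaitNanos(long nanosTimeout) should be called instead, so that the 
loop wakes up periodically and reports progress again.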
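
A minimal sketch of the proposed wait loop, not the actual MapTask code. It 
only illustrates how awaitNanos lets the waiting thread report progress 
repeatedly while a slow spill is running. The class SpillWaitSketch, the 
spillInProgress flag (standing in for kvstart != kvend), the progressCalls 
counter (standing in for reporter.progress()) and the 50 ms interval are all 
assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class SpillWaitSketch {
  private final ReentrantLock spillLock = new ReentrantLock();
  private final Condition spillDone = spillLock.newCondition();
  // Hypothetical stand-in for the "kvstart != kvend" check in MapTask.
  private boolean spillInProgress = true;
  // Hypothetical stand-in for reporter.progress(): counts the calls.
  final AtomicInteger progressCalls = new AtomicInteger();

  /** Waits for the spill, waking up at least every intervalMs to report progress. */
  void waitForSpill(long intervalMs) throws InterruptedException {
    final long intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMs);
    spillLock.lock();
    try {
      while (spillInProgress) {
        progressCalls.incrementAndGet();
        // Returns on signal, on timeout, or spuriously; the loop re-checks the flag.
        spillDone.awaitNanos(intervalNanos);
      }
    } finally {
      spillLock.unlock();
    }
  }

  /** Called by the spill thread once the spill has completed. */
  void finishSpill() {
    spillLock.lock();
    try {
      spillInProgress = false;
      spillDone.signal();
    } finally {
      spillLock.unlock();
    }
  }

  /** Simulates a spill lasting spillMs and returns how often progress was reported. */
  static int simulate(long spillMs, long intervalMs) {
    SpillWaitSketch s = new SpillWaitSketch();
    Thread spillThread = new Thread(() -> {
      try {
        Thread.sleep(spillMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      s.finishSpill();
    });
    spillThread.start();
    try {
      s.waitForSpill(intervalMs);
      spillThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return s.progressCalls.get();
  }

  public static void main(String[] args) {
    System.out.println("progress calls: " + simulate(300, 50));
  }
}
```

With a plain await() the counter would stay at 1 for the whole spill. With 
awaitNanos it grows roughly once per interval, so the interval only needs to 
be well below the task timeout.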

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1706) Log RAID recoveries on HDFS

2010-11-06 Thread Ramkumar Vadali (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929242#action_12929242 ]

Ramkumar Vadali commented on MAPREDUCE-1706:


This looks good to me.
+1

> Log RAID recoveries on HDFS
> ---
>
> Key: MAPREDUCE-1706
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1706
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/raid
> Reporter: Rodrigo Schmidt
> Assignee: Scott Chen
> Attachments: MAPREDUCE-1706.txt
>
>
> It would be good to have a way to centralize all the recovery logs, since 
> recovery can be executed by any hdfs client. The best place to store this 
> information is HDFS itself.




[jira] Commented: (MAPREDUCE-1704) Parity files that are outdated or nonexistent should be immediately disregarded

2010-11-06 Thread Ramkumar Vadali (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929239#action_12929239 ]

Ramkumar Vadali commented on MAPREDUCE-1704:


This is no longer an issue.

> Parity files that are outdated or nonexistent should be immediately 
> disregarded
> ---
>
> Key: MAPREDUCE-1704
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1704
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/raid
> Affects Versions: 0.22.0
> Reporter: Rodrigo Schmidt
> Assignee: Scott Chen
> Fix For: 0.22.0
>
>
> In the current implementation, old or nonexistent parity files are not 
> immediately disregarded. Absence will trigger exceptions, but old files could 
> lead to bad recoveries and maybe data corruption. This should be fixed.
