[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

Gopal V (JIRA) Thu, 16 May 2013 13:43:18 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659951#comment-13659951
 ]


Gopal V commented on MAPREDUCE-5028:
------------------------------------

I ran the tests again because something didn't seem right - my '+' operation 
was turning into a string concat operation in logging (*ugh*).
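
For anyone who has not hit this before, here is a minimal sketch of the pitfall. The class and method names are hypothetical and this is not the debug patch itself; it only assumes the log statement was built with plain string concatenation.

```java
public class ConcatPitfall {
    // Hypothetical helper: '+' is evaluated left to right, so once the left
    // operand is a String, every following '+' concatenates instead of adding.
    static String buggy(int start, int length) {
        return "count math " + start + length;
    }

    // Parenthesizing the sum forces integer addition before concatenation.
    static String fixed(int start, int length) {
        return "count math " + start + " + " + length + " = " + (start + length);
    }

    public static void main(String[] args) {
        int start = 687161440;
        int length = 687161444;
        System.out.println(buggy(start, length)); // digits glued together
        System.out.println(fixed(start, length)); // the real sum
    }
}
```

With the values from the first record, the fixed variant produces the same "count math" line that appears in the log output of this comment.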

{code}
2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: input.length = 1342177280, start = 687161440, length = 687161444
2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: count math 687161440 + 687161444 = 1374322884
2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.io.DataInputBuffer$Buffer.reset(DataInputBuffer.java:58)
2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.io.DataInputBuffer.reset(DataInputBuffer.java:92)
2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:144)
2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:237)
....
2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: input.length = 1342177280, start = 905211353, length = 905211357
2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: count math 905211353 + 905211357 = 1810422710
2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.io.DataInputBuffer$Buffer.reset(DataInputBuffer.java:58)
2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.io.DataInputBuffer.reset(DataInputBuffer.java:92)
2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:144)
2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
{code}

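Those are wrong, definitely wrong: in both records start + length (1374322884 and 1810422710) is larger than input.length (1342177280).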
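
To make the mismatch concrete, here is a rough sketch of a range check applied to the logged values. RangeCheck and fits are hypothetical names, not Hadoop APIs; the only inputs are the numbers from the log plus the observation that 1280 << 20 equals the logged input.length, which matches io.sort.mb=1280 from the report.

```java
public class RangeCheck {
    // Hypothetical helper, not part of Hadoop: returns true when the range
    // [start, start + length) lies inside a buffer of bufferLength bytes.
    static boolean fits(int bufferLength, int start, int length) {
        if (start < 0 || length < 0) {
            return false;
        }
        // Widen to long so the sum cannot overflow int for large buffers.
        return (long) start + (long) length <= (long) bufferLength;
    }

    public static void main(String[] args) {
        int bufferLength = 1280 << 20; // 1342177280, the input.length in the log
        System.out.println(fits(bufferLength, 687161440, 687161444));
        System.out.println(fits(bufferLength, 905211353, 905211357));
        // Illustrative only: a 4-byte range at the first start offset.
        System.out.println(fits(bufferLength, 687161440, 4));
    }
}
```

Both logged pairs fail this check, while a short range at the same start offset passes. In both records the logged length is exactly start + 4; the sketch makes no claim about why.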
                
> Maps fail when io.sort.mb is set to high value
> ----------------------------------------------
>
>                 Key: MAPREDUCE-5028
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 1.2.0, 2.0.5-beta
>
>         Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
> mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
> mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch
>
>
> Verified the problem exists on branch-1 with the following configuration:
> Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
> io.sort.mb=1280, dfs.block.size=2147483648
> Run teragen to generate 4 GB data
> Maps fail when you run wordcount on this configuration with the following 
> error: 
> {noformat}
> java.io.IOException: Spill failed
>       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
>       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
>       at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>       at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
>       at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.EOFException
>       at java.io.DataInputStream.readInt(DataInputStream.java:375)
>       at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>       at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>       at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>       at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>       at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
>       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
>       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
>       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
> {noformat}

