[ https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659666#comment-13659666 ]
Gopal V commented on MAPREDUCE-5028: ------------------------------------ [~kkambatl], I took some time to go through your patch. Patch contains 2 different fixes, which deserve their own tests & commits. Good catch on the overflow with the kvindex,kvend variables. That is a bug with the mapper with large buffers. That is a good & clean fix. But for the second issue, I found out it triggered when the inline Combiner is run when there are > 3 spills in the SpillThread. This wasn't tested in [~acmurthy]'s test-app (but the word-count sum combiner does trigger it cleanly). And there I found your fix to be suspect. So, for the sake of data I logged every call to reset and crawled a 13 gb log file to find out offenders in reset (i.e where (long)start + (long)length > input.length). This particular back-trace stood out as a key offender. I found that to be significant instead of merely locating the overflow cases. {code} org.apache.hadoop.mapred.MapTask$MapOutputBuffer$MRResultIterator.getKey(MapTask.java:1784) org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:138) {code} I will take a closer look at that code, it might be cleaner to tackle the issue at the first-cause location. > Maps fail when io.sort.mb is set to high value > ---------------------------------------------- > > Key: MAPREDUCE-5028 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5 > Reporter: Karthik Kambatla > Assignee: Karthik Kambatla > Priority: Critical > Fix For: 1.2.0, 2.0.5-beta > > Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, > mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, > mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch > > > Verified the problem exists on branch-1 with the following configuration: > Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, > io.sort.mb=1280, dfs.block.size=2147483648 > Run teragen to generate 4 GB data > Maps fail when you run wordcount on this configuration with the following > error: > {noformat} > java.io.IOException: Spill failed > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692) > at > org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) > at > org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45) > at > org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > Caused by: java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38) > at > org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67) > at > org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40) > at > org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116) > at > org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) > at > org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira