Hi
I have been trying to set up a MapReduce job with Hadoop 0.20.203.1.
Scenario:
My mapper writes key-value pairs with 13 key types in total, each with a
corresponding value class.
For each input record, I write all 13 key-value pairs to the context.
My combiner and reducer do the same thing.
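Because the same value classes flow through the mapper, combiner, and reducer, each value's readFields() must consume exactly the bytes its write() produced, and must reset any state, since Hadoop reuses the same instance across records. A minimal sketch of that contract, assuming a value that holds a type tag plus a list of longs (the trace shows readUTF inside ValueCollection, but its real fields are unknown). `WritableLike` is a hypothetical stand-in for `org.apache.hadoop.io.Writable` so the sketch compiles without Hadoop, and `TaggedLongCollection` is illustrative, not the author's class:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SymmetricValueSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods).
    interface WritableLike {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value class: a type tag plus a list of longs.
    static class TaggedLongCollection implements WritableLike {
        String tag = "";
        final List<Long> values = new ArrayList<>();

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(tag);            // always written, even when values is empty
            out.writeInt(values.size());  // length prefix tells readFields where to stop
            for (long v : values) {
                out.writeLong(v);
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            values.clear();               // instance is reused, so drop the previous record
            tag = in.readUTF();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                values.add(in.readLong());
            }
        }
    }

    // Writes all items back to back into one buffer, then reads them with a
    // single reused instance, copying each result out.
    static List<TaggedLongCollection> roundTrip(List<TaggedLongCollection> items) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (TaggedLongCollection item : items) {
                item.write(out);
            }
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            List<TaggedLongCollection> result = new ArrayList<>();
            TaggedLongCollection reused = new TaggedLongCollection();
            for (int i = 0; i < items.size(); i++) {
                reused.readFields(in);
                TaggedLongCollection copy = new TaggedLongCollection();
                copy.tag = reused.tag;
                copy.values.addAll(reused.values);
                result.add(copy);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TaggedLongCollection a = new TaggedLongCollection();
        a.tag = "bytes";
        a.values.add(10L);
        a.values.add(20L);
        TaggedLongCollection b = new TaggedLongCollection(); // empty collection
        b.tag = "hits";
        List<TaggedLongCollection> back = roundTrip(List.of(a, b));
        System.out.println(back.get(0).tag + " " + back.get(0).values); // bytes [10, 20]
        System.out.println(back.get(1).tag + " " + back.get(1).values); // hits []
    }
}
```

Writing the tag and a count unconditionally means an empty collection still occupies a well-defined number of bytes, so the reader never runs past the end of the record.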
Issue:
My job runs fine when I don't use a combiner.
But when I run it with a combiner, I get an EOFException:
java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(Unknown Source)
    at java.io.DataInputStream.readUTF(Unknown Source)
    at java.io.DataInputStream.readUTF(Unknown Source)
    at com.guavus.mapred.common.collection.ValueCollection.readFieldsLong(ValueCollection.java:40)
    at com.guavus.mapred.common.collection.ValueCollection.readFields(ValueCollection.java:21)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1420)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:852)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1343)
My findings:
While debugging, I found that the combiner reads the key successfully, but when
it tries to read the value it gets an EOFException because there is nothing left
in the DataInput stream. Also, this occurs when the data is large and the
combiner runs more than once.
I have noticed that the combiner fails to get the value for this key when it
runs for the second time. (I read somewhere that the combiner starts once a
certain amount of data has been written by the mapper, even though the mapper
is still writing data to the context.)
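Hadoop may run the combiner zero, one, or several times for the same key, for example once per spill and again while spill files are merged. So the combiner's output must have the same types as its input, and every value it emits must survive another write/read cycle. A minimal sketch of that requirement, assuming a hypothetical summing combiner (`combine`) and plain `java.io` byte streams standing in for Hadoop's spill files:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class RepeatedCombineSketch {

    // Hypothetical combiner logic: sum all values for one key.
    static long combine(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Serializes one combiner output value (the "spill" side).
    static byte[] writeValue(long v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(v);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserializes it again before the next combine pass.
    static long readValue(byte[] data) {
        try {
            return new DataInputStream(new ByteArrayInputStream(data)).readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // First pass: combine each spill on its own. Second pass: read the spilled
    // outputs back and combine them again, as a merge-time combine would.
    static long twoPassCombine(List<List<Long>> spills) {
        List<Long> reread = new ArrayList<>();
        for (List<Long> spill : spills) {
            reread.add(readValue(writeValue(combine(spill))));
        }
        return combine(reread);
    }

    public static void main(String[] args) {
        List<List<Long>> spills = List.of(List.of(1L, 2L), List.of(3L), List.of(4L, 5L));
        System.out.println(twoPassCombine(spills));                    // 15
        System.out.println(combine(List.of(1L, 2L, 3L, 4L, 5L)));      // 15
    }
}
```

Both calls print 15: combining per spill and then again gives the same result as one pass, which only holds if the combine logic is associative and its output reads back cleanly.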
I verified many times that my mapper never writes a null value. The issue looks
really strange because the combiner is able to read the key but gets no value
from the data stream.
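One pattern that produces exactly this trace (an assumption, since ValueCollection's code isn't shown) is a value whose write() emits fewer bytes than readFields() consumes, for example writing nothing for an empty collection. Another is readFields() branching on state left over from a previous record, because Hadoop reuses the same value instance. The hypothetical `BuggyCollection` below reproduces the first case: the key deserializes fine, then readUTF() on the value runs off the end of the stream and fails inside DataInputStream.readUnsignedShort, as in the trace. A round-trip test like this for each value class is a cheap way to narrow down the root cause:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class AsymmetricWritableDemo {

    // Hypothetical buggy value: write() emits nothing for an empty collection,
    // but readFields() always expects a tag and a count.
    static class BuggyCollection {
        String tag = "";
        final List<Long> values = new ArrayList<>();

        void write(DataOutput out) throws IOException {
            if (values.isEmpty()) {
                return;                 // BUG: nothing written, reader still expects data
            }
            out.writeUTF(tag);
            out.writeInt(values.size());
            for (long v : values) {
                out.writeLong(v);
            }
        }

        void readFields(DataInput in) throws IOException {
            values.clear();
            tag = in.readUTF();         // EOFException if write() skipped this value
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                values.add(in.readLong());
            }
        }
    }

    // Writes a key followed by its value, then reads both back the way a
    // deserializer would. Returns a short description of the outcome.
    static String writeKeyThenValue(String key, BuggyCollection value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(key);
            value.write(out);
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            String readKey = in.readUTF();          // the key always reads fine
            BuggyCollection readValue = new BuggyCollection();
            readValue.readFields(in);               // the value may hit end of stream
            return "ok key=" + readKey + " tag=" + readValue.tag;
        } catch (EOFException e) {
            return "EOFException after key";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BuggyCollection full = new BuggyCollection();
        full.tag = "bytes";
        full.values.add(5L);
        System.out.println(writeKeyThenValue("k1", full));                  // ok key=k1 tag=bytes
        System.out.println(writeKeyThenValue("k2", new BuggyCollection())); // EOFException after key
    }
}
```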
There seems to be some issue with the combiner, since the job runs fine when I
don't use one. I also tried setting the combiner class to my reducer class, but
the issue still occurred.
Please suggest what the root cause could be, or what I can do to track it down.
Regards,
Arpit Wanchoo