[ http://issues.apache.org/jira/browse/HADOOP-532?page=all ]
Owen O'Malley updated HADOOP-532:
---------------------------------

    Attachment: seqfile-underread-check.patch

The compression codec does not read the entire value buffer, but it does get the correct value. (I suspect the unread bytes are a CRC.) The error message is SequenceFile complaining that the entire buffer was not consumed.

This patch:
1. extends the unit test to use bigger values so that it detects the problem,
2. lets the user of the org.apache.hadoop.io.TestSequenceFile main program control the random seed, and prints the seed value even when it is chosen randomly,
3. checks that the value stream is exhausted by trying to read the next byte from the input stream,
4. removes redundant buffering of the already-buffered value stream,
5. marks the start of the value in non-block-compressed sequence files and resets to that mark at the start of getCurrentValue.

> Writable underrun in sort example
> ---------------------------------
>
>          Key: HADOOP-532
>          URL: http://issues.apache.org/jira/browse/HADOOP-532
>      Project: Hadoop
>   Issue Type: Bug
>   Components: io
> Affects Versions: 0.6.1
>     Reporter: Owen O'Malley
>  Assigned To: Owen O'Malley
>      Fix For: 0.6.2
>
>  Attachments: seqfile-underread-check.patch
>
>
> When running the sort benchmark, I get consistent failures of this sort:
>
> java.lang.RuntimeException: java.io.IOException: [EMAIL PROTECTED] read 2048 bytes, should read 2052
>     at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:150)
>     at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:271)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1066)
> Caused by: java.io.IOException: [EMAIL PROTECTED] read 2048 bytes, should read 2052
>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1163)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1239)
>     at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:181)
>     at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:147)
>     ... 3 more
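Items 3 and 5 of the patch description combine two stream techniques: probing for one more byte to confirm a value was fully consumed, and using mark/reset so that getCurrentValue always starts from the front of the value. The following is a minimal Java sketch of both, assuming a simple length-prefixed value format and a java.io.BufferedInputStream to provide mark/reset. CheckedValueReader, encode, and MARK_LIMIT are hypothetical names for illustration; this is not the code in seqfile-underread-check.patch or in SequenceFile.Reader.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch, not the real SequenceFile.Reader: a per-value stream
 * that (a) marks where the value starts so it can be re-read from the front
 * with reset(), and (b) after deserializing, probes for one more byte to
 * detect an underread.
 */
final class CheckedValueReader {
    // Assumed upper bound on bytes read between mark() and reset().
    private static final int MARK_LIMIT = 1 << 20;

    private final DataInputStream valueIn;

    CheckedValueReader(InputStream valueBytes) {
        // BufferedInputStream supports mark/reset; DataInputStream delegates to it.
        this.valueIn = new DataInputStream(new BufferedInputStream(valueBytes));
        valueIn.mark(MARK_LIMIT); // remember the start of the value
    }

    /** Reads a length-prefixed byte array, standing in for a Writable's readFields. */
    byte[] getCurrentValue() throws IOException {
        valueIn.reset(); // always start from the front of the value
        int length = valueIn.readInt();
        byte[] value = new byte[length];
        valueIn.readFully(value);
        // Underread check: a fully consumed value stream returns -1 here.
        if (valueIn.read() != -1) {
            throw new IOException("value stream has unread bytes after reading "
                    + (4 + length) + " bytes");
        }
        return value;
    }

    /** Encodes a length-prefixed payload, optionally followed by extra bytes. */
    static byte[] encode(byte[] payload, int trailingBytes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(new byte[trailingBytes]); // simulates bytes the reader never consumes
        out.flush();
        return bytes.toByteArray();
    }
}
```

For example, a 2048-byte payload encoded with 4 trailing bytes makes getCurrentValue throw, while the same payload with no trailing bytes can be read repeatedly because each call resets to the marked start. Probing with read() only works when the stream ends exactly at the value boundary, which is why this sketch gives each value its own stream; a reader that shares one stream across records would need a bounded view of the value bytes instead.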