[ https://issues.apache.org/jira/browse/HADOOP-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth reassigned HADOOP-145:
------------------------------------

    Assignee: Chris Nauroth  (was: Owen O'Malley)

> io.skip.checksum.errors property clashes with LocalFileSystem#reportChecksumFailure
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-145
>                 URL: https://issues.apache.org/jira/browse/HADOOP-145
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>            Reporter: stack
>            Assignee: Chris Nauroth
>
> Below is from email to the dev list on Tue, 11 Apr 2006 14:46:09 -0700.
>
> Checksum errors seem to be a fact of life given the hardware we use. They'll
> often cause my jobs to fail, so I have been trying to figure out how to just
> skip the bad records and files. At the end is a note where Stefan pointed me
> at 'io.skip.checksum.errors'. This property, when set, triggers special
> handling of checksum errors inside SequenceFile$Reader: if a checksum error
> occurs, try to skip to the next record. Only, this behavior can conflict with
> another checksum handler that moves aside the problematic file whenever a
> checksum error is found. Below is from a recent log.
>
> 060411 202203 task_r_22esh3 Moving bad file
>   /2/hadoop/tmp/task_r_22esh3/task_m_e3chga.out to
>   /2/bad_files/task_m_e3chga.out.1707416716
> 060411 202203 task_r_22esh3 Bad checksum at 3578152. Skipping entries.
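As context for the behavior described above, the record-skipping path is switched on via the job or site configuration. A minimal configuration fragment is sketched below; the property name comes from this issue, while the surrounding file layout and the assumption that it defaults to off are mine:

```xml
<!-- hadoop-site.xml (sketch): enable skip-on-checksum-error handling
     in SequenceFile$Reader. Property name is from this issue; the
     default value being false is an assumption. -->
<configuration>
  <property>
    <name>io.skip.checksum.errors</name>
    <value>true</value>
  </property>
</configuration>
```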
> 060411 202203 task_r_22esh3 Error running child
> 060411 202203 task_r_22esh3 java.nio.channels.ClosedChannelException
> 060411 202203 task_r_22esh3     at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:89)
> 060411 202203 task_r_22esh3     at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:276)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileInputStream.seek(LocalFileSystem.java:79)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.fs.FSDataInputStream$Checker.seek(FSDataInputStream.java:67)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.seek(FSDataInputStream.java:164)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:193)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:243)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.io.SequenceFile$Reader.seek(SequenceFile.java:420)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.io.SequenceFile$Reader.sync(SequenceFile.java:431)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.io.SequenceFile$Reader.handleChecksumException(SequenceFile.java:412)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:389)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:209)
> 060411 202203 task_r_22esh3     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)
>
> (Ignore line numbers. My code is a little different from main because I've
> other debugging code inside SequenceFile. Otherwise I'm running w/ head of
> hadoop.)
>
> The SequenceFile$Reader#handleChecksumException is trying to skip to the
> next record, but the file has been closed by the move-aside.
>
> On the list there is some discussion on the merit of moving the file aside
> when a bad checksum is found.
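The failure mode in the stack trace can be reproduced in miniature outside Hadoop: one handler closes the underlying stream (as the move-aside in reportChecksumFailure effectively does), and a second handler then tries to seek past the bad record (as handleChecksumException does via sync). The sketch below is not Hadoop code; the class and method names are invented for illustration, and a plain RandomAccessFile stands in for the local FS input stream:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;

public class ChecksumClashDemo {
    /**
     * Simulates the two checksum handlers racing on one file and returns
     * the name of the exception the second handler sees.
     */
    static String seekAfterMoveAside() throws Exception {
        File f = File.createTempFile("seqfile", ".out");
        f.deleteOnExit();
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        raf.write(new byte[64]);
        FileChannel ch = raf.getChannel();

        // Handler 1 (analogous to reportChecksumFailure): the move-aside
        // closes the underlying stream, which also closes its channel.
        raf.close();

        try {
            // Handler 2 (analogous to handleChecksumException): still tries
            // to seek past the bad record -- but the channel is gone.
            ch.position(32L);
            return "no exception";
        } catch (ClosedChannelException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(seekAfterMoveAside());
    }
}
```

Either handler is reasonable on its own; the bug is that both fire on the same checksum error without coordinating ownership of the open file.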
> I've been trying to test what happens if we leave the file in place, but I
> haven't had a checksum error in a while.
>
> Opening this issue as a place to fill in experience as we go.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira