ChecksumFileSystem doesn't support hflush()/sync()/etc. -- so I can imagine that if you kill -9 it while writing, you'd get a truncated commit log, or even one where the last checksum chunk is incorrect.
Maybe best to run this test against a pseudo-distributed HDFS? Or RawLocalFileSystem?

-Todd

On Thu, Dec 1, 2011 at 10:58 PM, Mikhail Bautin <[email protected]> wrote:
> @Stack: I am using hadoop-0.20.205.0 (the default Hadoop version from
> pom.xml). There is a private getFileLength() method, but getMethod() does
> not allow us to retrieve it. We should use getDeclaredMethod() -- this
> appears to work in my testing. I will include that fix in the
> HBaseClusterTest diff. Not sure why no one saw this bug before.
>
> @Dhruba: I am running RestartMetaTest, which I am porting from 0.89-fb.
> This is a test that starts a local HBase cluster as multiple processes
> (on different ports), loads some data, and does a real kill -9 on the
> regionserver serving meta. I saw this bug in the data-loading part, not
> because of killing the regionserver.
>
> Thanks,
> --Mikhail
>
> On Thu, Dec 1, 2011 at 10:33 PM, Stack <[email protected]> wrote:
>
>> On Thu, Dec 1, 2011 at 9:59 PM, Mikhail Bautin
>> <[email protected]> wrote:
>> > 11/12/01 21:40:07 WARN wal.SequenceFileLogReader: Error while trying
>> > to get accurate file length. Truncation / data loss may occur if
>> > RegionServers die.
>> > java.lang.NoSuchMethodException:
>> > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.getFileLength()
>>
>> Your hadoop doesn't have this Mikhail?
>>
>> > Besides, even when it works, the reflection-based solution is probably
>> > _much_ slower than straightforward method access. Should we create a
>> > Hadoop patch to expose the appropriate API call and get rid of the
>> > reflection hack?
>> >
>>
>> It would be the sensible thing to do; much more sensible than the
>> reflection gymnastics we have going on here.
>>
>> St.Ack

--
Todd Lipcon
Software Engineer, Cloudera
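
For context on the getMethod()/getDeclaredMethod() distinction Mikhail mentions: getMethod() only returns public methods, while getDeclaredMethod() also finds private methods declared directly on the class (it does not search superclasses), and the resulting Method needs setAccessible(true) before it can be invoked. Below is a minimal, hypothetical Java sketch of that difference -- FakeInputChecker is just a stand-in for ChecksumFSInputChecker, not HBase's actual fix:

    import java.lang.reflect.Method;

    // Hypothetical stand-in for a stream class that declares a private getFileLength().
    class FakeInputChecker {
        private long getFileLength() {
            return 42L;
        }
    }

    public class ReflectionSketch {
        public static void main(String[] args) throws Exception {
            FakeInputChecker in = new FakeInputChecker();

            // getMethod() only sees public methods, so this throws
            // NoSuchMethodException, analogous to the WARN in the log above.
            try {
                in.getClass().getMethod("getFileLength");
            } catch (NoSuchMethodException e) {
                System.out.println("getMethod() failed: " + e);
            }

            // getDeclaredMethod() also finds private methods declared on the
            // class itself; setAccessible(true) is required before invoking it.
            Method m = in.getClass().getDeclaredMethod("getFileLength");
            m.setAccessible(true);
            long length = (Long) m.invoke(in);
            System.out.println("file length = " + length);
        }
    }

As the thread notes, reflective invocation of a private method is slower than a direct call, which is why the longer-term suggestion is a Hadoop patch exposing a public API for the file length rather than keeping the reflection hack.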
