On Sat, Jul 19, 2008 at 2:27 PM, Matthew Dillon <[EMAIL PROTECTED]> wrote:
>
> :I've been looking at the HAMMER code a bit. It seems the mount will
> :hang the kernel at recovery time if the tail of an undo record contains
> :a zero size. I've been told the filesystem is implicitly trusted, but
> :I think a failed assert would be better than the stuck while loop.
> :
> :I have a small disk image to illustrate the hang at:
> :http://leaf.dragonflybsd.org/~dion/hammer.small.bz2
> :
> :This obviously isn't a high priority, but I'm interested in hearing
> :opinions on it (does this kind of bug interest us?).
> :
> :-- Dion
>
>     I'm assuming you just poked the bits in the on-media UNDO FIFO to
>     create the failure condition and it isn't a bug per se, right?

Definitely. I was just making sure my understanding of the recovery code
was correct. The disk image was not "organically" constructed.

>
>     I think an assertion is fine, or even just have the mount return
>     a failure. Would you like to code up your patch suggestion? We
>     can commit it after the release.

I have a patch up at:
http://leaf.dragonflybsd.org/~dion/hammer-mount-badundo.patch

It consists of two small changes:

- Check that the reported tail_size is at least the size of a tail
  fifo structure (instead of at least 0) -- a bad value will now cause
  an EIO instead of a loop or panic.

- If an error occurred in hammer_recover, an io lock leak caused a panic.
  I now skip the (last) flush if an error occurred during mount. This
  seems safe -- it doesn't matter too much, you're screwed at this point.

-- Dion

>
> -Matt
> Matthew Dillon
> <[EMAIL PROTECTED]>
>
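
For anyone who wants to see the idea behind the tail_size check without
opening the patch, here is a rough user-space sketch. It is not the HAMMER
code: the struct layout and the names demo_fifo_tail and
demo_scan_backwards are made up for illustration, and the real recovery
code does considerably more. It only shows why a tail size of zero never
moves the scan offset, and how rejecting anything smaller than the tail
structure turns the hang into an EIO.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an on-media FIFO tail. */
struct demo_fifo_tail {
	uint16_t tail_signature;
	uint16_t tail_type;
	uint32_t tail_size;	/* size of the whole record, tail included */
};

/*
 * Walk backwards over the records stored in buf[0..end), using the
 * tail at the end of each record to find where that record starts.
 * Returns 0 on success or EIO if a tail reports an impossible size.
 * *nrecords receives the number of records visited before returning.
 */
int
demo_scan_backwards(const uint8_t *buf, size_t end, int *nrecords)
{
	struct demo_fifo_tail tail;
	size_t off = end;

	*nrecords = 0;
	while (off > 0) {
		if (off < sizeof(tail))
			return (EIO);
		memcpy(&tail, buf + off - sizeof(tail), sizeof(tail));

		/*
		 * A record can never be smaller than its own tail.  With
		 * only a "size >= 0" style check, a size of 0 would leave
		 * 'off' unchanged and this loop would spin forever.
		 */
		if (tail.tail_size < sizeof(tail) || tail.tail_size > off)
			return (EIO);

		off -= tail.tail_size;
		++*nrecords;
	}
	return (0);
}
```

The upper bound (tail_size > off) is an extra sanity check I added for the
sketch so the offset cannot underflow; it is not one of the two changes
described above.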
