On Tuesday December 12, [EMAIL PROTECTED] wrote:
> 
> 
> On Tue, 12 Dec 2000, Jasper Spaans wrote:
> 
> > On Mon, Dec 11, 2000 at 06:52:55PM -0800, Linus Torvalds wrote:
> > > 
> > > Ok, there it is. Noticeable changes from pre8 are mainly (a) new tq list
> > > compile fixes and (b) the NetApp snapshot thing.
> > 
> > >  - final:
> > >     - Neil Brown: raid and md cleanups
> > 
> > Hmm, while doing some not-so-heavy things with a mysql-db on a raid5-device
> > this kernel Oopsed on me; ksymoops output [which went through klogd,
> > shouldn't matter that much, klogd was using the right System.map]:
> > 
> > Dec 12 14:04:50 spaans kernel: invalid operand: 0000
> > Dec 12 14:04:50 spaans kernel: CPU:    1
> > Dec 12 14:04:50 spaans kernel: EIP:    0010:[end_buffer_io_bad+85/92]
> >
> > Dec 12 14:04:50 spaans kernel: Call Trace:
> >                     [raid5_end_buffer_io+68/128]
> >                     [complete_stripe+151/272]
> >                     [handle_stripe+331/1092]
> >                     [raid5d+173/260]
> >                     [md_thread+299/508]
> 
> Looks like somebody didn't initialize the "b_end_io" pointer - the code
> defaults to it being "end_buffer_io_bad" (which oopses unconditionally on
> purpose exactly to find places where it wasn't initialized).
> 
> And it obviously looks like it's the raid5 code that does it.

Guilt by association :-)

What this bit of code (complete_stripe/raid5_end_buffer_io) is doing
is observing that it has completed some I/O request that was made of
the raid5 device and is calling the b_end_io on the buffer_head that
it was passed.  So it is not one of raid5's buffers that has the bad
b_end_io, but someone else's buffer that raid5 was asked to service.
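
To make the ownership point concrete, here is a rough user-space sketch
(not kernel code).  The struct is a cut-down stand-in for the real
buffer_head, and the "completed"/"uptodate" fields and the owner_* and
model_* names are made up for illustration:

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct buffer_head. */
struct buffer_head {
    void (*b_end_io)(struct buffer_head *bh, int uptodate);
    int completed;   /* hypothetical: set by the owner's handler */
    int uptodate;    /* hypothetical: result passed to the handler */
};

/* The owner of the buffer (e.g. a filesystem) supplies the handler. */
static void owner_end_io(struct buffer_head *bh, int uptodate)
{
    bh->completed = 1;
    bh->uptodate = uptodate;
}

/* Owner side: initialise the buffer before submitting it for I/O. */
static void owner_prepare(struct buffer_head *bh)
{
    bh->b_end_io = owner_end_io;
    bh->completed = 0;
    bh->uptodate = 0;
}

/* raid5 side: the I/O has finished, so call whatever handler the
 * owner installed.  raid5 never assigns b_end_io on this buffer. */
static void model_raid5_end_buffer_io(struct buffer_head *bh, int uptodate)
{
    bh->b_end_io(bh, uptodate);
}
```

If the owner skips the owner_prepare() step, the call in
model_raid5_end_buffer_io() is where the failure shows up, even though
the mistake was made by whoever submitted the buffer.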

You say "things with a mysql-db on a raid5-device".  Can I interpret
this to mean that mysql was talking directly to /dev/md0, or is there
some filesystem in-between?
Either way, I expect Linus' suggestion will provide the answer.
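
As a rough user-space sketch of why the check Linus suggests (quoted
below) helps: testing b_end_io when the request is queued catches the
bad buffer while its submitter is still the caller.  The struct is a
cut-down stand-in, model_make_request() and noop_end_io() are made-up
names, and the -1 return stands in for where the kernel would BUG():

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct buffer_head. */
struct buffer_head {
    void (*b_end_io)(struct buffer_head *bh, int uptodate);
};

/* Hypothetical handler, only so a valid buffer can be built. */
static void noop_end_io(struct buffer_head *bh, int uptodate)
{
    (void)bh;
    (void)uptodate;
}

/* Model of the test at the top of generic_make_request().
 * Returns 0 if the request would be queued, or -1 at the point
 * where the kernel would BUG() because no handler was set. */
static int model_make_request(struct buffer_head *bh)
{
    if (!bh->b_end_io)
        return -1;  /* caught at queue time, in the submitter's context */
    return 0;
}
```

A buffer with b_end_io left NULL is rejected by model_make_request()
straight away, rather than being discovered later by whichever driver
happens to complete it.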

NeilBrown


> 
> It _looks_ like the raid5 code does a "generic_make_request()" without
> setting b_end_io anywhere, but I don't know the raid5 code well enough.
> 
> To get better debug output, could you please do something for me? 
> 
> In fs/buffer.c, get rid of "end_buffer_io_bad" completely, and replace all
> users of it with NULL.
> 
> Then, in drivers/block/ll_rw_block.c: generic_make_request(), add a test
> like
> 
>       if (!bh->b_end_io) BUG();
> 
> to the top of that function.
> 
> You'll still get an oops, but the difference is that you'll get the oops
> when the request is queued, rather than when the request is finished, which
> will make it easier to figure out what the thing is that leads up to this.
> 
> In the meantime I'm sure Neil can figure out where in the raid5 code we
> don't initialize the buffer head properly even without that, but it's
> worth doing the above anyway.
> 
>       Thanks,
> 
>               Linus
