>
> There does appear to be an SMP race in brw_page() which can cause
> this - end_buffer_io_async() unlocks the page, try_to_free_buffers()
> zaps the buffer_head ring and brw_page() gets a null pointer. But
> gee, it's unlikely unless you have super-fast disks and/or something
> which has a super-slow interrupt routine.
>
> Could you please provide a description of your hardware lineup?
>
> And could you please test 2.4.6 with this patch?
>
> --- linux-2.4.6/fs/buffer.c Wed Jul 4 18:21:31 2001
> +++ lk-ext3/fs/buffer.c Fri Jul 6 18:25:00 2001
> @@ -2181,8 +2181,9 @@ int brw_page(int rw, struct page *page,
>
> /* Stage 2: start the IO */
> do {
> + struct buffer_head *next = bh->b_this_page;
> submit_bh(rw, bh);
> - bh = bh->b_this_page;
> + bh = next;
> } while (bh != head);
> return 0;
> }
Howzit Andrew,
OK, I'll give the patch a try. I'll only be able to provide feedback
after about 12-24 hours though, depending on when I can reboot.
Hardware is pretty standard stuff:
Dual CPU Pentium II 233Mhz. 128MB RAM, Gigabyte motherboard (circa
1998), 20Gb IDE disks, realtek 8139 100Mb card. Pretty std stuff.
What's weird though is that this oops doesn't occur on our several
*very* busy clustered cache servers (running squid) - only one task
though (ie, squid/diskd/dnsserver), whereas the problem machines run
various apps.
Cheers
Henry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/