On 8/25/07, Mr. James W. Laferriere <[EMAIL PROTECTED]> wrote:
> Hello Dan ,
>
> On Mon, 20 Aug 2007, Dan Williams wrote:
> > On 8/18/07, Mr. James W. Laferriere <[EMAIL PROTECTED]> wrote:
> >> Hello All, here we go again, attempting to do bonnie++ testing
> >> on a small array.
> >> Kernel 2.6.22.1
> >> Patches involved ,
> >> IOP1 , 2.6.22.1-iop1 for improved sequential write performance
> >> (stripe-queue) , Dan Williams <[EMAIL PROTECTED]>
> >
> > Hello James,
> >
> > Thanks for the report.
> >
> > I tried to reproduce this on my system, no luck.
> Possibly because there are significant hardware differences?
> See 'lspci -v' below the .sig.
>
> > However it looks
> > like there is a potential race between 'handle_queue' and
> > 'add_queue_bio'. The attached patch moves these critical sections
> > under spin_lock(&sq->lock), and adds some debugging output if this BUG
> > triggers. It also includes a fix for retry_aligned_read which is
> > unrelated to this debug.
> > --
> > Dan
> Applied your patch. The same 'kernel BUG at drivers/md/raid5.c:3689!'
> messages appear (see attached). The system is still responsive with your
> patch, whereas the kernel crashed last time. However, the bonnie++ run is
> stuck in 'D' state, and a '> /md3/asdf' stays hung even after sending the
> parent process a 'kill -9'.
> If there is any further info you think I can or should gather, I will
> try to acquire it, but I'll have to repeat these steps to attempt to get
> the same results. I'll be shutting the system down after sending this off.
> FYI, the previous 'BUG' without your patch was quite repeatable.
> I might have time over the next couple of weeks to see whether this one
> is as repeatable as the last one.
>
> Contents of /proc/mdstat for md3 .
>
> md3 : active raid6 sdx1[3] sdw1[2] sdv1[1] sdu1[0] sdt1[7](S) sds1[6] sdr1[5] sdq1[4]
> 717378560 blocks level 6, 1024k chunk, algorithm 2 [7/7] [UUUUUUU]
> bitmap: 2/137 pages [8KB], 512KB chunk
>
> Commands I ran that led to the 'BUG'.
>
> bonniemd3() { /root/bonnie++-1.03a/bonnie++ -u0:0 -d /md3 -s 131072 -f; }
> bonniemd3 > 131072MB-bonnie++-run-md3-xfs.log-20070825 2>&1 &
>
Ok, the 'bitmap' and 'raid6' details were the missing pieces of my
testing. I can now reproduce this bug in handle_queue. I'll keep you
posted on what I find.
Thank you for tracking this.
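
If the hang shows up again on a later run, something along these lines
might help capture which tasks are stuck in 'D' state and where they are
sleeping. This is only a sketch, not a tested procedure for this bug:
'list_d_state' is a made-up helper name, and it assumes that
/proc/<pid>/status and /proc/<pid>/wchan are readable on your kernel.

```shell
# Hypothetical helper: print PID, name and wait channel for every task
# in uninterruptible sleep ('D'). Takes an optional proc root (default /proc).
list_d_state() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        # Skip entries that vanished or cannot be read.
        [ -r "$dir/status" ] || continue
        state=$(awk '/^State:/ {print $2; exit}' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        name=$(awk '/^Name:/ {print $2; exit}' "$dir/status" 2>/dev/null)
        # wchan may be missing or unreadable; print '-' in that case.
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        printf '%s\t%s\t%s\n' "${dir##*/}" "$name" "${wchan:--}"
    done
}
```

Running 'list_d_state' with no argument and saving the output alongside
the dmesg log should be enough; the wait channel column is only a hint at
where each task is blocked.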
Regards,
Dan