On Monday February 5, [EMAIL PROTECTED] wrote:
> On Mon, 05 Feb 2007 20:08:39 -0800 "Kai" <[EMAIL PROTECTED]> wrote:
> 
> You hit two bugs.  It seems that raid5 is submitting BIOs which are larger
> than the device can accept.  In response, something (probably the block layer)
> caused a page to come unlocked twice, possibly by running bi_end_io twice
> against the same BIO.

At least two bugs... there should be a prize for that :-)

Raid5 was definitely submitting a bio that was too big for the device,
and then when it got an error and went to retry it the old-fashioned way
(lots of little bios through the stripe cache) it messed up.
Whether that is what triggered the double-unlock I'm not yet sure.
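
The oversize check itself is just unit bookkeeping: bi_size is in bytes,
while a queue's max_sectors limit is in 512-byte sectors.  Here is a toy
user-space model of that first test in the patch - stand-in struct and
field names borrowed from the kernel for readability, but this is not
kernel code:

```c
#include <assert.h>

/* Stand-in types: the field names mirror struct bio / request_queue,
 * but these are user-space toys, not the kernel's definitions. */
struct toy_bio   { unsigned int bi_size; };      /* bytes */
struct toy_queue { unsigned int max_sectors; };  /* 512-byte sectors */

/* Mirrors the first test in bio_fits_rdev(): convert the bio's size
 * from bytes to sectors (>> 9) and compare against the queue limit. */
static int toy_bio_fits(const struct toy_bio *bi, const struct toy_queue *q)
{
        return (bi->bi_size >> 9) <= q->max_sectors;
}
```

So a 128k bio (256 sectors) is rejected by a queue advertising a
255-sector max_sectors, which is exactly the shape of failure being
reported here.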

This patch should fix the worst of the offences, but I'd like to
experiment and think a bit more before I submit it to stable.
And probably test it too - as yet I have only compile-tested and
brain-tested it.

What is the chunk-size on your raid5?  Presumably at least 128k?
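
(Why 128k matters, assuming the usual 512-byte sector: the sector count
of a chunk is just its size in bytes shifted down by 9.  A quick sanity
check, where the 255-sector cap is an illustrative device limit, not
something the report confirms:)

```c
#include <assert.h>

/* A sector is 512 bytes, so a chunk of chunk_kb kilobytes spans
 * (chunk_kb * 1024) >> 9 = chunk_kb * 2 sectors. */
static unsigned int chunk_sectors(unsigned int chunk_kb)
{
        return (chunk_kb * 1024) >> 9;
}
```

A 128k chunk is 256 sectors - one more than a 255-sector max_sectors
cap - so a whole-chunk aligned read would be the first request size to
trip such a limit, while a 64k chunk (128 sectors) would sail through.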

NeilBrown



### Diffstat output
 ./drivers/md/raid5.c |   40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c    2007-02-06 16:16:39.000000000 +1100
+++ ./drivers/md/raid5.c        2007-02-06 16:20:57.000000000 +1100
@@ -2669,6 +2669,27 @@ static int raid5_align_endio(struct bio 
        return 0;
 }
 
+static int bio_fits_rdev(struct bio *bi)
+{
+       request_queue_t *q = bdev_get_queue(bi->bi_bdev);
+
+       if ((bi->bi_size>>9) > q->max_sectors)
+               return 0;
+       blk_recount_segments(q, bi);
+       if (bi->bi_phys_segments > q->max_phys_segments ||
+           bi->bi_hw_segments > q->max_hw_segments)
+               return 0;
+
+       if (q->merge_bvec_fn)
+               /* it's too hard to apply the merge_bvec_fn at this stage,
+                * so just give up
+                */
+               return 0;
+
+       return 1;
+}
+
+
 static int chunk_aligned_read(request_queue_t *q, struct bio * raid_bio)
 {
        mddev_t *mddev = q->queuedata;
@@ -2715,6 +2736,13 @@ static int chunk_aligned_read(request_qu
                align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
                align_bi->bi_sector += rdev->data_offset;
 
+               if (!bio_fits_rdev(align_bi)) {
+                       /* too big in some way */
+                       bio_put(align_bi);
+                       rdev_dec_pending(rdev, mddev);
+                       return 0;
+               }
+
                spin_lock_irq(&conf->device_lock);
                wait_event_lock_irq(conf->wait_for_stripe,
                                    conf->quiesce == 0,
@@ -3107,7 +3135,9 @@ static int  retry_aligned_read(raid5_con
        last_sector = raid_bio->bi_sector + (raid_bio->bi_size>>9);
 
        for (; logical_sector < last_sector;
-            logical_sector += STRIPE_SECTORS, scnt++) {
+            logical_sector += STRIPE_SECTORS,
+                    sector += STRIPE_SECTORS,
+                    scnt++) {
 
                if (scnt < raid_bio->bi_hw_segments)
                        /* already done this stripe */
@@ -3123,7 +3153,13 @@ static int  retry_aligned_read(raid5_con
                }
 
                set_bit(R5_ReadError, &sh->dev[dd_idx].flags);
-               add_stripe_bio(sh, raid_bio, dd_idx, 0);
+               if (!add_stripe_bio(sh, raid_bio, dd_idx, 0)) {
+                       release_stripe(sh);
+                       raid_bio->bi_hw_segments = scnt;
+                       conf->retry_read_aligned = raid_bio;
+                       return handled;
+               }
+
                handle_stripe(sh, NULL);
                release_stripe(sh);
                handled++;
-
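
(One non-obvious trick in that last hunk: raid_bio->bi_hw_segments is
reused as a progress counter, so when add_stripe_bio() fails on a busy
stripe the bio can be parked on conf->retry_read_aligned and later
resumed past the stripes already handled.  A user-space sketch of that
resume pattern - all names invented for illustration, with a busy[]
array standing in for add_stripe_bio() failing:)

```c
#include <assert.h>

enum { NSTRIPES = 8 };

/* Toy re-creation of the resume logic in retry_aligned_read():
 * *bi_hw_segments counts stripes already completed, so a retry can
 * skip straight past the work that succeeded on an earlier pass. */
static int process_stripes(const int stripe_busy[NSTRIPES],
                           unsigned int *bi_hw_segments)
{
        unsigned int scnt;
        int handled = 0;

        for (scnt = 0; scnt < NSTRIPES; scnt++) {
                if (scnt < *bi_hw_segments)
                        continue;               /* already done this stripe */
                if (stripe_busy[scnt]) {
                        *bi_hw_segments = scnt; /* remember progress */
                        return handled;         /* caller re-queues the bio */
                }
                handled++;
        }
        *bi_hw_segments = NSTRIPES;
        return handled;
}
```

First pass handles stripes 0-2, hits a busy stripe 3 and records the
position; the retry pass skips 0-2 and finishes 3-7.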