Re: [patch v2 1/1] md: Software Raid autodetect dev list not array
On 8/26/07, Kyle Moffett <[EMAIL PROTECTED]> wrote:
> On Aug 26, 2007, at 08:20:45, Michael Evans wrote:
> > Also, I forgot to mention, the reason I added the counters was
> > mostly for debugging.  However they're also as useful in the same
> > way that listing the partitions when a new disk is added can be (in
> > fact this augments that and the existing messages the autodetect
> > routines provide).
> >
> > As for using autodetect or not... the only way to skip it seems to
> > be compiling md's raid support as a module.  I checked 2.6.22's
> > menuconfig and there's no way for me to explicitly turn it on or
> > off at compile time.  I also feel that forcing the addition of a
> > boot parameter to de-activate a broken and deprecated system you
> > aren't even aware you are getting is somehow wrong.  So if you have
> > over 128 devices for it to scan, as I do on one of my PCs, then it
> > can bring up an array in degraded mode. ... crud.
>
> Well, you could just change the MSDOS disk label to use a different
> "Partition Type" for your raid partitions.  Just pick the standard
> "Linux" type and you will get exactly the same behavior that
> everybody who doesn't use MSDOS partition tables gets.
>
> Cheers,
> Kyle Moffett

I recall most of the guides I referenced during setup having me change the partition type; additionally, parted only calls the flag 'raid', not 'raid autodetect'.  Either way it would still be confusing to anyone not intimately familiar with the subsystem.

Also, even though the system has a standard PC BIOS, I liked some of what the GUID partition table (GPT) specification provides: namely a checksum, a backup copy, and up to 128 partitions per disk.  Parted is the only real tool I could find for editing such a disk label.

A problem I experienced that is almost completely unrelated to this patch is another set of 'magic number' assumptions.  It is rather unfortunate that Linux allocates a fixed (and very small) number of partitions per SCSI disk.

A better method might be a complete dissociation of the major:minor pair from the device name, instead simply enumerating partitions as they are detected.  That way, if I choose to have 128 drives but each having at most 2 partitions, I still easily fit within one major, and if I choose the opposite (for whatever reason) I still come to the same conclusion.  Before anyone mentions using LVM instead: sharing operating systems, and using different partitions for different raid stripe sizes/rebuilding flexibility/restriping flexibility, are just two good reasons I can think of for supporting more than 15 partitions per device.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: Patch for boot-time assembly of v1.x-metadata-based soft (MD) arrays: reasoning and future plans
Dan Williams wrote:
> On 8/26/07, Abe Skolnik <[EMAIL PROTECTED]> wrote:
> > > Because you can rely on the configuration file to be certain about
> > > which disks to pull in and which to ignore.  Without the config
> > > file the auto-detect routine may not always do the right thing
> > > because it will need to make assumptions.
> >
> > But kernel parameters can provide the same data, no?  After all, it is
>
> Yes, you can get a similar effect to the config file by adding
> parameters to the kernel command line.  My only point is that if the
> initramfs update tools were as simple as:
>
>   mkinitrd "root=/dev/md0 md=0,v1,/dev/sda1,/dev/sdb1,/dev/sdc1"
>
> ...then using an initramfs becomes the same amount of work as editing
> /etc/grub.conf.

I'm not sure if you're aware of this, so in the spirit of being helpful I thought I'd point out that this has been discussed quite extensively in the past; you may want to take a look at the list archives.  It's quite complex, and whilst I too would like autodetect to continue to work (I've been 'bitten' by the new superblocks not autodetecting), I accept the arguments of those wiser than me.

I agree that the problem is one of old dogs and the new (!) initram trick.  Look at it as an opportunity to learn more about a cool capability.

David
md raid acceleration and the async_tx api
Hello,

I tested the h/w accelerated RAID-5 using a kernel with PAGE_SIZE set to 64KB and found that the bonnie++ application hangs during the "Re-writing" test.  I investigated and discovered that the hang occurs because one of the mpage_end_io_read() calls is missing (these are the callbacks initiated from the ops_complete_biofill() function).

The fact is that my low-level ADMA driver (the ppc440spe one) successfully initiated the ops_complete_biofill() callback, but ops_complete_biofill() itself skipped calling the bi_end_io() handler of the completed bio (the current dev->read), because during the processing of this bio some other request had come in to the sh (the current dev_q->toread).  Thus ops_complete_biofill() scheduled another biofill operation which, as a result, overwrote the unacknowledged bio (dev->read in ops_run_biofill()), and so we lost the previous dev->read bio completely.

Here is a patch that solves this problem.  Perhaps this might be implemented in a more elegant and efficient way.  What are your thoughts regarding this?

Regards, Yuri

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 08b4893..7abc96b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -838,11 +838,24 @@ static void ops_complete_biofill(void *stripe_head_ref)
 		/* acknowledge completion of a biofill operation */
 		/* and check if we need to reply to a read request */
-		if (test_bit(R5_Wantfill, &dev_q->flags) && !dev_q->toread) {
+		if (test_bit(R5_Wantfill, &dev_q->flags)) {
 			struct bio *rbi, *rbi2;
 			struct r5dev *dev = &sh->dev[i];

-			clear_bit(R5_Wantfill, &dev_q->flags);
+			/* There is a chance that another fill operation
+			 * had been scheduled for this dev while we
+			 * processed sh. In this case do one of the following
+			 * alternatives:
+			 * - if there is no active completed biofill for the dev
+			 *   then go to the next dev leaving Wantfill set;
+			 * - if there is active completed biofill for the dev
+			 *   then ack it but leave Wantfill set.
+			 */
+			if (dev_q->toread && !dev->read)
+				continue;
+
+			if (!dev_q->toread)
+				clear_bit(R5_Wantfill, &dev_q->flags);

 			/* The access to dev->read is outside of the
 			 * spin_lock_irq(&conf->device_lock), but is protected
Re: Raid5 Reshape gone wrong, please help
On Thursday August 23, [EMAIL PROTECTED] wrote:
>
> OK I've reproduced the original issue on a seperate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays.  MDadm 2.6.3

No, you are right.  It doesn't.

Obviously insufficient testing and review - thanks for finding it for us.

This patch seems to make it work - raid5 and raid6.

Dan: Could you check it for me, particularly the moving of
+			async_tx_ack(tx);
+			dma_wait_for_async_tx(tx);
outside of the loop.

Greg: could you please check it works for you too - it works for me, but double-testing never hurts.

Thanks again,
NeilBrown

-
Fix some bugs with growing raid5/raid6 arrays.

### Diffstat output
 ./drivers/md/raid5.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2007-08-24 16:36:22.0 +1000
+++ ./drivers/md/raid5.c	2007-08-27 20:50:57.0 +1000
@@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
 		struct dma_async_tx_descriptor *tx = NULL;
 		clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
 		for (i = 0; i < sh->disks; i++)
-			if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
+			if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
 				int dd_idx, pd_idx, j;
 				struct stripe_head *sh2;

@@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
 			set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
 			for (j = 0; j < conf->raid_disks; j++)
 				if (j != sh2->pd_idx &&
-				    (r6s && j != r6s->qd_idx) &&
+				    (!r6s || j != raid6_next_disk(sh2->pd_idx,
+								  sh2->disks)) &&
 				    !test_bit(R5_Expanded, &sh2->dev[j].flags))
 					break;
 			if (j == conf->raid_disks) {
@@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
 			}
 			release_stripe(sh2);

-			/* done submitting copies, wait for them to complete */
-			if (i + 1 >= sh->disks) {
-				async_tx_ack(tx);
-				dma_wait_for_async_tx(tx);
-			}
 		}
+	/* done submitting copies, wait for them to complete */
+	if (tx) {
+		async_tx_ack(tx);
+		dma_wait_for_async_tx(tx);
+	}
 }

 /*
@@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
 		sh->disks = conf->raid_disks;
 		sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
			conf->raid_disks);
-		s.locked += handle_write_operations5(sh, 0, 1);
+		s.locked += handle_write_operations5(sh, 1, 1);
 	} else if (s.expanded &&
		!test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
		clear_bit(STRIPE_EXPAND_READY, &sh->state);
Re: [md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
Hi Dan,

I think you have a bug in this function: list_splice_init adds the new slots at the head of chain_node, but you get the old_chain_tail (latest descriptor) from the tail of the chain!

> +static dma_cookie_t
> +iop_adma_tx_submit(struct dma_async_tx_descriptor *tx)
> +{
> +
> +	old_chain_tail = list_entry(iop_chan->chain.prev,
> +		struct iop_adma_desc_slot, chain_node);
> +	list_splice_init(&sw_desc->group_list, &old_chain_tail->chain_node);
> +
> +	/* fix up the hardware chain */
> +	iop_desc_set_next_desc(old_chain_tail, grp_start->phys);
> +
Re: recovery starts from 0 after reboot - normal?
Tomasz Chmielewski schrieb:
> Justin Piszcz schrieb:
> > According to the fine manual, BITMAP CHANGES belong to the grow mode.
>
> So, let's try to do what the manual says - try to add a bitmap to the
> active array:
>
>   # mdadm --grow /dev/md0 --bitmap=internal
>   mdadm: failed to set internal bitmap.
>   # dmesg
>   md: couldn't update array info. -16
>
> So, either I don't understand the manual, or there are more caveats in it.
(...)
> Check: http://lkml.org/lkml/2007/6/17/235

It's an external bitmap; I'd prefer an internal one, as my root filesystem is on a flash-IDE disk.  However, let's try the external bitmap:

  # mdadm --grow /dev/md0 --bitmap=/bitmapfile
  mdadm: Cannot set bitmap file for /dev/md0: Device or resource busy

So, it also fails.  Perhaps because the array still rebuilds?  It looks like it only works when the array is in a clean state (i.e. not rebuilding anymore).  Just for reference:

  # mdadm --grow --bitmap=/bitmapfile /dev/md0
  # dmesg -c
  md0: bitmap file is out of date (0 < 43130) -- forcing full recovery
  md0: bitmap file is out of date, doing full recovery
  md0: bitmap initialized from disk: read 24/24 pages, set 763103 bits, status: 0
  created bitmap (373 pages) for device md0

  # mdadm --grow /dev/md0 --bitmap=internal
  mdadm: /dev/md0 already has a bitmap (/bitmapfile)
  # mdadm --grow /dev/md0 --bitmap=none
  # mdadm --grow /dev/md0 --bitmap=internal
  # dmesg -c
  md0: bitmap file is out of date (0 < 43130) -- forcing full recovery
  md0: bitmap file is out of date, doing full recovery
  md0: bitmap initialized from disk: read 12/12 pages, set 381552 bits, status: 0
  created bitmap (187 pages) for device md0

What are the pages/bits in a bitmap?  They seem to differ depending on whether internal or external is used.

--
Tomasz Chmielewski
http://wpkg.org
Re: raid10 or raid1+0 ?
On Mon, 27 Aug 2007, "T. Eichstädt" wrote:

> Hallo all,
>
> I have 4 HDDs and I want to use mirroring and striping.  I am wondering
> what the difference between the following two solutions is:
> - raid0 on top of 2 raid1 devices (raid1+0)
> - directly using the raid10 module
>
> Perhaps someone can give me a hint as to what the raid10 Linux module
> does differently from the combination of raid1 and raid0.
>
> Another question: how stable is the raid10 module in the Linux kernel?
> I am currently using the Debian kernel 2.6.18, but I saw some patches
> from Neil and others for 2.6.21 and 2.6.22 regarding the raid10 module,
> and their descriptions sound as if it could be useful to integrate them
> when using raid10.  Okay, backporting isn't complicated, but
> nevertheless the number of patches makes me feel that raid10 is perhaps
> not as stable as raid0 and raid1???  Perhaps someone can blow away my
> fear :)
>
> Best regards
> Thimo Eichstädt

I have a few people who have asked me this as well: RAID10 or similar (SW).  I am not so sure; with RAID1 you can have your root disks on it and boot from it using LILO/GRUB, and it is proven pretty stable.  Can the same be said about RAID10?
RE: Raid5 Reshape gone wrong, please help
> From: Neil Brown [mailto:[EMAIL PROTECTED]]
> On Thursday August 23, [EMAIL PROTECTED] wrote:
> >
> > OK I've reproduced the original issue on a seperate box.
> > 2.6.23-rc3 does not like to grow Raid 5 arrays.  MDadm 2.6.3
>
> No, you are right.  It doesn't.
>
> Obviously insufficient testing and review - thanks for finding it for us.
>

Agreed - seconded.

> This patch seems to make it work - raid5 and raid6.
>
> Dan: Could you check it for me, particularly the moving of
> +			async_tx_ack(tx);
> +			dma_wait_for_async_tx(tx);
> outside of the loop.
>

Yes, this definitely needs to be outside the loop.

> Greg: could you please check it works for you too - it works for me,
> but double-testing never hurts.
>
> Thanks again,
> NeilBrown
>
> -
> Fix some bugs with growing raid5/raid6 arrays.
>
> ### Diffstat output
>  ./drivers/md/raid5.c |   17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
> --- .prev/drivers/md/raid5.c	2007-08-24 16:36:22.0 +1000
> +++ ./drivers/md/raid5.c	2007-08-27 20:50:57.0 +1000
> @@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
>  		struct dma_async_tx_descriptor *tx = NULL;
>  		clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
>  		for (i = 0; i < sh->disks; i++)
> -			if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
> +			if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
>  				int dd_idx, pd_idx, j;
>  				struct stripe_head *sh2;
>
> @@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
>  			set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
>  			for (j = 0; j < conf->raid_disks; j++)
>  				if (j != sh2->pd_idx &&
> -				    (r6s && j != r6s->qd_idx) &&
> +				    (!r6s || j != raid6_next_disk(sh2->pd_idx,
> +								  sh2->disks)) &&
>  				    !test_bit(R5_Expanded, &sh2->dev[j].flags))
>  					break;
>  			if (j == conf->raid_disks) {
> @@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
>  			}
>  			release_stripe(sh2);
>
> -			/* done submitting copies, wait for them to complete */
> -			if (i + 1 >= sh->disks) {
> -				async_tx_ack(tx);
> -				dma_wait_for_async_tx(tx);
> -			}
>  		}
> +	/* done submitting copies, wait for them to complete */
> +	if (tx) {
> +		async_tx_ack(tx);
> +		dma_wait_for_async_tx(tx);
> +	}
>  }
>
>  /*
> @@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
>  		sh->disks = conf->raid_disks;
>  		sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
>  			conf->raid_disks);
> -		s.locked += handle_write_operations5(sh, 0, 1);
> +		s.locked += handle_write_operations5(sh, 1, 1);

How about for clarity:

		s.locked += handle_write_operations5(sh, RECONSTRUCT_WRITE, 1);

>  	} else if (s.expanded &&
>  		!test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
>  		clear_bit(STRIPE_EXPAND_READY, &sh->state);

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
RE: md raid acceleration and the async_tx api
> From: Yuri Tikhonov [mailto:[EMAIL PROTECTED]]
>
> Hello,
>
> I tested the h/w accelerated RAID-5 using the kernel with PAGE_SIZE set
> to 64KB and found the bonnie++ application hangs up during the
> "Re-writing" test.  I made some investigations and discovered that the
> hang-up occurs because one of the mpage_end_io_read() calls is missing
> (these are the callbacks initiated from the ops_complete_biofill()
> function).
>
> The fact is that my low-level ADMA driver (the ppc440spe one)
> successfully initiated the ops_complete_biofill() callback but the
> ops_complete_biofill() function itself skipped calling the bi_end_io()
> handler of the completed bio (current dev->read) because during
> processing of this (current dev->read) bio some other request had come
> to the sh (current dev_q->toread).  Thus ops_complete_biofill()
> scheduled another biofill operation which, as a result, overwrote the
> unacknowledged bio (dev->read in ops_run_biofill()), and so we lost the
> previous dev->read bio completely.
>
> Here is a patch that solves this problem.  Perhaps this might be
> implemented in some more elegant and effective way.  What are your
> thoughts regarding this?
>
> Regards, Yuri
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 08b4893..7abc96b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -838,11 +838,24 @@ static void ops_complete_biofill(void *stripe_head_ref)
>  		/* acknowledge completion of a biofill operation */
>  		/* and check if we need to reply to a read request */
> -		if (test_bit(R5_Wantfill, &dev_q->flags) && !dev_q->toread) {
> +		if (test_bit(R5_Wantfill, &dev_q->flags)) {
>  			struct bio *rbi, *rbi2;
>  			struct r5dev *dev = &sh->dev[i];
>
> -			clear_bit(R5_Wantfill, &dev_q->flags);
> +			/* There is a chance that another fill operation
> +			 * had been scheduled for this dev while we
> +			 * processed sh. In this case do one of the following
> +			 * alternatives:
> +			 * - if there is no active completed biofill for the dev
> +			 *   then go to the next dev leaving Wantfill set;
> +			 * - if there is active completed biofill for the dev
> +			 *   then ack it but leave Wantfill set.
> +			 */
> +			if (dev_q->toread && !dev->read)
> +				continue;
> +
> +			if (!dev_q->toread)
> +				clear_bit(R5_Wantfill, &dev_q->flags);
>
>  			/* The access to dev->read is outside of the
>  			 * spin_lock_irq(&conf->device_lock), but is protected

This still looks racy...  I think the complete fix is to make the R5_Wantfill and dev_q->toread accesses atomic.  Please test the following patch (also attached) and let me know if it fixes what you are seeing:

Applies on top of git://lost.foo-projects.org/~dwillia2/git/iop md-for-linus

---
raid5: fix ops_complete_biofill race in the asynchronous offload case

Protect against dev_q->toread toggling while testing and clearing the R5_Wantfill bit.  This change prevents all asynchronous completions (tasklets) from running during handle_stripe.
---
 drivers/md/raid5.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2f9022d..91c14c6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -824,6 +824,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
 		(unsigned long long)sh->sector);

 	/* clear completed biofills */
+	spin_lock(&sq->lock);
 	for (i = sh->disks; i--; ) {
 		struct r5dev *dev = &sh->dev[i];
 		struct r5_queue_dev *dev_q = &sq->dev[i];
@@ -861,6 +862,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
 	}
 	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
 	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+	spin_unlock(&sq->lock);

 	return_io(return_bi);

@@ -2279,7 +2281,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
 		(unsigned long long)bi->bi_sector,
 		(unsigned long long)sq->sector);

-	spin_lock(&sq->lock);
+	spin_lock_bh(&sq->lock);
 	spin_lock_irq(&conf->device_lock);
 	sh = sq->sh;
 	if (forwrite) {
@@ -2306,7 +2308,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
 	*bip = bi;
 	bi->bi_phys_segments ++;
 	spin_unlock_irq(&conf->device_lock);
-	spin_unlock(&sq->lock);
+	spin_unlock_bh(&sq->lock);

 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
 		(unsigned long long)bi->bi_sector,
@@ -2339,7 +2341,7 @@ static in
RE: [md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
> From: saeed bishara [mailto:[EMAIL PROTECTED]]
> Hi Dan,
> I think you have a bug in this function, the list_splice_init adds the
> new slots in the head of the chain_node, but you get the
> old_chain_tail (latest descriptor) from the tail of the chain!!
>
> > +static dma_cookie_t
> > +iop_adma_tx_submit(struct dma_async_tx_descriptor *tx)
> > +{
> > +
> > +	old_chain_tail = list_entry(iop_chan->chain.prev,
> > +		struct iop_adma_desc_slot, chain_node);
> > +	list_splice_init(&sw_desc->group_list, &old_chain_tail->chain_node);
> > +
> > +	/* fix up the hardware chain */
> > +	iop_desc_set_next_desc(old_chain_tail, grp_start->phys);
> > +

[ dropped akpm, davem, jeff, and chris from the cc ]

This looks ok to me.  &sw_desc->group_list is the head of a new list of descriptors to be added to the chain.  old_chain_tail is the youngest descriptor at the end of the current channel chain.

Before list_splice_init:
	old_chain_tail->next = channel_chain_head
	channel_chain_head->prev = old_chain_tail
	new_chain_start->prev = sw_desc->group_list
	new_chain_end->next = sw_desc->group_list

After list_splice_init:
	new_chain_start->prev = old_chain_tail
	old_chain_tail->next = new_chain_start
	new_chain_end->next = channel_chain_head
	channel_chain_head->prev = new_chain_end
	sw_desc->group_list is empty

--
Dan
[2.6 patch] bitmap.h: remove dead artefacts
bitmap_active() no longer exists and BITMAP_ACTIVE is no longer used.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
 include/linux/raid/bitmap.h |    2 --
 1 file changed, 2 deletions(-)

f2c10857c60e13e63c67c656713b50b18f2bc1e4
diff --git a/include/linux/raid/bitmap.h b/include/linux/raid/bitmap.h
index 75e17a0..306a1d1 100644
--- a/include/linux/raid/bitmap.h
+++ b/include/linux/raid/bitmap.h
@@ -138,7 +138,6 @@ typedef __u16 bitmap_counter_t;

 /* use these for bitmap->flags and bitmap->sb->state bit-fields */
 enum bitmap_state {
-	BITMAP_ACTIVE = 0x001, /* the bitmap is in use */
 	BITMAP_STALE = 0x002,  /* the bitmap file is out of date or had -EIO */
 	BITMAP_WRITE_ERROR = 0x004, /* A write error has occurred */
 	BITMAP_HOSTENDIAN = 0x8000,
@@ -258,7 +257,6 @@ struct bitmap {
 int bitmap_create(mddev_t *mddev);
 void bitmap_flush(mddev_t *mddev);
 void bitmap_destroy(mddev_t *mddev);
-int bitmap_active(struct bitmap *bitmap);
 char *file_path(struct file *file, char *buf, int count);
 void bitmap_print_sb(struct bitmap *bitmap);
Re: raid5:md3: kernel BUG , followed by , Silent halt .
On 8/25/07, Mr. James W. Laferriere <[EMAIL PROTECTED]> wrote:
> Hello Dan,
>
> On Mon, 20 Aug 2007, Dan Williams wrote:
> > On 8/18/07, Mr. James W. Laferriere <[EMAIL PROTECTED]> wrote:
> > > Hello All,  Here we go again.  Again attempting to do bonnie++
> > > testing on a small array.
> > > Kernel 2.6.22.1
> > > Patches involved,
> > > IOP1, 2.6.22.1-iop1 for improved sequential write performance
> > > (stripe-queue), Dan Williams <[EMAIL PROTECTED]>
> >
> > Hello James,
> >
> > Thanks for the report.
> >
> > I tried to reproduce this on my system, no luck.
>
> Possibly because there is significant hardware differances?
> See 'lspci -v' below .sig.
>
> > However it looks like there is a potential race between 'handle_queue'
> > and 'add_queue_bio'.  The attached patch moves these critical sections
> > under spin_lock(&sq->lock), and adds some debugging output if this BUG
> > triggers.  It also includes a fix for retry_aligned_read which is
> > unrelated to this debug.
> > --
> > Dan
>
> Applied your patch.  The same 'kernel BUG at drivers/md/raid5.c:3689!'
> messages appear (see attached).  The system is still responsive with
> your patch; the kernel crashed last time.  Tho the bonnie++ run is stuck
> in 'D', and doing a '> /md3/asdf' stays hung even after passing the
> parent process a 'kill -9'.
> Any further info you can think of I can/should acquire, I will try to.
> But I'll have to repeat these steps to attempt to get the same results.
> I'll be shutting the system down after sending this off.
> Fyi, the previous 'BUG' without your patch was quite repeatable.
> I might have time over the next couple of weeks to see if it is as
> repeatable as the last one.
>
> Contents of /proc/mdstat for md3:
>
> md3 : active raid6 sdx1[3] sdw1[2] sdv1[1] sdu1[0] sdt1[7](S) sds1[6] sdr1[5] sdq1[4]
>       717378560 blocks level 6, 1024k chunk, algorithm 2 [7/7] [UUU]
>       bitmap: 2/137 pages [8KB], 512KB chunk
>
> Commands I ran that lead to the 'BUG':
>
> bonniemd3() { /root/bonnie++-1.03a/bonnie++ -u0:0 -d /md3 -s 131072 -f; }
> bonniemd3 > 131072MB-bonnie++-run-md3-xfs.log-20070825 2>&1 &

Ok, the 'bitmap' and 'raid6' details were the missing pieces of my testing.  I can now reproduce this bug in handle_queue.  I'll keep you posted on what I find.  Thank you for tracking this.

Regards,
Dan
[patch v3 1/1] md: Software Raid autodetect dev list not array
From: Michael J. Evans <[EMAIL PROTECTED]>

In current release kernels the md module (Software RAID) uses a static array (dev_t[128]) to store partition/device info temporarily for autostart.

This patch replaces that static array with a list.

Signed-off-by: Michael J. Evans <[EMAIL PROTECTED]>
---
Sorry, it looks like I hit reply instead of reply-to-all yesterday.

Version 3: md_autodetect_dev failure message is now more usefully verbose.
  Removed unused i_found that was left over from initial verification.
  Thank you Randy Dunlap for pointing out where INT_MAX was; fixme fixed.
Version 2: Using list_add_tail, and corrected missing i_passed++;.
  Removed sections of code that would never be reached.
Version 1: The data/structures are only used within md.c, and very close
  together.  However I wonder if the structural information shouldn't go
  into ../../include/linux/raid/md_k.h instead.

I discovered this (and that the devices are added as disks/partitions are discovered at boot) while I was debugging why only one of my MD arrays would come up whole, while all the others were short a disk.  I eventually discovered that it was enumerating through all 9 of my 11 hds (2 had only 4 partitions apiece) while the other 9 have 15 partitions (I wanted 64 per drive...).  The last partition of the 8th drive in my 9-drive raid5 sets wasn't added, thus making the final md array short both a parity and a data disk, and it was started later, elsewhere.

Subject: [patch 1/1] md: Software Raid autodetect dev list not array

SOFTWARE RAID (Multiple Disks) SUPPORT
P:	Ingo Molnar
M:	[EMAIL PROTECTED]
P:	Neil Brown
M:	[EMAIL PROTECTED]
L:	linux-raid@vger.kernel.org
S:	Supported

Unless you have a reason NOT to do so, CC [EMAIL PROTECTED]

12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
    CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
    CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all
    simultaneously enabled.
It has been tested with CONFIG_SMP set and unset (different x86_64 systems).
It has been tested with CONFIG_PREEMPT set and unset (same system).
CONFIG_LBD isn't even an option in my .config file.

Note: between 2.6.22 and 2.6.23-rc3-git5
	rdev = md_import_device(dev,0, 0);
became
	rdev = md_import_device(dev,0, 90);
So the patch has been edited to patch around that line. (might be fuzzy)

Signed-off-by: Michael J. Evans <[EMAIL PROTECTED]>
=
--- linux/drivers/md/md.c.orig	2007-08-21 03:19:42.511576248 -0700
+++ linux/drivers/md/md.c	2007-08-21 04:30:09.775525710 -0700
@@ -24,4 +24,6 @@

+   - autodetect dev list not array: Michael J. Evans <[EMAIL PROTECTED]>
+
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2, or (at your option)
@@ -5752,13 +5754,26 @@ void md_autodetect_dev(dev_t dev)
  * Searches all registered partitions for autorun RAID arrays
  * at boot time.
  */
-static dev_t detected_devices[128];
-static int dev_cnt;
+
+static LIST_HEAD(all_detected_devices);
+struct detected_devices_node {
+	struct list_head list;
+	dev_t dev;
+};

 void md_autodetect_dev(dev_t dev)
 {
-	if (dev_cnt >= 0 && dev_cnt < 127)
-		detected_devices[dev_cnt++] = dev;
+	struct detected_devices_node *node_detected_dev;
+	char strbuf[BDEVNAME_SIZE];
+
+	node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);\
+	if (node_detected_dev) {
+		node_detected_dev->dev = dev;
+		list_add_tail(&node_detected_dev->list, &all_detected_devices);
+	} else {
+		printk(KERN_CRIT "md: md_autodetect_dev: kzAlloc node failed"
+		" (null return), skipping dev(%d,%d)\n", MAJOR(dev), MINOR(dev));
+	}
 }

@@ -5765,7 +5760,12 @@ static void autostart_arrays(int part)
 static void autostart_arrays(int part)
 {
 	mdk_rdev_t *rdev;
-	int i;
+	struct detected_devices_node *node_detected_dev;
+	dev_t dev;
+	int i_scanned, i_passed;
+
+	i_scanned = 0;
+	i_passed = 0;

 	printk(KERN_INFO "md: Autodetecting RAID arrays.\n");

@@ -5772,3 +5792,7 @@ static void autostart_arrays(int part)
-	for (i = 0; i < dev_cnt; i++) {
-		dev_t dev = detected_devices[i];
-
+	while (!list_empty(&all_detected_devices) && i_scanned < INT_MAX) {
+		i_scanned++;
+		node_detected_dev = list_entry(all_detected_devices.next,
+					struct detected_devices_node, list);
+		list_del(&node_detected_dev->list);
+		dev = node_detected_dev->dev;
+		kfree(node_detected_dev);
@@ -5781,8 +5807,11 @@ static void autostart_arrays(int part)
 			continue;
Re: [patch v3 1/1] md: Software Raid autodetect dev list not array
On Mon, 27 Aug 2007 15:16:21 -0700 Michael J. Evans wrote: > Note: between 2.6.22 and 2.6.23-rc3-git5 > rdev = md_import_device(dev,0, 0); > became > rdev = md_import_device(dev,0, 90); > So the patch has been edited to patch around that line. (might be fuzzy) so you should update the patch to the latest mainline. It's up to the (RAID) maintainer(s) if they want to merge a patch with fuzz. Andrew may fix it up. Linus wouldn't accept it with fuzz. > Signed-off-by: Michael J. Evans <[EMAIL PROTECTED]> > = > --- linux/drivers/md/md.c.orig2007-08-21 03:19:42.511576248 -0700 > +++ linux/drivers/md/md.c 2007-08-21 04:30:09.775525710 -0700 > @@ -24,4 +24,6 @@ > > + - autodetect dev list not array: Michael J. Evans <[EMAIL PROTECTED]> > + Nowadays we use an SCM for such comments, not the source file(s). > This program is free software; you can redistribute it and/or modify > it under the terms of the GNU General Public License as published by > the Free Software Foundation; either version 2, or (at your option) > @@ -5752,13 +5754,26 @@ void md_autodetect_dev(dev_t dev) > * Searches all registered partitions for autorun RAID arrays > * at boot time. > */ > -static dev_t detected_devices[128]; > -static int dev_cnt; > + > +static LIST_HEAD(all_detected_devices); > +struct detected_devices_node { > + struct list_head list; > + dev_t dev; > +}; > > void md_autodetect_dev(dev_t dev) > { > - if (dev_cnt >= 0 && dev_cnt < 127) > - detected_devices[dev_cnt++] = dev; > + struct detected_devices_node *node_detected_dev; > + char strbuf[BDEVNAME_SIZE]; > + > + node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);\ Drop the trailing '\', as someone has already commented on. 
> +	if (node_detected_dev) {
> +		node_detected_dev->dev = dev;
> +		list_add_tail(&node_detected_dev->list, &all_detected_devices);
> +	} else {
> +		printk(KERN_CRIT "md: md_autodetect_dev: kzAlloc node failed"
> +		" (null return), skipping dev(%d,%d)\n", MAJOR(dev),
> MINOR(dev));

printk() formatting is bad.
Drop the " (null return)" and indent that line more than the
printk line is indented.

> +	}
> }

---
~Randy

*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch v4 1/1] md: Software Raid autodetect dev list not array
From: Michael J. Evans <[EMAIL PROTECTED]>

In current release kernels the md module (Software RAID) uses a static
array (dev_t[128]) to store partition/device info temporarily for
autostart. This patch replaces that static array with a list.

Signed-off-by: Michael J. Evans <[EMAIL PROTECTED]>
---
Version 4: Minor formatting cleanup for the kzalloc error message.
	Should I remove the first patch section in version 5?
Version 3: md_autodetect_dev failure message is now more usefully
	verbose. Removed the unused i_found that was left over from initial
	verification. Thank you Randy Dunlap for pointing out where INT_MAX
	was; the FIXME is fixed.
Version 2: Using list_add_tail, and corrected the missing i_passed++.
	Removed sections of code that would never be reached.
Version 1: The data/structures are only used within md.c, and very close
	together. However, I wonder if the structural information shouldn't
	go into ../../include/linux/raid/md_k.h instead.

I discovered this (and that the devices are added as disks/partitions
are discovered at boot) while I was debugging why only one of my MD
arrays would come up whole, while all the others were short a disk. I
eventually discovered that it was enumerating through only 9 of my 11
hard disks (2 had only 4 partitions apiece, while the other 9 have 15
partitions each; I had wanted 64 per drive). The last partition of the
8th drive in my 9-drive RAID 5 set wasn't added, thus making the final
md array short both a parity and a data disk, and it was started later,
elsewhere.

Subject: [patch 1/1] md: Software Raid autodetect dev list not array

SOFTWARE RAID (Multiple Disks) SUPPORT
P:	Ingo Molnar
M:	[EMAIL PROTECTED]
P:	Neil Brown
M:	[EMAIL PROTECTED]
L:	linux-raid@vger.kernel.org
S:	Supported

Unless you have a reason NOT to do so, CC [EMAIL PROTECTED]

12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
enabled.
It has been tested with CONFIG_SMP set and unset (different x86_64
systems). It has been tested with CONFIG_PREEMPT set and unset (same
system). CONFIG_LBD isn't even an option in my .config file.

Note: between 2.6.22 and 2.6.23-rc3-git5
	rdev = md_import_device(dev,0, 0);
became
	rdev = md_import_device(dev,0, 90);
So the patch has been edited to patch around that line.

Signed-off-by: Michael J. Evans <[EMAIL PROTECTED]>
=
--- linux/drivers/md/md.c.orig	2007-08-21 03:19:42.511576248 -0700
+++ linux/drivers/md/md.c	2007-08-21 04:30:09.775525710 -0700
@@ -24,4 +24,6 @@

+ - autodetect dev list not array: Michael J. Evans <[EMAIL PROTECTED]>
+
   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
@@ -5752,13 +5754,26 @@ void md_autodetect_dev(dev_t dev)
  * Searches all registered partitions for autorun RAID arrays
  * at boot time.
  */
-static dev_t detected_devices[128];
-static int dev_cnt;
+
+static LIST_HEAD(all_detected_devices);
+struct detected_devices_node {
+	struct list_head list;
+	dev_t dev;
+};

 void md_autodetect_dev(dev_t dev)
 {
-	if (dev_cnt >= 0 && dev_cnt < 127)
-		detected_devices[dev_cnt++] = dev;
+	struct detected_devices_node *node_detected_dev;
+	char strbuf[BDEVNAME_SIZE];
+
+	node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);
+	if (node_detected_dev) {
+		node_detected_dev->dev = dev;
+		list_add_tail(&node_detected_dev->list, &all_detected_devices);
+	} else {
+		printk(KERN_CRIT "md: md_autodetect_dev: kzAlloc node failed"
+			", skipping dev(%d,%d)\n", MAJOR(dev), MINOR(dev));
+	}
 }

@@ -5765,7 +5760,12 @@ static void autostart_arrays(int part)
 static void autostart_arrays(int part)
 {
 	mdk_rdev_t *rdev;
-	int i;
+	struct detected_devices_node *node_detected_dev;
+	dev_t dev;
+	int i_scanned, i_passed;
+
+	i_scanned = 0;
+	i_passed = 0;

 	printk(KERN_INFO "md: Autodetecting RAID arrays.\n");
@@ -5772,3 +5792,7 @@ static void autostart_arrays(int part)
-	for (i = 0; i < dev_cnt; i++) {
-		dev_t dev = detected_devices[i];
-
+	while (!list_empty(&all_detected_devices) && i_scanned < INT_MAX) {
+		i_scanned++;
+		node_detected_dev = list_entry(all_detected_devices.next,
+			struct detected_devices_node, list);
+		list_del(&node_detected_dev->list);
+		dev = node_detected_dev->dev;
+		kfree(node_detected_dev);

@@ -5781,8 +5807,11 @@ static void autostart_arrays(int part)
Re: [patch v3 1/1] md: Software Raid autodetect dev list not array
On Monday 27 August 2007, Randy Dunlap wrote:
> On Mon, 27 Aug 2007 15:16:21 -0700 Michael J. Evans wrote:
>
> > =
> > --- linux/drivers/md/md.c.orig	2007-08-21 03:19:42.511576248 -0700
> > +++ linux/drivers/md/md.c	2007-08-21 04:30:09.775525710 -0700
> > @@ -24,4 +24,6 @@
> >
> > + - autodetect dev list not array: Michael J. Evans <[EMAIL PROTECTED]>
> > +
>
> Nowadays we use an SCM for such comments, not the source file(s).

SCM? Is this automatic, where do I go to see it?

> > This program is free software; you can redistribute it and/or modify
> > it under the terms of the GNU General Public License as published by
> > the Free Software Foundation; either version 2, or (at your option)
> > @@ -5752,13 +5754,26 @@ void md_autodetect_dev(dev_t dev)
> >  * Searches all registered partitions for autorun RAID arrays
> >  * at boot time.
> >  */
> > -static dev_t detected_devices[128];
> > -static int dev_cnt;
> > +
> > +static LIST_HEAD(all_detected_devices);
> > +struct detected_devices_node {
> > +	struct list_head list;
> > +	dev_t dev;
> > +};
> >
> > void md_autodetect_dev(dev_t dev)
> > {
> > -	if (dev_cnt >= 0 && dev_cnt < 127)
> > -		detected_devices[dev_cnt++] = dev;
> > +	struct detected_devices_node *node_detected_dev;
> > +	char strbuf[BDEVNAME_SIZE];
> > +
> > +	node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);\
>
> Drop the trailing '\', as someone has already commented on.

I thought I had fixed that. I must have copied the unfixed version out
of the way when I made the other changes to it.

> > +	if (node_detected_dev) {
> > +		node_detected_dev->dev = dev;
> > +		list_add_tail(&node_detected_dev->list, &all_detected_devices);
> > +	} else {
> > +		printk(KERN_CRIT "md: md_autodetect_dev: kzAlloc node failed"
> > +		" (null return), skipping dev(%d,%d)\n", MAJOR(dev),
> > MINOR(dev));
>
> printk() formatting is bad.
> Drop the " (null return)" and indent that line more than the
> printk line is indented.
> > +	}
> > }
>
> ---
> ~Randy
Re: [patch v3 1/1] md: Software Raid autodetect dev list not array
Michael J. Evans wrote:
> On Monday 27 August 2007, Randy Dunlap wrote:
> > On Mon, 27 Aug 2007 15:16:21 -0700 Michael J. Evans wrote:
> > > =
> > > --- linux/drivers/md/md.c.orig	2007-08-21 03:19:42.511576248 -0700
> > > +++ linux/drivers/md/md.c	2007-08-21 04:30:09.775525710 -0700
> > > @@ -24,4 +24,6 @@
> > >
> > > + - autodetect dev list not array: Michael J. Evans <[EMAIL PROTECTED]>
> > > +
> >
> > Nowadays we use an SCM for such comments, not the source file(s).
>
> SCM? Is this automatic, where do I go to see it?

See Documentation/SubmittingPatches:

14) The canonical patch format

The canonical patch message body contains the following:

- A "from" line specifying the patch author.
- An empty line.
- The body of the explanation, which will be copied to the
  permanent changelog to describe this patch.
- The "Signed-off-by:" lines, described above, which will
  also go in the changelog.
- A marker line containing simply "---".
- Any additional comments not suitable for the changelog.
- The actual patch (diff output).

so just put whatever you want to be in the permanent SCM logs into the
"body of the explanation" part of the email.

--
~Randy
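Laid out concretely, a message following that canonical format might look like this (the author, subject, and file shown here are made up for illustration):

```
From: Jane Hacker <jane@example.org>

md: example subject line summarizing the change

Body of the explanation: what the patch does and why. This text
is what ends up in the permanent SCM changelog.

Signed-off-by: Jane Hacker <jane@example.org>
---
Comments that should NOT go in the changelog (version history,
open questions for reviewers) go here, below the "---" marker.

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
<the actual diff output follows>
```

Everything above the "---" marker is preserved by the SCM when the patch is applied; everything below it is discarded, which is why version-to-version notes like those in this thread belong below the marker.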