Re: [patch v2 1/1] md: Software Raid autodetect dev list not array

2007-08-27 Thread Michael Evans
On 8/26/07, Kyle Moffett [EMAIL PROTECTED] wrote:
 On Aug 26, 2007, at 08:20:45, Michael Evans wrote:
  Also, I forgot to mention, the reason I added the counters was
  mostly for debugging.  However they're also as useful in the same
  way that listing the partitions when a new disk is added can be (in
  fact this augments that and the existing messages the autodetect
  routines provide).
 
  As for using autodetect or not... the only way to skip it seems to
  be compiling md's raid support as a module.  I checked 2.6.22's
  menuconfig and there's no way for me to explicitly turn it on or
  off at compile time.  I also feel that forcing the addition of a
  boot parameter to de-activate a broken and deprecated system you
  aren't even aware you are getting is somehow wrong.  So if you have
  over 128 devices for it to scan, as I do on one of my PCs, then it
  can bring up
  an array in degraded mode.  ... crud.

 Well, you could just change the MSDOS disk label to use a different
 Partition Type for your raid partitions.  Just pick the standard
 Linux type and you will get exactly the same behavior that
 everybody who doesn't use MSDOS partition tables gets.

 Cheers,
 Kyle Moffett



I recall most of the guides I referenced during setup having me change
the partition type; additionally, parted only calls the flag 'raid', not
'raid autodetect'.  However, it would still be confusing to anyone not
intimately familiar with the subsystem.

Also, even though the system has a standard PC BIOS, I liked some of
the features the GUID partition table (GPT) provides, namely a checksum,
a backup copy, and up to 128 partitions per disk.  Parted is the only
real tool I could find for editing such a disk label.

A problem I experienced that is almost completely unrelated to this
patch is other 'magic number' assumptions.  It is rather unfortunate
that Linux allocates a fixed (and very small) number of partitions per
SCSI disk.  A better method might be a complete dissociation of the
major:minor pair from the device name, instead simply enumerating
partitions as they are detected.  That way, if I choose to have 128
drives with at most 2 partitions each, I still easily fit within
one major, and if I choose the opposite (for whatever reason) I still
come to the same conclusion.  Before anyone mentions using LVM
instead: sharing operating systems, and using different partitions for
different raid stripe sizes/rebuilding flexibility/restriping
flexibility, are just two good reasons I can think of for supporting
more than 15 partitions per device.
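
For concreteness, here is a minimal userspace sketch (an illustration only, not
kernel code, and assuming the classic sd layout of 16 minors per disk: minor 0
for the whole disk, minors 1-15 for its partitions) of where that 15-partition
ceiling comes from:

/*
 * Illustration: the old 8-bit minor space gives a block major 256 minors;
 * sd carves them into groups of 16 per disk, so partition numbers above
 * 15 simply have no minor left to map to.
 */
#include <stdio.h>

#define MINORS_PER_DISK   16
#define MINORS_PER_MAJOR 256

int main(void)
{
	printf("disks per sd major:  %d\n", MINORS_PER_MAJOR / MINORS_PER_DISK);
	printf("partitions per disk: %d\n", MINORS_PER_DISK - 1);

	/* e.g. the minor for partition 3 of the second disk on that major */
	int disk = 1, part = 3;
	printf("minor(disk=%d, part=%d) = %d\n",
	       disk, part, disk * MINORS_PER_DISK + part);
	return 0;
}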


Re: Patch for boot-time assembly of v1.x-metadata-based soft (MD) arrays: reasoning and future plans

2007-08-27 Thread David Greaves

Dan Williams wrote:

On 8/26/07, Abe Skolnik [EMAIL PROTECTED] wrote:

Because you can rely on the configuration file to be certain about
which disks to pull in and which to ignore.  Without the config file
the auto-detect routine may not always do the right thing because it
will need to make assumptions.

But kernel parameters can provide the same data, no?  After all, it is



Yes, you can get a similar effect to the config file by adding
parameters to the kernel command line.  My only point is that if the
initramfs update tools were as simple as:
mkinitrd root=/dev/md0 md=0,v1,/dev/sda1,/dev/sdb1,/dev/sdc1
...then using an initramfs becomes the same amount of work as editing
/etc/grub.conf.




I'm not sure if you're aware of this, so in the spirit of being helpful I thought 
I'd point out that this has been discussed quite extensively in the past. You 
may want to take a look at the list archives.


It's quite complex, and whilst I too would like autodetect to continue to work 
(I've been 'bitten' by the new superblocks not autodetecting), I accept the 
arguments of those wiser than me.


I agree that the problem is one of old dogs and the new (!) initramfs trick.
Look at it as an opportunity to learn more about a cool capability.

David


md raid acceleration and the async_tx api

2007-08-27 Thread Yuri Tikhonov

 Hello,

 I tested the h/w accelerated RAID-5 using a kernel with PAGE_SIZE set to 
64KB and found that the bonnie++ application hangs up during the Re-writing 
test. I did some investigating and discovered that the hang-up occurs 
because one of the mpage_end_io_read() calls is missing (these are the 
callbacks initiated from the ops_complete_biofill() function).

 The fact is that my low-level ADMA driver (the ppc440spe one) successfully 
initiated the ops_complete_biofill() callback, but the ops_complete_biofill() 
function itself skipped calling the bi_end_io() handler of the completed bio 
(the current dev->read) because, during processing of this (current dev->read) 
bio, some other request had come to the sh (the current dev_q->toread). Thus 
ops_complete_biofill() scheduled another biofill operation which, as a 
result, overwrote the unacknowledged bio (dev->read in ops_run_biofill()), 
and so we lost the previous dev->read bio completely.

 Here is a patch that solves this problem. Perhaps this might be implemented 
in some more elegant and effective way. What are your thoughts regarding 
this?

 Regards, Yuri

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 08b4893..7abc96b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -838,11 +838,24 @@ static void ops_complete_biofill(void *stripe_head_ref)
/* acknowledge completion of a biofill operation */
/* and check if we need to reply to a read request
 */
-   if (test_bit(R5_Wantfill, &dev_q->flags) && !dev_q->toread) {
+   if (test_bit(R5_Wantfill, &dev_q->flags)) {
        struct bio *rbi, *rbi2;
        struct r5dev *dev = &sh->dev[i];

-       clear_bit(R5_Wantfill, &dev_q->flags);
+       /* There is a chance that another fill operation
+        * had been scheduled for this dev while we
+        * processed sh. In this case do one of the following
+        * alternatives:
+        * - if there is no active completed biofill for the dev
+        *   then go to the next dev leaving Wantfill set;
+        * - if there is active completed biofill for the dev
+        *   then ack it but leave Wantfill set.
+        */
+       if (dev_q->toread && !dev->read)
+           continue;
+
+       if (!dev_q->toread)
+           clear_bit(R5_Wantfill, &dev_q->flags);

        /* The access to dev->read is outside of the
         * spin_lock_irq(&conf->device_lock), but is protected


Re: Raid5 Reshape gone wrong, please help

2007-08-27 Thread Neil Brown
On Thursday August 23, [EMAIL PROTECTED] wrote:
 Trimming tons of detail, but keeping the thread
 
 OK I've reproduced the original issue on a separate box.
 2.6.23-rc3 does not like to grow Raid 5 arrays.  mdadm 2.6.3

No, you are right. It doesn't.

Obviously insufficient testing and review - thanks for finding it for us.

This patch seems to make it work - raid5 and raid6.

Dan: Could you check it for me, particularly the moving of
+   async_tx_ack(tx);
+   dma_wait_for_async_tx(tx);
outside of the loop.

Greg: could you please check it works for you too - it works for me,
but double-testing never hurts.

Thanks again,

NeilBrown



-
Fix some bugs with growing raid5/raid6 arrays.



### Diffstat output
 ./drivers/md/raid5.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c2007-08-24 16:36:22.0 +1000
+++ ./drivers/md/raid5.c2007-08-27 20:50:57.0 +1000
@@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
        struct dma_async_tx_descriptor *tx = NULL;
        clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
        for (i = 0; i < sh->disks; i++)
-               if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
+               if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
                        int dd_idx, pd_idx, j;
                        struct stripe_head *sh2;
 
@@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
                        set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
                        for (j = 0; j < conf->raid_disks; j++)
                                if (j != sh2->pd_idx &&
-                                   (r6s && j != r6s->qd_idx) &&
+                                   (!r6s || j != raid6_next_disk(sh2->pd_idx,
+                                                                 sh2->disks)) &&
                                    !test_bit(R5_Expanded, &sh2->dev[j].flags))
                                        break;
                        if (j == conf->raid_disks) {
@@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
                        }
                        release_stripe(sh2);
 
-                       /* done submitting copies, wait for them to complete */
-                       if (i + 1 >= sh->disks) {
-                               async_tx_ack(tx);
-                               dma_wait_for_async_tx(tx);
-                       }
                }
+       /* done submitting copies, wait for them to complete */
+       if (tx) {
+               async_tx_ack(tx);
+               dma_wait_for_async_tx(tx);
+       }
 }
 
 /*
@@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
                        sh->disks = conf->raid_disks;
                        sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
                                conf->raid_disks);
-                       s.locked += handle_write_operations5(sh, 0, 1);
+                       s.locked += handle_write_operations5(sh, 1, 1);
                } else if (s.expanded &&
                        !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
                        clear_bit(STRIPE_EXPAND_READY, &sh->state);


Re: [md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines

2007-08-27 Thread saeed bishara
Hi Dan,
I think you have a bug in this function: list_splice_init adds the
new slots at the head of the chain_node list, but you get the
old_chain_tail (the latest descriptor) from the tail of the chain!!

  +static dma_cookie_t
  +iop_adma_tx_submit(struct dma_async_tx_descriptor *tx)
  +{
  +
  +   old_chain_tail = list_entry(iop_chan->chain.prev,
  +       struct iop_adma_desc_slot, chain_node);
  +   list_splice_init(&sw_desc->group_list, &old_chain_tail->chain_node);
  +
  +   /* fix up the hardware chain */
  +   iop_desc_set_next_desc(old_chain_tail, grp_start->phys);
  +


Re: recovery starts from 0 after reboot - normal?

2007-08-27 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:

Justin Piszcz wrote:


According to the fine manual, BITMAP CHANGES belong to the grow mode.
So, let's try to do what the manual says - try to add a bitmap to the 
active array:



# mdadm --grow /dev/md0 --bitmap=internal
mdadm: failed to set internal bitmap.

# dmesg
md: couldn't update array info. -16


So, either I don't understand the manual, or there are more caveats 
in it.


(...)


Check: http://lkml.org/lkml/2007/6/17/235


It's an external bitmap; I'd prefer an internal one, as my root 
filesystem is on a flash-IDE disk.


However, let's try the external bitmap:

# mdadm --grow /dev/md0 --bitmap=/bitmapfile
mdadm: Cannot set bitmap file for /dev/md0: Device or resource busy


So, it also fails. Perhaps because the array is still rebuilding?


It looks like it only works when the array is in a clean state (i.e., not 
rebuilding anymore).


Just for reference:

# mdadm --grow --bitmap=/bitmapfile /dev/md0
# dmesg -c
md0: bitmap file is out of date (0 < 43130) -- forcing full recovery
md0: bitmap file is out of date, doing full recovery
md0: bitmap initialized from disk: read 24/24 pages, set 763103 bits, 
status: 0

created bitmap (373 pages) for device md0

# mdadm --grow /dev/md0 --bitmap=internal
mdadm: /dev/md0 already has a bitmap (/bitmapfile)

# mdadm --grow /dev/md0 --bitmap=none

# mdadm --grow /dev/md0 --bitmap=internal
# dmesg -c
md0: bitmap file is out of date (0 < 43130) -- forcing full recovery
md0: bitmap file is out of date, doing full recovery
md0: bitmap initialized from disk: read 12/12 pages, set 381552 bits, 
status: 0

created bitmap (187 pages) for device md0


What are the pages/bits in the bitmap? They seem to differ depending on whether 
an internal or external bitmap is used.
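
For reference, a rough reconstruction of where those numbers come from, assuming
the 2.6-era md bitmap layout (one bit per bitmap chunk, one 16-bit in-memory
counter per chunk, 4 KB pages). The internal and external cases report different
bit counts for the same array simply because they ended up with different bitmap
chunk sizes; the page counts then follow from the bit counts:

/*
 * Reconstruction of the md bitmap arithmetic, using the dmesg figures
 * quoted above.  "pages" in the "created bitmap" line are pages of
 * 16-bit counters (2048 per 4 KB page); "read N/N pages" refers to
 * pages of on-disk bits (32768 per 4 KB page).
 */
#include <stdio.h>

#define PAGE_SIZE         4096L
#define COUNTERS_PER_PAGE (PAGE_SIZE / 2)   /* __u16 counters: 2048 */
#define BITS_PER_PAGE     (PAGE_SIZE * 8)   /* bitmap file bits: 32768 */

static void report(const char *name, long bits)
{
	long counter_pages = (bits + COUNTERS_PER_PAGE - 1) / COUNTERS_PER_PAGE;
	long file_pages    = (bits + BITS_PER_PAGE - 1) / BITS_PER_PAGE;

	printf("%s: %ld bits -> %ld counter pages, %ld bitmap-file pages\n",
	       name, bits, counter_pages, file_pages);
}

int main(void)
{
	report("external", 763103);   /* matches: 373 pages, read 24/24 pages */
	report("internal", 381552);   /* matches: 187 pages, read 12/12 pages */
	return 0;
}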


--
Tomasz Chmielewski
http://wpkg.org


Re: raid10 or raid1+0 ?

2007-08-27 Thread Justin Piszcz



On Mon, 27 Aug 2007, T. Eichstädt wrote:


Hello all,

I have 4 HDDs and I want to use mirroring and striping.
I am wondering what the difference between the following two solutions is:

- raid0 on top of 2 raid1 devices (raid1+0)
- directly using the raid10 module

Perhaps someone can give me a hint about what the raid10 Linux module does 
differently from the combination of raid1 and raid0.


Another question:
How stable is the raid10 module in the Linux kernel? I am currently using the 
Debian kernel 2.6.18, but I saw some patches from Neil and others for 2.6.21 and 
2.6.22 regarding the raid10 module, and their descriptions sound as if it could 
be useful to integrate them when using raid10.
Okay, backporting isn't complicated, but nevertheless the number of patches 
makes me feel that raid10 is perhaps not as stable as raid0 and raid1 ???


Perhaps someone can blow away my fear :)

Best regards
Thimo Eichstädt




I have had a few people ask me this as well: RAID10 or similar (SW).  I 
am not so sure; with RAID1 you can have your root disks on it and boot 
from it using LILO/GRUB, and it is proven pretty stable; can the same be 
said about RAID10?

RE: Raid5 Reshape gone wrong, please help

2007-08-27 Thread Williams, Dan J
 From: Neil Brown [mailto:[EMAIL PROTECTED]
 On Thursday August 23, [EMAIL PROTECTED] wrote:
  Trimming tons of detail, but keeping the thread
 
  OK I've reproduced the original issue on a separate box.
  2.6.23-rc3 does not like to grow Raid 5 arrays.  mdadm 2.6.3
 
 No, you are right. It doesn't.
 
 Obviously insufficient testing and review - thanks for finding it for us.
 
Agreed - seconded.

 This patch seems to make it work - raid5 and raid6.
 
 Dan: Could you check it for me, particularly the moving of
 + async_tx_ack(tx);
 + dma_wait_for_async_tx(tx);
 outside of the loop.
 
Yes, this definitely needs to be outside the loop.

 Greg: could you please check it works for you too - it works for me,
 but double-testing never hurts.
 
 Thanks again,
 
 NeilBrown
 
 
 
 -
 Fix some bugs with growing raid5/raid6 arrays.
 
 
 
 ### Diffstat output
  ./drivers/md/raid5.c |   17 +++++++++--------
  1 file changed, 9 insertions(+), 8 deletions(-)
 
 diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
 --- .prev/drivers/md/raid5.c  2007-08-24 16:36:22.0 +1000
 +++ ./drivers/md/raid5.c  2007-08-27 20:50:57.0 +1000
 @@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
        struct dma_async_tx_descriptor *tx = NULL;
        clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
        for (i = 0; i < sh->disks; i++)
 -              if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
 +              if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
                        int dd_idx, pd_idx, j;
                        struct stripe_head *sh2;
 
 @@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
                        set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
                        for (j = 0; j < conf->raid_disks; j++)
                                if (j != sh2->pd_idx &&
 -                                  (r6s && j != r6s->qd_idx) &&
 +                                  (!r6s || j != raid6_next_disk(sh2->pd_idx,
 +                                                                sh2->disks)) &&
                                    !test_bit(R5_Expanded, &sh2->dev[j].flags))
                                        break;
                        if (j == conf->raid_disks) {
 @@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
                        }
                        release_stripe(sh2);
 
 -                      /* done submitting copies, wait for them to complete */
 -                      if (i + 1 >= sh->disks) {
 -                              async_tx_ack(tx);
 -                              dma_wait_for_async_tx(tx);
 -                      }
                }
 +      /* done submitting copies, wait for them to complete */
 +      if (tx) {
 +              async_tx_ack(tx);
 +              dma_wait_for_async_tx(tx);
 +      }
  }
 
  /*
 @@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
                        sh->disks = conf->raid_disks;
                        sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
                                conf->raid_disks);
 -                      s.locked += handle_write_operations5(sh, 0, 1);
 +                      s.locked += handle_write_operations5(sh, 1, 1);
How about for clarity:
s.locked += handle_write_operations5(sh, RECONSTRUCT_WRITE, 1);

                } else if (s.expanded &&
                        !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
                        clear_bit(STRIPE_EXPAND_READY, &sh->state);

Signed-off-by: Dan Williams [EMAIL PROTECTED]


RE: md raid acceleration and the async_tx api

2007-08-27 Thread Williams, Dan J
 From: Yuri Tikhonov [mailto:[EMAIL PROTECTED]
  Hello,
 
  I tested the h/w accelerated RAID-5 using a kernel with PAGE_SIZE set to
 64KB and found that the bonnie++ application hangs up during the Re-writing
 test. I did some investigating and discovered that the hang-up occurs
 because one of the mpage_end_io_read() calls is missing (these are the
 callbacks initiated from the ops_complete_biofill() function).
 
  The fact is that my low-level ADMA driver (the ppc440spe one) successfully
 initiated the ops_complete_biofill() callback, but the ops_complete_biofill()
 function itself skipped calling the bi_end_io() handler of the completed bio
 (the current dev->read) because, during processing of this (current dev->read)
 bio, some other request had come to the sh (the current dev_q->toread). Thus
 ops_complete_biofill() scheduled another biofill operation which, as a
 result, overwrote the unacknowledged bio (dev->read in ops_run_biofill()),
 and so we lost the previous dev->read bio completely.
 
  Here is a patch that solves this problem. Perhaps this might be implemented
 in some more elegant and effective way. What are your thoughts regarding
 this?
 
  Regards, Yuri
 
 diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
 index 08b4893..7abc96b 100644
 --- a/drivers/md/raid5.c
 +++ b/drivers/md/raid5.c
 @@ -838,11 +838,24 @@ static void ops_complete_biofill(void *stripe_head_ref)
        /* acknowledge completion of a biofill operation */
        /* and check if we need to reply to a read request
         */
 -      if (test_bit(R5_Wantfill, &dev_q->flags) && !dev_q->toread) {
 +      if (test_bit(R5_Wantfill, &dev_q->flags)) {
                struct bio *rbi, *rbi2;
                struct r5dev *dev = &sh->dev[i];
 
 -              clear_bit(R5_Wantfill, &dev_q->flags);
 +              /* There is a chance that another fill operation
 +               * had been scheduled for this dev while we
 +               * processed sh. In this case do one of the following
 +               * alternatives:
 +               * - if there is no active completed biofill for the dev
 +               *   then go to the next dev leaving Wantfill set;
 +               * - if there is active completed biofill for the dev
 +               *   then ack it but leave Wantfill set.
 +               */
 +              if (dev_q->toread && !dev->read)
 +                      continue;
 +
 +              if (!dev_q->toread)
 +                      clear_bit(R5_Wantfill, &dev_q->flags);
 
                /* The access to dev->read is outside of the
                 * spin_lock_irq(&conf->device_lock), but is protected

This still looks racy...  I think the complete fix is to make the
R5_Wantfill and dev_q->toread accesses atomic.  Please test the
following patch (also attached) and let me know if it fixes what you are
seeing:

Applies on top of git://lost.foo-projects.org/~dwillia2/git/iop
md-for-linus

---
raid5: fix ops_complete_biofill race in the asynchronous offload case

Protect against dev_q->toread toggling while testing and clearing the
R5_Wantfill bit.  This change prevents all asynchronous completions
(tasklets) from running during handle_stripe.
---

 drivers/md/raid5.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2f9022d..91c14c6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -824,6 +824,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
                (unsigned long long)sh->sector);
 
        /* clear completed biofills */
+       spin_lock(&sq->lock);
        for (i = sh->disks; i--; ) {
                struct r5dev *dev = &sh->dev[i];
                struct r5_queue_dev *dev_q = &sq->dev[i];
@@ -861,6 +862,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
        }
        clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
        clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+       spin_unlock(&sq->lock);
 
        return_io(return_bi);
 
@@ -2279,7 +2281,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
                (unsigned long long)bi->bi_sector,
                (unsigned long long)sq->sector);
 
-       spin_lock(&sq->lock);
+       spin_lock_bh(&sq->lock);
        spin_lock_irq(&conf->device_lock);
        sh = sq->sh;
        if (forwrite) {
@@ -2306,7 +2308,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
        *bip = bi;
        bi->bi_phys_segments ++;
        spin_unlock_irq(&conf->device_lock);
-       spin_unlock(&sq->lock);
+       spin_unlock_bh(&sq->lock);
 
        pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
                (unsigned long long)bi->bi_sector,
@@ -2339,7 +2341,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
  overlap:
        set_bit(R5_Overlap, 

RE: [md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines

2007-08-27 Thread Williams, Dan J
 From: saeed bishara [mailto:[EMAIL PROTECTED]
 Hi Dan,
 I think you have a bug in this function: list_splice_init adds the
 new slots at the head of the chain_node list, but you get the
 old_chain_tail (the latest descriptor) from the tail of the chain!!
 
   +static dma_cookie_t
   +iop_adma_tx_submit(struct dma_async_tx_descriptor *tx)
   +{
   +
   +   old_chain_tail = list_entry(iop_chan->chain.prev,
   +       struct iop_adma_desc_slot, chain_node);
   +   list_splice_init(&sw_desc->group_list, &old_chain_tail->chain_node);
   +
   +   /* fix up the hardware chain */
   +   iop_desc_set_next_desc(old_chain_tail, grp_start->phys);
   +
[ dropped akpm, davem, jeff, and chris from the cc ]

This looks ok to me.  sw_desc->group_list is the head of a new list of
descriptors to be added to the chain.  old_chain_tail is the youngest
descriptor at the end of the current channel chain.

Before list_splice_init:
old_chain_tail->next = channel_chain_head
channel_chain_head->prev = old_chain_tail
new_chain_start->prev = &sw_desc->group_list;
new_chain_end->next = &sw_desc->group_list;

After list_splice_init:
new_chain_start->prev = old_chain_tail
old_chain_tail->next = new_chain_start
new_chain_end->next = channel_chain_head
channel_chain_head->prev = new_chain_end

sw_desc->group_list is empty
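
For what it's worth, here is a standalone userspace sketch of the
list_splice_init() ordering described above (a minimal re-implementation of the
list helpers for illustration, not the iop-adma driver itself), showing that the
spliced-in descriptors land immediately after old_chain_tail:

/* Minimal doubly-linked list with list_splice_init() semantics. */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Move everything on 'list' to just after 'head', then reinitialise 'list'. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	if (list->next != list) {                    /* list is non-empty */
		struct list_head *first = list->next;
		struct list_head *last  = list->prev;
		struct list_head *at    = head->next;

		first->prev = head;  head->next = first;
		last->next  = at;    at->prev   = last;
		list->next = list->prev = list;      /* empty the old head */
	}
}

struct desc { struct list_head node; const char *name; };  /* node is first */

int main(void)
{
	struct list_head chain = LIST_HEAD_INIT(chain);   /* plays iop_chan->chain */
	struct list_head group = LIST_HEAD_INIT(group);   /* plays sw_desc->group_list */
	struct desc old = { .name = "old_chain_tail" };
	struct desc n1  = { .name = "new-1" };
	struct desc n2  = { .name = "new-2" };
	struct list_head *p;

	list_add_tail(&old.node, &chain);
	list_add_tail(&n1.node, &group);
	list_add_tail(&n2.node, &group);

	/* splice the new group in right after the current chain tail */
	list_splice_init(&group, &old.node);

	for (p = chain.next; p != &chain; p = p->next)
		printf("%s\n", ((struct desc *)p)->name);  /* cast ok: node is 1st member */
	/* prints: old_chain_tail, new-1, new-2 */
	return 0;
}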

--
Dan


[2.6 patch] bitmap.h: remove dead artefacts

2007-08-27 Thread Adrian Bunk
bitmap_active() no longer exists and BITMAP_ACTIVE is no longer used.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---

 include/linux/raid/bitmap.h |2 --
 1 file changed, 2 deletions(-)

f2c10857c60e13e63c67c656713b50b18f2bc1e4 
diff --git a/include/linux/raid/bitmap.h b/include/linux/raid/bitmap.h
index 75e17a0..306a1d1 100644
--- a/include/linux/raid/bitmap.h
+++ b/include/linux/raid/bitmap.h
@@ -138,7 +138,6 @@ typedef __u16 bitmap_counter_t;
 
 /* use these for bitmap->flags and bitmap->sb->state bit-fields */
 enum bitmap_state {
-   BITMAP_ACTIVE = 0x001, /* the bitmap is in use */
BITMAP_STALE  = 0x002,  /* the bitmap file is out of date or had -EIO */
BITMAP_WRITE_ERROR = 0x004, /* A write error has occurred */
BITMAP_HOSTENDIAN = 0x8000,
@@ -258,7 +257,6 @@ struct bitmap {
 int  bitmap_create(mddev_t *mddev);
 void bitmap_flush(mddev_t *mddev);
 void bitmap_destroy(mddev_t *mddev);
-int  bitmap_active(struct bitmap *bitmap);
 
 char *file_path(struct file *file, char *buf, int count);
 void bitmap_print_sb(struct bitmap *bitmap);



Re: raid5:md3: kernel BUG , followed by , Silent halt .

2007-08-27 Thread Dan Williams
On 8/25/07, Mr. James W. Laferriere [EMAIL PROTECTED] wrote:
 Hello Dan ,

 On Mon, 20 Aug 2007, Dan Williams wrote:
  On 8/18/07, Mr. James W. Laferriere [EMAIL PROTECTED] wrote:
  Hello All ,  Here we go again .  Again attempting to do bonnie++ 
  testing
  on a small array .
  Kernel 2.6.22.1
  Patches involved ,
  IOP1 ,  2.6.22.1-iop1 for improved sequential write performance
  (stripe-queue) ,  Dan Williams [EMAIL PROTECTED]
 
  Hello James,
 
  Thanks for the report.
 
  I tried to reproduce this on my system, no luck.
 Possibly because there are significant hardware differences ?
 See 'lspci -v' below .sig .

  However it looks
  like there is a potential race between 'handle_queue' and
  'add_queue_bio'.  The attached patch moves these critical sections
  under spin_lock(&sq->lock), and adds some debugging output if this BUG
  triggers.  It also includes a fix for retry_aligned_read which is
  unrelated to this debug.
  --
  Dan
 Applied your patch .  The same 'kernel BUG at 
 drivers/md/raid5.c:3689!'
 messages appear (see attached) .  The system is still responsive with your
 patch ,  the kernel crashed last time .  Tho the bonnie++ run is stuck in 'D' 
 .
 And doing a ' /md3/asdf'  stays hung even after passing the parent process a
 'kill -9' .
 Any further info You can think of I can/should ,  I will try to 
 acquire
 .  But I'll have to repeat these steps to attempt to get the same results .
 I'll be shutting the system down after sending this off .
 Fyi ,  the previous 'BUG' without your patch was quite repeatable .
 I might have time over the next couple of weeks to be able to see if it
 is as repeatable as the last one .

 Contents of /proc/mdstat for md3 .

 md3 : active raid6 sdx1[3] sdw1[2] sdv1[1] sdu1[0] sdt1[7](S) sds1[6] sdr1[5] 
 sdq1[4]
    717378560 blocks level 6, 1024k chunk, algorithm 2 [7/7] [UUUUUUU]
bitmap: 2/137 pages [8KB], 512KB chunk

 Commands I ran that lead to the 'BUG' .

 bonniemd3() { /root/bonnie++-1.03a/bonnie++  -u0:0  -d /md3  -s 131072  -f; }
 bonniemd3 > 131072MB-bonnie++-run-md3-xfs.log-20070825 2>&1 &

Ok, the 'bitmap' and 'raid6' details were the missing pieces of my
testing.  I can now reproduce this bug in handle_queue.  I'll keep you
posted on what I find.

Thank you for tracking this.

Regards,
Dan


[patch v3 1/1] md: Software Raid autodetect dev list not array

2007-08-27 Thread Michael J. Evans
From: Michael J. Evans [EMAIL PROTECTED]

In current release kernels the md module (Software RAID) uses a static array
 (dev_t[128]) to store partition/device info temporarily for autostart.

This patch replaces that static array with a list.

Signed-off-by: Michael J. Evans [EMAIL PROTECTED]
--- 
Sorry, it looks like I hit reply instead of reply to all yesterday.

Version 3:
md_autodetect_dev failure message is now more usefully verbose.
removed unused i_found that was leftover from initial verification.
Thank you Randy Dunlap for pointing out where INT_MAX was defined; FIXME fixed.

Version 2:
using list_add_tail, and corrected missing i_passed++;.
removed sections of code that would never be reached.

Version 1:
The data/structures are only used within md.c, and very close together.
However I wonder if the structural information shouldn't go in to...
../../include/linux/raid/md_k.h instead.


I discovered this (and that the devices are added as disks/partitions are
discovered at boot) while I was debugging why only one of my MD arrays would
come up whole, while all the others were short a disk.

I eventually discovered that it was enumerating through all of 9 of my 11 hds
(2 had only 4 partitions apiece) while the other 9 have 15 partitions
(I wanted 64 per drive...). The last partition of the 8th drive in my 9 drive
raid 5 sets wasn't added, thus making the final md array short both a parity
and data disk, and it was started later, elsewhere.
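
As a back-of-the-envelope illustration (counts taken from the description above,
and assuming every one of those partitions gets handed to md_autodetect_dev()),
the 128-entry static array simply cannot hold this box's devices:

/* Rough arithmetic only -- shows why the dev_t[128] array overflows here. */
#include <stdio.h>

int main(void)
{
	int candidates = 9 * 15 + 2 * 4;  /* 135 + 8 = 143 autodetect partitions */
	int capacity   = 128;             /* static dev_t detected_devices[128]  */

	printf("candidates %d, capacity %d, silently dropped %d\n",
	       candidates, capacity, candidates - capacity);
	return 0;
}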

Subject: [patch 1/1] md: Software Raid autodetect dev list not array

SOFTWARE RAID (Multiple Disks) SUPPORT
P:  Ingo Molnar
M:  [EMAIL PROTECTED]
P:  Neil Brown
M:  [EMAIL PROTECTED]
L:  linux-raid@vger.kernel.org
S:  Supported
Unless you have a reason NOT to do so, CC [EMAIL PROTECTED]

12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
enabled.

It has been tested with CONFIG_SMP set and unset (Different x86_64 systems).
It has been tested with CONFIG_PREEMPT set and unset (same system).
CONFIG_LBD isn't even an option in my .config file.

Note: between 2.6.22 and 2.6.23-rc3-git5
rdev = md_import_device(dev,0, 0);
became
rdev = md_import_device(dev,0, 90);
So the patch has been edited to patch around that line. (might be fuzzy)

Signed-off-by: Michael J. Evans [EMAIL PROTECTED]
=
--- linux/drivers/md/md.c.orig  2007-08-21 03:19:42.511576248 -0700
+++ linux/drivers/md/md.c   2007-08-21 04:30:09.775525710 -0700
@@ -24,4 +24,6 @@

+   - autodetect dev list not array: Michael J. Evans [EMAIL PROTECTED]
+
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2, or (at your option)
@@ -5752,13 +5754,26 @@ void md_autodetect_dev(dev_t dev)
  * Searches all registered partitions for autorun RAID arrays
  * at boot time.
  */
-static dev_t detected_devices[128];
-static int dev_cnt;
+
+static LIST_HEAD(all_detected_devices);
+struct detected_devices_node {
+   struct list_head list;
+   dev_t dev;
+};
 
 void md_autodetect_dev(dev_t dev)
 {
-   if (dev_cnt >= 0 && dev_cnt < 127)
-       detected_devices[dev_cnt++] = dev;
+   struct detected_devices_node *node_detected_dev;
+   char strbuf[BDEVNAME_SIZE];
+
+   node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);\
+   if (node_detected_dev) {
+       node_detected_dev->dev = dev;
+       list_add_tail(&node_detected_dev->list, &all_detected_devices);
+   } else {
+       printk(KERN_CRIT "md: md_autodetect_dev: kzAlloc node failed"
+           " (null return), skipping dev(%d,%d)\n", MAJOR(dev),
+           MINOR(dev));
+   }
 }
 
 
@@ -5765,7 +5760,12 @@ static void autostart_arrays(int part)
 static void autostart_arrays(int part)
 {
    mdk_rdev_t *rdev;
-   int i;
+   struct detected_devices_node *node_detected_dev;
+   dev_t dev;
+   int i_scanned, i_passed;
+
+   i_scanned = 0;
+   i_passed = 0;
 
    printk(KERN_INFO "md: Autodetecting RAID arrays.\n");
 
@@ -5772,3 +5792,7 @@ static void autostart_arrays(int part)
-   for (i = 0; i < dev_cnt; i++) {
-       dev_t dev = detected_devices[i];
-
+   while (!list_empty(&all_detected_devices) && i_scanned < INT_MAX) {
+       i_scanned++;
+       node_detected_dev = list_entry(all_detected_devices.next,
+           struct detected_devices_node, list);
+       list_del(&node_detected_dev->list);
+       dev = node_detected_dev->dev;
+       kfree(node_detected_dev);
@@ -5781,8 +5807,11 @@ static void autostart_arrays(int part)
    continue;
}

Re: [patch v3 1/1] md: Software Raid autodetect dev list not array

2007-08-27 Thread Randy Dunlap
On Mon, 27 Aug 2007 15:16:21 -0700 Michael J. Evans wrote:

 Note: between 2.6.22 and 2.6.23-rc3-git5
 rdev = md_import_device(dev,0, 0);
 became
 rdev = md_import_device(dev,0, 90);
 So the patch has been edited to patch around that line. (might be fuzzy)

so you should update the patch to the latest mainline.

It's up to the (RAID) maintainer(s) if they want to merge a patch with fuzz.
Andrew may fix it up.  Linus wouldn't accept it with fuzz.

 Signed-off-by: Michael J. Evans [EMAIL PROTECTED]
 =
 --- linux/drivers/md/md.c.orig  2007-08-21 03:19:42.511576248 -0700
 +++ linux/drivers/md/md.c   2007-08-21 04:30:09.775525710 -0700
 @@ -24,4 +24,6 @@
 
 +   - autodetect dev list not array: Michael J. Evans [EMAIL PROTECTED]
 +

Nowadays we use an SCM for such comments, not the source file(s).

 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 2, or (at your option)
 @@ -5752,13 +5754,26 @@ void md_autodetect_dev(dev_t dev)
   * Searches all registered partitions for autorun RAID arrays
   * at boot time.
   */
 -static dev_t detected_devices[128];
 -static int dev_cnt;
 +
 +static LIST_HEAD(all_detected_devices);
 +struct detected_devices_node {
 + struct list_head list;
 + dev_t dev;
 +};
  
  void md_autodetect_dev(dev_t dev)
  {
 - if (dev_cnt >= 0 && dev_cnt < 127)
 - detected_devices[dev_cnt++] = dev;
 + struct detected_devices_node *node_detected_dev;
 + char strbuf[BDEVNAME_SIZE];
 +
 + node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);\

Drop the trailing '\', as someone has already commented on.

 + if (node_detected_dev) {
 + node_detected_dev->dev = dev;
 + list_add_tail(&node_detected_dev->list, &all_detected_devices);
 + } else {
 + printk(KERN_CRIT "md: md_autodetect_dev: kzAlloc node failed"
 +  " (null return), skipping dev(%d,%d)\n", MAJOR(dev), 
 MINOR(dev));

printk() formatting is bad.
Drop the " (null return)" and indent that line more than the
printk line is indented.

 + }
  }
  
  

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [patch v4 1/1] md: Software Raid autodetect dev list not array

2007-08-27 Thread Michael J. Evans
From: Michael J. Evans [EMAIL PROTECTED]

In current release kernels the md module (Software RAID) uses a static array
 (dev_t[128]) to store partition/device info temporarily for autostart.

This patch replaces that static array with a list.

Signed-off-by: Michael J. Evans [EMAIL PROTECTED]
--- 
Version 4:
Minor formatting cleanup for kzAlloc error message.
Should I remove the first patch section in version 5?

Version 3:
md_autodetect_dev failure message is now more usefully verbose.
removed unused i_found that was leftover from initial verification.
Thank you Randy Dunlap for pointing out where INT_MAX was defined; FIXME fixed.

Version 2:
using list_add_tail, and corrected missing i_passed++;.
removed sections of code that would never be reached.

Version 1:
The data/structures are only used within md.c, and very close together.
However I wonder if the structural information shouldn't go in to...
../../include/linux/raid/md_k.h instead.


I discovered this (and that the devices are added as disks/partitions are
discovered at boot) while I was debugging why only one of my MD arrays would
come up whole, while all the others were short a disk.

I eventually discovered that it was enumerating through all of 9 of my 11 hds
(2 had only 4 partitions apiece) while the other 9 have 15 partitions
(I wanted 64 per drive...). The last partition of the 8th drive in my 9 drive
raid 5 sets wasn't added, thus making the final md array short both a parity
and data disk, and it was started later, elsewhere.

Subject: [patch 1/1] md: Software Raid autodetect dev list not array

SOFTWARE RAID (Multiple Disks) SUPPORT
P:  Ingo Molnar
M:  [EMAIL PROTECTED]
P:  Neil Brown
M:  [EMAIL PROTECTED]
L:  linux-raid@vger.kernel.org
S:  Supported
Unless you have a reason NOT to do so, CC [EMAIL PROTECTED]

12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
enabled.

It has been tested with CONFIG_SMP set and unset (Different x86_64 systems).
It has been tested with CONFIG_PREEMPT set and unset (same system).
CONFIG_LBD isn't even an option in my .config file.

Note: between 2.6.22 and 2.6.23-rc3-git5
rdev = md_import_device(dev,0, 0);
became
rdev = md_import_device(dev,0, 90);
So the patch has been edited to patch around that line.

Signed-off-by: Michael J. Evans [EMAIL PROTECTED]
=
--- linux/drivers/md/md.c.orig  2007-08-21 03:19:42.511576248 -0700
+++ linux/drivers/md/md.c   2007-08-21 04:30:09.775525710 -0700
@@ -24,4 +24,6 @@

+   - autodetect dev list not array: Michael J. Evans [EMAIL PROTECTED]
+
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2, or (at your option)
@@ -5752,13 +5754,26 @@ void md_autodetect_dev(dev_t dev)
  * Searches all registered partitions for autorun RAID arrays
  * at boot time.
  */
-static dev_t detected_devices[128];
-static int dev_cnt;
+
+static LIST_HEAD(all_detected_devices);
+struct detected_devices_node {
+   struct list_head list;
+   dev_t dev;
+};
 
 void md_autodetect_dev(dev_t dev)
 {
-   if (dev_cnt >= 0 && dev_cnt < 127)
-       detected_devices[dev_cnt++] = dev;
+   struct detected_devices_node *node_detected_dev;
+   char strbuf[BDEVNAME_SIZE];
+
+   node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);
+   if (node_detected_dev) {
+       node_detected_dev->dev = dev;
+       list_add_tail(&node_detected_dev->list, &all_detected_devices);
+   } else {
+       printk(KERN_CRIT "md: md_autodetect_dev: kzAlloc node failed"
+           ", skipping dev(%d,%d)\n", MAJOR(dev), MINOR(dev));
+   }
 }
 
 
@@ -5765,7 +5760,12 @@ static void autostart_arrays(int part)
 static void autostart_arrays(int part)
 {
    mdk_rdev_t *rdev;
-   int i;
+   struct detected_devices_node *node_detected_dev;
+   dev_t dev;
+   int i_scanned, i_passed;
+
+   i_scanned = 0;
+   i_passed = 0;
 
    printk(KERN_INFO "md: Autodetecting RAID arrays.\n");
 
@@ -5772,3 +5792,7 @@ static void autostart_arrays(int part)
-   for (i = 0; i < dev_cnt; i++) {
-       dev_t dev = detected_devices[i];
-
+   while (!list_empty(&all_detected_devices) && i_scanned < INT_MAX) {
+       i_scanned++;
+       node_detected_dev = list_entry(all_detected_devices.next,
+           struct detected_devices_node, list);
+       list_del(&node_detected_dev->list);
+       dev = node_detected_dev->dev;
+       kfree(node_detected_dev);
@@ -5781,8 +5807,11 @@ static void autostart_arrays(int part)
    continue;

Re: [patch v3 1/1] md: Software Raid autodetect dev list not array

2007-08-27 Thread Michael J. Evans
On Monday 27 August 2007, Randy Dunlap wrote:
 On Mon, 27 Aug 2007 15:16:21 -0700 Michael J. Evans wrote:
 
  =
  --- linux/drivers/md/md.c.orig  2007-08-21 03:19:42.511576248 -0700
  +++ linux/drivers/md/md.c   2007-08-21 04:30:09.775525710 -0700
  @@ -24,4 +24,6 @@
  
  +   - autodetect dev list not array: Michael J. Evans 
[EMAIL PROTECTED]
  +
 
 Nowadays we use an SCM for such comments, not the source file(s).

SCM?  Is this automatic, where do I go to see it?

 
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2, or (at your option)
  @@ -5752,13 +5754,26 @@ void md_autodetect_dev(dev_t dev)
* Searches all registered partitions for autorun RAID arrays
* at boot time.
*/
  -static dev_t detected_devices[128];
  -static int dev_cnt;
  +
  +static LIST_HEAD(all_detected_devices);
  +struct detected_devices_node {
  +   struct list_head list;
  +   dev_t dev;
  +};
   
   void md_autodetect_dev(dev_t dev)
   {
  -   if (dev_cnt >= 0 && dev_cnt < 127)
  -   detected_devices[dev_cnt++] = dev;
  +   struct detected_devices_node *node_detected_dev;
  +   char strbuf[BDEVNAME_SIZE];
  +
  +   node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);\
 
 Drop the trailing '\', as someone has already commented on.
 

I thought I had fixed that, I must have copied the unfixed version out of the 
way when I made the other changes to it.

  +   if (node_detected_dev) {
  +   if (node_detected_dev) {
  +   node_detected_dev->dev = dev;
  +   } else {
  +   printk(KERN_CRIT "md: md_autodetect_dev: kzAlloc node failed"
  +       " (null return), skipping dev(%d,%d)\n", MAJOR(dev), 
  MINOR(dev));
 
 printk() formatting is bad.
 Drop the  (null return) and indent that line more than the
 printk line is indented.
 
  +   }
   }
   
   
 
 ---
 ~Randy
 *** Remember to use Documentation/SubmitChecklist when testing your code ***
 




Re: [patch v3 1/1] md: Software Raid autodetect dev list not array

2007-08-27 Thread Randy Dunlap

Michael J. Evans wrote:

On Monday 27 August 2007, Randy Dunlap wrote:

On Mon, 27 Aug 2007 15:16:21 -0700 Michael J. Evans wrote:


=
--- linux/drivers/md/md.c.orig  2007-08-21 03:19:42.511576248 -0700
+++ linux/drivers/md/md.c   2007-08-21 04:30:09.775525710 -0700
@@ -24,4 +24,6 @@

+   - autodetect dev list not array: Michael J. Evans [EMAIL PROTECTED]

+

Nowadays we use an SCM for such comments, not the source file(s).


SCM?  Is this automatic, where do I go to see it?


See Documentation/SubmittingPatches:
14) The canonical patch format:

The canonical patch message body contains the following:

 - A "from" line specifying the patch author.

 - An empty line.

 - The body of the explanation, which will be copied to the
   permanent changelog to describe this patch.

 - The Signed-off-by: lines, described above, which will
   also go in the changelog.

 - A marker line containing simply "---".

 - Any additional comments not suitable for the changelog.

 - The actual patch (diff output).


so just put whatever you want to be in the permanent SCM logs
into the body of the explanation part of the email.


--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***