Re: raid5 hang on get_active_stripe

2006-06-02 Thread Neil Brown
On Friday June 2, [EMAIL PROTECTED] wrote:
 On Thu, 1 Jun 2006, Neil Brown wrote:
 
  I've got one more long-shot I would like to try first.  If you could
 back out that change to ll_rw_block and apply this patch instead,
 then when it hangs, just cat the stripe_cache_active file and see if
  that unplugs things or not (cat it a few times).
 
 nope that didn't unstick it... i had to raise stripe_cache_size (from 256 
 to 768... 512 wasn't enough)...
 
 -dean
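
The stripe cache dean is resizing is exposed through sysfs.  A minimal
sketch of that adjustment, assuming the array is md0 (the device name and
the value 768 are just taken from the report above):

  # Inspect the current stripe cache size, then raise it as dean did.
  cat /sys/block/md0/md/stripe_cache_size
  echo 768 > /sys/block/md0/md/stripe_cache_size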

Ok, thanks.
I still don't know what is really going on, but I'm 99.9863% sure this
will fix it, and it is a reasonable thing to do.
(Yes, I lose a ';'.  That is deliberate).

Please let me know what this proves, and thanks again for your
patience.

NeilBrown


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~   2006-05-28 21:56:56.0 +1000
+++ ./drivers/md/raid5.c2006-06-02 17:24:07.0 +1000
@@ -285,7 +285,7 @@ static struct stripe_head *get_active_st
                                     < (conf->max_nr_stripes *3/4)
                                     || !conf->inactive_blocked),
                                    conf->device_lock,
-                                   unplug_slaves(conf->mddev);
+                                   raid5_unplug_device(conf->mddev->queue)
                                );
                        conf->inactive_blocked = 0;
                } else


Clarifications about check/repair, i.e. RAID SCRUBBING

2006-06-02 Thread Roy Waldspurger

Hi Neil/folks,

I'm seeking some (hopefully) simple clarifications about the newer raid 
checking and scrubbing behavior present in more recent kernels.  I must 
say that I was more than pleased when I learned about the new 
functionality.  Kudos, Neil, for this addition.  Unfortunately, because 
this is new, it's not to be found in the FAQs or HOW-TOs... with the 
exception of the Gentoo "HOWTO Install on Software RAID".


I've looked at the following sources of info:
linux-2.6.16.19/Documentation/md.txt
linux-2.6.16.19/drivers/md: raid5.c and raid6main.c
(the raid5_end_read_request and raid6_end_read_request routines)

emails on the linux-raid mailing list, in particular:

http://lkml.org/lkml/2005/12/4/118
http://www.mail-archive.com/linux-raid@vger.kernel.org/msg04615.html
===

In any regard:

I'm talking about triggering the following functionality:

echo check > /sys/block/mdX/md/sync_action
echo repair > /sys/block/mdX/md/sync_action

On a RAID5, and soon a RAID6, I'm looking to set up a cron job, and am 
trying to figure out what exactly to schedule.  The answers to the 
following questions might shed some light on this:


1. GENERALLY SPEAKING, WHAT IS THE DIFFERENCE BETWEEN THE CHECK AND 
REPAIR COMMANDS?
The md.txt doc mentions for check that a repair may also happen for 
some raid levels.
Which RAID levels, and in what cases?  If I perform a check is there a 
cache of bad blocks that need to be fixed that can quickly be repaired 
by executing the repair command?  Or would it go through the entire 
array again?  I'm working with new drives, and haven't come across any 
bad blocks to test this with.


2. CAN CHECK BE RUN ON A DEGRADED ARRAY (say with N out of N+1 disks 
on a RAID level 5)?  I can test this out, but was it designed to do 
this, versus REPAIR only working on a full set of active drives? 
Perhaps repair is assuming that I have N+1 disks so that parity can be 
WRITTEN?


3. RE: FEEDBACK/LOGGING: it seems that I might see some messages in 
dmesg logging output such as "raid5: read error corrected!", is that 
right?  I realize that mismatch_count can also be used to see if there 
was any action during a check or repair.  I'm assuming this stuff 
doesn't make its way into an email.


4. DOES REPAIR PERFORM READS TO CHECK THE ARRAY, AND THEN WRITE TO THE 
ARRAY *ONLY WHEN NECESSARY* TO PERFORM FIXES FOR CERTAIN BLOCKS?  (I 
know, it's sorta a repeat of question number 1+2).


5. IS THERE ILL-EFFECT TO STOP EITHER CHECK OR REPAIR BY ISSUING IDLE?

6. IS IT AT ALL POSSIBLE TO CHECK A CERTAIN RANGE OF BLOCKS?  And to 
keep track of which blocks were checked?  The motivation is to start 
checking some blocks overnight, and to pick-up where I left off the next 
night...


7. ANY OTHER CONSIDERATIONS WHEN SCRUBBING THE RAID?

Sorry for some of these questions being so similar in nature.  I just 
want to make sure I understand it correctly.


Neil, again, a BIG thanks for this new functionality.  I'm looking 
forward to putting a system in place to exercise my drives!


Cheers,

-- roy


Re: Clarifications about check/repair, i.e. RAID SCRUBBING

2006-06-02 Thread Neil Brown
On Friday June 2, [EMAIL PROTECTED] wrote:
 
 In any regard:
 
 I'm talking about triggering the following functionality:
 
 echo check > /sys/block/mdX/md/sync_action
 echo repair > /sys/block/mdX/md/sync_action
 
 On a RAID5, and soon a RAID6, I'm looking to set up a cron job, and am 
 trying to figure out what exactly to schedule.  The answers to the 
 following questions might shed some light on this:
 
 1. GENERALLY SPEAKING, WHAT IS THE DIFFERENCE BETWEEN THE CHECK AND 
 REPAIR COMMANDS?
 The md.txt doc mentions for check that a repair may also happen for 
 some raid levels.
 Which RAID levels, and in what cases?  If I perform a check is there a 
 cache of bad blocks that need to be fixed that can quickly be repaired 
 by executing the repair command?  Or would it go through the entire 
 array again?  I'm working with new drives, and haven't come across any 
 bad blocks to test this with.

'check' just reads everything and doesn't trigger any writes unless a
read error is detected, in which case the normal read-error handling
kicks in.  So it can be useful on a read-only array.

'repair' does the same, but when it finds an inconsistency it corrects
it by writing something.
If a raid personality has not been taught to specifically understand
'check', then a 'check' run will effect a 'repair'.  I think 2.6.17
will have all personalities doing the right thing.

'check' doesn't keep a record of problems, just a count.  'repair' will
reprocess the whole array.
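
A minimal sketch of that cycle, assuming an array at md0 (the device name
and the one-minute poll are placeholders; the count lives in the
mismatch_cnt sysfs attribute):

  # Run a 'check' pass, wait for it to finish, then report the count of
  # mismatched blocks that was accumulated.
  echo check > /sys/block/md0/md/sync_action
  while grep -q check /sys/block/md0/md/sync_action; do
      sleep 60
  done
  cat /sys/block/md0/md/mismatch_cnt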


 
 2. CAN CHECK BE RUN ON A DEGRADED ARRAY (say with N out of N+1 disks 
 on a RAID level 5)?  I can test this out, but was it designed to do 
 this, versus REPAIR only working on a full set of active drives? 
 Perhaps repair is assuming that I have N+1 disks so that parity can be 
 WRITTEN?

No, check on a degraded raid5, or a raid6 with 2 missing devices, or a
raid1 with only one device will not do anything.  It will terminate
immediately.   After all, there is nothing useful that it can do.
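
So a scrub script may want to skip arrays that are currently degraded.
One possible test, sketched with mdadm (md0 is a placeholder):

  # Only request a scrub if the array is not degraded; a 'check' on a
  # degraded array would terminate immediately anyway.
  if mdadm --detail /dev/md0 | grep -q degraded; then
      echo "md0 is degraded, skipping scrub"
  else
      echo check > /sys/block/md0/md/sync_action
  fi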

 
 3. RE: FEEDBACK/LOGGING: it seems that I might see some messages in 
 dmesg logging output such as "raid5: read error corrected!", is that 
 right?  I realize that mismatch_count can also be used to see if there 
 was any action during a check or repair.  I'm assuming this stuff 
 doesn't make its way into an email.

You are correct on all counts.  mdadm --monitor doesn't know about
this yet. ((writes notes in mdadm todo list)).
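
Until mdadm --monitor learns about this, a scrub job can scrape the kernel
log itself.  A rough sketch; the log path, the grep pattern and the mail
recipient are all assumptions to adjust for the local setup:

  # Mail any md read-error corrections found in the kernel log after a scrub.
  if grep -qi 'read error corrected' /var/log/kern.log; then
      grep -i 'read error corrected' /var/log/kern.log | \
          mail -s "md scrub report for $(hostname)" root
  fi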

 
 4. DOES REPAIR PERFORM READS TO CHECK THE ARRAY, AND THEN WRITE TO THE 
 ARRAY *ONLY WHEN NECESSARY* TO PERFORM FIXES FOR CERTAIN BLOCKS?  (I 
 know, it's sorta a repeat of question number 1+2).
 

'repair' only writes when necessary.  In the normal case, it will only
read every block.


 5. IS THERE ILL-EFFECT TO STOP EITHER CHECK OR REPAIR BY ISSUING IDLE?

No.

 
 6. IS IT AT ALL POSSIBLE TO CHECK A CERTAIN RANGE OF BLOCKS?  And to 
 keep track of which blocks were checked?  The motivation is to start 
 checking some blocks overnight, and to pick-up where I left off the next 
 night...

Not yet.  It might be possible one day.

 
 7. ANY OTHER CONSIDERATIONS WHEN SCRUBBING THE RAID?
 

Not that I am aware of.

NeilBrown


 Sorry for some of these questions being so similar in nature.  I just 
 want to make sure I understand it correctly.
 
 Neil, again, a BIG thanks for this new functionality.  I'm looking 
 forward to putting a system in place to exercise my drives!
 
 Cheers,
 
 -- roy


Problems with device-mapper on top of RAID-5 and RAID-6

2006-06-02 Thread Dr. Uwe Meyer-Gruhl

Hi List,


just to draw your attention to some discussion starting out here:

http://thread.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1576/focus=1576

and going on here:

http://thread.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1617/focus=1617

To recap, what has been found so far is:

1. There are problems with the combination of RAID and device mapper 
(e.g. for encrypted filesystems). The thread started off with this 
observation.
2. There are filesystem corruptions under heavy load (i.e. copying big 
files or many files to the filesystem). The bug usually takes a long time 
to reproduce.

3. Problems occur with any filesystem type (ext3, reiser4, et al.).
4. Problems occur with RAID-5 and RAID-6. Both are O.K. without dm-crypt.
5. Problems are unaffected by the choice of cipher under dm-crypt (at least 
AES, Serpent and Twofish all expose the bug). dm-linear is reported to have 
failed, too. So we suspect that neither dm-crypt nor the ciphers are the 
culprit here, but rather the device-mapper core functionality or the RAID 
subsystem.
6. The bug seems to exist in at least kernels 2.6.13 through 2.6.16 (2.6.17 
not yet tested; earlier versions may be affected).


There have been discussions about this going back to earlier kernel 
versions, like here:


http://lwn.net/Articles/150583/

Neil's suggestion indicates that there may be a race condition when stacking 
md and dm on top of each other, but I have not yet tested that patch. I once 
had problems stacking cryptoloop over RAID-6, so it might really be a 
stacking problem. We don't know yet whether LVM over RAID is affected as well.
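
For anyone trying to reproduce this, the kind of stack under discussion
looks roughly like the following.  Every device name, the cipher and the
mount point are placeholders, and cryptsetup is used in plain (non-LUKS)
mode; this is only an illustration of the configuration, not a recipe:

  # Build dm-crypt on top of a raid5 array and generate sustained writes,
  # which is the load reported to trigger the corruption.
  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[abcd]1
  cryptsetup -c aes create cryptvol /dev/md0
  mkfs.ext3 /dev/mapper/cryptvol
  mount /dev/mapper/cryptvol /mnt/crypt
  cp -a /usr /mnt/crypt/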


This bug is very critical and should be fixed as soon as possible, IMHO.



Uwe


Re: Clarifications about check/repair, i.e. RAID SCRUBBING

2006-06-02 Thread Roy Waldspurger
Thanks for clearing things up, Neil.  Looks like I will be issuing 
weekly repairs on most of the arrays.


Cheers,

-- roy
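
A weekly repair like this might be wired into cron along the following
lines; the schedule and the device glob are only illustrative:

  # /etc/cron.d/md-scrub: ask every md array for a 'repair' pass at 03:30
  # each Sunday.
  30 3 * * 0  root  for f in /sys/block/md*/md/sync_action; do echo repair > "$f"; done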


problems with raid6, mdadm: RUN_ARRAY failed

2006-06-02 Thread Kresimir Kukulj
I have an old Mylex Acceleraid 170LP controller with 6 SCSI 36GB disks on
it. Running hardware raid5 resulted in very poor performance (7Mb/sec in
sequential writing, with horrid iowait).

So I configured it to export 6 logical disks and tried creating a raid6 to see
if I could get better results. Trying to create the array with a missing
component results in:

~/mdadm-2.5/mdadm -C /dev/md3 -l6 -n6 /dev/rd/c0d0p3  /dev/rd/c0d2p3 
/dev/rd/c0d3p3 /dev/rd/c0d4p3 /dev/rd/c0d5p3 missing
mdadm: RUN_ARRAY failed: Input/output error

strace shows it barfed on some ioctl:

read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
ioctl(4, BLKGETSIZE64, 0xbffe64b0)      = 0
ioctl(4, BLKFLSBUF, 0)                  = 0
_llseek(4, 4096, [4096], SEEK_SET)      = 0
read(4, "\3\0\0\0\4\0\0\0\5\0\0\0\355]\325?\2\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
close(4)                                = 0
open("/dev/rd/c0d5p3", O_RDWR|O_EXCL)   = 4
ioctl(4, BLKGETSIZE64, 0xbffe66d0)      = 0
_llseek(4, 31330009088, [31330009088], SEEK_SET) = 0
write(4, "\374N+\251\0\0\0\0Z\0\0\0\0\0\0\0\0\0\0\0\34\3077\262\353"..., 4096) = 4096
fsync(4)                                = 0
close(4)                                = 0
ioctl(3, 0x40140921, 0xbffe67a0)        = 0
ioctl(3, 0x400c0930, 0xbffe67a0)        = -1 EIO (Input/output error)
write(2, "mdadm: RUN_ARRAY failed: Input/o"..., 44) = 44
fstat64(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 3), ...}) = 0
ioctl(3, 0x800c0910, 0xbffe6660)        = 0
fstat64(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 3), ...}) = 0
ioctl(3, 0x800c0910, 0xbffe6660)        = 0
ioctl(3, 0x932, 0)                      = 0
exit_group(1)                           = ?

Running:
vanilla 2.6.16.19
Personalities : [linear] [raid0] [raid1] [raid5] [raid4] [raid6]
mdadm-2.5

Creating a filesystem (mke2fs) on /dev/rd/c0d[0234]p3 works without a problem,
so the devices are accessible.
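
One quick next step for a RUN_ARRAY failure is to look at what the kernel
logged at the moment of the EIO; a sketch, simply re-running the failing
create with the kernel log in view:

  # Re-run the create, then show the kernel's explanation for the EIO.
  ~/mdadm-2.5/mdadm -C /dev/md3 -l6 -n6 /dev/rd/c0d0p3 /dev/rd/c0d2p3 \
      /dev/rd/c0d3p3 /dev/rd/c0d4p3 /dev/rd/c0d5p3 missing
  dmesg | tail -n 20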

Any hints ?

-- 
Kresimir Kukulj  [EMAIL PROTECTED]
+--+
Remember, if you break Debian, you get to keep both parts.