Re: [CentOS] Re: question about software Raid 1

2008-10-01 Thread Les Mikesell

Alexander Georgiev wrote:

2008/9/30 Les Mikesell [EMAIL PROTECTED]:

BTW, there is - even with current kernels - no speed gain in using RAID1 -
see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .

I don't think I believe that - you can see the reads alternating drives by
watching the lights.


Indeed, there is a patch, linux-2.6-dm-mirroring.patch, in the CentOS 5.2
kernel sources which implements a proper body for the choose_mirror()
function.


Which also explains why, once my mirror was corrupt, a new problem
would show up every few weeks even after the cause (bad RAM) was fixed.


--
  Les Mikesell
   [EMAIL PROTECTED]
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Re: question about software Raid 1

2008-10-01 Thread Kanwar Ranbir Sandhu
On Sun, 2008-09-21 at 21:01 +0200, Kay Diederichs wrote:
> Fact is that with CentOS-5 kernels (but not with CentOS-4, as this
> functionality became available in kernel 2.6.17) you could (or rather
> _should_ regularly)
>    echo check > /sys/block/mdX/md/sync_action
> to check agreement between the two (or more) copies. When this finishes,
> /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You
> can fix these with
>    echo repair > /sys/block/mdX/md/sync_action

Interesting.  I'll give this a go on my own desktop system which is
running RAID 1.

You said above, "When this finishes...", but how do you know the check
is completed?  I saw this in /var/log/messages:

Oct  1 11:02:47 ranbir kernel: md: data-check of RAID array md0
Oct  1 11:02:47 ranbir kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Oct  1 11:02:47 ranbir kernel: md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for data-check.
Oct  1 11:02:47 ranbir kernel: md: using 128k window, over a total of 104320 blocks.
Oct  1 11:02:48 ranbir kernel: md: md0: data-check done.
Oct  1 11:02:48 ranbir kernel: RAID1 conf printout:
Oct  1 11:02:48 ranbir kernel: --- wd:2 rd:2
Oct  1 11:02:48 ranbir kernel: disk 0, wo:0, o:1, dev:sda1
Oct  1 11:02:48 ranbir kernel: disk 1, wo:0, o:1, dev:sdb1

There was nothing else after the last line.  I don't know exactly what
the disk lines mean.

Regards,

Ranbir



Re: [CentOS] Re: question about software Raid 1

2008-10-01 Thread nate
Kanwar Ranbir Sandhu wrote:

> You said above, "When this finishes...", but how do you know the check
> is completed?  I saw this in /var/log/messages:

cat /proc/mdstat ? That at least shows status of RAID rebuilds, not
sure about other types of tasks.

nate





Re: [CentOS] Re: question about software Raid 1

2008-10-01 Thread Kanwar Ranbir Sandhu
On Wed, 2008-10-01 at 12:09 -0400, Toby Bluhm wrote: 
> cat /proc/mdstat gives progress
>
> cat /sys/block/md0/md/sync_action gives current mode

Of course!  I guess when I ran the check on md0, it finished before I
had the opportunity to watch the progress, so I wasn't sure what to
check.

Also, I just noticed this:

Oct  1 11:02:48 ranbir kernel: md: md0: data-check done.

Whoops!  It was right there in the log, and I completely missed it.
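
For a check that runs long enough to be worth waiting on, the sync_action
file mentioned above can be polled from a script. The following is a minimal
sketch, not a tested tool: MD_SYSFS and MD_POLL are hypothetical variables
introduced here for illustration (a base directory override and a polling
interval in seconds), and the function assumes the kernel reports "idle" in
sync_action when no check, repair or resync is running.

```shell
#!/bin/sh
# Sketch only. MD_SYSFS and MD_POLL are hypothetical knobs, not md settings.

# Block until array $1 (e.g. md0) reports "idle" in sync_action, i.e. no
# check, repair or resync is currently running.
# Returns 1 if the sync_action file cannot be read.
md_wait_idle() {
    action_file="${MD_SYSFS:-/sys/block}/$1/md/sync_action"
    [ -r "$action_file" ] || return 1
    while [ "$(cat "$action_file")" != "idle" ]; do
        sleep "${MD_POLL:-10}"
    done
}
```

Something like "md_wait_idle md0 && cat /sys/block/md0/md/mismatch_cnt"
would then print the mismatch count once the array is idle again.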

Regards,

Ranbir



Re: [CentOS] Re: question about software Raid 1

2008-09-30 Thread Les Mikesell

Kay Diederichs wrote:


Fact is that with CentOS-5 kernels (but not with CentOS-4, as this 
functionality became available in kernel 2.6.17) you could (or rather 
_should_ regularly)

   echo check > /sys/block/mdX/md/sync_action
to check agreement between the two (or more) copies. When this finishes,
/sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You
can fix these with

   echo repair > /sys/block/mdX/md/sync_action

This applies to at least RAID1 and RAID5.
At this point the question arises: how does the repair job know which 
copy is the correct one? I have no answer to this question.


Thanks for posting this.  I have a machine that periodically had
filesystem errors on a RAID1 volume.  I eventually found they were caused
by bad RAM, but even after replacing it I'd still see filesystem problems
reappear every few weeks.  It turned out that there were quite a few
mismatched blocks between the mirrors; the fsck passes must sometimes have
read the good copy, while later reads were served from the still-bad
alternate.  Now I've done a repair and fsck and so far everything
seems stable.  It's hard to tell with problems that only happen once or
twice a month, though.  I suppose I have some files with corrupt
contents on there but they are backups that will expire as more current
ones are saved anyway.


BTW, there is - even with current kernels - no speed gain in using RAID1 
- see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .


I don't think I believe that - you can see the reads alternating drives 
by watching the lights.
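
A less visual way to make the same observation is to compare per-disk read
counters before and after a read-heavy job. This is only a sketch under
stated assumptions: it relies on the whole-disk line layout of
/proc/diskstats (device name in field 3, reads completed in field 4), and
the optional second argument is a hypothetical hook added here so the
function can be run against a saved copy of the file.

```shell
#!/bin/sh
# Sketch only. The optional second argument is a hypothetical hook for
# reading a saved copy of the stats instead of the live /proc/diskstats.

# Print the number of reads completed so far by block device $1 (e.g. sda).
# In /proc/diskstats, field 3 is the device name and field 4 is the count
# of reads completed.
disk_reads() {
    awk -v dev="$1" '$3 == dev { print $4 }' "${2:-/proc/diskstats}"
}
```

Sampling "disk_reads sda" and "disk_reads sdb" before and after the job and
comparing the differences shows whether both mirror halves served reads.
It does not by itself show whether a single sequential reader got any
faster.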


--
  Les Mikesell
   [EMAIL PROTECTED]


[CentOS] Re: question about software Raid 1

2008-09-21 Thread Kay Diederichs

Nataraj wrote:

Does software raid 1 compare checksums or otherwise verify that the same
bits are coming from both disks during reads?  What I'm interested in,
is whether bit errors that were somehow undetected by the hardware would
be detected by the raid 1 software.

Thanks,
Nataraj


I've been thinking about this as well.

Fact is that with CentOS-5 kernels (but not with CentOS-4, as this 
functionality became available in kernel 2.6.17) you could (or rather 
_should_ regularly)

   echo check > /sys/block/mdX/md/sync_action
to check agreement between the two (or more) copies. When this finishes,
/sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You
can fix these with

   echo repair > /sys/block/mdX/md/sync_action

This applies to at least RAID1 and RAID5.
At this point the question arises: how does the repair job know which 
copy is the correct one? I have no answer to this question.
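
The check step described above could be wrapped in two small shell
functions for regular use, for example from a cron job run as root. This is
a rough sketch rather than a finished script: MD_SYSFS is a hypothetical
variable introduced here so the functions can be pointed at a scratch
directory, and by default they use the same /sys/block paths as the
commands above.

```shell
#!/bin/sh
# Sketch only. MD_SYSFS is a hypothetical override so the functions can be
# pointed at a scratch directory; by default they use the real /sys/block.

# Ask the md driver to start a consistency check on array $1 (e.g. md0).
md_start_check() {
    echo check > "${MD_SYSFS:-/sys/block}/$1/md/sync_action"
}

# Print the mismatch count recorded by the most recent check or repair.
md_mismatches() {
    cat "${MD_SYSFS:-/sys/block}/$1/md/mismatch_cnt"
}
```

Typical use would be "md_start_check md0", then, once the check has
finished, "md_mismatches md0". The repair step is deliberately left out of
the sketch, since the open question above about which copy it trusts
applies to it.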


BTW, there is - even with current kernels - no speed gain in using RAID1 
- see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .


HTH a bit,

Kay



Re: [CentOS] Re: question about software Raid 1

2008-09-21 Thread John R Pierce

Kay Diederichs wrote:
BTW, there is - even with current kernels - no speed gain in using 
RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .


except, that's wrong. I unwrapped a recent kernel source tarball from
kernel.org and found...


static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
{
    struct mirror *m = get_default_mirror(ms);

    do {
        if (likely(!atomic_read(&m->error_count)))
            return m;

        if (m-- == ms->mirror)
            m += ms->nr_mirrors;
    } while (m != get_default_mirror(ms));

    return NULL;
}


so it appears it's a round robin ...


Re: [CentOS] Re: question about software Raid 1

2008-09-21 Thread Nataraj
On Sun, 2008-09-21 at 21:01 +0200, Kay Diederichs wrote:
> Nataraj wrote:
> > Does software raid 1 compare checksums or otherwise verify that the same
> > bits are coming from both disks during reads?  What I'm interested in,
> > is whether bit errors that were somehow undetected by the hardware would
> > be detected by the raid 1 software.
> >
> > Thanks,
> > Nataraj
>
> I've been thinking about this as well.
>
> Fact is that with CentOS-5 kernels (but not with CentOS-4, as this
> functionality became available in kernel 2.6.17) you could (or rather
> _should_ regularly)
>    echo check > /sys/block/mdX/md/sync_action
> to check agreement between the two (or more) copies. When this finishes,
> /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You
> can fix these with
>    echo repair > /sys/block/mdX/md/sync_action
>
> This applies to at least RAID1 and RAID5.
> At this point the question arises: how does the repair job know which
> copy is the correct one? I have no answer to this question.
>
> BTW, there is - even with current kernels - no speed gain in using RAID1
> - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
>
> HTH a bit,
>
> Kay

Hi Kay,

From reading the following url:
http://linux-raid.osdl.org/index.php/RAID_Administration

my understanding is that if repair detects a read error on one of the
drives, and successfully reads the corresponding data from the other
drive, then it will attempt to rewrite those blocks on the drive that
got the read error.  It looks like it may do this even if you only run
check.  I don't think it can repair when there is a data discrepancy
without the hardware returning an error.  This is primarily what brings
up my concern over SATA drives, because I think the hardware error
detection is inferior to SCSI or SAS drives.

Nataraj






Re: [CentOS] Re: question about software Raid 1

2008-09-21 Thread Nataraj
On Sun, 2008-09-21 at 12:53 -0700, John R Pierce wrote:
> Kay Diederichs wrote:
> > BTW, there is - even with current kernels - no speed gain in using
> > RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
>
> except, that's wrong. I unwrapped a recent kernel source tarball from
> kernel.org and found...
>
> static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
> {
>     struct mirror *m = get_default_mirror(ms);
>
>     do {
>         if (likely(!atomic_read(&m->error_count)))
>             return m;
>
>         if (m-- == ms->mirror)
>             m += ms->nr_mirrors;
>     } while (m != get_default_mirror(ms));
>
>     return NULL;
> }
>
> so it appears it's a round robin ...

This makes sense.  I'm pretty sure that tests that I've run in the past
using bonnie++ or iozone showed faster reads with raid1 than with a
single drive.  I would think that if the drives are on separate
controllers (and depending upon the performance/capacity of the drives
and controllers), there could be notable improvements.

Nataraj




Re: [CentOS] Re: question about software Raid 1

2008-09-21 Thread John R Pierce



This makes sense.  I'm pretty sure that tests that I've run in the past
using bonnie++ or iozone showed faster reads with raid1 than with a
single drive.  I would think that if the drives are on separate
controllers (and depending upon the performance/capacity of the drives
and controllers), there could be notable improvements.
  



With SATA or SAS, of course, every drive is on its own channel.  Even
with PATA, at 100 or 133 Mbyte/sec, only the fastest newer drives would
saturate the bus doing two transfers concurrently.




Now, after I wrote what I did above, I dug up the kernel.org 2.6.18
kernel that RHEL/CentOS 5 is based on, and it still had the older code
sequence as shown in that 'to do' list entry... but I didn't run the
RHEL patch sequences against it; it's quite possible RHEL retrofitted
this patch to it.

