Re: [CentOS] Re: question about software Raid 1
Alexander Georgiev wrote:
> 2008/9/30 Les Mikesell [EMAIL PROTECTED]:
>>> BTW, there is - even with current kernels - no speed gain in using
>>> RAID1 - see
>>> http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
>>
>> I don't think I believe that - you can see the reads alternating
>> drives by watching the lights.
>
> Indeed, there is a patch linux-2.6-dm-mirroring.patch in the CentOS 5.2
> kernel sources which implements a proper body of the choose_mirror()
> function.

Which also explains why, once my mirror was corrupt, a new problem
would show up every few weeks even after the cause (bad RAM) was fixed.

--
Les Mikesell
[EMAIL PROTECTED]
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Re: question about software Raid 1
On Sun, 2008-09-21 at 21:01 +0200, Kay Diederichs wrote:
> Fact is that with CentOS-5 kernels (but not with CentOS-4, as this
> functionality became available in kernel 2.6.17) you could (or rather
> _should_ regularly)
>
>     echo check > /sys/block/mdX/md/sync_action
>
> to check agreement between the two (or more) copies. When this
> finishes,
>
>     /sys/block/mdX/md/mismatch_cnt
>
> shows you the number of mismatches. You can fix these with
>
>     echo repair > /sys/block/mdX/md/sync_action

Interesting. I'll give this a go on my own desktop system which is
running RAID 1.

You said above, "When this finishes...", but how do you know the check
is completed? I saw this in /var/log/messages:

Oct 1 11:02:47 ranbir kernel: md: data-check of RAID array md0
Oct 1 11:02:47 ranbir kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Oct 1 11:02:47 ranbir kernel: md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for data-check.
Oct 1 11:02:47 ranbir kernel: md: using 128k window, over a total of 104320 blocks.
Oct 1 11:02:48 ranbir kernel: md: md0: data-check done.
Oct 1 11:02:48 ranbir kernel: RAID1 conf printout:
Oct 1 11:02:48 ranbir kernel: --- wd:2 rd:2
Oct 1 11:02:48 ranbir kernel: disk 0, wo:0, o:1, dev:sda1
Oct 1 11:02:48 ranbir kernel: disk 1, wo:0, o:1, dev:sdb1

There was nothing else after the last line. I don't know exactly what
the disk lines mean.

Regards,

Ranbir
Re: [CentOS] Re: question about software Raid 1
Kanwar Ranbir Sandhu wrote:
> You said above, "When this finishes...", but how do you know the check
> is completed? I saw this in /var/log/messages:

cat /proc/mdstat ?

That at least shows status of RAID rebuilds, not sure about other types
of tasks.

nate
Re: [CentOS] Re: question about software Raid 1
On Wed, 2008-10-01 at 12:09 -0400, Toby Bluhm wrote:
> cat /proc/mdstat gives progress
> cat /sys/block/md0/md/sync_action gives current mode

Of course! I guess when I ran the check on md0, it finished before I
had the opportunity to watch the progress, so I wasn't sure what to
check.

Also, I just noticed this:

Oct 1 11:02:48 ranbir kernel: md: md0: data-check done.

Whoops! It was right there in the log, and I completely missed it.

Regards,

Ranbir
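For a check that runs long enough to watch, the two files Toby mentions can be polled from a small script. What follows is only a minimal sketch, assuming that sync_action reads "idle" once no check, repair or resync is running; the names MD_SYSFS_ROOT, current_action and wait_until_idle are made up for illustration and are not part of mdadm or the kernel.

```shell
#!/bin/sh
# Hypothetical helper names; only the sysfs path layout comes from the thread.
# MD_SYSFS_ROOT can be overridden so the functions can be tried on a scratch tree.
MD_SYSFS_ROOT="${MD_SYSFS_ROOT:-/sys/block}"

# current_action ARRAY: print the current mode of the array (e.g. check, idle)
current_action() {
    cat "$MD_SYSFS_ROOT/$1/md/sync_action"
}

# wait_until_idle ARRAY [SECONDS]: poll sync_action until it reads "idle"
wait_until_idle() {
    array=$1
    interval=${2:-5}
    while [ "$(current_action "$array")" != "idle" ]; do
        sleep "$interval"
    done
    echo "$array: idle"
}
```

Something like `wait_until_idle md0 10` would then return once the check has finished, with `cat /proc/mdstat` still available in another terminal for the progress display.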
Re: [CentOS] Re: question about software Raid 1
Kay Diederichs wrote:
> Fact is that with CentOS-5 kernels (but not with CentOS-4, as this
> functionality became available in kernel 2.6.17) you could (or rather
> _should_ regularly)
>
>     echo check > /sys/block/mdX/md/sync_action
>
> to check agreement between the two (or more) copies. When this
> finishes,
>
>     /sys/block/mdX/md/mismatch_cnt
>
> shows you the number of mismatches. You can fix these with
>
>     echo repair > /sys/block/mdX/md/sync_action
>
> This applies to at least RAID1 and RAID5. At this point the question
> arises: how does the repair job know which copy is the correct one? I
> have no answer to this question.

Thanks for posting this. I have a machine that periodically had
filesystem errors on a RAID1 volume that I eventually found were caused
by bad RAM, but even after replacing it I'd still see filesystem
problems reappear every few weeks. It turned out that there were quite
a few mismatched blocks between the mirrors, and the fsck passes must
have sometimes seen the good copy but subsequently the still-bad
alternate would be used. Now I've done a repair and fsck and so far
everything seems stable. It's hard to tell with problems that only
happen once or twice a month, though. I suppose I have some files with
corrupt contents on there but they are backups that will expire as more
current ones are saved anyway.

> BTW, there is - even with current kernels - no speed gain in using
> RAID1 - see
> http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .

I don't think I believe that - you can see the reads alternating drives
by watching the lights.

--
Les Mikesell
[EMAIL PROTECTED]
[CentOS] Re: question about software Raid 1
Nataraj wrote:
> Does software raid 1 compare checksums or otherwise verify that the
> same bits are coming from both disks during reads? What I'm interested
> in, is whether bit errors that were somehow undetected by the hardware
> would be detected by the raid 1 software.
>
> Thanks,
> Nataraj

I've been thinking about this as well. Fact is that with CentOS-5
kernels (but not with CentOS-4, as this functionality became available
in kernel 2.6.17) you could (or rather _should_ regularly)

    echo check > /sys/block/mdX/md/sync_action

to check agreement between the two (or more) copies. When this
finishes,

    /sys/block/mdX/md/mismatch_cnt

shows you the number of mismatches. You can fix these with

    echo repair > /sys/block/mdX/md/sync_action

This applies to at least RAID1 and RAID5. At this point the question
arises: how does the repair job know which copy is the correct one? I
have no answer to this question.

BTW, there is - even with current kernels - no speed gain in using
RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .

HTH a bit,

Kay
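The three sysfs operations described above can be wrapped in small shell functions, which makes them easier to run regularly (from cron, say). This is a sketch under stated assumptions, not a tested tool: MD_SYSFS_ROOT, start_check, start_repair and mismatch_count are names invented here, the array name is passed in by the caller, and writing to sync_action on a real system needs root.

```shell
#!/bin/sh
# Hypothetical wrappers around the sysfs files named in the message above.
# MD_SYSFS_ROOT can be overridden to point at a scratch directory for a dry run.
MD_SYSFS_ROOT="${MD_SYSFS_ROOT:-/sys/block}"

# start_check ARRAY: ask md to compare the copies without changing them
start_check() {
    echo check > "$MD_SYSFS_ROOT/$1/md/sync_action"
}

# start_repair ARRAY: ask md to compare the copies and fix mismatches
start_repair() {
    echo repair > "$MD_SYSFS_ROOT/$1/md/sync_action"
}

# mismatch_count ARRAY: print the mismatch counter of the array
mismatch_count() {
    cat "$MD_SYSFS_ROOT/$1/md/mismatch_cnt"
}
```

A typical sequence would be `start_check md0`, then `mismatch_count md0` once the check has completed, and `start_repair md0` only if the count is non-zero.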
Re: [CentOS] Re: question about software Raid 1
Kay Diederichs wrote:
> BTW, there is - even with current kernels - no speed gain in using
> RAID1 - see
> http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .

except, that's wrong. I unwrapped a recent kernel source tarball from
kernel.org and found...

    static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
    {
        struct mirror *m = get_default_mirror(ms);

        do {
            if (likely(!atomic_read(&m->error_count)))
                return m;

            if (m-- == ms->mirror)
                m += ms->nr_mirrors;
        } while (m != get_default_mirror(ms));

        return NULL;
    }

so it appears it's a round robin ...
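To make the control flow of the quoted loop easier to trace, here is a rough shell transliteration. It is only an illustration under loud assumptions: the kernel function walks struct mirror pointers, whereas this sketch uses array indexes, and the default index and the per-mirror error counts are passed as plain arguments. The calling convention is invented here and exists nowhere in the kernel.

```shell
#!/bin/sh
# choose_mirror DEFAULT_INDEX ERR0 ERR1 ...
# Hypothetical model of the quoted loop: prints the index of the chosen
# mirror, or "none" (the counterpart of returning NULL) if every mirror
# has a non-zero error count.
choose_mirror() {
    default=$1
    shift
    n=$#          # number of mirrors, like ms->nr_mirrors
    m=$default    # start at the default mirror
    while :; do
        # fetch the error count of mirror m (positional parameters are 1-based)
        eval "err=\${$((m + 1))}"
        if [ "$err" -eq 0 ]; then
            echo "$m"
            return 0
        fi
        # step backwards, wrapping from mirror 0 to the last mirror
        if [ "$m" -eq 0 ]; then
            m=$((n - 1))
        else
            m=$((m - 1))
        fi
        # stop once we are back at the default mirror
        [ "$m" -eq "$default" ] && break
    done
    echo none
    return 1
}
```

With two healthy mirrors, `choose_mirror 0 0 0` prints 0; if mirror 0 has errors, `choose_mirror 0 3 0` prints 1. Traced this way, the loop as quoted hands back the default mirror whenever that mirror's error count is zero, and only walks backwards through the others once it is not.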
Re: [CentOS] Re: question about software Raid 1
On Sun, 2008-09-21 at 21:01 +0200, Kay Diederichs wrote:
> Nataraj wrote:
>> Does software raid 1 compare checksums or otherwise verify that the
>> same bits are coming from both disks during reads? What I'm
>> interested in, is whether bit errors that were somehow undetected by
>> the hardware would be detected by the raid 1 software.
>>
>> Thanks,
>> Nataraj
>
> I've been thinking about this as well. Fact is that with CentOS-5
> kernels (but not with CentOS-4, as this functionality became available
> in kernel 2.6.17) you could (or rather _should_ regularly)
>
>     echo check > /sys/block/mdX/md/sync_action
>
> to check agreement between the two (or more) copies. When this
> finishes,
>
>     /sys/block/mdX/md/mismatch_cnt
>
> shows you the number of mismatches. You can fix these with
>
>     echo repair > /sys/block/mdX/md/sync_action
>
> This applies to at least RAID1 and RAID5. At this point the question
> arises: how does the repair job know which copy is the correct one? I
> have no answer to this question.
>
> BTW, there is - even with current kernels - no speed gain in using
> RAID1 - see
> http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
>
> HTH a bit,
>
> Kay

Hi Kay,

From reading the following URL:

http://linux-raid.osdl.org/index.php/RAID_Administration

my understanding is that if repair detects a read error on one of the
drives, and successfully reads the corresponding data from the other
drive, then it will attempt to rewrite those blocks on the drive that
got the read error. It looks like it may do this even if you only run
check.

I don't think it can repair when there is a data discrepancy without
the hardware returning an error. This is primarily what brings up my
concern over SATA drives, because I think the hardware error detection
is inferior to SCSI or SAS drives.

Nataraj
Re: [CentOS] Re: question about software Raid 1
On Sun, 2008-09-21 at 12:53 -0700, John R Pierce wrote:
> Kay Diederichs wrote:
>> BTW, there is - even with current kernels - no speed gain in using
>> RAID1 - see
>> http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
>
> except, that's wrong. I unwrapped a recent kernel source tarball from
> kernel.org and found...
>
>     static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
>     {
>         struct mirror *m = get_default_mirror(ms);
>
>         do {
>             if (likely(!atomic_read(&m->error_count)))
>                 return m;
>
>             if (m-- == ms->mirror)
>                 m += ms->nr_mirrors;
>         } while (m != get_default_mirror(ms));
>
>         return NULL;
>     }
>
> so it appears it's a round robin ...

This makes sense. I'm pretty sure that tests that I've run in the past
using bonnie++ or iozone showed faster reads with raid1 than with a
single drive. I would think that if the drives are on separate
controllers (and depending upon the performance/capacity of the drives
and controllers), there could be notable improvements.

Nataraj
Re: [CentOS] Re: question about software Raid 1
> This makes sense. I'm pretty sure that tests that I've run in the past
> using bonnie++ or iozone showed faster reads with raid1 than with a
> single drive. I would think that if the drives are on separate
> controllers (and depending upon the performance/capacity of the drives
> and controllers), there could be notable improvements.

With SATA or SAS, of course, every drive is on its own channel. Even
with PATA, at 100 or 133 Mbyte/sec, only the fastest newer drives would
saturate the bus doing two transfers concurrently.

Now, after I wrote what I did above, I dug up the kernel.org 2.6.18
kernel that RHEL/CentOS 5 is based on, and it still had the older code
sequence as shown in that 'to do' list entry... but I didn't run the
RHEL patch sequences against it, so it's quite possible RHEL
retrofitted this patch to it.