Re: FailSpare event?
google BadBlockHowto Any "just google it" response sounds glib, but this is actually how to do it :-) If you're new to md and mdadm, don't forget to actually remove the drive from the array before you start working on it with 'dd'. -Mike Mike wrote: On Fri, 12 Jan 2007, Neil Brown might have said: On Thursday January 11, [EMAIL PROTECTED] wrote: So I'm ok for the moment? Yes, I need to find the error and fix everything back to the (S) state. Yes, OK for the moment. The messages in $HOST:/var/log/messages for the time of the email are:
Jan 11 16:04:25 elo kernel: sd 2:0:4:0: SCSI error: return code = 0x802
Jan 11 16:04:25 elo kernel: sde: Current: sense key: Hardware Error
Jan 11 16:04:25 elo kernel: Additional sense: Internal target failure
Jan 11 16:04:25 elo kernel: Info fld=0x10b93c4d
Jan 11 16:04:25 elo kernel: end_request: I/O error, dev sde, sector 280575053
Jan 11 16:04:25 elo kernel: raid5: Disk failure on sde2, disabling device. Operation continuing on 5 devices
Given the sector number it looks likely that it was a superblock update. No idea how bad an 'internal target failure' is. Maybe powercycling the drive would 'fix' it, maybe not. On AIX boxes I can blink the drives to identify a bad/failing device. Is there a way to blink the drives in linux? Unfortunately not. NeilBrown I found the smartctl command. I have a 'long' test running in the background. I checked this drive and the other drives. This drive has been used the least (confirms it is a spare?) and is the only one with 'Total uncorrected errors' > 0. How to determine the error, correct the error, or clear the error? Mike
[EMAIL PROTECTED] ~]# smartctl -a /dev/sde
smartctl version 5.36 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: SEAGATE ST3146707LC Version: D703
Serial number: 3KS30WY8
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Thu Jan 11 17:00:26 2007 CST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 48 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 66108
  Blocks received from initiator = 147374656
  Blocks read from cache and sent to initiator = 42215
  Number of read and write commands whose size <= segment size = 12635583
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 3943.42
  number of minutes until next internal SMART test = 94
Error counter log:
         Errors Corrected by ECC   rereads/   Total errors   Correction algorithm   Gigabytes processed   Total uncorrected
            fast  |  delayed       rewrites   corrected      invocations            [10^9 bytes]          errors
read:        354         0             0           354            354                   0.546                 0
write:         0         0             0             0              0                 185.871                 1
Non-medium error count: 0
SMART Self-test log
Num  Test description   Status                      segment number   LifeTime (hours)   LBA_first_err   [SK ASC ASQ]
# 1  Background long    Completed, segment failed   -                3943               -               [-  -  -]
Long (extended) Self Test duration: 2726 seconds [45.4 minutes]
- To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
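For completeness, the fail/remove step before poking at the disk with dd would look roughly like this - a sketch only, assuming the array is /dev/md0 and the failed member is sde2 as in the log above:
  mdadm /dev/md0 --fail /dev/sde2
  mdadm /dev/md0 --remove /dev/sde2
  ... dd / smartctl work on /dev/sde ...
  mdadm /dev/md0 --add /dev/sde2
with the --add happening only once you're satisfied the drive is actually usable again.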
Re: RAID1 repair issue with 2.6.16.36 kernel
Michel Lespinasse wrote: Hi, I'm hitting a small issue with a RAID1 array and a 2.6.16.36 kernel. Debian's mdadm package has a checkarray process which runs monthly and checks the RAID arrays. Among other things, this process does an 'echo check > /sys/block/md1/md/sync_action'. Looking into my RAID1 array, I noticed that /sys/block/md1/md/mismatch_cnt was set to 128 - so there is a small amount of unsynchronized blocks in my RAID1 partition. I tried to fix the issue by writing repair into /sys/block/md1/md/sync_action but the command was refused:
# cat /sys/block/md0/md/sync_action
idle
# echo repair > /sys/block/md1/md/sync_action
echo: write error: invalid argument
I looked at the sources for my kernel (2.6.16.36) and noticed that in md.c action_store(), the following code rejects the repair action (but accepts everything else and treats it as a repair):
if (cmd_match(page, "check"))
        set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
else if (cmd_match(page, "repair"))
        return -EINVAL;
So I tried to issue a repair the hacky way:
# echo asdf > /sys/block/md1/md/sync_action
# cat /sys/block/md1/md/sync_action
repair
# cat /proc/mdstat
Personalities : [raid1] ...
md1 : active raid1 hdg2[1] hde2[0]
      126953536 blocks [2/2] [UU]
      [==>..................] resync = 14.2% (18054976/126953536) finish=53.7min speed=33773K/sec
...
unused devices: none
# ... wait one hour ...
# cat /sys/block/md1/md/sync_action
idle
# cat /sys/block/md1/md/mismatch_cnt
128
The kernel (still 2.6.16.36) reports it has repaired the array, but another check still shows 128 mismatched blocks:
# echo check > /sys/block/md1/md/sync_action
# cat /sys/block/md1/md/sync_action
check
When I did the check, while I still had mismatches (and a SMART test was failing, so the drive definitely had problems) I didn't notice the error count going up on the drive, which I thought was odd and probably a bug.
# ... wait one hour ...
# cat /sys/block/md1/md/mismatch_cnt
128
I had the same problem with mismatch_cnt not decreasing. It seems to me that either it shouldn't be a counter, i.e. each mismatch should be associated with a block, and the count should be decreased when that block checks out in the future, or the mismatch and error count should be cleared out when a repair or check is run. If it doesn't ever go back to zero though, it will be very difficult to write a reliable monitor for array health based on those files. I'm not sure it could ever be made perfectly reliable actually, so those files end up not being useful. It's clear that something was done in the repair step though, as a SMART test on the drive worked after that. So I'm a bit confused about how to proceed now... Well, the way I proceeded, since it didn't seem to me that I could rely on the array mismatch count or per-drive error counts, was to fail the drive out of the array and re-add it. Everything was reset then. I looked at the source for debian's linux-2.6_2.6.18-8 kernel and I see that the issue with the inverted cmd_match(page, "repair") condition is fixed there. So I assume you guys found this issue sometime between 2.6.16 and 2.6.18. Would you by any chance also know why the repair process did not work with 2.6.16.36 ??? Has any related bug been fixed recently ? Should I try again with a newer kernel, or should I rather avoid this for now ? Assuming the fix is small, is there any reason not to backport it into 2.6.16.x ? I would be grateful for any suggestions. 
Thanks, -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
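Follow-up sketch for the archives: on a kernel where that cmd_match() inversion is fixed, the straightforward sequence is simply
  echo check > /sys/block/md1/md/sync_action
  cat /sys/block/md1/md/mismatch_cnt
  echo repair > /sys/block/md1/md/sync_action
and mismatch_cnt is only refreshed by the check/repair pass itself, so it keeps its last value until the next run completes.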
Re: raidreconf for 5 x 320GB - 8 x 320GB
You don't want to use raidreconf unless I'm misunderstanding your goal - I have also had success with raidreconf but have had data-loss failures as well (I've posted to the list about it if you search). The data-loss failures were after I had run tests that showed me it should work. raidreconf is no longer maintained, so it's a dead end to try to hunt down the failures. Luckily, Neil Brown has added raid5 reshape support (same thing raidreconf did) to the md driver, so you can just use 'mdadm --grow' commands to do what you want. So I'd say to update your kernel to the newest 2.6.18.xx or whatever is out, and update mdadm give that a shot with your test partitions. The new versions are working fine as near as I can tell, and I've got them in use (FC5 machines - you can see their versions, and call me foolish for putting FC in production if you want) in a production environment with no issues. -Mike Timo Bernack wrote: Hi there, i am running a 5-disk RAID5 using mdadm on a suse 10.1 system. As the array is running out of space, i consider adding three more HDDs. Before i set up the current array, i made a small test with raidreconf: - build a 4-disk RAID5 /dev/md0 (with only 1.5gb for each partition) with 4.5gb userspace in total - put an ext3 filesystem on it - copy some data to it -- some episodes of American Dad ;-) - use raidreconf to add a 5th disk - use resize2fs to make use of the new additional space - check for the video-clips.. all fine (also compared checksums) This test was a full success, but of course it was very small scaled, so maybe there are issues that only come up when there is (much) more space involved. That leads to my questions: What are potential sources for failures (and thus, losing all data) reconfiguring the array using the method described above? Loss of power during the process (which would take quite some time, 24 hours minimum, i think) is one of them, i suppose. But are there known issues with raidreconf, concerning the 2TB-barrier, for example? I know that raidreconf is quite outdated, but it did what it promised on my system. I heard of the possibility to achieve the same result just by using mdadm, but this required a newer version of mdadm, and upgrading it and using a method that i can't test beforehand scares me a little -- a little more than letting out raidreconf on my precious data does ;-). All comments will be greatly appreciated! Timo P.S.: I do have a backup, but since it is scattered to a huge stack of CDs / DVDs (about 660 disks) it would be a terrible pain-in-the-ass to be forced to restore it again. In fact, getting away from storing my data using a DVD-burner was the main reason to build up the array at all. It took me about 1 week (!) to copy all these disks, as you can easily imagine. - Hardware: - Board / CPU: ASUS M2NPV-VM (4 x S-ATA onboard) / AMD Sempron 3200+ AM2 - Add. S-ATA-Controller: Promise SATA300 TX4 - HDDs: 5 x Western Digital Caviar SE 320GB SATA II (WD3200JS) Software (OpenSUSE 10.1 Default-Installation): - Kernel: 2.6.16 - mdadm - v2.2 - 5 December 2005 - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
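In outline the mdadm way is something like this - a sketch only, with made-up device names standing in for your three new disks:
  mdadm /dev/md0 --add /dev/sdf1 /dev/sdg1 /dev/sdh1
  mdadm --grow /dev/md0 --raid-devices=8
  ... watch /proc/mdstat until the reshape finishes ...
  resize2fs /dev/md0      (or the equivalent grow tool for whatever filesystem is on top)
Test it on scratch partitions first, exactly as you did with raidreconf.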
Re: Checking individual drive state
dean gaudet wrote: On Sun, 5 Nov 2006, Bradshaw wrote: I don't know how to scan the one disk for bad sectors, stopping the array and doing an fsck or similar throws errors, so I need help in determining whether the disc itself is faulty. try swapping the cable first. after that swap ports with another disk and see if the problem follows the port or the disk. you can see if smartctl -a (from smartmontools) tells you anything interesting. (it can be quite difficult, to impossible, to understand smartctl -a output though. but if you've got errors in the SMART error log that's a good place to start.) I don't think SMART output is that hard to understand. And checking the entire drive for errors is as easy as 'smartctl -t long /dev/drive' usually. If it is SATA as you say, you may need to put a '-d ata' in there. Wait for however long it says to wait, then do a 'smartctl -a /dev/drive' and you should see the self test log at the bottom. Did it finish? If not, there are bad sectors. If there are bad sectors, you should google the string 'BadBlockHowTo' to see if you can clear them (after failing the drive out of the array) Note that this won't tell you anything about cables or controllers or power or anything else that could and may be wrong. It's just for the drive media and firmware. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
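Concretely, the sequence is roughly this (sdX being whichever drive you suspect):
  smartctl -t long /dev/sdX        (add -d ata for SATA if needed)
  ... wait however many minutes smartctl says the test will take ...
  smartctl -l selftest /dev/sdX    (or smartctl -a and read the self-test log at the bottom)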
Re: New features?
Neil Brown wrote: On Tuesday October 31, [EMAIL PROTECTED] wrote: 1 Warm swap - replacing drives without taking down the array but maybe having to type in a few commands. Presumably a sata or sata/raid interface issue. (True hot swap is nice but not worth delaying warm-swap.) I believe that 2.6.18 has SATA hot-swap, so this should be available now ... providing you can find out what commands to use. I forgot 2.6.18 has SATA hot-swap, has anyone tested that? FWIW, SCSI (or SAS now, using SCSI or SATA drives) has full hot-swap with completely online drive exchanges. I have done this on recent kernels in production and it works. 2 Adding new disks to arrays. Allows incremental upgrades and to take advantage of the hard disk equivalent of Moore's law. Works for raid5 and linear. Raid6 one day. Also works for raid1! 4. Uneven disk sizes, eg adding a 400GB disk to a 2x200GB mirror to create a 400GB mirror. Together with 2 and 3, allows me to continuously expand a disk array. So you have a RAID1 (md) from sda and sdb, both 200GB, and you now have a sdc which is 400GB. So
  mdadm /dev/md0 -a /dev/sdc
  mdadm /dev/md0 -f /dev/sda
  mdadm /dev/md0 -r /dev/sda
  # wait for recovery
Could be:
  mdadm /dev/md0 -a /dev/sdc
  mdadm --grow /dev/md0 --raid-devices=3   # 3-disk mirror
  # wait for recovery
  # don't forget grub-install /dev/sda (or similar)!
  mdadm /dev/md0 -f /dev/sda
  mdadm /dev/md0 -r /dev/sda
  mdadm --grow /dev/md0 --raid-devices=2   # 2-disk again
  # Run a 'smartctl -d ata -t long /dev/sdb' before next line...
  mdadm /dev/md0 -f /dev/sdb
  mdadm /dev/md0 -r /dev/sdb
  mdadm -C /dev/md1 -l linear -n 2 /dev/sda /dev/sdb
  mdadm /dev/md0 -a /dev/md1
  # wait for recovery
  mdadm --grow /dev/md0 --size=max
You do run with a degraded array for a while, but you can do it entirely online. It might be possible to decrease the time when the array is degraded, but it is too late at night to think about that. All I did was decrease the degradation time, but hey it could help. And don't forget the long SMART test before running degraded for real. Could save you some pain. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: future hardware
Justin Piszcz wrote: cards perhaps. Or, after reading that article, consider SAS maybe..? I hate to be the guy that breaks out the unsubstantiated anecdotal evidence, but I've got a RAID10 with 4x300GB Maxtor SAS drives, and I've already had two trigger their internal SMART I'm about to fail message. They've been in service now for around 2 months, and they do have an okay temperature, and I have not been beating the crap out of them. More than a little disappointing. They are fast though... -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: MegaRaid problems..
Gordon Henderson wrote: This might not be strictly on-topic here, but you may provide enlightenment, as a lot of web searching hasn't helpmed me so-far )-: A client has bought some Dell hardware - Dell 1950 1U server, 2 on-board SATA drives connected to a Fusion MPT SAS controller. This works just fine. The on-board drives are mirrored using s/w RAID, which is great and just how I want it. The server also has 2 x Dell PERC dual-port SAS Raid Cards which have LSI MegaRaid chipssets on them. One cable from each raid card connect to half this is 2.6.18), and at boot time the dmesg output sees the drives in the external enclosure, but does not associate them to sdX drives! The underlying distro is Debian stable, but I doubt theres anything of issue there. I have several Dell 2950s (same chassis) and they have this problem. You can't do the PERC card and get JBOD basically. The PERC5 card has no JBOD mode, whereas the PERC4 card did. Dell said they may get a BIOS update, but wouldn't commit. In the meantime, you have to exchange the PERC5 card for a SAS5 card, then you can have JBOD. I was a little disappointed, as the PERC5 card can drive 6 or 8 devices, but the SAS5 card can only drive 4. Lame. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: checking state of RAID (for automated notifications)
berlin % rpm -qf /usr/lib/nagios/plugins/contrib/check_linux_raid.pl
nagios-plugins-1.4.1-1.2.fc4.rf
It is built in to my nagios plugins package at least, and works great. -Mike Tomasz Chmielewski wrote: I would like to have RAID status monitored by nagios. This sounds like a simple script, but I'm not sure what approach is correct. Considering that the health status of /proc/mdstat looks like this:
# cat /proc/mdstat
Personalities : [raid1] [raid10]
md2 : active raid10 sda2[4] sdd2[3] sdc2[2] sdb2[1]
      779264640 blocks super 1.0 64K chunks 2 near-copies [4/4] [UUUU]
md1 : active raid1 sdd1[1] sdc1[0]
      1076224 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      1076224 blocks [2/2] [UU]
unused devices: none
What should my script be checking? Does the number of U (8 for this host) letters indicate that RAID is healthy? Or should I count in_sync in cat /sys/block/md*/md/rd*/state? Perhaps the two approaches are the same, though. What's the best way to determine that the RAID is running fine? - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
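If the packaged plugin isn't available, a minimal check in the same spirit might look like this - just a sketch, using the usual nagios exit codes of 0 for OK and 2 for CRITICAL:
  #!/bin/sh
  # a '_' in the [UU..] status string means a missing member
  if grep -A 2 '^md' /proc/mdstat | grep -q '\[U*_' ; then
      echo "RAID CRITICAL - degraded array"; exit 2
  fi
  echo "RAID OK"; exit 0
The packaged plugin is more thorough (rebuilds in progress and so on), so use it where you can.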
Re: Care and feeding of RAID?
Steve Cousins wrote: MAILADDR [EMAIL PROTECTED] ARRAY /dev/md0 level=raid5 num-devices=3 UUID=39d07542:f3c97e69:fbb63d9d:64a052d3 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1 If you list the devices explicitly, you're opening the possibility for errors when the devices are re-ordered following insertion (or removal) of any other SATA or SCSI (or USB storage) device. I think what you want is a 'DEVICE partitions' line accompanied by ARRAY lines that have the UUID attribute you've already got in there. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
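In other words, something like this - a sketch, with the UUID carried over from your existing line and the mail address obviously yours:
  DEVICE partitions
  MAILADDR you@example.com
  ARRAY /dev/md0 level=raid5 num-devices=3 UUID=39d07542:f3c97e69:fbb63d9d:64a052d3
mdadm then scans every partition it can see and matches on UUID, so device renumbering stops mattering.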
Re: RAID over Firewire
Richard Scobie wrote: Dexter Filmore wrote: Of all modes I wouldn't use a linear setup for backups. One disk dies - all data is lost. I'd go for an external raid5 solution, tho those tend to be slow and expensive. Unfortunately budget is the overriding factor here. Unlike RAID 0, I thought there may be a way of recovering data from undamaged disks in a linear array, although I guess the file system used has some say in this. I hope to mitigate the risk somewhat by regularly using smartd to do long self tests on the disks. Long self tests will just tell you that you lost a block before RAID or the FS notices it, it's not going to stop the block (and your data) from going away. One more disk and you have raid 5 at least with the same storage capacity. md will transparently (to the OS, you'll get a log message) recover from single block errors in raid5. I'm not sure SMART works over firewire anyway. That's a question. http://smartmontools.sourceforge.net/: As for USB and FireWire (ieee1394) disks and tape drives, the news is not good. They appear to Linux as SCSI devices but their implementations do not usually support those SCSI commands needed by smartmontools. Note that page is slightly out of date - they mention SMART for SATA is supported through a patch to mainline, but it is in fact mainline now. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
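For the smartd part, the usual way to schedule the long tests is a self-test regexp in smartd.conf, along these lines - a sketch only, and as noted above it may simply not work over firewire:
  /dev/sda -a -s L/../../6/03 -m you@example.com
which asks for a long self-test every Saturday at 3am and mails you if anything in the SMART data goes south.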
Re: Is mdadm --create safe for existing arrays ?
Warning: I'm not certain this info is correct (I test on fake loopback arrays before taking my own advice - be warned). More authoritative folks are more than welcome to correct me or disagree. create is safe on existing arrays in general, so long as you get the old device order correct in the new create statement, and you use the 'missing' keyword appropriately so resyncs don't start immediately and you can mount the device to make sure you're data is there. Once you're certain, you add a drive in place of the missing component, sync up, and you're set. In this case, that'd be an 'mdadm --create -l1 -n2 /dev/md0 /dev/sda1 missing'. You should have an array there that you can test. But wait! If the superblock wasn't persistent before, it's possible that the device is using the space that would be used for the superblock for filesystem information - it may not be reserved for md use. This is where I'm not sure. If that's the case, re-creating with persistent superblocks may clobber the end of your filesystem, and you may not notice until you try to use it. There was a thread quite recently (one week? two? can't quite remember) specifically about putting a non-raid FS into a raid set that touched on these issues, and how to do the FS shrink so it would have room for the raid superblock. I'd refer to that. The goal being to shrink 1MB or so off the FS, create the raid, then grow the FS to max again (or let it be, whatever) -Mike Peter Greis wrote: Greetings, I have a SuSE 10.0 raid-1 root which will not properly boot, and I have noticed that / and /boot have non-persistent super blocks (which I read is required for booting). So, how do I change / and /boot to make the super blocks persistent ? Is it safe to run mdadm --create /dev/md0 --raid-devices=2 --level=1 /dev/sda1 /dev/sdb1 without loosing any data ? regards, Peter PS Yes, I have googled extensively without finding a conclusive answer. Peter Greis freethinker gmbh Stäfa Switzerland __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
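As a very rough sketch of the whole dance (names from the original mail, the shrink amount is just a placeholder, and since this is a root filesystem it would all be done from a rescue environment; test on loop devices first as noted above):
  resize2fs /dev/sda1 <a size a few MB smaller than the partition>
  mdadm --create /dev/md0 -l1 -n2 /dev/sda1 missing
  mount -o ro /dev/md0 /mnt        (convince yourself the data is intact)
  mdadm /dev/md0 --add /dev/sdb1   (let it resync onto the second disk)
  resize2fs /dev/md0               (grow the fs back out to fill the array)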
Re: Raid5 reshape
Nigel J. Terry wrote: One comment - As I look at the rebuild, which is now over 20%, the time till finish makes no sense. It did make sense when the first reshape started. I guess your estimating / averaging algorithm doesn't work for a restarted reshape. A minor cosmetic issue - see below Nigel [EMAIL PROTECTED] ~]$ cat /proc/mdstat Personalities : [raid5] [raid4] md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2] 490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3] [UUU_] [] reshape = 22.7% (55742816/245111552) finish=5.8min speed=542211K/sec Unless something has changed recently the parity-rebuild-interrupted / restarted-parity-rebuild case shows the same behavior. It's probably the same chunk of code (I haven't looked, bad hacker! bad!), but I thought I'd mention it in case Neil goes looking The speed is truly impressive though. I'll almost be sorry to see it fixed :-) -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid5 disaster
Bruno Seoane wrote: mdadm -C -l5 -n5 -c=128 /dev/md0 /dev/sdb1 /dev/sdd1 /dev/sde1 /dev/sdc1 /dev/sda1 I took the devices order from the mdadm output on a working device. Is this the way the command is supposed to be assembled? Is there anything else I should consider or any other valid solution to gain access to my data? If you create the array, it will immediately start resyncing unless you list one of the devices in your command line as missing. Just pick one (ideally one of the ones that isn't getting picked up anyway) and put 'missing' in its place. Using missing is the only way to have it be read-only in the data regions. That'll let you make a mistake and still be able to recover data after you find the right command line. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
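As a sketch, with the same devices and chunk size as the command above and sda1 arbitrarily chosen as the one to leave out:
  mdadm -C /dev/md0 -l5 -n5 -c 128 /dev/sdb1 /dev/sdd1 /dev/sde1 /dev/sdc1 missing
  mount -o ro /dev/md0 /mnt    (look at the data before you let anything resync)
If the data isn't there, stop the array and try again with a different device order or a different member marked missing.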
Re: slackware -current softraid5 boot problem - additional info
Something fishy here... Dexter Filmore wrote:
# mdadm -E /dev/sdd
Device /dev/sdd
# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      732563712 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]
Components that are all the first partition. Are you using the whole disk, or the first partition? It appears that to some extent, you are using both. Perhaps some confusion on that point between your boot scripts and your manual run explains things? -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid5 resizing
Neil Brown wrote: On Monday May 1, [EMAIL PROTECTED] wrote: Hey folks. There's no point in using LVM on a raid5 setup if all you intend to do in the future is resize the filesystem on it, is there? The new raid5 resizing code takes care of providing the extra space and then as long as the (say) ext3 filesystem is created with resize_inode all should be sweet. Right? Or have I missed something crucial here? :) You are correct. md/raid5 makes the extra space available all by itself. Further - even if you don't create the filesystem with the right amount of extra metadata space for online resizing, you can resize any ext2/3 filesystem offline, and it doesn't take very long. You just use resize2fs instead of ext2online. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
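Spelled out, the whole thing is roughly this (a sketch, assuming /dev/md0 with ext3 on it and one new partition to add):
  mdadm /dev/md0 --add /dev/sdX1
  mdadm --grow /dev/md0 --raid-devices=<new count>
  ext2online /dev/md0              (online, if the fs has the resize room)
or, offline: umount it, run e2fsck -f, then resize2fs /dev/md0 and mount it again.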
Re: md: Change ENOTSUPP to EOPNOTSUPP
Paul Clements wrote: Gil wrote: So for those of us using other filesystems (e.g. ext2/3), is there some way to determine whether or not barriers are available? You'll see something like this in your system log if barriers are not supported: Apr 3 16:44:01 adam kernel: JBD: barrier-based sync failed on md0 - disabling barriers Otherwise, assume that they are. But like Neil said, it shouldn't matter to a user whether they are supported or not. Filesystems will work correctly either way. This seems very important to me to understand thoroughly, so please forgive me if I'm being dense. What I'm not sure of in the above is for what definition of working? For the definition where the code simply doesn't bomb out, or for the stricter definition that despite write caching at the drive level there is no point where there could possibly be a data inconsistency between what the filesystem thinks is written and what got written, power loss or no? My understanding to this point is that with write caching and no barrier support, you would still care as power loss would give you a window of inconsistency. With the exception of the very minor situation Neil mentioned about the first write through md not being a superblock write... -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: data recovery on raid5
Recreate the array from the constituent drives in the order you mention, with 'missing' in place of the first drive that failed? It won't resync because it has a missing drive. If you created it correctly, the data will be there If you didn't create it correctly, you can keep trying permutations of 4-disk arrays with one missing until you see your data, and you should find it. -Mike Sam Hopkins wrote: Hello, I have a client with a failed raid5 that is in desperate need of the data that's on the raid. The attached file holds the mdadm -E superblocks that are hopefully the keys to the puzzle. Linux-raid folks, if you can give any help here it would be much appreciated. # mdadm -V mdadm - v1.7.0 - 11 August 2004 # uname -a Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux Here's my take: Logfiles show that last night drive /dev/etherd/e0.4 failed and around noon today /dev/etherd/e0.0 failed. This jibes with the superblock dates and info. My assessment is that since the last known good configuration was 0 missing 1 /dev/etherd/e0.0 2 /dev/etherd/e0.2 3 /dev/etherd/e0.3 then we should shoot for this. I couldn't figure out how to get there using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3. If anyone can suggest a way to get this back using -A, please chime in. The alternative is to recreate the array with this configuration hoping the data blocks will all line up properly so the filesystem can be mounted and data retrieved. It looks like the following command is the right way to do this, but not being an expert I (and the client) would like someone else to verify the sanity of this approach. Will mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023] do what we want? Linux-raid folks, please reply-to-all as we're probably all not on the list. 
Thanks for your help, Sam /dev/etherd/e0.0: Magic : a92b4efc Version : 00.90.00 UUID : 8fe1fe85:eeb90460:c525faab:cdaab792 Creation Time : Mon Jan 3 03:16:48 2005 Raid Level : raid5 Device Size : 195360896 (186.31 GiB 200.05 GB) Raid Devices : 4 Total Devices : 5 Preferred Minor : 0 Update Time : Fri Apr 21 12:45:07 2006 State : clean Active Devices : 3 Working Devices : 4 Failed Devices : 1 Spare Devices : 1 Checksum : 4cc955da - correct Events : 0.3488315 Layout : left-asymmetric Chunk Size : 32K Number Major Minor RaidDevice State this 1 15201 active sync /dev/etherd/e0.0 0 0 000 removed 1 1 15201 active sync /dev/etherd/e0.0 2 2 152 322 active sync /dev/etherd/e0.2 3 3 152 483 active sync /dev/etherd/e0.3 4 4 152 160 spare /dev/etherd/e0.1 /dev/etherd/e0.2: Magic : a92b4efc Version : 00.90.00 UUID : 8fe1fe85:eeb90460:c525faab:cdaab792 Creation Time : Mon Jan 3 03:16:48 2005 Raid Level : raid5 Device Size : 195360896 (186.31 GiB 200.05 GB) Raid Devices : 4 Total Devices : 5 Preferred Minor : 0 Update Time : Fri Apr 21 14:03:12 2006 State : clean Active Devices : 2 Working Devices : 3 Failed Devices : 3 Spare Devices : 1 Checksum : 4cc991e9 - correct Events : 0.3493633 Layout : left-asymmetric Chunk Size : 32K Number Major Minor RaidDevice State this 2 152 322 active sync /dev/etherd/e0.2 0 0 000 removed 1 1 001 faulty removed 2 2 152 322 active sync /dev/etherd/e0.2 3 3 152 483 active sync /dev/etherd/e0.3 4 4 152 164 spare /dev/etherd/e0.1 /dev/etherd/e0.3: Magic : a92b4efc Version : 00.90.00 UUID : 8fe1fe85:eeb90460:c525faab:cdaab792 Creation Time : Mon Jan 3 03:16:48 2005 Raid Level : raid5 Device Size : 195360896 (186.31 GiB 200.05 GB) Raid Devices : 4 Total Devices : 5 Preferred Minor : 0 Update Time : Fri Apr 21 14:03:12 2006 State : clean Active Devices : 2 Working Devices : 3 Failed Devices : 3 Spare Devices : 1 Checksum : 4cc991fb - correct Events : 0.3493633 Layout : left-asymmetric Chunk Size : 32K Number Major Minor RaidDevice State this 3 152 483 active sync /dev/etherd/e0.3 0 0 000 removed 1 1 001 faulty removed 2
Re: A failed-disk-how-to anywhere?
Brad Campbell wrote: Martin Stender wrote: Hi there! I have two identical disks sitting on a Promise dual channel IDE controller. I guess both disks are primary's then. One of the disks has failed, so I bought a new disk, took out the failed disk, and put in the new one. That might seem a little naive, and apparently it was, since the system won't boot up now. It boots fine, when only the old, healthy disk is connected. My initial thought would be you have hde and hdg in a raid-1 and nothing on the on-board controllers. hde has failed and when you removed it your controller tried the 1st disk it could find (hdg) to boot off.. Bingo.. away we go. You plug a new shiny disk into hde and now the controller tries to boot off that, except it's blank and therefore a no-go. I'd either try and force the controller to boot off hdg (which might be a controller bios option) or swap hde and hdg.. then it might boot and let you create your partitions on hdg and then add it back into the mirror. I'd add another stab in the dark and guess that you didn't install your boot loader on both drives. Not that I've ever done that before (ok, a few times, most recently two days ago, sigh). Typically the BIOS will try all hard drives and so it should have rolled to one that worked, but if only the failed drive had the boot loader then you are of course not bootable. I solved this by booting rescue mode, starting up the raid arrays, mounting them, and manually grub installing. Here's a good page for the grub incantations: http://gentoo-wiki.com/HOWTO_Gentoo_Install_on_Software_RAID_mirror_and_LVM2_on_top_of_RAID#Bootloader_installation_and_configuration -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
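For reference, the manual grub step from the rescue shell is roughly this (a sketch; mapping the surviving disk to (hd0) makes it boot as if it were first, and which device is really hdg depends on your setup):
  grub> device (hd0) /dev/hdg
  grub> root (hd0,0)
  grub> setup (hd0)
Repeat for the other disk once it's rebuilt, so either one can boot on its own.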
Re: addendum: was Re: recovering data on a failed raid-0 installation
Well, honestly I'm not really sure. I've never done this as I only use the redundant raid levels, and when they're gone, things are a complete hash and there's no hope. In fact, with raid-0 (striping, right? not linear/append?) I believe you are in the same boat. Each large file will have half its contents on the disk that died. So really, there's very little hope. Anyway, I'll try to give you pointers to what I would try anyway, with as much detail as I can. First, you just need to get the raid device up. It sounds like you are actually already doing that, but who knows. If you have one drive but not the other, you could make a sparse file that is the same size as the disk you lost. I know this is possible, but haven't done it so you'll have to see for yourself - I think there are examples in linux-raid archives in reference to testing very large raid arrays. Loopback mount the file as a device (losetup is the command to use here) and now you have a virtual device of the same size as the drive you lost. Recreate the raid array using the drive you have, and the new virtual drive in place of the one you lost. It's probably best to do this with non-persistent superblocks and just generally as read-only as possible for data preservation on the drive you have. So now you have a raid array. For the filesystem, well, I don't know. That's a mess. I assume it's possible to mount the filesystem with some degree of force (probably literally a -force argument) as well as read-only. You may need to point at a different superblock, who knows? You just want to get the filesystem to mount somehow, any way you need to, but hopefully in a read-only mode. I would not even attempt to fsck it. At this point, you have a mostly busted filesystem on a fairly broken raid setup, but it might be possible to pull some data out of it, who knows? You could pull what looks like data but is instead garbage too though - if you don't have md5sums of the files you get (if you get any) it'll be hard to tell without checking them all. Honestly, that's as much as I can think of. I know I'm just repeating myself when I say this, but raid is no replacement for backups. They have different purposes, and backups are no less necessary. I was sorry to hear you didn't have any, because that probably seals the coffin on your data. With regard to people recommending you get a pro. In this field (data recovery) there are software guys (most of the people on this list) that can do a lot while the platters are spinning and there are hardware guys (the pros I think most people are talking about). They have physical tools that can get data out of platters that wouldn't spin otherwise. There's nothing the folks on the list can do really other than recommend seeing someone (or shipping the drive to) one of those dudes. When you get the replacement drive back from them with your data on it, then we're back in software land and you may have half a chance. That said, it sounded like you had already tried to fsck the filesystem on this thing, so you may have hashed the remaining drive. It's hard to say. Truly bleak though... -Mike Technomage wrote: mike. given the problem, I have a request. On Friday 31 March 2006 15:55, Mike Hardy wrote: I can't imagine how to coax a filesystem to work when it's missing half its contents, but maybe a combination of forcing a start on the raid and read-only FS mounts could make it hobble along. we will test any well laid out plan. lay out for us (from beginning to end) all the steps required, in your test. 
do not be afraid to detail the obvious. it is better that we be in good communication than to be working on assumptions. it will save you a lot of frustration trying to correct for our assumptions, if there are none. tmh - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
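To make the sparse-file piece concrete, it's roughly this (a sketch only - the sizes, names and chunk size are all stand-ins, and the loop device has to go in the slot the dead disk occupied):
  dd if=/dev/zero of=/tmp/fakedisk bs=1M count=1 seek=200000    (sparse ~200GB file)
  losetup /dev/loop0 /tmp/fakedisk
  mdadm --build /dev/md0 -l0 -n2 -c 64 /dev/sda1 /dev/loop0     (--build writes no superblocks; match the original chunk size)
  mount -o ro /dev/md0 /mnt
Whether anything mountable comes out the other end is another story, as above.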
Re: Recommendations for supported 4-port SATA PCI card ?
Addonics adst114 was the cheapest one I've found that works. I found it for $41 at thenerds.net but you may be better at the price searching than me. It's a Silicon Images 3114 chip, driven by the sata_sil driver I honestly don't recall if it was out-of-the-box working on FC4, but the updated kernels drive it fine, and FC5 (with 2.6.16+) should be fine with it. -Mike Ian Thurlbeck wrote: Dear All I have 4x500GB Maxtor SATA drives and I want to attach these to a 4-port SATA PCI card and RAID5 them using md Could anybody recommend a card that will have out of box support on a Fedora system ? Many thanks Ian - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: how to clone a disk
I can think of two things I'd do slightly differently... Do a smartctl -t long on each disk before you do anything, to verify that you don't have single sector errors on other drives. Use ddrescue for better results copying a failing drive. -Mike PFC wrote: I have a raid5 array that contain 4 disk and 1 spare disk. now i saw one disk have sign of going fail via smart log. Better safe than sorry... replace the failing disk and resync, that's all. You might want to do cat /dev/md# > /dev/null, or cat /dev/hd? > /dev/null first. This is to be sure there isn't some yet-unseen bad sector on some other drive which would screw your resync. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
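That is, something like this (a sketch; the device names are stand-ins for the failing member and its replacement, and the logfile lets ddrescue resume if interrupted):
  smartctl -t long /dev/sdX                               (for each remaining member, then check the self-test logs)
  ddrescue /dev/<failing-disk> /dev/<new-disk> /root/ddrescue.log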
Re: No syncing after crash. Is this a software raid bug?
Why would you not be happy? resyncs in general are bad since they indicate your data is possibly out-of-sync and the resync itself consumes an enormous amount of resources This is a feature of new-ish md driver code that more aggressively marks the array as clean after writes The end result is that the array will most likely be clean in all circumstances even a crash, and you simply won't need to resync That's a good thing! -Mike Kasper Dupont wrote: I have a FC4 installation (upgraded from FC3) using kernel version 2.6.15-1.1831_FC4. I see some symptoms in the software raid, which I'm not quite happy about. After an unclean shutdown caused by a crash or power failure, it does not resync the md devices. I have tried comparing the contents of the two mirrors for each of the md devices. And I found that on the swap device, there were differences. Isn't this a bug in the software raid? Shouldn't it always resync after reboot, if there could possibly be any difference between the contents on the two disks? I know that as long as only swap is affected, it is not going to cause data loss. But how can I be sure it is not going to happen on file systems as well? Should I report this as a bug in Fedora Core or did I miss something? - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Question: array locking, possible?
Chris Osicki wrote: To rephrase my question, is there any way to make it visible to the other host that the array is up and running on this host? Any comments, ideas? Would that not imply an unlock command before you could run the array on the other host? Would that not then break the automatic fail-over you want, as no machine that died or hung would issue the unlock command, meaning that the fail-over node could not then use the disks? It's an interesting idea, I just can't think of a way to make it work unattended. It might be possible to wrap the 'mdadm' binary with a script that checks (maybe via some deep check using ssh to execute remote commands, or just a ping) the host's status and just prints a little table of host status that can only be avoided by passing a special --yes-i-know flag to the wrapper. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
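A very rough sketch of such a wrapper (the peer name, the real mdadm path and the override flag are all invented here for illustration):
  #!/bin/sh
  PEER=other-node
  if [ "$1" != "--yes-i-know" ] && ping -c 1 -w 2 "$PEER" >/dev/null 2>&1; then
      echo "$PEER still answers - refusing to touch the array (use --yes-i-know to override)"
      exit 1
  fi
  [ "$1" = "--yes-i-know" ] && shift
  exec /sbin/mdadm.real "$@"
A ping only proves the host answers, of course, not that it has stopped using the disks, so this is no substitute for real fencing.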
Re: Newbie questions: Max Active disks, RAID migration, Compiling mdadm 2.3
If you remove the '-Werror' it'll compile and work, but you still can't convert a raid 0 to a raid 5. Your raid level understanding is off as well, raid 5 is a parity block rotating around all drives, you were thinking of raid 4 which has a single parity disk. Migrating raid 0 to raid 4 (and vice versa) should be possible technically, but I don't think it's implemented anywhere. You should be able to have more than 4 active drives though. I am at this moment building an array with 6 components, I'm running a few with more than that, and these are by no means the largest arrays that people are running - just examples of it working. -Mike Martin Ritchie wrote: Sorry if these are total newbie questions. Why can't I have more than 4 active drives in my md RAID? Why can't I easily migrate a RAID 0 to RAID 5. As I see it RAID 0 is just RAID 5 with a failed parity check drive? Perhaps this is a limitation of the old v1.11 that FC4 updates to. I tried to compile 2.3 but I get this error:
$ make
gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\"/usr/sbin/sendmail -t\" -c -o super0.o super0.c
In file included from super0.c:31:
/usr/include/asm/byteorder.h:6:2: error: #warning using private kernel header; include endian.h instead!
make: *** [super0.o] Error 1
I'm not too familiar with compiling this sort of thing. (I usually live further away from the hardware and endian issues). I'm guessing there is some sort of option i have to specify to say that this should use the private kernel headers. Including endian.h instead didn't help:
$ make
gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\"/usr/sbin/sendmail -t\" -c -o super0.o super0.c
cc1: warnings being treated as errors
super0.c: In function ‘add_internal_bitmap0’:
super0.c:737: warning: implicit declaration of function ‘__cpu_to_le32’
super0.c:742: warning: implicit declaration of function ‘__cpu_to_le64’
make: *** [super0.o] Error 1
Oh just because I know it is going to be an issue I'm building on a Athlon 64... my first 64bit linux box so I'm sure there are going to be gotchas that I've not thought about. Is there somewhere I overlooked for finding this information. TIA - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
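The least-effort workaround is just to strip -Werror out of the Makefile and rebuild, e.g.:
  sed -i 's/-Werror//g' Makefile
  make
That only stops the byteorder.h warning from being treated as fatal; the resulting binary worked fine for me.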
Re: ludicrous speed: raid6 reconstruction
I saw this on my array, and other(s) have reported it as well. Apparently the reconstruction speed algorithm doesn't understand that it's not syncing all the blocks and hilarity ensues. I believe that was it, anyway Either that or you really have a hell of a server :-) -Mike jurriaan wrote: Personalities : [linear] [raid0] [raid1] [raid5] [raid4] [raid6] md0 : active raid6 sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0] 1465175424 blocks level 6, 64k chunk, algorithm 2 [8/8] [] [==..] resync = 91.0% (39104/244195904) finish=0.0min speed=7369041K/sec bitmap: 23/233 pages [92KB], 512KB chunk I am reminded of Spaceball's 'ludicrous speed' here. This is after a reboot from 2.6.16-rc1-mm3 (where the array was built) to 2.6.16-rc1-mm5 (where rebuilding continued thanks to the bitmap). - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid reconstruction speed
PFC wrote: When rebuilding md1, it does not realize accesses to md0 wait for the same disks. Thus reconstruction of md1 runs happily at full speed, and the machine is dog slow, because the OS and everything is on md0. (I cat /dev/zero to a file on md1 to slow the rebuild so it would let me start a web browser so I don't get bored to death) echo 1 > /proc/sys/dev/raid/speed_limit_max (or similar?) You can do that in /etc/rc.local or something to make sure it sticks, then you'll be able to use your machine while any array rebuilds. I guess the feature you're asking for is for md to guess that accessing any partition component on a disk that has a partition being rebuilt should throttle the rebuild, right? Can that heuristic be successful at all times? I think it might. Does md have enough information to do that? I don't know... -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
Slightly off-topic, but: Simon Valiquette wrote: Francois Barre wrote: On production server with large RAID array, I tend to like very much XFS and trust it more than ReiserFS (I had some bad experience with ReiserFS in the past). You can also grow a XFS filesystem live, which is really nice. I didn't know this until recently, but ext2/3 can be grown online as well (using 'ext2online'), given that you create it originally with enough block group descriptor table room to support the size you're growing to. From the man page for mke2fs:
  -E extended-options
      Set extended options for the filesystem. Extended options are comma separated, and may take an argument using the equals ('=') sign. The -E option used to be -R in earlier versions of mke2fs. The -R option is still accepted for backwards compatibility. The following extended options are supported:
      stride=stripe-size
          Configure the filesystem for a RAID array with stripe-size filesystem blocks per stripe.
      resize=max-online-resize
          Reserve enough space so that the block group descriptor table can grow to support a filesystem that has max-online-resize blocks.
I have done it, and it works. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
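For example (a sketch; the numbers are only illustrative):
  mke2fs -j -E stride=16,resize=1000000000 /dev/md0
  ...
  ext2online /dev/md0
With 4k filesystem blocks, stride=16 matches a 64k chunk, and resize=1000000000 reserves enough descriptor room to grow to roughly 4TB.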
Re: raid5 write performance
Moreover, and I'm sure Neil will chime in here, isn't the clean/unclean thing designed to prevent this exact scenario? The array is marked unclean immediately prior to write, then the write and parity write happens, then the array is marked clean. If you crash during the write but before parity is correct, the array is unclean and you resync (quickly now thanks to intent logging if you have that on) The non-parity blocks that were partially written are then the responsibility of your journalling filesystem, which should make sure there is no corruption, silent or otherwise. If I'm misunderstanding that, I'd love to be corrected. I was under the impression that the silent corruption issue was mythical at this point and if it's not I'd like to know. -Mike Dan Stromberg wrote: Would it really be that much slower to have a journal of RAID 5 writes? On Fri, 2005-11-18 at 15:05 +0100, Jure Pečar wrote: Hi all, Currently zfs is a major news in the storage area. It is very interesting to read various details about it on varios blogs of Sun employees. Among the more interesting I found was this: http://blogs.sun.com/roller/page/bonwick?entry=raid_z The point the guy makes is that it is impossible to atomically both write data and update parity, which leaves a window of crash that would silently leave on-disk data+paritiy in an inconsistent state. Then he mentions that there are software only workarounds for that but that they are very very slow. It's interesting that my expirience with veritas raid5 for example is just that: slow to the point of being unuseable. Now, I'm wondering what kind of magic does linux md raid5 does, since its write performance is quite good? Or, does it actually do something regarding this? :) Niel? - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid5 write performance
Guy wrote: It is not just a parity issue. If you have a 4 disk RAID 5, you can't be sure which if any have written the stripe. Maybe the parity was updated, but nothing else. Maybe the parity and 2 data disks, leaving 1 data disk with old data. Beyond that, md does write caching. I don't think the file system can tell when a write is truly complete. I don't recall ever having a Linux system crash, so I am not worried. But power failures cause the same risk, or maybe more. I have seen power failures, even with a UPS! Good points there Guy - I do like your example. I'll go further with crashing too and say that I actually crash outright occasionally. Usually when building out new machines where I don't know the proper driver tweaks, or failing hardware, but it happens without power loss. Its important to get this correct and well understood. That said, unless I hear otherwise from someone that works in the code, I think md won't report the write as complete to upper layers until it actually is. I don't believe it does write-caching, and regardless, if it does it must not do it until some durable representation of the data is committed to hardware and the parity stays dirty until redundancy is committed. Building on that, barring hardware write-caching, I think with a journalling FS like ext3 and md only reporting the write complete when it really is, things won't be trusted at the FS level unless they're durably written to hardware. I think that's sufficient to prove consistency across crashes. For example, even if you crash during an update to a file smaller than a stripe, the stripe will be dirty so the bad parity will be discarded and the filesystem won't trust the blocks that didn't get reported back as written by md. So that file update is lost, but the FS is consistent and all the data it can reach is consistent with what it thinks is there. So, I continue to believe silent corruption is mythical. I'm still open to good explanation it's not though. -Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
raidreconf / growing raid 5 doesn't seem to work anymore
Hello all - This is more of a cautionary tale than anything, as I have not attempted to determine the root cause or anything, but I have been able to add a disk to a raid5 array using raidreconf in the past and my last attempt looked like it worked but still scrambled the filesystem. So, if you're thinking of relying on raidreconf (instead of a backup/restore cycle) to grow your raid 5 array, I'd say its probably time to finally invest in enough backup space. Or you could dig in and test raidreconf until you know it will work. I'll paste the commands and their output in below so you can see what happened - raidreconf appeared to work just fine, but the file-system is completely corrupted as far as I can tell. Maybe I just did something wrong though. I used a make no changes mke2fs command to generate the list of alternate superblock locations. They could be wrong, but the first one being corrupt is enough by itself to be a fail mark for raidreconf. This isn't a huge deal in my opinion, as this actually is my backup array, but it would have been cool if it had worked. I'm not going to be able to do any testing on it past this point though as I'm going to rsync the main array onto this thing ASAP... -Mike --- marvin/root # raidreconf -o /etc/raidtab -n /etc/raidtab.new -m /dev/md2 Working with device /dev/md2 Parsing /etc/raidtab Parsing /etc/raidtab.new Size of old array: 2441960010 blocks, Size of new array: 2930352012 blocks Old raid-disk 0 has 953890 chunks, 244195904 blocks Old raid-disk 1 has 953890 chunks, 244195904 blocks Old raid-disk 2 has 953890 chunks, 244195904 blocks Old raid-disk 3 has 953890 chunks, 244195904 blocks Old raid-disk 4 has 953890 chunks, 244195904 blocks New raid-disk 0 has 953890 chunks, 244195904 blocks New raid-disk 1 has 953890 chunks, 244195904 blocks New raid-disk 2 has 953890 chunks, 244195904 blocks New raid-disk 3 has 953890 chunks, 244195904 blocks New raid-disk 4 has 953890 chunks, 244195904 blocks New raid-disk 5 has 953890 chunks, 244195904 blocks Using 256 Kbyte blocks to move from 256 Kbyte chunks to 256 Kbyte chunks. Detected 256024 KB of physical memory in system A maximum of 292 outstanding requests is allowed --- I will grow your old device /dev/md2 of 3815560 blocks to a new device /dev/md2 of 4769450 blocks using a block-size of 256 KB Is this what you want? (yes/no): yes Converting 3815560 block device to 4769450 block device Allocated free block map for 5 disks 6 unique disks detected. Working (\) [03815560/03815560] [] Source drained, flushing sink. Reconfiguration succeeded, will update superblocks... Updating superblocks... handling MD device /dev/md2 analyzing super-block disk 0: /dev/hdc1, 244196001kB, raid superblock at 244195904kB disk 1: /dev/hde1, 244196001kB, raid superblock at 244195904kB disk 2: /dev/hdg1, 244196001kB, raid superblock at 244195904kB disk 3: /dev/hdi1, 244196001kB, raid superblock at 244195904kB disk 4: /dev/hdk1, 244196001kB, raid superblock at 244195904kB disk 5: /dev/hdj1, 244196001kB, raid superblock at 244195904kB Array is updated with kernel. Disks re-inserted in array... Hold on while starting the array... Maximum friend-freeing depth: 8 Total wishes hooked:3815560 Maximum wishes hooked: 292 Total gifts hooked: 3815560 Maximum gifts hooked: 200 Congratulations, your array has been reconfigured, and no errors seem to have occured. 
marvin/root # cat /proc/mdstat Personalities : [raid1] [raid5] md1 : active raid1 hda1[0] hdb1[1] 146944 blocks [2/2] [UU] md3 : active raid1 hda2[0] hdb2[1] 440384 blocks [2/2] [UU] md2 : active raid5 hdj1[5] hdk1[4] hdi1[3] hdg1[2] hde1[1] hdc1[0] 1220979200 blocks level 5, 256k chunk, algorithm 0 [6/6] [UU] [=...] resync = 7.7% (19008512/244195840) finish=434.5min speed=8635K/sec md0 : active raid1 hda3[0] hdb3[1] 119467264 blocks [2/2] [UU] unused devices: none marvin/root # mount /backup mount: wrong fs type, bad option, bad superblock on /dev/md2, or too many mounted file systems (aren't you trying to mount an extended partition, instead of some logical partition inside?) marvin/root # fsck.ext3 -C 0 -v /dev/md2 e2fsck 1.35 (28-Feb-2004) fsck.ext3: Filesystem revision too high while trying to open /dev/md2 The filesystem revision is apparently too high for this version of e2fsck. (Or the filesystem superblock is corrupt) The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 device marvin/root # mke2fs -j -m 1 -n -v Usage: mke2fs [-c|-t|-l filename] [-b block-size] [-f
Re: raidreconfig advice
Max Waterman wrote:

OK, I am going to try to expand the capacity of my raid5 array and I want to make sure I've got it right.

Not a bad idea, as it's all or nothing...

Disk /dev/hdg: 200.0 GB, 200049647616 bytes
Disk /dev/hdi: 200.0 GB, 200049647616 bytes
Disk /dev/hdk: 200.0 GB, 200049647616 bytes
Disk /dev/sda: 200.0 GB, 200049647616 bytes
Disk /dev/sdb: 200.0 GB, 200049647616 bytes
Disk /dev/sdc: 200.0 GB, 200049647616 bytes
Disk /dev/sdd: 200.0 GB, 200049647616 bytes

They certainly all looked the same (including the C/H/S counts).

This leaves me with sdc which I can try to add. If that goes OK, I'll trash the backup and add sd[ab] too.

I'd be very wary of this, for two reasons. One, you have the backup during the add for a reason - if anything goes wrong, there goes your data. Second, where would you ever back your raid up to? What about fs corruption? The rule of thumb with databases is to always have enough contiguous scratch space to dump and restore your biggest table. With large RAID, you should always be able to dump and restore your largest raid device, imho. It's a bunch more disk, yes, but you'll need it at some point, I promise. Many future tears can be averted...

2) Where do I get raidreconfig from? Google wasn't much help.

I saw you noticed it's raidreconf (not raidreconfig) - you should be set there.

3) Are there any instructions for raidreconfig? I understand it uses some non-mdadm config files as from/to input.

The man page is great - honest. Two conf files (current and future) and you're set.

The last question should be an open-ended "is there anything else?"

1) Run a long SMART test on all drives first. Imagine if you get a bad block during the reconfig... (see the sketch below)
2) Validate your backup (just in case)
3) It takes a long time to do - be patient, I guess
4) You could use the script I posted earlier that sets up a loopback-device practice raid set, to practice on perhaps (if you really wanted)

Good luck-
-Mike
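On the SMART test point above, a minimal sketch of what that pre-flight check might look like. The device list just echoes the fdisk output quoted earlier and is only a placeholder - adjust for your controller, and note that some SCSI/RAID interfaces need extra smartctl options:

# kick off long self-tests on every member (they run inside the drive firmware,
# so the array can stay up while they run)
for d in /dev/hdg /dev/hdi /dev/hdk /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    smartctl -t long $d
done

# a few hours later, check the results and overall health
for d in /dev/hdg /dev/hdi /dev/hdk /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    echo "=== $d ==="
    smartctl -l selftest $d
    smartctl -H $d
done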
Re: md Grow for Raid 5
Frank Wittig wrote:

It actually is available. I've tested it and it worked fine for me. But taking a backup is highly recommended. The trick is not to use mdadm, since growing with mdadm is not possible at the moment. Use raid-tools instead. The program raidreconf comes along with raidtools. This program takes two raidtab files as input, which describe the array configuration before and after reconfiguration. (See man raidreconf for further details.)

I'll second both major points here: raidreconf does work, but it can fail and leave things completely destroyed (imagine one bad block somewhere after parity was partially migrated), so take a backup. Given that you're taking a backup already, creating a new array (with its optimized resync) might be faster if it's an online backup.

I'm 2 for 4 now on raidreconf working, with the two failures (sadly) being of the operator-error variety - raidreconf is picky, and fails late rather than early if your disk sizes aren't what it expects, I found. It got to the end and ran out of space on me once, due to a slightly different 250GB disk size. The other failure was a bad block along the way - I should have done smartctl -t long on all drives prior to the resize. Both lessons learned... (A sketch of what the two raidtab files might look like is below.)

-Mike
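For the curious, here is roughly what a before/after raidtab pair could look like when going from a 3-disk to a 4-disk raid5. This is a sketch only - the device names are made up and the old raidtools syntax is from memory, so check man raidtab before trusting it:

# /etc/raidtab (current 3-disk array)
raiddev /dev/md2
    raid-level            5
    nr-raid-disks         3
    nr-spare-disks        0
    persistent-superblock 1
    chunk-size            256
    device                /dev/hdc1
    raid-disk             0
    device                /dev/hde1
    raid-disk             1
    device                /dev/hdg1
    raid-disk             2

# /etc/raidtab.new (desired 4-disk array - identical, plus the new member)
raiddev /dev/md2
    raid-level            5
    nr-raid-disks         4
    nr-spare-disks        0
    persistent-superblock 1
    chunk-size            256
    device                /dev/hdc1
    raid-disk             0
    device                /dev/hde1
    raid-disk             1
    device                /dev/hdg1
    raid-disk             2
    device                /dev/hdi1
    raid-disk             3

# then, with the array stopped and a verified backup in hand:
raidreconf -o /etc/raidtab -n /etc/raidtab.new -m /dev/md2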
Re: md Grow for Raid 5
berk walker wrote:

Have you guys seen/tried mdadm 1.90? I am delightfully experiencing the [...]

I believe the mdadm-based grow does not work for raid5, only for raid0 or raid1. raidreconf is actually capable of adding disks to raid5 and re-laying out the stripes, moving parity blocks, etc.

You're very correct about needing to grow the FS after growing the device, though. Most FS's have tools for that (quick example below), or there's LVM...

-Mike
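As a concrete but hedged example: if the array holds ext3, something along these lines would grow the filesystem into the new space. Exact tool names vary with distro and kernel vintage (some setups of that era used ext2online for mounted filesystems), so treat it as a sketch:

# after the md device itself has grown (e.g. via raidreconf)
umount /dev/md2        # offline resize is the conservative route
e2fsck -f /dev/md2     # resize2fs insists on a clean fsck first
resize2fs /dev/md2     # with no size argument it grows to fill the device
mount /dev/md2 /backup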
Re: RAID 1 on a server with possible mad mobo
Colin McDonald wrote:

Is it a bad idea to write the grub to a software mirror? Is it written to a specific disk when this is done?

The Software Raid and Grub HOW-TO:
http://lists.us.dell.com/pipermail/linux-poweredge/2003-July/014331.html

I use grub+raid1 on the root drive of a number of machines, but you do have to be careful, as it's my understanding grub is not raid-aware and picks one drive, whereas lilo was raid-aware and wrote the boot sector on all components. After following the directions at the link above, I've been able to boot the machine off each component (during failure testing), so the directions appear to work, to me. (The gist of the procedure is sketched below.)

-Mike
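The gist, as I understand the HOW-TO (a sketch from memory with placeholder device names - the linked document is the authoritative version): install grub onto the MBR of each mirror member, temporarily mapping each one as (hd0) so either disk can boot on its own:

grub
grub> device (hd0) /dev/sda    # treat the first mirror member as the BIOS boot disk
grub> root (hd0,0)             # the partition holding /boot on that disk
grub> setup (hd0)              # write grub to its MBR
grub> device (hd0) /dev/sdb    # now repeat for the second member
grub> root (hd0,0)
grub> setup (hd0)
grub> quit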
Re: [PATCH md 9 of 9] Optimise reconstruction when re-adding a recently failed drive.
NeilBrown wrote:

When an array is degraded, bits in the intent-bitmap are never cleared. So if a recently failed drive is re-added, we only need to reconstruct the blocks that are still reflected in the bitmap. This patch adds support for this re-adding.

Hi there -

If I understand this correctly, this means that:

1) if I had a raid1 mirror (for example) that has had no writes to it since a resync,
2) a drive fails out, and some writes occur,
3) then when I re-add the drive, only the areas where the writes occurred would be re-synced?

I can think of a bunch of peripheral questions around this scenario, and bad sectors / bad sector clearing, but I may not be understanding the basic idea, so I wanted to ask first. (A sketch of the scenario in mdadm terms follows.)

-Mike
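For concreteness, a hedged sketch of that scenario with a write-intent bitmap - this assumes a kernel and mdadm new enough to support bitmaps and re-add, and the device names are made up:

# raid1 with an internal write-intent bitmap
mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal /dev/sda1 /dev/sdb1

# simulate a failure, then let some writes happen while degraded
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
# ... writes to /dev/md0 now set bits in the bitmap ...

# re-add the same disk: only the dirty regions should need resyncing
mdadm /dev/md0 --re-add /dev/sdb1
cat /proc/mdstat    # the recovery should be much shorter than a full resync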
Re: [PATCH md 2 of 4] Fix raid6 problem
Mark Hahn wrote:

Interesting - the private mail was from me, and I've got two dual Opterons in service. The one with significantly more PCI activity has significantly more problems than the one with less PCI activity.

that's pretty odd, since the most intense IO devices I know of are cluster interconnects (quadrics, myrinet, infiniband), and those vendors *love* opterons. I've never heard any of them say other than that Opteron IO handling is noticeably better than Intel's.

Sure, but which variables are changed between the rigs the vendors loved and the rig we're having problems with?

otoh, I could easily believe that if you're running the Opteron systems in acts-like-a-faster-xeon mode (ie, not x86_64), you might be exercising some less-tested paths.

It's running x86_64 (Fedora Core 3), and the problem is rooted in the chipset, I believe. I don't think it's Opterons per se - I think it's just the Athlon take two, which is to say it's a wonderful chip, but some of the chipsets it's saddled with are horrible, and careful selection (as well as heavy testing prior to putting a machine in service) is essential.

-Mike
Re: Migrating from SINGLE DISK to RAID1
Robert Heinzmann wrote:

Hello, can someone verify if the following statements are true?

- It's not possible to simply convert an existing partition with a filesystem on it to a raid1 mirror set.

I believe you're right, but I'm not totally sure on this one. I'd take the second disk, create a new RAID1 with the first drive listed as missing on the mdadm --create command line, copy everything over to it, put grub on it, then test that it boots correctly by pulling the first drive out. Only once the RAID1 is working (in degraded mode) and you're booting off it would I add the original first drive into the RAID set to complete the pair. With that process, the question is somewhat moot, although I'm interested in the real answer too. (A sketch of the commands is below.)

- Using a former disk of a raid1 array as a usual disk (not mounted as a degraded /dev/mdX, but instead mounted as /dev/sdX or /dev/hdX) is successful. This is because the MD device layer reports the device size as "size of disk - superblock offset" during the creation of a filesystem on the MD device. Thus the used size of the disk, when mounting it as /dev/sdX or /dev/hdX, is some KB smaller than it could be, but no data is lost.

This matches my experience, although autodiscovery can get in your way, as you mention yourself.

-Mike
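A rough sketch of that migration, assuming the existing system lives on /dev/sda1 and the new disk is /dev/sdb1 (placeholder names; the grub step is glossed over - see the grub+raid1 thread above):

# build a degraded raid1 with only the new disk; "missing" reserves the other slot
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1

# put a filesystem on it and copy the existing data over
mkfs.ext3 /dev/md0
mount /dev/md0 /mnt
cp -ax / /mnt              # or: rsync -aHx / /mnt/

# ... install grub on the new disk, fix /etc/fstab, test booting from it ...

# only after you are booted off the degraded raid1, fold the old disk in
mdadm /dev/md0 --add /dev/sda1
cat /proc/mdstat           # watch the resync onto the old disk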
Re: Broken harddisk
Guy wrote:

For future reference: Everyone should do a nightly disk test to prevent bad blocks from hiding undetected. smartd, badblocks or dd can be used.

Example: dd if=/dev/sda of=/dev/null bs=64k

Just create a nice little script that emails you the output. Put this script in a nightly cron to run while the system is idle.

While I agree with your purpose 100% Guy, I respectfully disagree with the method. If at all possible, you should use tools that access the SMART capabilities of the device, so that you get more than a read test - you also get statistics on the various other health parameters the drive checks, some of which can serve as fair warning of impending death before you get bad blocks.

http://smartmontools.sf.net is the source for fresh packages there, and smartd can be set up with a config file to do tests on any schedule you like, emailing you urgent results as it gets them, or just putting information of general interest in the logs that Logwatch picks up. (A sample smartd.conf line is sketched below.)

If your drives don't talk SMART (older ones don't, and it doesn't work through all interfaces either), then by all means take Guy's advice. A 'dd' test is certainly valuable. But if they do talk SMART, I think it's better.

-Mike
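For example, a single smartd.conf line along these lines monitors health, runs a short self-test nightly and a long one weekly, and mails on trouble. This is a sketch from memory - the scheduling regex and options should be checked against man smartd.conf, and the device name and mail address are placeholders:

# /etc/smartd.conf
# -a        monitor all SMART attributes and overall health status
# -o on     enable automatic offline data collection
# -S on     enable attribute autosave
# -s ...    short self-test every night at 02:00, long self-test Saturdays at 03:00
# -m ...    where to send warning mail
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost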
Re: migrating raid-1 to different drive geometry ?
Robin Bowes wrote:

Mike Hardy wrote:

To grow component count on raid5 you have to use raidreconf, which can work, but will toast the array if anything goes bad. I have personally had it work, and not work, in different instances. The failures were not necessarily raidreconf's fault, but it's not fault-tolerant, is the point: it starts at the first stripe, laying things out the new way, and if it doesn't finish, and finish correctly, you are in an irretrievably inconsistent state.

Bah, too bad. I don't need it yet, but at some stage I'd like to be able to add another 250GB drive(s) to my array and grow the array to use the additional space in a safe/failsafe way. Perhaps by the time I come to need it this might be possible?

Well, I want to be clear here, as whoever wrote raidreconf deserves some respect, and I don't want to appear to be disparaging it. raidreconf works. I'm not aware of any bugs in it. Further, if mdadm were to implement the feature of adding components to a raid5 array, I'm guessing it would look exactly the same as raidreconf, simply because of the work it has to do (re-configuring each stripe, moving parity blocks and data blocks around, etc). It's just the way the raid5 disk layout is.

So, since raidreconf does work, it's definitely possible now, but you have to make absolutely, amazingly sure of three things:

1) the component size you add is at least as large as the rest of the components (it'll barf at the end if not)
2) the old and new configurations you feed raidreconf are perfect (or what happens is undefined)
3) you have absolutely no bad blocks on any component, as it will read each block on each component and write each block on each component (that's a tall order these days - if you get a bad block, what can it do?)

If any of those things go bad, your array goes bad, but it's not the algorithm's fault, as far as I can tell. It's constrained by the problem's requirements. So I'd add:

4) you have a perfect, fresh backup of the array ;-)

Honestly, I've done it, and it does work, it's just touchy. You can practice with it on loop devices (check for a raid5 loop array creator and destructor script I posted a week or so back) if you want to see it - a small sketch of the idea is below.

-Mike
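The loop-device practice idea, in miniature - a sketch only, with placeholder file sizes, loop numbers and raidtab names, and best done on a scratch machine:

# make some small backing files and bind them to loop devices
for i in 0 1 2 3; do
    dd if=/dev/zero of=/tmp/loopraid$i bs=1M count=64
    losetup /dev/loop$i /tmp/loopraid$i
done

# build a 3-disk raid5 out of the first three, put a filesystem on it,
# fill it with data and record a checksum
mdadm --create /dev/md9 --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
mkfs.ext3 /dev/md9
# ... mount it, copy data in, md5sum the data, unmount ...

# stop the array, then practice the raidreconf run (old/new raidtabs describing
# 3 and 4 loop devices respectively) and verify the checksum afterwards
mdadm --stop /dev/md9
raidreconf -o /etc/raidtab.loop -n /etc/raidtab.loop.new -m /dev/md9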
raid5 chunk calculator (was Re: the dreaded double disk failure)
Mike Hardy wrote:

Mike Hardy wrote:

What I'm thinking of doing is writing a small (or, as small as possible, anyway) perl program that can take a few command line arguments (like the array construction information) and know how to read the data blocks on the array, and calculate parity, as a baseline. If perl offends you, sorry, I'm quicker at it than C by a long-shot, and I don't really care about speed here, just speed of development.

Here's the shell script I'm using as a test harness. It creates a loopback raid5 system, fills it up with random data, and then takes the md5sum. It has a few modes of operation (to initialize or not as it starts or stops the array).

Probably bad form to keep replying to myself, but what the heck.

Ok, I've got a basic perl program together where you specify an arbitrary raid5 array layout, an array component, and a sector address in that component, and it can tell you:

a) what the computed value of the sector's chunk should be
b) if the real data in the chunk matches the computed value

It still needs more structure and cleaning to be useful (it needs a loop to be a general parity checker, or some write logic to be a bad-sector-clearance script). However, the basic raid math seems to work with the test-array creation script I posted earlier in the testing I threw at it, and it might already be useful to others.

If anyone checks it out and finds bugs I need to fix or can think of a use for it other than what I'm thinking, let me know, and that'll save me time or show me where I'm missing useful abstractions so I can clean it up properly. Otherwise I'm going to do a lot more testing, wrap this up tomorrow, and (hopefully!) fix the unreadable sectors on the second bad drive in my array with it.

-Mike

#!/usr/bin/perl -w
#
# raid5 perl utility
# Copyright (C) 2005 Mike Hardy [EMAIL PROTECTED]
#
# This script understands the default linux raid5 disk layout,
# and can be used to check parity in an array stripe, or to calculate
# the data that should be present in a chunk with a read error.
#
# Constructive criticism, detailed bug reports, patches, etc gladly accepted!
#
# Thanks to Ashford Computer Consulting Service for their handy RAID information:
#    http://www.accs.com/p_and_p/RAID/index.html
#
# Thanks also to the various linux kernel hackers that have worked on 'md',
# the header files and source code were quite informative when writing this.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# You should have received a copy of the GNU General Public License
# (for example /usr/src/linux/COPYING); if not, write to the Free
# Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#

my @array_components = ( "/dev/loop0", "/dev/loop1", "/dev/loop2", "/dev/loop3",
                         "/dev/loop4", "/dev/loop5", "/dev/loop6", "/dev/loop7" );
my $chunk_size = 64 * 1024;                 # chunk size is 64K
my $sectors_per_chunk = $chunk_size / 512;

# Problem - I have a bad sector on one disk in an array
my %component = ( sector => 2032, device => "/dev/loop3" );

# 1) Get the array-related info for that sector
# 2) See if it was the parity disk or not
# 2a) If it was the parity disk, calculate the parity
# 2b) If it was not the parity disk, calculate its value from parity
# 3) Write the data back into the sector

( $component{array_chunk},
  $component{chunk_offset},
  $component{stripe},
  $component{parity_device} ) =
    getInfoForComponentAddress($component{sector}, $component{device});

foreach my $KEY (keys(%component)) {
    print $KEY . " = " . $component{$KEY} . "\n";
}

# We started with the information on the bad sector, and now we know how it
# fits into the array. Let's see if we can fix the bad sector with the
# information at hand.

# Build up the list of devices to xor in order to derive our value
my %xor_devices;
my $xor_count = -1;
for (my $i = 0; $i <= $#array_components; $i++) {
    # skip ourselves as we roll through
    next if ($component{device} eq $array_components[$i]);
    # skip the parity chunk as we roll through
    next if ($component{parity_device} eq $array_components[$i]);
    $xor_devices{++$xor_count} = $array_components[$i];
    print "Adding xor device " . $array_components[$i] . " as xor device " . $xor_count . "\n";
}

# If we are not the parity device, put the parity device at the end
if (!($component{device} eq $component{parity_device})) {
    $xor_devices{++$xor_count} = $component{parity_device};
    print "Adding parity device " . $component{parity_device} . " as xor device " . $xor_count . "\n";
}

# pre-calculate the device