Re: very degraded RAID5, or increasing capacity by adding discs
On Tue, Oct 09, 2007 at 01:48:50PM +0400, Michael Tokarev wrote:

> There still is - at least for ext[23]. Even offline resizers can't do
> resizes from any size to any size; the extfs developers recommend
> re-creating the filesystem anyway if the size changes significantly. I'm
> too lazy to find a reference now, but it has been mentioned here on
> linux-raid at least once this year.
>
> It's sorta like FAT (yea, that ms-dog filesystem) - when you resize it
> from, say, 501MB to 999MB, everything is OK, but if you want to go from
> 501MB to 1GB+1, you have to re-create almost all the data structures
> because the sizes of all the internal fields change - and there it's
> much safer to just re-create it from scratch than to try to modify it in
> place. Sure, it's much better for extfs, but the point is still the same.

I'll just mention that I once resized a multi-terabyte ext3 filesystem and
it took 8 hours+; a comparable XFS online resize lasted all of 10 seconds!

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
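For reference, the two resize paths being compared look roughly like this
(a sketch, not from the original thread; the device and mount point names
are assumptions):

```shell
# ext3: offline resize - unmount, force a clean check, grow, remount.
# This is the path that can take hours on a multi-terabyte array.
umount /mnt/array
e2fsck -f /dev/md1
resize2fs /dev/md1
mount /dev/md1 /mnt/array

# XFS: online resize - grows the filesystem while it is mounted,
# typically in seconds; xfs_growfs takes the mount point, not the device.
xfs_growfs /mnt/array
```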
flaky controller or disk error?
Hi,

[using kernel 2.6.23 and mdadm 2.6.3+20070929]

I have a rather flaky SATA controller with which I am trying to resync a
raid5 array. It usually starts failing after 40% of the resync is done.
Short of changing the controller (which I will do later this week), is
there a way to have mdadm resume the resync where it left off at reboot
time?

Here is the error I am seeing in the syslog. Can this actually be a disk
error?

Oct 18 11:54:34 sylla kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1 action 0x2 frozen
Oct 18 11:54:34 sylla kernel: ata1.00: irq_stat 0x0040, PHY RDY changed
Oct 18 11:54:34 sylla kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Oct 18 11:54:34 sylla kernel:          res 40/00:00:19:26:33/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
Oct 18 11:54:35 sylla kernel: ata1: soft resetting port
Oct 18 11:54:40 sylla kernel: ata1: failed to reset engine (errno=-95)
Oct 18 11:54:40 sylla kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
Oct 18 11:54:45 sylla kernel: ata1: softreset failed (device not ready)
Oct 18 11:54:45 sylla kernel: ata1: hard resetting port
Oct 18 11:54:46 sylla kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 18 11:54:46 sylla kernel: ata1.00: configured for UDMA/133
Oct 18 11:54:46 sylla kernel: ata1: EH complete
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Thanks,
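One thing that can reduce the pain of interrupted resyncs (a suggestion,
not an answer from the original thread) is an internal write-intent
bitmap: with one in place, md only re-syncs regions marked dirty after an
interruption instead of starting from zero. A sketch, assuming the array
is /dev/md0:

```shell
# Add an internal write-intent bitmap to an existing array
# (the device name /dev/md0 is an assumption).
mdadm --grow /dev/md0 --bitmap=internal

# Check progress; a cleanly stopped array records a resync checkpoint
# and resumes from it, rather than restarting from the beginning.
cat /proc/mdstat
```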
Re: Growing a raid 6 array
On Fri, Apr 13, 2007 at 10:15:05AM +0200, Laurent CARON wrote:

> Neil Brown wrote:
>> On Thursday March 1, [EMAIL PROTECTED] wrote:
>>> You can only grow a RAID5 array in Linux as of 2.6.20 AFAIK.
>>
>> There are two dimensions for growth. You can increase the amount of
>> each device that is used, or you can increase the number of devices.
>> You are correct that increasing the number of devices only works for
>> RAID5 (and RAID1, but you don't get extra space) in 2.6.20 (RAID6
>> coming in 2.6.21). However, this question is about growing an array the
>> first way: increasing the amount of space used on each device, and that
>> is supported for RAID1/4/5/6.
>>
>> And Laurent:
>> 1/ Yes, it is that easy
>> 2/ I doubt a nearly-full ext3 array increases the risk
>> 3/ The effect of adding a bitmap is that if you suffer a crash while
>>    the array is degraded, it will resync faster, so you have less
>>    exposure to multiple failure.
>
> I just finished changing disks, growing the array, and then the
> filesystem. It worked flawlessly. Just a little notice: I had to unmount
> my ext3 filesystem to be able to resize it. (Took ~8 hours to fsck +
> resize the 15-disk array from 6 to 9TB on a dual Xeon with 4GB RAM.)

FWIW: I changed a 6 x 400G system to 500G disks and grew the raid5 array.
It worked fine save for these warnings:

Apr  8 16:54:33 sylla mdadm: RebuildStarted event detected on md device /dev/md1
Apr  8 16:54:33 sylla mdadm: Rebuild20 event detected on md device /dev/md1
Apr  8 16:54:33 sylla mdadm: Rebuild40 event detected on md device /dev/md1
Apr  8 16:54:33 sylla mdadm: Rebuild60 event detected on md device /dev/md1
Apr  8 16:54:33 sylla mdadm: Rebuild80 event detected on md device /dev/md1
Apr  8 16:54:33 sylla kernel: md1: invalid bitmap page request: 187 ( 186)
Apr  8 16:54:33 sylla kernel: md1: invalid bitmap page request: 187 ( 186)
etc...

So after rebooting I removed the bitmap and recreated it. Online resizing
of the 2.0T XFS to 2.4T only took a few _seconds_.
A similar operation on a 4 x 250G raid5 system upgraded to 400G disks
with reiser3 also took a few seconds of _online_ time. Kernel 2.6.20.6
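The swap-disks-then-grow sequence described above looks roughly like this
(a sketch, not the poster's exact commands; device names and the mount
point are assumptions):

```shell
# For each member in turn: fail and remove the old disk, add the larger
# replacement, and wait for the rebuild to finish (watch /proc/mdstat)
# before touching the next one.
mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2
mdadm /dev/md1 --add /dev/sda2      # the new, larger disk
# ... repeat for each remaining member ...

# Once every member is larger, grow the array to use the new space.
# Dropping and re-adding the internal bitmap around the grow avoids
# the stale-bitmap warnings mentioned above.
mdadm --grow /dev/md1 --bitmap=none
mdadm --grow /dev/md1 --size=max
mdadm --grow /dev/md1 --bitmap=internal

# Finally grow the filesystem; XFS does this online, by mount point.
xfs_growfs /mnt/array
```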
Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems
On Sat, Jan 06, 2007 at 12:17:36AM -0500, Chaitanya Patti wrote:

> > We are in the process of porting RAIF to 2.6.19 right now. Should be
> > done in early January. The trick is that we are trying to keep the
> > same source good for a wide range of kernel versions. In fact, not too
> > long ago we were even able to compile it for 2.4.24!
> >
> > Nikolai.
>
> We now have RAIF for the 2.6.19 kernel available at:
>
>     ftp://ftp.fsl.cs.sunysb.edu/pub/raif/raif-1.1.tar.gz
>
> This version is more stable, but there are for sure still some remaining
> bugs, and we very much appreciate your feedback.

Can RAIF be compared in functionality to drbd (when using nfs mounts from
different hosts)?
Re: [solved] supermicro failure
On Tue, Dec 19, 2006 at 10:47:29PM -0500, Bill Davidsen wrote:

> Louis-David Mitterrand wrote:
>> On Thu, Nov 09, 2006 at 03:27:31PM +0100, Louis-David Mitterrand wrote:
>>> I forgot to add that to help us solve this we are ready to hire a paid
>>> consultant: please contact me by mail or phone at +33.1.46.47.21.30
>>
>> Update: we eventually succeeded in reassembling the partition, with two
>> missing disks.
>
> Your update would be far more interesting if you found out why it
> ejected three drives at once... The obvious common failures, controller
> and power supply, would not prevent reassembly in a functional
> environment.

Actually, the motherboard and/or its on-board SCSI controller turned out
to be defective. Reassembly succeeded once the disks were transferred to
another box.

Has anyone seen such a hardware failure on a brand new SuperMicro machine?

Thanks,
[urgent] supermicro ejecting disks
Hello,

We recently changed our main server to a Supermicro with 6 SCSI disks in
soft raid6 with kernel 2.6.18. After running OK for a few days, 3 disks
were suddenly ejected from the raid6.

We are now trying to reassemble the partition in another box but keep
getting a "superblock corrupted" error on the fs.

We're in a pretty bad state right now, as this is a production machine.
Of course we have backups, but restoring them will take some time.

Does anyone have any idea of what has happened and how to fix it?

Thanks,
[consultant needed] (was: supermicro ejecting disks)
I forgot to add that to help us solve this we are ready to hire a paid
consultant: please contact me by mail or phone at +33.1.46.47.21.30

Thanks

On Thu, Nov 09, 2006 at 03:18:11PM +0100, Louis-David Mitterrand wrote:

> Hello,
>
> We recently changed our main server to a Supermicro with 6 SCSI disks
> in soft raid6 with kernel 2.6.18. After running OK for a few days, 3
> disks were suddenly ejected from the raid6.
>
> We are now trying to reassemble the partition in another box but keep
> getting a "superblock corrupted" error on the fs.
>
> We're in a pretty bad state right now, as this is a production machine.
> Of course we have backups, but restoring them will take some time.
>
> Does anyone have any idea of what has happened and how to fix it?
>
> Thanks,
Re: [solved] (was: supermicro ejecting disks)
On Thu, Nov 09, 2006 at 03:27:31PM +0100, Louis-David Mitterrand wrote:

> I forgot to add that to help us solve this we are ready to hire a paid
> consultant: please contact me by mail or phone at +33.1.46.47.21.30

Update: we eventually succeeded in reassembling the partition, with two
missing disks.
Re: NCQ general question
On Wed, Mar 08, 2006 at 12:17:51PM -0500, Jeff Garzik wrote:

> Louis-David Mitterrand wrote:
>> Do you plan on updating your AHCI NCQ patch found in
>> http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/archive/ ?
>> It no longer applies cleanly to the latest 2.6.15.x kernel.
>
> No, but Jens Axboe and Tejun Heo will have a better version.

Is it available somewhere?
Re: NCQ general question
On Sun, Mar 05, 2006 at 02:29:15AM -0500, Jeff Garzik wrote:

> Raz Ben-Jehuda(caro) wrote:
>> Is NCQ supported when setting the controller to JBOD instead of using
>> HW raid?
>
> 1) The two have nothing to do with each other
> 2) It sounds like you haven't yet read
>    http://linux-ata.org/faq-sata-raid.html

Hello,

Do you plan on updating your AHCI NCQ patch found in
http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/archive/ ?
It no longer applies cleanly to the latest 2.6.15.x kernel.

Thanks,
Re: Raid, resync and hotspare
On Tue, Jul 12, 2005 at 05:33:08PM +0200, Laurent Caron wrote:

> Hi,
>
> I recently moved a server from old disks to new ones and added a
> hotspare (mdadm /dev/md1 -a /dev/sdf2). The hotspare appears in
> /proc/mdstat:
>
> Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6]
> md1 : active raid5 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
>       285699584 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>
> When I fail a disk, mdadm does *not* send me any warning.

From the mdadm man page: only Fail, FailSpare, DegradedArray, and
TestMessage cause email to be sent. Your event is SpareActive, which does
not trigger an alert mail; however, all events can be reported through
the --program switch:

    All events cause the program to be run. The program is run with two
    or three arguments, they being the event name, the array device and
    possibly a second device.

> Only at the second failure, when the array is in degraded state, do I
> receive a warning. How may I receive a warning when the hotspare disk
> has been used to cope with a disk failure?

Something like this, passed as --program my_mail_script.sh:

    #!/bin/sh
    nail -s "$1 event detected on device $2 $3" root <<EOF
    Dear admin,

    Your array seems to have suffered some breakage: a $1 event was
    received for device $2 $3. Fix it ASAP!

    Regards,
    EOF

--
This space for rent.
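For completeness, such a script would be wired up by starting mdadm in
monitor mode (a sketch; the script path is an assumption):

```shell
# Run mdadm as a monitoring daemon, scanning the arrays in mdadm.conf
# and calling the alert script for every event, including SpareActive
# (the script path is hypothetical).
mdadm --monitor --scan --daemonise \
      --program=/usr/local/bin/my_mail_script.sh
```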