Re: very degraded RAID5, or increasing capacity by adding discs

2007-10-22 Thread Louis-David Mitterrand
On Tue, Oct 09, 2007 at 01:48:50PM +0400, Michael Tokarev wrote:
 
 There still is - at least for ext[23].  Even offline resizers
 can't do resizes from any size to any size; the extfs developers
 recommend recreating the filesystem anyway if the size changes
 significantly.  I'm too lazy to find a reference now, but it has been
 mentioned here on linux-raid at least once this year.  It's sorta like
 fat (yea, that ms-dog filesystem) - when you resize it from, say,
 501Mb to 999Mb, everything is ok, but if you want to go from 501Mb to
 1Gb+1, you have to recreate almost all data structures because the
 sizes of all internal fields change - and there it's much safer to
 just re-create it from scratch than to try to modify it in place.
 Sure it's much better for extfs, but the point is still the same.

I'll just mention that I once resized a multi-terabyte ext3 filesystem and 
it took 8+ hours; a comparable XFS online resize lasted all of 10 
seconds!
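
For what it's worth, the two paths look roughly like this (a minimal
sketch; /dev/md0 and /srv are made-up names, and the ext3 grow is shown
offline, as it had to be done here):

# ext3: offline grow - unmount, force a check, then resize
umount /srv
e2fsck -f /dev/md0
resize2fs /dev/md0        # grows to fill the device when no size is given
mount /dev/md0 /srv

# XFS: online grow - run against the mount point while it is mounted
xfs_growfs /srv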


flaky controller or disk error?

2007-10-22 Thread Louis-David Mitterrand
Hi,

[using kernel 2.6.23 and mdadm 2.6.3+20070929]

I have a rather flaky SATA controller with which I am trying to resync a raid5
array. It usually starts failing after 40% of the resync is done. Short of
changing the controller (which I will do later this week), is there a way to
have mdadm resume the resync where it left off after a reboot?
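
One workaround that gets mentioned for this kind of situation (not a fix
for the flaky controller, of course) is an internal write-intent bitmap:
a member that gets kicked out mid-resync can then be re-added and only
the dirty regions are resynced. A rough sketch, with made-up names
/dev/md0 and /dev/sda1:

mdadm --grow --bitmap=internal /dev/md0   # add a write-intent bitmap
mdadm /dev/md0 --re-add /dev/sda1         # put the kicked member back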

Here is the error I am seeing in the syslog. Can this actually be a disk 
error?

Oct 18 11:54:34 sylla kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1 action 0x2 frozen
Oct 18 11:54:34 sylla kernel: ata1.00: irq_stat 0x0040, PHY RDY changed
Oct 18 11:54:34 sylla kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Oct 18 11:54:34 sylla kernel: res 40/00:00:19:26:33/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
Oct 18 11:54:35 sylla kernel: ata1: soft resetting port
Oct 18 11:54:40 sylla kernel: ata1: failed to reset engine (errno=-95)
Oct 18 11:54:40 sylla kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
Oct 18 11:54:45 sylla kernel: ata1: softreset failed (device not ready)
Oct 18 11:54:45 sylla kernel: ata1: hard resetting port
Oct 18 11:54:46 sylla kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 18 11:54:46 sylla kernel: ata1.00: configured for UDMA/133
Oct 18 11:54:46 sylla kernel: ata1: EH complete
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


Thanks,


Re: Growing a raid 6 array

2007-04-13 Thread Louis-David Mitterrand
On Fri, Apr 13, 2007 at 10:15:05AM +0200, Laurent CARON wrote:
 Neil Brown wrote:
  On Thursday March 1, [EMAIL PROTECTED] wrote:
  You can only grow a RAID5 array in Linux as of 2.6.20 AFAIK.
  
  There are two dimensions for growth.
  You can increase the amount of each device that is used, or you can
  increase the number of devices.
  
  You are correct that increasing the number of devices only works for
  RAID5 (and RAID1, but you don't get extra space) in 2.6.20 (RAID6
  coming in 2.6.21).
  
  However this question is about growing an array the first way:
  increasing the amount of space used on each device, and that is
  supported for RAID1/4/5/6.
  
  And Laurent:
1/ Yes, it is that easy
2/ I doubt a nearly-full ext3 array increases the risk
3/ The effect of adding a bitmap is that if you suffer a crash while
   the array is degraded, it will resync faster so you have less
   exposure to multiple failure.
 
 I just finished changing disks, growing the array, and then the filesystem.
 
 It worked flawlessly.
 
 Just a little note: I had to unmount my ext3 filesystem to be able to
 resize it. (Took ~8 hours to fsck + resize the 15-disk array from 6 to
 9TB on a dual Xeon with 4GB RAM).

FWIW:

I changed a 6 x 400G system to 500G disks and grew the raid5 array. It 
worked fine save for these warnings:

Apr  8 16:54:33 sylla mdadm: RebuildStarted event detected on md device /dev/md1
Apr  8 16:54:33 sylla mdadm: Rebuild20 event detected on md device /dev/md1
Apr  8 16:54:33 sylla mdadm: Rebuild40 event detected on md device /dev/md1
Apr  8 16:54:33 sylla mdadm: Rebuild60 event detected on md device /dev/md1
Apr  8 16:54:33 sylla mdadm: Rebuild80 event detected on md device /dev/md1
Apr  8 16:54:33 sylla kernel: md1: invalid bitmap page request: 187 (> 186)
Apr  8 16:54:33 sylla kernel: md1: invalid bitmap page request: 187 (> 186)

etc...

So after rebooting I removed the bitmap and recreated it.
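
For reference, removing and recreating the internal bitmap amounts to
(substitute your own md device):

mdadm --grow --bitmap=none /dev/md1       # drop the old bitmap
mdadm --grow --bitmap=internal /dev/md1   # recreate it for the grown array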

Online resizing of the 2.0T xfs to 2.4T only took a few _seconds_.

A similar operation on a 4 x 250G raid5 system upgraded to 400G disks 
with reiser3 also took a few seconds of _online_ time.

Kernel 2.6.20.6
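
For anyone wanting to repeat this, the procedure was roughly the
following (a sketch with made-up member names and mount point; each disk
is replaced and fully rebuilt before the next one is touched):

# for each member in turn: fail, remove, swap in the bigger disk, re-add
mdadm /dev/md1 --fail /dev/sdb1
mdadm /dev/md1 --remove /dev/sdb1
# ... physically replace the disk and partition it ...
mdadm /dev/md1 --add /dev/sdb1
# wait for the rebuild to finish (watch /proc/mdstat) before the next disk

# once every member is larger, grow the array onto the extra space
mdadm --grow /dev/md1 --size=max

# then grow the filesystem; xfs_growfs works online against the mount point
xfs_growfs /data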


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2007-01-08 Thread Louis-David Mitterrand
On Sat, Jan 06, 2007 at 12:17:36AM -0500, Chaitanya Patti wrote:
 
  We are in the process of porting RAIF to 2.6.19 right now.  Should be done
  in early January.  The trick is that we are trying to keep the same source
  good for a wide range of kernel versions.  In fact, not too long ago we
  even were able to compile it for 2.4.24!
 
  Nikolai.
 
 We now have RAIF for the 2.6.19 kernel available at:
 
 ftp://ftp.fsl.cs.sunysb.edu/pub/raif/raif-1.1.tar.gz
 
 This version is more stable, but there are surely still some remaining
 bugs, and we very much appreciate your feedback.

Can RAIF be compared in functionality to drbd (when using nfs mounts 
from different hosts)?


Re: [solved] supermicro failure

2006-12-19 Thread Louis-David Mitterrand
On Tue, Dec 19, 2006 at 10:47:29PM -0500, Bill Davidsen wrote:
 Louis-David Mitterrand wrote:
 On Thu, Nov 09, 2006 at 03:27:31PM +0100, Louis-David Mitterrand wrote:
   
 I forgot to add that, to help us solve this, we are ready to hire a paid 
 consultant; please contact me by mail or phone at +33.1.46.47.21.30.
 
 
 Update: we eventually succeeded in reassembling the partition, with two 
 missing disks.
 
 Your update would be far more interesting if you found out why it 
 ejected three drives at once... The obvious common failures, controller 
 and power supply, would not prevent reassembly in a functional environment.

Actually the motherboard and/or its on-board scsi controller turned out 
to be defective. Reassembly succeeded once the disks were transferred to 
another box.
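
For the curious: a raid6 can still be assembled and run with up to two
members missing, and a forced assemble of the surviving members looks
roughly like this (member names made up here):

mdadm --assemble --force --run /dev/md0 /dev/sd[a-d]1
# --force accepts slightly out-of-date superblocks, --run starts the
# array even though it is degraded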

Has anyone seen such a hardware failure on a brand-new SuperMicro machine?

Thanks,


[urgent] supermicro ejecting disks

2006-11-09 Thread Louis-David Mitterrand
Hello,

We recently changed our main server to a Supermicro with 6 scsi disks in 
soft raid6 with kernel 2.6.18. 

After running OK for a few days, 3 disks were suddenly ejected from the 
raid6. We are now trying to reassemble the partition in another box but 
keep getting a "superblock corrupted" error on the fs. We're in a pretty 
bad state right now, as this is a production machine. Of course we have 
backups but restoring them will take some time.

Does anyone have any idea of what has happened and how to fix it?

Thanks,


[consultant needed] (was: supermicro ejecting disks)

2006-11-09 Thread Louis-David Mitterrand
I forgot to add that, to help us solve this, we are ready to hire a paid 
consultant; please contact me by mail or phone at +33.1.46.47.21.30.

Thanks

On Thu, Nov 09, 2006 at 03:18:11PM +0100, Louis-David Mitterrand wrote:
 Hello,
 
 We recently changed our main server to a Supermicro with 6 scsi disks in 
 soft raid6 with kernel 2.6.18. 
 
 After running OK for a few days, 3 disks were suddenly ejected from the 
 raid6. We are now trying to reassemble the partition in another box but 
 keep getting a "superblock corrupted" error on the fs. We're in a pretty 
 bad state right now, as this is a production machine. Of course we have 
 backups but restoring them will take some time.
 
 Does anyone have any idea of what has happened and how to fix it?
 
 Thanks,


Re: [solved] (was: supermicro ejecting disks)

2006-11-09 Thread Louis-David Mitterrand
On Thu, Nov 09, 2006 at 03:27:31PM +0100, Louis-David Mitterrand wrote:
 I forgot to add that, to help us solve this, we are ready to hire a paid 
 consultant; please contact me by mail or phone at +33.1.46.47.21.30.

Update: we eventually succeeded in reassembling the partition, with two 
missing disks.


Re: NCQ general question

2006-03-14 Thread Louis-David Mitterrand
On Wed, Mar 08, 2006 at 12:17:51PM -0500, Jeff Garzik wrote:
 Louis-David Mitterrand wrote:
 
 Do you plan on updating your AHCI NCQ patch found in 
 http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/archive/
 It no longer applies cleanly to the latest 2.6.15.x kernel.
 
 No, but Jens Axboe and Tejun Heo will have a better version.

Is it available somewhere?


Re: NCQ general question

2006-03-08 Thread Louis-David Mitterrand
On Sun, Mar 05, 2006 at 02:29:15AM -0500, Jeff Garzik wrote:
 Raz Ben-Jehuda(caro) wrote:
 Is NCQ supported when setting the controller to JBOD instead of using HW 
 raid?
 
 1) The two have nothing to do with each other
 
 2) It sounds like you haven't yet read
 http://linux-ata.org/faq-sata-raid.html

Hello,

Do you plan on updating your AHCI NCQ patch found in 
http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/archive/
It no longer applies cleanly to the latest 2.6.15.x kernel.

Thanks,


Re: Raid, resync and hotspare

2005-07-13 Thread Louis-David Mitterrand
On Tue, Jul 12, 2005 at 05:33:08PM +0200, Laurent Caron wrote:
 Hi,
 
 I recently moved a server from old disks to new ones and added a 
 hotspare (mdadm /dev/md1 -a /dev/sdf2)
 
 the hotspare appears in  /proc/mdstat
 
 Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6]
 md1 : active raid5 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
  285699584 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
 
 
 when I fail a disk, mdadm does *not* send me any warning.

From the mdadm man page:

Only Fail, FailSpare, DegradedArray, and TestMessage cause Email to
be sent.

Your event is SpareActive, which does not trigger an alert mail;
however, all events can be reported through the --program switch:

All events cause the program to be run. The program is run with
two or three arguments, they being the event name, the array
device and possibly a second device.

 Only at the second failure, when the array is in a degraded state, do I 
 receive a warning.

 How may I receive a warning when the hotspare disk has been used to cope 
 with a disk failure?

--program my_mail_script.sh

#!/bin/sh
# $1 = event name, $2 = md device, $3 = component device (may be empty)
nail -s "$1 event detected on device $2 $3" root <<EOF

Dear admin,

Your array seems to have suffered some breakage:

A $1 event was received for device $2 $3.

Fix it ASAP!

Regards,

EOF
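
One way to hook that script up to the monitor (a sketch; adjust the path
to wherever you keep the script):

mdadm --monitor --scan --daemonise \
  --program /usr/local/bin/my_mail_script.sh --mail root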

-- 
This space for rent.