Help needed - RAID5 recovery from Power-fail

2006-04-03 Thread Nigel J. Terry
I wonder if you could help a RAID newbie with a problem.

I had a power failure, and now I can't access my RAID array. It had been
working fine for months until I lost power. Being a fool, I don't have
a full backup, so I really need to get this data back.

I run FC4 (64bit).
I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array
/dev/md0 on top of which I run lvm and mount the whole lot as /home. My
intention was always to add another disk to this array, and I purchased
one yesterday.

When I boot, I get:

md0 is not clean
Cannot start dirty degraded array
failed to run raid set md0


I can provide the following extra information:

# cat /proc/mdstat
Personalities : [raid5]
unused devices: <none>

# mdadm --query /dev/md0
/dev/md0: is an md device which is not active

# mdadm --query /dev/md0
/dev/md0: is an md device which is not active
/dev/md0: is too small to be an md component.

# mdadm --query /dev/sda1
/dev/sda1: is not an md array
/dev/sda1: device 0 in 2 device undetected raid5 md0.  Use mdadm
--examine for more detail.

# mdadm --query /dev/sdb1
/dev/sdb1: is not an md array
/dev/sdb1: device 1 in 2 device undetected raid5 md0.  Use mdadm
--examine for more detail.

# mdadm --examine /dev/md0
mdadm: /dev/md0 is too small for md

# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
  Creation Time : Thu Dec 15 15:29:36 2005
     Raid Level : raid5
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Tue Mar 21 06:25:52 2006
          State : active
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 2ba99f09 - correct
         Events : 0.1498318

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed

# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
  Creation Time : Thu Dec 15 15:29:36 2005
     Raid Level : raid5
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Tue Mar 21 06:23:57 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 2ba99e95 - correct
         Events : 0.1498307

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1

It looks to me like there is no hardware problem, but maybe I am wrong.
I cannot find either /etc/mdadm.conf or /etc/raidtab.

How would you suggest I proceed? I'm wary of doing anything (assemble,
build, create) until I am sure it won't reset everything.

Many Thanks

Nigel



-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Help needed - RAID5 recovery from Power-fail

2006-04-03 Thread Neil Brown
On Monday April 3, [EMAIL PROTECTED] wrote:
> I wonder if you could help a Raid Newbie with a problem
> 
> I had a power fail, and now I can't access my RAID array. It has been
> working fine for months until I lost power... Being a fool, I don't have
> a full backup, so I really need to get this data back.
> 
> I run FC4 (64bit).
> I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array
> /dev/md0 on top of which I run lvm and mount the whole lot as /home. My
> intention was always to add another disk to this array, and I purchased
> one yesterday.

2 devices in a raid5??  There doesn't seem to be much point in it being
raid5 rather than raid1.

> 
> When I boot, I get:
> 
> md0 is not clean
> Cannot start dirty degraded array
> failed to run raid set md0

This tells us that the array is degraded and was not shut down cleanly.
A dirty degraded array can have undetectable data corruption, which is
why the kernel won't start it for you.
However, with only two devices, data corruption from this cause isn't
actually possible.
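
The reason can be checked with a little arithmetic. RAID5 parity is the
XOR of the data chunks in a stripe; with two devices each stripe holds
exactly one data chunk, so the parity chunk is simply a copy of it and
either disk alone carries the full data. A minimal sketch follows, using
invented byte values (nothing here is read from the array) and a
hypothetical helper named `parity`:

```shell
#!/bin/sh
# Illustrative only: the byte values below are invented, not read from any disk.

# XOR parity over one or more data bytes, printed as two hex digits.
parity() {
    p=0
    for d in "$@"; do
        p=$(( $p ^ $d ))
    done
    printf '%02x\n' "$p"
}

# Two-device raid5: one data chunk per stripe, so parity equals the data.
parity 0xa5          # prints a5

# Three-device raid5: two data chunks per stripe. Parity differs from both,
# and a lost chunk is rebuilt from the surviving chunk and the parity.
parity 0xa5 0x3c     # prints 99
parity 0x99 0x3c     # prints a5 (recovers the first data chunk)
```

With three or more devices, a stripe that was only partly written when the
power failed can leave parity inconsistent with the surviving data, and
that is the case the "dirty degraded" check guards against.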

The kernel parameter
   md_mod.start_dirty_degraded=1
will bypass this message and start the array anyway.

Alternatively:
  mdadm -A --force /dev/md0 /dev/sd[ab]1
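
For anyone nervous about running this, a cautious sequence might look like
the sketch below. The `run` helper, the `recover_md0` function and the
`DRY_RUN` variable are hypothetical conveniences, not part of mdadm. With
`DRY_RUN=1` (the default here) the commands are only printed, so the
sequence can be reviewed before anything touches the disks.

```shell
#!/bin/sh
# Sketch only. `run`, `recover_md0` and DRY_RUN are illustrative helpers,
# not mdadm features. With DRY_RUN=1 commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_md0() {
    # 1. Record the superblocks first (read-only).
    run mdadm --examine /dev/sda1
    run mdadm --examine /dev/sdb1
    # 2. Force assembly from the existing superblocks. This reuses the
    #    array's metadata rather than writing a fresh array as --create
    #    would, although --force may update event counters.
    run mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1
    # 3. Check the result before touching LVM or mounting /home.
    run cat /proc/mdstat
    run mdadm --detail /dev/md0
}

recover_md0
```

Setting `DRY_RUN=0` and running as root would execute the same commands for
real.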

> 
> # mdadm --examine /dev/sda1
> /dev/sda1:
>   Magic : a92b4efc
> Version : 00.90.02
>UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
>   Creation Time : Thu Dec 15 15:29:36 2005
>  Raid Level : raid5
>Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 0
> 
> Update Time : Tue Mar 21 06:25:52 2006
>   State : active
>  Active Devices : 1

So at 06:25:52 there was only one working device, while...


> 
> #mdadm --examine /dev/sdb1
> /dev/sdb1:
>   Magic : a92b4efc
> Version : 00.90.02
>UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
>   Creation Time : Thu Dec 15 15:29:36 2005
>  Raid Level : raid5
>Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 0
> 
> Update Time : Tue Mar 21 06:23:57 2006
>   State : active
>  Active Devices : 2

at 06:23:57 there were two.

It looks like you lost a drive a while ago. Did you notice?
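
The same comparison can be made mechanical. This is a minimal sketch,
assuming the `--examine` output for each member has been saved as text.
The `field` helper is a hypothetical name, and the sample values are copied
from the output quoted in this thread. It compares only the part of the
event counter after the dot, which assumes the part before the dot is the
same on both members.

```shell
#!/bin/sh
# Sketch: pull one field out of saved `mdadm --examine` output.
# Usage: field NAME < examine.txt   (prints the text after "NAME : ")
field() {
    awk -F' : ' -v name="$1" '
        { key = $1; gsub(/^[ \t>]+|[ \t]+$/, "", key) }
        key == name { print $2; exit }
    '
}

# Values copied from the two superblocks shown in this thread.
sda1='Update Time : Tue Mar 21 06:25:52 2006
Active Devices : 1
Events : 0.1498318'
sdb1='Update Time : Tue Mar 21 06:23:57 2006
Active Devices : 2
Events : 0.1498307'

a=$(printf '%s\n' "$sda1" | field Events)
b=$(printf '%s\n' "$sdb1" | field Events)

# The member with the lower event count stopped being updated first.
if [ "${a#*.}" -gt "${b#*.}" ]; then
    echo "sdb1 is behind sda1 by $(( ${a#*.} - ${b#*.} )) events"
elif [ "${a#*.}" -lt "${b#*.}" ]; then
    echo "sda1 is behind sdb1 by $(( ${b#*.} - ${a#*.} )) events"
else
    echo "both members have the same event count"
fi
```

A lower event count together with an earlier update time marks the member
that dropped out of the array first.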

Anyway, the 'mdadm' command I gave above should get the array working
again for you.  Then you might want to
   mdadm /dev/md0 -a /dev/sdb1
if you trust /dev/sdb.
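
Once a disk has been added back, /proc/mdstat shows whether the array is
still degraded: the status brackets contain one "_" per missing member, so
[U_] means one of two members is absent and [UU] means both are in sync. A
small sketch that reports degraded arrays is given below. The
`degraded_arrays` name and the sample text are hypothetical (the block
count in particular is invented).

```shell
#!/bin/sh
# Sketch: list md arrays that /proc/mdstat reports as degraded.
# A status such as [U_] has one "_" per missing member; [UU] is healthy.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/                     { name = $1 }
        /\[[U_]*_[U_]*\]/ && name != ""   { print name; name = "" }
    '
}

# Hypothetical sample text; the block count is invented for illustration.
sample='Personalities : [raid5]
md0 : active raid5 sda1[0]
      1000000 blocks level 5, 128k chunk, algorithm 2 [2/1] [U_]

unused devices: <none>'

printf '%s\n' "$sample" | degraded_arrays    # prints md0

# On the real system one would read the live file instead:
#   degraded_arrays < /proc/mdstat
```

An empty result means no array is currently reported as degraded.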

NeilBrown


Re: Help needed - RAID5 recovery from Power-fail

2006-04-03 Thread David Greaves
Neil Brown wrote:
> On Monday April 3, [EMAIL PROTECTED] wrote:
>> I wonder if you could help a Raid Newbie with a problem
>
> It looks like you lost a drive a while ago. Did you notice?

This is not unusual - raid just keeps on going if a disk fails.
When things are working again you really should read up on "mdadm -F" -
it runs as a daemon and sends you mail if any raid events occur.

See if FC4 has a script that automatically runs it - you may need to
tweak some config parameters somewhere (I use Debian so I'm not much help).
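
As a rough starting point, monitor mode needs to know which arrays to watch
and where to send mail, and both can live in /etc/mdadm.conf. The sketch
below prints a minimal config to stdout for review. `make_mdadm_conf` is a
hypothetical helper, the UUID is the one from the --examine output earlier
in this thread, and `root` as the mail address is an assumption. How FC4
starts the monitor at boot is not covered here.

```shell
#!/bin/sh
# Sketch: print a minimal mdadm.conf for one array.
# make_mdadm_conf is an illustrative helper, not part of mdadm.
# Usage: make_mdadm_conf UUID [MAILADDR]
make_mdadm_conf() {
    uuid=$1
    mailaddr=${2:-root}          # assumption: local root mailbox
    printf 'DEVICE partitions\n'
    printf 'ARRAY /dev/md0 UUID=%s\n' "$uuid"
    printf 'MAILADDR %s\n' "$mailaddr"
}

make_mdadm_conf c57d50aa:1b3bcabd:ab04d342:6049b3f1

# After reviewing the output, as root one might do:
#   make_mdadm_conf <uuid> > /etc/mdadm.conf
#   mdadm --monitor --scan --oneshot --test   # send a test alert
#   mdadm --monitor --scan --daemonise        # run in the background
```

`mdadm --examine --scan` prints ARRAY lines for the arrays it finds, which
is a way to cross-check the UUID before writing the file.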

David



Re: Help needed - RAID5 recovery from Power-fail

2006-04-04 Thread Al Boldi
Neil Brown wrote:
> 2 devices in a raid5??  Doesn't seem a lot of point it being raid5
> rather than raid1.

Wouldn't a 2-dev raid5 imply a striped block mirror (i.e. faster) rather than
a raid1 duplicate block mirror (i.e. slower)?

Thanks!

--
Al



Re: Help needed - RAID5 recovery from Power-fail - SOLVED

2006-04-05 Thread Nigel J. Terry
Thanks for all the help. I am now up and running again and have been
stable for over a day. I will now install my new drive and add it to
give me an array of three drives.

I'll also learn more about Raid, mdadm and smartd so that I am better
prepared next time.

Thanks again

Nigel