Re: RAID5 Recovery

2007-11-14 Thread David Greaves
Neil Cavan wrote:
 Hello,
Hi Neil

What kernel version?
What mdadm version?
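
(Both are quick to check:)

   uname -r           # running kernel version
   mdadm --version    # mdadm release
   cat /proc/mdstat   # current array state, worth pasting too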

 This morning, I woke up to find the array had kicked two disks. This
 time, though, /proc/mdstat showed one of the failed disks (U_U_U, one
 of the _s) had been marked as a spare - weird, since there are no
 spare drives in this array. I rebooted, and the array came back in the
 same state: one failed, one spare. I hot-removed and hot-added the
 spare drive, which put the array back to where I thought it should be
 ( still U_U_U, but with both _s marked as failed). Then I rebooted,
 and the array began rebuilding on its own. Usually I have to hot-add
 manually, so that struck me as a little odd, but I gave it no mind and
 went to work. Without checking the contents of the filesystem. Which
 turned out not to have been mounted on reboot.
OK
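
For the record, the hot-remove/hot-add step described above is normally
done with mdadm along these lines (exactly which member it was applied to
is an assumption here):

   mdadm /dev/md0 --remove /dev/hdc1   # drop the failed/odd "spare" member
   mdadm /dev/md0 --add /dev/hdc1      # add it back; md then resyncs onto it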

 Because apparently things went horribly wrong.
Yep :(

 Do I have any hope of recovering this data? Could rebuilding the
 reiserfs superblock help if the rebuild managed to corrupt the
 superblock but not the data?
See below
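
If it does come down to filesystem repair, the usual reiserfsck progression
is roughly the following - run it only against a correctly assembled array,
ideally after imaging the disks, and the device name is an assumption:

   reiserfsck --check /dev/md0         # read-only check, reports what it would fix
   reiserfsck --rebuild-sb /dev/md0    # only if --check says the superblock is damaged
   reiserfsck --rebuild-tree /dev/md0  # last resort; rewrites the tree, so image first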



 Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
 status=0x51 { DriveReady SeekComplete Error }
snip
 Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write
 due to I/O error on md0
hdc1 fails


 Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout:
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  --- rd:5 wd:3 fd:2
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 0, o:1, dev:hda1
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 1, o:0, dev:hdc1
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 2, o:1, dev:hde1
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 4, o:1, dev:hdi1

hdg1 is already missing?

 Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 conf printout:
 Nov 13 02:01:06 localhost kernel: [17805775.212000]  --- rd:5 wd:3 fd:2
 Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 0, o:1, dev:hda1
 Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 2, o:1, dev:hde1
 Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 4, o:1, dev:hdi1

so now the array is bad.

a reboot happens and:
 Nov 13 07:21:07 localhost kernel: [17179584.712000] md: md0 stopped.
 Nov 13 07:21:07 localhost kernel: [17179584.876000] md: bind<hdc1>
 Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hde1>
 Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdg1>
 Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdi1>
 Nov 13 07:21:07 localhost kernel: [17179584.892000] md: bind<hda1>
 Nov 13 07:21:07 localhost kernel: [17179584.892000] md: kicking
 non-fresh hdg1 from array!
 Nov 13 07:21:07 localhost kernel: [17179584.892000] md: unbind<hdg1>
 Nov 13 07:21:07 localhost kernel: [17179584.892000] md: export_rdev(hdg1)
 Nov 13 07:21:07 localhost kernel: [17179584.896000] raid5: allocated
 5245kB for md0
... apparently hdc1 is OK? Hmmm.

 Nov 13 07:21:07 localhost kernel: [17179665.524000] ReiserFS: md0:
 found reiserfs format 3.6 with standard journal
 Nov 13 07:21:07 localhost kernel: [17179676.136000] ReiserFS: md0:
 using ordered data mode
 Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
 journal params: device md0, size 8192, journal first block 18, max
 trans len 1024, max batch 900, max commit age 30, max trans age 30
 Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
 checking transaction log (md0)
 Nov 13 07:21:07 localhost kernel: [17179676.828000] ReiserFS: md0:
 replayed 7 transactions in 1 seconds
 Nov 13 07:21:07 localhost kernel: [17179677.012000] ReiserFS: md0:
 Using r5 hash to sort names
 Nov 13 07:21:09 localhost kernel: [17179682.064000] lost page write
 due to I/O error on md0
Reiser tries to mount/replay itself relying on hdc1 (which is partly bad)

 Nov 13 07:25:39 localhost kernel: [17179584.828000] md: raid5
 personality registered as nr 4
 Nov 13 07:25:39 localhost kernel: [17179585.708000] md: kicking
 non-fresh hdg1 from array!
Another reboot...

 Nov 13 07:25:40 localhost kernel: [17179666.064000] ReiserFS: md0:
 found reiserfs format 3.6 with standard journal
 Nov 13 07:25:40 localhost kernel: [17179676.904000] ReiserFS: md0:
 using ordered data mode
 Nov 13 07:25:40 localhost kernel: [17179676.928000] ReiserFS: md0:
 journal params: device md0, size 8192, journal first block 18, max
 trans len 1024, max batch 900, max commit age 30, max trans age 30
 Nov 13 07:25:40 localhost kernel: [17179676.932000] ReiserFS: md0:
 checking transaction log (md0)
 Nov 13 07:25:40 localhost kernel: [17179677.08] ReiserFS: md0:
 Using r5 hash to sort names
 Nov 13 07:25:42 localhost kernel: [17179683.128000] lost page write
 due to I/O error on md0
Reiser tries again...

 Nov 13 07:26:57 localhost kernel: [17179757.524000] md: unbind<hdc1>
 Nov 13 07:26:57 localhost kernel: [17179757.524000] md: export_rdev(hdc1)
 Nov 13 07:27:03 localhost kernel: [17179763.70] md: bind<hdc1>
 Nov 13 07:30:24 

Re: RAID5 Recovery

2006-10-22 Thread Neil Brown
On Saturday October 21, [EMAIL PROTECTED] wrote:
 Hi,
 
 I had a run-in with the Ubuntu Server installer, and in trying to get
 the new system to recognize the clean 5-disk raid5 array left behind by
 the previous Ubuntu system, I think I inadvertently instructed it to
 create a new raid array using those same partitions.
 
 What I know for sure is that now, I get this:
 
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hda1
 mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got
 )
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdc1
 mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got
 )
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hde1
 mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got
 )
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdg1
 mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got
 )
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdi1
 mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got
 )
 
 I didn't format the partitions or write any data to the disk, so I think
 the array's data should be intact. Is there a way to recreate the
 superblocks, or am I hosed?

Weird. Could the drives have been repartitioned in the process,
with the partitions being slightly different sizes or at slightly
different offsets?  That might explain the disappearing superblocks,
and remaking the partitions might fix it.
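
One way to check that theory is to dump the partition tables and compare
start sectors and sizes across the disks (or against any old record of the
layout):

   fdisk -lu /dev/hda    # partition start/size in sectors
   sfdisk -d /dev/hda    # dump format, easy to diff against the other drives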

Or you can just re-create the array.  Doing so won't destroy any data
that happens to be there.
To be on the safe side, create it with --assume-clean.  This will avoid
a resync so you can be sure that no data blocks will be written at
all.
Then 'fsck -n' or mount readonly and see if your data is safe.
Once you are happy that you have the data safe you can trigger the
resync with
   mdadm --assemble --update=resync .
or 
   echo resync > /sys/block/md0/md/sync_action

(assuming it is 'md0').
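
Spelled out as a sketch (the device list comes from the --examine commands
above, but the order, chunk size and layout are assumptions and must match
whatever the array was originally created with):

   # Re-create the superblocks only; --assume-clean skips the initial
   # resync so no data blocks are rewritten.  With a newer mdadm you may
   # also need --metadata=0.90 so the superblock format matches the old one.
   mdadm --create /dev/md0 --level=5 --raid-devices=5 --assume-clean \
         /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdi1

   # Read-only checks only, until you are sure the data lines up.
   fsck -n /dev/md0            # or: mount -o ro /dev/md0 /mnt

   # Once the data looks sane, let the parity be resynced.
   echo resync > /sys/block/md0/md/sync_action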

Good luck.

NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-19 Thread Maurice Hilarius
Nathanial Byrnes wrote:
 Yes, I did not have the funding nor approval to purchase more hardware
 when I set it up (read wife). Once it was working... the rest is
 history.

   

OK, so if you have a pair of IDE disks, jumpered as Master and slave,
and if one fails:

If the Master failed, re-jumper the remaining disk on that cable as
Master, with no Slave present.

If the Slave failed, re-jumper the remaining disk on that cable as
Master, with no Slave present.

Then you will have the remaining disk working normally, at least.

When you can afford it I suggest buying a controller with enough ports
to support the number of drives you have, with no Master/Slave pairing.

Good luck !

And to the software guys trying to help: we need to start with the
(obvious) hardware problem before we advise on how to recover data from
a borked system.
Once he has the jumpering on the drives sorted out, the drive that went
missing will be back again.


-- 

Regards,
Maurice



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-19 Thread Nate Byrnes

Hi All,
   I'm not sure that is entirely the case. From a hardware perspective, 
I can access all the disks from the OS, via fdisk and dd. It is really 
just mdadm that is failing.  Would I still need to work the jumper issue?

   Thanks,
   Nate

Maurice Hilarius wrote:

Nathanial Byrnes wrote:
  

Yes, I did not have the funding nor approval to purchase more hardware
when I set it up (read wife). Once it was working... the rest is
history.

  



OK, so if you have a pair of IDE disks, jumpered as Master and slave,
and if one fails:

If the Master failed, re-jumper the remaining disk on that cable as
Master, with no Slave present.

If the Slave failed, re-jumper the remaining disk on that cable as
Master, with no Slave present.

Then you will have the remaining disk working normally, at least.

When you can afford it I suggest buying a controller with enough ports
to support the number of drives you have, with no Master/Slave pairing.

Good luck !

And to the software guys trying to help: we need to start with the
(obvious) hardware problem before we advise on how to recover data from
a borked system.
Once he has the jumpering on the drives sorted out, the drive that went
missing will be back again.


  



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-19 Thread Maurice Hilarius
Nate Byrnes wrote:
 Hi All,
I'm not sure that is entirely the case. From a hardware
 perspective, I can access all the disks from the OS, via fdisk and dd.
 It is really just mdadm that is failing.  Would I still need to work
 the jumper issue?
Thanks,
Nate

IF the disks are as we suspect (master and slave relationships) and IF
you now have either a failed or a removed drive, then you  MUST correct
the jumpering.
Sure, you can often see a disk that is misconfigured.
It is almost certain, however, that when you write to it you will simply
cause corruption on it.

Of course, so far this is all speculation, as you have not actually said
what the disks, controller interfaces, and jumpering and so forth are at.
I was merely speculating, based on what you have said.

No amount of software magic will cure a hardware problem..


-- 

With our best regards,


Maurice W. Hilarius       Telephone: 01-780-456-9771
Hard Data Ltd.            FAX:       01-780-456-9772
11060 - 166 Avenue        email: [EMAIL PROTECTED]
Edmonton, AB, Canada      http://www.harddata.com/
   T5X 1Y3



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-19 Thread Nate Byrnes

Hello,
   I replaced the failed disk. The configuration is /dev/hde, /dev/hdf 
(replaced), on IDE channel 0, /dev/hdg, /dev/hdh on IDE channel 1, on a 
single PCI controller card. The issue here is that hde is now also not 
accessible after the failure of hdf.  I cannot see the jumper configs as 
the server is at home, and I am at work. The general thinking was that 
the hde superblock got hosed with the loss of hdf.


My initial post only discussed the disk ordering and device names. As 
I had replaced the disk which had failed (in a previously fully 
functioning array), with a new disk with exactly the same configuration 
(jumpers, cable locations, etc), and each of the disks could be 
accessed, my thinking was that there would not be a hardware problem to 
sort through. Is this logic flawed?

   Thanks again,
   Nate

Maurice Hilarius wrote:

Nate Byrnes wrote:
  

Hi All,
   I'm not sure that is entirely the case. From a hardware
perspective, I can access all the disks from the OS, via fdisk and dd.
It is really just mdadm that is failing.  Would I still need to work
the jumper issue?
   Thanks,
   Nate



IF the disks are as we suspect (master and slave relationships) and IF
you now have either a failed or a removed drive, then you  MUST correct
the jumpering.
Sure, you can often see a disk that is misconfigured.
It is almost certain, however, that when you write to it you will simply
cause corruption on it.

Of course, so far this is all speculation, as you have not actually said
what the disks, controller interfaces, and jumpering and so forth are at.
I was merely speculating, based on what you have said.

No amount of software magic will cure a hardware problem..


  



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-18 Thread Nathanial Byrnes
mdadm 2.4.1 behaves just like 2.1. So far, nothing in the syslog or messages.

On Tue, 2006-04-18 at 10:24 +1000, Neil Brown wrote:
 On Monday April 17, [EMAIL PROTECTED] wrote:
  Unfortunately nothing changed. 
 
 Weird... so hdf still reports as 'busy'?
 Is it mentioned anywhere in /var/log/messages since reboot?
 
 What version of mdadm are you using?  Try 2.4.1 and see if that works
 differently.
 
 NeilBrown
 
  
  
  On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
   On Monday April 17, [EMAIL PROTECTED] wrote:
Hi Neil, List,
Am I just out of luck? Perhaps a full reboot? Something else?
Thanks,
Nate
   
   Reboot and try again seems like the best bet at this stage.
   
   NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-18 Thread Maurice Hilarius
Nathanial Byrnes wrote:
 Hi All,
   Recently I lost a disk in my raid5 SW array. It seems that it took a
 second disk with it. The other disk appears to still be functional (from
 an fdisk perspective...). I am trying to get the array to work in
 degraded mode via failed-disk in raidtab, but am always getting the
 following error:

   
Let me guess:
IDE disks, in pairs.
Jumpered as Master and Slave.

Right?





-- 

With our best regards,


Maurice W. Hilarius       Telephone: 01-780-456-9771
Hard Data Ltd.            FAX:       01-780-456-9772
11060 - 166 Avenue        email: [EMAIL PROTECTED]
Edmonton, AB, Canada      http://www.harddata.com/
   T5X 1Y3



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-18 Thread Nathanial Byrnes
Yes, I did not have the funding nor approval to purchase more hardware
when I set it up (read wife). Once it was working... the rest is
history.

On Tue, 2006-04-18 at 16:13 -0600, Maurice Hilarius wrote:
 Nathanial Byrnes wrote:
  Hi All,
  Recently I lost a disk in my raid5 SW array. It seems that it took a
  second disk with it. The other disk appears to still be functional (from
  an fdisk perspective...). I am trying to get the array to work in
  degraded mode via failed-disk in raidtab, but am always getting the
  following error:
 

 Let me guess:
 IDE disks, in pairs.
 Jumpered as Master and Slave.
 
 Right?
 
 
 
 
 



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Nathanial Byrnes
Please see below.

On Mon, 2006-04-17 at 13:04 +1000, Neil Brown wrote:
 On Sunday April 16, [EMAIL PROTECTED] wrote:
  Hi Neil,
  Thanks for your reply. I tried that, but here is the error I
  received:
  
  [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0
  --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
  mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
  mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to
  start the array.
 
 Why is /dev/hdf busy? Is it in use? Mounted? Something?
 
Not that I am aware of. Here is the mount output:

[EMAIL PROTECTED]:/etc# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
/dev/sdb1 on /usr type ext3 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
nfsd on /proc/fs/nfsd type nfsd (rw)
usbfs on /proc/bus/usb type usbfs (rw)

lsof | grep hdf does not return any results.

is there some other way to find out?
  
  The output from lsraid against each device is as follows (I think that I
  messed up my superblocks pretty well...): 
 
 Sorry, but I don't use lsraid and cannot tell anything useful from its
 output.
ok
 
 NeilBrown
 


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Neil Brown
On Monday April 17, [EMAIL PROTECTED] wrote:
  
  Why is /dev/hdf busy? Is it in use? Mounted? Something?
  
 Not that I am aware of. Here is the mount output:
 
 [EMAIL PROTECTED]:/etc# mount
 /dev/sda1 on / type ext3 (rw)
 proc on /proc type proc (rw)
 sysfs on /sys type sysfs (rw)
 /dev/sdb1 on /usr type ext3 (rw)
 devpts on /dev/pts type devpts (rw,gid=5,mode=620)
 nfsd on /proc/fs/nfsd type nfsd (rw)
 usbfs on /proc/bus/usb type usbfs (rw)
 
 lsof | grep hdf does not return any results.
 
 is there some other way to find out?

 cat /proc/swaps
 cat /proc/mounts
 cat /proc/mdstat

as well as 'lsof' should find it.
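
A common culprit when mdadm reports a component as busy is a stale or
half-assembled array still holding it; a quick way to check and clear that
(the md0 name is an assumption):

 cat /proc/mdstat        # an 'inactive' array still claims its members
 mdadm --stop /dev/md0   # stopping it releases the devices it holds
 cat /proc/partitions    # confirm the kernel still sees hdf at all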



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Nate Byrnes

Hi Neil,
   Nothing references hdf as you can see below.  I have also rmmod'ed 
md and raid5 modules and modprobed them back in. Thoughts?


   Thanks again,
   Nate

[EMAIL PROTECTED]:~# cat /proc/swaps
Filename                        Type            Size            Used    Priority
/dev/sdb2                       partition       1050616         1028    -1

[EMAIL PROTECTED]:~# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
proc /proc proc rw,nodiratime 0 0
sysfs /sys sysfs rw 0 0
none /dev ramfs rw 0 0
/dev/sdb1 /usr ext3 rw 0 0
devpts /dev/pts devpts rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
usbfs /proc/bus/usb usbfs rw 0 0

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive hdh[2] hdg[3] hde[1]
 234451968 blocks

unused devices: <none>


Neil Brown wrote:

On Monday April 17, [EMAIL PROTECTED] wrote:
  

Why is /dev/hdf busy? Is it in use? Mounted? Something?

  

Not that I am aware of. Here is the mount output:

[EMAIL PROTECTED]:/etc# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
/dev/sdb1 on /usr type ext3 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
nfsd on /proc/fs/nfsd type nfsd (rw)
usbfs on /proc/bus/usb type usbfs (rw)

lsof | grep hdf does not return any results.

is there some other way to find out?



 cat /proc/swaps
 cat /proc/mounts
 cat /proc/mdstat

as well as 'lsof' should find it.



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Nate Byrnes

Hi Neil, List,
   Am I just out of luck? Perhaps a full reboot? Something else?
   Thanks,
   Nate

Nate Byrnes wrote:

Hi Neil,
   Nothing references hdf as you can see below.  I have also rmmod'ed 
md and raid5 modules and modprobed them back in. Thoughts?


   Thanks again,
   Nate

[EMAIL PROTECTED]:~# cat /proc/swaps
Filename                        Type            Size            Used    Priority
/dev/sdb2                       partition       1050616         1028    -1


[EMAIL PROTECTED]:~# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
proc /proc proc rw,nodiratime 0 0
sysfs /sys sysfs rw 0 0
none /dev ramfs rw 0 0
/dev/sdb1 /usr ext3 rw 0 0
devpts /dev/pts devpts rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
usbfs /proc/bus/usb usbfs rw 0 0

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive hdh[2] hdg[3] hde[1]
 234451968 blocks

unused devices: <none>


Neil Brown wrote:

On Monday April 17, [EMAIL PROTECTED] wrote:
 

Why is /dev/hdf busy? Is it in use? Mounted? Something?

  

Not that I am aware of. Here is the mount output:

[EMAIL PROTECTED]:/etc# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
/dev/sdb1 on /usr type ext3 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
nfsd on /proc/fs/nfsd type nfsd (rw)
usbfs on /proc/bus/usb type usbfs (rw)

lsof | grep hdf does not return any results.

is there some other way to find out?



 cat /proc/swaps
 cat /proc/mounts
 cat /proc/mdstat

as well as 'lsof' should find it.



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Neil Brown
On Monday April 17, [EMAIL PROTECTED] wrote:
 Hi Neil, List,
 Am I just out of luck? Perhaps a full reboot? Something else?
 Thanks,
 Nate

Reboot and try again seems like the best bet at this stage.

NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Nathanial Byrnes
Unfortunately nothing changed. 


On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
 On Monday April 17, [EMAIL PROTECTED] wrote:
  Hi Neil, List,
  Am I just out of luck? Perhaps a full reboot? Something else?
  Thanks,
  Nate
 
 Reboot and try again seems like the best bet at this stage.
 
 NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Neil Brown
On Monday April 17, [EMAIL PROTECTED] wrote:
 Unfortunately nothing changed. 

Weird... so hdf still reports as 'busy'?
Is it mentioned anywhere in /var/log/messages since reboot?

What version of mdadm are you using?  Try 2.4.1 and see if that works
differently.

NeilBrown

 
 
 On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
  On Monday April 17, [EMAIL PROTECTED] wrote:
   Hi Neil, List,
   Am I just out of luck? Perhaps a full reboot? Something else?
   Thanks,
   Nate
  
  Reboot and try again seems like the best bet at this stage.
  
  NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-16 Thread Neil Brown
On Saturday April 15, [EMAIL PROTECTED] wrote:
 Hi All,
   Recently I lost a disk in my raid5 SW array. It seems that it took a
 second disk with it. The other disk appears to still be functional (from
 an fdisk perspective...). I am trying to get the array to work in
 degraded mode via failed-disk in raidtab, but am always getting the
 following error:
 
 md: could not bd_claim hde.
 md: autostart failed!
 
 When I try to raidstart the array. Is it the case that I had been running
 in degraded mode before the disk failure, and then lost the other disk?
 If so, how can I tell?

raidstart is deprecated.  It doesn't work reliably.  Don't use it.

 
 I have been messing about with mkraid -R and I have tried to
 add /dev/hdf (a new disk) back to the array. However, I am fairly
 confident that I have not kicked off the recovery process, so I am
 imagining that once I get the superblocks in order, I should be able to
 recover to the new disk?
 
 My system and raid config are:
 Kernel 2.6.13.1
 Slack 10.2
 RAID 5 which originally looked like:
 /dev/hde
 /dev/hdg
 /dev/hdi
 /dev/hdk
 
 but when I moved the disks to another box with fewer IDE controllers
 /dev/hde
 /dev/hdf
 /dev/hdg
 /dev/hdh
 
 How should I approach this?

mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd*

If that doesn't work, add --force but be cautious of the data - do
an fsck at least.
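
Putting that together, something like (the read-only fsck flag is a
precaution; adjust for whatever filesystem is on the array):

   mdadm --assemble --force /dev/md0 \
         --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
   cat /proc/mdstat      # should show md0 assembled, possibly degraded
   fsck -n /dev/md0      # read-only check before trusting the data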

NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-16 Thread Nathanial Byrnes
Hi Neil,
Thanks for your reply. I tried that, but here is the error I
received:

[EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0
--uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to
start the array.

The output from lsraid against each device is as follows (I think that I
messed up my superblocks pretty well...): 

[EMAIL PROTECTED]:/etc# lsraid -d /dev/hde
[dev   9,   0] /dev/md/0   38081921.59A998F9.64C1A001.EC534EF2  offline
[dev   ?,   ?] (unknown)   ...                                  missing
[dev   ?,   ?] (unknown)   ...                                  missing
[dev  34,  64] /dev/hdh    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  34,   0] /dev/hdg    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  33,  64] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown
[dev  33,   0] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown

[dev  33,   0] /dev/hde    38081921.59A998F9.64C1A001.EC534EF2  unbound
[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdf
[dev   9,   0] /dev/md/0   38081921.59A998F9.64C1A001.EC534EF2  offline
[dev   ?,   ?] (unknown)   ...                                  missing
[dev   ?,   ?] (unknown)   ...                                  missing
[dev  34,  64] /dev/hdh    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  34,   0] /dev/hdg    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  33,  64] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown
[dev  33,   0] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown

[dev  33,  64] /dev/hdf    38081921.59A998F9.64C1A001.EC534EF2  unbound
[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdg
[dev   9,   0] /dev/md/0   38081921.59A998F9.64C1A001.EC534EF2  offline
[dev   ?,   ?] (unknown)   ...                                  missing
[dev   ?,   ?] (unknown)   ...                                  missing
[dev  34,  64] /dev/hdh    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  34,   0] /dev/hdg    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  33,  64] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown
[dev  33,   0] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown

[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdh
[dev   9,   0] /dev/md/0   38081921.59A998F9.64C1A001.EC534EF2  offline
[dev   ?,   ?] (unknown)   ...                                  missing
[dev   ?,   ?] (unknown)   ...                                  missing
[dev  34,  64] /dev/hdh    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  34,   0] /dev/hdg    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  33,  64] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown
[dev  33,   0] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown


Thanks again,
Nate

On Mon, 2006-04-17 at 08:46 +1000, Neil Brown wrote:
 On Saturday April 15, [EMAIL PROTECTED] wrote:
  Hi All,
  Recently I lost a disk in my raid5 SW array. It seems that it took a
  second disk with it. The other disk appears to still be functional (from
  an fdisk perspective...). I am trying to get the array to work in
  degraded mode via failed-disk in raidtab, but am always getting the
  following error:
  
  md: could not bd_claim hde.
  md: autostart failed!
  
  When I try to raidstart the array. Is it the case that I had been running
  in degraded mode before the disk failure, and then lost the other disk?
  If so, how can I tell?
 
 raidstart is deprecated.  It doesn't work reliably.  Don't use it.
 
  
  I have been messing about with mkraid -R and I have tried to
  add /dev/hdf (a new disk) back to the array. However, I am fairly
  confident that I have not kicked off the recovery process, so I am
  imagining that once I get the superblocks in order, I should be able to
  recover to the new disk?
  
  My system and raid config are:
  Kernel 2.6.13.1
  Slack 10.2
  RAID 5 which originally looked like:
  /dev/hde
  /dev/hdg
  /dev/hdi
  /dev/hdk
  
  but when I moved the disks to another box with fewer IDE controllers
  /dev/hde
  /dev/hdf
  /dev/hdg
  /dev/hdh
  
  How should I approach this?
 
 mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd*
 
 If that doesn't work, add --force but be cautious of the data - do
 an fsck at least.
 
 NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-16 Thread Neil Brown
On Sunday April 16, [EMAIL PROTECTED] wrote:
 Hi Neil,
  Thanks for your reply. I tried that, but here is the error I
 received:
 
 [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0
  --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
 mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
 mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to
 start the array.

Why is /dev/hdf busy? Is it in use? Mounted? Something?

 
 The output from lsraid against each device is as follows (I think that I
 messed up my superblocks pretty well...): 

Sorry, but I don't use lsraid and cannot tell anything useful from its
output.

NeilBrown


Re: raid5 recovery fails

2005-11-14 Thread Ross Vandegrift
On Mon, Nov 14, 2005 at 09:27:25PM +0200, Raz Ben-Jehuda(caro) wrote:
 I have made the following test with my raid5:
 1. created raid5 with 4 sata disks.
 2. waited until the raid was fully initialized.
 3. pulled a disk from the panel.
 4. shut the system.
 5. put back the disk.
 6. turned on the system.
 
 The raid failed to recover. I got a message from the md layer
 saying that it rejects the dirty disk.
 Anyone?

Did you re-add the disk to the array?

# mdadm --add /dev/md0 /dev/sda2

Of course, substitute your appropriate devices for the ones that I
randomly chose :-)
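
If the array is up but degraded, the re-add and rebuild can be checked
along these lines (device names are placeholders, as above):

   cat /proc/mdstat                             # array up, one member missing
   mdadm --examine /dev/sda2 | grep -i events   # the pulled disk shows an older event count
   mdadm --add /dev/md0 /dev/sda2               # hot-add it; recovery starts automatically
   watch cat /proc/mdstat                       # follow the rebuild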


-- 
Ross Vandegrift
[EMAIL PROTECTED]

The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell.
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37