Re: RAID5 Recovery

2007-11-14 Thread David Greaves
Neil Cavan wrote:
 Hello,
Hi Neil

What kernel version?
What mdadm version?
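Both are quick to check from a shell:

   uname -r          # kernel release
   mdadm --version   # mdadm release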

 This morning, I woke up to find the array had kicked two disks. This
 time, though, /proc/mdstat showed one of the failed disks (U_U_U, one
 of the _s) had been marked as a spare - weird, since there are no
 spare drives in this array. I rebooted, and the array came back in the
 same state: one failed, one spare. I hot-removed and hot-added the
 spare drive, which put the array back to where I thought it should be
 ( still U_U_U, but with both _s marked as failed). Then I rebooted,
 and the array began rebuilding on its own. Usually I have to hot-add
 manually, so that struck me as a little odd, but I gave it no mind and
 went to work. Without checking the contents of the filesystem. Which
 turned out not to have been mounted on reboot.
OK

 Because apparently things went horribly wrong.
Yep :(

 Do I have any hope of recovering this data? Could rebuilding the
 reiserfs superblock help if the rebuild managed to corrupt the
 superblock but not the data?
See below



 Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
 status=0x51 { DriveReady SeekComplete Error }
snip
 Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write
 due to I/O error on md0
hdc1 fails


 Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout:
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  --- rd:5 wd:3 fd:2
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 0, o:1, dev:hda1
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 1, o:0, dev:hdc1
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 2, o:1, dev:hde1
 Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 4, o:1, dev:hdi1

hdg1 is already missing?

 Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 conf printout:
 Nov 13 02:01:06 localhost kernel: [17805775.212000]  --- rd:5 wd:3 fd:2
 Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 0, o:1, dev:hda1
 Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 2, o:1, dev:hde1
 Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 4, o:1, dev:hdi1

so now the array is bad.

a reboot happens and:
 Nov 13 07:21:07 localhost kernel: [17179584.712000] md: md0 stopped.
 Nov 13 07:21:07 localhost kernel: [17179584.876000] md: bind<hdc1>
 Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hde1>
 Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdg1>
 Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdi1>
 Nov 13 07:21:07 localhost kernel: [17179584.892000] md: bind<hda1>
 Nov 13 07:21:07 localhost kernel: [17179584.892000] md: kicking
 non-fresh hdg1 from array!
 Nov 13 07:21:07 localhost kernel: [17179584.892000] md: unbind<hdg1>
 Nov 13 07:21:07 localhost kernel: [17179584.892000] md: export_rdev(hdg1)
 Nov 13 07:21:07 localhost kernel: [17179584.896000] raid5: allocated
 5245kB for md0
... apparently hdc1 is OK? Hmmm.

 Nov 13 07:21:07 localhost kernel: [17179665.524000] ReiserFS: md0:
 found reiserfs format 3.6 with standard journal
 Nov 13 07:21:07 localhost kernel: [17179676.136000] ReiserFS: md0:
 using ordered data mode
 Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
 journal params: device md0, size 8192, journal first block 18, max
 trans len 1024, max batch 900, max commit age 30, max trans age 30
 Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
 checking transaction log (md0)
 Nov 13 07:21:07 localhost kernel: [17179676.828000] ReiserFS: md0:
 replayed 7 transactions in 1 seconds
 Nov 13 07:21:07 localhost kernel: [17179677.012000] ReiserFS: md0:
 Using r5 hash to sort names
 Nov 13 07:21:09 localhost kernel: [17179682.064000] lost page write
 due to I/O error on md0
Reiser tries to mount/replay itself relying on hdc1 (which is partly bad)

 Nov 13 07:25:39 localhost kernel: [17179584.828000] md: raid5
 personality registered as nr 4
 Nov 13 07:25:39 localhost kernel: [17179585.708000] md: kicking
 non-fresh hdg1 from array!
Another reboot...

 Nov 13 07:25:40 localhost kernel: [17179666.064000] ReiserFS: md0:
 found reiserfs format 3.6 with standard journal
 Nov 13 07:25:40 localhost kernel: [17179676.904000] ReiserFS: md0:
 using ordered data mode
 Nov 13 07:25:40 localhost kernel: [17179676.928000] ReiserFS: md0:
 journal params: device md0, size 8192, journal first block 18, max
 trans len 1024, max batch 900, max commit age 30, max trans age 30
 Nov 13 07:25:40 localhost kernel: [17179676.932000] ReiserFS: md0:
 checking transaction log (md0)
 Nov 13 07:25:40 localhost kernel: [17179677.08] ReiserFS: md0:
 Using r5 hash to sort names
 Nov 13 07:25:42 localhost kernel: [17179683.128000] lost page write
 due to I/O error on md0
Reiser tries again...

 Nov 13 07:26:57 localhost kernel: [17179757.524000] md: unbind<hdc1>
 Nov 13 07:26:57 localhost kernel: [17179757.524000] md: export_rdev(hdc1)
 Nov 13 07:27:03 localhost kernel: [17179763.70] md: bind<hdc1>
 Nov 13 07:30:24 

Fwd: RAID5 Recovery

2007-11-14 Thread Neil Cavan
Thanks for taking a look, David.

Kernel:
2.6.15-27-k7, stock for Ubuntu 6.06 LTS

mdadm:
mdadm - v1.12.0 - 14 June 2005

You're right, earlier in /var/log/messages there's a notice that hdg
dropped, I missed it before. I use mdadm --monitor, but I recently
changed the target email address - I guess it didn't take properly.
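(A test alert is an easy way to confirm whether the new address took,
assuming a reasonably recent mdadm and a MAILADDR line in /etc/mdadm.conf
or a --mail option on the monitor:

   mdadm --monitor --scan --oneshot --test

The exact flags may differ on an mdadm as old as 1.12.)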

As for replacing hdc, thanks for the diagnosis but it won't help: the
drive is actually fine, as is hdg. I've replaced hdc before, only to
have the brand new hdc show the same behaviour, and SMART says the
drive is A-OK. There's something flaky about these PCI IDE
controllers. I think it's time for a new system.

Reiserfs recovery-wise: any suggestions? A simple fsck doesn't find a
file system superblock. Is --rebuild-sb the way to go here?
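For concreteness, the commands in question (assuming the array really is
assembled with the right members in the right order, and ideally after
imaging /dev/md0 somewhere safe first) would be roughly:

   reiserfsck --check /dev/md0        # read-only diagnosis first
   reiserfsck --rebuild-sb /dev/md0   # only if --check reports the superblock missing
   reiserfsck --rebuild-tree /dev/md0 # last resort; rewrites the internal tree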

Thanks,
Neil


On Nov 14, 2007 5:58 AM, David Greaves [EMAIL PROTECTED] wrote:
 Neil Cavan wrote:
  Hello,
 Hi Neil

 What kernel version?
 What mdadm version?

  This morning, I woke up to find the array had kicked two disks. This
  time, though, /proc/mdstat showed one of the failed disks (U_U_U, one
  of the _s) had been marked as a spare - weird, since there are no
  spare drives in this array. I rebooted, and the array came back in the
  same state: one failed, one spare. I hot-removed and hot-added the
  spare drive, which put the array back to where I thought it should be
  ( still U_U_U, but with both _s marked as failed). Then I rebooted,
  and the array began rebuilding on its own. Usually I have to hot-add
  manually, so that struck me as a little odd, but I gave it no mind and
  went to work. Without checking the contents of the filesystem. Which
  turned out not to have been mounted on reboot.
 OK

  Because apparently things went horribly wrong.
 Yep :(

  Do I have any hope of recovering this data? Could rebuilding the
  reiserfs superblock help if the rebuild managed to corrupt the
  superblock but not the data?
 See below



  Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
  status=0x51 { DriveReady SeekComplete Error }
 snip
  Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write
  due to I/O error on md0
 hdc1 fails


  Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout:
  Nov 13 02:01:06 localhost kernel: [17805775.196000]  --- rd:5 wd:3 fd:2
  Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 0, o:1, dev:hda1
  Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 1, o:0, dev:hdc1
  Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 2, o:1, dev:hde1
  Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 4, o:1, dev:hdi1

 hdg1 is already missing?

  Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 conf printout:
  Nov 13 02:01:06 localhost kernel: [17805775.212000]  --- rd:5 wd:3 fd:2
  Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 0, o:1, dev:hda1
  Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 2, o:1, dev:hde1
  Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 4, o:1, dev:hdi1

 so now the array is bad.

 a reboot happens and:
  Nov 13 07:21:07 localhost kernel: [17179584.712000] md: md0 stopped.
  Nov 13 07:21:07 localhost kernel: [17179584.876000] md: bind<hdc1>
  Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hde1>
  Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdg1>
  Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdi1>
  Nov 13 07:21:07 localhost kernel: [17179584.892000] md: bind<hda1>
  Nov 13 07:21:07 localhost kernel: [17179584.892000] md: kicking
  non-fresh hdg1 from array!
  Nov 13 07:21:07 localhost kernel: [17179584.892000] md: unbind<hdg1>
  Nov 13 07:21:07 localhost kernel: [17179584.892000] md: export_rdev(hdg1)
  Nov 13 07:21:07 localhost kernel: [17179584.896000] raid5: allocated
  5245kB for md0
 ... apparently hdc1 is OK? Hmmm.

  Nov 13 07:21:07 localhost kernel: [17179665.524000] ReiserFS: md0:
  found reiserfs format 3.6 with standard journal
  Nov 13 07:21:07 localhost kernel: [17179676.136000] ReiserFS: md0:
  using ordered data mode
  Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
  journal params: device md0, size 8192, journal first block 18, max
  trans len 1024, max batch 900, max commit age 30, max trans age 30
  Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
  checking transaction log (md0)
  Nov 13 07:21:07 localhost kernel: [17179676.828000] ReiserFS: md0:
  replayed 7 transactions in 1 seconds
  Nov 13 07:21:07 localhost kernel: [17179677.012000] ReiserFS: md0:
  Using r5 hash to sort names
  Nov 13 07:21:09 localhost kernel: [17179682.064000] lost page write
  due to I/O error on md0
 Reiser tries to mount/replay itself relying on hdc1 (which is partly bad)

  Nov 13 07:25:39 localhost kernel: [17179584.828000] md: raid5
  personality registered as nr 4
  Nov 13 07:25:39 localhost kernel: [17179585.708000] md: kicking
  non-fresh hdg1 from array!
 Another reboot...

  Nov 13 07:25:40 localhost kernel: 

Re: Fwd: RAID5 Recovery

2007-11-14 Thread David Greaves
Neil Cavan wrote:
 Thanks for taking a look, David.
No problem.

 Kernel:
 2.6.15-27-k7, stock for Ubuntu 6.06 LTS
 
 mdadm:
 mdadm - v1.12.0 - 14 June 2005
OK - fairly old then. Not really worth trying to figure out why hdc got re-added
when things had gone wrong.

 You're right, earlier in /var/log/messages there's a notice that hdg
 dropped, I missed it before. I use mdadm --monitor, but I recently
 changed the target email address - I guess it didn't take properly.
 
 As for replacing hdc, thanks for the diagnosis but it won't help: the
 drive is actually fine, as is hdg. I've replaced hdc before, only to
 have the brand new hdc show the same behaviour, and SMART says the
 drive is A-OK. There's something flaky about these PCI IDE
 controllers. I think it's time for a new system.
Any excuse eh? :)


 Reiserfs recovery-wise: any suggestions? A simple fsck doesn't find a
 file system superblock. Is --rebuild-sb the way to go here?
No idea, sorry. I only ever tried Reiser once and it failed. It was very hard to
recover, so I swapped back to XFS.

Good luck on the fscking

David


RAID5 Recovery

2007-11-13 Thread Neil Cavan
Hello,

I have a 5-disk RAID5 array that has gone belly-up. It consists of two
pairs of disks on Promise PCI controllers, and one disk on the mobo controller.

This array has been running for a couple years, and every so often
(randomly, sometimes every couple weeks sometimes no problem for
months) it will drop a drive. It's not a drive failure per se, it's
something controller-related since the failures tend to happen in
pairs and SMART gives the drives a clean bill of health. If it's only
one drive, I can hot-add with no problem. If it's 2 drives my heart
leaps into my mouth but I reboot, only one of the drives comes up as
failed, and I can hot-add with no problem. The 2-drive case has
happened a dozen times and my array is never any worse for the wear.
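(For concreteness, the hot-remove/hot-add here is just the usual mdadm
manage-mode pair, using hdc1 purely as an example member:

   mdadm /dev/md0 --remove /dev/hdc1   # drop the failed member
   mdadm /dev/md0 --add /dev/hdc1      # re-add it; md then resyncs onto it
)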

This morning, I woke up to find the array had kicked two disks. This
time, though, /proc/mdstat showed one of the failed disks (U_U_U, one
of the _s) had been marked as a spare - weird, since there are no
spare drives in this array. I rebooted, and the array came back in the
same state: one failed, one spare. I hot-removed and hot-added the
spare drive, which put the array back to where I thought it should be
( still U_U_U, but with both _s marked as failed). Then I rebooted,
and the array began rebuilding on its own. Usually I have to hot-add
manually, so that struck me as a little odd, but I gave it no mind and
went to work. Without checking the contents of the filesystem. Which
turned out not to have been mounted on reboot. Because apparently
things went horribly wrong.

The rebuild process ran its course. I now have an array that mdadm
insists is peachy:
---
md0 : active raid5 hda1[0] hdc1[1] hdi1[4] hdg1[3] hde1[2]
  468872704 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: none
---

But there is no filesystem on /dev/md0:

---
sudo mount -t reiserfs /dev/md0 /storage/
mount: wrong fs type, bad option, bad superblock on /dev/md0,
   missing codepage or other error
---

Do I have any hope of recovering this data? Could rebuilding the
reiserfs superblock help if the rebuild managed to corrupt the
superblock but not the data?

Any help is appreciated; below is the failure event in
/var/log/messages, followed by the output of cat /var/log/messages |
grep md.

Thanks,
Neil Cavan

Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11719
Nov 13 02:01:03 localhost kernel: [17805772.424000] ide: failed opcode
was: unknown
Nov 13 02:01:03 localhost kernel: [17805772.424000] end_request: I/O
error, dev hdc, sector 11719
Nov 13 02:01:03 localhost kernel: [17805772.424000] R5: read error not
correctable.
Nov 13 02:01:03 localhost kernel: [17805772.464000] lost page write
due to I/O error on md0
Nov 13 02:01:05 localhost kernel: [17805773.776000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:05 localhost kernel: [17805773.776000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11727
Nov 13 02:01:05 localhost kernel: [17805773.776000] ide: failed opcode
was: unknown
Nov 13 02:01:05 localhost kernel: [17805773.776000] end_request: I/O
error, dev hdc, sector 11727
Nov 13 02:01:05 localhost kernel: [17805773.776000] R5: read error not
correctable.
Nov 13 02:01:05 localhost kernel: [17805773.776000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost kernel: [17805775.156000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:06 localhost kernel: [17805775.156000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11735
Nov 13 02:01:06 localhost kernel: [17805775.156000] ide: failed opcode
was: unknown
Nov 13 02:01:06 localhost kernel: [17805775.156000] end_request: I/O
error, dev hdc, sector 11735
Nov 13 02:01:06 localhost kernel: [17805775.156000] R5: read error not
correctable.
Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout:
Nov 13 02:01:06 localhost kernel: [17805775.196000]  --- rd:5 wd:3 fd:2
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 0, o:1, dev:hda1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 1, o:0, dev:hdc1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 2, o:1, dev:hde1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 4, o:1, dev:hdi1
Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 

Re: RAID5 Recovery

2006-10-22 Thread Neil Brown
On Saturday October 21, [EMAIL PROTECTED] wrote:
 Hi,
 
 I had a run-in with the Ubuntu Server installer, and in trying to get
 the new system to recognize the clean 5-disk raid5 array left behind by
 the previous Ubuntu system, I think I inadvertently instructed it to
 create a new raid array using those same partitions.
 
 What I know for sure is that now, I get this:
 
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hda1
 mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got
 )
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdc1
 mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got
 )
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hde1
 mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got
 )
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdg1
 mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got
 )
 [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdi1
 mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got
 )
 
 I didn't format the partitions or write any data to the disk, so I think
 the array's data should be intact. Is there a way to recreate the
 superblocks, or am I hosed?

Weird... Could the drives have been repartitioned in the process,
with the partitions being slightly different sizes or at slightly
different offsets?  That might explain the disappearing superblocks,
and remaking the partitions might fix it.

Or you can just re-create the array.  Doing so won't destroy any data
that happens to be there.
To be on the safe side, create it with --assume-clean.  This will avoid
a resync so you can be sure that no data blocks will be written at
all.
Then 'fsck -n' or mount readonly and see if your data is safe.
Once you are happy that you have the data safe you can trigger the
resync with
   mdadm --assemble --update=resync .
or 
   echo resync > /sys/block/md0/md/sync_action

(assuming it is 'md0').
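As a sketch only - the member order, chunk size and superblock version must
match whatever the array was originally created with, so treat every value
below as a placeholder to verify first:

   mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5 \
         /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdi1
   fsck -n /dev/md0          # or mount read-only, as above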

Good luck.

NeilBrown


RAID5 Recovery

2006-10-21 Thread Neil Cavan
Hi,

I had a run-in with the Ubuntu Server installer, and in trying to get
the new system to recognize the clean 5-disk raid5 array left behind by
the previous Ubuntu system, I think I inadvertently instructed it to
create a new raid array using those same partitions.

What I know for sure is that now, I get this:

[EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hda1
mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got
)
[EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdc1
mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got
)
[EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hde1
mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got
)
[EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdg1
mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got
)
[EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdi1
mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got
)

I didn't format the partitions or write any data to the disk, so I think
the array's data should be intact. Is there a way to recreate the
superblocks, or am I hosed?

Thanks,
Neil



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-19 Thread Maurice Hilarius
Nathanial Byrnes wrote:
 Yes, I did not have the funding nor approval to purchase more hardware
 when I set it up (read wife). Once it was working... the rest is
 history.

   

OK, so if you have a pair of IDE disks, jumpered as Master and slave,
and if one fails:

If Master failed, re-jumper remaining disk on pair on same cable as
Master, no slave present

If Slave failed, re-jumper remaining disk on pair on same cable as
Master, no slave present.

Then you will have the remaining disk working normally, at least.

When you can afford it I suggest buying a controller with enough ports
to support the number of drives you have, with no Master/Slave pairing.

Good luck !

And to the  software guys trying to help: We need to start with the
(obvious) hardware problem, before we advise on how to recover data from
a borked system..
Once he has the jumpering on the drives sorted out, the drive that went
missing will be back again..


-- 

Regards,
Maurice



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-19 Thread Nate Byrnes

Hi All,
   I'm not sure that is entirely the case. From a hardware perspective, 
I can access all the disks from the OS, via fdisk and dd. It is really 
just mdadm that is failing.  Would I still need to work the jumper issue?

   Thanks,
   Nate

Maurice Hilarius wrote:

Nathanial Byrnes wrote:
  

Yes, I did not have the funding nor approval to purchase more hardware
when I set it up (read wife). Once it was working... the rest is
history.

  



OK, so if you have a pair of IDE disks, jumpered as Master and slave,
and if one fails:

If Master failed, re-jumper remaining disk on pair on same cable as
Master, no slave present

If Slave failed, re-jumper remaining disk on pair on same cable as
Master, no slave present.

Then you will have the remaining disk working normally, at least.

When you can afford it I suggest buying a controller with enough ports
to support the number of drives you have, with no Master/Slave pairing.

Good luck !

And to the  software guys trying to help: We need to start with the
(obvious) hardware problem, before we advise on how to recover data from
a borked system..
Once he has the jumpering on the drives sorted out, the drive that went
missing will be back again..


  



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-19 Thread Maurice Hilarius
Nate Byrnes wrote:
 Hi All,
I'm not sure that is entirely the case. From a hardware
 perspective, I can access all the disks from the OS, via fdisk and dd.
 It is really just mdadm that is failing.  Would I still need to work
 the jumper issue?
Thanks,
Nate

IF the disks are as we suspect (master and slave relationships) and IF
you now have either a failed or a removed drive, then you  MUST correct
the jumpering.
Sure, you can often see a disk that is misconfigured.
It is almost certain, however, that when you write to it you will simply
cause corruption on it.

Of course, so far this is all speculation, as you have not actually said
what the disks, controller interfaces, and jumpering and so forth are at.
I was merely speculating, based on what you have said.

No amount of software magic will cure a hardware problem..


-- 

With our best regards,


Maurice W. HilariusTelephone: 01-780-456-9771
Hard Data Ltd.  FAX:   01-780-456-9772
11060 - 166 Avenue email:[EMAIL PROTECTED]
Edmonton, AB, Canada   http://www.harddata.com/
   T5X 1Y3



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-19 Thread Nate Byrnes

Hello,
   I replaced the failed disk. The configuration is /dev/hde, /dev/hdf 
(replaced), on IDE channel 0, /dev/hdg, /dev/hdh on IDE channel 1, on a 
single PCI controller card. The issue here is that hde in now also not 
accessible after the failure of hdf.  I cannot see the jumper configs as 
the server is at home, and I am at work. The general thinking was that 
the hde superblock got hosed with the loss of hdf.


My initial post only did discuss the disk ordering and device names. As 
I had replaced the disk which had failed (in a previously fully 
functioning array), with a new disk with exactly the same configuration 
(jumpers, cable locations, etc), and each of the disks could be 
accessed, my thinking was that there would not be a hardware problem to 
sort through. Is this logic flawed?

   Thanks again,
   Nate

Maurice Hilarius wrote:

Nate Byrnes wrote:
  

Hi All,
   I'm not sure that is entirely the case. From a hardware
perspective, I can access all the disks from the OS, via fdisk and dd.
It is really just mdadm that is failing.  Would I still need to work
the jumper issue?
   Thanks,
   Nate



IF the disks are as we suspect (master and slave relationships) and IF
you now have either a failed or a removed drive, then you  MUST correct
the jumpering.
Sure, you can often see a disk that is misconfigured.
It is almost certain, however, that when you write to it you will simply
cause corruption on it.

Of course, so far this is all speculation, as you have not actually said
what the disks, controller interfaces, and jumpering and so forth are at.
I was merely speculating, based on what you have said.

No amount of software magic will cure a hardware problem..


  



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-18 Thread Nathanial Byrnes
2.4.1 behaves just like 2.1. so far nothing in the syslog or messages.

On Tue, 2006-04-18 at 10:24 +1000, Neil Brown wrote:
 On Monday April 17, [EMAIL PROTECTED] wrote:
  Unfortunately nothing changed. 
 
 Weird... so hdf still reports as 'busy'?
 Is it mentioned anywhere in /var/log/messages since reboot?
 
 What version of mdadm are you using?  Try 2.4.1 and see if that works
 differently.
 
 NeilBrown
 
  
  
  On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
   On Monday April 17, [EMAIL PROTECTED] wrote:
Hi Neil, List,
Am I just out of luck? Perhaps a full reboot? Something else?
Thanks,
Nate
   
   Reboot and try again seems like the best bet at this stage.
   
   NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-18 Thread Maurice Hilarius
Nathanial Byrnes wrote:
 Hi All,
   Recently I lost a disk in my raid5 SW array. It seems that it took a
 second disk with it. The other disk appears to still be functional (from
 an fdisk perspective...). I am trying to get the array to work in
 degraded mode via failed-disk in raidtab, but am always getting the
 following error:

   
Let me guess:
IDE disks, in pairs.
Jumpered as Master and Slave.

Right?





-- 

With our best regards,


Maurice W. HilariusTelephone: 01-780-456-9771
Hard Data Ltd.  FAX:   01-780-456-9772
11060 - 166 Avenue email:[EMAIL PROTECTED]
Edmonton, AB, Canada   http://www.harddata.com/
   T5X 1Y3



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-18 Thread Nathanial Byrnes
Yes, I did not have the funding nor approval to purchase more hardware
when I set it up (read wife). Once it was working... the rest is
history.

On Tue, 2006-04-18 at 16:13 -0600, Maurice Hilarius wrote:
 Nathanial Byrnes wrote:
  Hi All,
  Recently I lost a disk in my raid5 SW array. It seems that it took a
  second disk with it. The other disk appears to still be functional (from
  an fdisk perspective...). I am trying to get the array to work in
  degraded mode via failed-disk in raidtab, but am always getting the
  following error:
 

 Let me guess:
 IDE disks, in pairs.
  Jumpered as Master and Slave.
 
 Right?
 
 
 
 
 



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Nathanial Byrnes
Please see below.

On Mon, 2006-04-17 at 13:04 +1000, Neil Brown wrote:
 On Sunday April 16, [EMAIL PROTECTED] wrote:
  Hi Neil,
  Thanks for your reply. I tried that, but here is the error I
  received:
  
  [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0
  --uuid=38081921:59a998f9:64c1a001:ec53 4ef2 /dev/hd[efgh]
  mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
  mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to
  start the array.
 
 Why is /dev/hdf busy? Is it in use? mounted? something?
 
Not that I am aware of. Here is the mount output:

[EMAIL PROTECTED]:/etc# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
/dev/sdb1 on /usr type ext3 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
nfsd on /proc/fs/nfsd type nfsd (rw)
usbfs on /proc/bus/usb type usbfs (rw)

lsof | grep hdf does not return any results.

is there some other way to find out?
  
  The output from lsraid against each device is as follows (I think that I
  messed up my superblocks pretty well...): 
 
 Sorry, but I don't use lsraid and cannot tell anything useful from its
 output.
ok
 
 NeilBrown
 


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Neil Brown
On Monday April 17, [EMAIL PROTECTED] wrote:
  
  Why is /dev/hdf busy? Is it in use? mounted? something?
  
 Not that I am aware of. Here is the mount output:
 
 [EMAIL PROTECTED]:/etc# mount
 /dev/sda1 on / type ext3 (rw)
 proc on /proc type proc (rw)
 sysfs on /sys type sysfs (rw)
 /dev/sdb1 on /usr type ext3 (rw)
 devpts on /dev/pts type devpts (rw,gid=5,mode=620)
 nfsd on /proc/fs/nfsd type nfsd (rw)
 usbfs on /proc/bus/usb type usbfs (rw)
 
 lsof | grep hdf does not return any results.
 
 is there some other way to find out?

 cat /proc/swaps
 cat /proc/mounts
 cat /proc/mdstat

as well as 'lsof' should find it.
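fuser is another quick check, if it is installed:

   fuser -v /dev/hdf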



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Nate Byrnes

Hi Neil,
   Nothing references hdf as you can see below.  I have also rmmod'ed 
md and raid5 modules and modprobed them back in. Thoughts?


   Thanks again,
   Nate

[EMAIL PROTECTED]:~# cat /proc/swaps
Filename                        Type            Size    Used    Priority
/dev/sdb2                       partition       1050616 1028    -1

[EMAIL PROTECTED]:~# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
proc /proc proc rw,nodiratime 0 0
sysfs /sys sysfs rw 0 0
none /dev ramfs rw 0 0
/dev/sdb1 /usr ext3 rw 0 0
devpts /dev/pts devpts rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
usbfs /proc/bus/usb usbfs rw 0 0

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive hdh[2] hdg[3] hde[1]
 234451968 blocks

unused devices: none


Neil Brown wrote:

On Monday April 17, [EMAIL PROTECTED] wrote:
  

Why is /dev/hdf busy? Is it in use? mounted? something?

  

Not that I am aware of. Here is the mount output:

[EMAIL PROTECTED]:/etc# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
/dev/sdb1 on /usr type ext3 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
nfsd on /proc/fs/nfsd type nfsd (rw)
usbfs on /proc/bus/usb type usbfs (rw)

lsof | grep hdf does not return any results.

is there some other way to find out?



 cat /proc/swaps
 cat /proc/mounts
 cat /proc/mdstat

as well as 'lsof' should find it.



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Nate Byrnes

Hi Neil, List,
   Am I just out of luck? Perhaps a full reboot? Something else?
   Thanks,
   Nate

Nate Byrnes wrote:

Hi Neil,
   Nothing references hdf as you can see below.  I have also rmmod'ed 
md and raid5 modules and modprobed them back in. Thoughts?


   Thanks again,
   Nate

[EMAIL PROTECTED]:~# cat /proc/swaps
Filename                        Type            Size    Used    Priority
/dev/sdb2                       partition       1050616 1028    -1


[EMAIL PROTECTED]:~# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
proc /proc proc rw,nodiratime 0 0
sysfs /sys sysfs rw 0 0
none /dev ramfs rw 0 0
/dev/sdb1 /usr ext3 rw 0 0
devpts /dev/pts devpts rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
usbfs /proc/bus/usb usbfs rw 0 0

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive hdh[2] hdg[3] hde[1]
 234451968 blocks

unused devices: none


Neil Brown wrote:

On Monday April 17, [EMAIL PROTECTED] wrote:
 

Why is /dev/hdf busy? Is it in use? mounted? something?

  

Not that I am aware of. Here is the mount output:

[EMAIL PROTECTED]:/etc# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
/dev/sdb1 on /usr type ext3 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
nfsd on /proc/fs/nfsd type nfsd (rw)
usbfs on /proc/bus/usb type usbfs (rw)

lsof | grep hdf does not return any results.

is there some other way to find out?



 cat /proc/swaps
 cat /proc/mounts
 cat /proc/mdstat

as well as 'lsof' should find it.



Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Neil Brown
On Monday April 17, [EMAIL PROTECTED] wrote:
 Hi Neil, List,
 Am I just out of luck? Perhaps a full reboot? Something else?
 Thanks,
 Nate

Reboot and try again seems like the best bet at this stage.

NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Nathanial Byrnes
Unfortunately nothing changed. 


On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
 On Monday April 17, [EMAIL PROTECTED] wrote:
  Hi Neil, List,
  Am I just out of luck? Perhaps a full reboot? Something else?
  Thanks,
  Nate
 
 Reboot and try again seems like the best bet at this stage.
 
 NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-17 Thread Neil Brown
On Monday April 17, [EMAIL PROTECTED] wrote:
 Unfortunately nothing changed. 

Weird... so hdf still reports as 'busy'?
Is it mentioned anywhere in /var/log/messages since reboot?

What version of mdadm are you using?  Try 2.4.1 and see if that works
differently.

NeilBrown

 
 
 On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
  On Monday April 17, [EMAIL PROTECTED] wrote:
   Hi Neil, List,
   Am I just out of luck? Perhaps a full reboot? Something else?
   Thanks,
   Nate
  
  Reboot and try again seems like the best bet at this stage.
  
  NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-16 Thread Neil Brown
On Saturday April 15, [EMAIL PROTECTED] wrote:
 Hi All,
   Recently I lost a disk in my raid5 SW array. It seems that it took a
 second disk with it. The other disk appears to still be functional (from
 an fdisk perspective...). I am trying to get the array to work in
 degraded mode via failed-disk in raidtab, but am always getting the
 following error:
 
 md: could not bd_claim hde.
 md: autostart failed!
 
 When I try to raidstart the array. Is it the case that I had been running
 in degraded mode before the disk failure, and then lost the other disk?
 If so, how can I tell?

raidstart is deprecated.  It doesn't work reliably.  Don't use it.

 
 I have been messing about with mkraid -R and I have tried to
 add /dev/hdf (a new disk) back to the array. However, I am fairly
 confident that I have not kicked off the recovery process, so I am
 imagining that once I get the superblocks in order, I should be able to
 recover to the new disk?
 
 My system and raid config are:
 Kernel 2.6.13.1
 Slack 10.2
 RAID 5 which originally looked like:
 /dev/hde
 /dev/hdg
 /dev/hdi
 /dev/hdk
 
 but when I moved the disks to another box with fewer IDE controllers
 /dev/hde
 /dev/hdf
 /dev/hdg
 /dev/hdh
 
 How should I approach this?

mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd*

If that doesn't work, add --force but be cautious of the data - do
an fsck at least.
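Spelled out for this array (same UUID as above; the right fsck depends on
the filesystem, and -n keeps the check read-only):

   mdadm --assemble --force /dev/md0 \
         --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
   fsck -n /dev/md0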

NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-16 Thread Nathanial Byrnes
Hi Neil,
Thanks for your reply. I tried that, but here is the error I
received:

[EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0
--uuid=38081921:59a998f9:64c1a001:ec53 4ef2 /dev/hd[efgh]
mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to
start the array.

The output from lsraid against each device is as follows (I think that I
messed up my superblocks pretty well...): 

[EMAIL PROTECTED]:/etc# lsraid -d /dev/hde
[dev   9,   0] /dev/md/0   38081921.59A998F9.64C1A001.EC534EF2  offline
[dev   ?,   ?] (unknown)   ...                                  missing
[dev   ?,   ?] (unknown)   ...                                  missing
[dev  34,  64] /dev/hdh    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  34,   0] /dev/hdg    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  33,  64] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown
[dev  33,   0] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown

[dev  33,   0] /dev/hde    38081921.59A998F9.64C1A001.EC534EF2  unbound
[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdf
[dev   9,   0] /dev/md/0   38081921.59A998F9.64C1A001.EC534EF2  offline
[dev   ?,   ?] (unknown)   ...                                  missing
[dev   ?,   ?] (unknown)   ...                                  missing
[dev  34,  64] /dev/hdh    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  34,   0] /dev/hdg    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  33,  64] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown
[dev  33,   0] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown

[dev  33,  64] /dev/hdf    38081921.59A998F9.64C1A001.EC534EF2  unbound
[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdg
[dev   9,   0] /dev/md/0   38081921.59A998F9.64C1A001.EC534EF2  offline
[dev   ?,   ?] (unknown)   ...                                  missing
[dev   ?,   ?] (unknown)   ...                                  missing
[dev  34,  64] /dev/hdh    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  34,   0] /dev/hdg    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  33,  64] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown
[dev  33,   0] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown

[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdh
[dev   9,   0] /dev/md/0   38081921.59A998F9.64C1A001.EC534EF2  offline
[dev   ?,   ?] (unknown)   ...                                  missing
[dev   ?,   ?] (unknown)   ...                                  missing
[dev  34,  64] /dev/hdh    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  34,   0] /dev/hdg    38081921.59A998F9.64C1A001.EC534EF2  good
[dev  33,  64] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown
[dev  33,   0] (unknown)   38081921.59A998F9.64C1A001.EC534EF2  unknown


Thanks again,
Nate

On Mon, 2006-04-17 at 08:46 +1000, Neil Brown wrote:
 On Saturday April 15, [EMAIL PROTECTED] wrote:
  Hi All,
  Recently I lost a disk in my raid5 SW array. It seems that it took a
  second disk with it. The other disk appears to still be functional (from
  an fdisk perspective...). I am trying to get the array to work in
  degraded mode via failed-disk in raidtab, but am always getting the
  following error:
  
  md: could not bd_claim hde.
  md: autostart failed!
  
  When I try to raidstart the array. Is it the case that I had been running
  in degraded mode before the disk failure, and then lost the other disk?
  If so, how can I tell?
 
 raidstart is deprecated.  It doesn't work reliably.  Don't use it.
 
  
  I have been messing about with mkraid -R and I have tried to
  add /dev/hdf (a new disk) back to the array. However, I am fairly
  confident that I have not kicked off the recovery process, so I am
  imagining that once I get the superblocks in order, I should be able to
  recover to the new disk?
  
  My system and raid config are:
  Kernel 2.6.13.1
  Slack 10.2
  RAID 5 which originally looked like:
  /dev/hde
  /dev/hdg
  /dev/hdi
  /dev/hdk
  
  but when I moved the disks to another box with fewer IDE controllers
  /dev/hde
  /dev/hdf
  /dev/hdg
  /dev/hdh
  
  How should I approach this?
 
 mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd*
 
 If that doesn't work, add --force but be cautious of the data - do
 an fsck at least.
 
 NeilBrown


Re: RAID5 recovery trouble, bd_claim failed?

2006-04-16 Thread Neil Brown
On Sunday April 16, [EMAIL PROTECTED] wrote:
 Hi Neil,
   Thanks for your reply. I tried that, but here is the error I
 received:
 
 [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0
 --uuid=38081921:59a998f9:64c1a001:ec53 4ef2 /dev/hd[efgh]
 mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
 mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to
 start the array.

Why is /dev/hdf busy? Is it in use? mounted? something?

 
 The output from lsraid against each device is as follows (I think that I
 messed up my superblocks pretty well...): 

Sorry, but I don't use lsraid and cannot tell anything useful from its
output.

NeilBrown


RAID5 recovery trouble, bd_claim failed?

2006-04-15 Thread Nathanial Byrnes
Hi All,
Recently I lost a disk in my raid5 SW array. It seems that it took a
second disk with it. The other disk appears to still be functional (from
an fdisk perspective...). I am trying to get the array to work in
degraded mode via failed-disk in raidtab, but am always getting the
following error:

md: could not bd_claim hde.
md: autostart failed!

When I try to raidstart the array. Is it the case that I had been running
in degraded mode before the disk failure, and then lost the other disk?
If so, how can I tell?

I have been messing about with mkraid -R and I have tried to
add /dev/hdf (a new disk) back to the array. However, I am fairly
confident that I have not kicked off the recovery process, so I am
imagining that once I get the superblocks in order, I should be able to
recover to the new disk?

My system and raid config are:
Kernel 2.6.13.1
Slack 10.2
RAID 5 which originally looked like:
/dev/hde
/dev/hdg
/dev/hdi
/dev/hdk

but when I moved the disks to another box with fewer IDE controllers
/dev/hde
/dev/hdf
/dev/hdg
/dev/hdh

How should I approach this?

Below is the output of mdadm --examine /dev/hd*

Thanks in advance,
Nate

/dev/hde:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 38081921:59a998f9:64c1a001:ec534ef2
  Creation Time : Fri Aug 22 16:34:37 2003
 Raid Level : raid5
Device Size : 78150656 (74.53 GiB 80.03 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

Update Time : Wed Apr 12 02:26:37 2006
  State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
   Checksum : 165c1b4c - correct
 Events : 0.37523832

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice State
this     1      33        0        1      active sync   /dev/hde

   0     0       0        0        0      removed
   1     1      33        0        1      active sync   /dev/hde
   2     2      34       64        2      active sync   /dev/hdh
   3     3      34        0        3      active sync   /dev/hdg

/dev/hdf:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 38081921:59a998f9:64c1a001:ec534ef2
  Creation Time : Fri Aug 22 16:34:37 2003
 Raid Level : raid5
Device Size : 78150656 (74.53 GiB 80.03 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

Update Time : Wed Apr 12 02:26:37 2006
  State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
   Checksum : 165c1bc5 - correct
 Events : 0.37523832

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice State
this     3      33       64       -1      sync   /dev/hdf

   0     0       0        0        0      removed
   1     1      33        0        1      active sync   /dev/hde
   2     2      34       64        2      active sync   /dev/hdh
   3     3      33       64       -1      sync   /dev/hdf
/dev/hdg:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 38081921:59a998f9:64c1a001:ec534ef2
  Creation Time : Fri Aug 22 16:34:37 2003
 Raid Level : raid5
Device Size : 78150656 (74.53 GiB 80.03 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

Update Time : Wed Apr 12 06:12:58 2006
  State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 0
   Checksum : 1898e1fd - correct
 Events : 0.37523844

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice State
this     3      34        0        3      active sync   /dev/hdg

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2      34       64        2      active sync   /dev/hdh
   3     3      34        0        3      active sync   /dev/hdg
/dev/hdh:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 38081921:59a998f9:64c1a001:ec534ef2
  Creation Time : Fri Aug 22 16:34:37 2003
 Raid Level : raid5
Device Size : 78150656 (74.53 GiB 80.03 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

Update Time : Wed Apr 12 06:12:58 2006
  State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 0
   Checksum : 1898e23b - correct
 Events : 0.37523844

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice State
this     2      34       64        2      active sync   /dev/hdh

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2      34       64        2      active sync   /dev/hdh
   3     3      34        0        3      active sync   /dev/hdg



Re: Help needed - RAID5 recovery from Power-fail - SOLVED

2006-04-05 Thread Nigel J. Terry
Thanks for all the help. I am now up and running again and have been
stable for over a day. I will now install my new drive and add it to
give me an array of three drives.

I'll also learn more about Raid, mdadm and smartd so that I am better
prepared next time.

Thanks again

Nigel
Neil Brown wrote:
 On Monday April 3, [EMAIL PROTECTED] wrote:
   
 I wonder if you could help a Raid Newbie with a problem

 I had a power fail, and now I can't access my RAID array. It has been
 working fine for months until I lost power... Being a fool, I don't have
 a full backup, so I really need to get this data back.

 I run FC4 (64bit).
 I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array
 /dev/md0 on top of which I run lvm and mount the whole lot as /home. My
 intention was always to add another disk to this array, and I purchased
 one yesterday.
 

 2 devices in a raid5??  Doesn't seem a lot of point in it being raid5
 rather than raid1.

   
 When I boot, I get:

 md0 is not clean
 Cannot start dirty degraded array
 failed to run raid set md0
 

 This tells us that the array is degraded.  A dirty degraded array can
 have undetectable data corruption.  That is why it won't start it for
 you.
 However with only two devices, data corruption from this cause isn't
 actually possible. 

 The kernel parameter
md_mod.start_dirty_degraded=1
 will bypass this message and start the array anyway.

 Alternately:
   mdadm -A --force /dev/md0 /dev/sd[ab]1

   
 # mdadm --examine /dev/sda1
 /dev/sda1:
   Magic : a92b4efc
 Version : 00.90.02
UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
   Creation Time : Thu Dec 15 15:29:36 2005
  Raid Level : raid5
Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 0

 Update Time : Tue Mar 21 06:25:52 2006
   State : active
  Active Devices : 1
 

 So at 06:25:52, there was only one working devices, while...


   
 #mdadm --examine /dev/sdb1
 /dev/sdb1:
   Magic : a92b4efc
 Version : 00.90.02
UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
   Creation Time : Thu Dec 15 15:29:36 2005
  Raid Level : raid5
Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 0

 Update Time : Tue Mar 21 06:23:57 2006
   State : active
  Active Devices : 2
 

 at 06:23:57 there were two.

 It looks like you lost a drive a while ago. Did you notice?

 Anyway, the 'mdadm' command I gave above should get the array working
 again for you.  Then you might want to
mdadm /dev/md0 -a /dev/sdb1
 if you trust /dev/sdb

 NeilBrown


   


Re: Help needed - RAID5 recovery from Power-fail

2006-04-04 Thread David Greaves
Neil Brown wrote:

On Monday April 3, [EMAIL PROTECTED] wrote:
  

I wonder if you could help a Raid Newbie with a problem


snip

It looks like you lost a drive a while ago. Did you notice?

This is not unusual - raid just keeps on going if a disk fails.
When things are working again you really should read up on mdadm -F -
it runs as a daemon and sends you mail if any raid events occur.

See if FC4 has a script that automatically runs it - you may need to
tweak some config parameters somewhere (I use Debian so I'm not much help).
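A minimal daemon invocation would be something like this (mail address and
polling interval are only examples):

   mdadm --monitor --scan --daemonise --mail=root --delay=300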

David



Re: Help needed - RAID5 recovery from Power-fail

2006-04-04 Thread Al Boldi
Neil Brown wrote:
 2 devices in a raid5??  Doesn't seem a lot of point it being raid5
 rather than raid1.

Wouldn't a 2-dev raid5 imply a striped block mirror (i.e. faster) rather than
a raid1 duplicate block mirror (i.e. slower)?

Thanks!

--
Al



Help needed - RAID5 recovery from Power-fail

2006-04-03 Thread Nigel J. Terry
I wonder if you could help a Raid Newbie with a problem

I had a power fail, and now I can't access my RAID array. It has been
working fine for months until I lost power... Being a fool, I don't have
a full backup, so I really need to get this data back.

I run FC4 (64bit).
I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array
/dev/md0 on top of which I run lvm and mount the whole lot as /home. My
intention was always to add another disk to this array, and I purchased
one yesterday.

When I boot, I get:

md0 is not clean
Cannot start dirty degraded array
failed to run raid set md0


I can provide the following extra information:

# cat /proc/mdstat
Personalities : [raid5]
unused devices: none

# mdadm --query /dev/md0
/dev/md0: is an md device which is not active

# mdadm --query /dev/md0
/dev/md0: is an md device which is not active
/dev/md0: is too small to be an md component.

# mdadm --query /dev/sda1
/dev/sda1: is not an md array
/dev/sda1: device 0 in 2 device undetected raid5 md0.  Use mdadm
--examine for more detail.

#mdadm --query /dev/sdb1
/dev/sdb1: is not an md array
/dev/sdb1: device 1 in 2 device undetected raid5 md0.  Use mdadm
--examine for more detail.

# mdadm --examine /dev/md0
mdadm: /dev/md0 is too small for md

# mdadm --examine /dev/sda1
/dev/sda1:
  Magic : a92b4efc
Version : 00.90.02
   UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
  Creation Time : Thu Dec 15 15:29:36 2005
 Raid Level : raid5
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

Update Time : Tue Mar 21 06:25:52 2006
  State : active
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
   Checksum : 2ba99f09 - correct
 Events : 0.1498318

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed

#mdadm --examine /dev/sdb1
/dev/sdb1:
  Magic : a92b4efc
Version : 00.90.02
   UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
  Creation Time : Thu Dec 15 15:29:36 2005
 Raid Level : raid5
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

Update Time : Tue Mar 21 06:23:57 2006
  State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
   Checksum : 2ba99e95 - correct
 Events : 0.1498307

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1

It looks to me like there is no hardware problem, but maybe I am wrong.
I cannot find any file /etc/mdadm.conf nor /etc/raidtab.

How would you suggest I proceed? I'm wary of doing anything (assemble,
build, create) until I am sure it won't reset everything.

Many Thanks

Nigel





Re: Help needed - RAID5 recovery from Power-fail

2006-04-03 Thread Neil Brown
On Monday April 3, [EMAIL PROTECTED] wrote:
 I wonder if you could help a Raid Newbie with a problem
 
 I had a power fail, and now I can't access my RAID array. It has been
 working fine for months until I lost power... Being a fool, I don't have
 a full backup, so I really need to get this data back.
 
 I run FC4 (64bit).
 I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array
 /dev/md0 on top of which I run lvm and mount the whole lot as /home. My
 intention was always to add another disk to this array, and I purchased
 one yesterday.

2 devices in a raid5??  Doesn't seem a lot of point in it being raid5
rather than raid1.

 
 When I boot, I get:
 
 md0 is not clean
 Cannot start dirty degraded array
 failed to run raid set md0

This tells us that the array is degraded.  A dirty degraded array can
have undetectable data corruption.  That is why it won't start it for
you.
However with only two devices, data corruption from this cause isn't
actually possible. 

The kernel parameter
   md_mod.start_dirty_degraded=1
will bypass this message and start the array anyway.

Alternately:
  mdadm -A --force /dev/md0 /dev/sd[ab]1
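(On FC4 the kernel parameter route normally means appending it to the
existing kernel line in /boot/grub/grub.conf; everything except the added
option below is a placeholder for whatever is already there:

   kernel /vmlinuz-2.6.x ro root=<existing root=...> md_mod.start_dirty_degraded=1
)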

 
 # mdadm --examine /dev/sda1
 /dev/sda1:
   Magic : a92b4efc
 Version : 00.90.02
UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
   Creation Time : Thu Dec 15 15:29:36 2005
  Raid Level : raid5
Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 0
 
 Update Time : Tue Mar 21 06:25:52 2006
   State : active
  Active Devices : 1

So at 06:25:52, there was only one working devices, while...


 
 #mdadm --examine /dev/sdb1
 /dev/sdb1:
   Magic : a92b4efc
 Version : 00.90.02
UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
   Creation Time : Thu Dec 15 15:29:36 2005
  Raid Level : raid5
Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 0
 
 Update Time : Tue Mar 21 06:23:57 2006
   State : active
  Active Devices : 2

at 06:23:57 there were two.

It looks like you lost a drive a while ago. Did you notice?

Anyway, the 'mdadm' command I gave above should get the array working
again for you.  Then you might want to
   mdadm /dev/md0 -a /dev/sdb1
if you trust /dev/sdb

NeilBrown


Re: raid5 recovery fails

2005-11-14 Thread Ross Vandegrift
On Mon, Nov 14, 2005 at 09:27:25PM +0200, Raz Ben-Jehuda(caro) wrote:
 I have made the following test with my raid5:
 1. created raid5 with 4 sata disks.
 2. waited until the raid was fully initialized.
 3. pulled a disk from the panel.
 4. shut the system.
 5. put back the disk.
 6. turned on the system.

 The raid failed to recover. I got a message from the md layer
 saying that it rejects the dirty disk.
 Anyone?

Did you re-add the disk to the array?

# mdadm --add /dev/md0 /dev/sda2

Of course, substitute your appropriate devices for the ones that I
randomly chose :-)
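Once it is re-added, the rebuild progress shows up in the usual places:

   cat /proc/mdstat
   mdadm --detail /dev/md0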


-- 
Ross Vandegrift
[EMAIL PROTECTED]

The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell.
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37