RAID5 rebuild with a bad source drive fails

2007-11-24 Thread Jim Klimov
Hello linux-raid,

  I have a home fileserver which used a 6-disk RAID5 array
with old disks and cheap IDE controllers (all disks are 
IDE masters).

  As was expected, sooner or later the old hardware (and/or
cabling) began failing. The array falls apart; currently it has
5 working disks and one marked as a spare (which was working
before).

  The rebuild does not complete, because half-way through
one of the working disks has a set of bad blocks (about
30 of them). When the rebuild process (or the mount process)
hits these blocks, I get a non-running array with 4 working 
drives, one failed and one spare.

  While I can force-assemble the failing drive back into
the array, it's not useful - rebuild fails again and again.

  Question 1: is there a superblock-edit function, or maybe 
an equivalent manual procedure, which would let me mark the 
spare drive as a working part of the array?

  It [mostly] has all the data in correct stripes; at least 
the event counters are all the same, and it may be a better
working drive than the one with bad blocks.

  Even if I succeeded in editing all the superblocks to believe
that the spare disk is okay now, would it help in my data
recovery? :)
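
  For reference, what I have been trying so far looks roughly like
the following (the device names are placeholders, not necessarily my
actual layout):

# compare superblocks: event counters and the per-device state
mdadm --examine /dev/hda1 /dev/hdb1 /dev/hdc1 /dev/hdd1 /dev/hde1 /dev/hdf1

# force-assembly including the former spare; the rebuild that follows
# is what keeps dying on the bad blocks
mdadm --assemble --force /dev/md0 /dev/hd[a-f]1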

  Question 2: the disk's hardware apparently fails to relocate
the bad blocks. Is it possible for the metadevice layer to 
do the same - remap and/or ignore the bad blocks? 

  In particular, is it possible for linux md to consider a
block of data as a failed quantum, not the whole partition
or disk, and try to use all 6 drives I have to deliver the
usable data (at least in some sort of recovery mode)?
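
  The only device-level workaround I can think of, outside of md, is
forcing the drive itself to reallocate a sector by overwriting it -
losing whatever data was in that sector. A sketch, where SECTOR and
the device name are made up:

# overwrite one 512-byte sector so the drive firmware can remap it
dd if=/dev/zero of=/dev/hdc bs=512 seek=SECTOR count=1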

-- 
Best regards,
 Jim Klimov  mailto:[EMAIL PROTECTED]



raid5 reshape/resync

2007-11-24 Thread Nagilum

Hi,
I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] - md0).
During that reshape (at around 4%) /dev/sdd reported read errors and
went offline.
I replaced /dev/sdd with a new drive and tried to reassemble the array  
(/dev/sdd was shown as removed and now as spare).

Assembly worked, but the array would not run unless I used --force.
Since I'm always reluctant to use force, I put the bad disk back in,
this time as /dev/sdg. I re-added the drive and could run the array.
The array started to resync (since the disk can be read up to the 4%
mark) and then I marked the disk as failed. Now the array is active,
degraded, recovering:


nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
Version : 00.91.03
  Creation Time : Sat Sep 15 21:11:41 2007
 Raid Level : raid5
 Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
  Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Sat Nov 24 10:10:46 2007
  State : active, degraded, recovering
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1

 Layout : left-symmetric
 Chunk Size : 16K

 Reshape Status : 19% complete
  Delta Devices : 1, (5-6)

   UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
 Events : 0.726347

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       6       8       96        3      faulty spare rebuilding   /dev/sdg
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf

       7       8       48        -      spare   /dev/sdd

iostat:
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             129.48      1498.01      1201.59       7520       6032
sdb             134.86      1498.01      1201.59       7520       6032
sdc             127.69      1498.01      1201.59       7520       6032
sdd               0.40         0.00         3.19          0         16
sde             111.55      1498.01      1201.59       7520       6032
sdf             117.73         0.00      1201.59          0       6032
sdg               0.00         0.00         0.00          0          0
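
For completeness, the sequence of commands leading up to this state
was approximately the following (reconstructed after the fact, so
please treat it as a sketch rather than a verbatim log):

mdadm /dev/md0 --add /dev/sdf           # new disk for the grow
mdadm --grow /dev/md0 --raid-devices=6  # reshape 5 -> 6; sdd failed at ~4%
mdadm --assemble /dev/md0               # after swapping in the new sdd; would only run with --force
mdadm /dev/md0 --add /dev/sdd           # the replacement drive, listed as spare
mdadm /dev/md0 --re-add /dev/sdg        # the old, failing drive put back in
mdadm /dev/md0 --fail /dev/sdg          # marked failed once the resync was under way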

What I find somewhat confusing/disturbing is that md does not appear
to utilize /dev/sdd. What I see here could be explained by md doing a
RAID5 resync from the four drives sd[a-c,e] to sd[a-c,e,f], but I
would have expected it to use the new spare sdd for that. Also, the
speed is unusually low, which seems to indicate a lot of seeking, as
if two operations were happening at the same time.
Looking at the data rates, it also seems as if the reshape is
continuing even though one drive is missing (possible, but risky).

Can someone relieve my doubts as to whether md is doing the right thing here?
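
For what it's worth, this is what I am watching in the meantime
(sysfs attribute names as documented in Documentation/md.txt, so they
may differ between kernel versions):

cat /proc/mdstat                    # progress and which operation md reports
cat /sys/block/md0/md/sync_action   # should read reshape, resync or recover
cat /sys/block/md0/md/sync_speed_min /sys/block/md0/md/sync_speed_max
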
Thanks,


--
http://www.nagilum.org/ - icq://69646724 - [EMAIL PROTECTED] - +491776461165
Amiga (68k/PPC): AOS/NetBSD/Linux - Mac (PPC): MacOS-X/NetBSD/Linux
x86: FreeBSD/Linux/Solaris/Win2k - ARM9: EPOC EV6

cakebox.homeunix.net - all the machine one needs..





Re: md RAID 10 on Linux 2.6.20?

2007-11-24 Thread Peter Grandi
> On Thu, 22 Nov 2007 22:09:27 -0500, [EMAIL PROTECTED] said:

> [ ... ] a RAID 10 personality defined in md that can be
> implemented using mdadm. If so, is it available in 2.6.20.11,
> [ ... ]

'raid10' is a very good choice in general. For a single layer
just use '-l raid10'. Run 'man mdadm' and see the '-l' option, and
the '-p' option for the more exotic variants; see also the RAID10
section of 'man 4 md'.

The pairs are formed naturally out of the block device list
(first with second listed, and so on).
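
For example, a minimal 8-drive single-layer array would be created
along these lines (device names are made up, and '--layout=n2' just
spells out the default near-2 layout):

# 8 devices, 2 near copies; adjacent devices in the list mirror each other
mdadm --create /dev/md0 --level=raid10 --raid-devices=8 \
  --layout=n2 /dev/sd[b-i]1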

> 8 drive RAID 10 would actually consist of 5 md devices (four
> RAID 1's and one RAID 0). [ ... ] one RAID 10, that would of
> course be better both in terms of management and probably
> performance I would guess. [ ... ]

Indeed easier in terms of management and there are some
interesting options for layout. Not sure about performance, as
sometimes I have seen strange interactions with the page cache
either way, but usually '-l raid10' is the way to go as you say.
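
For comparison, the nested construction quoted above would be built
from five md devices, roughly like this (again with made-up device
names):

# four RAID1 pairs ...
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdd1 /dev/sde1
mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdf1 /dev/sdg1
mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdh1 /dev/sdi1
# ... striped together by a RAID0 on top
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/md1 /dev/md2 /dev/md3 /dev/md4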


Re: HELP! New disks being dropped from RAID 6 array on every reboot

2007-11-24 Thread Bill Davidsen

Joshua Johnson wrote:

Greetings, long time listener, first time caller.

I recently replaced a disk in my existing 8 disk RAID 6 array.
Previously, all disks were PATA drives connected to the motherboard
IDE and three Promise Ultra 100/133 controllers.  I replaced one of the
Promise controllers with a VIA 64xx based controller, which has two SATA
ports and one PATA port.  I connected a new SATA drive to the new
card, partitioned the drive and added it to the array.  After 5 or 6
hours the resyncing process finished and the array showed up complete.
Upon rebooting I discovered that the new drive had not been added to
the array when it was assembled on boot.  I resynced it and tried
again -- still it would not persist after a reboot.  I moved one of the
existing PATA drives to the new controller (so I could have the slot
free for the network card), rebooted and rebuilt the array.  Now when I
reboot BOTH
disks are missing from the array (sda and sdb).  Upon examining the
disks it appears they think they are part of the array, but for some
reason they are not being added when the array is being assembled.
For example, this is a disk on the new controller which was not added
to the array after rebooting:

# mdadm --examine /dev/sda1
/dev/sda1:
  Magic : a92b4efc
Version : 00.90.03
   UUID : 63ee7d14:a0ac6a6e:aef6fe14:50e047a5
  Creation Time : Thu Sep 21 23:52:19 2006
 Raid Level : raid6
Device Size : 191157248 (182.30 GiB 195.75 GB)
 Array Size : 1146943488 (1093.81 GiB 1174.47 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

Update Time : Fri Nov 23 10:22:57 2007
  State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
   Checksum : 50df590e - correct
 Events : 0.96419878

 Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     6       8        1        6      active sync   /dev/sda1

   0     0       3        2        0      active sync   /dev/hda2
   1     1      57        2        1      active sync   /dev/hdk2
   2     2      33        2        2      active sync   /dev/hde2
   3     3      34        2        3      active sync   /dev/hdg2
   4     4      22        2        4      active sync   /dev/hdc2
   5     5      56        2        5      active sync   /dev/hdi2
   6     6       8        1        6      active sync   /dev/sda1
   7     7       8       17        7      active sync   /dev/sdb1


Everything there seems to be correct and current up to the last
shutdown.  But the disk is not being added on boot.  Examining a disk
that is currently running in the array shows:

# mdadm --examine /dev/hdc2
/dev/hdc2:
  Magic : a92b4efc
Version : 00.90.03
   UUID : 63ee7d14:a0ac6a6e:aef6fe14:50e047a5
  Creation Time : Thu Sep 21 23:52:19 2006
 Raid Level : raid6
Device Size : 191157248 (182.30 GiB 195.75 GB)
 Array Size : 1146943488 (1093.81 GiB 1174.47 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 0

Update Time : Fri Nov 23 10:23:52 2007
  State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 0
   Checksum : 50df5934 - correct
 Events : 0.96419880

 Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     4      22        2        4      active sync   /dev/hdc2

   0     0       3        2        0      active sync   /dev/hda2
   1     1      57        2        1      active sync   /dev/hdk2
   2     2      33        2        2      active sync   /dev/hde2
   3     3      34        2        3      active sync   /dev/hdg2
   4     4      22        2        4      active sync   /dev/hdc2
   5     5      56        2        5      active sync   /dev/hdi2
   6     6       0        0        6      faulty removed
   7     7       0        0        7      faulty removed


Here is my /etc/mdadm/mdadm.conf:

DEVICE partitions
PROGRAM /bin/echo
MAILADDR redacted
ARRAY /dev/md0 level=raid6 num-devices=8 UUID=63ee7d14:a0ac6a6e:aef6fe14:50e047a5


Can anyone see anything that is glaringly wrong here?  Has anybody
experienced similar behavior?  I am running Debian using kernel
2.6.23.8.  All partitions are set to type 0xFD, and it appears the
superblocks on the sd* disks were written, so why wouldn't the disks be
added to the array on boot?  Any help is greatly appreciated!


Does that match what's in the init files used at boot? By any chance 
does the information there explicitly list partitions by name? If you 
change to PARTITIONS in /etc/mdadm.conf it won't bite you until you 
change the detected partitions so they no longer match what was correct 
at install time.
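
If this is Debian with initramfs-tools, something along these lines 
should show whether the initramfs carries a stale copy of mdadm.conf 
(adjust the image name to your kernel):

zcat /boot/initrd.img-2.6.23.8 | cpio -t 2>/dev/null | grep mdadm
update-initramfs -u    # regenerate after fixing the real /etc/mdadm/mdadm.conf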


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: HELP! New disks being dropped from RAID 6 array on every reboot

2007-11-24 Thread Joshua Johnson
On Nov 24, 2007 12:20 PM, Bill Davidsen [EMAIL PROTECTED] wrote:

> Does that match what's in the init files used at boot? By any chance
> does the information there explicitly list partitions by name? If you
> change to PARTITIONS in /etc/mdadm.conf it won't bite you until you
> change the detected partitions so they no longer match what was correct
> at install time.

According to the man page, using 'partitions' as your DEVICE should
cause mdadm to read /proc/partitions and scan all partitions listed
there.  The sda*/sdb* partitions were in /proc/partitions (at least
after the machine fully booted) but for some reason when mdadm
assembled the array it was not adding those partitions.  Changing the
DEVICE to '/dev/hd* /dev/sd*' rather than 'partitions' resolved the
issue.
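
In other words, the relevant line in /etc/mdadm/mdadm.conf went from

DEVICE partitions

to

DEVICE /dev/hd* /dev/sd*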