Hey guys,
A couple of questions.
1. Is there any way to identify from mdstat which disk is the hot spare
and which are actually being used?
2. (longer) Is there any way to add a replacement disk to a raid-5
without taking the raid offline?
My setup: 4 SCA disks, 3 in a raid-5, 1 as a hot spare. Here's my
raidtab:
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           3
    nr-spare-disks          1
    persistent-superblock   1
    parity-algorithm        left-symmetric
    chunk-size              128
    device                  /dev/sdd1
    raid-disk               0
    device                  /dev/sde1
    raid-disk               1
    device                  /dev/sdf1
    raid-disk               2
    device                  /dev/sdg1
    spare-disk              0
I pull /dev/sdd from the machine and it starts reconstructing on the
spare:
Personalities : [linear] [raid0] [raid1] [raid5] [translucent]
read_ahead 1024 sectors
md0 : active raid5 sdg1[4] sdf1[2] sde1[1] sdd1[0](F) 17863936 blocks level 5, 128k chunk, algorithm 2 [3/2] [_UU] recovery=3% finish=20.8min
unused devices: <none>
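
For question 1, here is a rough Python sketch of how the md0 line could be picked apart. It only extracts what is visibly in the line: each member's name, its bracketed number, and whether it carries the (F) flag. What the bracketed number means, and whether it says anything about spare status, is the open question here, so the script deliberately does not interpret it. The helper name `parse_md_line` is made up for illustration.

```python
import re

# Matches one member entry such as "sdg1[4]" or "sdd1[0](F)".
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](\(F\))?")


def parse_md_line(line):
    """Split an mdstat array line into (name, number, failed) tuples.

    Hypothetical helper: it reports what the line shows and does not
    guess at the meaning of the bracketed number.
    """
    members = []
    for name, number, flag in MEMBER_RE.findall(line):
        members.append((name, int(number), bool(flag)))
    return members


# The md0 line from the mdstat output above, joined onto one line.
line = ("md0 : active raid5 sdg1[4] sdf1[2] sde1[1] sdd1[0](F) "
        "17863936 blocks level 5, 128k chunk, algorithm 2 [3/2] [_UU]")

for name, number, failed in parse_md_line(line):
    print(name, number, "failed" if failed else "ok")
```

On the line above this flags only sdd1 as failed; it cannot by itself say which of the remaining three is the spare.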
Once that finished, I tried to add /dev/sdd back (simulating the
replacement of a failed disk). Nothing happened at this point, so I
tried to add it back with raidhotadd:
% root@sybil:/usr/src/raidtools-0.90
14:27 $ raidhotadd /dev/md0 /dev/sdd1
/dev/md0: can not hot-add disk: disk busy!
and in syslog I get this:
Sep 8 14:27:36 sybil kernel: trying to hot-add sdd1 to md0 ...
So I figure I'll take the raid offline and start fresh:
% root@sybil:/usr/src/raidtools-0.90
14:49 $ raidhotadd /dev/md0 /dev/sdd1
/dev/md0: can not hot-add disk: disk busy!
% root@sybil:/usr/src/raidtools-0.90
14:52 $ umount /raid
% root@sybil:/usr/src/raidtools-0.90
14:52 $ raidstop /dev/md0
then from syslog:
Sep 8 14:52:33 sybil kernel: interrupting MD-thread pid 122
Sep 8 14:52:33 sybil kernel: raid5d(122) flushing signals.
Sep 8 14:52:33 sybil kernel: marking sb clean...
Sep 8 14:52:33 sybil kernel: md: updating md0 RAID superblock on device
Sep 8 14:52:33 sybil kernel: sdg1 [events: 00000037](write) sdg1's sb offset: 8932032
Sep 8 14:52:33 sybil kernel: sdf1 [events: 00000037](write) sdf1's sb offset: 8964160
Sep 8 14:52:33 sybil kernel: sde1 [events: 00000037](write) sde1's sb offset: 8932032
Sep 8 14:52:33 sybil kernel: (skipping faulty sdd1 )
Sep 8 14:52:33 sybil kernel: .
Sep 8 14:52:33 sybil kernel: unbind<sdg1,3>
Sep 8 14:52:33 sybil kernel: export_rdev(sdg1)
Sep 8 14:52:33 sybil kernel: unbind<sdf1,2>
Sep 8 14:52:33 sybil kernel: export_rdev(sdf1)
Sep 8 14:52:33 sybil kernel: unbind<sde1,1>
Sep 8 14:52:33 sybil kernel: export_rdev(sde1)
Sep 8 14:52:33 sybil kernel: unbind<sdd1,0>
Sep 8 14:52:33 sybil kernel: export_rdev(sdd1)
Sep 8 14:52:33 sybil kernel: md0 stopped.
% root@sybil:/usr/src/raidtools-0.90
14:52 $ mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [translucent]
read_ahead 1024 sectors
unused devices: <none>
then
% root@sybil:/usr/src/raidtools-0.90
14:53 $ raidstart /dev/md0
and then from syslog:
Sep 8 14:53:17 sybil kernel: (read) sdd1's sb offset: 8932032 [events: 00000033]
Sep 8 14:53:17 sybil kernel: (read) sde1's sb offset: 8932032 [events: 00000037]
Sep 8 14:53:17 sybil kernel: (read) sdf1's sb offset: 8964160 [events: 00000037]
Sep 8 14:53:17 sybil kernel: (read) sdg1's sb offset: 8932032 [events: 00000037]
Sep 8 14:53:17 sybil kernel: autorun ...
Sep 8 14:53:17 sybil kernel: considering sdg1 ...
Sep 8 14:53:17 sybil kernel: adding sdg1 ...
Sep 8 14:53:17 sybil kernel: adding sdf1 ...
Sep 8 14:53:17 sybil kernel: adding sde1 ...
Sep 8 14:53:17 sybil kernel: adding sdd1 ...
Sep 8 14:53:17 sybil kernel: created md0
Sep 8 14:53:17 sybil kernel: bind<sdd1,1>
Sep 8 14:53:17 sybil kernel: bind<sde1,2>
Sep 8 14:53:17 sybil kernel: bind<sdf1,3>
Sep 8 14:53:17 sybil kernel: bind<sdg1,4>
Sep 8 14:53:17 sybil kernel: running: <sdg1><sdf1><sde1><sdd1>
Sep 8 14:53:17 sybil kernel: now!
Sep 8 14:53:17 sybil kernel: sdg1's event counter: 00000037
Sep 8 14:53:17 sybil kernel: sdf1's event counter: 00000037
Sep 8 14:53:17 sybil kernel: sde1's event counter: 00000037
Sep 8 14:53:17 sybil kernel: sdd1's event counter: 00000033
Sep 8 14:53:17 sybil kernel: freshest: sdg1
Sep 8 14:53:17 sybil kernel: md: kicking non-fresh sdd1 from array!
Sep 8 14:53:17 sybil kernel: unbind<sdd1,3>
Sep 8 14:53:17 sybil kernel: export_rdev(sdd1)
Sep 8 14:53:17 sybil kernel: md0: removing former faulty sdd1!
Sep 8 14:53:17 sybil kernel: md0: max total readahead window set to 1024k
Sep 8 14:53:17 sybil kernel: md0: 2 data-disks, max readahead per data-disk: 512k
Sep 8 14:53:17 sybil kernel: raid5: device sdg1 operational as raid disk 0
Sep 8 14:53:17 sybil kernel: raid5: device sdf1 operational as raid disk 2
Sep 8 14:53:17 sybil kernel: raid5: device sde1 operational as raid disk 1
Sep 8 14:53:17 sybil kernel: raid5: allocated 3197kB for md0
Sep 8 14:53:17 sybil kernel: raid5: raid level 5 set md0 active with 3 out of 3 devices, algorithm 2
Sep 8 14:53:17 sybil kernel: RAID5 conf printout:
Sep 8 14:53:17 sybil kernel: --- rd:3 wd:3 fd:0
Sep 8 14:53:17 sybil kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdg1
Sep 8 14:53:17 sybil kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sde1
Sep 8 14:53:17 sybil kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdf1
Sep 8 14:53:17 sybil kernel: disk 3, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 4, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 5, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: RAID5 conf printout:
Sep 8 14:53:17 sybil kernel: --- rd:3 wd:3 fd:0
Sep 8 14:53:17 sybil kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdg1
Sep 8 14:53:17 sybil kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sde1
Sep 8 14:53:17 sybil kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdf1
Sep 8 14:53:17 sybil kernel: disk 3, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 4, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 5, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Sep 8 14:53:17 sybil kernel: md: updating md0 RAID superblock on device
Sep 8 14:53:17 sybil kernel: sdg1 [events: 00000038](write) sdg1's sb offset: 8932032
Sep 8 14:53:17 sybil kernel: sdf1 [events: 00000038](write) sdf1's sb offset: 8964160
Sep 8 14:53:17 sybil kernel: sde1 [events: 00000038](write) sde1's sb offset: 8932032
Sep 8 14:53:17 sybil kernel: .
Sep 8 14:53:17 sybil kernel: ... autorun DONE.
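
The log above shows sdd1 being kicked as "non-fresh" because its event counter (00000033) is behind the others (00000037). A small Python sketch of that comparison, pulling the counters out of the syslog lines, might look like the following. It assumes the counters are hexadecimal, which the log does not state, and `stale_devices` is a made-up helper name; this mirrors what the log reports and is not the kernel's actual code.

```python
import re

# Matches lines like "... kernel: sdd1's event counter: 00000033".
COUNTER_RE = re.compile(r"kernel: (\w+)'s event counter: ([0-9a-fA-F]+)")


def stale_devices(log_lines):
    """Return devices whose event counter is behind the highest one seen.

    Assumption: the counter is printed in hexadecimal.
    """
    counters = {}
    for log_line in log_lines:
        match = COUNTER_RE.search(log_line)
        if match:
            counters[match.group(1)] = int(match.group(2), 16)
    if not counters:
        return []
    freshest = max(counters.values())
    return sorted(dev for dev, count in counters.items() if count < freshest)


# The four event-counter lines copied from the syslog output above.
sample_log = [
    "Sep 8 14:53:17 sybil kernel: sdg1's event counter: 00000037",
    "Sep 8 14:53:17 sybil kernel: sdf1's event counter: 00000037",
    "Sep 8 14:53:17 sybil kernel: sde1's event counter: 00000037",
    "Sep 8 14:53:17 sybil kernel: sdd1's event counter: 00000033",
]

print(stale_devices(sample_log))
```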
From this I would guess that the array is running normally, but how
can I tell which devices are actually part of the raid and which one
is the hot spare now?
Is there any way to tell from the mdstat output? And is it possible
to replace the bad disk and bring the new one into the raid without
ever having to bring it offline?
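
Until there is an answer for mdstat itself, one workaround is to read the slot assignments out of the "operational as raid disk" lines in syslog and compare them with the devices listed in raidtab. The Python sketch below does that for the log above. `active_slots` is a made-up helper name, and treating a raidtab device that is missing from those lines as "not currently active" is an inference, not something the kernel output states.

```python
import re

# Matches lines like "... raid5: device sdg1 operational as raid disk 0".
OPERATIONAL_RE = re.compile(
    r"raid5: device (\w+) operational as raid disk (\d+)"
)


def active_slots(log_lines):
    """Map raid disk number -> device name, as reported in the log."""
    slots = {}
    for log_line in log_lines:
        match = OPERATIONAL_RE.search(log_line)
        if match:
            slots[int(match.group(2))] = match.group(1)
    return slots


# The three "operational" lines copied from the raidstart log above.
log = [
    "Sep 8 14:53:17 sybil kernel: raid5: device sdg1 operational as raid disk 0",
    "Sep 8 14:53:17 sybil kernel: raid5: device sdf1 operational as raid disk 2",
    "Sep 8 14:53:17 sybil kernel: raid5: device sde1 operational as raid disk 1",
]

# Device names taken from the raidtab at the top of this message.
configured = ["sdd1", "sde1", "sdf1", "sdg1"]

slots = active_slots(log)
not_active = [dev for dev in configured if dev not in slots.values()]

for number in sorted(slots):
    print("raid disk", number, "->", slots[number])
print("in raidtab but not operational:", not_active)
```

On this log, the former spare sdg1 now sits in raid disk 0 and sdd1 is the only configured device not in use.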
Any insights would (again) be greatly appreciated!!!
Jacob Martinson