On Sat, Apr 17, 2010 at 3:01 PM, Neil Bothwick <n...@digimed.co.uk> wrote:
> On Sat, 17 Apr 2010 14:36:39 -0700, Mark Knecht wrote:
>
>> Empirically, anyway, there doesn't seem to be a problem. I built the
>> new kernel and it booted normally, so I think I'm either
>> misinterpreting what was written in the Wiki or the Wiki is wrong.
>
> As long as /boot is not on RAID, or is on RAID1, you don't need an
> initrd. I've been booting this system for years with / on RAID1 and
> everything else on RAID5.
>
>
> --
> Neil Bothwick

Neil,
   Completely agreed, and in fact that's the way I built my new system.
/boot is just a plain partition, and / is a RAID1 across three
partitions marked with the 0xfd partition type, using metadata=0.90
and assembled by the kernel. I'm using WD RAID Edition drives and an
Asus Rampage II Extreme motherboard.
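
   For reference, the root array would have been created with roughly
the command below - I'm reconstructing it from memory rather than
pasting my shell history, but the key point is --metadata=0.90 so the
kernel autodetect can handle it:

  mdadm --create /dev/md3 --level=1 --raid-devices=3 \
        --metadata=0.90 /dev/sda3 /dev/sdb3 /dev/sdc3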

   It works; however, I keep running into the sort of thing I hit
this morning when booting - both md5 and md6 had problems. Random
partitions get dropped out of the arrays. It's never the same ones,
and sometimes it's only one partition out of three on the same drive -
this morning sdc5 and sdc6 weren't found until I rebooted, but sda3,
sdb3 & sdc3 were. Flaky hardware? What? The motherboard? The drives?

   I've noticed that entering the BIOS setup screens before allowing
grub to take over seems to eliminate the problem. Timing?
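
   If it is timing - the drives not being ready by the time the
kernel's RAID autodetect runs - one experiment would be adding
rootdelay= to the kernel line in grub.conf so the kernel waits a few
seconds first, something like this (the kernel image name is just a
placeholder):

  title Gentoo Linux
  root (hd0,0)
  kernel /boot/bzImage root=/dev/md3 rootdelay=10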

m...@c2stable ~ $ cat /proc/mdstat
Personalities : [raid0] [raid1]
md6 : active raid1 sda6[0] sdb6[1]
      247416933 blocks super 1.1 [3/2] [UU_]

md11 : active raid0 sdd1[0] sde1[1]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdc3[2] sdb3[1] sda3[0]
      52436096 blocks [3/3] [UUU]

md5 : active raid1 sdb5[1] sda5[0]
      52436032 blocks [3/2] [UU_]

unused devices: <none>
m...@c2stable ~ $

   For clarity, md3 is the only one needed to boot the system. The
other three RAIDs aren't required until I start running apps. However,
they are all being assembled by the kernel at boot time, and I would
prefer not to do that, or at least learn how not to do it.
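
   If I do get the kernel out of the picture for md5, md6 and md11, I
assume the userspace way is to describe them in /etc/mdadm.conf and
let mdadm assemble them later - I believe the mdadm ebuild ships an
mdraid init script that essentially runs 'mdadm --assemble --scan'.
Something like the sketch below, where the real ARRAY lines would come
from 'mdadm --examine --scan' (the UUIDs here are just placeholders):

  # /etc/mdadm.conf (sketch)
  DEVICE /dev/sd*[0-9]
  ARRAY /dev/md5  metadata=1.1 UUID=<uuid-of-md5>
  ARRAY /dev/md6  metadata=1.1 UUID=<uuid-of-md6>
  ARRAY /dev/md11 metadata=1.1 UUID=<uuid-of-md11>

and then either assemble by hand or add the init script to the boot
runlevel:

  mdadm --assemble --scan
  rc-update add mdraid boot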

   Now, as to why they are being assembled: I suspect it's because I
marked them all with partition type 0xfd, which was possibly not the
best thing to have done. The kernel won't bother with non-0xfd
partitions, and mdadm could then have assembled them later:

c2stable ~ # fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x8b45be24

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1           7       56196   83  Linux
/dev/sda2               8         530     4200997+  82  Linux swap / Solaris
/dev/sda3             536        7063    52436160   fd  Linux raid autodetect
/dev/sda4            7064       60801   431650485    5  Extended
/dev/sda5            7064       13591    52436128+  fd  Linux raid autodetect
/dev/sda6           30000       60801   247417065   fd  Linux raid autodetect
c2stable ~ #

However, the Gentoo Wiki says we are supposed to mark everything 0xfd:

http://en.gentoo-wiki.com/wiki/RAID/Software#Setup_Partitions

I'm not sure whether that's good advice for RAIDs that could be
assembled later, but it's what I did, and it leads to the kernel
trying to do everything before the system is fully up and mdadm is
really running.
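
   If I decide the extra arrays shouldn't be touched at boot, I assume
switching their partition type back to plain Linux (0x83) is enough to
make the kernel autodetect skip them - the md superblock on the
partition isn't affected, so mdadm can still assemble them afterwards.
Roughly, for each member partition:

  fdisk /dev/sda
  Command (m for help): t
  Partition number (1-6): 5
  Hex code (type L to list codes): 83
  Command (m for help): w

and the same for partition 6 and for sdb/sdc.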

   Anyway, when the failures happen I can step through and fail,
remove and add the partition back into the array. (In this case fail
and remove aren't necessary.)

c2stable ~ # mdadm /dev/md5 -f /dev/sdc5
mdadm: set device faulty failed for /dev/sdc5:  No such device
c2stable ~ # mdadm /dev/md5 -r /dev/sdc5
mdadm: hot remove failed for /dev/sdc5: No such device or address
c2stable ~ # mdadm /dev/md5 -a /dev/sdc5
mdadm: re-added /dev/sdc5
c2stable ~ # mdadm /dev/md6 -a /dev/sdc6
mdadm: re-added /dev/sdc6
c2stable ~ #

At this point md5 is repaired and I'm waiting for md6 to finish rebuilding:

c2stable ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md6 : active raid1 sdc6[2] sda6[0] sdb6[1]
      247416933 blocks super 1.1 [3/2] [UU_]
      [====>................]  recovery = 22.0% (54525440/247416933)
finish=38.1min speed=84230K/sec

md11 : active raid0 sdd1[0] sde1[1]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdc3[2] sdb3[1] sda3[0]
      52436096 blocks [3/3] [UUU]

md5 : active raid1 sdc5[2] sdb5[1] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
c2stable ~ #
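
   Rather than re-running cat by hand while it rebuilds, something
like this keeps an eye on it (watch comes with procps):

  watch -n 30 cat /proc/mdstat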

   How do I get past this? It's happening 2-3 times a week, and this
morning's exercise will have taken an hour before I can start using
the machine. I figure that if the kernel doesn't auto-assemble the
RAIDs I don't need at boot, then I can check that all the partitions
are ready to go before starting them up myself.
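
   What I have in mind is something along these lines - just a sketch
using the device names from this box, run by hand or from a local
init script before anything mounts md5:

  #!/bin/sh
  # Don't try to assemble md5 until all three member partitions exist.
  for p in /dev/sda5 /dev/sdb5 /dev/sdc5; do
      if [ ! -b "$p" ]; then
          echo "$p not present yet - not assembling md5" >&2
          exit 1
      fi
  done
  mdadm --assemble /dev/md5 /dev/sda5 /dev/sdb5 /dev/sdc5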

- Mark
