At 17:07 06.04.00, you wrote:
>In this foray into software raid I created two new devices. /dev/md0 was
>created to hold / and /dev/md1 is for /boot. I created both with the
>"failed-disk" option with one immediately following the other. Then I
>copied my whole system over to the new devices. I left it in failed mode,
>fixed /etc/fstab and /etc/lilo.conf, ran lilo (worked fine this time with
>a failed disk in the array), and rebooted.
The "mkraid in failed mode" bug and the "raid handling of remove/add/fail
disk" problem cancel each other out in this state.
mkraid in failed mode sets the number of disks one too high; the normal raid
code in degraded mode sets it one too low, so creating the array with
failed-disk and running it degraded gives the correct number of disks. Only
when you hotadd the original disk do you get a bad count: for a two-disk
raid-1, mkraid writes 3, degraded operation brings it back to 2, and the
hotadd pushes it up to 3 again.
>When the system came up on the
>raid devices properly I fixed up the partitions on the "failed" drive and
>did raidhotadd for both, it resync and I left it running. It was under
>this "step" that a day later I tried to run lilo (now with both drives in
>normal raid functioning) and it failed. So I guess perhaps that is the
>source of the problem unless I'm misunderstanding you? It didn't occur to
>me that anything would be different before and after raidhotadding
>(especially after messages on this list saying it worked fine).
It does work fine. The only place you'll ever see the error is with
external programs trying to work on the underlying physical devices of a
raid array; the only known program where the problem surfaces is lilo.
>So with the fix for A) I would have to end up backing up my data and
>recreating the raid-1 array anyway? I'm not opposed to doing this (in
>fact, I was prepared for it if the "failed-disk" method didn't work.)
I'd just save the contents of /boot somewhere, rebuild that device
(without the failed-disk stuff), then restore the copy; no patches are
needed for this.
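Something like this (a sketch; device names are assumed from your setup,
with /dev/md1 = /boot - adjust to your layout):
---------------------------------
cp -a /boot /tmp/boot-save        # save the contents of /boot
umount /boot
raidstop /dev/md1                 # stop the array
mkraid /dev/md1                   # recreate with a normal raidtab (no failed-disk)
mke2fs /dev/md1                   # fresh filesystem on the rebuilt array
mount /dev/md1 /boot
cp -a /tmp/boot-save/. /boot      # restore the copy
---------------------------------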
>If you think this will solve my problem could I get the patch(es) from you
>and try it?
You can have a look at the state of your md devices using mkraid /dev/md0
--debug. If you look at the logs, you'll see that in the raid superblock,
ND (Number of disks in the array) is 3 instead of 2 for your arrays. Lilo
tries to get the physical data for each of the disks and dies a horrible
death when it tries to access the (non-existing) 3rd disk.
You could use the patch from
ftp://ftp.sime.com/pub/linux/raidtools-19990824-0.90.mabene.gz to patch
your raidtools.
You could put your array(s) in the normal state (correct count of disks) by
redoing the raid creation stuff after installing the fixed mkraid
executable. Just mount the original /dev/hdaxx partitions again, stop the
raid arrays and recreate them in failed mode, copy, raidhotadd... same
procedure as before.
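In commands, roughly (again a sketch - substitute your real partition
names for /dev/hda1 etc.):
---------------------------------
mount /dev/hda1 /mnt/orig         # mount the original non-raid partition
raidstop /dev/md0                 # stop the array
mkraid /dev/md0                   # recreate in failed mode with the fixed mkraid
mke2fs /dev/md0
mount /dev/md0 /mnt/new
cp -a /mnt/orig/. /mnt/new        # copy the data over
raidhotadd /dev/md0 /dev/hda1     # re-add the original disk; resync starts
---------------------------------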
>So would the "last disk" be /dev/md3 because it's the last md device or
>/dev/md(0|1) since it has the last physical disk?
"Last disk" refers to the last physical disk in each raid array.
Final point: I'm not too sure about your lilo.conf file - I don't have any
of the
  bios=0x81
stuff in my config file; here's my config file for a working /boot on raid1
lilo setup:
---------------------------------
prompt
timeout = 50
vga = normal
boot=/dev/md4
# End LILO global section
# Linux bootable partition config begins
image = /boot/vmlinuz
root = /dev/md0
label = Linux
read-only
---------------------------------
Check that the partition you put under boot= is the one containing the
/boot filesystem, NOT your / filesystem. Works nicely for me; lilo cycles
over the physical disks and makes each of them bootable.
Bye, Martin
"you have moved your mouse, please reboot to make this change take effect"
--------------------------------------------------
Martin Bene vox: +43-316-813824
simon media fax: +43-316-813824-6
Andreas-Hofer-Platz 9 e-mail: [EMAIL PROTECTED]
8010 Graz, Austria
--------------------------------------------------
finger [EMAIL PROTECTED] for PGP public key