RE: Root RAID and unmounting /boot

Bruno Prior Wed, 27 Oct 1999 12:38:58 -0700
> > Wouldn't it be easier to stick the kernel, lilo config, relevant boot info
> > on a floppy and boot raid1 systems from that?
> > perhaps i'm missing something..more likely that not :-)
>
> Because LILO doesn't understand RAID, and thus cannot (yet) load the kernel
> of it.

Actually, I think Tony has a point. He isn't suggesting booting from RAID-1. He
is suggesting booting from a floppy to a root RAID-1 system. This would be one
way of doing it. However, it still doesn't give you any redundancy if you have a
problem with the floppy or the floppy drive. The best solution is as Jakob says:

> > The thing I TOTALLY don't get is if the first drive dies, how can
> > you boot the 2nd drive (mirror) when you're still losing the small
> > bootable partition (as it's still part of drive 1). Of course you can put
> > this bootable partition on a seperate drive, but you still dont have the
> > redundancy because THAT drive can die.
>
> True.  That's why you put identical /boot partitions on all the drives
> in your array if you want to be really safe.

Just to flesh this out a bit, let's take a simple scenario where you have set up
root RAID-1 across 2 SCSI disks: /dev/sda and /dev/sdb, with the following
(simple) scheme:

/dev/sda1   /dev/sdb1   swap
/dev/sda3   /dev/sdb3   /boot[b]
/dev/sda4   /dev/sdb4   /

/dev/sda3 is mounted on /boot. The use of /dev/sdb3 is shown below.

Let's imagine a scenario where /dev/sda fails. The root filesystem will remain
available, and if we have swap on RAID or are not using swap much, the system
may continue to run. However, it is easy to imagine circumstances in which the
failure of /dev/sda causes the system to crash or need rebooting. But with the
system as it currently stands there is a major problem: all the files necessary
for booting are on /dev/sda3, and this is no longer available. As things stand,
we will be unable to boot the system if /dev/sda fails.
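
Since the system may carry on running after a disk fails, it helps to notice
the failure before the next reboot forces the issue. The following is only a
sketch: degraded_arrays is a made-up helper name, and it assumes that
/proc/mdstat marks a missing member with "_" in a status field such as [U_],
which you should check against the output of your own kernel.

```shell
# Print the name of each md array whose status field shows a missing member.
# $1 is the file to read; it defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+/       { name = $1 }                    # remember which array we are in
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }   # "_" marks a failed or absent disk
    ' "${1:-/proc/mdstat}"
}
```

Run with no arguments, it prints nothing while all arrays are complete.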

To get round this problem, we create a copy of /boot on /dev/sdb3. First we will
create a mount-point for /dev/sdb3 (let's call it /bootb) and mount /dev/sdb3
there:

mkdir /bootb
mount /dev/sdb3 /bootb

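Now we copy the files in /boot to /bootb. The trailing "/." matters: it copies
the contents of /boot, whereas "cp -a /boot /bootb" would create /bootb/boot:

cp -a /boot/. /bootb/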

Let's make sure that this is mounted every time we boot, by adding the following
line to /etc/fstab:

/dev/sdb3    /bootb    ext2    defaults    1 2
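
If /dev/sdb3 is ever not mounted, anything copied to /bootb lands on the root
filesystem instead of the second disk, so it may be worth checking before you
copy. A minimal sketch, assuming a Linux-style /proc/mounts whose second field
is the mount point; is_mounted is a hypothetical helper name:

```shell
# Succeed if directory $1 is listed as a mount point in the mounts table $2
# (default /proc/mounts).
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example: is_mounted /bootb || echo "/bootb is not mounted"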

Now we have to install a boot-loader onto /dev/sdb, so that we can boot from it
if /dev/sda fails. There is a complication. /dev/sdb will only be called upon to
boot if /dev/sda has failed so catastrophically that it is not recognized (or if
it has been removed for replacement). But in this case, the disk that was
/dev/sdb will be recognized by the BIOS as the first disk and become /dev/sda.
We therefore have to tell lilo that although we are installing to /dev/sdb, in
the event that we boot from it, it will actually be recognized as /dev/sda. We
do this by adding the line "disk=/dev/sdb bios=0x80" to the lilo.conf for
/dev/sdb. 0x80 is the BIOS position of the first hard disk, so this line tells
lilo that /dev/sdb will actually be in the BIOS position of /dev/sda if it is
booted from.

We still need to keep /etc/lilo.conf for when we want to make changes to the
functioning system. So we will create an alternative lilo.conf for /dev/sdb.
Let's call it /etc/lilo.conf.sdb. This file should look something like:

boot=/dev/sdb
disk=/dev/sdb  bios=0x80
map=/bootb/map
install=/bootb/boot.b
prompt
timeout=50
image=/bootb/vmlinuz-raid
    label=linux
    root=/dev/md0
    read-only

Notice that we use /bootb/ instead of /boot/ for all the system files.
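
A mistyped path here only shows up when lilo complains, or worse, at boot time.
As a rough pre-check, something like the following lists any image= or install=
path in a config file that does not exist. It is a sketch rather than a real
lilo.conf parser: missing_boot_files is a made-up name, and it assumes one
option per line and no spaces in the paths.

```shell
# Print each image= or install= path in lilo config $1 that is not on disk.
# Returns non-zero if anything is missing.
missing_boot_files() {
    rc=0
    # Keep only the value after "=" on image/install lines, stripped of blanks.
    for f in $(awk -F= '/^[ \t]*(image|install)[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"); do
        [ -e "$f" ] || { echo "missing: $f"; rc=1; }
    done
    return $rc
}
```

For example: missing_boot_files /etc/lilo.conf.sdb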

Install lilo to the MBR of /dev/sdb using:

lilo -C /etc/lilo.conf.sdb

Lilo may warn you that you are not installing to the first disk. You can safely
ignore such warnings.

We are now covered against the catastrophic failure of /dev/sda. The
catastrophic failure of /dev/sdb is not a problem as we can still boot as normal
from /dev/sda (we may crash if we do not have swap on RAID, but we will be able
to reboot). The only other circumstance against which we can cover ourselves is
the situation where we get corruption, rather than wholesale failure, on
/dev/sda. If this corruption affects the MBR, there is nothing we can do - we
will have to remove /dev/sda and then rely on our ability to boot from /dev/sdb.
But if the MBR is intact and only some of the system files, such as the kernel
image, have become corrupted, we can give ourselves another way to reboot if we
need to: boot from /dev/sda, using /dev/sdb's kernel image. To achieve this, we
should change /etc/lilo.conf to something like:

boot=/dev/sda
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
image=/boot/vmlinuz-raid
    label=linux
    root=/dev/md0
    fallback=sdb
    append="panic=30"
    read-only
image=/bootb/vmlinuz-raid
    label=sdb
    root=/dev/md0
    fallback=desperate
    append="panic=30"
    read-only
image=/bootb/vmlinuz-raid
    label=desperate
    root=/dev/sdb4
    append="noautodetect"
    read-only

Notice the sliding scale of gradually less desirable boot options. The first
image section is the standard one for normal operation, where at least /dev/sda
is functioning properly. The 'append="panic=30"' line tells this option that if
a kernel panic occurs during bootup, it should wait 30 seconds and then reboot.
The "fallback=sdb" line says that in the event of a panic and reboot, a new
default image should be selected, in this case going by the name of sdb. This is
the name of the second image section, so what this is effectively saying is "if
booting normally fails, try booting with the options in the sdb image section".
This section attempts to run the kernel image on /bootb/ (i.e. on /dev/sdb),
rather than the kernel image on /boot/ (/dev/sda). If even this fails, we fall
back to trying to boot the kernel on /dev/sdb with /dev/sdb4 (the partition on
/dev/sdb which is part of the root RAID array) as root, just in case booting is
failing because corruption on /dev/sda4 is preventing the root-RAID array from
functioning. To allow us to do this, we use the "noautodetect" kernel boot
option so that the RAID arrays are not started automatically, which would
prevent us from mounting /dev/sdb4. Ideally, this section would also include a
command to force /dev/sdb4 to be mounted read-only (so the filesystem does not
get out of sync with the superblock), but I can't figure out a way to do this
off the top of my head.

These options cover some fairly unlikely scenarios. To provide more reliable
security against problems with the hard disks, you will also want to make sure
that you have a working boot floppy, which is where we come back to Tony's
point.

That's about as far as you can go with securing your system using root-RAID-1.
The main thing to remember in daily operation is that, if you make any changes
to the kernel, initrd or anything else in /boot, you will have to copy the
changed files to /bootb and run lilo against /etc/lilo.conf.sdb, as well as
against /etc/lilo.conf. This is why having a non-RAIDed /boot is a little more
work than the alternative methods. Of course, you could always set up a shell
script to do this for you, in which case the overhead of using this method
should not be significant. And, in fact, I suppose it would be possible to have
/boot on a RAID-1 of /dev/sda3 and /dev/sdb3, and set up a script for whenever
you have made changes to the files in /boot, that unmounts the RAID, stops it,
mounts one of the constituent partitions read-only, runs lilo against
/etc/lilo.conf and /etc/lilo.conf.sdb, unmounts the partition, starts the RAID
and mounts it again on /boot.
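
For the simpler, non-RAIDed /boot case, the shell script could be as small as
the sketch below. It is untested against a real lilo; sync_boot and the SRC,
DST and LILO variables are names I have made up, with defaults matching the
layout used in this post. It does not remove files from /bootb that have been
deleted from /boot.

```shell
#!/bin/sh
# Keep the backup boot partition in step with /boot, then reinstall lilo
# on both disks. Override LILO (e.g. LILO=echo) for a dry run.
SRC=${SRC:-/boot}
DST=${DST:-/bootb}
LILO=${LILO:-lilo}

sync_boot() {
    # Both directories must exist; /dev/sdb3 should already be mounted on $DST.
    [ -d "$SRC" ] && [ -d "$DST" ] || { echo "missing $SRC or $DST" >&2; return 1; }
    # Copy the contents of $SRC (note the trailing "/."), not the directory itself.
    cp -a "$SRC"/. "$DST"/ || return 1
    # Run lilo against each config; the second run rewrites the map file
    # named in /etc/lilo.conf.sdb.
    $LILO -C /etc/lilo.conf || return 1
    $LILO -C /etc/lilo.conf.sdb || return 1
}
```

Call sync_boot as root after changing anything in /boot. The copy comes first
because /etc/lilo.conf also refers to the kernel image under /bootb.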

On the other hand, why go to this trouble when you could just get the latest
lilo, which can read RAID-1 partitions anyway?

Cheers,


Bruno Prior         [EMAIL PROTECTED]