Hi there, I just want to share a nasty experience (and what to do to solve
it).

Two weeks ago I installed a RAID-5 on two IDE partitions and one SCSI
partition under SuSE 6.3 with the raid0145 patches.
The SCSI drive hangs off an Adaptec controller. The partitions were type fd
and RAID autodetection is on in the kernel.
md is compiled into the kernel (not a module), although the system boots
from a 'normal' ide ext2 partition.
I left the machine running after fdisking the drives, making the RAID and
installing ext2 (I would have liked to install reiserfs, but readers of this
group probably know why I did not).

The people the machine belongs to tested it and were happy. Yesterday I was
at their office again and looked at /var/log/boot.msg, only to find that
after a reboot the RAID was running in degraded mode.
The reason: the Adaptec driver was compiled as a module (standard for SuSE).
During boot the md software first detects only the two IDE RAID-5 partitions
and starts the RAID in degraded mode. Only later does it load the SCSI
module, but by then the RAID is already detected and running.
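
A degraded array like this can also be spotted on a running system rather
than by reading boot.msg. The following is a minimal sketch only: it assumes
/proc/mdstat reports member counts in the form [3/2] (expected/active), which
may differ between kernel and patch versions. The sample text and the function
name degraded_arrays are made up for illustration.

```python
import re

# Hypothetical /proc/mdstat excerpt (assumption: real output varies by kernel).
SAMPLE_MDSTAT = """\
Personalities : [raid5]
md0 : active raid5 hdc1[1] hda1[0]
      4192768 blocks level 5, 32k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays with missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
        # The counts may sit on the header line or on a following line.
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_MDSTAT  # fall back to the made-up sample
    for name, (expected, active) in degraded_arrays(text).items():
        print(f"{name}: only {active} of {expected} members active")
```

Run from cron or at the end of the boot scripts, something like this would
have flagged the missing SCSI member right after the first reboot.
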
I decided to compile the Adaptec driver into the kernel (not as a module),
but that gave big problems after rebooting: SCSI errors on a (non-existent)
host #1.
I removed the aic7xxx module from /lib/modules/2.2.13, rebooted but that did
not help.
I remembered finding a new setting INITRD_MODULES in the /etc/rc.config file
and removed its reference to aic7xxx. Unfortunately there is no comment in
the config file on what to do after you change it. It took another reboot to
find out that just changing that and running lilo doesn't help.
You have to run mk_initrd to update /boot/initrd, the compressed file system
that the kernel loads as its initial file system. After that the module and
the driver compiled into the kernel no longer compete for the one SCSI
device.

Unfortunately none of this is in the manual, but there are some updates
( http://sdb.suse.de/sdb/de/html/errata-63-d.html and
http://sdb.suse.de/sdb/de/html/adrian_6.3_boot.html ). These are in German,
but you might also find an English version. Of course I only found them
after solving the problem :(
The new way of booting that SuSE uses makes sense, but there could be some
more pointers (in /etc/rc.config) or, better, some automated check that the
/boot/initrd file doesn't clash with the kernel (or its compile settings). I
don't know if it is a good idea to put that in (SuSE's version of) lilo or
in the kernel configuration/compilation stuff.
All this would not have mattered so much if rebooting a system did not take
so long (especially the SCSI scanning).
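
The kind of automated check suggested above could look roughly like the
sketch below. It is not anything SuSE ships: it only compares the
INITRD_MODULES line from /etc/rc.config against the options set to "y" in a
kernel .config. The function names and the module-to-option table are
assumptions for illustration, and the table would have to be extended by hand
for other drivers.

```python
import re

# Assumed mapping from module name to kernel config option; extend as needed.
MODULE_TO_CONFIG = {"aic7xxx": "CONFIG_SCSI_AIC7XXX"}


def initrd_modules(rc_config_text):
    """Return the module names listed in INITRD_MODULES="..." (or [])."""
    for line in rc_config_text.splitlines():
        m = re.match(r'^\s*INITRD_MODULES\s*=\s*"([^"]*)"', line)
        if m:
            return m.group(1).split()
    return []


def builtin_options(kernel_config_text):
    """Return the set of CONFIG_* options set to 'y' (compiled in)."""
    return {
        m.group(1)
        for m in re.finditer(r"^(CONFIG_\w+)=y\s*$", kernel_config_text, re.M)
    }


def clashes(rc_config_text, kernel_config_text):
    """List initrd modules whose driver is also compiled into the kernel."""
    builtin = builtin_options(kernel_config_text)
    return [
        mod
        for mod in initrd_modules(rc_config_text)
        if MODULE_TO_CONFIG.get(mod) in builtin
    ]
```

Fed the contents of /etc/rc.config and the .config of the kernel you just
built, a non-empty result means INITRD_MODULES should be edited and mk_initrd
run again before the next reboot.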

Hope you remember this mail if you get the strange SCSI error messages and
get to a solution faster than I did.

Anthon


