Bug#401916: Bug 401916: analysis and suggested solution

David Härdeman Fri, 16 Feb 2007 07:33:40 -0800

I've spent more time researching this by reading kernel code, checking the
boot process of other distros and trawling through mailing list archives,
and I think I have a pretty good picture of the problem now.




Description:

Basically, udevsettle returns once all modules have been loaded and no
more uevents are pending. "All modules" includes e.g. SCSI host drivers and
USB host drivers. The problem is that even once the driver for a USB host
with a storage device attached has been loaded, the host driver does not
emit uevents for the device immediately. Instead, the bus is scanned
asynchronously, which can take an arbitrary amount of time (depending on
things like the reset time of the storage device, which can be several
seconds, the number of hubs between the host and the device, etc.).

The same goes for several other buses (e.g. SCSI, FireWire, Fibre Channel,
etc.), and we won't be able to solve it completely by watching kernel
threads (the approach I tried in earlier mails to the same bug report).



Short-term solution:

Therefore, I think the best short-term solution (considering the
ever-impending Etch release) would be to add a "root_wait=" boot
parameter so that affected users can set the timeout value manually. If
that parameter were added and documented in the release docs, the severity
of these bugs could be downgraded (imho).
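A minimal sketch of how an init script might read the proposed root_wait= parameter from the kernel command line. The parameter name is the one proposed above; get_root_wait is a hypothetical helper name, and the default of 0 (no extra wait) and the choice to ignore non-numeric values are assumptions of this sketch.

```sh
#!/bin/sh
# Sketch: read the proposed root_wait= value from the kernel command line.
# The default of 0 and the handling of bad values are assumptions.

# Print the root_wait value found in $1 (or in /proc/cmdline if no
# argument is given); print 0 if it is missing or not a plain number.
get_root_wait() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    root_wait=0
    for x in $cmdline; do
        case "$x" in
            root_wait=*) root_wait="${x#root_wait=}" ;;   # last one wins
        esac
    done
    case "$root_wait" in
        ''|*[!0-9]*) root_wait=0 ;;   # empty or non-numeric: ignore it
    esac
    echo "$root_wait"
}
```

For example, get_root_wait "root=/dev/sda1 ro root_wait=10" prints 10. After udevsettle returns, the init script could then keep checking for $ROOT for up to that many seconds instead of giving up immediately.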

Alternatively, or additionally, the scripts could check whether any of
several "problematic" modules has been loaded when udevsettle returns and,
if so, sleep a few extra seconds (most other distros that take this
approach seem to wait around 6-10 seconds). The problem is that the list
of problematic modules is potentially huge (see the list of buses above).
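The module check could look roughly like the sketch below, which scans /proc/modules (one module per line, module name in the first field) for a few storage-bus modules. The module list (usb_storage, sbp2, qla2xxx, lpfc) and the 8 second delay are illustrative assumptions rather than a vetted list, and needs_extra_wait / maybe_extra_wait are hypothetical names.

```sh
#!/bin/sh
# Sketch: decide whether to sleep a little longer after udevsettle.
# Module list and delay are illustrative assumptions.

PROBLEM_MODULES="usb_storage sbp2 qla2xxx lpfc"
EXTRA_DELAY=8   # seconds; within the 6-10 s other distros seem to use

# Return 0 if any module in $PROBLEM_MODULES is listed in $1
# (default /proc/modules, where the first field is the module name).
needs_extra_wait() {
    modfile="${1:-/proc/modules}"
    [ -r "$modfile" ] || return 1
    while read -r name _rest; do
        for m in $PROBLEM_MODULES; do
            if [ "$name" = "$m" ]; then
                return 0
            fi
        done
    done < "$modfile"
    return 1
}

# Sleep $EXTRA_DELAY seconds if a problematic module is loaded.
maybe_extra_wait() {
    if needs_extra_wait "$@"; then
        sleep "$EXTRA_DELAY"
    fi
}
```

The weakness mentioned above shows up directly here: PROBLEM_MODULES has to name every host driver for every bus that scans asynchronously.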



Long-term solution:

In the long term (post-Etch), I think something like the following might
be a good solution:

Take all scripts under /usr/share/initramfs-tools/scripts/local-top/ that
set up block devices (e.g. cryptsetup, lvm, evms, etc.) and split each of
them in two: a udev rule snippet and a script.

The udev rule snippet would list the devices that this particular script
is interested in, and tell udev to call the script whenever such a device
node is created.

The script is basically the old script with minor changes so that it takes
a device node as its argument and doesn't preserve any state between
invocations.
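A sketch of what such a split, stateless script might look like, using lvm as the example. It assumes blkid is available in the initramfs and reports LVM physical volumes with TYPE "LVM2_member"; probe_type, is_interesting and handle_device are hypothetical names.

```sh
#!/bin/sh
# Sketch of a split blockdev script (hypothetical blockdev-scripts/lvm).
# udev passes the new device node as the only argument; the script keeps
# no state of its own and decides everything from the device itself.

# Print the content type blkid reports for a device (e.g. LVM2_member).
probe_type() {
    blkid -o value -s TYPE "$1" 2>/dev/null
}

# Return 0 if this script cares about the device.
is_interesting() {
    [ "$(probe_type "$1")" = "LVM2_member" ]
}

# Handle one device event; returns 0 when there is nothing to do.
handle_device() {
    dev="$1"
    [ -b "$dev" ] || return 0          # not a block device: ignore
    is_interesting "$dev" || return 0  # not an LVM PV: ignore
    # Ask LVM to activate all volume groups it can now see.
    lvm vgchange -a y
}

# A real script would end with:  handle_device "$1"
```

Because the only inputs are the device node and whatever LVM itself can see, running the script again for the same device should be harmless.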

Then the main init script is changed to sleep until $ROOT (not /dev/root,
but whatever is set as the $ROOT variable) appears.
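A minimal polling version of the "sleep until $ROOT appears" step. It checks once per second, so it may oversleep by up to a second; the optional timeout (0 meaning wait forever) is an addition for illustration, and wait_for_root is a hypothetical name.

```sh
#!/bin/sh
# Sketch: block until the device named by $1 (e.g. "$ROOT") exists.
# Polls once a second; optional $2 is a timeout in seconds (0 = forever).

wait_for_root() {
    root="$1"
    timeout="${2:-0}"
    elapsed=0
    while [ ! -e "$root" ]; do
        if [ "$timeout" -gt 0 ] && [ "$elapsed" -ge "$timeout" ]; then
            return 1    # gave up: $root never appeared
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A call such as wait_for_root "$ROOT" 30 || echo "$ROOT did not appear" returns as soon as the node exists, so nothing sleeps longer than needed.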



Advantages of the long-term approach:

- There will be no more sleeping than necessary.
- Everything will be asynchronous.
- There will be no need to specify dependencies between the
  /usr/share/initramfs-tools/scripts/local-top/ scripts.

The last one might seem minor, but it actually makes the system much
simpler. Right now it is not possible to support both crypto-on-lvm and
lvm-on-crypto without duplicating the lvm functionality in the cryptsetup
initramfs script (as you can tell initramfs to run lvm before or after
cryptsetup, but not both).



Example:

Let's say we have the scripts "lvm", "cryptsetup" and "md" in
/usr/share/initramfs-tools/scripts/blockdev-scripts/

Each script has a udev rule snippet in
/usr/share/initramfs-tools/scripts/blockdev-rules/

Most probably these rule snippets would look something like this (untested,
as I can't check the udev syntax right now), here for the md script:
KERNEL=="[sh]d[a-z]*", ACTION=="add", RUN+="/usr/share/initramfs-tools/scripts/blockdev-scripts/md /dev/%k"

Let's say that /dev/sda1 is detected.

udev will then use its rules to execute
/usr/share/initramfs-tools/scripts/blockdev-scripts/lvm, which will check
the device, realize it's not an LVM PV, and exit.

The same thing then happens for the cryptsetup script.

The md script recognizes /dev/sda1 as a RAID partition, but the array is
still missing a member device, so no action is taken.

Later, /dev/sdb1 is detected.

udev calls the lvm script again, which exits again.

The same thing then happens for the cryptsetup script.

The md script recognizes /dev/sdb1 as a RAID partition, and /dev/sda1 is
the other member of the RAID device, so the array is set up and a new
uevent is triggered.

In response, udev creates /dev/md1 and starts going through the scripts again.

udev calls the lvm script again, which exits again.

udev calls the cryptsetup script, which recognizes /dev/md1 as an encrypted
device, prompts for the passphrase and sets it up; this generates another
uevent.

In response, udev creates /dev/mapper/cryptroot and starts going through
the scripts again.

udev calls the lvm script again, which recognizes /dev/mapper/cryptroot as
an LVM PV and sets up the VG and its LVs.

The LVs generate new uevents.

In response, udev creates (among others) /dev/mapper/mainvg-mainlv.

init notices this (assuming $ROOT is /dev/mapper/mainvg-mainlv) and the
boot continues.
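To check that a cascade like the one above works without any fixed ordering between the scripts, it can be simulated in a few lines of shell. Nothing here touches real devices: the device types and the nodes each "script" creates are hard-coded, hypothetical values mirroring the example, SEEN stands in for the set of device nodes that currently exist, and all function names are made up for the illustration.

```sh
#!/bin/sh
# Toy simulation of the event cascade. No real devices are touched.

# Type each device would report (hypothetical, mirrors the example).
dev_type() {
    case "$1" in
        /dev/sda1|/dev/sdb1)   echo linux_raid_member ;;
        /dev/md1)              echo crypto_LUKS ;;
        /dev/mapper/cryptroot) echo LVM2_member ;;
        *)                     echo unknown ;;
    esac
}

# Device nodes that exist so far (simulated system state, not script state).
SEEN=""
has_dev() {
    case " $SEEN " in *" $1 "*) return 0 ;; esac
    return 1
}

# Each "script" prints the device node it would create, if any.
lvm_script() {
    [ "$(dev_type "$1")" = LVM2_member ] && echo /dev/mapper/mainvg-mainlv
    return 0
}
crypt_script() {
    [ "$(dev_type "$1")" = crypto_LUKS ] && echo /dev/mapper/cryptroot
    return 0
}
md_script() {
    [ "$(dev_type "$1")" = linux_raid_member ] || return 0
    # Assemble only once both members of the two-disk array exist.
    if has_dev /dev/sda1 && has_dev /dev/sdb1; then
        echo /dev/md1
    fi
    return 0
}

SCRIPTS="lvm_script crypt_script md_script"

# Process device events in arrival order until $1 (the root device)
# appears; the remaining arguments are the initial events.
# Returns 1 if the events run out before the root device shows up.
run_cascade() {
    root="$1"; shift
    queue="$*"
    SEEN=""
    while [ -n "$queue" ]; do
        set -- $queue
        dev="$1"; shift
        queue="$*"
        SEEN="$SEEN $dev"
        echo "event: $dev"
        if [ "$dev" = "$root" ]; then
            return 0
        fi
        for script in $SCRIPTS; do
            new=$("$script" "$dev")
            if [ -n "$new" ]; then
                queue="$queue $new"
            fi
        done
    done
    return 1
}
```

Here run_cascade /dev/mapper/mainvg-mainlv /dev/sda1 /dev/sdb1 prints five events ending with /dev/mapper/mainvg-mainlv. It still succeeds if SCRIPTS is reordered or the two disks arrive in the opposite order, and with only /dev/sda1 it returns 1 because the array never becomes complete.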



Phew, this mail became much longer than expected... so whaddaya think, Maks?

-- 
David Härdeman



