Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-12 Thread Florian Lohoff
On Tue, Oct 07, 2008 at 07:04:09PM +0100, Stephen Gran wrote:
> If modprobe returns before the device is actually initialized and has
> created sysfs entries, this is probably not fixable in shell scripts.
> If, as I suspect, modprobe does not return immediately, this is probably
> a bug in the scripts that don't call udevsettle and wait for the sysfs
> entries to be turned into block devices for the next script to act on.
> Ferenc, since you are affected, can you test?

The point is that an easy fix would be to rescan the LVM devices on timeout
instead of just checking whether the root device node exists.

If the LVM PVs are not available when LVM scans but come up
later, the whole boot process stops.

So instead of going the "right" way of using some kind of udev trigger,
one could, as a quick fix, rerun the LVM start script on each timeout,
which would solve the case of a logical volume as root on a late block device.
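
A minimal sketch of that quick fix, assuming a POSIX shell in the initramfs.
The names wait_for_root, activate_lvm and POLL_INTERVAL are hypothetical and
not taken from initramfs-tools, and "lvm vgchange -ay" is only assumed to be
a suitable way to activate the volume groups. The loop retries LVM activation
on every pass instead of only checking for the device node:

```shell
#!/bin/sh
# Hypothetical sketch, not the initramfs-tools implementation.

# Assumption: the lvm binary is present in the initramfs and
# "lvm vgchange -ay" activates all volume groups that are currently visible.
activate_lvm() {
    lvm vgchange -ay >/dev/null 2>&1 || true
}

# wait_for_root DEVICE MAX_TRIES
# Re-runs LVM activation before each wait, so PVs that show up late are
# still picked up. Returns 0 once DEVICE exists, 1 after MAX_TRIES passes.
wait_for_root() {
    root_dev=$1
    max_tries=$2
    tries=0
    while [ ! -e "$root_dev" ]; do
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        activate_lvm
        tries=$((tries + 1))
        # Only sleep if the rescan did not make the device appear.
        [ -e "$root_dev" ] || sleep "${POLL_INTERVAL:-1}"
    done
    return 0
}
```

A caller might run something like "wait_for_root /dev/mapper/vg0-root 60"
(the device path is only an example) and drop to the rescue shell if it
returns non-zero.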

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin


signature.asc
Description: Digital signature


Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-08 Thread Ferenc Wagner
Ferenc Wagner <[EMAIL PROTECTED]> writes:

> Stephen Gran <[EMAIL PROTECTED]> writes:
>
>> Ferenc, since you are affected, can you test?
>
> Using udevsettle (udevadm settle) instead of sleep?

That seems to work for me in this case.  Thanks for the tip!
-- 
Cheers,
Feri.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread Ferenc Wagner
Stephen Gran <[EMAIL PROTECTED]> writes:

> This one time, at band camp, Ferenc Wagner said:
>> maximilian attems <[EMAIL PROTECTED]> writes:
>> 
>>> On Tue, Oct 07, 2008 at 03:54:59PM +0200, Florian Lohoff wrote:
>>>> On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
>>>>>>> standard answer boot with
>>>>>>> rootdelay=X
>>>>>> 
>>>>>> It worked with etch without that parameter and the upgrade did not add
>>>>>> it so it's a lenny regression - isn't it?
>>>>> 
>>>>> no it was just luck that it didn't hit you previously.
>>>>> kernel gives no guarantee on timing.
>>>> 
>>>> This renders the argument with "rootdelay=" moot - When the kernel gives
>>>> no guarantee on timing ANY rootdelay works just by luck. So coming back
>>>> to this issue I consider this still a bug - When a block device becomes
>>>> available the lvm code needs to scan it in case the rootfs is an lvm.
>>>> 
>>>> The whole issue with finding the rootfs in the initrd needs to be
>>>> triggered and not waited for based on that statement of yours.
>>>
>>> too late for such changes and no that is not the solution either.
>> 
>> Care to elaborate why not?  The principle surely sounds better:
>> instead of fragile arbitrary delays, wait until the event we're
>> interested in happens.  Actually, I always dreamed of a system where
>> every dependency is encoded as udev rules, and the boot process only
>> has to wait for the root device to appear.  And I'm stuck now with a
>> problem even rootdelay can't help: local-top/iscsi finishes before
>> /dev/sda appears, so local-top/lvm has nothing to activate.  And
>> there's no rootdelay in between...
>
> If modprobe returns before the device is actually initialized and has
> created sysfs entries, this is probably not fixable in shell scripts.
> If, as I suspect, modprobe does not return immediately, this is probably
> a bug in the scripts that don't call udevsettle and wait for the sysfs
> entries to be turned into block devices for the next script to act on.

It isn't always a modprobe issue.  For example iSCSI can create new
block devices long after the modules are loaded, depending on network
delays (it may not be the case here, I'm not sure what iscsistart
does).  But you can also think of CONFIG_SCSI_SCAN_ASYNC...  Events can
arrive at any time (e.g. when you plug in your pendrive); udevsettle can only
wait for the event queue to empty, not for future events.
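
To illustrate the limitation: a script that needs a specific device cannot
rely on settling alone, so a sketch might combine it with polling for the
node. The function names and the SETTLE_TIMEOUT and POLL_INTERVAL variables
below are hypothetical; "udevadm settle --timeout=" is assumed to be
available (older systems ship the same tool as udevsettle):

```shell
#!/bin/sh
# Hypothetical sketch: drain the udev event queue, then check for the device.
# Settling only covers events already queued, hence the outer polling loop.

settle_events() {
    # Assumption: udevadm exists in the initramfs; failures are ignored.
    udevadm settle --timeout="${SETTLE_TIMEOUT:-10}" 2>/dev/null || true
}

# wait_for_device DEVICE MAX_POLLS
# Returns 0 as soon as DEVICE exists after a settle, 1 after MAX_POLLS passes.
wait_for_device() {
    dev=$1
    max_polls=$2
    i=0
    while [ "$i" -lt "$max_polls" ]; do
        settle_events
        [ -e "$dev" ] && return 0
        sleep "${POLL_INTERVAL:-1}"
        i=$((i + 1))
    done
    return 1
}
```

This still uses a timeout, so it narrows the race rather than removing it;
only an event-driven trigger would avoid the upper bound entirely.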

> Ferenc, since you are affected, can you test?

Using udevsettle (udevadm settle) instead of sleep?  Sure, but only
tomorrow.  Actually, that may be the best fix for the open-iscsi
initramfs script, if iscsistart provides the timing guarantees the
kernel does not. :)  Thanks for the suggestion!
-- 
Cheers,
Feri.





Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread Stephen Gran
This one time, at band camp, Ferenc Wagner said:
> maximilian attems <[EMAIL PROTECTED]> writes:
> 
> > On Tue, Oct 07, 2008 at 03:54:59PM +0200, Florian Lohoff wrote:
> >> On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
> >>>>> standard answer boot with
> >>>>> rootdelay=X
> >>>> 
> >>>> It worked with etch without that parameter and the upgrade did not add
> >>>> it so it's a lenny regression - isn't it?
> >>> 
> >>> no it was just luck that it didn't hit you previously.
> >>> kernel gives no guarantee on timing.
> >> 
> >> This renders the argument with "rootdelay=" moot - When the kernel gives
> >> no guarantee on timing ANY rootdelay works just by luck. So coming back
> >> to this issue I consider this still a bug - When a block device becomes
> >> available the lvm code needs to scan it in case the rootfs is an lvm.
> >> 
> >> The whole issue with finding the rootfs in the initrd needs to be
> >> triggered and not waited for based on that statement of yours.
> >
> > too late for such changes and no that is not the solution either.
> 
> Care to elaborate why not?  The principle surely sounds better:
> instead of fragile arbitrary delays, wait until the event we're
> interested in happens.  Actually, I always dreamed of a system where
> every dependency is encoded as udev rules, and the boot process only
> has to wait for the root device to appear.  And I'm stuck now with a
> problem even rootdelay can't help: local-top/iscsi finishes before
> /dev/sda appears, so local-top/lvm has nothing to activate.  And
> there's no rootdelay in between...

(I have nothing to do with any of the affected software, just a comment
as an interested person).

If modprobe returns before the device is actually initialized and has
created sysfs entries, this is probably not fixable in shell scripts.
If, as I suspect, modprobe does not return immediately, this is probably
a bug in the scripts that don't call udevsettle and wait for the sysfs
entries to be turned into block devices for the next script to act on.
Ferenc, since you are affected, can you test?

Cheers,
-- 
 -
|   ,''`.Stephen Gran |
|  : :' :[EMAIL PROTECTED] |
|  `. `'Debian user, admin, and developer |
|`- http://www.debian.org |
 -




Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread Ferenc Wagner
maximilian attems <[EMAIL PROTECTED]> writes:

> On Tue, Oct 07, 2008 at 03:54:59PM +0200, Florian Lohoff wrote:
>> On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
>>>>> standard answer boot with
>>>>> rootdelay=X
>>>> 
>>>> It worked with etch without that parameter and the upgrade did not add
>>>> it so it's a lenny regression - isn't it?
>>> 
>>> no it was just luck that it didn't hit you previously.
>>> kernel gives no guarantee on timing.
>> 
>> This renders the argument with "rootdelay=" moot - When the kernel gives
>> no guarantee on timing ANY rootdelay works just by luck. So coming back
>> to this issue I consider this still a bug - When a block device becomes
>> available the lvm code needs to scan it in case the rootfs is an lvm.
>> 
>> The whole issue with finding the rootfs in the initrd needs to be
>> triggered and not waited for based on that statement of yours.
>
> too late for such changes and no that is not the solution either.

Care to elaborate why not?  The principle surely sounds better:
instead of fragile arbitrary delays, wait until the event we're
interested in happens.  Actually, I always dreamed of a system where
every dependency is encoded as udev rules, and the boot process only
has to wait for the root device to appear.  And I'm stuck now with a
problem even rootdelay can't help: local-top/iscsi finishes before
/dev/sda appears, so local-top/lvm has nothing to activate.  And
there's no rootdelay in between...
-- 
Thanks,
Feri.





Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread maximilian attems
On Tue, Oct 07, 2008 at 03:54:59PM +0200, Florian Lohoff wrote:
> On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
> > > > standard answer boot with
> > > > rootdelay=X
> > > 
> > > > It worked with etch without that parameter and the upgrade did not add
> > > > it so it's a lenny regression - isn't it?
> > 
> > no it was just luck that it didn't hit you previously.
> > kernel gives no guarantee on timing.
> 
> This renders the argument with "rootdelay=" moot - When the kernel gives
> no guarantee on timing ANY rootdelay works just by luck. So coming back
> to this issue I consider this still a bug - When a block device becomes
> available the lvm code needs to scan it in case the rootfs is an lvm.
> 
> The whole issue with finding the rootfs in the initrd needs to be
> triggered and not waited for based on that statement of yours.

too late for such changes and no that is not the solution either.

if you read the release notes for etch you find a rootdelay chapter,
which probably is copied over to lenny.





Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread maximilian attems
On Tue, Oct 07, 2008 at 03:05:47PM +0200, Florian Lohoff wrote:
> On Tue, Oct 07, 2008 at 03:02:05PM +0200, maximilian attems wrote:
> > Subject: Re: lenny regression initrd/lvm/ rootfs detection timeout
> > 
> > On Tue, Oct 07, 2008 at 02:20:13PM +0200, Florian Lohoff wrote:
> > > 
> > > Hi,
> > > after upgrading an FSI RX/300 from etch to lenny the machine would not
> > > > boot anymore. It got stuck in the initrd not being able to find the
> > > > root filesystem. The cause was that the aacraid took too long to make
> > > > the root filesystem available. Thus the boot timed out and the initrd
> > > > waited for the root filesystem to get available. After more than 45 seconds
> > > the root disks (sda on an aacraid) got available but the boot failed
> > > anyway dropping into the initrd. The cause was that the root is an lvm
> > > which is on that disk and the lvm does not get retried after more disks
> > > get available.
> > > 
> > > > I got the machine to boot by running /scripts/local-top/lvm2 which made
> > > the root filesystem in the lvm available and ctrl-d to continue booting.
> > > 
> > > I think after more disks get available the initrd should retry running
> > > the lvm detection otherwise a lot of lvm based systems might die/get
> > > stuck on upgrade.
> > > 
> > > I'd consider this a RC bug - no clue whose fault this is though ...
> > 
> > standard answer boot with
> > rootdelay=X
> 
> It worked with etch without that parameter and the upgrade did not add
> it so it's a lenny regression - isn't it?

no it was just luck that it didn't hit you previously.
kernel gives no guarantee on timing.
 





Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread Florian Lohoff
On Tue, Oct 07, 2008 at 03:10:47PM +0200, maximilian attems wrote:
> > > standard answer boot with
> > > rootdelay=X
> > 
> > It worked with etch without that parameter and the upgrade did not add
> > it so it's a lenny regression - isn't it?
> 
> no it was just luck that it didn't hit you previously.
> kernel gives no guarantee on timing.

This renders the argument with "rootdelay=" moot - When the kernel gives
no guarantee on timing ANY rootdelay works just by luck. So coming back
to this issue I consider this still a bug - When a block device becomes
available the lvm code needs to scan it in case the rootfs is an lvm.

The whole issue with finding the rootfs in the initrd needs to be
triggered and not waited for based on that statement of yours.

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin




Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread maximilian attems
On Tue, Oct 07, 2008 at 02:20:13PM +0200, Florian Lohoff wrote:
> 
> Hi,
> after upgrading an FSI RX/300 from etch to lenny the machine would not
> boot anymore. It got stuck in the initrd not being able to find the
> root filesystem. The cause was that the aacraid took too long to make
> the root filesystem available. Thus the boot timed out and the initrd
> waited for the root filesystem to get available. After more than 45 seconds
> the root disks (sda on an aacraid) got available but the boot failed
> anyway dropping into the initrd. The cause was that the root is an lvm
> which is on that disk and the lvm does not get retried after more disks
> get available.
> 
> I got the machine to boot by running /scripts/local-top/lvm2 which made
> the root filesystem in the lvm available and ctrl-d to continue booting.
> 
> I think after more disks get available the initrd should retry running
> the lvm detection otherwise a lot of lvm based systems might die/get
> stuck on upgrade.
> 
> I'd consider this a RC bug - no clue whose fault this is though ...

standard answer boot with
rootdelay=X





Re: lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread Florian Lohoff
On Tue, Oct 07, 2008 at 03:02:05PM +0200, maximilian attems wrote:
> Subject: Re: lenny regression initrd/lvm/ rootfs detection timeout
> 
> On Tue, Oct 07, 2008 at 02:20:13PM +0200, Florian Lohoff wrote:
> > 
> > Hi,
> > after upgrading an FSI RX/300 from etch to lenny the machine would not
> > boot anymore. It got stuck in the initrd not being able to find the
> > root filesystem. The cause was that the aacraid took too long to make
> > the root filesystem available. Thus the boot timed out and the initrd
> > waited for the root filesystem to get available. After more than 45 seconds
> > the root disks (sda on an aacraid) got available but the boot failed
> > anyway dropping into the initrd. The cause was that the root is an lvm
> > which is on that disk and the lvm does not get retried after more disks
> > get available.
> > 
> > I got the machine to boot by running /scripts/local-top/lvm2 which made
> > the root filesystem in the lvm available and ctrl-d to continue booting.
> > 
> > I think after more disks get available the initrd should retry running
> > the lvm detection otherwise a lot of lvm based systems might die/get
> > stuck on upgrade.
> > 
> > I'd consider this a RC bug - no clue whose fault this is though ...
> 
> standard answer boot with
> rootdelay=X

It worked with etch without that parameter and the upgrade did not add
it so it's a lenny regression - isn't it?

And in my case it was a remote reboot where the machine did not come
back - so I needed to go there physically - I am on the lucky side as
I tested with a machine next door and not 400km away ...

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin




lenny regression initrd/lvm/ rootfs detection timeout

2008-10-07 Thread Florian Lohoff

Hi,
after upgrading an FSI RX/300 from etch to lenny the machine would not
boot anymore. It got stuck in the initrd not being able to find the
root filesystem. The cause was that the aacraid took too long to make
the root filesystem available. Thus the boot timed out and the initrd
waited for the root filesystem to get available. After more than 45 seconds
the root disks (sda on an aacraid) got available but the boot failed
anyway dropping into the initrd. The cause was that the root is an lvm
which is on that disk and the lvm does not get retried after more disks
get available.

I got the machine to boot by running /scripts/local-top/lvm2 which made
the root filesystem in the lvm available and ctrl-d to continue booting.

I think after more disks get available the initrd should retry running
the lvm detection otherwise a lot of lvm based systems might die/get
stuck on upgrade.

I'd consider this a RC bug - no clue whose fault this is though ...

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin

