Bug#782793: systemd: ext4 filesystem on lvm on raid causes boot to enter emergency shell

Rick Thomas Wed, 22 Apr 2015 03:15:52 -0700

Hi Michael,

Thanks very much for helping me with this. (continued following quoted material)

On Apr 21, 2015, at 11:17 AM, Michael Biebl <bi...@debian.org> wrote:

> control: tags -1 moreinfo unreproducible
> 
> Am 18.04.2015 um 02:02 schrieb Rick Thomas:
>> 
>> On Apr 17, 2015, at 3:44 PM, Michael Biebl <bi...@debian.org> wrote:
>>> 
>>> Thanks for the data.
>>> Looks like an lvm issue to me:
>>> 
>>> root@cube:~# lvscan
>>> inactive          '/dev/vg1/backup' [87.29 GiB] inherit
>>> 
>>> and as a result, /dev/disk/by-label/BACKUP is missing.
>> 
>> Yes, that’s true, of course.  But the question is, what is keeping lvm from 
>> activating the volume?
>> 
>> It works fine for a logical volume on a single physical disk.  And 
>> /proc/mdstat shows that the RAID device, /dev/md127, _is_ active.  Or, at 
>> least it is when we get to emergency mode… I don’t know if it’s active when 
>> the fsck times out, of course… If you know how to figure that out from the 
>> systemd journal I attached to the original bug report, or any other way that 
>> I can try, I’d appreciate any assistance you can give!
> 
> fwiw, I tried to reproduce the problem in a VM with two additional disks
> attached and a setup like the following:
> 
> ext4 on RAID1 (via mdadm)
> ext4 on LVM on RAID1 (mdadm)
> ext4 on LVM
> ext4 on dos partition.
> 
> All partitions were correctly mounted during boot without any issues.
> 
> 
> Is this a fresh jessie installation or an upgraded system?
> Do you have any custom udev rules in /lib/udev/rules.d or /etc/udev/rules.d?
> 
> If it's an upgraded system and you have the sysvinit package installed,
> you can try booting with sysvinit temporarily via
> init=/lib/sysvinit/init on the kernel command line.
> 
> Does that work?

My physical setup is this:  The hardware is a quad-core armv7 Cubox i4pro ( 
https://wiki.debian.org/ArmHardFloatPort/CuBox-i )

With some help from Karsten Merker, I got a plain-vanilla — un-modified — 
Jessie installed on it to use for experimenting.  I wanted experience with the 
Cubox hardware and with using Jessie in a “real life” situation.

The boot (and system residency: root, swap, /home, /var — the works) is on a 
32GB microSD card.

I’ve added to that an eSATA 1TB hard disk (currently configured as a single 
filesystem using LVM) and a 7-port USB2.0 hub with 5 of the ports each holding 
a 32GB USB-Flash stick.  Those 5 devices are configured as a software (md) 
RAID6 array (I wanted to get some experience with RAID6) providing about 90GB 
of useful space configured with LVM as a single logical volume.

It’s the RAID6 array (or rather the lv on it) that is having the problem.

I’ve managed to make it work using a cron script that runs at reboot time.
The crontab has:
        @reboot bash -x bin/mount-backup
and the mount-backup script looks like this:
###################################
    logger -s -t mount-backup 'Mounting the /backup filesystem'

    (
    let count=10    # don’t try uselessly forever if it fails
    # If it doesn’t exist, take remedial action.
    while [ ! -h /dev/disk/by-label/BACKUP ]
    do
        let count=$count-1
        [ $count -lt 1 ] && exit 1
        sleep 10   # give things some time to settle
        cat /proc/mdstat    # show some debugging information
        # see if the raid has assembled and can be used
        /sbin/vgchange -ay
    done

    # If the fsck isn’t perfect, quit and wait for human intervention
    /sbin/fsck -nf /backup && /bin/mount /backup
    ) | logger -s -t mount-backup
###################################

This works.  Interestingly, without the sleep loop the vgchange fails.
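
The retry loop in the script above could be factored into a small helper that 
waits for a symlink to appear and gives up after a fixed number of tries.  This 
is only a sketch: the name wait_for_link and its arguments are made up for 
illustration, and it checks only that the link exists, nothing LVM-specific.

```shell
#!/bin/bash
# wait_for_link PATH TRIES DELAY [RETRY-COMMAND...]
# Hypothetical helper (not part of the original script): poll until PATH
# exists as a symlink, sleeping DELAY seconds between checks and running
# the optional retry command after each sleep.
wait_for_link() {
    local path=$1 tries=$2 delay=$3
    shift 3
    while [ ! -h "$path" ]; do
        tries=$((tries - 1))
        # Give up once the retry budget is used, as the original loop does
        [ "$tries" -lt 1 ] && return 1
        sleep "$delay"    # give things some time to settle
        # Run the caller-supplied command, if any (e.g. /sbin/vgchange -ay)
        [ $# -gt 0 ] && "$@"
    done
    return 0
}

# Possible use, mirroring the script above:
#   wait_for_link /dev/disk/by-label/BACKUP 10 10 /sbin/vgchange -ay \
#       && /sbin/fsck -nf /backup && /bin/mount /backup
```

As in the original loop, a failing retry command does not stop the polling; 
only the retry count does.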

Now, you say that a VM with two virtual disks configured as RAID1 with a 
logical volume works fresh out of the box.  This makes me wonder if it’s some 
kind of a timing problem…  It takes a few seconds for the freshly rebooted 
system to find the USB-Flash sticks and assemble them.  So perhaps some time-out 
is triggered in the systemd stuff on my setup, while your setup has no such 
physical constraints: everything is available immediately.

That’s just a guess…  But fortunately, it’s a testable guess!

My setup is (at the moment) just for experimentation and learning, with no 
actual useful work on it.  So I can re-install it at will or make any changes 
needed to track this down.

Your suggestion about trying sysvinit will be a good place to start.  If that 
works with my workaround script disabled, the next experiment will be to try 
systemd with a rootdelay=10.   I will also try the VM setup, just to see if I 
can replicate your result.  After that, I’m not sure — any suggestions will be 
appreciated!
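
Before reading anything into those experiments, it may be worth confirming 
that the kernel parameter actually took effect on a given boot.  A rough 
sketch, assuming bash and a readable /proc/cmdline; the function name 
has_boot_param is invented for this example:

```shell
#!/bin/bash
# has_boot_param NAME [CMDLINE]
# Hypothetical helper: succeed if NAME appears on the kernel command line,
# either bare (NAME) or with a value (NAME=...).  CMDLINE defaults to the
# contents of /proc/cmdline; passing it explicitly is handy for testing.
has_boot_param() {
    local name=$1
    local cmdline="${2-$(cat /proc/cmdline)}"
    local word
    # Deliberately unquoted: split the command line into separate words
    for word in $cmdline; do
        case $word in
            "$name"|"$name"=*) return 0 ;;
        esac
    done
    return 1
}

# Possible use after rebooting for one of the experiments:
#   has_boot_param rootdelay && echo "rootdelay is set"
#   has_boot_param init      && echo "init= override is set"
```

Running "ps -p 1 -o comm=" on the booted system should also show whether 
systemd or the sysvinit init ended up as PID 1.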

I’ll get back to you when I’ve made those tests.  Real-life(TM) will probably 
prevent me from doing that before the week-end.

Enjoy!  And Thanks for all the help!

Rick

