Package: dracut-core
Version: 059-4
Severity: normal
X-Debbugs-Cc: adrela...@kicksecure.com

While testing
https://salsa.debian.org/live-team/live-build/-/merge_requests/353, it was
discovered that a Debian 12 (Bookworm) live ISO build with dracut as an
initramfs would fail to boot. Debian Trixie and Debian 11 (Bullseye)
ISOs were both bootable, however. This can be easily reproduced using
the fork of live-build above, but any method of creating a live
Debian Bookworm ISO with a dracut initramfs should fail similarly.
(This issue is affecting Kicksecure at the moment, which is not yet
built with live-build.)

Steps to reproduce:

* Clone https://salsa.debian.org/rclobus-guest/live-build.git
* Checkout the rclobus/dracut_support branch
* Build the source package (I used sbuild for this)
* Install the resulting live-build package into a Debian Sid machine
* Create a new directory to build a live ISO in, name it "live-dracut"
* Run `cd live-dracut; lb config --initramfs dracut-live`
* Using your text editor of choice, replace all instances of "bullseye"
  with "bookworm" in the "binary" and "config" files
* Just so your live image has a GUI when it works, run
  `echo "task-xfce-desktop" >> config/package-lists/my.list.chroot`
* Run `sudo lb build 2>&1 | tee build.log` to make the ISO
* Attempt to boot the ISO (this can be done easily with QEMU by running
  `qemu-system-x86_64 -m 4G -smp 2 -enable-kvm -cdrom
  live-image-amd64.hybrid.iso`)
* See boot failure, you are dropped to an emergency shell
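
The suite-replacement step above can also be scripted instead of done by
hand. The following is only a minimal sketch: the function name
`retarget_suite` is made up for illustration and is not part of
live-build, and it assumes GNU sed (for `sed -i`).

```shell
#!/bin/sh
# retarget_suite OLD NEW FILE...
# Replace every occurrence of the suite name OLD with NEW in each FILE.
# Hypothetical helper for illustration; not part of live-build.
# Assumes GNU sed, since it relies on in-place editing with -i.
retarget_suite() {
    old="$1"
    new="$2"
    shift 2
    for f in "$@"; do
        # Skip anything that is not a regular file instead of failing.
        if [ ! -f "$f" ]; then
            echo "skipping $f (not a regular file)" >&2
            continue
        fi
        sed -i "s/$old/$new/g" "$f"
    done
}
```

It would be called as `retarget_suite bullseye bookworm FILE...`, with
the paths of the "binary" and "config" files that `lb config` generated
passed as the FILE arguments.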

The reason the boot fails can be seen by running
`cat /run/initramfs/rdsosreport.txt | grep LiveOS` - the following
error will be shown:

    mount: /sysroot: special device LiveOS_rootfs does not exist.

One easy workaround to the issue is to add
'rd.live.overlay.overlayfs=1' to the kernel command line. This results
in the image booting, however the mount failure error message still
appears and can be seen by grepping for 'LiveOS' in the output of `sudo
journalctl` once the live image boots.

I built a Trixie image using roughly the same steps as above and it
boots out of the box without problems, no kernel command line
modifications needed. Curiously Bullseye also boots without problems.

After much debugging, I determined that Trixie and Bookworm were
handling the overlay mounting step of the boot process slightly
differently, by booting with "rd.debug" and "rd.break" kernel
parameters and then grepping through the "rdsosreport.txt" file that
was generated when I was dropped into the rescue shell.

* Trixie:
  * dracut loads the overlayfs kernel module.
  * dracut then creates a directory "/run/overlayfs" using the command
    `mkdir -m 0755 /run/overlayfs`.
  * systemd then runs
    `mount LiveOS_rootfs /sysroot -t overlay -o
    ,lowerdir=/run/rootfsbase,upperdir=/run/overlayfs,workdir=/run/ovlwork`.
* Bookworm:
  * dracut loads the overlayfs kernel module.
  * systemd then immediately tries to run
    `mount LiveOS_rootfs /sysroot -t overlay -o
    ,lowerdir=/run/rootfsbase,upperdir=/run/overlayfs,workdir=/run/ovlwork`,
    without making "/run/overlayfs" first.
  * systemd's mount attempt fails with
    "kernel: overlayfs: failed to resolve '/run/overlayfs': -2", i.e.
    griping that "/run/overlayfs" does not exist.
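
The difference between the two sequences comes down to whether the
directories named in the mount options exist at the time of the mount.
As a rough illustration, a check like the one below could be pasted
into the emergency shell to list which of them are missing. The
function name `missing_overlay_dirs` is hypothetical, it uses only
shell built-ins, and it assumes a single path per option (it does not
split a colon-separated "lowerdir" list).

```shell
#!/bin/sh
# missing_overlay_dirs OPTIONS
# Print each lowerdir/upperdir/workdir entry from a comma-separated
# overlay option string whose path is not an existing directory.
# Hypothetical diagnostic helper; not part of dracut or systemd.
missing_overlay_dirs() {
    opts="$1"
    # Peel one comma-separated option off the front per iteration.
    while [ -n "$opts" ]; do
        opt="${opts%%,*}"
        case "$opts" in
            *,*) opts="${opts#*,}" ;;
            *)   opts="" ;;
        esac
        key="${opt%%=*}"
        value="${opt#*=}"
        case "$key" in
            lowerdir|upperdir|workdir)
                # Report the option only if its directory is absent.
                [ -d "$value" ] || printf '%s=%s\n' "$key" "$value"
                ;;
        esac
    done
}
```

For example, `missing_overlay_dirs
",lowerdir=/run/rootfsbase,upperdir=/run/overlayfs,workdir=/run/ovlwork"`
prints one line per missing directory and nothing if all of them exist.
The leading comma from the logged command is tolerated.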

After searching through "rdsosreport.txt" a bit more, I discovered that
the script responsible for running `mkdir -m 0755 /run/overlayfs`
earlier was a component of the 90overlayfs module. This came as a
surprise, since Trixie did not have the "rd.live.overlay.overlayfs=1"
variable set on the kernel command line. This led me to compare the
modules and their behavior under Trixie and Bookworm (again using
"rdsosreport.txt" to provide me line-by-line debug logs of what the
scripts were doing). This revealed three things:

* dracut in Bookworm cannot autodetect when an overlayfs is required and
  requires overlayfs support to be explicitly requested. dracut in
  Trixie can autodetect when overlayfs support is needed.
* dracut's 90overlayfs module in Trixie is split into two stages, one in
  "prepare-overlayfs.sh" and one in "mount-overlayfs.sh". The former
  does some prep work (notably including `mkdir -m 0755 -p
  /run/overlayfs`), the latter is *intended* to mount the overlayfs but
  will skip doing so if something else (i.e. a systemd unit) has
  already done so. The "prepare-overlayfs.sh" script is installed as a
  pre-mount hook in dracut, so it runs well before systemd tries to
  mount the overlayfs. The "mount-overlayfs.sh" script is installed as
  both mount and pre-pivot hooks (though curiously only runs in the
  pre-pivot hook), so it runs after systemd mounts the overlayfs.
* In Bookworm, the 90overlayfs module consists of only a single stage,
  "mount-overlayfs.sh". This one stage handles both preparation and
  mounting of the overlayfs. It is installed as a mount hook only,
  meaning it only runs after systemd attempts (and fails) to mount the
  overlayfs.
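
The split-stage idea described above can be sketched roughly as
follows. This is NOT the actual content of dracut's
"prepare-overlayfs.sh" or "mount-overlayfs.sh"; the function names, the
`RUN_ROOT` and `DRY_RUN` variables, and the creation of the "ovlwork"
directory in the first stage are assumptions made so the sketch can be
tried without root.

```shell
#!/bin/sh
# Hypothetical two-stage sketch; NOT the real dracut 90overlayfs scripts.
# RUN_ROOT and DRY_RUN exist only so this can be tried without root.
RUN_ROOT="${RUN_ROOT:-/run}"

# Stage 1, analogous to a pre-mount hook: create the directories the
# overlay mount needs before anything tries to mount it.
prepare_overlayfs() {
    mkdir -m 0755 -p "$RUN_ROOT/overlayfs" "$RUN_ROOT/ovlwork"
}

# Stage 2, analogous to a later hook: mount only if nothing else has.
mount_overlayfs() {
    target="$1"
    # /proc/mounts lines look like "device mountpoint fstype options ...".
    if grep -qF " $target overlay " /proc/mounts 2>/dev/null; then
        echo "already mounted: $target"
        return 0
    fi
    opts="lowerdir=$RUN_ROOT/rootfsbase,upperdir=$RUN_ROOT/overlayfs,workdir=$RUN_ROOT/ovlwork"
    if [ -n "${DRY_RUN:-}" ]; then
        # Dry run: print the command instead of executing it.
        echo "mount -t overlay LiveOS_rootfs -o $opts $target"
    else
        mount -t overlay LiveOS_rootfs -o "$opts" "$target"
    fi
}
```

With this ordering, whichever component performs the mount first
(systemd or the second stage) finds the upper directory already in
place, and the second stage becomes a no-op when the mount has already
happened.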

This results in the observed behavior. In Bookworm, systemd will always
try and fail to mount the overlayfs itself since it's trying to mount
it before the upper directory "/run/overlayfs" is ever created. When
booting without "rd.live.overlay.overlayfs=1", the overlayfs is never
mounted since Bookworm can't autodetect when the mount is needed, and
so the root filesystem isn't set up and the boot fails. When booting
with "rd.live.overlay.overlayfs=1", systemd still fails to mount the
overlayfs, but this time "mount-overlayfs.sh" succeeds in doing so and
the boot succeeds (albeit with an error message).

Trixie's boot on the other hand just works. It autodetects that an
overlayfs mount is needed. Before systemd tries to mount it, the
"prepare-overlayfs.sh" script runs, meaning that "/run/overlayfs" is
created. systemd then tries to mount the overlayfs and succeeds at
doing so. Then when "mount-overlayfs.sh" attempts to do its job, it sees
the overlayfs is already mounted, and doesn't try to mount it again.

To fix the bug under Bookworm, I believe the only thing necessary is to
patch Bookworm's 90overlayfs module to use the split stage approach
used in Trixie. The boot will still fail if the user boots without
"rd.live.overlay.overlayfs=1" (since the need for an overlayfs isn't
automatically detected), but when booting with
"rd.live.overlay.overlayfs=1", the boot will succeed *and* not throw an
error message. (Figuring out how Trixie's dracut autodetects when an
overlayfs mount is needed would be neat, but would also probably
require a substantial backporting effort that might not be appropriate
for a stable release.)


-- System Information:
Debian Release: trixie/sid
  APT prefers noble-updates
  APT policy: (500, 'noble-updates'), (500, 'noble-security'), (500, 'noble'), (100, 'noble-backports')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.8.0-45-kfocus (SMP w/16 CPU threads; PREEMPT)
Kernel taint flags: TAINT_USER, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages dracut-core depends on:
ii  cpio            2.15+dfsg-1ubuntu2
ii  dracut-install  060+5-1ubuntu3.2
ii  e2fsprogs       1.47.0-2.4~exp1ubuntu4.1
ii  kmod            31+20240202-2ubuntu7
ii  kpartx          0.9.4-5ubuntu8
ii  libc6           2.39-0ubuntu8.3
ii  libkmod2        31+20240202-2ubuntu7
ii  udev            255.4-1ubuntu8.4

Versions of packages dracut-core recommends:
ii  binutils              2.42-4ubuntu2
ii  console-setup         1.226ubuntu1
ii  cryptsetup            2:2.7.0-1ubuntu4.1
ii  dmraid                1.0.0.rc16-12ubuntu2
ii  dmsetup               2:1.02.185-3ubuntu3.1
ii  lvm2                  2.03.16-3ubuntu3.1
pn  mdadm                 <none>
ii  pigz                  2.8-1
ii  pkg-config            1.8.1-2build1
ii  pkgconf [pkg-config]  1.8.1-2build1
ii  systemd               255.4-1ubuntu8.4

dracut-core suggests no packages.
