Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
Hey, I got lucky with a Google search and found what would be a dupe if it were ever finished:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=805821

Somebody else managed to track the problem down. For some reason systemd is only watching for /dev/dm-* rather than the long dmraid device names. This is the equivalent of watching for /dev/sda1 instead of the UUID.

Nikita Youshchenko:
> This way boot is ok (i.e. no delays and swap is enabled). But this is ugly
> and unreliable.

This does work when there's only one raid array in the OS, which is most of the time.

These are two real devices (no symlinks) with the same major and minor numbers.
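For anyone who wants to verify the "same major and minor numbers, no symlinks" observation on their own machine, here is a minimal sketch. It assumes GNU stat (for the `%t`/`%T` format specifiers); `dev_numbers` and `same_device` are hypothetical helper names, not part of any package.

```shell
#!/bin/sh
# Print "major:minor" (in hex, as GNU stat reports it) for a device node.
# Fails if the path is not a block or character device.
dev_numbers() {
    if [ -b "$1" ] || [ -c "$1" ]; then
        stat -L -c '%t:%T' "$1"
    else
        return 1
    fi
}

# Succeed (0) if both paths are device nodes with identical numbers,
# fail with 1 if they differ, and with 2 if either is not a device node.
same_device() {
    a=$(dev_numbers "$1") || return 2
    b=$(dev_numbers "$2") || return 2
    [ "$a" = "$b" ]
}

# Example with the names from this report (adjust for your system):
#   same_device /dev/dm-4 /dev/mapper/isw_cfbejbfeib_Volume14 && echo "same device"
#   [ -L /dev/mapper/isw_cfbejbfeib_Volume14 ] || echo "not a symlink"
```

The `[ -L ... ]` test in the usage comment is what distinguishes a real node from a symlink; `same_device` on its own follows symlinks and cannot tell them apart.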
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
On 13.01.20 at 01:10, Joshua Hudson wrote:
> I double-checked with init=/bin/sh. All the devices already exist in
> /dev/mapper before /linuxrc shuts down the initrd copy of udev.
> systemd will never see the activation of a udev device because that's
> already done. They're all active.

Well, the udev database from the initrd is transferred to the real root (via /run/udev), so I don't see how this explains the problem.
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
I double-checked with init=/bin/sh. All the devices already exist in /dev/mapper before /linuxrc shuts down the initrd copy of udev. systemd will never see the activation of a udev device because that's already done. They're all active.
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
That turned out to be really easy to track down. The reason you aren't getting any notification of the raid array being activated is that it was already activated in the initrd. It will not be activated a second time.

dmraid has some interesting principles in play here.

1) dmraid has its own idea of persistent, path-independent names of devices.
2) dmraid is incompatible with UUID= or LABEL=; all users of dmraid must have the raid device names in /etc/fstab and on the kernel command line. GRUB_DISABLE_LINUX_UUID *must* be used.
3) Under normal operation, dmraid is included in initrd as soon as it is installed on the system.
4) If dmraid is installed in initrd, udev will discover the dmraid devices before / is mounted. If udev discovers the dmraid devices, it will start the system.
5) Redundant activations are not possible.

So the effect of all this is really simple. systemd can just assume that a dmraid device under /dev/mapper is active if the device node exists at all, and on any reasonable system it doesn't even need to poll the device path; it can check *once* and know the success or failure immediately.
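To illustrate what "check once" would look like in practice, here is a small sketch. It only shows the reporter's proposal; it is not how systemd tracks devices (systemd relies on udev events rather than testing paths). `dm_node_active` and `check_once` are hypothetical names.

```shell
#!/bin/sh
# A node counts as active if it exists as a block device right now.
# There is no polling and no timeout: one test, immediate answer.
dm_node_active() {
    [ -b "$1" ]
}

# Test each given path exactly once and report the result.
# Returns 0 if every path is an active block device, 1 otherwise.
check_once() {
    missing=0
    for dev in "$@"; do
        if dm_node_active "$dev"; then
            echo "active: $dev"
        else
            echo "missing: $dev"
            missing=1
        fi
    done
    return "$missing"
}

# Example with a name from this report:
#   check_once /dev/mapper/isw_cfbejbfeib_Volume14
```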
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
Control: retitle -1 dmraid volumes are not marked as active, resulting in time out

On 12.01.20 at 20:49, Joshua Hudson wrote:
>>> Dec 31 08:27:23 nova dmraid-activate[554]: ERROR: Cannot retrieve RAID set information for isw_cfbejbfeib_Volume1
>> Have you further investigated those error messages?
>
> I have now investigated and it's clearly spurious.
>
> udev tries to run the following commands:
>
> /sbin/dmraid-activate /dev/sda
> /sbin/dmraid-activate /dev/sdb
> /sbin/dmraid-activate sda
> /sbin/dmraid-activate sdb
>
> The first two are correct and succeed. The second two are not correct
> and fail. The failure results in log spam.
>
>> Have you tried with mdadm instead?
>
> mdadm doesn't recognize the raid descriptor.
>
>> udevadm info /dev/mapper/isw_cfbejbfeib_Volume14
>
> output of this command for this device and all similar devices attached.

The bad news is that dmraid is basically dead and has been unmaintained for quite a while. It's unfortunate that mdadm doesn't support your RAID controller.

Unfortunately, I'm not sure I can help you further at this point, since I'm not familiar with dmraid. It would probably be a good idea to reassign this bug report to dmraid. But since dmraid upstream is dead and the last Debian upload is 3 years old, I'm not sure you'll get any help there.

Maybe you can hack the /sbin/dmraid-activate script and call dmraid with the -d switch to get more verbose debug output. Maybe that provides you with some clues.
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
>> Dec 31 08:27:23 nova dmraid-activate[554]: ERROR: Cannot retrieve RAID set information for isw_cfbejbfeib_Volume1
> Have you further investigated those error messages?

I have now investigated and it's clearly spurious.

udev tries to run the following commands:

/sbin/dmraid-activate /dev/sda
/sbin/dmraid-activate /dev/sdb
/sbin/dmraid-activate sda
/sbin/dmraid-activate sdb

The first two are correct and succeed. The second two are not correct and fail. The failure results in log spam.

> Have you tried with mdadm instead?

mdadm doesn't recognize the raid descriptor.

> udevadm info /dev/mapper/isw_cfbejbfeib_Volume14

Output of this command for this device and all similar devices is attached.

P: /devices/virtual/block/dm-0
N: dm-0
L: 0
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume1
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume1
S: android0
E: DEVPATH=/devices/virtual/block/dm-0
E: DEVNAME=/dev/dm-0
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=0
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1701034
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=isw_cfbejbfeib_Volume1
E: DM_UUID=DMRAID-isw_cfbejbfeib_Volume1
E: DM_SUSPENDED=0
E: ID_PART_TABLE_UUID=373c9bd5
E: ID_PART_TABLE_TYPE=dos
E: DEVLINKS=/dev/disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume1 /dev/disk/by-id/dm-name-isw_cfbejbfeib_Volume1 /dev/android0
E: TAGS=:systemd:

P: /devices/virtual/block/dm-1
N: dm-1
L: 0
S: android1
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume11
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume11
S: disk/by-uuid/d847c628-aa10-4bef-92b6-72ebacc07d7b
E: DEVPATH=/devices/virtual/block/dm-1
E: DEVNAME=/dev/dm-1
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=1
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1707100
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=isw_cfbejbfeib_Volume11
E: DM_UUID=DMRAID-isw_cfbejbfeib_Volume11
E: DM_SUSPENDED=0
E: ID_FS_UUID=d847c628-aa10-4bef-92b6-72ebacc07d7b
E: ID_FS_UUID_ENC=d847c628-aa10-4bef-92b6-72ebacc07d7b
E: ID_FS_VERSION=1.0
E: ID_FS_TYPE=ext3
E: ID_FS_USAGE=filesystem
E: DEVLINKS=/dev/android1 /dev/disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume11 /dev/disk/by-id/dm-name-isw_cfbejbfeib_Volume11 /dev/disk/by-uuid/d847c628-aa10-4bef-92b6-72ebacc07d7b
E: TAGS=:systemd:

P: /devices/virtual/block/dm-2
N: dm-2
L: 0
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume12
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume12
S: disk/by-uuid/baad791f-0ba2-4c87-92a1-0cd0ef4f7301
S: android2
E: DEVPATH=/devices/virtual/block/dm-2
E: DEVNAME=/dev/dm-2
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=2
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1707179
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=isw_cfbejbfeib_Volume12
E: DM_UUID=DMRAID-isw_cfbejbfeib_Volume12
E: DM_SUSPENDED=0
E: ID_FS_UUID=baad791f-0ba2-4c87-92a1-0cd0ef4f7301
E: ID_FS_UUID_ENC=baad791f-0ba2-4c87-92a1-0cd0ef4f7301
E: ID_FS_VERSION=1
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: DEVLINKS=/dev/disk/by-id/dm-name-isw_cfbejbfeib_Volume12 /dev/disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume12 /dev/disk/by-uuid/baad791f-0ba2-4c87-92a1-0cd0ef4f7301 /dev/android2
E: TAGS=:systemd:

P: /devices/virtual/block/dm-3
N: dm-3
L: 0
S: android3
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume13
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume13
S: disk/by-uuid/5744b8b7-dcbf-48c4-868e-fb71fa65d456
E: DEVPATH=/devices/virtual/block/dm-3
E: DEVNAME=/dev/dm-3
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=3
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1707545
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=isw_cfbejbfeib_Volume13
E: DM_UUID=DMRAID-isw_cfbejbfeib_Volume13
E: DM_SUSPENDED=0
E: ID_FS_UUID=5744b8b7-dcbf-48c4-868e-fb71fa65d456
E: ID_FS_UUID_ENC=5744b8b7-dcbf-48c4-868e-fb71fa65d456
E: ID_FS_VERSION=1.0
E: ID_FS_TYPE=ext3
E: ID_FS_USAGE=filesystem
E: DEVLINKS=/dev/android3 /dev/disk/by-id/dm-name-isw_cfbejbfeib_Volume13 /dev/disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume13 /dev/disk/by-uuid/5744b8b7-dcbf-48c4-868e-fb71fa65d456
E: TAGS=:systemd:

P: /devices/virtual/block/dm-4
N: dm-4
L: 0
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume14
S: disk/by-uuid/c03a09e6-ff90-4803-bd7c-78e34e99457e
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume14
S: android4
E: DEVPATH=/devices/virtual/block/dm-4
E: DEVNAME=/dev/dm-4
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=4
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1708143
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E:
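When comparing several devices, it can help to pull a single property out of the `udevadm info` output above rather than read the whole dump. The following is a rough sketch; `udev_prop` is a hypothetical helper that only parses the `E: KEY=value` lines of the text format shown in this message.

```shell
#!/bin/sh
# Read `udevadm info` text on stdin and print the value of every
# "E: KEY=value" line whose KEY equals the first argument.
udev_prop() {
    awk -v key="$1" '
        $1 == "E:" {
            line = $0
            sub(/^[ \t]*E:[ \t]+/, "", line)   # strip the "E: " prefix
            eq = index(line, "=")
            if (eq > 0 && substr(line, 1, eq - 1) == key)
                print substr(line, eq + 1)      # value may contain spaces
        }'
}

# Example with a device from this report:
#   udevadm info /dev/mapper/isw_cfbejbfeib_Volume14 | udev_prop DM_NAME
#   udevadm info /dev/dm-2 | udev_prop ID_FS_TYPE
```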
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
Thanks for the log. Seems there is something at odds with the RAID configuration:

Dec 31 08:27:23 nova dmraid-activate[553]: ERROR: Cannot retrieve RAID set information for isw_cfbejbfeib_Volume1
Dec 31 08:27:23 nova dmraid-activate[554]: ERROR: Cannot retrieve RAID set information for isw_cfbejbfeib_Volume1

Have you further investigated those error messages? This smells like a dmraid issue to me. dmraid needs to properly activate your volumes and mark them as ready, which apparently doesn't happen.

Have you tried with mdadm instead?
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
On 31.12.19 at 01:24, Joshua wrote:
> Package: systemd
> Version: 241-7~deb10u2
> Severity: important
>
> Dear Maintainer,
>
> * What led up to the situation?
>
> apt-get install systemd-sysv && reboot
>
> * What was the outcome of this action?
>
> boots to emergency; journald spams the console with errors, can't login as
> nonroot, can't start X

When you are dropped into the emergency shell, please run

udevadm info /dev/mapper/isw_cfbejbfeib_Volume14

and attach the output to the bug report.

If you have a locked-down root user without a password, you can also enable the debug shell on tty9 by adding systemd.debug_shell=true to the kernel command line.
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
Control: tags -1 + moreinfo

Please provide a verbose debug log:
https://freedesktop.org/wiki/Software/systemd/Debugging/#index1h1
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
To let the gravity of the bug sink in, this is the pair of scripts that I jammed in front of systemd that got it to somehow behave itself:

/sbin/init:

#!/bin/sh
echo "INIT: rc init is in startup"
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# If early boot fails, replace init with an interactive shell.
/etc/rc.earlyboot || PS1='init# ' PS2='init >' PS4='init +' exec /bin/sh -i
exec /lib/systemd/systemd

/etc/rc.earlyboot:

#!/bin/bash
# Attach stdin, stdout and stderr to the first virtual console.
exec 0<>/dev/tty1 1>&0 2>&1
# chvt takes a VT number, not a device path.
chvt 1

emergency()
{
    echo "The previous command failed. Starting shell to investigate."
    PS1='emergency# ' PS2='emergency> ' PS4='emergency+ ' /bin/bash
}

fsck -Asp
ERR=$?
# Bit 2 of the fsck exit status means the system should be rebooted.
if [ $((ERR & 2)) -ne 0 ]
then
    echo "*** changed mounted filesystem, reboot required ***"
    sleep 3
    reboot -f
fi
if [ $ERR -ne 0 ]
then
    emergency
fi
mount / -o remount,rw || emergency
mount -a || emergency
swapon -a || emergency
exit 0
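The `$((ERR & 2))` test in rc.earlyboot works because the fsck exit status is a bitmask. As a sketch of how the individual bits can be decoded (meanings as listed in fsck(8); `fsck_explain` is a hypothetical helper name, not part of the scripts above):

```shell
#!/bin/sh
# Print one line per condition encoded in an fsck exit status.
fsck_explain() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in "1:errors corrected" \
                 "2:system should be rebooted" \
                 "4:errors left uncorrected" \
                 "8:operational error" \
                 "16:usage or syntax error" \
                 "32:canceled by user request" \
                 "128:shared-library error"; do
        bit=${entry%%:*}     # number before the first colon
        text=${entry#*:}     # description after it
        if [ $((code & bit)) -ne 0 ]; then
            echo "$text"
        fi
    done
}

# Example:
#   fsck -Asp; fsck_explain $?
```

A status of 3, for instance, has bits 1 and 2 set, which is the case where the script above prints its reboot message.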
Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot
Package: systemd
Version: 241-7~deb10u2
Severity: important

Dear Maintainer,

* What led up to the situation?

apt-get install systemd-sysv && reboot

* What was the outcome of this action?

boots to emergency; journald spams the console with errors, can't login as
nonroot, can't start X

* What outcome did you expect instead?

system boots all the way to X login

It's not clear exactly what the problem is, but if I exit the emergency shell, booting hangs forever.

The system will go back to working correctly immediately on reinstalling sysvinit-sysv and rebooting.

-- Package-specific info:

-- System Information:
Debian Release: 10.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-6-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1), LANGUAGE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages systemd depends on:
ii  adduser          3.118
ii  libacl1          2.2.53-4
ii  libapparmor1     2.13.2-10
ii  libaudit1        1:2.8.4-3
ii  libblkid1        2.33.1-0.1
ii  libc6            2.28-10
ii  libcap2          1:2.25-2
ii  libcryptsetup12  2:2.1.0-5+deb10u2
ii  libgcrypt20      1.8.4-5
ii  libgnutls30      3.6.7-4
ii  libgpg-error0    1.35-1
ii  libidn11         1.33-2.2
ii  libip4tc0        1.8.2-4
ii  libkmod2         26-1
ii  liblz4-1         1.8.3-1
ii  liblzma5         5.2.4-1
ii  libmount1        2.33.1-0.1
ii  libpam0g         1.3.1-5
ii  libseccomp2      2.3.3-4
ii  libselinux1      2.8-1+b1
ii  libsystemd0      241-7~deb10u2
ii  mount            2.33.1-0.1
ii  util-linux       2.33.1-0.1

Versions of packages systemd recommends:
ii  dbus                                          1.12.16-1
ii  libpam-systemd-apt-holepunch [libpam-systemd] 1

Versions of packages systemd suggests:
ii  policykit-1        0.105-25
pn  systemd-container

Versions of packages systemd is related to:
pn  dracut
ii  initramfs-tools  0.133+deb10u1
ii  udev             241-7~deb10u2

-- no debconf information

[OVERRIDDEN] /etc/tmpfiles.d/screen-cleanup.conf -> /usr/lib/tmpfiles.d/screen-cleanup.conf

--- /usr/lib/tmpfiles.d/screen-cleanup.conf 2017-07-01 05:07:57.0 -0700
+++ /etc/tmpfiles.d/screen-cleanup.conf 2019-02-24 13:04:03.007215314 -0800
@@ -1 +1 @@
-d /run/screen 0777 root utmp
+d /run/screen 1777 root utmp

[MASKED] /etc/systemd/system/bootlogs.service -> /lib/systemd/system/bootlogs.service
[MASKED] /etc/systemd/system/bootmisc.service -> /lib/systemd/system/bootmisc.service
[MASKED] /etc/systemd/system/checkfs.service -> /lib/systemd/system/checkfs.service
[MASKED] /etc/systemd/system/checkroot-bootclean.service -> /lib/systemd/system/checkroot-bootclean.service
[MASKED] /etc/systemd/system/checkroot.service -> /lib/systemd/system/checkroot.service
[MASKED] /etc/systemd/system/halt.service -> /lib/systemd/system/halt.service
[MASKED] /etc/systemd/system/hostname.service -> /lib/systemd/system/hostname.service
[MASKED] /etc/systemd/system/killprocs.service -> /lib/systemd/system/killprocs.service
[MASKED] /etc/systemd/system/mountall-bootclean.service -> /lib/systemd/system/mountall-bootclean.service
[MASKED] /etc/systemd/system/mountall.service -> /lib/systemd/system/mountall.service
[MASKED] /etc/systemd/system/mountdevsubfs.service -> /lib/systemd/system/mountdevsubfs.service
[MASKED] /etc/systemd/system/mountkernfs.service -> /lib/systemd/system/mountkernfs.service
[MASKED] /etc/systemd/system/mountnfs-bootclean.service -> /lib/systemd/system/mountnfs-bootclean.service
[MASKED] /etc/systemd/system/mountnfs.service -> /lib/systemd/system/mountnfs.service
[MASKED] /etc/systemd/system/rc.local.service -> /lib/systemd/system/rc.local.service
[MASKED] /etc/systemd/system/reboot.service -> /lib/systemd/system/reboot.service
[MASKED] /etc/systemd/system/rmnologin.service -> /lib/systemd/system/rmnologin.service
[MASKED] /etc/systemd/system/sendsigs.service -> /lib/systemd/system/sendsigs.service
[MASKED] /etc/systemd/system/single.service -> /lib/systemd/system/single.service
[MASKED] /etc/systemd/system/umountfs.service -> /lib/systemd/system/umountfs.service
[MASKED] /etc/systemd/system/umountnfs.service -> /lib/systemd/system/umountnfs.service
[MASKED] /etc/systemd/system/umountroot.service -> /lib/systemd/system/umountroot.service
[MASKED] /etc/systemd/system/urandom.service -> /lib/systemd/system/urandom.service
[EXTENDED] /lib/systemd/system/rc-local.service -> /lib/systemd/system/rc-local.service.d/debian.conf
[EXTENDED] /lib/systemd/system/systemd-resolved.service -> /lib/systemd/system/systemd-resolved.service.d/resolvconf.conf
[EXTENDED] /lib/systemd/system/systemd-timesyncd.service ->