Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2020-01-12 Thread Joshua Hudson
Hey I got lucky with a google search and found what would be a dupe if
it were ever finished.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=805821

Somebody else managed to track the problem down.

For some reason systemd is only watching for /dev/dm-* rather than the
long dmraid device names. This is the equivalent of watching for
/dev/sda1 instead of the UUID. Quoting Nikita Youshchenko from that bug:

> This way boot is ok (i.e. no delays and swap is enabled). But this is ugly 
> and unreliable.

This does work when there's only one raid array in the OS, which is
most of the time. The two paths are both real device nodes (not
symlinks) with the same major and minor numbers.
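
(For reference, one way to see this on an affected system, using the device
names from this report and assuming both nodes exist:

ls -l /dev/dm-0 /dev/mapper/isw_cfbejbfeib_Volume1
stat -L -c '%n %t:%T' /dev/dm-0 /dev/mapper/isw_cfbejbfeib_Volume1
# both should show the same major:minor, 254:0 per the udevadm output in this bug
)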



Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2020-01-12 Thread Michael Biebl
On 13.01.20 at 01:10, Joshua Hudson wrote:
> I double-checked with init=/bin/sh. All the devices already exist in
> /dev/mapper before /linuxrc shuts down the initrd copy of udev.
> systemd will never see the activation of a udev device because that's
> already done. They're all active.
> 

Well, the udev database from the initrd is transferred to the real root
(via /run/udev), so I don't see how this explains the problem.





Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2020-01-12 Thread Joshua Hudson
I double-checked with init=/bin/sh. All the devices already exist in
/dev/mapper before /linuxrc shuts down the initrd copy of udev.
systemd will never see the activation of a udev device because that's
already done. They're all active.
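
(A minimal version of that check, assuming a boot with init=/bin/sh and that
dmsetup is available in that shell; the volume name is the one from this report:

ls -l /dev/mapper/                   # the long-named nodes are already present
dmsetup ls                           # mappings the kernel already knows about
dmsetup info isw_cfbejbfeib_Volume1  # the "State:" line should already read ACTIVE
)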



Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2020-01-12 Thread Joshua Hudson
That turned out to be really easy to track down. The reason you aren't
getting any notification of the raid array being activated is that it was
already activated in the initrd. It will not be activated a second time.

dmraid has some interesting principles in play here.

1) dmraid has its own idea of persistent, path-independent names of devices.
2) dmraid is incompatible with UUID= or LABEL=; all users of dmraid
must have the raid device names in /etc/fstab and on the kernel
command line (see the sketch after this list). GRUB_DISABLE_LINUX_UUID
*must* be used.
3) Under normal operation, dmraid is included in initrd as soon as it
is installed on the system.
4) If dmraid is installed in initrd, udev will discover the dmraid
devices before / is mounted. If udev discovers the dmraid devices, it
will start the system.
5) Redundant activations are not possible.
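
(To make point 2 concrete, a sketch of the configuration it implies. The volume
names are the ones from this report; which volume holds which filesystem is
assumed purely for illustration.

/etc/fstab:
/dev/mapper/isw_cfbejbfeib_Volume11  /     ext3  errors=remount-ro  0  1
/dev/mapper/isw_cfbejbfeib_Volume12  none  swap  sw                 0  0

/etc/default/grub, followed by running update-grub:
GRUB_DISABLE_LINUX_UUID=true
)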

So the effect of all this is really simple. systemd can just assume
that a dmraid device under /dev/mapper is active if the device node
exists at all, and on any reasonable system it doesn't even need to
poll the device path; it can check *once* and know the success or
failure immediately.
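
(As an illustration of the check-once idea only, not of anything systemd does
today, a shell sketch using the volume name from this report:

name=isw_cfbejbfeib_Volume1
if [ -b "/dev/mapper/$name" ] && dmsetup info "$name" | grep -q 'State:.*ACTIVE'; then
    echo "$name: node exists and mapping is active"
else
    echo "$name: not ready"
fi
)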



Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2020-01-12 Thread Michael Biebl
Control: retitle -1 dmraid volumes are not marked as active, resulting in time out
On 12.01.20 at 20:49, Joshua Hudson wrote:
>>> Dec 31 08:27:23 nova dmraid-activate[554]: ERROR: Cannot retrieve RAID
> set information for isw_cfbejbfeib_Volume1
>> Have you further investigated those error messages?
> 
> I have now investigated and it's clearly spurious.
> 
> udev tries to run the following commands:
> 
> /sbin/dmraid-activate /dev/sda
> /sbin/dmraid-activate /dev/sdb
> /sbin/dmraid-activate sda
> /sbin/dmraid-activate sdb
> 
> The first two are correct and succeed. The second two are not correct
> and fail. The failure results in log spam.
> 
>> Have you tried with mdadm instead?
> 
> mdadm doesn't recognize the raid descriptor.
> 
>> udevadm info /dev/mapper/isw_cfbejbfeib_Volume14
> 
> Output of this command for this device and all similar devices is attached.
> 

The bad news is that dmraid has been basically dead and unmaintained for
quite a while. It's unfortunate that mdadm doesn't support your RAID
controller.
Unfortunately, I'm not sure I can help you further at this point, since
I'm not familiar with dmraid.
It would probably be a good idea to reassign this bug report to dmraid,
but since dmraid upstream is dead and the last Debian upload is three
years old, I'm not sure you'll get any help there.

Maybe you can hack the /sbin/dmraid-activate script and call dmraid with
the -d switch to get a more verbose debug output. Maybe that provides
you with some clues.
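
(Concretely, and only as a sketch: the dmraid call inside the script may look
roughly like the commented line below; the exact invocation in the installed
script will differ, and $Raid_Setname is just a placeholder here.

# existing call in /sbin/dmraid-activate, roughly:
#   dmraid -ay "$Raid_Setname"
# same call with debug output added (-d can be repeated for more detail):
dmraid -d -d -ay "$Raid_Setname" 2>&1 | logger -t dmraid-activate
)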





Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2020-01-12 Thread Joshua Hudson
>> Dec 31 08:27:23 nova dmraid-activate[554]: ERROR: Cannot retrieve RAID
set information for isw_cfbejbfeib_Volume1
> Have you further investigated those error messages?

I have now investigated and it's clearly spurious.

udev tries to run the following commands:

/sbin/dmraid-activate /dev/sda
/sbin/dmraid-activate /dev/sdb
/sbin/dmraid-activate sda
/sbin/dmraid-activate sdb

The first two are correct and succeed. The second two are not correct
and fail. The failure results in log spam.
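
(For anyone chasing where those four invocations come from, the calling udev
rules can be located with something like:

grep -rn dmraid-activate /lib/udev/rules.d/ /etc/udev/rules.d/
)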

> Have you tried with mdadm instead?

mdadm doesn't recognize the raid descriptor.

> udevadm info /dev/mapper/isw_cfbejbfeib_Volume14

Output of this command for this device and all similar devices is attached.
P: /devices/virtual/block/dm-0
N: dm-0
L: 0
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume1
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume1
S: android0
E: DEVPATH=/devices/virtual/block/dm-0
E: DEVNAME=/dev/dm-0
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=0
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1701034
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=isw_cfbejbfeib_Volume1
E: DM_UUID=DMRAID-isw_cfbejbfeib_Volume1
E: DM_SUSPENDED=0
E: ID_PART_TABLE_UUID=373c9bd5
E: ID_PART_TABLE_TYPE=dos
E: DEVLINKS=/dev/disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume1 /dev/disk/by-id/dm-name-isw_cfbejbfeib_Volume1 /dev/android0
E: TAGS=:systemd:

P: /devices/virtual/block/dm-1
N: dm-1
L: 0
S: android1
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume11
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume11
S: disk/by-uuid/d847c628-aa10-4bef-92b6-72ebacc07d7b
E: DEVPATH=/devices/virtual/block/dm-1
E: DEVNAME=/dev/dm-1
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=1
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1707100
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=isw_cfbejbfeib_Volume11
E: DM_UUID=DMRAID-isw_cfbejbfeib_Volume11
E: DM_SUSPENDED=0
E: ID_FS_UUID=d847c628-aa10-4bef-92b6-72ebacc07d7b
E: ID_FS_UUID_ENC=d847c628-aa10-4bef-92b6-72ebacc07d7b
E: ID_FS_VERSION=1.0
E: ID_FS_TYPE=ext3
E: ID_FS_USAGE=filesystem
E: DEVLINKS=/dev/android1 /dev/disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume11 /dev/disk/by-id/dm-name-isw_cfbejbfeib_Volume11 /dev/disk/by-uuid/d847c628-aa10-4bef-92b6-72ebacc07d7b
E: TAGS=:systemd:

P: /devices/virtual/block/dm-2
N: dm-2
L: 0
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume12
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume12
S: disk/by-uuid/baad791f-0ba2-4c87-92a1-0cd0ef4f7301
S: android2
E: DEVPATH=/devices/virtual/block/dm-2
E: DEVNAME=/dev/dm-2
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=2
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1707179
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=isw_cfbejbfeib_Volume12
E: DM_UUID=DMRAID-isw_cfbejbfeib_Volume12
E: DM_SUSPENDED=0
E: ID_FS_UUID=baad791f-0ba2-4c87-92a1-0cd0ef4f7301
E: ID_FS_UUID_ENC=baad791f-0ba2-4c87-92a1-0cd0ef4f7301
E: ID_FS_VERSION=1
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: DEVLINKS=/dev/disk/by-id/dm-name-isw_cfbejbfeib_Volume12 /dev/disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume12 /dev/disk/by-uuid/baad791f-0ba2-4c87-92a1-0cd0ef4f7301 /dev/android2
E: TAGS=:systemd:

P: /devices/virtual/block/dm-3
N: dm-3
L: 0
S: android3
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume13
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume13
S: disk/by-uuid/5744b8b7-dcbf-48c4-868e-fb71fa65d456
E: DEVPATH=/devices/virtual/block/dm-3
E: DEVNAME=/dev/dm-3
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=3
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1707545
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=isw_cfbejbfeib_Volume13
E: DM_UUID=DMRAID-isw_cfbejbfeib_Volume13
E: DM_SUSPENDED=0
E: ID_FS_UUID=5744b8b7-dcbf-48c4-868e-fb71fa65d456
E: ID_FS_UUID_ENC=5744b8b7-dcbf-48c4-868e-fb71fa65d456
E: ID_FS_VERSION=1.0
E: ID_FS_TYPE=ext3
E: ID_FS_USAGE=filesystem
E: DEVLINKS=/dev/android3 /dev/disk/by-id/dm-name-isw_cfbejbfeib_Volume13 /dev/disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume13 /dev/disk/by-uuid/5744b8b7-dcbf-48c4-868e-fb71fa65d456
E: TAGS=:systemd:

P: /devices/virtual/block/dm-4
N: dm-4
L: 0
S: disk/by-id/dm-uuid-DMRAID-isw_cfbejbfeib_Volume14
S: disk/by-uuid/c03a09e6-ff90-4803-bd7c-78e34e99457e
S: disk/by-id/dm-name-isw_cfbejbfeib_Volume14
S: android4
E: DEVPATH=/devices/virtual/block/dm-4
E: DEVNAME=/dev/dm-4
E: DEVTYPE=disk
E: MAJOR=254
E: MINOR=4
E: SUBSYSTEM=block
E: USEC_INITIALIZED=1708143
E: DM_UDEV_DISABLE_DM_RULES_FLAG=1
E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: 

Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2020-01-12 Thread Michael Biebl
Thanks for the log.

Seems there is something at odds with the RAID configuration:

Dec 31 08:27:23 nova dmraid-activate[553]: ERROR: Cannot retrieve RAID
set information for isw_cfbejbfeib_Volume1
Dec 31 08:27:23 nova dmraid-activate[554]: ERROR: Cannot retrieve RAID
set information for isw_cfbejbfeib_Volume1

Have you further investigated those error messages?

This smells like a dmraid issue to me. dmraid needs to properly activate
your volumes and mark them as ready, which apparently doesn't happen.

Have you tried with mdadm instead?
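
(A quick way to check whether mdadm recognizes the on-disk metadata at all,
assuming the member disks are sda and sdb as elsewhere in this report:

mdadm --examine /dev/sda /dev/sdb
)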







Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2020-01-12 Thread Michael Biebl
On 31.12.19 at 01:24, Joshua wrote:
> Package: systemd
> Version: 241-7~deb10u2
> Severity: important
> 
> Dear Maintainer,
> 
>* What led up to the situation?
> 
> apt-get install systemd-sysv && reboot
> 
>* What was the outcome of this action?
> 
> boots to emergency; journald spams the console with errors, can't log in
> as nonroot, can't start X

When you are dropped into the emergency shell, please run

udevadm info /dev/mapper/isw_cfbejbfeib_Volume14

and attach the output to the bug report.

If you have a locked-down root user without a password, you can also
enable the debug-shell on tty9 by adding systemd.debug_shell=true to the
kernel command line.
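
(For example, with a standard GRUB setup this can be added for a single boot by
pressing "e" at the GRUB menu and appending it to the "linux" line, or
persistently:

# /etc/default/grub -- append to whatever options are already there, then run update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.debug_shell=true"
)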





Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2019-12-30 Thread Michael Biebl
Control: tags -1 + moreinfo

Please provide a verbose debug log:
https://freedesktop.org/wiki/Software/systemd/Debugging/#index1h1
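
(In short, that page asks for a boot with debug options roughly like the
following and the resulting journal attached; see the link for the exact,
current steps:

# kernel command line additions for one debug boot:
systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M

# after the failed boot, from the emergency or debug shell:
journalctl -b > systemd-debug.txt
)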






Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2019-12-30 Thread Joshua Hudson
To let the gravity of the bug sink in, this is the pair of scripts that
I jammed in front of systemd to get it to somehow behave itself:

/sbin/init:
#!/bin/sh

echo "INIT: rc init is in startup"

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

/etc/rc.earlyboot || PS1='init# ' PS2='init >' PS4='init +' exec /bin/sh -i

exec /lib/systemd/systemd

/etc/rc.earlyboot:
#!/bin/bash

exec 0<>/dev/tty1 1>&0 2>&1
chvt 1

emergency()
{
echo "The previos command failed. Starting shell to investigate."
PS1='emergency# ' PS2='emergency> ' PS4='emergency+ ' /bin/bash
}

fsck -Asp
ERR=$?

if [ $((ERR & 2)) -ne 0 ]
then
    echo "*** changed mounted filesystem, reboot required ***"
    sleep 3
    reboot -f
fi

if [ $ERR -ne 0 ]
then
    emergency
fi

mount / -o remount,rw || emergency
mount -a || emergency
swapon -a || emergency
exit 0



Bug#947806: systemd doesn't like my raid, times out waiting for online partitions to come online, and can't continue boot

2019-12-30 Thread Joshua
Package: systemd
Version: 241-7~deb10u2
Severity: important

Dear Maintainer,

   * What led up to the situation?

apt-get install systemd-sysv && reboot

   * What was the outcome of this action?

boots to emergency; journald spams the console with errors, can't log in
as nonroot, can't start X

   * What outcome did you expect instead?

system boots all the way to X login

It's not clear exactly what the problem is, but if I exit the emergency shell, 
booting hangs forever.
The system will go back to working correctly immediately on reinstalling 
sysvinit-sysv and rebooting.

-- Package-specific info:

-- System Information:
Debian Release: 10.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-6-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1), LANGUAGE=en_US 
(charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages systemd depends on:
ii  adduser          3.118
ii  libacl1          2.2.53-4
ii  libapparmor1     2.13.2-10
ii  libaudit1        1:2.8.4-3
ii  libblkid1        2.33.1-0.1
ii  libc6            2.28-10
ii  libcap2          1:2.25-2
ii  libcryptsetup12  2:2.1.0-5+deb10u2
ii  libgcrypt20      1.8.4-5
ii  libgnutls30      3.6.7-4
ii  libgpg-error0    1.35-1
ii  libidn11         1.33-2.2
ii  libip4tc0        1.8.2-4
ii  libkmod2         26-1
ii  liblz4-1         1.8.3-1
ii  liblzma5         5.2.4-1
ii  libmount1        2.33.1-0.1
ii  libpam0g         1.3.1-5
ii  libseccomp2      2.3.3-4
ii  libselinux1      2.8-1+b1
ii  libsystemd0      241-7~deb10u2
ii  mount            2.33.1-0.1
ii  util-linux       2.33.1-0.1

Versions of packages systemd recommends:
ii  dbus   1.12.16-1
ii  libpam-systemd-apt-holepunch [libpam-systemd]  1

Versions of packages systemd suggests:
ii  policykit-1        0.105-25
pn  systemd-container  

Versions of packages systemd is related to:
pn  dracut   
ii  initramfs-tools  0.133+deb10u1
ii  udev 241-7~deb10u2

-- no debconf information
[OVERRIDDEN] /etc/tmpfiles.d/screen-cleanup.conf -> /usr/lib/tmpfiles.d/screen-cleanup.conf

--- /usr/lib/tmpfiles.d/screen-cleanup.conf 2017-07-01 05:07:57.0 -0700
+++ /etc/tmpfiles.d/screen-cleanup.conf 2019-02-24 13:04:03.007215314 -0800
@@ -1 +1 @@
-d /run/screen 0777 root utmp
+d /run/screen 1777 root utmp

[MASKED] /etc/systemd/system/bootlogs.service -> /lib/systemd/system/bootlogs.service
[MASKED] /etc/systemd/system/bootmisc.service -> /lib/systemd/system/bootmisc.service
[MASKED] /etc/systemd/system/checkfs.service -> /lib/systemd/system/checkfs.service
[MASKED] /etc/systemd/system/checkroot-bootclean.service -> /lib/systemd/system/checkroot-bootclean.service
[MASKED] /etc/systemd/system/checkroot.service -> /lib/systemd/system/checkroot.service
[MASKED] /etc/systemd/system/halt.service -> /lib/systemd/system/halt.service
[MASKED] /etc/systemd/system/hostname.service -> /lib/systemd/system/hostname.service
[MASKED] /etc/systemd/system/killprocs.service -> /lib/systemd/system/killprocs.service
[MASKED] /etc/systemd/system/mountall-bootclean.service -> /lib/systemd/system/mountall-bootclean.service
[MASKED] /etc/systemd/system/mountall.service -> /lib/systemd/system/mountall.service
[MASKED] /etc/systemd/system/mountdevsubfs.service -> /lib/systemd/system/mountdevsubfs.service
[MASKED] /etc/systemd/system/mountkernfs.service -> /lib/systemd/system/mountkernfs.service
[MASKED] /etc/systemd/system/mountnfs-bootclean.service -> /lib/systemd/system/mountnfs-bootclean.service
[MASKED] /etc/systemd/system/mountnfs.service -> /lib/systemd/system/mountnfs.service
[MASKED] /etc/systemd/system/rc.local.service -> /lib/systemd/system/rc.local.service
[MASKED] /etc/systemd/system/reboot.service -> /lib/systemd/system/reboot.service
[MASKED] /etc/systemd/system/rmnologin.service -> /lib/systemd/system/rmnologin.service
[MASKED] /etc/systemd/system/sendsigs.service -> /lib/systemd/system/sendsigs.service
[MASKED] /etc/systemd/system/single.service -> /lib/systemd/system/single.service
[MASKED] /etc/systemd/system/umountfs.service -> /lib/systemd/system/umountfs.service
[MASKED] /etc/systemd/system/umountnfs.service -> /lib/systemd/system/umountnfs.service
[MASKED] /etc/systemd/system/umountroot.service -> /lib/systemd/system/umountroot.service
[MASKED] /etc/systemd/system/urandom.service -> /lib/systemd/system/urandom.service
[EXTENDED]   /lib/systemd/system/rc-local.service -> /lib/systemd/system/rc-local.service.d/debian.conf
[EXTENDED]   /lib/systemd/system/systemd-resolved.service -> /lib/systemd/system/systemd-resolved.service.d/resolvconf.conf
[EXTENDED]   /lib/systemd/system/systemd-timesyncd.service ->