I still have to test this by simulating a hung kernel, but the panic + no-reboot logic looks solid. The crashdump part is still not fully clear to me: I don't understand why we need to install linux-image-generic (the VM is already running a kernel, isn't it?), nor why we need MODULES=most, to the point of patching /etc/kernel/postinst.d/kdump-tools...
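
FWIW, this is how I read the panic + no-reboot interplay. It is only a rough sketch, based on the kernel's documented panic= semantics (negative: reboot immediately, zero: wait forever, positive: wait that many seconds) and on qemu's -no-reboot option (exit instead of rebooting). panic_outcome is a hypothetical helper of mine, not something in curtin, and it deliberately ignores the kdump path, where a loaded crash kernel would run before any reset.

```python
def panic_outcome(panic_timeout: int, qemu_no_reboot: bool) -> str:
    """Describe what happens to the VM after a guest kernel panic.

    Hypothetical helper for illustration only; assumes no crash kernel
    is loaded, so the panic leads straight to the reboot decision.
    """
    if panic_timeout == 0:
        # panic=0: the kernel waits forever and never reboots on its own
        return "guest hangs; nothing reboots it"
    # panic<0 reboots right away, panic>0 waits that many seconds first
    delay = "immediately" if panic_timeout < 0 else f"after {panic_timeout}s"
    if qemu_no_reboot:
        # -no-reboot turns the guest reset into a qemu exit
        return f"guest resets {delay}; qemu exits instead of rebooting"
    return f"guest resets {delay}; qemu reboots the guest"


# panic=-1 is the value passed via --append in the diff
print(panic_outcome(-1, True))
```

If that reading is right, panic=-1 together with -no-reboot ends the VM as soon as the guest panics, while panic=1 would add a one-second delay first.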

I found a typo (see inline comment).

Diff comments:

> diff --git a/examples/tests/crashdump.cfg b/examples/tests/crashdump.cfg
> new file mode 100644
> index 0000000..e010961
> --- /dev/null
> +++ b/examples/tests/crashdump.cfg
> @@ -0,0 +1,19 @@
> +_install_crashdump:
> + - &install_crashdump |
> +   command -v apt &>/dev/null && {
> +       DEBIAN_FRONTEND=noninteractive apt-get -qy install linux-image-generic
> +       debconf-set-selections <<< "kexec-tools kexec-tools/load_kexec boolean true"

Just a FTR note. The debconf question for this selection says: "Should kexec-tools handle reboots (sysvinit only)?", but it also works with systemd:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954290

> +       debconf-set-selections <<< "kdump-tools kdump-tools/use_kdname boolean true"
> +       DEBIAN_FRONTEND=noninteractive apt-get -qy install linux-crashdump;
> +       mkdir -p /var/lib/kdump
> +       # fix up crashdump post-inst to just put all of the modules in
> +       sed -i -e 's,MODULES=dep,MODULES=most,' /etc/kernel/postinst.d/kdump-tools

I have no doubt there is a good reason for wanting 'MODULES=most' instead of 'dep', but I can't see exactly why...

> +       kdump-config load
> +       kdump-config show
> +   }
> +   exit 0
> +
> +
> +early_commands:
> +  # run before other install commands
> +  0000_aaaa_install_crashdump: ['bash', '-c', *install_crashdump]
> diff --git a/tests/vmtests/__init__.py b/tests/vmtests/__init__.py
> index 222adcc..e102b6d 100644
> --- a/tests/vmtests/__init__.py
> +++ b/tests/vmtests/__init__.py
> @@ -967,6 +968,25 @@ class VMBaseClass(TestCase):
>                   for service in ["systemd.mask=snapd.seeded.service",
>                                   "systemd.mask=snapd.service"]])
>
> +        # We set guest kernel panic=1 to trigger immediate rebooot, combined

typo (rebooot)

> +        # with the (xkvm) -no-reboot qemu parameter should prevent vmtests from
> +        # wasting time in a soft-lockup loop. Add the params after the '---'
> +        # separator to extend the parameters to the target system as well.
> +        cmd.extend(["--no-reboot", "--append=panic=-1",
> +                    "--append=softlockup_panic=1",
> +                    "--append=hung_task_panic=1",
> +                    "--append=nmi_watchdog=panic,1"])
> +
> +        # configure guest with crashdump to capture kernel failures for debug
> +        if cls.crashdump:
> +            # we need to install a kernel and modules so bump the memory by 2g
> +            # for the ephemeral environment to hold it all
> +            cls.mem = int(cls.mem) + 2048
> +            logger.info(
> +                'Enabling linux-crashdump during install, mem += 2048 = %s',
> +                cls.mem)
> +            cmd.extend(["--append=crashkernel=384M-5000M:192M"])
> +
>          # getting resolvconf configured is only fixed in bionic
>          # the iscsi_auto handles resolvconf setup via call to
>          # configure_networking in initramfs

--
https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/383805
Your team curtin developers is requested to review the proposed merge of ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master.

--
Mailing list: https://launchpad.net/~curtin-dev
Post to     : curtin-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~curtin-dev
More help   : https://help.launchpad.net/ListHelp