I still have to test this by simulating a hung kernel, but the panic + 
no-reboot logic looks nice and solid. The crashdump part is still not fully 
clear to me: I don't understand why we need to install linux-image-generic 
(the VM is already running a kernel, isn't it?), or why we need MODULES=most 
badly enough to patch /etc/kernel/postinst.d/kdump-tools.
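
For the testing part, one option I know of is the magic SysRq 'c' key, which 
crashes the kernel on purpose (so it exercises the crash path rather than a 
real hang). A minimal sketch, assuming SysRq is enabled in the guest and the 
code runs as root there; the helper name and its path parameter are 
hypothetical, added only so it can be dry-run against a scratch file:

```python
import os
import tempfile

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(key, path=SYSRQ_TRIGGER):
    """Write a single SysRq key to the trigger file.

    'c' asks the kernel to crash immediately, which should hand over to
    the kdump kernel when one is loaded.
    """
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(path, "w") as trigger:
        trigger.write(key)


# Dry run against a scratch file. Pointing this at the real trigger file
# inside a guest would crash that guest.
with tempfile.TemporaryDirectory() as tmpdir:
    scratch = os.path.join(tmpdir, "sysrq-trigger")
    trigger_sysrq("c", path=scratch)
    with open(scratch) as fh:
        dry_run_result = fh.read()
```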

I found a typo (see inline comment).

Diff comments:

> diff --git a/examples/tests/crashdump.cfg b/examples/tests/crashdump.cfg
> new file mode 100644
> index 0000000..e010961
> --- /dev/null
> +++ b/examples/tests/crashdump.cfg
> @@ -0,0 +1,19 @@
> +_install_crashdump:
> + - &install_crashdump |
> +   command -v apt &>/dev/null && {
> +       DEBIAN_FRONTEND=noninteractive apt-get -qy install linux-image-generic
> +       debconf-set-selections <<< "kexec-tools  kexec-tools/load_kexec  boolean true"

Just a note for the record: the debconf question for this selection reads 
"Should kexec-tools handle reboots (sysvinit only)?", but the setting also 
works with systemd:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954290

> +       debconf-set-selections <<< "kdump-tools  kdump-tools/use_kdname  boolean true"
> +       DEBIAN_FRONTEND=noninteractive apt-get -qy install linux-crashdump;
> +       mkdir -p /var/lib/kdump
> +       # fix up crashdump post-inst to just put all of the modules in
> +       sed -i -e 's,MODULES=dep,MODULES=most,' /etc/kernel/postinst.d/kdump-tools

I have no doubt there is a good reason for wanting 'MODULES=most' instead of 
'dep', but I can't see exactly why...
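
To check that I read the sed correctly: it is a literal substitution on the 
hook script, so the first MODULES=dep on each line becomes MODULES=most. Per 
initramfs.conf(5), 'dep' tries to guess which modules to load, while 'most' 
adds most filesystem and all hard drive drivers. A minimal Python sketch of 
the same rewrite; the sample hook line is my assumption about what the script 
contains, not copied from the package:

```python
def use_most_modules(hook_text):
    """Mirror sed 's,MODULES=dep,MODULES=most,' line by line.

    sed without the 'g' flag replaces only the first match on each line,
    hence the count of 1.
    """
    return "".join(
        line.replace("MODULES=dep", "MODULES=most", 1)
        for line in hook_text.splitlines(keepends=True)
    )


# Hypothetical excerpt of /etc/kernel/postinst.d/kdump-tools.
sample_hook = (
    "#!/bin/sh\n"
    "sed -e 's/MODULES=.*/MODULES=dep/' /etc/initramfs-tools/initramfs.conf\n"
)
patched_hook = use_most_modules(sample_hook)
```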

> +       kdump-config load
> +       kdump-config show
> +    }
> +    exit 0
> +
> +
> +early_commands:
> +  # run before other install commands
> +  0000_aaaa_install_crashdump: ['bash', '-c', *install_crashdump]
> diff --git a/tests/vmtests/__init__.py b/tests/vmtests/__init__.py
> index 222adcc..e102b6d 100644
> --- a/tests/vmtests/__init__.py
> +++ b/tests/vmtests/__init__.py
> @@ -967,6 +968,25 @@ class VMBaseClass(TestCase):
>                      for service in ["systemd.mask=snapd.seeded.service",
>                                      "systemd.mask=snapd.service"]])
>  
> +        # We set guest kernel panic=1 to trigger immediate rebooot, combined

Typo: "rebooot" should be "reboot".

> +        # with the (xkvm) -no-reboot qemu parameter should prevent vmtests from
> +        # wasting time in a soft-lockup loop. Add the params after the '---'
> +        # separator to extend the parameters to the target system as well.
> +        cmd.extend(["--no-reboot", "--append=panic=-1",
> +                    "--append=softlockup_panic=1",
> +                    "--append=hung_task_panic=1",
> +                    "--append=nmi_watchdog=panic,1"])
> +
> +        # configure guest with crashdump to capture kernel failures for debug
> +        if cls.crashdump:
> +            # we need to install a kernel and modules so bump the memory by 2g
> +            # for the ephemeral environment to hold it all
> +            cls.mem = int(cls.mem) + 2048
> +            logger.info(
> +                'Enabling linux-crashdump during install, mem += 2048 = %s',
> +                cls.mem)
> +            cmd.extend(["--append=crashkernel=384M-5000M:192M"])
> +
>          # getting resolvconf configured is only fixed in bionic
>          # the iscsi_auto handles resolvconf setup via call to
>          # configure_networking in initramfs

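Side note on crashkernel=384M-5000M:192M: as I understand the kernel's kdump 
documentation, a range:size entry reserves the given size only when system RAM 
falls inside the range (start inclusive, end exclusive), so a guest with 5000M 
or more would get no reservation from this spec. A simplified sketch of that 
rule; the function names and the sample RAM sizes are hypothetical, the 
optional @offset suffix is not handled, and any rounding the kernel applies to 
detected RAM is ignored:

```python
import re

_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
MIB = 1 << 20


def parse_size(text):
    """Convert a size such as '384M' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text)
    if match is None:
        raise ValueError("bad size: %r" % text)
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit, 1)


def crashkernel_reservation(spec, ram_bytes):
    """Return the bytes reserved for a 'start-[end]:size[,...]' spec.

    A range matches when start <= RAM < end; an empty end means no upper
    bound. Returns 0 when no range matches.
    """
    for entry in spec.split(","):
        ram_range, size = entry.split(":")
        start, _, end = ram_range.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else float("inf")
        if low <= ram_bytes < high:
            return parse_size(size)
    return 0


spec = "384M-5000M:192M"
for ram_mib in (256, 1024, 1024 + 2048, 8192):
    reserved = crashkernel_reservation(spec, ram_mib * MIB)
    print("%5d MiB RAM -> %d MiB reserved" % (ram_mib, reserved // MIB))
```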

-- 
https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/383805
Your team curtin developers is requested to review the proposed merge of 
~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master.
