Hi Gerald, I wasn't aware that you had already started working on/with upstream stable - that's great!
I had a look at the backport at https://lore.kernel.org/stable/patch-1.thread-41918b.git-41918be365c0.your-ad-here.call-01600439945-ext-8991@work.hours/ and it applied cleanly on the current focal master-next. So I've built a patched focal kernel - in addition to the above groovy kernel - and I'm sharing it here as well for any further testing:
https://people.canonical.com/~fheimes/lp1896726/

I just sent a patch request for groovy, based on a cherry-pick from upstream:
https://lists.ubuntu.com/archives/kernel-team/2020-September/thread.html#113731
and I'm hence changing the status for groovy to 'In Progress'. The patch must land in groovy too, to avoid any potential regression once it has landed in focal but is not yet in groovy and someone upgrades from focal to groovy...

I'll keep an eye on the upstream stable release process and try to keep this bug in sync and updated, based on the upstream stable bug that will eventually be opened by the kernel team...

I'll add the summary that I've added to the patch request to the bug description here for further reference (an illustrative sketch of the helper-function pattern follows after the description diff below).

** Description changed:

+ Justification:
+ ==============
+
+ Secure KVM guest (using secure execution on Ubuntu Server 20.04 for s390x)
+ crashes happen from time to time during boot.
+ Such crashed guests ("reason=crashed" in the libvirt log) end up in Shutoff state instead of Crashed state (<on_crash>preserve is set).
+ The crash points to a kernel memory management problem, addressed by the following patch/fix.
+ The modifications touch common memory management code,
+ but they will have no effect on architectures other than s390x.
+ This is ensured by the fact that only s390 provides / implements the new helper functions.
+ And for s390x, this is actually a critical (and carefully tested) fix for a (previous) regression, so it can hardly get any more regressive.
+ The patch landed upstream in linux-next, is discussed in depth
+ at LKML https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
+ and here https://lore.kernel.org/linux-arch/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours/
+ and will soon land via the regular upstream stable release update for kernel 5.4 in focal, too.
+ The process already started:
+ https://lore.kernel.org/stable/patch-1.thread-41918b.git-41918be365c0.your-ad-here.call-01600439945-ext-8991@work.hours/
+
+ Hence this cherry-pick from the upstream patch should be added to groovy
+ to avoid any potential regression in case the patch lands in focal via the upstream release update process,
+ but is not in groovy and someone upgrades from focal to groovy.
+
+ __________
+
  Secure Execution with Ubuntu 20.04: secure guests crash during boot from time to time; a crashed guest went into Shutoff state instead of Crashed state (<on_crash>preserve is set), so I can't get a dump.

- libvirt log file: 
+ libvirt log file:
  2020-04-21T16:35:39.382999Z qemu-system-s390x: Guest says index 19608 is available
  2020-04-21 16:35:44.831+0000: shutting down, reason=crashed
-
+
  ---uname output---
  Linux ubu204uclg1002 5.4.0-25-generic #29-Ubuntu SMP Fri Apr 17 15:05:32 UTC 2020 s390x s390x s390x GNU/Linux
-
- Machine Type = z15 8561
-
+
+ Machine Type = z15 8561
+
  ---Debugger---
  A debugger is not configured
-
+
  ---Steps to Reproduce---
- I have a setup with 72 KVM guests which I can start in secure or non-secure mode. Starting all of them in secure mode back to back results in a number of guests (4..8) in Shutoff state and reason=crashed in the libvirt log. I can manually start the guest again.... no problem.
  Different guests are failing.
+ I have a setup with 72 KVM guests which I can start in secure or non-secure mode. Starting all of them in secure mode back to back results in a number of guests (4..8) in Shutoff state and reason=crashed in the libvirt log. I can manually start the guest again.... no problem. Different guests are failing.
  Host and guests are on latest Ubuntu 20.04.

  The supposed fix (kernel memory management) has landed in Andrew Morton's mm tree:
  https://lore.kernel.org/mm-commits/20200916003608.ib4ln%25a...@linux-foundation.org/T/#u

  Please note: while this was found with secure execution, the bug is actually present for non-KVM workloads as well.

  The complete patch is this:
  https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a338e69ba37286c0fc300ab7e6fa0227e6ca68b1
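For reference, and to illustrate why the common-code change described in the justification above is expected to be a no-op everywhere except s390x: the fix adds new helper functions with a generic default, and only s390 overrides them. The following stand-alone C program is purely an illustrative sketch of that pattern (all identifiers are hypothetical; it is not the actual upstream patch). The generic default keeps working on a local copy of a page-table entry, exactly as before, while an architecture with run-time ("dynamic") page-table folding can override the helper so that the lockless walker keeps using the pointer into the real page table.

/*
 * sketch.c - illustrative only, NOT the upstream patch; all names are made up.
 *
 * Models the pattern: common code calls a new "lockless" helper that has a
 * generic default (same behaviour as before, operating on a local copy of
 * the entry), while a single architecture can override it to keep using
 * the pointer into the real page table.
 */
#include <stdio.h>

typedef unsigned long entry_t;          /* stand-in for a page-table entry */

/* pre-existing helper used by the generic default; unchanged for everyone */
static entry_t *next_offset(entry_t *entryp, unsigned long addr)
{
	(void)addr;
	return entryp;
}

#ifdef ARCH_LIKE_S390
/*
 * Architecture override: with dynamic page-table folding the walker must
 * keep using the pointer into the real table, so the helper gets both that
 * pointer and the value that was read from it earlier.
 */
#define next_offset_lockless(realp, value, addr) \
	((void)(value), (void)(addr), (realp))
#else
/* Generic default: behave exactly as before, i.e. work on the local copy. */
#define next_offset_lockless(realp, value, addr) \
	((void)(realp), next_offset(&(value), (addr)))
#endif

int main(void)
{
	entry_t table[1] = { 0x2a };    /* the "real" page-table entry     */
	entry_t copy = table[0];        /* lockless snapshot on the stack  */

	entry_t *next = next_offset_lockless(&table[0], copy, 0UL);

	printf("walker continues on the %s\n",
	       next == &table[0] ? "real table entry" : "local stack copy");
	return 0;
}

Built normally this prints "local stack copy" (the old, unchanged behaviour); built with -DARCH_LIKE_S390 it prints "real table entry". Again, this only mirrors the shape of the change; the real fix and its helper names are in the linux-next commit linked in the justification.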
** Changed in: linux (Ubuntu Groovy)
       Status: Triaged => In Progress

** Changed in: ubuntu-z-systems
       Status: Incomplete => In Progress

** Changed in: ubuntu-z-systems
   Importance: Medium => Critical

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896726

Title:
  [UBUNTU 20.04.1] qemu (secure guest) crash due to gup_fast / dynamic
  page table folding issue

Status in Ubuntu on IBM z Systems:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  Incomplete
Status in linux source package in Groovy:
  In Progress

Bug description:
  Justification:
  ==============

  Secure KVM guest (using secure execution on Ubuntu Server 20.04 for s390x)
  crashes happen from time to time during boot.
  Such crashed guests ("reason=crashed" in the libvirt log) end up in Shutoff state instead of Crashed state (<on_crash>preserve is set).
  The crash points to a kernel memory management problem, addressed by the following patch/fix.
  The modifications touch common memory management code,
  but they will have no effect on architectures other than s390x.
  This is ensured by the fact that only s390 provides / implements the new helper functions.
  And for s390x, this is actually a critical (and carefully tested) fix for a (previous) regression, so it can hardly get any more regressive.
  The patch landed upstream in linux-next, is discussed in depth
  at LKML https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
  and here https://lore.kernel.org/linux-arch/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours/
  and will soon land via the regular upstream stable release update for kernel 5.4 in focal, too.
  The process already started:
  https://lore.kernel.org/stable/patch-1.thread-41918b.git-41918be365c0.your-ad-here.call-01600439945-ext-8991@work.hours/

  Hence this cherry-pick from the upstream patch should be added to groovy
  to avoid any potential regression in case the patch lands in focal via the upstream release update process,
  but is not in groovy and someone upgrades from focal to groovy.

  __________

  Secure Execution with Ubuntu 20.04: secure guests crash during boot from time to time; a crashed guest went into Shutoff state instead of Crashed state (<on_crash>preserve is set), so I can't get a dump.

  libvirt log file:
  2020-04-21T16:35:39.382999Z qemu-system-s390x: Guest says index 19608 is available
  2020-04-21 16:35:44.831+0000: shutting down, reason=crashed

  ---uname output---
  Linux ubu204uclg1002 5.4.0-25-generic #29-Ubuntu SMP Fri Apr 17 15:05:32 UTC 2020 s390x s390x s390x GNU/Linux

  Machine Type = z15 8561

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  I have a setup with 72 KVM guests which I can start in secure or non-secure mode. Starting all of them in secure mode back to back results in a number of guests (4..8) in Shutoff state and reason=crashed in the libvirt log. I can manually start the guest again.... no problem. Different guests are failing.
  Host and guests are on latest Ubuntu 20.04.
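As a rough, scripted illustration of the reproduce steps above (assuming the guests are already defined with secure execution enabled; the guest names secguest001..secguest072 are hypothetical and would need to be adapted to the actual setup), the following small C program starts all 72 guests back to back via virsh and then prints each guest's state, so guests that crashed during boot show up as "shut off":

/* reproduce-sketch.c - hypothetical helper, not part of the bug report */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	char cmd[128];
	int i;

	/* start all 72 guests back to back, as described above */
	for (i = 1; i <= 72; i++) {
		snprintf(cmd, sizeof(cmd), "virsh start secguest%03d", i);
		if (system(cmd) != 0)
			fprintf(stderr, "start failed for secguest%03d\n", i);
	}

	/* then check which guests ended up in "shut off" state */
	for (i = 1; i <= 72; i++) {
		snprintf(cmd, sizeof(cmd), "virsh domstate secguest%03d", i);
		printf("secguest%03d: ", i);
		fflush(stdout);
		system(cmd);
	}
	return 0;
}

The libvirt log of an affected guest then shows the "shutting down, reason=crashed" line quoted above.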
  The supposed fix (kernel memory management) has landed in Andrew Morton's mm tree:
  https://lore.kernel.org/mm-commits/20200916003608.ib4ln%25a...@linux-foundation.org/T/#u

  Please note: while this was found with secure execution, the bug is actually present for non-KVM workloads as well.

  The complete patch is this:
  https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a338e69ba37286c0fc300ab7e6fa0227e6ca68b1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1896726/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp