Hi Gerald, I wasn't aware that you had already started working with
upstream stable - that's great!

I had a look at the backport at 
https://lore.kernel.org/stable/patch-1.thread-41918b.git-41918be365c0.your-ad-here.call-01600439945-ext-8991@work.hours/
 and it applied cleanly on current focal master-next.
So I've built a patched focal kernel - in addition to the above groovy kernel -
and I'm sharing it here as well for any further testing:
https://people.canonical.com/~fheimes/lp1896726/

I just sent a patch request for groovy based on a cherry-pick from upstream:
https://lists.ubuntu.com/archives/kernel-team/2020-September/thread.html#113731
hence changing status for groovy to 'In Progress'.

The patch must land in groovy too, to avoid a potential regression in
case it lands in focal but is not yet in groovy when someone upgrades
from focal to groovy...

I'll keep an eye on the upstream stable release process and try to keep
this bug in sync and updated, based on the upstream stable bug that will
eventually be opened by the kernel team...

For further reference, I'll also add the summary from the patch request
to the bug description here.


** Description changed:

+ Justification:
+ ==============
+ 
+ Secure KVM guest (using secure execution on Ubuntu Server 20.04 for s390x)
+ crashes happen from time to time during boot.
+ Such crashed guests ("reason=crashed" in the libvirt log) end up in Shutoff state instead of Crashed state (<on_crash>preserve is set).
+ The crash points to a kernel memory management problem, addressed by the 
following patch/fix.
+ The modifications touch common memory management code,
+ but have no effect on architectures other than s390x.
+ This is ensured by the fact that only s390 provides/implements the new helper functions.
+ And for s390x, this is actually a critical (and carefully tested) fix for a previous regression, so it can hardly get any more regressive.
+ The patch landed upstream in linux-next, is discussed in depth
+ on LKML https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
+ and here https://lore.kernel.org/linux-arch/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours/
+ and will soon land in focal too, via the regular upstream stable release update for kernel 5.4.
+ The process already started:
+ 
https://lore.kernel.org/stable/patch-1.thread-41918b.git-41918be365c0.your-ad-here.call-01600439945-ext-8991@work.hours/
+ 
+ Hence this cherry-pick from the upstream patch should be added to groovy
+ to avoid a potential regression in case the patch lands in focal via the upstream stable release update process,
+ but is not in groovy and someone upgrades from focal to groovy.
+ 
+ __________
+ 
  Secure Execution with Ubuntu 20.04, secure guest crash during boot from
  time to time, crashed guest went into Shutoff state instead of Crashed
  state (<on_crash>preserve is set), so I can't get a dump.
  
- libvirt log file:  
+ libvirt log file:
  2020-04-21T16:35:39.382999Z qemu-system-s390x: Guest says index 19608 is 
available
  2020-04-21 16:35:44.831+0000: shutting down, reason=crashed
-  
+ 
  ---uname output---
  Linux ubu204uclg1002 5.4.0-25-generic #29-Ubuntu SMP Fri Apr 17 15:05:32 UTC 
2020 s390x s390x s390x GNU/Linux
-  
- Machine Type = z15 8561 
-  
+ 
+ Machine Type = z15 8561
+ 
  ---Debugger---
  A debugger is not configured
-  
+ 
  ---Steps to Reproduce---
-  I have a setup with 72 KVM guests which I can start in secure or non-secure 
mode. Starting all of them in secure mode back to back results in a number of 
guests (4..8) in Shutoff state and reason=crashed in the libvirt log. I can 
manually start the guest again.... no problem. Different guests are failing.
+  I have a setup with 72 KVM guests which I can start in secure or non-secure 
mode. Starting all of them in secure mode back to back results in a number of 
guests (4..8) in Shutoff state and reason=crashed in the libvirt log. I can 
manually start the guest again.... no problem. Different guests are failing.
  Host and guests are on latest Ubuntu 20.04.
  
  The supposed fix (kernel memory management) has landed in Andrew Morton's mm
  tree
  
https://lore.kernel.org/mm-commits/20200916003608.ib4ln%25a...@linux-foundation.org/T/#u
  
  Please note: while this was found with secure execution, the bug is
  actually present for non-KVM workloads as well.
  
  The complete patch is this:
  
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a338e69ba37286c0fc300ab7e6fa0227e6ca68b1

** Changed in: linux (Ubuntu Groovy)
       Status: Triaged => In Progress

** Changed in: ubuntu-z-systems
       Status: Incomplete => In Progress

** Changed in: ubuntu-z-systems
   Importance: Medium => Critical

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896726

Title:
  [UBUNTU 20.04.1] qemu (secure guest) crash due to gup_fast / dynamic
  page table folding issue

Status in Ubuntu on IBM z Systems:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  Incomplete
Status in linux source package in Groovy:
  In Progress

Bug description:
  Justification:
  ==============

  Secure KVM guest (using secure execution on Ubuntu Server 20.04 for s390x)
  crashes happen from time to time during boot.
  Such crashed guests ("reason=crashed" in the libvirt log) end up in Shutoff state instead of Crashed state (<on_crash>preserve is set).
  The crash points to a kernel memory management problem, addressed by the 
following patch/fix.
  The modifications touch common memory management code,
  but have no effect on architectures other than s390x.
  This is ensured by the fact that only s390 provides/implements the new helper functions.
  And for s390x, this is actually a critical (and carefully tested) fix for a previous regression, so it can hardly get any more regressive.
  The patch landed upstream in linux-next, is discussed in depth
  on LKML https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
  and here https://lore.kernel.org/linux-arch/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours/
  and will soon land in focal too, via the regular upstream stable release update for kernel 5.4.
  The process already started:
  
https://lore.kernel.org/stable/patch-1.thread-41918b.git-41918be365c0.your-ad-here.call-01600439945-ext-8991@work.hours/

  Hence this cherry-pick from the upstream patch should be added to groovy
  to avoid a potential regression in case the patch lands in focal via the upstream stable release update process,
  but is not in groovy and someone upgrades from focal to groovy.

  __________

  Secure Execution with Ubuntu 20.04, secure guest crash during boot
  from time to time, crashed guest went into Shutoff state instead of
  Crashed state (<on_crash>preserve is set), so I can't get a dump.

  libvirt log file:
  2020-04-21T16:35:39.382999Z qemu-system-s390x: Guest says index 19608 is 
available
  2020-04-21 16:35:44.831+0000: shutting down, reason=crashed

  ---uname output---
  Linux ubu204uclg1002 5.4.0-25-generic #29-Ubuntu SMP Fri Apr 17 15:05:32 UTC 
2020 s390x s390x s390x GNU/Linux

  Machine Type = z15 8561

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
   I have a setup with 72 KVM guests which I can start in secure or non-secure 
mode. Starting all of them in secure mode back to back results in a number of 
guests (4..8) in Shutoff state and reason=crashed in the libvirt log. I can 
manually start the guest again.... no problem. Different guests are failing.
  Host and guests are on latest Ubuntu 20.04.

  The supposed fix (kernel memory management) has landed in Andrew Morton's mm
  tree
  
https://lore.kernel.org/mm-commits/20200916003608.ib4ln%25a...@linux-foundation.org/T/#u

  Please note: while this was found with secure execution, the bug is
  actually present for non-KVM workloads as well.

  The complete patch is this:
  
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a338e69ba37286c0fc300ab7e6fa0227e6ca68b1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1896726/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
