** Changed in: ubuntu-z-systems
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1979296

Title:
  [UBUNTU 20.04] Include patches to avoid self-detected stall with
  Secure Execution

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Fix Released
Status in linux source package in Jammy:
  Fix Released

Bug description:
  SRU Justification:
  ==================

  [Impact]

   * On IBM Z secure execution environments under heavy load (that is,
     with over-committed resources for KVM guests), rcu_sched
     self-detected stalls can occur, which lead to LPAR crashes.

  [Fix]

   * 57c5df13eca4 57c5df13eca4017ed28f9375dc1d246ec0f54217
     "KVM: s390: pv: add macros for UVC CC values"

   * 1e2aa46de526 1e2aa46de526a5adafe580bca4c25856bb06f09e
     "KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm"

   * f0a1a0615a6f f0a1a0615a6ff6d38af2c65a522698fb4bb85df6
     "KVM: s390: pv: avoid stalls when making pages secure"

  [Test Plan]

   * An IBM z15 or LinuxONE III LPAR with FC 115 (secure execution)
     enabled is required.

   * Install Ubuntu Server 20.04 LTS (or 18.04 with hwe-5.4) or
     22.04 LTS on top.

   * Install a kernel that includes the above patches/commits.

   * Bring the system under high load with KVM guests.

   * Monitor dmesg for 'rcu_sched self-detected stall' messages and/or
     look for crashes.

   * Due to the hardware requirements, this test needs to be conducted
     by IBM.

  [Where problems could occur]

   * The definitions from 57c5df13eca4 are missing in both jammy and
     focal, but adding them shouldn't do any harm.

   * The change in 1e2aa46de526 only uses uv_call_sched instead of
     plain uv_call, which should lead to a snappier system under high
     load, but may consume somewhat more cycles overall.

   * With f0a1a0615a6f, uv_call_sched cannot simply replace uv_call,
     because locks are being held.

   * Instead, uv_call is replaced by __uv_call, which does not loop.

   * But if these changes to the (uv) calls are erroneous, they may
     lead to wrong states, or even to broken ultravisor calls and with
     that to broken secure execution (SE).

   * As a side effect, the uv might no longer loop over all pages, in
     the worst case leaving some unprotected.

   * All this is s390x-only functionality that is only available on
     IBM z15 / LinuxONE III systems and newer, and only if the
     optional feature 'FC 115' is in place, which is limited to
     'secure-execution' workloads.

  [Other Info]

   * The patches were accepted upstream with kernel 5.16.

   * Commit 1e2aa46de526 is already included in jammy, but
     57c5df13eca4 and f0a1a0615a6f are missing.

   * Focal requires all 3 commits: 57c5df13eca4, 1e2aa46de526 and
     f0a1a0615a6f.

   * Since impish is very close to its EOL, it is not covered by this
     SRU.

  __________

  ---Problem Description---
  rcu_sched self-detected stall with Secure Execution

  When the system is busy and additional Secure Execution guests are
  started, the LPAR crashes. Christian Borntraeger looked at the stack
  trace and identified two commits which should fix the issue:
  1e2aa46de526a5adafe580bca4c25856bb06f09e and
  f0a1a0615a6ff6d38af2c65a522698fb4bb85df6

  Please include these two fixes in 20.04 and 18.04 HWE.

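The difference between the three call variants discussed under [Where problems could occur] can be sketched in user-space Python. This is not kernel code and not the content of the patches: `FakeUltravisor`, `yield_cpu`, the function names, and the numeric condition-code values are hypothetical stand-ins, chosen only to illustrate "loops", "loops and yields", and "does not loop, the caller retries".

```python
"""User-space sketch of the three retry patterns; not kernel code."""

# Assumed condition-code values, named in the spirit of commit 57c5df13eca4.
UVC_CC_OK = 0
UVC_CC_ERROR = 1
UVC_CC_BUSY = 2
UVC_CC_PARTIAL = 3


class FakeUltravisor:
    """Hypothetical stand-in that reports BUSY a fixed number of times."""

    def __init__(self, busy_rounds):
        self.busy_rounds = busy_rounds
        self.calls = 0

    def raw_call(self):
        """Single attempt, analogous to __uv_call: never loops."""
        self.calls += 1
        if self.busy_rounds > 0:
            self.busy_rounds -= 1
            return UVC_CC_BUSY
        return UVC_CC_OK


def call_looping(uv):
    """Analogous to uv_call: retry until the result is final."""
    while True:
        cc = uv.raw_call()
        if cc <= UVC_CC_ERROR:
            return cc


def call_looping_sched(uv, yield_cpu):
    """Analogous to uv_call_sched: same loop, yielding after each try."""
    while True:
        cc = uv.raw_call()
        yield_cpu()  # stands in for a reschedule point; not usable under locks
        if cc <= UVC_CC_ERROR:
            return cc


def call_once_caller_retries(uv, max_attempts):
    """Analogous to using __uv_call while locks are held: one try per
    lock hold, and the caller decides whether to try again."""
    for attempt in range(1, max_attempts + 1):
        # A real caller would take its locks here ...
        cc = uv.raw_call()
        # ... and release them here, before anything that may sleep.
        if cc <= UVC_CC_ERROR:
            return cc, attempt
    return UVC_CC_BUSY, max_attempts
```

The third variant is where the risk named above sits: because the retry decision moves to the caller, a mistake there could end the loop early and leave work (such as pages) unprocessed.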
  Here is the stack trace:

  [592792.725078] rcu: INFO: rcu_sched self-detected stall on CPU
  [592792.725089] rcu: 4-....: (2099 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=3920041/3920042 fqs=984
  [592792.725133] (t=2100 jiffies g=26268505 q=410280)
  [592792.725135] Task dump for CPU 4:
  [592792.725137] qemu-system-s39 R running task 0 2557923 1644255 0x06000004
  [592792.725139] Call Trace:
  [592792.725146] ([<000000566e2dcf52>] show_stack+0x7a/0xc0)
  [592792.725150] [<000000566dab696c>] sched_show_task.part.0+0xdc/0x100
  [592792.725151] [<000000566e2df248>] rcu_dump_cpu_stacks+0xc0/0x100
  [592792.725154] [<000000566db0510c>] rcu_sched_clock_irq+0x75c/0x980
  [592792.725156] [<000000566db1326c>] update_process_times+0x3c/0x80
  [592792.725160] [<000000566db24fea>] tick_sched_handle.isra.0+0x4a/0x70
  [592792.725161] [<000000566db2528e>] tick_sched_timer+0x5e/0xc0
  [592792.725163] [<000000566db14294>] __hrtimer_run_queues+0x114/0x2f0
  [592792.725165] [<000000566db14fdc>] hrtimer_interrupt+0x12c/0x2a0
  [592792.725167] [<000000566da14b6a>] do_IRQ+0xaa/0xb0
  [592792.725170] [<000000566e2eed08>] ext_int_handler+0x130/0x134
  [592792.725174] [<000000566da2bad8>] gmap_make_secure+0x1c8/0x340
  [592792.725175] ([<000000566da2b9fe>] gmap_make_secure+0xee/0x340)
  [592792.725180] [<000000566da6e796>] kvm_s390_pv_unpack+0xc6/0x2b0
  [592792.725183] [<000000566da535c0>] kvm_s390_handle_pv+0x390/0x580
  [592792.725184] [<000000566da55b30>] kvm_arch_vm_ioctl+0x250/0x9e0
  [592792.725187] [<000000566da44c26>] kvm_vm_ioctl+0x396/0x760
  [592792.725191] [<000000566dceb0b6>] do_vfs_ioctl+0x376/0x690
  [592792.725193] [<000000566dceb454>] ksys_ioctl+0x84/0xb0
  [592792.725194] [<000000566dceb4ea>] __s390x_sys_ioctl+0x2a/0x40
  [592792.725195] [<000000566e2ee6b2>] system_call+0x2a6/0x2c8

  Contact Information = stefan.am...@de.ibm.com, cborn...@de.ibm.com

  ---uname output---
  5.4.0-90-generic #101-Ubuntu

  Machine Type = 8562 A00-GT2

  ---System Hang---
  LPAR crashed and needed to be re-booted

  ---Debugger---
  A debugger is not
  configured

  ---Steps to Reproduce---
  Cause high load. Then start Secure Execution enabled KVM guest
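For the "Monitor dmesg" step of the test plan, a small helper along these lines could be used while the system is under load. It is only a sketch: the function names are made up for illustration, and the pattern is taken from the first line of the stack trace in this report, so it assumes the default dmesg timestamp format shown there.

```python
import re
import subprocess

# Pattern modelled on the first line of the trace in this report.
STALL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+rcu: INFO: rcu_sched self-detected stall on CPU"
)


def find_rcu_stalls(log_text):
    """Return the kernel timestamps (in seconds) of rcu_sched stall reports."""
    stamps = []
    for line in log_text.splitlines():
        match = STALL_RE.search(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def read_dmesg():
    """Return the current kernel log; may need elevated privileges."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `find_rcu_stalls(read_dmesg())` periodically during the load test and checking for a non-empty result would flag a stall; an LPAR crash itself would of course not be caught this way.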