SRU request sent to the kernel team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2019-February/098872.html

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817918

Title:
  Hard lockups due to unrestricted lapic timer delay

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Released

Bug description:
  [Impact]

  * There is a long-standing report of an issue with the TSC delay in
  wait_lapic_expire(): a guest can configure its expiration timer in a
  way that makes the host wait for a long time with preemption disabled,
  which can lead to host lockups.

  * The stack trace we have access to (from a user report of this issue)
  is summarized below:

  NMI watchdog: Watchdog detected hard LOCKUP on cpu 16
  [...]
  CPU: 16 PID: 3024910 Comm: CPU 0/KVM Not tainted 4.4.0-139-generic #165-Ubuntu
  RIP: 0010:[<addr>]  [<addr>] delay_tsc+0x20/0x60
  [...]
  __delay+0x15/0x20
  wait_lapic_expire+0xc3/0x150 [kvm]
  vcpu_enter_guest+0x743/0x11d0 [kvm]
  kvm_arch_vcpu_ioctl_run+0xe6/0x410 [kvm]
  kvm_vcpu_ioctl+0x33d/0x620 [kvm]
  do_vfs_ioctl+0x2af/0x4b0
  ? __do_page_fault+0x1c1/0x410
  ? fire_user_return_notifiers+0x3e/0x50
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x22/0xc1

  This matches the problem reported on the KVM mailing list:
  https://marc.info/?l=kvm&m=146374488028339

  * A fix was proposed in the above thread, but it was discarded in favor
  of the following approach: https://marc.info/?l=kvm&m=146647260109315
  That patch was merged in Linus' tree, hence we request the SRU of:
  b606f189c7d5 ("KVM: LAPIC: cap __delay at lapic_timer_advance_ns").
  One additional patch is needed, which is only a header adjustment
  to export a necessary function.

  * The patch is missing only from the 4.4 kernel series; Bionic (4.15)
  and the newer releases already have it.
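
  The capping idea named in the commit title can be illustrated with a
  minimal userspace sketch. This is not the kernel patch itself: the
  helper names ns_to_cycles() and capped_delay_cycles(), and the use of a
  fixed TSC frequency in kHz, are assumptions made for illustration only.
  The sketch assumes the wait is requested in TSC cycles and limits it to
  the cycle equivalent of the advance value.

```c
#include <stdint.h>

/* Hypothetical stand-in for a ns-to-TSC-cycles conversion; assumes a
 * fixed TSC frequency in kHz rather than any per-vCPU scaling. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t tsc_khz)
{
    return ns * tsc_khz / 1000000ULL;
}

/* Number of cycles to busy-wait before entering the guest.
 * The remaining distance to the deadline is capped at advance_ns,
 * so a guest-chosen deadline cannot request an unbounded wait. */
static uint64_t capped_delay_cycles(uint64_t guest_tsc, uint64_t tsc_deadline,
                                    uint64_t advance_ns, uint64_t tsc_khz)
{
    uint64_t remaining, cap;

    if (guest_tsc >= tsc_deadline)
        return 0;   /* deadline already reached: nothing to wait for */

    remaining = tsc_deadline - guest_tsc;
    cap = ns_to_cycles(advance_ns, tsc_khz);

    return remaining < cap ? remaining : cap;
}
```

  With this shape, a deadline far in the future yields a wait of at most
  the cap, while a deadline closer than the cap is still honored exactly.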

  [Test Case]

  * Unfortunately this issue is hard to reproduce; we have reports of
  this lockup from a user, hence the SRU request here.
  The patch was originally introduced in kernel 4.7, approx. 2.5 years
  ago, so we are confident that the community has been running this code
  long enough without reported errors. We also checked Linus' tree and
  found no fixes for this code since kernel 4.7.

  [Regression Potential]

  * The code modification requested here affects the amount of delay in
  a specific timer; the patch introduces a maximum delay time, preventing
  unbounded delays in the host.
  The regression potential is considered low; given the nature of the
  modification, latency issues in guests are likely to be the most
  problematic regression we could see.
