** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056227

Title:
  KVM: arm64: softlockups in stage2_apply_range

Status in linux package in Ubuntu:
  New
Status in linux source package in Jammy:
  New

Bug description:
  [Impact]

  Tearing down KVM VMs on arm64 can cause softlockups to appear on the console.
  When terminating VMs with more than 100 GB of memory and 4k pages, the memory
  unmap times often exceed 20 seconds, which can trigger the softlockup detector.
  Portions of the unmap path also run TLB invalidation instructions with
  interrupts disabled, which can further contribute to latency problems.  My team
  has observed networking latency problems when the CPU where the teardown is
  occurring is also assigned to handle a NIC interrupt.
    
  Fortunately, a fix has been upstream since Linux 6.1.  A small pair of patches
  modifies stage2_apply_range() to operate on smaller memory ranges, performing a
  cond_resched between them.  With these patches applied, softlockups are no
  longer observed when tearing down VMs with large amounts of memory.
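  The batching idea described above can be illustrated with a small user-space
  C sketch.  This is not the kernel code from 5994bc9e05: batch_end(),
  apply_range_batched(), the callback types and the 1 GiB BATCH_SIZE are all
  hypothetical, chosen only to show how clamping each step to the next aligned
  boundary bounds the work done between reschedule points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size (1 GiB), an assumption standing in for the
 * "largest block" size used by the real patch. */
#define BATCH_SIZE (UINT64_C(1) << 30)

/* Hypothetical per-chunk operation, e.g. "unmap this range". */
typedef int (*range_fn)(uint64_t addr, uint64_t size, void *ctx);
/* Hypothetical yield hook, standing in for a cond_resched-style call. */
typedef void (*yield_fn)(void *ctx);

/* End of the current batch: the next BATCH_SIZE-aligned boundary after addr,
 * clamped to end.  Comparing "x - 1" values keeps the clamp correct when
 * end == 0 (a range that runs to the top of the address space). */
static uint64_t batch_end(uint64_t addr, uint64_t end)
{
    uint64_t boundary = (addr + BATCH_SIZE) & ~(BATCH_SIZE - 1);
    return (boundary - 1 < end - 1) ? boundary : end;
}

/* Walk [addr, end) one batch at a time, calling yield between batches so a
 * single long walk cannot monopolise the CPU.  Returns the first nonzero
 * error reported by fn, or 0 on success. */
static int apply_range_batched(uint64_t addr, uint64_t end,
                               range_fn fn, yield_fn yield, void *ctx)
{
    assert(fn != NULL);
    while (addr != end) {
        uint64_t next = batch_end(addr, end);
        int ret = fn(addr, next - addr, ctx);
        if (ret)
            return ret;
        if (yield && next != end)
            yield(ctx); /* no yield after the final batch */
        addr = next;
    }
    return 0;
}

/* Example callbacks for illustration: count chunks, bytes and yields. */
struct walk_stats { uint64_t chunks, bytes, yields; };

static int count_chunk(uint64_t addr, uint64_t size, void *ctx)
{
    struct walk_stats *s = ctx;
    (void)addr;
    s->chunks++;
    s->bytes += size;
    return 0;
}

static void count_yield(void *ctx)
{
    struct walk_stats *s = ctx;
    s->yields++;
}
```

  Under these assumptions, walking 100 GiB with count_chunk and count_yield
  yields 100 bounded chunks with 99 yield points between them, rather than one
  uninterrupted walk.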

  Although I also submitted the patches to 5.15 LTS (see the link to the LTS
  submission in the "Backport" section), I'd appreciate it if Ubuntu were willing
  to take this submission in parallel, since the impact has left us unable to use
  arm64 for KVM until we either migrate our hypervisors to hugepages, pick up this
  fix, or some combination of the two.

  [Backport]

  Backport the following fixes from Linux 6.1:
    
  3b5c082bbf KVM: arm64: Work out supported block level at compile time
  5994bc9e05 KVM: arm64: Limit stage2_apply_range() batch size to largest block
    
  The fix is in 5994bc9e05 and 3b5c082bbf is a dependency that was submitted as
  part of the series.  The original submission is here:
    
  https://lore.kernel.org/all/20221007234151.461779-1-oliver.up...@linux.dev/
    
  I've also submitted the patches to 5.15 LTS here:
    
  https://lore.kernel.org/stable/cover.1709665227.git.k...@templeofstupid.com/
    
  Both fixes cherry-picked cleanly with no conflicts.
    
  [Test]
     
  Ran the test from 5994bc9e05, as well as my own run of kvm_page_table_test, on
  a VM with 4k pages and more than 100 GB of memory.  Without the patches,
  softlockups were observed in both tests.  With the patches applied, both tests
  ran without incident.

  This was tested against both LTS 5.15.150 and linux-aws-5.15.0-1055.
   
  [Potential Regression]
     
  Regression potential is low.  These patches have been present in Linux since
  6.1 and appear to have needed no further maintenance.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056227/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
