This bug is awaiting verification that the linux-raspi/6.8.0-1009.10
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-noble-linux-raspi' to
'verification-done-noble-linux-raspi'. If the problem still exists,
change the tag 'verification-needed-noble-linux-raspi' to
'verification-failed-noble-linux-raspi'.


If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-noble-linux-raspi-v2 
verification-needed-noble-linux-raspi

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2067862

Title:
  Removing legacy virtio-pci devices causes kernel panic

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2067862

  [Impact]

  If you detach a legacy virtio-pci device from a current Noble system,
  it will cause a NULL pointer dereference and panic the system. This
  is an issue if you force Noble to use legacy virtio-pci devices, or
  run Noble on very old hypervisors that only support legacy virtio-pci
  devices, e.g. Trusty and older.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  ...
  CPU: 2 PID: 358 Comm: kworker/u8:3 Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  RIP: 0010:0x0
  ...
  Call Trace:
  <TASK>
   ? show_regs+0x6d/0x80
   ? __die+0x24/0x80
   ? page_fault_oops+0x99/0x1b0
   ? do_user_addr_fault+0x2ee/0x6b0
   ? exc_page_fault+0x83/0x1b0
   ? asm_exc_page_fault+0x27/0x30
   vp_del_vqs+0x6e/0x2a0
   remove_vq_common+0x166/0x1a0
   virtnet_remove+0x61/0x80
   virtio_dev_remove+0x3f/0xc0
   device_remove+0x40/0x80
   device_release_driver_internal+0x20b/0x270
   device_release_driver+0x12/0x20
   bus_remove_device+0xcb/0x140
   device_del+0x161/0x3e0
   ? pci_bus_generic_read_dev_vendor_id+0x2c/0x1a0
   device_unregister+0x17/0x60
   unregister_virtio_device+0x16/0x40
   virtio_pci_remove+0x43/0xa0
   pci_device_remove+0x36/0xb0
   device_remove+0x40/0x80
   device_release_driver_internal+0x20b/0x270
   device_release_driver+0x12/0x20
   pci_stop_bus_device+0x7a/0xb0
   pci_stop_and_remove_bus_device+0x12/0x30
   disable_slot+0x4f/0xa0
   acpiphp_disable_and_eject_slot+0x1c/0xa0
   hotplug_event+0x11b/0x280
   ? __pfx_acpiphp_hotplug_notify+0x10/0x10
   acpiphp_hotplug_notify+0x27/0x70
   acpi_device_hotplug+0xb6/0x300
   acpi_hotplug_work_fn+0x1e/0x40
   process_one_work+0x16c/0x350
   worker_thread+0x306/0x440
   ? _raw_spin_lock_irqsave+0xe/0x20
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xef/0x120
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x44/0x70
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30
  </TASK>

  The issue was introduced in:

  commit fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
  Author: Feng Liu <fe...@nvidia.com>
  Date:   Tue Dec 19 11:32:40 2023 +0200
  Subject: virtio-pci: Introduce admin virtqueue
  Link: https://github.com/torvalds/linux/commit/fd27ef6b44bec26915c5b2b22c13856d9f0ba17a

  Modern virtio-pci devices are not affected. For a legacy virtio
  device, the is_avq function pointer is never assigned in its
  virtio_pci_device structure, so the call to
  vp_dev->is_avq(vdev, vq->index) dereferences a NULL pointer.
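
  To illustrate the mechanism, here is a hedged sketch of the failing
  path, paraphrased from drivers/virtio/virtio_pci_common.c rather than
  quoted verbatim (the structure layout and surrounding code are
  simplified):

  /* Sketch of the failing path in vp_del_vqs(); simplified, not a
   * verbatim copy of the kernel source. */
  struct virtio_pci_device {
          /* ... */
          /* Only the modern probe path assigns is_avq; the legacy
           * probe leaves it NULL. */
          bool (*is_avq)(struct virtio_device *vdev, unsigned int index);
  };

  static void vp_del_vqs(struct virtio_device *vdev)
  {
          struct virtio_pci_device *vp_dev = to_vp_device(vdev);
          struct virtqueue *vq, *n;

          list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
                  /* On a legacy device is_avq == NULL, so this indirect
                   * call jumps to address 0 -- hence "RIP: 0010:0x0"
                   * in the oops above. */
                  if (vp_dev->is_avq(vdev, vq->index))
                          continue;
                  /* ... tear down the virtqueue ... */
          }
  }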

  There is no workaround. If you are affected, the only mitigation for
  the time being is to avoid detaching devices.

  [Fix]

  This was fixed in 6.9-rc1 by:

  commit c8fae27d141a32a1624d0d0d5419d94252824498
  From: Li Zhang <zhangliker...@gmail.com>
  Date: Sat, 16 Mar 2024 13:25:54 +0800
  Subject: virtio-pci: Check if is_avq is NULL
  Link: https://github.com/torvalds/linux/commit/c8fae27d141a32a1624d0d0d5419d94252824498

  This is a clean cherry-pick to Noble. The commit simply adds a basic
  NULL pointer check before the pointer is dereferenced.
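
  For reference, the shape of the added check looks roughly like the
  following (paraphrased rather than quoted from the commit):

  /* With the fix, the function pointer is tested before the indirect
   * call, so legacy devices (is_avq == NULL) skip the admin virtqueue
   * handling instead of crashing. */
  if (vp_dev->is_avq && vp_dev->is_avq(vdev, vq->index))
          continue;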

  [Testcase]

  Start a fresh Noble VM.

  Edit the grub kernel command line:

  1) sudo vim /etc/default/grub
  GRUB_CMDLINE_LINUX_DEFAULT="virtio_pci.force_legacy=1" 
  2) sudo update-grub
  3) sudo reboot

  Outside the VM, on the host:

  $ qemu-img create -f qcow2 /root/share-device.qcow2 2G
  $ cat >> share-device.xml << EOF
  <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback' io='threads'/>
      <source file='/root/share-device.qcow2'/>
      <target dev='vdc' bus='virtio'/>
  </disk>
  EOF
  $ sudo -s
  # virsh attach-device noble-test share-device.xml --config --live
  # virsh detach-device noble-test share-device.xml --config --live

  A kernel panic should occur.

  There is a test kernel available in:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2067862-test

  If you install it, the panic should no longer occur.

  [Where problems could occur]

  We are adding a basic NULL pointer check right before the pointer is
  used, which is quite low risk.

  If a regression were to occur, it would only affect VMs using legacy
  virtio-pci devices, which is not the default. It could potentially
  have a large impact on fleets of very old hypervisors running Trusty,
  Precise or Lucid, but that is very unlikely in this day and age.

  [Other Info]

  Upstream mailing list discussion and author testcase:
  https://lore.kernel.org/kvm/CACGkMEs1t-ipP7TasHkKNKd=pevees6xdw1zssjkb-bc9et...@mail.gmail.com/T/#m167335bf7ab09b12fec3bdc5d46a30bc2e26cac7

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2067862/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
