I modified the kernel to make a few functions non-inlined so they are easier to trace:
vfio_dma_do_map
vfio_dma_do_unmap
mutex_lock
mutex_unlock
kzalloc
vfio_link_dma
vfio_pin_map_dma
vfio_pin_pages_remote
vfio_iommu_map

Then I ran tracing on this load, limited to the functions in focus:
$ sudo trace-cmd record -p function_graph \
    -l vfio_iommu_type1_ioctl -l vfio_pci_ioctl \
    -l vfio_set_irqs_validate_and_prepare -l vfio_pci_set_irqs_ioctl \
    -l pci_probe_reset_slot -l pci_probe_reset_bus \
    -l vfio_pci_count_devs -l vfio_pci_fill_devs -l vfio_pci_validate_devs \
    -l vfio_group_get_external_user -l vfio_group_put_external_user \
    -l vfio_external_user_iommu_id \
    -l vfio_dma_do_map -l vfio_dma_do_unmap \
    -l mutex_lock -l mutex_unlock \
    -l vfio_link_dma -l vfio_pin_map_dma \
    -l vfio_pin_pages_remote -l vfio_iommu_map

I traced this twice to compare:
- after a clean boot; duration ~12 seconds
- right afterwards (rerun from gdb); duration ~175 seconds

It shows that the locks are interesting for concurrency considerations, but 
here they are mostly noise.
OTOH it is interesting that the slow case has many more of them; maybe the 
code needs to unlock more often in between to stay interactive. I'll count 
them in postprocessing like this:
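A minimal sketch of that counting, assuming the two recordings were saved
as trace-boot.dat and trace-rerun.dat (the file names are placeholders):

$ for f in trace-boot.dat trace-rerun.dat; do
    # each mutex_lock call is one line in the function_graph report
    echo "$f: $(trace-cmd report -i "$f" | grep -c mutex_lock) mutex_lock events"
  done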

The result shows that the locking involved scaled much more than the runtime.
The duration grew by a factor of ~14.5 while the lock count grew by ~34x, and 
the average lock activity between vfio calls grew even more, from 1.52 to 
6.32.

But the calls to vfio_pin_pages_remote / vfio_iommu_map went up as well:
64020 -> 327852, which I guess means smaller chunks per call.
The timing of these functions also changes slightly; early in the trace it is 
much different than later on.
Let's take a clean shot at just those durations and sizes:
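A rough way to pull those durations out of the report, assuming the filtered
function_graph shows these calls as leaves with an inline duration
("vfio_iommu_map(); ... 1.234 us"); nested calls carry the duration on their
closing '}' line instead and would need depth tracking. trace-rerun.dat is
again a placeholder name:

$ trace-cmd report -i trace-rerun.dat | grep 'vfio_iommu_map();' |
    awk '{ for (i = 1; i <= NF; i++)        # locate the "<num> us" field
             if ($i == "us") { sum += $(i-1); n++ } }
         END { if (n) printf "calls=%d  avg=%.3f us\n", n, sum / n }'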
