On 1/27/26 10:46, Stanislav Kinsburskii wrote:
On Mon, Jan 26, 2026 at 07:02:29PM -0800, Mukesh R wrote:
On 1/26/26 07:57, Stanislav Kinsburskii wrote:
On Fri, Jan 23, 2026 at 05:26:19PM -0800, Mukesh R wrote:
On 1/20/26 16:12, Stanislav Kinsburskii wrote:
On Mon, Jan 19, 2026 at 10:42:27PM -0800, Mukesh R wrote:
From: Mukesh Rathor <[email protected]>
Add a new file implementing device domain management, mapping and
unmapping of IOMMU memory, and the other iommu_ops needed to fit within
the VFIO framework for PCI passthru on Hyper-V running Linux as root or
as an L1VH parent. This also implements the direct attach mechanism for
PCI passthru, which likewise works within the VFIO framework.
At a high level, during boot the hypervisor creates a default identity
domain and attaches all devices to it. This maps nicely to the Linux
IOMMU subsystem's IOMMU_DOMAIN_IDENTITY domain type. As a result, Linux
does not need to explicitly ask Hyper-V to attach devices or perform
maps/unmaps during boot. As mentioned previously, Hyper-V supports two
ways to do PCI passthru (roughly sketched in code after the list):
1. Device Domain: root must create a device domain in the hypervisor
   and issue map/unmap hypercalls for mapping and unmapping guest RAM.
   All hypervisor communications use a device ID of type PCI to
   identify and reference the device.
2. Direct Attach: the hypervisor simply uses the guest's HW page table
   for mappings, so the host need not issue map/unmap hypercalls for
   device memory. As such, direct attach passthru setup during guest
   boot is extremely fast. A directly attached device must be
   referenced via its logical device ID, not its PCI device ID.
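For illustration only, the split between the two modes could look
roughly like the sketch below. All hv_sketch_* names, the direct_attach
flag and the dom_id field are invented for this sketch; only struct
pci_dev and pci_dev_id() come from the kernel API, and none of this is
the actual driver code.

#include <linux/pci.h>
#include <linux/types.h>

/* Sketch only: hypothetical types and helpers, not the patch itself. */
struct hv_sketch_domain {
	u64 dom_id;		/* hypervisor device domain ID (mode 1) */
	bool direct_attach;	/* mode 2: guest HW page table is used  */
};

/* Hypothetical wrappers around the respective hypercalls. */
int hv_sketch_attach_dev_domain(struct hv_sketch_domain *d, u16 pci_devid);
int hv_sketch_attach_direct(struct hv_sketch_domain *d, u64 logical_id);
u64 hv_sketch_logical_id(struct pci_dev *pdev);

static int hv_sketch_attach(struct hv_sketch_domain *d, struct pci_dev *pdev)
{
	/*
	 * Mode 2 (direct attach): no map/unmap hypercalls at all; the
	 * hypervisor walks the guest's HW page table. The device is
	 * referenced by its logical device ID.
	 */
	if (d->direct_attach)
		return hv_sketch_attach_direct(d, hv_sketch_logical_id(pdev));

	/*
	 * Mode 1 (device domain): guest RAM is mapped/unmapped via
	 * hypercalls, and the device is referenced by its PCI device ID.
	 */
	return hv_sketch_attach_dev_domain(d, pci_dev_id(pdev));
}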
At present, the L1VH root/parent only supports direct attach. Direct
attach is also the default in non-L1VH cases because the current device
domain implementation has significant performance issues for guests
with larger RAM (say, more than 8 GB), and that unfortunately cannot be
addressed in the short term.
<snip>
<snip>
+static void hv_iommu_detach_dev(struct iommu_domain *immdom,
+				struct device *dev)
+{
+	struct pci_dev *pdev;
+	struct hv_domain *hvdom = to_hv_domain(immdom);
+
+	/* See the attach function, only PCI devices for now */
+	if (!dev_is_pci(dev))
+		return;
+
+	if (hvdom->num_attchd == 0)
+		pr_warn("Hyper-V: num_attchd is zero (%s)\n", dev_name(dev));
+
+	pdev = to_pci_dev(dev);
+
+	if (hvdom->attached_dom) {
+		hv_iommu_det_dev_from_guest(hvdom, pdev);
+
+		/*
+		 * Do not reset attached_dom, hv_iommu_unmap_pages happens
+		 * next.
+		 */
+	} else {
+		hv_iommu_det_dev_from_dom(hvdom, pdev);
+	}
+
+	hvdom->num_attchd--;
Shouldn't this be modified iff the detach succeeded?
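Something along these lines, as a rough sketch of the tail of
hv_iommu_detach_dev (assuming the hv_iommu_det_* helpers can return a
status, which may not match the actual patch):

	int rc;

	if (hvdom->attached_dom) {
		/* attached_dom deliberately left set; unmap happens next */
		rc = hv_iommu_det_dev_from_guest(hvdom, pdev);
	} else {
		rc = hv_iommu_det_dev_from_dom(hvdom, pdev);
	}

	/*
	 * Only drop the bookkeeping if the hypervisor actually detached
	 * the device, so a later kernel core still reflects reality.
	 */
	if (!rc)
		hvdom->num_attchd--;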
We want to still free the domain and not let it get stuck. The purpose
is more to make sure detach was called before the domain is freed.
How can one debug subsequent errors if num_attchd is decremented
unconditionally? In reality the device is left attached, but the
related kernel metadata is gone.
An error is printed in case of a failed detach. If there is a panic, at
least you can get some info about the device. The metadata in the
hypervisor stays around if the detach failed.
With this approach the only thing left is a kernel message.
But if the state is kept intact, one could collect a kernel core and
analyze it.
Again, most of the Linux-side state is cleaned up; the only remaining
state is in the hypervisor, and the hypervisor can fully protect itself
and the devices. So there is not much in a kernel core, as it has
already been cleaned up. Think of this as an additional check; we can
remove it in the future once it stands the test of time. Until then,
every debugging bit helps.
And note that there won't be a hypervisor core by default: our main
context for the upstreamed version of the driver is L1VH, and a kernel
core is the only thing a third-party customer can provide for our
analysis.
Wei can correct me, but we are not only L1VH-focused here. There is
work going on across all fronts.
Thanks,
-Mukesh
Thanks,
Stanislav