On Mon, Jan 26, 2026 at 07:02:29PM -0800, Mukesh R wrote:
> On 1/26/26 07:57, Stanislav Kinsburskii wrote:
> > On Fri, Jan 23, 2026 at 05:26:19PM -0800, Mukesh R wrote:
> > > On 1/20/26 16:12, Stanislav Kinsburskii wrote:
> > > > On Mon, Jan 19, 2026 at 10:42:27PM -0800, Mukesh R wrote:
> > > > > From: Mukesh Rathor <[email protected]>
> > > > > 
> > > > > Add a new file to implement management of device domains, mapping and
> > > > > unmapping of iommu memory, and other iommu_ops to fit within the VFIO
> > > > > framework for PCI passthru on Hyper-V running Linux as root or L1VH
> > > > > parent. This also implements the direct attach mechanism for PCI
> > > > > passthru, which likewise works within the VFIO framework.
> > > > > 
> > > > > At a high level, during boot the hypervisor creates a default identity
> > > > > domain and attaches all devices to it. This maps nicely to the
> > > > > Linux iommu subsystem's IOMMU_DOMAIN_IDENTITY domain. As a result,
> > > > > Linux does not need to explicitly ask Hyper-V to attach devices or
> > > > > do maps/unmaps during boot. As mentioned previously, Hyper-V
> > > > > supports two ways to do PCI passthru:
> > > > > 
> > > > >     1. Device Domain: root must create a device domain in the
> > > > >        hypervisor, and do map/unmap hypercalls for mapping and
> > > > >        unmapping guest RAM.
> > > > >        All hypervisor communications use device id of type PCI for
> > > > >        identifying and referencing the device.
> > > > > 
> > > > >     2. Direct Attach: the hypervisor simply uses the guest's HW
> > > > >        page table for mappings, so the host need not make map/unmap
> > > > >        device memory hypercalls. As such, direct attach passthru
> > > > >        setup during guest boot is extremely fast. A direct-attached
> > > > >        device must be referenced via its logical device id, not the
> > > > >        PCI device id.
> > > > > 
> > > > > At present, L1VH root/parent only supports direct attach. Direct
> > > > > attach is also the default in non-L1VH cases because the device
> > > > > domain implementation currently has significant performance issues
> > > > > for guests with larger RAM (say, more than 8GB), which unfortunately
> > > > > cannot be addressed in the short term.
> > > > > 
> > > > 
> > > > <snip>
> > > > 
> > 
> > <snip>
> > 
> > > > > +static void hv_iommu_detach_dev(struct iommu_domain *immdom, struct device *dev)
> > > > > +{
> > > > > +     struct pci_dev *pdev;
> > > > > +     struct hv_domain *hvdom = to_hv_domain(immdom);
> > > > > +
> > > > > +     /* See the attach function, only PCI devices for now */
> > > > > +     if (!dev_is_pci(dev))
> > > > > +             return;
> > > > > +
> > > > > +     if (hvdom->num_attchd == 0)
> > > > > +             pr_warn("Hyper-V: num_attchd is zero (%s)\n", dev_name(dev));
> > > > > +
> > > > > +     pdev = to_pci_dev(dev);
> > > > > +
> > > > > +     if (hvdom->attached_dom) {
> > > > > +             hv_iommu_det_dev_from_guest(hvdom, pdev);
> > > > > +
> > > > > +             /*
> > > > > +              * Do not reset attached_dom; hv_iommu_unmap_pages
> > > > > +              * happens next.
> > > > > +              */
> > > > > +     } else {
> > > > > +             hv_iommu_det_dev_from_dom(hvdom, pdev);
> > > > > +     }
> > > > > +
> > > > > +     hvdom->num_attchd--;
> > > > 
> > > > Shouldn't this be modified iff the detach succeeded?
> > > 
> > > We still want to free the domain and not let it get stuck. The purpose
> > > is more to make sure detach was called before the domain is freed.
> > > 
> > 
> > How can one debug subsequent errors if num_attchd is decremented
> > unconditionally? In reality the device is left attached, but the related
> > kernel metadata is gone.
> 
> An error is printed in case of a failed detach. If there is a panic, at
> least you can get some info about the device. The metadata in the
> hypervisor is still around if the detach failed.
> 

With this approach, the only thing left behind is a kernel message. But if
the state is kept intact, one could collect a kernel core and analyze it.
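
For illustration, here is a minimal sketch of what I have in mind. It
assumes the hv_iommu_det_dev_from_*() helpers can be made to return the
hypercall status; that is an assumption on my side, as the patch does not
show their signatures:

static void hv_iommu_detach_dev(struct iommu_domain *immdom, struct device *dev)
{
        struct hv_domain *hvdom = to_hv_domain(immdom);
        struct pci_dev *pdev;
        int ret;

        /* See the attach function, only PCI devices for now */
        if (!dev_is_pci(dev))
                return;

        if (hvdom->num_attchd == 0)
                pr_warn("Hyper-V: num_attchd is zero (%s)\n", dev_name(dev));

        pdev = to_pci_dev(dev);

        if (hvdom->attached_dom) {
                ret = hv_iommu_det_dev_from_guest(hvdom, pdev);
                /*
                 * Do not reset attached_dom; hv_iommu_unmap_pages
                 * happens next.
                 */
        } else {
                ret = hv_iommu_det_dev_from_dom(hvdom, pdev);
        }

        if (ret) {
                /*
                 * Keep num_attchd and the domain state intact so a
                 * kernel core still shows the device as attached.
                 */
                dev_err(dev, "Hyper-V: detach failed: %d\n", ret);
                return;
        }

        hvdom->num_attchd--;
}

That way a failed hypercall leaves the kernel metadata matching reality,
which is exactly what a core dump would need to show.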

And note that there won't be a hypervisor core by default: our main
context with the upstreamed version of the driver is L1VH, and a kernel
core is the only thing a third-party customer can provide for our
analysis.

Thanks,
Stanislav

