On Tue, 6 Jan 2026 14:22:57 +0100
Eric Auger <[email protected]> wrote:

> On 1/6/26 12:38 PM, Shameer Kolothum wrote:
> > Hi Eric,
> >  
> >> -----Original Message-----
> >> From: Eric Auger <[email protected]>
> >> Sent: 06 January 2026 10:55
> >> To: Shameer Kolothum <[email protected]>; Yi Liu
> >> <[email protected]>; [email protected]; [email protected]
> >> Cc: [email protected]; Jason Gunthorpe <[email protected]>; Nicolin
> >> Chen <[email protected]>; [email protected]; [email protected];
> >> Nathan Chen <[email protected]>; Matt Ochs <[email protected]>;
> >> [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]; Krishnakant Jaju
> >> <[email protected]>; [email protected]
> >> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> Hi Shameer,  
> > [...]
> >  
> >>>>> Besides the fact the offset is arbitrarily chosen so that this is the
> >>>>> last cap of the vconfig space, the code looks good to me.
> >>>>> So
> >>>>> Reviewed-by: Eric Auger <[email protected]>
> >>>>>
> >>>>> Just wondering whether we couldn't add some generic pcie code that
> >>>>> parses the extended cap linked list to check the offset range is not
> >>>>> used by another cap before allowing the insertion at a given offset?
> >>>>> This wouldn't prevent a subsequent addition from failing but at least we
> >>>>> would know if there is some collision.this could be added later on 
> >>>>> though.
> >>>>>  
> >>>> You're absolutely right. My approach of using the last 8 bytes was a
> >>>> shortcut to avoid implementing proper capability parsing logic
> >>>> (importing pci_regs.h and maintaining a cap_id-to-cap_size mapping
> >>>> table), and it simplified PASID capability detection by only examining
> >>>> the last 8bytes by a simple dump :(. However, this approach is not
> >>>> good as we cannot guarantee that the last 8bytes are unused by any
> >>>> device.
> >>>>
> >>>> Let's just implement the logic to walk the linked list of ext_caps to
> >>>> find an appropriate offset for our use case.  
> >>> I had a go at this. Based on my understanding, even if we walk the PCIe
> >>> extended capability linked list, we still can't easily determine the size
> >>> occupied by the last capability as the extended capability header does not
> >>> encode a length, it only provides the "next" pointer, and for the last 
> >>> entry
> >>> next == 0.  
> >> If my understanding is correct when walking the linked list, you can
> >> enumerate the start index and the PCIe extended Capability variable size
> >> which is made of fix header size + register block variable size which
> >> depends on the capability ID). After that we shall be able to allocate a
> >> slot within holes or at least check that adding the new prop at the end
> >> of the 4kB is safe, no?. What do I miss?  
> > I think the main issue is that we can't know whether the apparent "holes"
> > between extended capabilities are actually free. Depending on the vendor
> > implementation, those regions may be reserved or used for vendor specific
> > purposes, and I am not sure(please correct me) PCIe spec guarantee that
> > such gaps are available for reuse. Hence thought of relying on the “next”
> > pointer as a safe bet.
> >
> > Even if we look at the last CAP ID and derive a size based on the
> > spec defined register layout, we still can;t know whether there is
> > any additional vendor specific data beyond that "size". It is still
> > a best guess and I don't think we gain much in adding this additional
> > check.  
> 
> Ah OK I see what you mean (you may have discussed that earlier in other
> threads sorry). So you may have vendor specific private data in the
> holes. In that case I guess we cannot do much :-/

Also, we can only know the size of capabilities that are currently
defined, we don't do a great job of keeping up with the latest ECNs.

Unless we have device specific knowledge, the best we can do is hope
that a gap between capabilities is unused.  It might be a helpful
indicator to verify the config space we intend to overlap is zero,
though we can get false positives with such a method if we overlap a
capability that kernel vfio-pci has disconnected from the capability
chain an marked read-only.

> >
> > Perhaps, I think we could inform the user that we are placing
> > teh PASID at the last offset and the onus is on user to make sure
> > it is safe to do so.   
> or another solution is to let the user opt-in for this hasardous
> placement using an explicit x- prefixed option? Dunno

Yeah, this is probably one of those cases where we expect we don't have
a foolproof solution and we should allow an override.  I think we're
defining the initial 'auto' algorithm that a -x-vpasid_cap_offset=
might use by default.  It should also take a numerical value as an
override though.  We'll need a table so that when we find a device that
requires a different value, that becomes the auto value for that
device.

Also consider how adding subsequent purely virtual capability would
work.  We don't want to have each capability defining a new algorithm
and default offset.  To that extent, we might be better served with a
command line override that specifies available ranges of config space
rather than an offset for a specific vcap.  I think our discussions
related to a kernel interface were headed more in this direction.
Thanks,

Alex

Reply via email to