On Tue, 6 Jan 2026 14:22:57 +0100 Eric Auger <[email protected]> wrote:
> On 1/6/26 12:38 PM, Shameer Kolothum wrote: > > Hi Eric, > > > >> -----Original Message----- > >> From: Eric Auger <[email protected]> > >> Sent: 06 January 2026 10:55 > >> To: Shameer Kolothum <[email protected]>; Yi Liu > >> <[email protected]>; [email protected]; [email protected] > >> Cc: [email protected]; Jason Gunthorpe <[email protected]>; Nicolin > >> Chen <[email protected]>; [email protected]; [email protected]; > >> Nathan Chen <[email protected]>; Matt Ochs <[email protected]>; > >> [email protected]; [email protected]; > >> [email protected]; [email protected]; > >> [email protected]; [email protected]; Krishnakant Jaju > >> <[email protected]>; [email protected] > >> Subject: Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM > >> > >> External email: Use caution opening links or attachments > >> > >> > >> Hi Shameer, > > [...] > > > >>>>> Besides the fact the offset is arbitrarily chosen so that this is the > >>>>> last cap of the vconfig space, the code looks good to me. > >>>>> So > >>>>> Reviewed-by: Eric Auger <[email protected]> > >>>>> > >>>>> Just wondering whether we couldn't add some generic pcie code that > >>>>> parses the extended cap linked list to check the offset range is not > >>>>> used by another cap before allowing the insertion at a given offset? > >>>>> This wouldn't prevent a subsequent addition from failing but at least we > >>>>> would know if there is some collision.this could be added later on > >>>>> though. > >>>>> > >>>> You're absolutely right. My approach of using the last 8 bytes was a > >>>> shortcut to avoid implementing proper capability parsing logic > >>>> (importing pci_regs.h and maintaining a cap_id-to-cap_size mapping > >>>> table), and it simplified PASID capability detection by only examining > >>>> the last 8bytes by a simple dump :(. However, this approach is not > >>>> good as we cannot guarantee that the last 8bytes are unused by any > >>>> device. > >>>> > >>>> Let's just implement the logic to walk the linked list of ext_caps to > >>>> find an appropriate offset for our use case. > >>> I had a go at this. Based on my understanding, even if we walk the PCIe > >>> extended capability linked list, we still can't easily determine the size > >>> occupied by the last capability as the extended capability header does not > >>> encode a length, it only provides the "next" pointer, and for the last > >>> entry > >>> next == 0. > >> If my understanding is correct when walking the linked list, you can > >> enumerate the start index and the PCIe extended Capability variable size > >> which is made of fix header size + register block variable size which > >> depends on the capability ID). After that we shall be able to allocate a > >> slot within holes or at least check that adding the new prop at the end > >> of the 4kB is safe, no?. What do I miss? > > I think the main issue is that we can't know whether the apparent "holes" > > between extended capabilities are actually free. Depending on the vendor > > implementation, those regions may be reserved or used for vendor specific > > purposes, and I am not sure(please correct me) PCIe spec guarantee that > > such gaps are available for reuse. Hence thought of relying on the “next” > > pointer as a safe bet. > > > > Even if we look at the last CAP ID and derive a size based on the > > spec defined register layout, we still can;t know whether there is > > any additional vendor specific data beyond that "size". It is still > > a best guess and I don't think we gain much in adding this additional > > check. > > Ah OK I see what you mean (you may have discussed that earlier in other > threads sorry). So you may have vendor specific private data in the > holes. In that case I guess we cannot do much :-/ Also, we can only know the size of capabilities that are currently defined, we don't do a great job of keeping up with the latest ECNs. Unless we have device specific knowledge, the best we can do is hope that a gap between capabilities is unused. It might be a helpful indicator to verify the config space we intend to overlap is zero, though we can get false positives with such a method if we overlap a capability that kernel vfio-pci has disconnected from the capability chain an marked read-only. > > > > Perhaps, I think we could inform the user that we are placing > > teh PASID at the last offset and the onus is on user to make sure > > it is safe to do so. > or another solution is to let the user opt-in for this hasardous > placement using an explicit x- prefixed option? Dunno Yeah, this is probably one of those cases where we expect we don't have a foolproof solution and we should allow an override. I think we're defining the initial 'auto' algorithm that a -x-vpasid_cap_offset= might use by default. It should also take a numerical value as an override though. We'll need a table so that when we find a device that requires a different value, that becomes the auto value for that device. Also consider how adding subsequent purely virtual capability would work. We don't want to have each capability defining a new algorithm and default offset. To that extent, we might be better served with a command line override that specifies available ranges of config space rather than an offset for a specific vcap. I think our discussions related to a kernel interface were headed more in this direction. Thanks, Alex
