On 22/05/12 13:21, Alexander Graf wrote: > > > On 22.05.2012, at 04:02, Benjamin Herrenschmidt <b...@kernel.crashing.org> > wrote: > >> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>> Alexander, >>> >>> Is that any better? :) >> >> Alex (Graf that is), ping ? >> >> The original patch from Alexey was fine btw. >> >> VFIO will always call things with the existing capability offset so >> there's no real risk of doing the wrong thing or break the list or >> anything. >> >> IE. A small simple patch that addresses the problem :-) >> >> The new patch is a bit more "robust" I believe, I don't think we need to >> go too far to fix a problem we don't have. But we need a fix for the >> real issue and the simple patch does it neatly from what I can >> understand. >> >> Cheers, >> Ben. >> >>> >>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>> * in pci config space */ >>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>> uint8_t offset, uint8_t size) >>> { >>> - uint8_t *config; >>> + uint8_t *config, existing; > > Existing is a pointer to the target dev's config space, right?
Yes. >>> int i, overlapping_cap; >>> >>> + existing = pci_find_capability(pdev, cap_id); >>> + if (existing) { >>> + if (offset && (existing != offset)) { >>> + return -EEXIST; >>> + } >>> + for (i = existing; i < size; ++i) { > > So how does this possibly make sense? Although I do not expect VFIO to add capabilities (does not make sense), I still want to double check that this space has not been tried to use by someone else. >>> + if (pdev->used[i]) { >>> + return -EFAULT; >>> + } >>> + } >>> + memset(pdev->used + offset, 0xFF, size); > Why? Because I am marking the space this capability takes as used. >>> + /* Make capability read-only by default */ >>> + memset(pdev->wmask + offset, 0, size); > Why? Because the pci_add_capability() does it for a new capability by default. >>> + /* Check capability by default */ >>> + memset(pdev->cmask + offset, 0xFF, size); > > I don't understand this part either. The pci_add_capability() does it for a new capability by default. > > Alex > >>> + return existing; >>> + } >>> + >>> if (!offset) { >>> offset = pci_find_space(pdev, size); >>> if (!offset) { >>> return -ENOSPC; >>> >>> >>> >>> >>> >>> >>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>> >>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>> >>>>>> 11.05.2012 20:52, Alexander Graf написал: >>>>>>> >>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>> >>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>> is being built by QEMU. >>>>>>>> >>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>>> >>>>>>>> For example, the old code destroys the following config >>>>>>>> of PCIe Intel E1000E: >>>>>>>> >>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>> 0x34: 0xC8 >>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>> 0xE0: 0x10 0x00 >>>>>>>> >>>>>>>> after: >>>>>>>> 0x34: 0xD0 >>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>> 0xE0: 0x10 0x00 >>>>>>>> >>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>> >>>>>>>> The proposed patch does not change capability pointers when >>>>>>>> the same type capability is about to add. >>>>>>>> >>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>>>>> --- >>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>> --- a/hw/pci.c >>>>>>>> +++ b/hw/pci.c >>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, >>>>>>>> uint8_t cap_id, >>>>>>>> } >>>>>>>> >>>>>>>> config = pdev->config + offset; >>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>> >>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do >>>>>>> a loop through all capabilities, check if the one you want to add is >>>>>>> there already and if so either >>>>>>> * replace the existing one or >>>>>>> * drop out and not write the new one in. >>>>> >>>>> * hw_error :) >>>>> >>>>>>> >>>>>>> I'm not sure which way would be more natural. >>>>>> >>>>>> There is a third option - add another function, lets call it >>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>> but won't touch list pointers. >>>>> >>>>> What good is a function that breaks internal consistency? >>>> >>>> >>>> It is broken already by having PCIDevice.used field. Normally >>>> pci_add_capability() would go through >>>> the whole list and add a capability if it does not exist. Emulated devices >>>> which care about having a >>>> capability at some fixed offset would have initialized their config space >>>> before calling this >>>> capabilities API (as VFIO does). >>>> >>>> If we really want to support emulated devices which want some capabilities >>>> be at fixed offset and >>>> others at random offsets (strange, but ok), I do not see how it is bad to >>>> restore this consistency >>>> by special function (pci_fixup_capability()) to avoid its rewriting at >>>> different location as a guest >>>> driver may care about its offset. >>>> >>>> >>>> >>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>> exactly that the capability exists and where it is and it calls >>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>> just for imaginery scalability is a bit weird, no? >>>>> >>>>> Not sure I understand your proposal. The more generic a framework is, the >>>>> better, no? In this code path we don't care about speed. We only care >>>>> about consistency and reliability. >>>>> >>>>> >>>>> Alex >>>>> >>>> >>>> >>> >>> >> >> -- Alexey -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html