On Mon, Jul 29, 2019 at 01:04:41PM -0600, Alex Williamson wrote:
> On Mon, 29 Jul 2019 16:26:46 +0800
> Peter Xu <zh...@redhat.com> wrote:
>
> > On Fri, Jul 26, 2019 at 06:55:53PM -0600, Alex Williamson wrote:
> > > When we account for DMA aliases in the PCI address space, we can no
> > > longer use a single IVHD entry in the IVRS covering all devices.  We
> > > instead need to walk the PCI bus and create alias ranges when we find
> > > a conventional bus.  These alias ranges cannot overlap with a "Select
> > > All" range (as currently implemented), so we also need to enumerate
> > > each device with IVHD entries.
> > >
> > > Importantly, the IVHD entries used here include a Device ID, which is
> > > simply the PCI BDF (Bus/Device/Function).  The guest firmware is
> > > responsible for programming bus numbers, so the final revision of this
> > > table depends on the update mechanism (acpi_build_update) to be called
> > > after guest PCI enumeration.
> >
> > Ouch... so the ACPI build procedure runs after the guest PCI code!
> > Could I ask how you found this? :)  It seems much easier for sure
> > this way...
>
> I believe this is what MST was referring to with the MCFG update:
> acpi_build() is called from both acpi_setup() and acpi_build_update(),
> and the latter is set up in numerous places to be called via a
> mechanism that re-writes the table on certain guest actions.  For
> instance, acpi_add_rom_blob() passes this function as a callback which
> turns into a select_cb in fw_cfg, such that (aiui) the tables are
> updated before the guest reads them.  I added some fprintfs in the PCI
> config write path to confirm that the update callback occurs after PCI
> enumeration for both SeaBIOS and OVMF.  Since we seem to have other
> dependencies on this ordering elsewhere, I don't think that the IVRS
> update is unique in this regard.
Agreed.

[...]

> > We've implemented similar logic multiple times:
> >
> > - When we want to do DMA (pci_requester_id)
> > - When we want to fetch the DMA address space (the previous patch)
> > - When we fill in the AMD ACPI table (this patch)
> >
> > Do you think we can generalize them somehow?  I'm thinking how about
> > we directly fetch the RID in the 2nd/3rd use case using
> > pci_requester_id() (which already exists) and simply use it?
>
> For this patch, I think we could use pci_requester_id() for dev_id_b
> above, but we still need to walk the buses and devices, so it really
> only simplifies the 'if (pci_is_express...' code block above, and
> barely at that since we're at the point in the topology where such a
> decision is made.  For the previous patch, pci_requester_id() returns
> a BDF and that code is executed before bus numbers are programmed.  We
> might still make use of requester_id_cache, but a different interface
> would be necessary.  Also note how pci_req_id_cache_get() assumes
> we're looking for the requester ID as seen from the root bus while
> pci_device_iommu_address_space() is from the bus hosting iommu_fn.
> While these are generally the same in practice, it's not required.
> I'd therefore tend to leave that to some future consolidation.  I can
> update the comment to include that justification in the previous
> patch.

Yes, we can work on top of this in the future if needed.  I see that
Michael already plans to merge this version, so it may not be worth a
repost just for the comment (if there is a repost anyway, we could mark
a TODO).

> > > +    /*
> > > +     * A PCI bus walk, for each PCI host bridge, is necessary to
> > > +     * create a complete set of IVHD entries.  Do this into a
> > > +     * separate blob so that we can calculate the total IVRS table
> > > +     * length here and then append the new blob further below.
> > > +     * Fall back to an entry covering all devices, which is
> > > +     * sufficient when no aliases are present.
> > > +     */
> > > +    object_child_foreach_recursive(object_get_root(),
> > > +                                   ivrs_host_bridges, ivhd_blob);
> > > +
> > > +    if (!ivhd_blob->len) {
> > > +        /*
> > > +         * Type 1 device entry reporting all devices
> > > +         * These are 4-byte device entries currently reporting the
> > > +         * range of
> > > +         * Refer to Spec - Table 95: IVHD Device Entry Type Codes
> > > +         * (4-byte)
> > > +         */
> > > +        build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
> > > +    }
> >
> > Is there a real use case for ivhd_blob->len == 0?
>
> It was mostly paranoia, but I believe it's really only an Intel
> convention that the PCI host bridge appears as a device on the bus.
> It seems possible that we could have a host bridge with no devices, in
> which case we'd insert this select-all entry to make the IVRS valid.
> Perhaps in combination with AMD exposing their IOMMU as a device on
> the PCI bus this is not really an issue, but it's a trivial safety
> net.  Thanks,

That question was only out of curiosity.  This code path will only
trigger when the AMD vIOMMU is present, so I assume the IOMMU device
should always be there, but of course it won't hurt as a safety net.

Thanks for doing this!

-- 
Peter Xu