Re: [PATCH v3 7/7] iommu/amd: Do not support IOMMUv2 APIs when SNP is enabled
On 6/23/2022 3:23 PM, Joerg Roedel wrote:
> On Wed, Jun 22, 2022 at 12:11:31PM -0500, Suravee Suthikulpanit wrote:
>>  bool amd_iommu_v2_supported(void)
>>  {
>> -	return amd_iommu_v2_present;
>> +	/*
>> +	 * Since DTE[Mode]=0 is prohibited on SNP-enabled system
>> +	 * (i.e. EFR[SNPSup]=1), IOMMUv2 page table cannot be used without
>> +	 * setting up IOMMUv1 page table.
>> +	 */
>> +	return amd_iommu_v2_present && !amd_iommu_snp_en;
>
> IOMMU_v2 APIs could actually be supported with GIOV and IOMMUv2 page-tables in-use, no?

We can support IOMMUv2 iff the v1 table is also set up (i.e. DTE[Mode] != 0). Currently, w/ the IOMMU_v2 APIs, the IOMMU sets the mode to zero. Therefore, we cannot support this use case.

Best Regards,
Suravee

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/7] iommu/amd: Warn when found inconsistency EFR mask
On 6/23/2022 3:22 PM, Joerg Roedel wrote:
> On Wed, Jun 22, 2022 at 12:11:25PM -0500, Suravee Suthikulpanit wrote:
>>  #ifdef CONFIG_IRQ_REMAP
>> +/*
>> + * Iterate through all the IOMMUs to verify that the specified
>> + * EFR bitmask of IOMMU features is set.
>> + * Warn and return false if an inconsistency is found.
>> + */
>>  static bool check_feature_on_all_iommus(u64 mask)
>>  {
>>  	bool ret = false;
>>  	struct amd_iommu *iommu;
>>
>>  	for_each_iommu(iommu) {
>> -		ret = iommu_feature(iommu, mask);
>> -		if (!ret)
>> +		bool tmp = iommu_feature(iommu, mask);
>> +
>> +		if ((ret != tmp) &&
>> +		    !list_is_first(&iommu->list, &amd_iommu_list)) {
>> +			pr_err(FW_BUG "Found inconsistent EFR mask (%#llx) on iommu%d (%04x:%02x:%02x.%01x).\n",
>> +			       mask, iommu->index, iommu->pci_seg->id, PCI_BUS_NUM(iommu->devid),
>> +			       PCI_SLOT(iommu->devid), PCI_FUNC(iommu->devid));
>>  			return false;
>> +		}
>> +		ret = tmp;
>
> It is better to implement this by introducing a global feature mask, which represents the minimal set of features supported by any IOMMU in the system. The warning is then something like:
>
> 	if ((global_feature_mask & iommu_features) != global_feature_mask)
> 		pr_warn(...);
>
> This also makes the global variable to track SNP support obsolete.
>
> Regards,
> Joerg

That's actually better. I'll send out v4 w/ a global EFR variable.

Thanks,
Suravee
Re: [PATCH v2 4/7] iommu/amd: Introduce function to check and enable SNP
On 6/22/2022 3:35 PM, Robin Murphy wrote:
> Overall though, this is way nicer than v1, and it's definitely the right name in the right place now, thanks! FWIW, with those nits picked one way or another:
>
> Reviewed-by: Robin Murphy
>
> Cheers,
> Robin.

Thanks for your review. I'll send out v3 w/ your suggestions and reviewed-by tag.

Suravee
Re: [PATCH v2 4/7] iommu/amd: Introduce function to check and enable SNP
Recap of the discussion on the other thread:
https://lore.kernel.org/linux-mm/camkat6qorwbaxapacasm0sc9o2uq9zqzb6s1kbkvav2d4tk...@mail.gmail.com/#t

On 6/16/2022 8:55 AM, Suravee Suthikulpanit wrote:
> +int amd_iommu_snp_enable(void)
> +{
> +	/*
> +	 * The SNP support requires that IOMMU must be enabled, and is
> +	 * not configured in the passthrough mode.
> +	 */
> +	if (no_iommu || iommu_default_passthrough()) {
> +		pr_err("SNP: IOMMU is either disabled or configured in passthrough mode.\n");
> +		return -EINVAL;
> +	}

Peter has suggested rewording this to something more descriptive, such as: "SNP: IOMMU is either disabled or configured in passthrough mode, SNP cannot be supported".

Thank you,
Suravee

> +	/*
> +	 * Prevent enabling SNP after IOMMU_ENABLED state because this process
> +	 * affects how the IOMMU driver sets up data structures and configures
> +	 * IOMMU hardware.
> +	 */
> +	if (init_state > IOMMU_ENABLED) {
> +		pr_err("SNP: Too late to enable SNP for IOMMU.\n");
> +		return -EINVAL;
> +	}
> +
> +	amd_iommu_snp_en = amd_iommu_snp_sup;
> +	if (!amd_iommu_snp_en)
> +		return -EINVAL;
> +
> +	pr_info("SNP enabled\n");
> +
> +	/* Enforce IOMMU v1 pagetable when SNP is enabled. */
> +	if (amd_iommu_pgtable != AMD_IOMMU_V1) {
> +		pr_warn("Force to using AMD IOMMU v1 page table due to SNP\n");
> +		amd_iommu_pgtable = AMD_IOMMU_V1;
> +		amd_iommu_ops.pgsize_bitmap = AMD_IOMMU_PGSIZES;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(amd_iommu_snp_enable);
Re: [PATCH 5/7] iommu: Add domain_type_supported() callback in iommu_ops
Robin,

On 6/14/2022 4:51 PM, Robin Murphy wrote:

On 2022-06-13 15:38, Suthikulpanit, Suravee wrote:

Robin,

On 6/13/2022 4:31 PM, Robin Murphy wrote:

Introducing check_domain_type_supported() callback in iommu_ops, which allows the IOMMU generic layer to check with the vendor-specific IOMMU driver whether the requested type is supported. This allows the user to request types other than the default type.

Note also that you're only adding this in the sysfs path - what about the "iommu.passthrough=" parameter or CONFIG_IOMMU_DEFAULT_PASSTHROUGH?

For the SNP case, we cannot enable SNP if iommu=off or iommu=pt or iommu.passthrough=1 or CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y. So, when another driver tries to enable SNP, the IOMMU driver prevents it (see iommu_sev_snp_supported() in patch 3).

Ugh, I hadn't looked too closely at the other patches, but an interface that looks like a simple "is this feature supported?" check with a secret side-effect of changing global behaviour as well? Yuck :( What external drivers are expected to have the authority to affect the entire system and call that? The fact that you're exporting it suggests they could be loaded from modules *after* v2 features have been enabled and/or the user has configured a non-default identity domain for a group via sysfs... Fun!

I see your point. Currently, the function to enable SNP will be called from the SEV driver when it tries to enable SNP support globally on the system. This is done during fs_initcall(), which is early in the boot process. I can also add guard code to make sure that this won't be done after a certain phase.

Instead, if we boot with iommu.passthrough=0, when another driver tries to enable SNP, the IOMMU driver allows this and switches to SNP-enabled mode. Subsequently, if the user tries to switch a domain (via sysfs) to IOMMU_DOMAIN_IDENTITY, the IOMMU needs to prevent this because it has already switched to SNP-enabled mode.

AFAICS there shouldn't need to be any core-level changes to support this. We already have drivers which don't support passthrough at all, so conditionally not supporting it should be no big deal. What should happen currently is that def_domain_type returns 0 for "don't care", then domain_alloc rejects IOMMU_DOMAIN_IDENTITY and returns NULL, so iommu_group_alloc_default_domain() falls back to IOMMU_DOMAIN_DMA.

Technically, we can do it the way you suggest. But isn't this confusing? At first, def_domain_type() returns 0 for "don't care", but then it rejects the request to change to IOMMU_DOMAIN_IDENTITY when trying to call domain_alloc().

Yes, that's how it works; def_domain_type is responsible for quirking individual *devices* that need to have a specific domain type (in practice, devices which need identity mapping), while domain_alloc is responsible for saying which domain types the driver supports as a whole, by allocating them or not as appropriate. We don't have a particularly neat way to achieve the negative of def_domain_type - i.e. saying that a specific device *can't* use a specific otherwise-supported domain type - other than subsequently failing in attach_dev, but so far we've not needed such a thing. And if SNP is expected to be mutually exclusive with identity domain support globally, then we still shouldn't need it.

Thanks for your feedback.

Suravee
Re: [PATCH 3/7] iommu/amd: Introduce function to check SEV-SNP support
On 6/13/2022 8:24 AM, Suravee Suthikulpanit wrote:
> @@ -3543,3 +3537,30 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
>  	return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
>  }
> +
> +bool iommu_sev_snp_supported(void)
> +{
> +	/*
> +	 * The SEV-SNP support requires that IOMMU must be enabled, and is
> +	 * not configured in the passthrough mode.
> +	 */
> +	if (no_iommu || iommu_default_passthrough()) {
> +		pr_err("SEV-SNP: IOMMU is either disabled or configured in passthrough mode.\n");
> +		return false;
> +	}
> +
> +	amd_iommu_snp_en = amd_iommu_snp_sup;
> +	if (amd_iommu_snp_en)
> +		pr_info("SNP enabled\n");
> +
> +	/* Enforce IOMMU v1 pagetable when SNP is enabled. */
> +	if ((amd_iommu_pgtable != AMD_IOMMU_V1) &&
> +	    amd_iommu_snp_en) {
> +		pr_info("Force to using AMD IOMMU v1 page table due to SNP\n");
> +		amd_iommu_pgtable = AMD_IOMMU_V1;
> +		amd_iommu_ops.pgsize_bitmap = AMD_IOMMU_PGSIZES;
> +	}
> +
> +	return amd_iommu_snp_en;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sev_snp_supported);

I need to guard this function w/ #ifdef CONFIG_AMD_MEM_ENCRYPT to prevent a build error when CONFIG_AMD_MEM_ENCRYPT=n. I will send out v2 to fix this.

Best Regards,
Suravee
Re: [PATCH 5/7] iommu: Add domain_type_supported() callback in iommu_ops
Robin,

On 6/13/2022 4:31 PM, Robin Murphy wrote:

On 2022-06-13 02:25, Suravee Suthikulpanit wrote:

When the user requests to change an IOMMU domain to a new type, the IOMMU generic layer checks the requested type against the default domain type returned by the vendor-specific IOMMU driver. However, there is only one default domain type, and the current mechanism does not allow the change if the requested type does not match the default type.

I don't really follow the reasoning here. If a driver's def_domain_type callback returns a specific type, it's saying that the device *has* to have that specific domain type for driver/platform-specific reasons.

Agree, and I understand this part.

If that's not the case, then the driver shouldn't say so in the first place.

Considering the case:

1. Boot w/ default domain = IOMMU_DOMAIN_DMA_FQ.
2. User wants to change to IOMMU_DOMAIN_IDENTITY, which is not supported by the IOMMU driver. In this case, the IOMMU driver can return IOMMU_DOMAIN_DMA_FQ and prevent the mode change.
3. However, if the user wants to change to IOMMU_DOMAIN_DMA, the driver can support this. But since def_domain_type() returns IOMMU_DOMAIN_DMA_FQ, it ends up preventing the mode change.

IIUC, we should support step 3 above. Basically, the newly proposed interface allows us to check with the IOMMU driver whether it can support certain domain types before trying to allocate the domain.

Introducing check_domain_type_supported() callback in iommu_ops, which allows the IOMMU generic layer to check with the vendor-specific IOMMU driver whether the requested type is supported. This allows the user to request types other than the default type.

Note also that you're only adding this in the sysfs path - what about the "iommu.passthrough=" parameter or CONFIG_IOMMU_DEFAULT_PASSTHROUGH?

For the SNP case, we cannot enable SNP if iommu=off or iommu=pt or iommu.passthrough=1 or CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y. So, when another driver tries to enable SNP, the IOMMU driver prevents it (see iommu_sev_snp_supported() in patch 3).

Instead, if we boot with iommu.passthrough=0, when another driver tries to enable SNP, the IOMMU driver allows this and switches to SNP-enabled mode. Subsequently, if the user tries to switch a domain (via sysfs) to IOMMU_DOMAIN_IDENTITY, the IOMMU needs to prevent this because it has already switched to SNP-enabled mode.

AFAICS there shouldn't need to be any core-level changes to support this. We already have drivers which don't support passthrough at all, so conditionally not supporting it should be no big deal. What should happen currently is that def_domain_type returns 0 for "don't care", then domain_alloc rejects IOMMU_DOMAIN_IDENTITY and returns NULL, so iommu_group_alloc_default_domain() falls back to IOMMU_DOMAIN_DMA.

Technically, we can do it the way you suggest. But isn't this confusing? At first, def_domain_type() returns 0 for "don't care", but then it rejects the request to change to IOMMU_DOMAIN_IDENTITY when trying to call domain_alloc().

Please let me know if I am missing something.

Best Regards,
Suravee
Re: iommu/amd: bug report: page table memory leak
Hi Daniel,

On 1/19/2022 2:47 AM, Daniel Jordan wrote:

Hi, I've hit a memory leak while testing qemu v6.2.0-rc4 on an AMD EPYC 7J13 (Milan) system. Starting an almost 1T guest, the leak is over 1.5G per qemu invocation. I haven't checked whether the leak is proportional to guest size. It happens with a vfio device, and only when the guest's memory is preallocated using qemu prealloc (this latter part is kinda strange). It happens when the guest memory uses THP but not hugetlb.

Bisection:

# bad: [df0cc57e057f18e44dac8e6c18aba47ab53202f9] Linux 5.16
# good: [f40ddce88593482919761f74910f42f4b84c004b] Linux 5.11
git bisect start 'df0cc57e057f1' 'f40ddce885934' '--' 'drivers/vfio' 'drivers/iommu' 'include/linux/amd-iommu.h' 'include/linux/dma-iommu.h' 'include/linux/intel-iommu.h' 'include/linux/iommu-helper.h' 'include/linux/of_iommu.h' 'include/linux/omap-iommu.h' 'include/linux/platform_data/iommu-omap.h' 'include/linux/iommu.h' 'include/trace/events/intel_iommu.h' 'include/trace/events/iommu.h' 'include/uapi/linux/iommu.h' 'include/uapi/linux/virtio_iommu.h' 'arch/x86/events/amd/iommu.h' 'arch/x86/events/amd/iommu.c' 'arch/x86/include/asm/iommu.h' 'arch/x86/include/asm/iommu_table.h' 'arch/x86/kernel/pci-iommu_table.c'
# bad: [cee57d4fe74e82e784f6566bad3e3bb1ca51a211] iommu/vt-d: Remove unnecessary braces
git bisect bad cee57d4fe74e82e784f6566bad3e3bb1ca51a211
# bad: [9fb5fad562fa0a41c84691714d99c23f54168a9e] iommu: remove DOMAIN_ATTR_PAGING
git bisect bad 9fb5fad562fa0a41c84691714d99c23f54168a9e
# bad: [45e606f2726926b04094e1c9bf809bca4884c57f] Merge branches 'arm/renesas', 'arm/smmu', 'x86/amd', 'x86/vt-d' and 'core' into next
git bisect bad 45e606f2726926b04094e1c9bf809bca4884c57f
# good: [7060377ce06f9cd3ed6274c0f2310463feb5baec] Merge branch 'for-joerg/mtk' into for-joerg/arm-smmu/updates
git bisect good 7060377ce06f9cd3ed6274c0f2310463feb5baec
# bad: [6778ff5b21bd8e78c8bd547fd66437cf2657fd9b] iommu/amd: Fix performance counter initialization
git bisect bad 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b
# good: [f9b4df790aa4372bfa11b7d212e537b763295429] iommu/amd: Declare functions as extern
git bisect good f9b4df790aa4372bfa11b7d212e537b763295429
# bad: [33aef9786046d9a5744cd1e8d5d0ce800d611fdc] iommu/amd: Rename variables to be consistent with struct io_pgtable_ops
git bisect bad 33aef9786046d9a5744cd1e8d5d0ce800d611fdc
# bad: [e42ba0633064ef23eb1c8c21edf96bac1541bd4b] iommu/amd: Restructure code for freeing page table
git bisect bad e42ba0633064ef23eb1c8c21edf96bac1541bd4b
# good: [18954252a1d0b12e1b77087b55c37fb43b09e12a] iommu/amd: Move IO page table related functions
git bisect good 18954252a1d0b12e1b77087b55c37fb43b09e12a
# first bad commit: [e42ba0633064ef23eb1c8c21edf96bac1541bd4b] iommu/amd: Restructure code for freeing page table

commit e42ba0633064ef23eb1c8c21edf96bac1541bd4b
Author: Suravee Suthikulpanit
Date: Tue Dec 15 01:36:59 2020 -0600

    iommu/amd: Restructure code for freeing page table

    By consolidate logic into v1_free_pgtable helper function, which is called from IO page table framework.

    Signed-off-by: Suravee Suthikulpanit
    Link: https://lore.kernel.org/r/20201215073705.123786-8-suravee.suthikulpanit@amd.com
    Signed-off-by: Joerg Roedel

 drivers/iommu/amd/amd_iommu.h  |  1 -
 drivers/iommu/amd/io_pgtable.c | 41 -
 drivers/iommu/amd/iommu.c      | 21 -
 3 files changed, 28 insertions(+), 35 deletions(-)

Qemu command line:

numactl -m 1 -N 1 "$QEMU" \
 -name vmol74 \
 -machine q35,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
 -cpu host,host-phys-bits=true \
 -smp cpus=32 \
 -no-user-config \
 -nodefaults \
 -rtc base=utc,driftfix=slew \
 -global kvm-pit.lost_tick_policy=delay \
 -no-hpet \
 -no-shutdown \
 -boot strict=on \
 -drive file=${vm_image},format=raw,if=none,id=drive-ide0-0-0 \
 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 \
 -device vfio-pci,host=${pci_addr},id=net2,bus=pcie.0 \
 -msg timestamp=on \
 -nographic \
 -object memory-backend-ram,id=pc.ram,size=980g,prealloc=on,prealloc-threads=16 -m 980g \
 -daemonize

Kernel config
Re: [PATCH 0/3] iommu/amd: Fix unable to handle page fault due to AVIC
On 9/2/2021 12:38 AM, Joerg Roedel wrote:
> Hi Suravee,
>
> On Tue, Aug 31, 2021 at 12:10:27PM -0500, Suthikulpanit, Suravee wrote:
>> Here is an additional tag for this series:
>> Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")
>> Are there any concerns with this series?
>
> No concerns, please add the tag and re-post when -rc1 is out. I will queue it for -rc2 then.
>
> Thanks,
> Joerg

I'll do that.

Thanks,
Suravee
Re: [PATCH 0/3] iommu/amd: Fix unable to handle page fault due to AVIC
Joerg,

Here is an additional tag for this series:

Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")

Are there any concerns with this series?

Thanks,
Suravee

On 8/20/2021 3:29 PM, Suravee Suthikulpanit wrote:
> This bug is triggered when rebooting a VM on a system where SVM AVIC is enabled but IOMMU AVIC is disabled in the BIOS. The series reworks interrupt remapping initialization to check for IOMMU AVIC support (GAMSup) at an earlier stage using the EFR provided by the IVRS table instead of the PCI MMIO register, which is only available after PCI support for the IOMMU is initialized. This helps avoid having to disable and clean up the already initialized interrupt-remapping-related parameters.
>
> Thanks,
> Suravee
>
> Suravee Suthikulpanit (2):
>   iommu/amd: Introduce helper function to check feature bit on all IOMMUs
>   iommu/amd: Remove iommu_init_ga()
>
> Wei Huang (1):
>   iommu/amd: Relocate GAMSup check to early_enable_iommus
>
>  drivers/iommu/amd/init.c | 45 ++--
>  1 file changed, 25 insertions(+), 20 deletions(-)
Re: [PATCH v3] iommu/amd: Use report_iommu_fault()
Lennert,

FYI: I have made some comments in the V2 thread, specifically around the new changes that we discussed in that thread.

Thanks,
Suravee
Re: [PATCH v2] iommu/amd: Use report_iommu_fault()
Lennert,

On 7/29/2021 9:32 PM, Lennert Buytenhek wrote:
> We have three cases to handle:
>
> - EVENT_FLAG_I set: IRQ remapping fault, don't call report_iommu_fault()
>
> - EVENT_FLAG_I unset, but the request was a translation request (EVENT_FLAG_TR set) or the target page was not present (EVENT_FLAG_PR unset): call report_iommu_fault(), but the RW bit will be invalid, so don't try to map it to an IOMMU_FAULT_{READ,WRITE} code

So, why do we need to call report_iommu_fault() for this case? My understanding is we only have IOMMU_FAULT_[READ|WRITE]. So, if we can't identify whether the DMA is a read or a write, we should not need to call report_iommu_fault(), should we?

> - EVENT_FLAG_I unset, the request is a memory transaction (EVENT_FLAG_TR unset) and the target page was present (EVENT_FLAG_PR set): call report_iommu_fault(), and use the RW bit to set IOMMU_FAULT_{READ,WRITE}
>
> So I don't think we can merge the test for EVENT_FLAG_I with the test for EVENT_FLAG_TR/EVENT_FLAG_PR.

The only condition under which we would call report_iommu_fault() is I=0, TR=0, PR=1, isn't it? So we should be able to just check if PR=1.

> We could do something like this, if you'd prefer:
>
> #define IS_IOMMU_MEM_TRANSACTION(flags) \
> 	(((flags) & EVENT_FLAG_I) == 0)
>
> #define IS_RW_FLAG_VALID(flags) \
> 	(((flags) & (EVENT_FLAG_TR | EVENT_FLAG_PR)) == EVENT_FLAG_PR)
>
> #define IS_WRITE_REQUEST(flags) \
> 	(IS_RW_FLAG_VALID(flags) && (flags & EVENT_FLAG_RW))
>
> And then do something like:
>
> 	if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
> 		if (!report_iommu_fault(&dev_data->domain->domain,
> 					&pdev->dev, address,
> 					IS_WRITE_REQUEST(flags) ?
> 						IOMMU_FAULT_WRITE :
> 						IOMMU_FAULT_READ))
> 			goto out;
> 	}

Actually, IS_WRITE_REQUEST() == 0 could mean:
- I=0, TR=0, PR=1 and RW=0: This is fine.
- I=0, (TR=1 or PR=0): We should not be calling report_iommu_fault() here since we cannot specify READ/WRITE.

Thanks,
Suravee
Re: [PATCH v2] iommu/amd: Use report_iommu_fault()
Lennert,

On 7/26/2021 11:31 AM, Lennert Buytenhek wrote:

This patch makes iommu/amd call report_iommu_fault() when an I/O page fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault() when an I/O page fault occurs.

I'm mainly interested in (2). We have a daemon with some rasdaemon-like functionality for handling platform errors, and being able to be notified of I/O page faults for initiating corrective action is very useful -- and receiving such events via event tracing is a lot nicer than having to scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and I/O page faults on those IOMMUs therefore already seem to trigger this tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where if that function returns zero, the driver will have handled the fault, in which case we avoid logging information about the fault to the printk buffer from the IOMMU driver.

With this patch I see io_page_fault event tracing entries as expected:

irq/24-AMD-Vi-48 [002] 978.554289: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482640 flags=0x
irq/24-AMD-Vi-48 [002] 978.554294: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482650 flags=0x
irq/24-AMD-Vi-48 [002] 978.554299: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482660 flags=0x
irq/24-AMD-Vi-48 [002] 978.554305: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482670 flags=0x
irq/24-AMD-Vi-48 [002] 978.554310: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482680 flags=0x
irq/24-AMD-Vi-48 [002] 978.554315: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU spec, but I haven't tested that bit of the code, as the page faults I encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek
---

Changes since v1 RFC:
- Don't call report_iommu_fault() for IRQ remapping faults. (Suggested by Joerg Roedel.)
 drivers/iommu/amd/amd_iommu_types.h |  4 
 drivers/iommu/amd/iommu.c           | 29 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
 #define EVENT_DOMID_MASK_HI	0xf
 #define EVENT_FLAGS_MASK	0xfff
 #define EVENT_FLAGS_SHIFT	0x10
+#define EVENT_FLAG_TR		0x100
+#define EVENT_FLAG_RW		0x020
+#define EVENT_FLAG_PR		0x010
+#define EVENT_FLAG_I		0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN	0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a7d6d78147b7..d9fb2c22d44a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c

What if we introduce:

+/*
+ * AMD I/O Virtualization Technology (IOMMU) Specification,
+ * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+ * that the RW ("read-write") bit is only valid if the I/O
+ * page fault was caused by a memory transaction request
+ * referencing a page that was marked present.
+ */
+#define IO_PAGE_FAULT_MEM_MASK \
+	(EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)
+#define IS_IOMMU_MEM_TRANSACTION(x) \
+	((x & IO_PAGE_FAULT_MEM_MASK) == EVENT_FLAG_PR)

Note that this should have already checked w/ EVENT_FLAG_I == 0.

@@ -484,6 +484,34 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	/*
+	 * If this is a DMA fault (for which the I(nterrupt) bit will
+	 * be unset), allow report_iommu_fault() to prevent logging it.
+	 */
+	if (dev_data && ((flags & EVENT_FLAG_I) == 0)) {
+		int report_flags;
+
+		/*
+		 * AMD I/O Virtualization Technology (IOMMU) Specification,
+		 * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+		 * that the RW ("read-write") bit is only valid if the I/O
+		 * page fault was caused by a memory transaction request
+		 * referencing a page that was marked present.
+		 */
+		report_flags = 0;
+		if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR)) ==
+		    EVENT_FLAG_PR) {
+			if (flags & EVENT_FLAG_RW)
+
Re: [PATCH] iommu/amd: Convert from atomic_t to refcount_t on pasid_state->count
On 7/19/2021 3:32 AM, Xiyu Yang wrote:
refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situations.
Signed-off-by: Xiyu Yang
Signed-off-by: Xin Tan

Thanks,
Reviewed-by: Suravee Suthikulpanit
Re: [PATCH] iommu/amd: Convert from atomic_t to refcount_t on device_state->count
On 7/19/2021 1:00 AM, Xiyu Yang wrote:
refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situations.
Signed-off-by: Xiyu Yang
Signed-off-by: Xin Tan
---
 drivers/iommu/amd/iommu_v2.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

Thanks,
Reviewed-by: Suravee Suthikulpanit
Re: [PATCH,RFC] iommu/amd: Use report_iommu_fault()
Lennert,

On 7/19/2021 4:54 AM, Lennert Buytenhek wrote:

This patch makes iommu/amd call report_iommu_fault() when an I/O page fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault() when an I/O page fault occurs.

I'm mainly interested in (2). We have a daemon with some rasdaemon-like functionality for handling platform errors, and being able to be notified of I/O page faults for initiating corrective action is very useful -- and receiving such events via event tracing is a lot nicer than having to scrape them from kmsg.

Interesting. Just curious what types of error handling are done here?

A number of other IOMMU drivers already use report_iommu_fault(), and I/O page faults on those IOMMUs therefore already seem to trigger this tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where if that function returns zero, the driver will have handled the fault, in which case we avoid logging information about the fault to the printk buffer from the IOMMU driver.
With this patch I see io_page_fault event tracing entries as expected:

irq/24-AMD-Vi-48 [002] 978.554289: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482640 flags=0x
irq/24-AMD-Vi-48 [002] 978.554294: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482650 flags=0x
irq/24-AMD-Vi-48 [002] 978.554299: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482660 flags=0x
irq/24-AMD-Vi-48 [002] 978.554305: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482670 flags=0x
irq/24-AMD-Vi-48 [002] 978.554310: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x91482680 flags=0x
irq/24-AMD-Vi-48 [002] 978.554315: io_page_fault: IOMMU:[drvname] :05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU spec, but I haven't tested that bit of the code, as the page faults I encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which case EVENT_FLAG_RW doesn't make sense.

Since the IO_PAGE_FAULT event is used to communicate various types of fault events, why don't we just pass the flags as-is? This way, it can be used to report/trace various types of IO_PAGE_FAULT events (e.g. for the I/O page table, interrupt remapping, etc.). Interested parties can register a domain fault handler, and it can take care of parsing the information in the flags as needed.
Signed-off-by: Lennert Buytenhek
---
 drivers/iommu/amd/amd_iommu_types.h |  4 
 drivers/iommu/amd/iommu.c           | 25 +
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
 #define EVENT_DOMID_MASK_HI	0xf
 #define EVENT_FLAGS_MASK	0xfff
 #define EVENT_FLAGS_SHIFT	0x10
+#define EVENT_FLAG_TR		0x100
+#define EVENT_FLAG_RW		0x020
+#define EVENT_FLAG_PR		0x010
+#define EVENT_FLAG_I		0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN	0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..a02ace7ee794 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -480,6 +480,30 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	if (dev_data) {
+		int report_flags;
+
+		/*
+		 * AMD I/O Virtualization Technology (IOMMU) Specification,
+		 * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+		 * that the RW ("read-write") bit is only valid if the I/O
+		 * page fault was caused by a memory transaction request
+		 * referencing a page that was marked present.
+		 */
+		report_flags = 0;
+		if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)) ==
+		    EVENT_FLAG_PR) {

Let's not do this check

+			if (flags & EVENT_FLAG_RW)
+				report_flags |= IOMMU_FAULT_WRITE;
+			else
+				report_flags |= IOMMU_FAULT_READ;

... and then we don't need to translate the EVENT_FLAG_XX to IOMMU_FAULT_XXX flags.

+		}
+
+		if (!report_iommu_fault(&dev_data->domain->domain,
+					&pdev->dev, address, report_flags))

Let's just
Re: [PATCH] iommu/amd: Fix printing of IOMMU events when rate limiting kicks in
On 7/21/2021 8:44 AM, Lennert Buytenhek wrote:
> For the printing of RMP_HW_ERROR / RMP_PAGE_FAULT / IO_PAGE_FAULT events, the AMD IOMMU code uses such logic:
>
> 	if (pdev)
> 		dev_data = dev_iommu_priv_get(&pdev->dev);
>
> 	if (dev_data && __ratelimit(&dev_data->rs)) {
> 		pci_err(pdev, ...
> 	} else {
> 		printk_ratelimit() / pr_err{,_ratelimited}(...
> 	}
>
> This means that if we receive an event for a PCI devid which actually does have a struct pci_dev and an attached struct iommu_dev_data, but rate limiting kicks in, we'll fall back to the non-PCI branch of the test, and print the event in a different format.
>
> Fix this by changing the logic to:
>
> 	if (dev_data) {
> 		if (__ratelimit(&dev_data->rs)) {
> 			pci_err(pdev, ...
> 		}
> 	} else {
> 		pr_err_ratelimited(...
> 	}
>
> Suggested-by: Suravee Suthikulpanit
> Signed-off-by: Lennert Buytenhek

Reviewed-by: Suravee Suthikulpanit
Re: [PATCH] iommu/amd: Fix I/O page fault logging ratelimit test
Hi Lennert, On 7/18/2021 7:47 PM, Lennert Buytenhek wrote: On an AMD system, I/O page faults are usually logged like this: diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 811a49a95d04..7ae426b092f2 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -483,7 +483,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id, if (dev_data && __ratelimit(_data->rs)) { pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n", domain_id, address, flags); - } else if (printk_ratelimit()) { + } else if (!dev_data && printk_ratelimit()) { This seems a bit confusing. Also, according to the following comment in include/linux/printk.h: /* * Please don't use printk_ratelimit(), because it shares ratelimiting state * with all other unrelated printk_ratelimit() callsites. Instead use * printk_ratelimited() or plain old __ratelimit(). */ We probably should move away from using printk_ratelimit() here. What about the following change instead? 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 811a49a95d04..8eb5d3519743 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -480,11 +480,12 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id, if (pdev) dev_data = dev_iommu_priv_get(&pdev->dev); - if (dev_data && __ratelimit(&dev_data->rs)) { - pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n", - domain_id, address, flags); - } else if (printk_ratelimit()) { - pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n", + if (dev_data) { + if (__ratelimit(&dev_data->rs)) + pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n", + domain_id, address, flags); + } else { + pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n", PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid), domain_id, address, flags); } Note also that there might be other places in this file that would need similar modification as well. Thanks, Suravee
Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
On 5/5/2021 8:05 PM, Peter Zijlstra wrote: On Wed, May 05, 2021 at 07:39:14PM +0700, Suthikulpanit, Suravee wrote: Peter, On 5/4/2021 7:13 PM, Peter Zijlstra wrote: On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote: Peter, On 5/4/2021 4:39 PM, Peter Zijlstra wrote: On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote: 2. Since AMD IOMMU PMU does not support interrupt mode, the logic can be simplified to always start counting with value zero, and accumulate the counter value when stopping without the need to keep track and reprogram the counter with the previously read counter value. This relies on the hardware counter being the full 64bit wide, is it? The HW counter value is 48-bit. Not sure why it needs to be 64-bit? I might be missing some points here? Could you please describe? How do you deal with the 48bit overflow if you don't use the interrupt? The IOMMU Perf driver does not currently handle counter overflow since the overflow notification mechanism (i.e. IOMMU creates an EVENT_COUNTER_ZERO event in the IOMMU event log, and generate an IOMMU MSI interrupt to signal IOMMU driver to process the event.) is not currently supported. When counter overflows, the counter becomes zero, and Perf reports value zero for the event. Alternatively, to detect overflow, we might start counting with value 1 so that we can detect overflow when the value becomes zero in which case the Perf driver could generate error message. Urgh.. the intel uncore driver programs an hrtimer to periodically fold deltas. That way the counter will never be short. Thanks for the info. I'll look into ways to detect and handle counter overflow for this. So far, with the current power-gating, it has several restrictions regarding when the HW counter can be accessed, which makes it more difficult to handle this. Thanks, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
Peter, On 5/4/2021 7:13 PM, Peter Zijlstra wrote: On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote: Peter, On 5/4/2021 4:39 PM, Peter Zijlstra wrote: On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote: 2. Since AMD IOMMU PMU does not support interrupt mode, the logic can be simplified to always start counting with value zero, and accumulate the counter value when stopping without the need to keep track and reprogram the counter with the previously read counter value. This relies on the hardware counter being the full 64bit wide, is it? The HW counter value is 48-bit. Not sure why it needs to be 64-bit? I might be missing some points here? Could you please describe? How do you deal with the 48bit overflow if you don't use the interrupt? The IOMMU Perf driver does not currently handle counter overflow since the overflow notification mechanism (i.e. IOMMU creates an EVENT_COUNTER_ZERO event in the IOMMU event log, and generate an IOMMU MSI interrupt to signal IOMMU driver to process the event.) is not currently supported. When counter overflows, the counter becomes zero, and Perf reports value zero for the event. Alternatively, to detect overflow, we might start counting with value 1 so that we can detect overflow when the value becomes zero in which case the Perf driver could generate error message. Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
Peter, On 5/4/2021 4:39 PM, Peter Zijlstra wrote: On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote: 2. Since AMD IOMMU PMU does not support interrupt mode, the logic can be simplified to always start counting with value zero, and accumulate the counter value when stopping without the need to keep track and reprogram the counter with the previously read counter value. This relies on the hardware counter being the full 64bit wide, is it? The HW counter value is 48-bit. Not sure why it needs to be 64-bit? I might be missing some points here? Could you please describe? Thanks, Suravee
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
David / Joerg, On 4/10/2021 5:03 PM, David Coe wrote: The immediately obvious difference is with the enormous count seen on mem_dte_mis on the older Ryzen 2400G. Will do some RTFM but anyone with comments and insight? 841,689,151,202,939 amd_iommu_0/mem_dte_mis/ (33.44%) Otherwise, all seems to be running smoothly (especially for a distribution still in β). Bravo and many thanks all! The initial hypothesis is that the issue happens only when users specify more events than the available counters, in which case Perf will time-multiplex the events onto the counters. Looking at the Perf and AMD IOMMU PMU multiplexing logic, it requires: 1. Stop the counter (i.e. set CSOURCE to zero to stop counting) 2. Save the counter value of the current event 3. Reload the counter value of the new event (previously saved) 4. Start the counter (i.e. set CSOURCE to count new events) The problem here is that when the driver writes zero to the CSOURCE register in step 1, this enables power-gating, which prevents access to the counter and causes the writing/reading of the counter value in steps 2 and 3 to fail. I have found a system that reproduces this case (w/ unusually large counts), and debugged the issue further. As a hack, I have tried skipping step 1, and it seems to eliminate this issue. However, this is logically incorrect, and might result in inaccurate data depending on the events. Here are the options: 1. Continue to look for a workaround for this issue. 2. Find a way to disable event time-multiplexing (e.g. only limit the number of counters to 8) if power gating is enabled on the platform. 3. Go back to the original logic where we had the pre-init check of the counter values, which is still the safest choice at the moment. Regards, Suravee
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
David, On 4/14/2021 10:33 PM, David Coe wrote: Hi Suravee! I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 partly to check my mailer's 'mangling' hadn't also reached the code! There are 3 sets of results in the attachment, all for the Ryzen 2400G. The as-distributed kernel already incorporates your IOMMU RFCv3 patch. A. As-distributed kernel (cold boot) >5 retries, so no IOMMU read/write capability, no amd_iommu events. B. As-distributed kernel (warm boot) <5 retries, amd_iommu running stats show large numbers as before. C. Revert+Update kernel amd_iommu events listed and also show large hit/miss numbers. In due course, I'll load the new (revert+update) kernel on the 4700G but won't overload your mail-box unless something unusual turns up. Best regards, For the Ryzen 2400G, could you please try with: - 1 event at a time - Not more than 8 events (On your system, it has 2 banks x 4 counters/bank. I am trying to see if this issue might be related to the counters multiplexing). Thanks, Suravee
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/10/2021 5:03 PM, David Coe wrote: Results for AMD Ryzen 4700U running Ubuntu 21.04β kernel 5.11.0-13 $ sudo dmesg | grep IOMMU [ 0.490352] pci :00:00.2: AMD-Vi: IOMMU performance counters supported [ 0.491985] pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40 [ 0.493732] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank). [ 0.793259] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel $ sudo perf stat -e 'amd_iommu_0/cmd_processed/, amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10 Performance counter stats for 'system wide': 12 amd_iommu_0/cmd_processed/ (33.28%) 6 amd_iommu_0/cmd_processed_inv/ (33.32%) 0 amd_iommu_0/ign_rd_wr_mmio_1ff8h/ (33.36%) 290 amd_iommu_0/int_dte_hit/ (33.40%) 20 amd_iommu_0/int_dte_mis/ (33.46%) 391 amd_iommu_0/mem_dte_hit/ (33.49%) 3,720 amd_iommu_0/mem_dte_mis/ (33.49%) 44 amd_iommu_0/mem_iommu_tlb_pde_hit/ (33.46%) 810 amd_iommu_0/mem_iommu_tlb_pde_mis/ (33.45%) 35 amd_iommu_0/mem_iommu_tlb_pte_hit/ (33.41%) 746 amd_iommu_0/mem_iommu_tlb_pte_mis/ (33.37%) 0 amd_iommu_0/mem_pass_excl/ (33.32%) 0 amd_iommu_0/mem_pass_pretrans/ (33.28%) 0 amd_iommu_0/mem_pass_untrans/ (33.28%) 0 amd_iommu_0/mem_target_abort/ (33.27%) 715 amd_iommu_0/mem_trans_total/ (33.27%) 0 amd_iommu_0/page_tbl_read_gst/ (33.28%) 36 amd_iommu_0/page_tbl_read_nst/ (33.27%) 36 amd_iommu_0/page_tbl_read_tot/ (33.27%) 0 
amd_iommu_0/smi_blk/ (33.28%) 0 amd_iommu_0/smi_recv/ (33.26%) 0 amd_iommu_0/tlb_inv/ (33.23%) 0 amd_iommu_0/vapic_int_guest/ (33.24%) 366 amd_iommu_0/vapic_int_non_guest/ (33.27%) The immediately obvious difference is with the enormous count seen on mem_dte_mis on the older Ryzen 2400G. Will do some RTFM but anyone with comments and insight? 841,689,151,202,939 amd_iommu_0/mem_dte_mis/ (33.44%) Otherwise, all seems to be running smoothly (especially for a distribution still in β). Bravo and many thanks all! That doesn't look correct. Lemme do some more investigation also. Thanks, Suravee
Re: [PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"
Shuah, On 4/10/2021 12:06 AM, Shuah Khan wrote: diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 321f5906e6ed..648cdfd03074 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu *iommu) amd_iommu_pc_present = true; /* save the value to restore, if writable */ -if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) || -iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false)) -goto pc_false; - -/* - * Disable power gating by programing the performance counter - * source to 20 (i.e. counts the reads and writes from/to IOMMU - * Reserved Register [MMIO Offset 1FF8h] that are ignored.), - * which never get incremented during this init phase. - * (Note: The event is also deprecated.) - */ -val = 20; -if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true)) +if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false)) goto pc_false; /* Check if the performance counters can be written to */ -val = 0xabcd; -for (retry = 5; retry; retry--) { -if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) || -iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) || -val2) -break; - -/* Wait about 20 msec for power gating to disable and retry. */ -msleep(20); -} - -/* restore */ -if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) || -iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true)) +if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) || +(iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) || +(val != val2)) Probably don't need parentheses around 'val != val2' This is the result from git revert. Also, the logic is removed in patch 2/2. Suravee
iommu: amd: Simplify decoding logic for INVALID_PPR_REQUEST event
Reuse the existing macro to simplify the code and improve readability. Cc: Joerg Roedel Cc: Gary R Hook Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index c1cb759..b249aa7 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -617,8 +617,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt) pasid, address, flags); break; case EVENT_TYPE_INV_PPR_REQ: - pasid = ((event[0] >> 16) & 0xFFFF) - | ((event[1] << 6) & 0xF0000); + pasid = PPR_PASID(*((u64 *)__evt)); tag = event[1] & 0x03FF; dev_err(dev, "Event logged [INVALID_PPR_REQUEST device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x tag=0x%03x]\n", PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid), -- 1.8.3.1
iommu: amd: Fix incorrect PASID decoding from event log
IOMMU Event Log encodes 20-bit PASID for events: ILLEGAL_DEV_TABLE_ENTRY IO_PAGE_FAULT PAGE_TAB_HARDWARE_ERROR INVALID_DEVICE_REQUEST as: PASID[15:0] = bit 47:32 PASID[19:16] = bit 19:16 Note that the INVALID_PPR_REQUEST event has different encoding from the rest of the events as the following: PASID[15:0] = bit 31:16 PASID[19:16] = bit 45:42 So, fix the decoding logic. Fixes: d64c0486ed50 ("iommu/amd: Update the PASID information printed to the system log") Cc: Joerg Roedel Cc: Gary R Hook Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu.c | 5 +++-- drivers/iommu/amd_iommu_types.h | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 61de819..c1cb759 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -560,7 +560,8 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt) retry: type = (event[1] >> EVENT_TYPE_SHIFT) & EVENT_TYPE_MASK; devid = (event[0] >> EVENT_DEVID_SHIFT) & EVENT_DEVID_MASK; - pasid = PPR_PASID(*(u64 *)&event[0]); + pasid = (event[0] & EVENT_DOMID_MASK_HI) | + (event[1] & EVENT_DOMID_MASK_LO); flags = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK; address = (u64)(((u64)event[3]) << 32) | event[2]; @@ -593,7 +594,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt) address, flags); break; case EVENT_TYPE_PAGE_TAB_ERR: - dev_err(dev, "Event logged [PAGE_TAB_HARDWARE_ERROR device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n", + dev_err(dev, "Event logged [PAGE_TAB_HARDWARE_ERROR device=%02x:%02x.%x pasid=0x%04x address=0x%llx flags=0x%04x]\n", PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid), pasid, address, flags); break; diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index 64edd5a..5a698ad 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -130,8 +130,8 @@ #define EVENT_TYPE_INV_PPR_REQ 0x9 #define EVENT_DEVID_MASK 0xffff #define EVENT_DEVID_SHIFT 0 -#define EVENT_DOMID_MASK 0xffff -#define EVENT_DOMID_SHIFT 0 +#define EVENT_DOMID_MASK_LO 0xffff +#define EVENT_DOMID_MASK_HI 0xf0000 #define EVENT_FLAGS_MASK 0xfff #define EVENT_FLAGS_SHIFT 0x10 -- 1.8.3.1
[PATCH] iommu/amd: Re-factor guest virtual APIC (de-)activation code
Re-factore the logic for activate/deactivate guest virtual APIC mode (GAM) into helper functions, and export them for other drivers (e.g. SVM). to support run-time activate/deactivate of SVM AVIC. Cc: Joerg Roedel Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu.c | 85 + drivers/iommu/amd_iommu_types.h | 9 + include/linux/amd-iommu.h | 12 ++ 3 files changed, 82 insertions(+), 24 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index dce1d8d..42fba8d 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -4301,13 +4301,62 @@ static void irq_remapping_deactivate(struct irq_domain *domain, .deactivate = irq_remapping_deactivate, }; +int amd_iommu_activate_guest_mode(void *data) +{ + struct amd_ir_data *ir_data = (struct amd_ir_data *)data; + struct irte_ga *entry = (struct irte_ga *) ir_data->entry; + + if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) || + !entry || entry->lo.fields_vapic.guest_mode) + return 0; + + entry->lo.val = 0; + entry->hi.val = 0; + + entry->lo.fields_vapic.guest_mode = 1; + entry->lo.fields_vapic.ga_log_intr = 1; + entry->hi.fields.ga_root_ptr = ir_data->ga_root_ptr; + entry->hi.fields.vector= ir_data->ga_vector; + entry->lo.fields_vapic.ga_tag = ir_data->ga_tag; + + return modify_irte_ga(ir_data->irq_2_irte.devid, + ir_data->irq_2_irte.index, entry, NULL); +} +EXPORT_SYMBOL(amd_iommu_activate_guest_mode); + +int amd_iommu_deactivate_guest_mode(void *data) +{ + struct amd_ir_data *ir_data = (struct amd_ir_data *)data; + struct irte_ga *entry = (struct irte_ga *) ir_data->entry; + struct irq_cfg *cfg = ir_data->cfg; + + if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) || + !entry || !entry->lo.fields_vapic.guest_mode) + return 0; + + entry->lo.val = 0; + entry->hi.val = 0; + + entry->lo.fields_remap.dm = apic->irq_dest_mode; + entry->lo.fields_remap.int_type= apic->irq_delivery_mode; + entry->hi.fields.vector= cfg->vector; + entry->lo.fields_remap.destination = + 
APICID_TO_IRTE_DEST_LO(cfg->dest_apicid); + entry->hi.fields.destination = + APICID_TO_IRTE_DEST_HI(cfg->dest_apicid); + + return modify_irte_ga(ir_data->irq_2_irte.devid, + ir_data->irq_2_irte.index, entry, NULL); +} +EXPORT_SYMBOL(amd_iommu_deactivate_guest_mode); + static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info) { + int ret; struct amd_iommu *iommu; struct amd_iommu_pi_data *pi_data = vcpu_info; struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data; struct amd_ir_data *ir_data = data->chip_data; - struct irte_ga *irte = (struct irte_ga *) ir_data->entry; struct irq_2_irte *irte_info = _data->irq_2_irte; struct iommu_dev_data *dev_data = search_dev_data(irte_info->devid); @@ -4318,6 +4367,7 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info) if (!dev_data || !dev_data->use_vapic) return 0; + ir_data->cfg = irqd_cfg(data); pi_data->ir_data = ir_data; /* Note: @@ -4336,37 +4386,24 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info) pi_data->prev_ga_tag = ir_data->cached_ga_tag; if (pi_data->is_guest_mode) { - /* Setting */ - irte->hi.fields.ga_root_ptr = (pi_data->base >> 12); - irte->hi.fields.vector = vcpu_pi_info->vector; - irte->lo.fields_vapic.ga_log_intr = 1; - irte->lo.fields_vapic.guest_mode = 1; - irte->lo.fields_vapic.ga_tag = pi_data->ga_tag; - - ir_data->cached_ga_tag = pi_data->ga_tag; + ir_data->ga_root_ptr = (pi_data->base >> 12); + ir_data->ga_vector = vcpu_pi_info->vector; + ir_data->ga_tag = pi_data->ga_tag; + ret = amd_iommu_activate_guest_mode(ir_data); + if (!ret) + ir_data->cached_ga_tag = pi_data->ga_tag; } else { - /* Un-Setting */ - struct irq_cfg *cfg = irqd_cfg(data); - - irte->hi.val = 0; - irte->lo.val = 0; - irte->hi.fields.vector = cfg->vector; - irte->lo.fields_remap.guest_mode = 0; - irte->lo.fields_remap.destination = - APICID_TO_IRTE_DEST_LO(cfg->dest_apicid); - irte->hi.fields.destination = - APICID_TO_IRTE_DEST_HI(cfg->dest_apicid); - 
irte->lo.fields_remap.int_type = apic->irq_delivery_mode; -
Re: [PATCH] iommu/amd: Add support for X2APIC IOMMU interrupts
Joerg, This patch also fixes general x2APIC support for AMD IOMMU, which was introduced earlier. Therefore, I am including the Fixes tag here. Fixes: 90fcffd9cf5e ('iommu/amd: Add support for IOMMU XT mode') Thanks, Suravee On 7/15/2019 11:29 PM, Suthikulpanit, Suravee wrote: > AMD IOMMU requires IntCapXT registers to be setup in order to generate > its own interrupts (for Event Log, PPR Log, and GA Log) with 32-bit > APIC destination ID. Without this support, AMD IOMMU MSI interrupts > will not be routed correctly when booting the system in X2APIC mode. > > Cc: Joerg Roedel > Signed-off-by: Suravee Suthikulpanit > --- > drivers/iommu/amd_iommu_init.c | 90 > + > drivers/iommu/amd_iommu_types.h | 9 + > 2 files changed, 99 insertions(+) > > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c > index ff40ba7..a4c5b1e 100644 > --- a/drivers/iommu/amd_iommu_init.c > +++ b/drivers/iommu/amd_iommu_init.c > @@ -35,6 +35,8 @@ > #include > #include > #include > +#include > +#include > #include > #include > #include > @@ -1935,6 +1937,90 @@ static int iommu_setup_msi(struct amd_iommu *iommu) > return 0; > } > > +#define XT_INT_DEST_MODE(x) (((x) & 0x1ULL) << 2) > +#define XT_INT_DEST_LO(x)(((x) & 0xFFULL) << 8) > +#define XT_INT_VEC(x)(((x) & 0xFFULL) << 32) > +#define XT_INT_DEST_HI(x)x) >> 24) & 0xFFULL) << 56) > + > +/** > + * Setup the IntCapXT registers with interrupt routing information > + * based on the PCI MSI capability block registers, accessed via > + * MMIO MSI address low/hi and MSI data registers. 
> + */ > +static void iommu_update_intcapxt(struct amd_iommu *iommu) > +{ > + u64 val; > + u32 addr_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET); > + u32 addr_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET); > + u32 data= readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET); > + bool dm = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1; > + u32 dest= ((addr_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xFF); > + > + if (x2apic_enabled()) > + dest |= MSI_ADDR_EXT_DEST_ID(addr_hi); > + > + val = XT_INT_VEC(data & 0xFF) | > + XT_INT_DEST_MODE(dm) | > + XT_INT_DEST_LO(dest) | > + XT_INT_DEST_HI(dest); > + > + /** > + * Current IOMMU implemtation uses the same IRQ for all > + * 3 IOMMU interrupts. > + */ > + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET); > + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET); > + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET); > +} > + > +static void _irq_notifier_notify(struct irq_affinity_notify *notify, > + const cpumask_t *mask) > +{ > + struct amd_iommu *iommu; > + > + for_each_iommu(iommu) { > + if (iommu->dev->irq == notify->irq) { > + iommu_update_intcapxt(iommu); > + break; > + } > + } > +} > + > +static void _irq_notifier_release(struct kref *ref) > +{ > +} > + > +static int iommu_init_intcapxt(struct amd_iommu *iommu) > +{ > + int ret; > + struct irq_affinity_notify *notify = >intcapxt_notify; > + > + /** > + * IntCapXT requires XTSup=1, which can be inferred > + * amd_iommu_xt_mode. > + */ > + if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE) > + return 0; > + > + /** > + * Also, we need to setup notifier to update the IntCapXT registers > + * whenever the irq affinity is changed from user-space. 
> + */ > + notify->irq = iommu->dev->irq; > + notify->notify = _irq_notifier_notify, > + notify->release = _irq_notifier_release, > + ret = irq_set_affinity_notifier(iommu->dev->irq, notify); > + if (ret) { > + pr_err("Failed to register irq affinity notifier (devid=%#x, > irq %d)\n", > +iommu->devid, iommu->dev->irq); > + return ret; > + } > + > + iommu_update_intcapxt(iommu); > + iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN); > + return ret; > +} > + > static int iommu_init_msi(struct amd_iommu *iommu) > { > int ret; > @@ -1951,6 +2037,10 @@ static int iommu_init_msi(struct amd_iommu *iommu) > return ret; > > enable_faults: > + ret = iommu_init_intcapxt(iommu); > + if (ret) > + retu
[PATCH] iommu/amd: Add support for X2APIC IOMMU interrupts
AMD IOMMU requires IntCapXT registers to be set up in order to generate its own interrupts (for Event Log, PPR Log, and GA Log) with 32-bit APIC destination ID. Without this support, AMD IOMMU MSI interrupts will not be routed correctly when booting the system in X2APIC mode. Cc: Joerg Roedel Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu_init.c | 90 + drivers/iommu/amd_iommu_types.h | 9 + 2 files changed, 99 insertions(+) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index ff40ba7..a4c5b1e 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -35,6 +35,8 @@ #include #include #include +#include +#include #include #include #include @@ -1935,6 +1937,90 @@ static int iommu_setup_msi(struct amd_iommu *iommu) return 0; } +#define XT_INT_DEST_MODE(x) (((x) & 0x1ULL) << 2) +#define XT_INT_DEST_LO(x) (((x) & 0xFFFFFFULL) << 8) +#define XT_INT_VEC(x) (((x) & 0xFFULL) << 32) +#define XT_INT_DEST_HI(x) ((((x) >> 24) & 0xFFULL) << 56) + +/** + * Setup the IntCapXT registers with interrupt routing information + * based on the PCI MSI capability block registers, accessed via + * MMIO MSI address low/hi and MSI data registers. + */ +static void iommu_update_intcapxt(struct amd_iommu *iommu) +{ + u64 val; + u32 addr_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET); + u32 addr_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET); + u32 data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET); + bool dm = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1; + u32 dest = ((addr_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xFF); + + if (x2apic_enabled()) + dest |= MSI_ADDR_EXT_DEST_ID(addr_hi); + + val = XT_INT_VEC(data & 0xFF) | + XT_INT_DEST_MODE(dm) | + XT_INT_DEST_LO(dest) | + XT_INT_DEST_HI(dest); + + /** +* Current IOMMU implementation uses the same IRQ for all +* 3 IOMMU interrupts. 
+*/ + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET); + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET); + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET); +} + +static void _irq_notifier_notify(struct irq_affinity_notify *notify, +const cpumask_t *mask) +{ + struct amd_iommu *iommu; + + for_each_iommu(iommu) { + if (iommu->dev->irq == notify->irq) { + iommu_update_intcapxt(iommu); + break; + } + } +} + +static void _irq_notifier_release(struct kref *ref) +{ +} + +static int iommu_init_intcapxt(struct amd_iommu *iommu) +{ + int ret; + struct irq_affinity_notify *notify = &iommu->intcapxt_notify; + + /** +* IntCapXT requires XTSup=1, which can be inferred +* from amd_iommu_xt_mode. +*/ + if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE) + return 0; + + /** +* Also, we need to set up a notifier to update the IntCapXT registers +* whenever the irq affinity is changed from user-space. +*/ + notify->irq = iommu->dev->irq; + notify->notify = _irq_notifier_notify, + notify->release = _irq_notifier_release, + ret = irq_set_affinity_notifier(iommu->dev->irq, notify); + if (ret) { + pr_err("Failed to register irq affinity notifier (devid=%#x, irq %d)\n", + iommu->devid, iommu->dev->irq); + return ret; + } + + iommu_update_intcapxt(iommu); + iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN); + return ret; +} + static int iommu_init_msi(struct amd_iommu *iommu) { int ret; @@ -1951,6 +2037,10 @@ static int iommu_init_msi(struct amd_iommu *iommu) return ret; enable_faults: + ret = iommu_init_intcapxt(iommu); + if (ret) + return ret; + iommu_feature_enable(iommu, CONTROL_EVT_INT_EN); if (iommu->ppr_log != NULL) diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index 87965e4..39be804f 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -72,6 +72,12 @@ #define MMIO_PPR_LOG_OFFSET 0x0038 #define MMIO_GA_LOG_BASE_OFFSET 0x00e0 #define MMIO_GA_LOG_TAIL_OFFSET 0x00e8 +#define MMIO_MSI_ADDR_LO_OFFSET 0x015C +#define 
MMIO_MSI_ADDR_HI_OFFSET0x0160 +#define MMIO_MSI_DATA_OFFSET 0x0164 +#define MMIO_INTCAPXT_EVT_OFFSET 0x0170 +#define MMIO_INTCAPXT_PPR_OFFSET 0x0178 +#define MMIO_INTCAPXT_GALOG_OFFSET 0x0180 #define MMIO_CMD_HEAD_OFFSET 0x2000 #define MMIO_CMD_TAIL_OFFSET 0x2008 #define MMIO_EVT_HEAD_OFFSET 0x2010 @@ -162,6 +168,7 @@ #define CONTROL_GALOG_EN0x1CULL #define CONTROL_GAINT_EN0x1DULL #define CONTROL_XT_EN
[PATCH] svm/avic: iommu/amd: Flush IOMMU IRT after update all entries
When AVIC is enabled and the VM has discrete device assignment, the interrupt remapping table (IRT) is used to keep track of which destination APIC ID the IOMMU will inject the device interrupt to. This means every time a vcpu is blocked or context-switched (i.e. vcpu_blocking/unblocking() and vcpu_load/put()), the information in the IRT must be updated and the IOMMU IRT flush command must be issued. The current implementation flushes the IOMMU IRT every time an entry is modified. If the assigned device has a large number of interrupts (hence a large number of entries), this would add a large amount of overhead to the vcpu context-switch. Instead, this can be optimized by flushing the IRT only once per vcpu context-switch per device, after all IRT entries are modified. The function amd_iommu_update_ga() is refactored to only update the IRT entry, while amd_iommu_sync_ga() is introduced to allow IRT flushing to be done separately. Cc: Joerg Roedel Cc: Radim Krčmář Cc: Paolo Bonzini Signed-off-by: Suravee Suthikulpanit --- arch/x86/kvm/svm.c| 35 ++- drivers/iommu/amd_iommu.c | 20 +--- include/linux/amd-iommu.h | 13 ++--- 3 files changed, 61 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 47c4993448c7..a5c7ca5d70d3 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -118,6 +118,8 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id); #define AVIC_GATAG_TO_VMID(x) ((x >> AVIC_VCPU_ID_BITS) & AVIC_VM_ID_MASK) #define AVIC_GATAG_TO_VCPUID(x)(x & AVIC_VCPU_ID_MASK) +#define AVIC_DEVID_BITMAP_SIZE (1 << 16) + static bool erratum_383_found __read_mostly; static const u32 host_save_user_msrs[] = { @@ -251,6 +253,12 @@ struct vcpu_svm { /* which host CPU was used for running this vcpu */ unsigned int last_cpu; + + /* +* Bitmap used to store PCI devid to sync +* AMD IOMMU interrupt remapping table +*/ + unsigned long *avic_devid_sync_bitmap; }; /* @@ -1984,6 +1992,7 @@ static inline int avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r) { int 
ret = 0; + int devid = 0; unsigned long flags; struct amd_svm_iommu_ir *ir; struct vcpu_svm *svm = to_svm(vcpu); @@ -2001,9 +2010,21 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r) goto out; list_for_each_entry(ir, >ir_list, node) { - ret = amd_iommu_update_ga(cpu, r, ir->data); + ret = amd_iommu_update_ga(cpu, r, ir->data, ); if (ret) break; + set_bit(devid, svm->avic_devid_sync_bitmap); + } + + /* Sync AMD IOMMU interrupt remapping table changes for each device. */ + devid = find_next_bit(svm->avic_devid_sync_bitmap, + AVIC_DEVID_BITMAP_SIZE, 0); + + while (devid < AVIC_DEVID_BITMAP_SIZE) { + clear_bit(devid, svm->avic_devid_sync_bitmap); + ret = amd_iommu_sync_ga(devid); + devid = find_next_bit(svm->avic_devid_sync_bitmap, + AVIC_DEVID_BITMAP_SIZE, devid+1); } out: spin_unlock_irqrestore(>ir_list_lock, flags); @@ -2107,6 +2128,13 @@ static int avic_init_vcpu(struct vcpu_svm *svm) INIT_LIST_HEAD(>ir_list); spin_lock_init(>ir_list_lock); + svm->avic_devid_sync_bitmap = (void *)__get_free_pages( + GFP_KERNEL | __GFP_ZERO, + get_order(AVIC_DEVID_BITMAP_SIZE/8)); + if (svm->avic_devid_sync_bitmap == NULL) + ret = -ENOMEM; + memset(svm->avic_devid_sync_bitmap, 0, AVIC_DEVID_BITMAP_SIZE/8); + return ret; } @@ -2221,6 +2249,11 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER); __free_page(virt_to_page(svm->nested.hsave)); __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); + + free_pages((unsigned long)svm->avic_devid_sync_bitmap, + get_order(AVIC_DEVID_BITMAP_SIZE/8)); + svm->avic_devid_sync_bitmap = NULL; + kvm_vcpu_uninit(vcpu); kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.guest_fpu); kmem_cache_free(kvm_vcpu_cache, svm); diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 2a7b78bb98b4..637bcc9192e5 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -4499,7 +4499,20 @@ int amd_iommu_create_irq_domain(struct amd_iommu 
*iommu) return 0; } -int amd_iommu_update_ga(int cpu, bool is_run, void *data) +int amd_iommu_sync_ga(int devid) +{ + struct amd_iommu *iommu = amd_iommu_rlookup_table[devid]; + + if (!iommu) + return -ENODEV; + + iommu_flush_irt(iommu, devid); +
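The batching scheme in the patch above can be sketched in miniature. The sketch below is illustrative only: a 64-bit word stands in for the (1 << 16)-bit per-vcpu bitmap, and `irt_flush_count` stands in for the `iommu_flush_irt()` calls issued by `amd_iommu_sync_ga()`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the per-vcpu bitmap and the IRT flush. */
#define DEVID_BITMAP_SIZE 64

static unsigned long devid_bitmap;   /* one bit per PCI devid */
static int irt_flush_count;          /* counts flushes instead of issuing them */

static void irt_entry_updated(int devid)
{
    /* update path: modify the IRT entry, then just record the devid */
    devid_bitmap |= 1UL << devid;
}

static void irt_sync_all(void)
{
    /* sync path: one flush per touched device, then clear the bitmap */
    for (int devid = 0; devid < DEVID_BITMAP_SIZE; devid++) {
        if (devid_bitmap & (1UL << devid)) {
            devid_bitmap &= ~(1UL << devid);
            irt_flush_count++;   /* amd_iommu_sync_ga(devid) would go here */
        }
    }
}
```

The point of the refactoring is visible here: repeated updates to the same device collapse into a single flush per context switch.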
Re: [PATCH v3] iommu: amd: Fix IOMMU page flush when detach device from a domain
Joerg, On 1/24/19 9:11 PM, j...@8bytes.org wrote: > On Thu, Jan 24, 2019 at 04:16:45AM +0000, Suthikulpanit, Suravee wrote: >> drivers/iommu/amd_iommu.c | 15 +++ >> 1 file changed, 11 insertions(+), 4 deletions(-) > > Applied, thanks Suravee. > Thanks. Also, should this be back-ported to the stable tree as well? Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3] iommu: amd: Fix IOMMU page flush when detach device from a domain
From: Suravee Suthikulpanit When a VM is terminated, the VFIO driver detaches all pass-through devices from the VFIO domain by clearing the domain id and page table root pointer from each device table entry (DTE), and then invalidates the DTE. Then, the VFIO driver unmaps the pages and invalidates the IOMMU pages. Currently, the IOMMU driver keeps track of which IOMMU and how many devices are attached to the domain. When invalidating IOMMU pages, the driver checks if the IOMMU is still attached to the domain before issuing the invalidate page command. However, since VFIO has already detached all devices from the domain, the subsequent INVALIDATE_IOMMU_PAGES commands are being skipped as there is no IOMMU attached to the domain. This results in data corruption and could cause the PCI device to end up in an indeterminate state. Fix this by invalidating the IOMMU pages when detaching a device, before decrementing the per-domain device reference counts. Cc: Boris Ostrovsky Suggested-by: Joerg Roedel Co-developed-by: Brijesh Singh Signed-off-by: Brijesh Singh Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 87ba23a75b38..6cd4a00036c1 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1991,16 +1991,13 @@ static void do_attach(struct iommu_dev_data *dev_data, static void do_detach(struct iommu_dev_data *dev_data) { + struct protection_domain *domain = dev_data->domain; struct amd_iommu *iommu; u16 alias; iommu = amd_iommu_rlookup_table[dev_data->devid]; alias = dev_data->alias; - /* decrease reference counters */ - dev_data->domain->dev_iommu[iommu->index] -= 1; - dev_data->domain->dev_cnt -= 1; - /* Update data structures */ dev_data->domain = NULL; list_del(&dev_data->list); @@ -2010,6 +2007,16 @@ static void do_detach(struct iommu_dev_data *dev_data) /* Flush the DTE entry */ device_flush_dte(dev_data); + + /* Flush IOTLB */ + 
domain_flush_tlb_pde(domain); + + /* Wait for the flushes to finish */ + domain_flush_complete(domain); + + /* decrease reference counters - needs to happen after the flushes */ + domain->dev_iommu[iommu->index] -= 1; + domain->dev_cnt -= 1; } /* -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
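The ordering constraint in this patch ("flush before decrementing the reference counts") can be modeled in a few lines. This is an illustrative sketch, not the driver code: the `domain` struct and function names are made up, but the skip condition mirrors the one in `__domain_flush_pages()`, which ignores IOMMUs whose per-domain count is zero.

```c
#include <assert.h>

/* Minimal model: dev_iommu is the per-IOMMU attach count for a domain,
 * flushes counts issued INVALIDATE_IOMMU_PAGES commands. */
struct domain { int dev_iommu; int flushes; };

static void domain_flush_tlb(struct domain *d)
{
    if (d->dev_iommu == 0)
        return;          /* count is zero: the flush command is skipped */
    d->flushes++;
}

static void do_detach_fixed(struct domain *d)
{
    domain_flush_tlb(d); /* flush while the IOMMU still counts as attached */
    d->dev_iommu--;      /* decrement only after the flush */
}

static void do_detach_old(struct domain *d)
{
    d->dev_iommu--;      /* old order: last detach drops the count to 0... */
    domain_flush_tlb(d); /* ...so this flush is silently skipped */
}
```

Detaching the last device with the old ordering loses the flush, which is exactly the stale-IOTLB condition the commit message describes.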
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Joerg, On 1/23/19 2:56 PM, j...@8bytes.org wrote: > Hi Suravee, > > On Tue, Jan 22, 2019 at 03:53:18PM +, Suthikulpanit, Suravee wrote: >> Thanks for the detail. Alright then, let's just go with the version you >> sent on 1/16/19. Do you want me to resend V3 with that changes, or >> would you be taking care of that? > > Please send me a v3 based on the diff I sent last week. Also add a > separate patch which adds the missing dte flush for the alias entry. > > Thanks, > > Joerg > Actually, I just noticed that device_flush_dte() has already handled flushing the DTE for alias device as well. Please see the link below. https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd_iommu.c#L1219 So, we don't need to call device_flush_dte() for alias device in do_detach(). Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Joerg, On 1/22/19 5:44 PM, j...@8bytes.org wrote: > Hi Suravee, > > On Thu, Jan 17, 2019 at 08:44:36AM +, Suthikulpanit, Suravee wrote: >> Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0. >> This should preserve previous behavior, and only add flushing condition to >> the specific IOMMU in detached state. Please let me know what you think. > > I think the whole point why VFIO is detaching all devices first and then > goes into unmapping pages is that there are no flushes needed anymore > when devices are detached. > > But when we follow your proposal we would still do IOTLB flushes even > when no devices are attached anymore. So I'd like to avoid this, given > the implications on unmapping performance. We should just flush the > IOMMU TLB at detach time and be done with it. > > Makes sense? > > Regards, > > Joerg > Thanks for the detail. Alright then, let's just go with the version you sent on 1/16/19. Do you want me to resend V3 with that changes, or would you be taking care of that? Thanks, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Joerg, On 1/17/19 3:44 PM, Suravee Suthikulpanit wrote: > Joerg, > > On 1/17/19 12:08 AM, j...@8bytes.org wrote: >> On Wed, Jan 16, 2019 at 02:08:55PM +, Suthikulpanit, Suravee wrote: >>> Actually, I am not sure how we would be missing the flush on the last >>> device. >>> In my test, I am seeing the flush command being issued correctly during >>> vfio_unmap_unpin(), which is after all devices are detached. >>> Although, I might be missing your point here. Could you please elaborate? >> >> Okay, you are right, the patch effectivly adds an unconditional flush of >> the domain on all iommus when the last device is detached. So it is >> correct in that regard. But that code path is also used in the >> iommu_unmap() path. >> >> The problem now is, that with your change we send flush commands to all >> IOMMUs in the unmap path when no device is attached to the domain. >> This will hurt performance there, no? >> >> Regards, >> >> Joerg >> > > Sounds like we need a way to track state of each IOMMU for a domain. > What if we define the following: > > enum IOMMU_DOMAIN_STATES { > DOMAIN_FREE = -1, > DOMAIN_DETACHED = 0, > DOMAIN_ATTACHED >= 1 > } > > We should be able to use the dev_iommu[] to help track the state. > - In amd_iommu_domain_alloc, we initialize the array to DOMAIN_FREE > - In do_attach(), we change to DOMAIN_ATTACH or we can increment the > count > if it is already in DOMAIN_ATTACH state. > - In do_detach(). we change to DOMAIN_DETACH. > > Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0. > This should preserve previous behavior, and only add flushing condition to > the specific IOMMU in detached state. Please let me know what you think. > > Regards, > Suravee By the way, I just sent V2 of this patch since it might be more clear. Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
From: Suravee Suthikulpanit When a VM is terminated, the VFIO driver detaches all pass-through devices from the VFIO domain by clearing the domain id and page table root pointer from each device table entry (DTE), and then invalidates the DTE. Then, the VFIO driver unmaps the pages and invalidates the IOMMU pages. Currently, the IOMMU driver keeps track of which IOMMU and how many devices are attached to the domain. When invalidating IOMMU pages, the driver checks if the IOMMU is still attached to the domain before issuing the invalidate page command. However, since VFIO has already detached all devices from the domain, the subsequent INVALIDATE_IOMMU_PAGES commands are being skipped as there is no IOMMU attached to the domain. This results in data corruption and could cause the PCI device to end up in an indeterminate state. Fix this by introducing the IOMMU domain states attached, detached, and free. The IOMMU page invalidation command is then only issued when the domain is in the attached or detached state. Cc: Boris Ostrovsky Signed-off-by: Brijesh Singh Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu.c | 25 - drivers/iommu/amd_iommu_types.h | 8 +++- 2 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 525659b88ade..1c64a26c50fb 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1248,7 +1248,14 @@ static void __domain_flush_pages(struct protection_domain *domain, build_inv_iommu_pages(&cmd, address, size, domain->id, pde); for (i = 0; i < amd_iommu_get_num_iommus(); ++i) { - if (!domain->dev_iommu[i]) + /* +* We issue the command only when the domain is +* in the ATTACHED or DETACHED state. This is required +* since VFIO detaches all devices from the group +* before invalidating IOMMU pages. Please see +* vfio_iommu_type1_detach_group(). 
+*/ + if (domain->dev_iommu[i] == IOMMU_DOMAIN_STATE_FREE) continue; /* @@ -1292,7 +1299,8 @@ static void domain_flush_complete(struct protection_domain *domain) int i; for (i = 0; i < amd_iommu_get_num_iommus(); ++i) { - if (domain && !domain->dev_iommu[i]) + if (domain && + domain->dev_iommu[i] < IOMMU_DOMAIN_STATE_ATTACHED) continue; /* @@ -1978,8 +1986,11 @@ static void do_attach(struct iommu_dev_data *dev_data, list_add(&dev_data->list, &domain->dev_list); /* Do reference counting */ - domain->dev_iommu[iommu->index] += 1; - domain->dev_cnt += 1; + if (domain->dev_iommu[iommu->index] < IOMMU_DOMAIN_STATE_ATTACHED) + domain->dev_iommu[iommu->index] = IOMMU_DOMAIN_STATE_ATTACHED; + else + domain->dev_iommu[iommu->index] += 1; + domain->dev_cnt += 1; /* Update device table */ set_dte_entry(dev_data->devid, domain, ats, dev_data->iommu_v2); @@ -2904,6 +2915,7 @@ static int protection_domain_init(struct protection_domain *domain) static struct protection_domain *protection_domain_alloc(void) { + int i; struct protection_domain *domain; domain = kzalloc(sizeof(*domain), GFP_KERNEL); @@ -2915,6 +2927,9 @@ static struct protection_domain *protection_domain_alloc(void) add_domain_to_list(domain); + for (i = 0; i < MAX_IOMMUS; i++) + domain->dev_iommu[i] = IOMMU_DOMAIN_STATE_FREE; + return domain; out_err: @@ -3364,7 +3379,7 @@ static int __flush_pasid(struct protection_domain *domain, int pasid, * prevent device TLB refill from IOMMU TLB */ for (i = 0; i < amd_iommu_get_num_iommus(); ++i) { - if (domain->dev_iommu[i] == 0) + if (domain->dev_iommu[i] < IOMMU_DOMAIN_STATE_ATTACHED) continue; ret = iommu_queue_command(amd_iommus[i], &cmd); diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index eae0741f72dc..6d17cdcfa306 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -461,6 +461,12 @@ struct amd_irte_ops; #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED (1 << 0) +enum IOMMU_DOMAIN_STATES { + IOMMU_DOMAIN_STATE_FREE = -1, + 
IOMMU_DOMAIN_STATE_DETACHED, + IOMMU_DOMAIN_STATE_ATTACHED +}; + /* * This structure contains generic data for IOMMU protection domains * independent of their use. @@ -480,7 +486,7 @@ struct protection_domain { unsigned long flags;/* flags to find out type of domain */ bool updated; /* complete domain flush required */ unsigned dev_cnt; /* devices assigned to this domain */ - unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Joerg, On 1/17/19 12:08 AM, j...@8bytes.org wrote: > On Wed, Jan 16, 2019 at 02:08:55PM +0000, Suthikulpanit, Suravee wrote: >> Actually, I am not sure how we would be missing the flush on the last device. >> In my test, I am seeing the flush command being issued correctly during >> vfio_unmap_unpin(), which is after all devices are detached. >> Although, I might be missing your point here. Could you please elaborate? > > Okay, you are right, the patch effectivly adds an unconditional flush of > the domain on all iommus when the last device is detached. So it is > correct in that regard. But that code path is also used in the > iommu_unmap() path. > > The problem now is, that with your change we send flush commands to all > IOMMUs in the unmap path when no device is attached to the domain. > This will hurt performance there, no? > > Regards, > > Joerg > Sounds like we need a way to track state of each IOMMU for a domain. What if we define the following: enum IOMMU_DOMAIN_STATES { DOMAIN_FREE = -1, DOMAIN_DETACHED = 0, DOMAIN_ATTACHED >= 1 } We should be able to use the dev_iommu[] to help track the state. - In amd_iommu_domain_alloc, we initialize the array to DOMAIN_FREE - In do_attach(), we change to DOMAIN_ATTACH or we can increment the count if it is already in DOMAIN_ATTACH state. - In do_detach(). we change to DOMAIN_DETACH. Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0. This should preserve previous behavior, and only add flushing condition to the specific IOMMU in detached state. Please let me know what you think. Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
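The proposed state scheme from this email can be sketched for a single `dev_iommu[]` slot. The function names below are hypothetical models of `do_attach()`, `do_detach()`, and the flush condition in `__domain_flush_pages()`; they only illustrate the FREE (-1) / DETACHED (0) / ATTACHED (>= 1) semantics.

```c
#include <assert.h>

/* One slot of dev_iommu[], doubling as state and reference count:
 * -1 = never attached to this IOMMU, 0 = all devices detached,
 * >= 1 = number of attached devices. */
enum domain_state { DOMAIN_FREE = -1, DOMAIN_DETACHED = 0 };

static int dev_iommu_state = DOMAIN_FREE;

static void model_attach(void)
{
    if (dev_iommu_state < 1)
        dev_iommu_state = 1;   /* FREE or DETACHED -> first attach */
    else
        dev_iommu_state += 1;
}

static void model_detach(void)
{
    dev_iommu_state -= 1;      /* last detach leaves DOMAIN_DETACHED, not FREE */
}

/* Flush condition: still flush in the DETACHED state, but skip IOMMUs
 * the domain was never attached to. */
static int model_needs_flush(void)
{
    return dev_iommu_state >= DOMAIN_DETACHED;
}
```

This captures the trade-off discussed in the thread: unlike a plain counter, the DETACHED state remembers that the IOMMU once held mappings, so the unmap path keeps flushing it while untouched IOMMUs stay skipped.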
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Joerg, On 1/16/19 8:26 PM, j...@8bytes.org wrote: > How about the attached diff? If > I understand the problem correctly, it should fix the problem more > reliably. > > Thanks, > > Joerg > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index 87ba23a75b38..dc1e2a8a19d7 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -1991,25 +1991,36 @@ static void do_attach(struct iommu_dev_data *dev_data, > > static void do_detach(struct iommu_dev_data *dev_data) > { > + struct protection_domain *domain = dev_data->domain; > struct amd_iommu *iommu; > u16 alias; > > iommu = amd_iommu_rlookup_table[dev_data->devid]; > alias = dev_data->alias; > > - /* decrease reference counters */ > - dev_data->domain->dev_iommu[iommu->index] -= 1; > - dev_data->domain->dev_cnt -= 1; > - > /* Update data structures */ > dev_data->domain = NULL; > list_del(&dev_data->list); > - clear_dte_entry(dev_data->devid); > - if (alias != dev_data->devid) > - clear_dte_entry(alias); > > + clear_dte_entry(dev_data->devid); > /* Flush the DTE entry */ > device_flush_dte(dev_data); > + > + if (alias != dev_data->devid) { > + clear_dte_entry(alias); > + /* Flush the Alias DTE entry */ > + device_flush_dte(alias); > + } > + Actually, device_flush_dte(alias) should be needed regardless of this patch. Are you planning to add this? Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Mark translation invalid during device detach
Joerg, On 1/16/19 8:03 PM, j...@8bytes.org wrote: > Hi Suravee, > > On Wed, Jan 16, 2019 at 04:15:10AM +0000, Suthikulpanit, Suravee wrote: >> From: Suravee Suthikulpanit >> >> When a device switches domain, the IOMMU driver detaches the device from the old >> domain and attaches it to the new domain. During this period >> the host page table root pointer is not set, which means DMA translation >> should be marked as invalid (clear TV bit). >> >> So, clear the TV bit when detaching the device. The TV bit will be set >> again when attaching the device to the new domain. > > Is there a specific problem with setting the TV bit? We are not currently seeing an issue. > Note that the update will clear all other fields in the first 128 bits > of the DTE, which means that IR, IW and Mode are all set to 0. This > effectively blocks all DMA requests from the device, which is by design. Ahh.. This makes sense now. I missed the IR/IW/Mode=0 part. It was not clear to us earlier. Thanks for the clarification. Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
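Joerg's point about why clearing the first 128 bits of the DTE is safe can be shown with the relevant bit positions (taken from the AMD IOMMU specification's Device Table Entry layout; the helper names below are made up for this sketch, not driver code):

```c
#include <assert.h>
#include <stdint.h>

/* DTE qword 0 field positions per the AMD IOMMU spec. */
#define DTE_V    (1ULL << 0)    /* entry valid */
#define DTE_TV   (1ULL << 1)    /* translation information valid */
#define DTE_IR   (1ULL << 61)   /* read permission */
#define DTE_IW   (1ULL << 62)   /* write permission */

static inline unsigned dte_mode(uint64_t qw0)
{
    return (unsigned)((qw0 >> 9) & 0x7);   /* Mode: bits 11:9 */
}

/* With qword 0 fully cleared, IR = IW = 0 and Mode = 0, so the device's
 * DMA requests are blocked regardless of how V/TV are interpreted. */
static inline int dte_blocks_dma(uint64_t qw0)
{
    return !(qw0 & (DTE_IR | DTE_IW)) && dte_mode(qw0) == 0;
}
```

This is why an explicit TV clear is unnecessary during detach: the bulk clear already leaves the entry in a DMA-blocking configuration by design.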
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Joerg, On 1/16/19 8:26 PM, j...@8bytes.org wrote: > On Wed, Jan 16, 2019 at 04:16:25AM +0000, Suthikulpanit, Suravee wrote: >> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c >> index 525659b88ade..ab31ba75da1b 100644 >> --- a/drivers/iommu/amd_iommu.c >> +++ b/drivers/iommu/amd_iommu.c >> @@ -1248,7 +1248,13 @@ static void __domain_flush_pages(struct >> protection_domain *domain, >> build_inv_iommu_pages(&cmd, address, size, domain->id, pde); >> >> for (i = 0; i < amd_iommu_get_num_iommus(); ++i) { >> -if (!domain->dev_iommu[i]) >> +/* >> + * The dev_cnt is zero when all devices are detached >> + * from the domain. This is the case when VFIO detaches >> + * all devices from the group before flushing IOMMU pages. >> + * So, always issue the flush command. >> + */ >> +if (domain->dev_cnt && !domain->dev_iommu[i]) >> continue; > > This doesn't look like the correct fix. We still miss the flush if we > detach the last device from the domain. Actually, I am not sure how we would be missing the flush on the last device. In my test, I am seeing the flush command being issued correctly during vfio_unmap_unpin(), which is after all devices are detached. Although, I might be missing your point here. Could you please elaborate? > How about the attached diff? If > I understand the problem correctly, it should fix the problem more > reliably. 
> > Thanks, > > Joerg > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index 87ba23a75b38..dc1e2a8a19d7 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -1991,25 +1991,36 @@ static void do_attach(struct iommu_dev_data *dev_data, > > static void do_detach(struct iommu_dev_data *dev_data) > { > + struct protection_domain *domain = dev_data->domain; > struct amd_iommu *iommu; > u16 alias; > > iommu = amd_iommu_rlookup_table[dev_data->devid]; > alias = dev_data->alias; > > - /* decrease reference counters */ > - dev_data->domain->dev_iommu[iommu->index] -= 1; > - dev_data->domain->dev_cnt -= 1; > - > /* Update data structures */ > dev_data->domain = NULL; > list_del(&dev_data->list); > - clear_dte_entry(dev_data->devid); > - if (alias != dev_data->devid) > - clear_dte_entry(alias); > > + clear_dte_entry(dev_data->devid); > /* Flush the DTE entry */ > device_flush_dte(dev_data); > + > + if (alias != dev_data->devid) { > + clear_dte_entry(alias); > + /* Flush the Alias DTE entry */ > + device_flush_dte(alias); > + } > + > + /* Flush IOTLB */ > + domain_flush_tlb_pde(domain); > + > + /* Wait for the flushes to finish */ > + domain_flush_complete(domain); > + > + /* decrease reference counters - needs to happen after the flushes */ > + domain->dev_iommu[iommu->index] -= 1; > + domain->dev_cnt -= 1; > } I have also considered this. This would also work. But since we are already doing page flushes during page unmapping, after all devices are detached, would this be necessary? Please see vfio_iommu_type1_detach_group(). Also, consider the case where more than one device shares the domain: this would issue a page flush every time we detach a device while other devices are still attached to the domain. Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] amd/iommu: Fix Guest Virtual APIC Log Tail Address Register
From: Filippo Sironi This register should have been programmed with the physical address of the memory location containing the shadow tail pointer for the guest virtual APIC log instead of the base address. Fixes: 8bda0cfbdc1a ('iommu/amd: Detect and initialize guest vAPIC log') Signed-off-by: Filippo Sironi Signed-off-by: Wei Wang Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu_init.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 9c3d610e1e19..e777fa90b2c2 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -797,7 +797,8 @@ static int iommu_init_ga_log(struct amd_iommu *iommu) entry = iommu_virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512; memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET, &entry, sizeof(entry)); - entry = (iommu_virt_to_phys(iommu->ga_log) & 0xFFFFFFFFFFFFFULL) & ~7ULL; + entry = (iommu_virt_to_phys(iommu->ga_log_tail) & +(BIT_ULL(52)-1)) & ~7ULL; memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET, &entry, sizeof(entry)); writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
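The masking in the fix above can be checked in isolation. The sketch below reproduces the arithmetic only: `(BIT_ULL(52) - 1)` keeps a 52-bit physical address and `~7ULL` forces 8-byte alignment, i.e. bits 51:3 of the `ga_log_tail` address survive (the helper name is illustrative).

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Value written to MMIO_GA_LOG_TAIL_OFFSET: a 52-bit physical address,
 * truncated to 8-byte alignment (bits 51:3 preserved). */
static uint64_t ga_log_tail_entry(uint64_t tail_phys)
{
    return (tail_phys & (BIT_ULL(52) - 1)) & ~7ULL;
}
```

Note this mask is the same value the old `0xFFFFFFFFFFFFFULL` constant spelled out; the actual bug fixed here is that the *base* address (`ga_log`) was written where the *tail-pointer* address (`ga_log_tail`) belongs.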