Re: [PATCH v3 7/7] iommu/amd: Do not support IOMMUv2 APIs when SNP is enabled

2022-06-29 Thread Suthikulpanit, Suravee via iommu




On 6/23/2022 3:23 PM, Joerg Roedel wrote:

On Wed, Jun 22, 2022 at 12:11:31PM -0500, Suravee Suthikulpanit wrote:

  bool amd_iommu_v2_supported(void)
  {
 -	return amd_iommu_v2_present;
 +	/*
 +	 * Since DTE[Mode]=0 is prohibited on SNP-enabled system
 +	 * (i.e. EFR[SNPSup]=1), IOMMUv2 page table cannot be used without
 +	 * setting up IOMMUv1 page table.
 +	 */
 +	return amd_iommu_v2_present && !amd_iommu_snp_en;


IOMMU_v2 APIs could actually be supported with GIOV and IOMMUv2
page-tables in-use, no?


We can support IOMMUv2 iff the v1 page table is also set up (i.e. DTE[Mode] != 0).
Currently, with the IOMMU_v2 APIs, the IOMMU driver sets the mode to zero. Therefore,
we cannot support this use case.
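To illustrate the constraint described above, here is a minimal standalone sketch (hypothetical, simplified names; not the driver's actual code): SNP forbids DTE[Mode]=0, and the current v2 API always programs DTE[Mode]=0, so v2 is only usable when SNP is off.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check discussed above. The v2 API is
 * usable only if the resulting DTE configuration is legal, and
 * DTE[Mode]=0 is illegal while SNP (EFR[SNPSup]) is enabled.
 */
struct dte_model {
	unsigned int mode;	/* DTE[Mode]; 0 means no v1 page table */
	bool snp_enabled;	/* SNP active on this system */
};

/* DTE[Mode]=0 is prohibited while SNP is enabled. */
static bool dte_config_valid(const struct dte_model *dte)
{
	return !(dte->snp_enabled && dte->mode == 0);
}

/* The v2 API as currently implemented always sets DTE[Mode]=0. */
static bool v2_api_usable(bool v2_present, bool snp_enabled)
{
	struct dte_model dte = { .mode = 0, .snp_enabled = snp_enabled };

	return v2_present && dte_config_valid(&dte);
}
```

This mirrors the `amd_iommu_v2_present && !amd_iommu_snp_en` condition in the patch above.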

Best Regards,
Suravee

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/7] iommu/amd: Warn when found inconsistency EFR mask

2022-06-29 Thread Suthikulpanit, Suravee via iommu




On 6/23/2022 3:22 PM, Joerg Roedel wrote:

On Wed, Jun 22, 2022 at 12:11:25PM -0500, Suravee Suthikulpanit wrote:

  #ifdef CONFIG_IRQ_REMAP
 +/*
 + * Iterate through all the IOMMUs to verify whether the specified
 + * EFR bitmask of IOMMU features is set.
 + * Warn and return false if an inconsistency is found.
 + */
  static bool check_feature_on_all_iommus(u64 mask)
  {
 	bool ret = false;
 	struct amd_iommu *iommu;

 	for_each_iommu(iommu) {
-		ret = iommu_feature(iommu, mask);
-		if (!ret)
+		bool tmp = iommu_feature(iommu, mask);
+
+		if ((ret != tmp) &&
+		    !list_is_first(&iommu->list, &amd_iommu_list)) {
+			pr_err(FW_BUG "Found inconsistent EFR mask (%#llx) on iommu%d (%04x:%02x:%02x.%01x).\n",
+			       mask, iommu->index, iommu->pci_seg->id, PCI_BUS_NUM(iommu->devid),
+			       PCI_SLOT(iommu->devid), PCI_FUNC(iommu->devid));
 			return false;
+		}
+		ret = tmp;


It is better to implement this by introducing a global feature mask,
which represents the minimal set of features supported by any IOMMU in
the system.

The warning is then something like:

if ((global_feature_mask & iommu_features) != global_feature_mask)
pr_warn(...);

This also makes the global variable to track SNP support obsolete.

Regards,

Joerg


That's actually better. I'll send out v4 w/ global EFR variable.

Thanks,
Suravee
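Joerg's global-EFR-mask suggestion can be sketched as a minimal standalone program (hypothetical names and bit positions, not the eventual v4 implementation): the global mask is the bitwise AND of every IOMMU's EFR, and any IOMMU missing a bit of the accumulated mask is flagged as inconsistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit position is illustrative only, not the real EFR layout. */
#define EFR_SNP_SUP (1ULL << 3)

struct fake_iommu {
	uint64_t efr;	/* this IOMMU's Extended Feature Register */
};

/*
 * Build the minimal common EFR across all IOMMUs and report whether
 * they all agreed. A dedicated SNP-support variable becomes obsolete:
 * callers just test (global & EFR_SNP_SUP).
 */
static bool build_global_efr(const struct fake_iommu *iommus, int n,
			     uint64_t *global)
{
	bool consistent = true;
	int i;

	*global = n ? iommus[0].efr : 0;
	for (i = 1; i < n; i++) {
		/* Warn when this IOMMU lacks a bit seen so far. */
		if ((*global & iommus[i].efr) != *global) {
			fprintf(stderr, "inconsistent EFR on iommu%d\n", i);
			consistent = false;
		}
		*global &= iommus[i].efr;	/* keep only common bits */
	}
	return consistent;
}
```

Feature checks then reduce to testing bits of the single global mask.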


Re: [PATCH v2 4/7] iommu/amd: Introduce function to check and enable SNP

2022-06-22 Thread Suthikulpanit, Suravee via iommu



On 6/22/2022 3:35 PM, Robin Murphy wrote:


Overall though, this is way nicer than v1, and it's definitely the right name 
in the right place now, thanks! FWIW, with those nits picked one way or another:

Reviewed-by: Robin Murphy 

Cheers,
Robin.


Thanks for your review. I'll send out v3 w/ your suggestions and reviewed-by 
tag.

Suravee

Re: [PATCH v2 4/7] iommu/amd: Introduce function to check and enable SNP

2022-06-22 Thread Suthikulpanit, Suravee via iommu

Recap discussion on the other thread.

https://lore.kernel.org/linux-mm/camkat6qorwbaxapacasm0sc9o2uq9zqzb6s1kbkvav2d4tk...@mail.gmail.com/#t

On 6/16/2022 8:55 AM, Suravee Suthikulpanit wrote:

+int amd_iommu_snp_enable(void)
+{
+	/*
+	 * The SNP support requires that IOMMU must be enabled, and is
+	 * not configured in the passthrough mode.
+	 */
+	if (no_iommu || iommu_default_passthrough()) {
+		pr_err("SNP: IOMMU is either disabled or configured in passthrough mode.\n");
+		return -EINVAL;
+	}


Peter has suggested rewording to something more descriptive, such as:

"SNP: IOMMU is either disabled or configured in passthrough mode, SNP cannot be supported".

Thank you,
Suravee


+	/*
+	 * Prevent enabling SNP after IOMMU_ENABLED state because this process
+	 * affects how the IOMMU driver sets up data structures and configures
+	 * IOMMU hardware.
+	 */
+	if (init_state > IOMMU_ENABLED) {
+		pr_err("SNP: Too late to enable SNP for IOMMU.\n");
+		return -EINVAL;
+	}
+
+	amd_iommu_snp_en = amd_iommu_snp_sup;
+	if (!amd_iommu_snp_en)
+		return -EINVAL;
+
+	pr_info("SNP enabled\n");
+
+	/* Enforce IOMMU v1 pagetable when SNP is enabled. */
+	if (amd_iommu_pgtable != AMD_IOMMU_V1) {
+		pr_warn("Forcing use of AMD IOMMU v1 page table due to SNP\n");
+		amd_iommu_pgtable = AMD_IOMMU_V1;
+		amd_iommu_ops.pgsize_bitmap = AMD_IOMMU_PGSIZES;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(amd_iommu_snp_enable);
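The init-state guard in the patch above can be sketched standalone (hypothetical state names and return values, simplified from the driver's actual enum): SNP may only be enabled before the driver moves past its "enabled" state, because later states have already committed data structures that depend on whether SNP is active.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of the driver's init-state machine. */
enum init_state {
	IOMMU_START,
	IOMMU_ENABLED,
	IOMMU_PCI_INIT,
	IOMMU_INITIALIZED,
};

/*
 * Sketch of the ordering guard: reject the request once initialization
 * has progressed past IOMMU_ENABLED, or when hardware lacks SNP support.
 */
static int snp_enable(enum init_state state, bool snp_sup)
{
	if (state > IOMMU_ENABLED)
		return -1;	/* too late: structures already configured */
	if (!snp_sup)
		return -1;	/* hardware does not advertise SNP support */
	return 0;		/* SNP enabled */
}
```

The real function additionally forces the v1 page table, as shown in the quoted patch.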




Re: [PATCH 5/7] iommu: Add domain_type_supported() callback in iommu_ops

2022-06-14 Thread Suthikulpanit, Suravee via iommu

Robin,

On 6/14/2022 4:51 PM, Robin Murphy wrote:

On 2022-06-13 15:38, Suthikulpanit, Suravee wrote:

Robin,

On 6/13/2022 4:31 PM, Robin Murphy wrote:



Introducing the check_domain_type_supported() callback in iommu_ops,
which allows the IOMMU generic layer to check with the vendor-specific IOMMU
driver whether the requested type is supported. This allows the user to
request types other than the default type.


Note also that you're only adding this in the sysfs path - what about the 
"iommu.passthrough=" parameter or CONFIG_IOMMU_DEFAULT_PASSTHROUGH?


For the SNP case, we cannot enable SNP if iommu=off, iommu=pt,
iommu.passthrough=1, or CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y.
So, when another driver tries to enable SNP, the IOMMU driver prevents it (see
iommu_sev_snp_supported() in patch 3).


Ugh, I hadn't looked too closely at the other patches, but an interface that looks like a 
simple "is this feature supported?" check with a secret side-effect of changing 
global behaviour as well? Yuck :(

What external drivers are expected to have the authority to affect the entire 
system and call that? The fact that you're exporting it suggests they could be 
loaded from modules *after* v2 features have been enabled and/or the user has 
configured a non-default identity domain for a group via sysfs... Fun!


I see your point.

Currently, the function to enable SNP is called from the SEV driver when it
tries to enable SNP support globally on the system.
This is done during fs_initcall(), which is early in the boot process. I can
also add guard code to make sure that this won't be done after a certain phase.


Instead, if we boot with iommu.passthrough=0, when another driver tries to
enable SNP, the IOMMU driver allows this and switches to SNP-enabled mode.
Subsequently, if the user tries to switch a domain (via sysfs) to
IOMMU_DOMAIN_IDENTITY, the IOMMU needs to prevent this because it has already
switched to SNP-enabled mode.


AFAICS there shouldn't need to be any core-level changes to support this. We already have
drivers which don't support passthrough at all, so conditionally not supporting it should
be no big deal. What should happen currently is that def_domain_type returns 0 for
"don't care", then domain_alloc rejects IOMMU_DOMAIN_IDENTITY and returns
NULL, so iommu_group_alloc_default_domain() falls back to IOMMU_DOMAIN_DMA.


Technically, we can do it the way you suggest. But isn't this confusing? At first,
def_domain_type() returns 0 for "don't care", but then the request to change to
IOMMU_DOMAIN_IDENTITY is rejected when domain_alloc() is called.


Yes, that's how it works; def_domain_type is responsible for quirking 
individual *devices* that need to have a specific domain type (in practice, 
devices which need identity mapping), while domain_alloc is responsible for 
saying which domain types the driver supports as a whole, by allocating them or 
not as appropriate.

We don't have a particularly neat way to achieve the negative of 
def_domain_type - i.e. saying that a specific device *can't* use a specific 
otherwise-supported domain type - other than subsequently failing in 
attach_dev, but so far we've not needed such a thing. And if SNP is expected to 
be mutually exclusive with identity domain support globally, then we still 
shouldn't need it.
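The fallback path Robin describes can be sketched as a minimal standalone program (hypothetical, simplified names; not the core iommu code): the driver's domain_alloc rejects IDENTITY when SNP is active, and default-domain selection then falls back to a DMA domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum domain_type { DOM_IDENTITY = 1, DOM_DMA = 2 };

struct domain {
	enum domain_type type;
};

static struct domain dma_dom = { DOM_DMA };
static struct domain id_dom  = { DOM_IDENTITY };

/* Driver-side allocation: refuse identity domains while SNP is on. */
static struct domain *drv_domain_alloc(enum domain_type t, bool snp_en)
{
	if (t == DOM_IDENTITY) {
		if (snp_en)
			return NULL;	/* identity unsupported with SNP */
		return &id_dom;
	}
	return &dma_dom;
}

/* Core-side default-domain selection with fallback to a DMA domain. */
static struct domain *alloc_default_domain(enum domain_type req, bool snp_en)
{
	struct domain *dom = drv_domain_alloc(req, snp_en);

	if (!dom && req != DOM_DMA)
		dom = drv_domain_alloc(DOM_DMA, snp_en);
	return dom;
}
```

With this split, no new iommu_ops callback is needed: supported types are expressed purely through what domain_alloc will allocate.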


Thanks for your feedback.

Suravee

Re: [PATCH 3/7] iommu/amd: Introduce function to check SEV-SNP support

2022-06-13 Thread Suthikulpanit, Suravee via iommu




On 6/13/2022 8:24 AM, Suravee Suthikulpanit wrote:

@@ -3543,3 +3537,30 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
 
 	return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
 }
+
+bool iommu_sev_snp_supported(void)
+{
+	/*
+	 * The SEV-SNP support requires that IOMMU must be enabled, and is
+	 * not configured in the passthrough mode.
+	 */
+	if (no_iommu || iommu_default_passthrough()) {
+		pr_err("SEV-SNP: IOMMU is either disabled or configured in passthrough mode.\n");
+		return false;
+	}
+
+	amd_iommu_snp_en = amd_iommu_snp_sup;
+	if (amd_iommu_snp_en)
+		pr_info("SNP enabled\n");
+
+	/* Enforce IOMMU v1 pagetable when SNP is enabled. */
+	if ((amd_iommu_pgtable != AMD_IOMMU_V1) && amd_iommu_snp_en) {
+		pr_info("Forcing use of AMD IOMMU v1 page table due to SNP\n");
+		amd_iommu_pgtable = AMD_IOMMU_V1;
+		amd_iommu_ops.pgsize_bitmap = AMD_IOMMU_PGSIZES;
+	}
+
+	return amd_iommu_snp_en;
+}
+EXPORT_SYMBOL_GPL(iommu_sev_snp_supported);


I need to guard this function w/ #ifdef CONFIG_AMD_MEM_ENCRYPT to prevent build 
error when CONFIG_AMD_MEM_ENCRYPT=n.
I will send out v2 to fix this.

Best Regards,
Suravee


Re: [PATCH 5/7] iommu: Add domain_type_supported() callback in iommu_ops

2022-06-13 Thread Suthikulpanit, Suravee via iommu

Robin,

On 6/13/2022 4:31 PM, Robin Murphy wrote:

On 2022-06-13 02:25, Suravee Suthikulpanit wrote:

When a user requests to change an IOMMU domain to a new type, the IOMMU generic
layer checks the requested type against the default domain type returned
by the vendor-specific IOMMU driver.

However, there is only one default domain type, and the current mechanism
rejects the request if it does not match the default type.


I don't really follow the reasoning here. If a driver's def_domain_type callback returns a specific type, it's saying that the device *has* to have that specific domain type for driver/platform-specific reasons. 


Agree, and I understand this part.


If that's not the case, then the driver shouldn't say so in the first place.


Considering the case:
1. Boot w/ default domain = IOMMU_DOMAIN_DMA_FQ.
2. User wants to change to IOMMU_DOMAIN_IDENTITY, which is not supported by the
IOMMU driver. In this case, the IOMMU driver can return IOMMU_DOMAIN_DMA_FQ and
prevent the mode change.
3. If the user instead wants to change to IOMMU_DOMAIN_DMA, the driver can
support this; yet since def_domain_type() returns IOMMU_DOMAIN_DMA_FQ, it still
ends up preventing the mode change.

IIUC, we should support step 3 above. Basically, the newly proposed interface
allows us to check with the IOMMU driver whether it can support certain domain
types before trying to allocate the domain.




Introducing the check_domain_type_supported() callback in iommu_ops,
which allows the IOMMU generic layer to check with the vendor-specific IOMMU
driver whether the requested type is supported. This allows the user to
request types other than the default type.


Note also that you're only adding this in the sysfs path - what about the 
"iommu.passthrough=" parameter or CONFIG_IOMMU_DEFAULT_PASSTHROUGH?


For the SNP case, we cannot enable SNP if iommu=off, iommu=pt,
iommu.passthrough=1, or CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y.
So, when another driver tries to enable SNP, the IOMMU driver prevents it (see
iommu_sev_snp_supported() in patch 3).

Instead, if we boot with iommu.passthrough=0, when another driver tries to
enable SNP, the IOMMU driver allows this and switches to SNP-enabled mode.
Subsequently, if the user tries to switch a domain (via sysfs) to
IOMMU_DOMAIN_IDENTITY, the IOMMU needs to prevent this because it has already
switched to SNP-enabled mode.


AFAICS there shouldn't need to be any core-level changes to support this. We already have
drivers which don't support passthrough at all, so conditionally not supporting it should
be no big deal. What should happen currently is that def_domain_type returns 0 for
"don't care", then domain_alloc rejects IOMMU_DOMAIN_IDENTITY and returns
NULL, so iommu_group_alloc_default_domain() falls back to IOMMU_DOMAIN_DMA.


Technically, we can do it the way you suggest. But isn't this confusing? At first,
def_domain_type() returns 0 for "don't care", but then the request to change to
IOMMU_DOMAIN_IDENTITY is rejected when domain_alloc() is called.

Please let me know if I am missing something.

Best Regards,
Suravee

Re: iommu/amd: bug report: page table memory leak

2022-02-08 Thread Suthikulpanit, Suravee via iommu

Hi Daniel,

On 1/19/2022 2:47 AM, Daniel Jordan wrote:

Hi,

I've hit a memory leak while testing qemu v6.2.0-rc4 on an AMD EPYC 7J13
(Milan) system.  Starting an almost 1T guest, the leak is over 1.5G per
qemu invocation.  I haven't checked whether the leak is proportional to
guest size.  It happens with a vfio device, and only when the guest's
memory is preallocated using qemu prealloc (this latter part is kinda
strange).  It happens when the guest memory uses THP but not hugetlb.

Bisection:

# bad: [df0cc57e057f18e44dac8e6c18aba47ab53202f9] Linux 5.16
# good: [f40ddce88593482919761f74910f42f4b84c004b] Linux 5.11
git bisect start 'df0cc57e057f1' 'f40ddce885934' '--' 'drivers/vfio' 'drivers/iommu' 'include/linux/amd-iommu.h' 'include/linux/dma-iommu.h' 'include/linux/intel-iommu.h' 'include/linux/iommu-helper.h' 'include/linux/of_iommu.h' 'include/linux/omap-iommu.h' 'include/linux/platform_data/iommu-omap.h' 'include/linux/iommu.h' 'include/trace/events/intel_iommu.h' 'include/trace/events/iommu.h' 'include/uapi/linux/iommu.h' 'include/uapi/linux/virtio_iommu.h' 'arch/x86/events/amd/iommu.h' 'arch/x86/events/amd/iommu.c' 'arch/x86/include/asm/iommu.h' 'arch/x86/include/asm/iommu_table.h' 'arch/x86/kernel/pci-iommu_table.c'
# bad: [cee57d4fe74e82e784f6566bad3e3bb1ca51a211] iommu/vt-d: Remove 
unnecessary braces
git bisect bad cee57d4fe74e82e784f6566bad3e3bb1ca51a211
# bad: [9fb5fad562fa0a41c84691714d99c23f54168a9e] iommu: remove 
DOMAIN_ATTR_PAGING
git bisect bad 9fb5fad562fa0a41c84691714d99c23f54168a9e
# bad: [45e606f2726926b04094e1c9bf809bca4884c57f] Merge branches 'arm/renesas', 
'arm/smmu', 'x86/amd', 'x86/vt-d' and 'core' into next
git bisect bad 45e606f2726926b04094e1c9bf809bca4884c57f
# good: [7060377ce06f9cd3ed6274c0f2310463feb5baec] Merge branch 'for-joerg/mtk' 
into for-joerg/arm-smmu/updates
git bisect good 7060377ce06f9cd3ed6274c0f2310463feb5baec
# bad: [6778ff5b21bd8e78c8bd547fd66437cf2657fd9b] iommu/amd: Fix performance 
counter initialization
git bisect bad 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b
# good: [f9b4df790aa4372bfa11b7d212e537b763295429] iommu/amd: Declare functions 
as extern
git bisect good f9b4df790aa4372bfa11b7d212e537b763295429
# bad: [33aef9786046d9a5744cd1e8d5d0ce800d611fdc] iommu/amd: Rename variables 
to be consistent with struct io_pgtable_ops
git bisect bad 33aef9786046d9a5744cd1e8d5d0ce800d611fdc
# bad: [e42ba0633064ef23eb1c8c21edf96bac1541bd4b] iommu/amd: Restructure code 
for freeing page table
git bisect bad e42ba0633064ef23eb1c8c21edf96bac1541bd4b
# good: [18954252a1d0b12e1b77087b55c37fb43b09e12a] iommu/amd: Move IO page 
table related functions
git bisect good 18954252a1d0b12e1b77087b55c37fb43b09e12a
# first bad commit: [e42ba0633064ef23eb1c8c21edf96bac1541bd4b] iommu/amd: 
Restructure code for freeing page table

commit e42ba0633064ef23eb1c8c21edf96bac1541bd4b
Author: Suravee Suthikulpanit 
Date:   Tue Dec 15 01:36:59 2020 -0600
  
 iommu/amd: Restructure code for freeing page table
  
 By consolidating logic into the v1_free_pgtable helper function,
 which is called from the IO page table framework.
 
 Signed-off-by: Suravee Suthikulpanit 

 Link: https://lore.kernel.org/r/20201215073705.123786-8-suravee.suthikulpanit@amd.com
 Signed-off-by: Joerg Roedel 
 
  drivers/iommu/amd/amd_iommu.h  |  1 -

  drivers/iommu/amd/io_pgtable.c | 41 -
  drivers/iommu/amd/iommu.c  | 21 -
  3 files changed, 28 insertions(+), 35 deletions(-)

Qemu command line:

numactl -m 1 -N 1 "$QEMU" \
 -name vmol74 \
 -machine q35,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
 -cpu host,host-phys-bits=true \
 -smp cpus=32 \
 -no-user-config \
 -nodefaults \
 -rtc base=utc,driftfix=slew \
 -global kvm-pit.lost_tick_policy=delay \
 -no-hpet \
 -no-shutdown \
 -boot strict=on \
 -drive file=${vm_image},format=raw,if=none,id=drive-ide0-0-0 \
 -device 
ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 \
 -device vfio-pci,host=${pci_addr},id=net2,bus=pcie.0 \
 -msg timestamp=on \
 -nographic \
 -object 
memory-backend-ram,id=pc.ram,size=980g,prealloc=on,prealloc-threads=16 -m 980g \
 -daemonize

Kernel config 

Re: [PATCH 0/3] iommu/amd: Fix unable to handle page fault due to AVIC

2021-09-02 Thread Suthikulpanit, Suravee via iommu




On 9/2/2021 12:38 AM, Joerg Roedel wrote:

Hi Suravee,

On Tue, Aug 31, 2021 at 12:10:27PM -0500, Suthikulpanit, Suravee wrote:

Here is an additional tag for this series:

Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")

Are there any concerns with this series?


No concerns, please add the tag and re-post when -rc1 is out. I will queue it
for -rc2 then.

Thanks,

Joerg



I'll do that.

Thanks,
Suravee


Re: [PATCH 0/3] iommu/amd: Fix unable to handle page fault due to AVIC

2021-08-31 Thread Suthikulpanit, Suravee via iommu

Joerg,

Here is an additional tag for this series:

Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")

Are there any concerns with this series?

Thanks
Suravee

On 8/20/2021 3:29 PM, Suravee Suthikulpanit wrote:

This bug is triggered when rebooting a VM on a system where
SVM AVIC is enabled but IOMMU AVIC is disabled in the BIOS.

The series reworks interrupt remapping initialization to
check for IOMMU AVIC support (GAMSup) at an earlier stage, using
the EFR provided by the IVRS table instead of the PCI MMIO register,
which is only available after PCI support for the IOMMU is initialized.
This helps avoid having to disable and clean up the already-initialized
interrupt-remapping-related parameters.

Thanks,
Suravee

Suravee Suthikulpanit (2):
   iommu/amd: Introduce helper function to check feature bit on all
 IOMMUs
   iommu/amd: Remove iommu_init_ga()

Wei Huang (1):
   iommu/amd: Relocate GAMSup check to early_enable_iommus

  drivers/iommu/amd/init.c | 45 ++--
  1 file changed, 25 insertions(+), 20 deletions(-)




Re: [PATCH v3] iommu/amd: Use report_iommu_fault()

2021-08-05 Thread Suthikulpanit, Suravee via iommu

Lennert,

FYI: I have made some comments in V2 thread specifically around the new changes 
that we discussed in that thread.

Thanks,
Suravee


Re: [PATCH v2] iommu/amd: Use report_iommu_fault()

2021-08-05 Thread Suthikulpanit, Suravee via iommu

Lennert,

On 7/29/2021 9:32 PM, Lennert Buytenhek wrote:

We have three cases to handle:

- EVENT_FLAG_I set: IRQ remapping fault, don't call report_iommu_fault()

- EVENT_FLAG_I unset, but the request was a translation request
   (EVENT_FLAG_TR set) or the target page was not present (EVENT_FLAG_PR
   unset): call report_iommu_fault(), but the RW bit will be invalid, so
   don't try to map it to a IOMMU_FAULT_{READ,WRITE} code


So, why do we need to call report_iommu_fault() for this case?
My understanding is that we only have IOMMU_FAULT_[READ|WRITE].
So, if we can't identify whether the DMA is a read or a write,
we should not need to call report_iommu_fault(), should we?


- EVENT_FLAG_I unset, the request is a transaction request (EVENT_FLAG_TR
   unset) and the target page was present (EVENT_FLAG_PR set): call
   report_iommu_fault(), and use the RW bit to set IOMMU_FAULT_{READ,WRITE}

So I don't think we can merge the test for EVENT_FLAG_I with the
test for EVENT_FLAG_TR/EVENT_FLAG_PR.


The only condition under which we would call report_iommu_fault() is
I=0, TR=0, PR=1, isn't it? So we should be able to just check whether PR=1.
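The condition stated here can be captured as a small standalone check (the flag values are those from the proposed patch later in this thread; the helper name is illustrative): the RW bit is meaningful only for a memory transaction (I=0) that is not a translation request (TR=0) and that hit a present page (PR=1).

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits as defined in the proposed patch in this thread. */
#define EVENT_FLAG_TR 0x100
#define EVENT_FLAG_RW 0x020
#define EVENT_FLAG_PR 0x010
#define EVENT_FLAG_I  0x008

/*
 * RW is valid exactly in the I=0, TR=0, PR=1 case: masking out all
 * three bits and comparing against EVENT_FLAG_PR checks all of them
 * at once.
 */
static bool rw_flag_valid(unsigned int flags)
{
	return (flags & (EVENT_FLAG_I | EVENT_FLAG_TR | EVENT_FLAG_PR))
		== EVENT_FLAG_PR;
}
```

Any set I or TR bit, or a clear PR bit, makes the comparison fail, which matches the single PR=1 condition discussed above.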



We could do something like this, if you'd prefer:

#define IS_IOMMU_MEM_TRANSACTION(flags) \
(((flags) & EVENT_FLAG_I) == 0)

#define IS_RW_FLAG_VALID(flags) \
(((flags) & (EVENT_FLAG_TR | EVENT_FLAG_PR)) == EVENT_FLAG_PR)

#define IS_WRITE_REQUEST(flags) \
(IS_RW_FLAG_VALID(flags) && (flags & EVENT_FLAG_RW))

And then do something like:

if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
	if (!report_iommu_fault(&dev_data->domain->domain, &pdev->dev,
				address,
				IS_WRITE_REQUEST(flags) ?
					IOMMU_FAULT_WRITE : IOMMU_FAULT_READ))


Actually, IS_WRITE_REQUEST() == 0 could mean:
- I=0, TR=0, PR=1 and RW=0: this is fine.
- I=0, (TR=1 or PR=0): we should not be calling report_iommu_fault() here,
  since we cannot specify READ/WRITE.

Thanks,
Suravee


goto out;
}




Re: [PATCH v2] iommu/amd: Use report_iommu_fault()

2021-07-28 Thread Suthikulpanit, Suravee via iommu

Lennert,

On 7/26/2021 11:31 AM, Lennert Buytenhek wrote:

This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
when an I/O page fault occurs.

I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
functionality for handling platform errors, and being able to be notified
of I/O page faults for initiating corrective action is very useful -- and
receiving such events via event tracing is a lot nicer than having to
scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already seem to trigger this
tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where
if that function returns zero, the driver will have handled the fault,
in which case we avoid logging information about the fault to the printk
buffer from the IOMMU driver.

With this patch I see io_page_fault event tracing entries as expected:

irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482640 flags=0x
irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482650 flags=0x
irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482660 flags=0x
irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482670 flags=0x
irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482680 flags=0x
irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
spec, but I haven't tested that bit of the code, as the page faults I
encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek 
---
Changes since v1 RFC:

- Don't call report_iommu_fault() for IRQ remapping faults.
   (Suggested by Joerg Roedel.)

  drivers/iommu/amd/amd_iommu_types.h |  4 
  drivers/iommu/amd/iommu.c   | 29 +
  2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
  #define EVENT_DOMID_MASK_HI   0xf
  #define EVENT_FLAGS_MASK  0xfff
  #define EVENT_FLAGS_SHIFT 0x10
+#define EVENT_FLAG_TR  0x100
+#define EVENT_FLAG_RW  0x020
+#define EVENT_FLAG_PR  0x010
+#define EVENT_FLAG_I   0x008
  
  /* feature control bits */

  #define CONTROL_IOMMU_EN0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a7d6d78147b7..d9fb2c22d44a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c


What if we introduce:

+/*
+ * AMD I/O Virtualization Technology (IOMMU) Specification,
+ * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+ * that the RW ("read-write") bit is only valid if the I/O
+ * page fault was caused by a memory transaction request
+ * referencing a page that was marked present.
+ */
+#define IO_PAGE_FAULT_MEM_MASK \
+   (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)
+#define IS_IOMMU_MEM_TRANSACTION(x)\
+   ((x & IO_PAGE_FAULT_MEM_MASK) == EVENT_FLAG_PR)

Note that this already checks that EVENT_FLAG_I == 0.



@@ -484,6 +484,34 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	/*
+	 * If this is a DMA fault (for which the I(nterrupt) bit will
+	 * be unset), allow report_iommu_fault() to prevent logging it.
+	 */
+	if (dev_data && ((flags & EVENT_FLAG_I) == 0)) {
+		int report_flags;
+
+		/*
+		 * AMD I/O Virtualization Technology (IOMMU) Specification,
+		 * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+		 * that the RW ("read-write") bit is only valid if the I/O
+		 * page fault was caused by a memory transaction request
+		 * referencing a page that was marked present.
+		 */
+		report_flags = 0;
+		if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR)) ==
+		    EVENT_FLAG_PR) {
+			if (flags & EVENT_FLAG_RW)
+				

Re: [PATCH] iommu/amd: Convert from atomic_t to refcount_t on pasid_state->count

2021-07-22 Thread Suthikulpanit, Suravee via iommu




On 7/19/2021 3:32 AM, Xiyu Yang wrote:

refcount_t type and corresponding API can protect refcounters from
accidental underflow and overflow and further use-after-free situations.

Signed-off-by: Xiyu Yang 
Signed-off-by: Xin Tan 


Thanks,

Reviewed-by: Suravee Suthikulpanit 


Re: [PATCH] iommu/amd: Convert from atomic_t to refcount_t on device_state->count

2021-07-22 Thread Suthikulpanit, Suravee via iommu




On 7/19/2021 1:00 AM, Xiyu Yang wrote:

refcount_t type and corresponding API can protect refcounters from
accidental underflow and overflow and further use-after-free situations.

Signed-off-by: Xiyu Yang 
Signed-off-by: Xin Tan 
---
  drivers/iommu/amd/iommu_v2.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)


Thanks,

Reviewed-by: Suravee Suthikulpanit 


Re: [PATCH,RFC] iommu/amd: Use report_iommu_fault()

2021-07-22 Thread Suthikulpanit, Suravee via iommu

Lennert,

On 7/19/2021 4:54 AM, Lennert Buytenhek wrote:

This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
when an I/O page fault occurs.

I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
functionality for handling platform errors, and being able to be notified
of I/O page faults for initiating corrective action is very useful -- and
receiving such events via event tracing is a lot nicer than having to
scrape them from kmsg.


Interesting. Just curious what types of error handling are done here?


A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already seem to trigger this
tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where
if that function returns zero, the driver will have handled the fault,
in which case we avoid logging information about the fault to the printk
buffer from the IOMMU driver.

With this patch I see io_page_fault event tracing entries as expected:

irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482640 flags=0x
irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482650 flags=0x
irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482660 flags=0x
irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482670 flags=0x
irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482680 flags=0x
irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
spec, but I haven't tested that bit of the code, as the page faults I
encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
case EVENT_FLAG_RW doesn't make sense.


Since the IO_PAGE_FAULT event is used to communicate various types of fault events,
why don't we just pass the flags as-is? This way, it can be used to report/trace
various types of IO_PAGE_FAULT events (e.g. for the I/O page table, interrupt
remapping, etc.).

Interested parties can register a domain fault handler, and it can take care of
parsing the flag information as needed.


Signed-off-by: Lennert Buytenhek 
---
  drivers/iommu/amd/amd_iommu_types.h |4 
  drivers/iommu/amd/iommu.c   |   25 +
  2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
  #define EVENT_DOMID_MASK_HI   0xf
  #define EVENT_FLAGS_MASK  0xfff
  #define EVENT_FLAGS_SHIFT 0x10
+#define EVENT_FLAG_TR  0x100
+#define EVENT_FLAG_RW  0x020
+#define EVENT_FLAG_PR  0x010
+#define EVENT_FLAG_I   0x008
  
  /* feature control bits */

  #define CONTROL_IOMMU_EN0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..a02ace7ee794 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -480,6 +480,30 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	if (dev_data) {

+   int report_flags;
+
+   /*
+* AMD I/O Virtualization Technology (IOMMU) Specification,
+* revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+* that the RW ("read-write") bit is only valid if the I/O
+* page fault was caused by a memory transaction request
+* referencing a page that was marked present.
+*/
+   report_flags = 0;
+		if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)) ==
+		    EVENT_FLAG_PR) {


Let's not do this check 


+   if (flags & EVENT_FLAG_RW)
+   report_flags |= IOMMU_FAULT_WRITE;
+   else
+   report_flags |= IOMMU_FAULT_READ;


... and then we don't need to translate the EVENT_FLAG_XX to IOMMU_FAULT_XXX 
flags.


+   }
+
+   if (!report_iommu_fault(&dev_data->domain->domain,
+   &pdev->dev, address, report_flags))


Let's just 

Re: [PATCH] iommu/amd: Fix printing of IOMMU events when rate limiting kicks in

2021-07-22 Thread Suthikulpanit, Suravee via iommu

On 7/21/2021 8:44 AM, Lennert Buytenhek wrote:

For the printing of RMP_HW_ERROR / RMP_PAGE_FAULT / IO_PAGE_FAULT
events, the AMD IOMMU code uses such logic:

if (pdev)
	dev_data = dev_iommu_priv_get(&pdev->dev);

if (dev_data && __ratelimit(&dev_data->rs)) {
	pci_err(pdev, ...
} else {
	printk_ratelimit() / pr_err{,_ratelimited}(...
}

This means that if we receive an event for a PCI devid which actually
does have a struct pci_dev and an attached struct iommu_dev_data, but
rate limiting kicks in, we'll fall back to the non-PCI branch of the
test, and print the event in a different format.

Fix this by changing the logic to:

if (dev_data) {
	if (__ratelimit(&dev_data->rs)) {
pci_err(pdev, ...
}
} else {
pr_err_ratelimited(...
}

Suggested-by: Suravee Suthikulpanit
Signed-off-by: Lennert Buytenhek


Reviewed-by: Suravee Suthikulpanit 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: Fix I/O page fault logging ratelimit test

2021-07-20 Thread Suthikulpanit, Suravee via iommu

Hi Lennert,

On 7/18/2021 7:47 PM, Lennert Buytenhek wrote:

On an AMD system, I/O page faults are usually logged like this:

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..7ae426b092f2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -483,7 +483,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
	if (dev_data && __ratelimit(&dev_data->rs)) {
pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
address=0x%llx flags=0x%04x]\n",
domain_id, address, flags);
-   } else if (printk_ratelimit()) {
+   } else if (!dev_data && printk_ratelimit()) {


This seems a bit confusing. Also, according to the following comment in 
include/linux/printk.h:

/*
 * Please don't use printk_ratelimit(), because it shares ratelimiting state
 * with all other unrelated printk_ratelimit() callsites.  Instead use
 * printk_ratelimited() or plain old __ratelimit().
 */

We probably should move away from using printk_ratelimit() here.
What about the following change instead?

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..8eb5d3519743 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -480,11 +480,12 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
if (pdev)
dev_data = dev_iommu_priv_get(>dev);

-   if (dev_data && __ratelimit(&dev_data->rs)) {
-   pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
address=0x%llx flags=0x%04x]\n",
-   domain_id, address, flags);
-   } else if (printk_ratelimit()) {
-   pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
+   if (dev_data) {
+   if (__ratelimit(&dev_data->rs))
+   pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
address=0x%llx flags=0x%04x]\n",
+   domain_id, address, flags);
+   } else {
+   pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
domain_id, address, flags);
}

Note also that there might be other places in this file that would need similar 
modification as well.

Thanks,
Suravee



Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-09 Thread Suthikulpanit, Suravee




On 5/5/2021 8:05 PM, Peter Zijlstra wrote:

On Wed, May 05, 2021 at 07:39:14PM +0700, Suthikulpanit, Suravee wrote:

Peter,

On 5/4/2021 7:13 PM, Peter Zijlstra wrote:

On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote:

Peter,

On 5/4/2021 4:39 PM, Peter Zijlstra wrote:

On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:


2. Since AMD IOMMU PMU does not support interrupt mode, the logic
  can be simplified to always start counting with value zero,
  and accumulate the counter value when stopping without the need
  to keep track and reprogram the counter with the previously read
  counter value.


This relies on the hardware counter being the full 64bit wide, is it?



The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
I might be missing some points here? Could you please describe?


How do you deal with the 48bit overflow if you don't use the interrupt?


The IOMMU Perf driver does not currently handle counter overflow since the 
overflow
notification mechanism (i.e. IOMMU creates an EVENT_COUNTER_ZERO event in the 
IOMMU event log,
and generate an IOMMU MSI interrupt to signal IOMMU driver to process the 
event.) is not
currently supported. When counter overflows, the counter becomes zero, and Perf
reports value zero for the event.

Alternatively, to detect overflow, we might start counting with value 1 so that
we can detect overflow when the value becomes zero in which case the Perf driver
could generate error message.


Urgh.. the intel uncore driver programs an hrtimer to periodically fold
deltas. That way the counter will never be short.



Thanks for the info. I'll look into ways to detect and handle counter overflow 
for this.
So far, with the current power-gating, it has several restrictions regarding 
when
the HW counter can be accessed, which makes it more difficult to handle this.

Thanks,
Suravee


Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-05 Thread Suthikulpanit, Suravee

Peter,

On 5/4/2021 7:13 PM, Peter Zijlstra wrote:

On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote:

Peter,

On 5/4/2021 4:39 PM, Peter Zijlstra wrote:

On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:


2. Since AMD IOMMU PMU does not support interrupt mode, the logic
 can be simplified to always start counting with value zero,
 and accumulate the counter value when stopping without the need
 to keep track and reprogram the counter with the previously read
 counter value.


This relies on the hardware counter being the full 64bit wide, is it?



The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
I might be missing some points here? Could you please describe?


How do you deal with the 48bit overflow if you don't use the interrupt?


The IOMMU Perf driver does not currently handle counter overflow since the 
overflow
notification mechanism (i.e. IOMMU creates an EVENT_COUNTER_ZERO event in the 
IOMMU event log,
and generate an IOMMU MSI interrupt to signal IOMMU driver to process the 
event.) is not
currently supported. When counter overflows, the counter becomes zero, and Perf
reports value zero for the event.

Alternatively, to detect overflow, we might start counting with value 1 so that
we can detect overflow when the value becomes zero in which case the Perf driver
could generate error message.

Regards,
Suravee


Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

2021-05-04 Thread Suthikulpanit, Suravee

Peter,

On 5/4/2021 4:39 PM, Peter Zijlstra wrote:

On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:


2. Since AMD IOMMU PMU does not support interrupt mode, the logic
can be simplified to always start counting with value zero,
and accumulate the counter value when stopping without the need
to keep track and reprogram the counter with the previously read
counter value.


This relies on the hardware counter being the full 64bit wide, is it?



The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
I might be missing some points here? Could you please describe?

Thanks,
Suravee




Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-20 Thread Suthikulpanit, Suravee

David / Joerg,

On 4/10/2021 5:03 PM, David Coe wrote:


The immediately obvious difference is the enormous count seen on 
mem_dte_mis on the older Ryzen 2400G. Will do some RTFM, but anyone with 
comments and insight?

841,689,151,202,939   amd_iommu_0/mem_dte_mis/  (33.44%)

Otherwise, all seems to be running smoothly (especially for a distribution still 
in β). Bravo and many thanks all!


The initial hypothesis is that the issue happens only when users specify more 
events than the available counters, in which case Perf will time-multiplex the 
events onto the counters.

Looking at the Perf and AMD IOMMU PMU multiplexing logic, it requires:
 1. Stop the counter (i.e. set CSOURCE to zero to stop counting)
 2. Save the counter value of the current event
 3. Reload the counter value of the new event (previously saved)
 4. Start the counter (i.e. set CSOURCE to count new events)

The problem here is that when the driver writes zero to the CSOURCE register in 
step 1, this enables power-gating, which blocks access to the counter, so the 
writes/reads in steps 2 and 3 do not take effect.

I have found a system that reproduced this case (w/ unusually large number of 
count), and debug the issue further.
As a hack, I have tried skipping step 1, and it seems to eliminate this issue. 
However, this is logically incorrect,
and might result in inaccurate data depending on the events.

Here are the options:
1. Continue to look for workaround for this issue.
2. Find a way to disable event time-multiplexing (e.g. only limit the number of 
counters to 8)
   if power gating is enabled on the platform.
3. Back to the original logic where we had the pre-init check of the counter 
values, which is still the safest choice
   at the moment unless

Regards,
Suravee

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-15 Thread Suthikulpanit, Suravee

David,

On 4/14/2021 10:33 PM, David Coe wrote:

Hi Suravee!

I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 partly 
to check my mailer's 'mangling' hadn't also reached the code!

There are 3 sets of results in the attachment, all for the Ryzen 2400G. The 
as-distributed kernel already incorporates your IOMMU RFCv3 patch.

A. As-distributed kernel (cold boot)
    >5 retries, so no IOMMU read/write capability, no amd_iommu events.

B. As-distributed kernel (warm boot)
    <5 retries, amd_iommu running stats show large numbers as before.

C. Revert+Update kernel
    amd_iommu events listed and also show large hit/miss numbers.

In due course, I'll load the new (revert+update) kernel on the 4700G but won't 
overload your mail-box unless something unusual turns up.

Best regards,



For the Ryzen 2400G, could you please try with:
- 1 event at a time
- Not more than 8 events (On your system, it has 2 banks x 4 counters/bank.
I am trying to see if this issue might be related to the counters multiplexing).

Thanks,
Suravee

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-13 Thread Suthikulpanit, Suravee



On 4/10/2021 5:03 PM, David Coe wrote:

Results for AMD Ryzen 4700U running Ubuntu 21.04β kernel 5.11.0-13

$ sudo dmesg | grep IOMMU
[    0.490352] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.491985] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.493732] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    0.793259] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 



$ sudo perf stat -e 'amd_iommu_0/cmd_processed/, amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, 
amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10


Performance counter stats for 'system wide':

    12  amd_iommu_0/cmd_processed/ (33.28%)
     6   amd_iommu_0/cmd_processed_inv/    (33.32%)
     0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/ (33.36%)
   290   amd_iommu_0/int_dte_hit/  (33.40%)
    20   amd_iommu_0/int_dte_mis/  (33.46%)
   391   amd_iommu_0/mem_dte_hit/  (33.49%)
     3,720   amd_iommu_0/mem_dte_mis/  (33.49%)
    44   amd_iommu_0/mem_iommu_tlb_pde_hit/    (33.46%)
   810   amd_iommu_0/mem_iommu_tlb_pde_mis/    (33.45%)
    35   amd_iommu_0/mem_iommu_tlb_pte_hit/    (33.41%)
   746   amd_iommu_0/mem_iommu_tlb_pte_mis/    (33.37%)
     0   amd_iommu_0/mem_pass_excl/    (33.32%)
     0   amd_iommu_0/mem_pass_pretrans/    (33.28%)
     0   amd_iommu_0/mem_pass_untrans/ (33.28%)
     0   amd_iommu_0/mem_target_abort/ (33.27%)
   715   amd_iommu_0/mem_trans_total/  (33.27%)
     0   amd_iommu_0/page_tbl_read_gst/    (33.28%)
    36   amd_iommu_0/page_tbl_read_nst/    (33.27%)
    36   amd_iommu_0/page_tbl_read_tot/    (33.27%)
     0   amd_iommu_0/smi_blk/  (33.28%)
     0   amd_iommu_0/smi_recv/ (33.26%)
     0   amd_iommu_0/tlb_inv/  (33.23%)
     0   amd_iommu_0/vapic_int_guest/  (33.24%)
   366   amd_iommu_0/vapic_int_non_guest/  (33.27%)

The immediately obvious difference is the enormous count seen on 
mem_dte_mis on the older Ryzen 2400G. Will do some RTFM, but anyone with 
comments and insight?

841,689,151,202,939   amd_iommu_0/mem_dte_mis/  (33.44%)

Otherwise, all seems to be running smoothly (especially for a distribution still 
in β). Bravo and many thanks all!


That doesn't look correct. Lemme do some more investigation also.

Thanks,
Suravee

Re: [PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"

2021-04-13 Thread Suthikulpanit, Suravee

Shuah,

On 4/10/2021 12:06 AM, Shuah Khan wrote:

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 321f5906e6ed..648cdfd03074 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c

@@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu 
*iommu)
  amd_iommu_pc_present = true;
  /* save the value to restore, if writable */
-if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) ||
-iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false))
-goto pc_false;
-
-/*
- * Disable power gating by programing the performance counter
- * source to 20 (i.e. counts the reads and writes from/to IOMMU
- * Reserved Register [MMIO Offset 1FF8h] that are ignored.),
- * which never get incremented during this init phase.
- * (Note: The event is also deprecated.)
- */
-val = 20;
-if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true))
+if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
  goto pc_false;
  /* Check if the performance counters can be written to */
-val = 0xabcd;
-for (retry = 5; retry; retry--) {
-if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
-iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
-val2)
-break;
-
-/* Wait about 20 msec for power gating to disable and retry. */
-msleep(20);
-}
-
-/* restore */
-if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) ||
-iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true))
+if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
+(iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
+(val != val2))


Probably don't need parentheses around 'val != val2'


This is the result from git revert. Also, the logic is removed in patch 2/2.

Suravee

iommu: amd: Simpify decoding logic for INVALID_PPR_REQUEST event

2019-10-14 Thread Suthikulpanit, Suravee
Reuse existing macro to simplify the code and improve readability.

Cc: Joerg Roedel 
Cc: Gary R Hook 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1cb759..b249aa7 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -617,8 +617,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
pasid, address, flags);
break;
case EVENT_TYPE_INV_PPR_REQ:
-   pasid = ((event[0] >> 16) & 0xFFFF)
-   | ((event[1] << 6) & 0xF0000);
+   pasid = PPR_PASID(*((u64 *)__evt));
tag = event[1] & 0x03FF;
dev_err(dev, "Event logged [INVALID_PPR_REQUEST 
device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x tag=0x%03x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-- 
1.8.3.1



iommu: amd: Fix incorrect PASID decoding from event log

2019-10-14 Thread Suthikulpanit, Suravee
IOMMU Event Log encodes 20-bit PASID for events:
ILLEGAL_DEV_TABLE_ENTRY
IO_PAGE_FAULT
PAGE_TAB_HARDWARE_ERROR
INVALID_DEVICE_REQUEST
as:
PASID[15:0]  = bit 47:32
PASID[19:16] = bit 19:16

Note that INVALID_PPR_REQUEST event has different encoding
from the rest of the events as the following:
PASID[15:0]  = bit 31:16
PASID[19:16] = bit 45:42

So, fix the decoding logic.

Fixes: d64c0486ed50 ("iommu/amd: Update the PASID information printed to the system log")
Cc: Joerg Roedel 
Cc: Gary R Hook 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 5 +++--
 drivers/iommu/amd_iommu_types.h | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 61de819..c1cb759 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -560,7 +560,8 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
 retry:
type= (event[1] >> EVENT_TYPE_SHIFT)  & EVENT_TYPE_MASK;
devid   = (event[0] >> EVENT_DEVID_SHIFT) & EVENT_DEVID_MASK;
-   pasid   = PPR_PASID(*(u64 *)&event[0]);
+   pasid   = (event[0] & EVENT_DOMID_MASK_HI) |
+ (event[1] & EVENT_DOMID_MASK_LO);
flags   = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
address = (u64)(((u64)event[3]) << 32) | event[2];
 
@@ -593,7 +594,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
address, flags);
break;
case EVENT_TYPE_PAGE_TAB_ERR:
-   dev_err(dev, "Event logged [PAGE_TAB_HARDWARE_ERROR 
device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
+   dev_err(dev, "Event logged [PAGE_TAB_HARDWARE_ERROR 
device=%02x:%02x.%x pasid=0x%04x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
break;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 64edd5a..5a698ad 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -130,8 +130,8 @@
 #define EVENT_TYPE_INV_PPR_REQ 0x9
 #define EVENT_DEVID_MASK   0xffff
 #define EVENT_DEVID_SHIFT  0
-#define EVENT_DOMID_MASK   0xffff
-#define EVENT_DOMID_SHIFT  0
+#define EVENT_DOMID_MASK_LO    0xffff
+#define EVENT_DOMID_MASK_HI    0xf0000
 #define EVENT_FLAGS_MASK   0xfff
 #define EVENT_FLAGS_SHIFT  0x10
 
-- 
1.8.3.1



[PATCH] iommu/amd: Re-factor guest virtual APIC (de-)activation code

2019-07-23 Thread Suthikulpanit, Suravee
Re-factor the logic for activating/deactivating guest virtual APIC mode (GAM)
into helper functions, and export them for other drivers (e.g. SVM)
to support run-time activation/deactivation of SVM AVIC.

Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 85 +
 drivers/iommu/amd_iommu_types.h |  9 +
 include/linux/amd-iommu.h   | 12 ++
 3 files changed, 82 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index dce1d8d..42fba8d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4301,13 +4301,62 @@ static void irq_remapping_deactivate(struct irq_domain 
*domain,
.deactivate = irq_remapping_deactivate,
 };
 
+int amd_iommu_activate_guest_mode(void *data)
+{
+   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
+   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
+
+   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
+   !entry || entry->lo.fields_vapic.guest_mode)
+   return 0;
+
+   entry->lo.val = 0;
+   entry->hi.val = 0;
+
+   entry->lo.fields_vapic.guest_mode  = 1;
+   entry->lo.fields_vapic.ga_log_intr = 1;
+   entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
+   entry->hi.fields.vector= ir_data->ga_vector;
+   entry->lo.fields_vapic.ga_tag  = ir_data->ga_tag;
+
+   return modify_irte_ga(ir_data->irq_2_irte.devid,
+ ir_data->irq_2_irte.index, entry, NULL);
+}
+EXPORT_SYMBOL(amd_iommu_activate_guest_mode);
+
+int amd_iommu_deactivate_guest_mode(void *data)
+{
+   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
+   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
+   struct irq_cfg *cfg = ir_data->cfg;
+
+   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
+   !entry || !entry->lo.fields_vapic.guest_mode)
+   return 0;
+
+   entry->lo.val = 0;
+   entry->hi.val = 0;
+
+   entry->lo.fields_remap.dm  = apic->irq_dest_mode;
+   entry->lo.fields_remap.int_type= apic->irq_delivery_mode;
+   entry->hi.fields.vector= cfg->vector;
+   entry->lo.fields_remap.destination =
+   APICID_TO_IRTE_DEST_LO(cfg->dest_apicid);
+   entry->hi.fields.destination =
+   APICID_TO_IRTE_DEST_HI(cfg->dest_apicid);
+
+   return modify_irte_ga(ir_data->irq_2_irte.devid,
+ ir_data->irq_2_irte.index, entry, NULL);
+}
+EXPORT_SYMBOL(amd_iommu_deactivate_guest_mode);
+
 static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
 {
+   int ret;
struct amd_iommu *iommu;
struct amd_iommu_pi_data *pi_data = vcpu_info;
struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
struct amd_ir_data *ir_data = data->chip_data;
-   struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
	struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
struct iommu_dev_data *dev_data = search_dev_data(irte_info->devid);
 
@@ -4318,6 +4367,7 @@ static int amd_ir_set_vcpu_affinity(struct irq_data 
*data, void *vcpu_info)
if (!dev_data || !dev_data->use_vapic)
return 0;
 
+   ir_data->cfg = irqd_cfg(data);
pi_data->ir_data = ir_data;
 
/* Note:
@@ -4336,37 +4386,24 @@ static int amd_ir_set_vcpu_affinity(struct irq_data 
*data, void *vcpu_info)
 
pi_data->prev_ga_tag = ir_data->cached_ga_tag;
if (pi_data->is_guest_mode) {
-   /* Setting */
-   irte->hi.fields.ga_root_ptr = (pi_data->base >> 12);
-   irte->hi.fields.vector = vcpu_pi_info->vector;
-   irte->lo.fields_vapic.ga_log_intr = 1;
-   irte->lo.fields_vapic.guest_mode = 1;
-   irte->lo.fields_vapic.ga_tag = pi_data->ga_tag;
-
-   ir_data->cached_ga_tag = pi_data->ga_tag;
+   ir_data->ga_root_ptr = (pi_data->base >> 12);
+   ir_data->ga_vector = vcpu_pi_info->vector;
+   ir_data->ga_tag = pi_data->ga_tag;
+   ret = amd_iommu_activate_guest_mode(ir_data);
+   if (!ret)
+   ir_data->cached_ga_tag = pi_data->ga_tag;
} else {
-   /* Un-Setting */
-   struct irq_cfg *cfg = irqd_cfg(data);
-
-   irte->hi.val = 0;
-   irte->lo.val = 0;
-   irte->hi.fields.vector = cfg->vector;
-   irte->lo.fields_remap.guest_mode = 0;
-   irte->lo.fields_remap.destination =
-   APICID_TO_IRTE_DEST_LO(cfg->dest_apicid);
-   irte->hi.fields.destination =
-   APICID_TO_IRTE_DEST_HI(cfg->dest_apicid);
-   irte->lo.fields_remap.int_type = apic->irq_delivery_mode;
-   

Re: [PATCH] iommu/amd: Add support for X2APIC IOMMU interrupts

2019-07-23 Thread Suthikulpanit, Suravee
Joerg,

This patch also fixes general x2APIC support for AMD IOMMU, which
was introduced earlier. Therefore, I am including the Fixes tag here.

Fixes: 90fcffd9cf5e ('iommu/amd: Add support for IOMMU XT mode')

Thanks,
Suravee

On 7/15/2019 11:29 PM, Suthikulpanit, Suravee wrote:
> AMD IOMMU requires IntCapXT registers to be setup in order to generate
> its own interrupts (for Event Log, PPR Log, and GA Log) with 32-bit
> APIC destination ID. Without this support, AMD IOMMU MSI interrupts
> will not be routed correctly when booting the system in X2APIC mode.
> 
> Cc: Joerg Roedel 
> Signed-off-by: Suravee Suthikulpanit 
> ---
>   drivers/iommu/amd_iommu_init.c  | 90 
> +
>   drivers/iommu/amd_iommu_types.h |  9 +
>   2 files changed, 99 insertions(+)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index ff40ba7..a4c5b1e 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -35,6 +35,8 @@
>   #include 
>   #include 
>   #include 
> +#include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -1935,6 +1937,90 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
>   return 0;
>   }
>   
> +#define XT_INT_DEST_MODE(x)  (((x) & 0x1ULL) << 2)
> +#define XT_INT_DEST_LO(x)(((x) & 0xFFULL) << 8)
> +#define XT_INT_VEC(x)(((x) & 0xFFULL) << 32)
> +#define XT_INT_DEST_HI(x)((((x) >> 24) & 0xFFULL) << 56)
> +
> +/**
> + * Setup the IntCapXT registers with interrupt routing information
> + * based on the PCI MSI capability block registers, accessed via
> + * MMIO MSI address low/hi and MSI data registers.
> + */
> +static void iommu_update_intcapxt(struct amd_iommu *iommu)
> +{
> + u64 val;
> + u32 addr_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
> + u32 addr_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
> + u32 data= readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
> + bool dm = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
> + u32 dest= ((addr_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xFF);
> +
> + if (x2apic_enabled())
> + dest |= MSI_ADDR_EXT_DEST_ID(addr_hi);
> +
> + val = XT_INT_VEC(data & 0xFF) |
> +   XT_INT_DEST_MODE(dm) |
> +   XT_INT_DEST_LO(dest) |
> +   XT_INT_DEST_HI(dest);
> +
> + /**
> +  * Current IOMMU implementation uses the same IRQ for all
> +  * 3 IOMMU interrupts.
> +  */
> + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
> + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
> + writeq(val, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
> +}
> +
> +static void _irq_notifier_notify(struct irq_affinity_notify *notify,
> +  const cpumask_t *mask)
> +{
> + struct amd_iommu *iommu;
> +
> + for_each_iommu(iommu) {
> + if (iommu->dev->irq == notify->irq) {
> + iommu_update_intcapxt(iommu);
> + break;
> + }
> + }
> +}
> +
> +static void _irq_notifier_release(struct kref *ref)
> +{
> +}
> +
> +static int iommu_init_intcapxt(struct amd_iommu *iommu)
> +{
> + int ret;
> + struct irq_affinity_notify *notify = &iommu->intcapxt_notify;
> +
> + /**
> +  * IntCapXT requires XTSup=1, which can be inferred
> +  * from amd_iommu_xt_mode.
> +  */
> + if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE)
> + return 0;
> +
> + /**
> +  * Also, we need to setup notifier to update the IntCapXT registers
> +  * whenever the irq affinity is changed from user-space.
> +  */
> + notify->irq = iommu->dev->irq;
> + notify->notify = _irq_notifier_notify,
> + notify->release = _irq_notifier_release,
> + ret = irq_set_affinity_notifier(iommu->dev->irq, notify);
> + if (ret) {
> + pr_err("Failed to register irq affinity notifier (devid=%#x, 
> irq %d)\n",
> +iommu->devid, iommu->dev->irq);
> + return ret;
> + }
> +
> + iommu_update_intcapxt(iommu);
> + iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
> + return ret;
> +}
> +
>   static int iommu_init_msi(struct amd_iommu *iommu)
>   {
>   int ret;
> @@ -1951,6 +2037,10 @@ static int iommu_init_msi(struct amd_iommu *iommu)
>   return ret;
>   
>   enable_faults:
> + ret = iommu_init_intcapxt(iommu);
> + if (ret)
> + retu

[PATCH] iommu/amd: Add support for X2APIC IOMMU interrupts

2019-07-15 Thread Suthikulpanit, Suravee
AMD IOMMU requires IntCapXT registers to be setup in order to generate
its own interrupts (for Event Log, PPR Log, and GA Log) with 32-bit
APIC destination ID. Without this support, AMD IOMMU MSI interrupts
will not be routed correctly when booting the system in X2APIC mode.

Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu_init.c  | 90 +
 drivers/iommu/amd_iommu_types.h |  9 +
 2 files changed, 99 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index ff40ba7..a4c5b1e 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -35,6 +35,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -1935,6 +1937,90 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
return 0;
 }
 
+#define XT_INT_DEST_MODE(x)(((x) & 0x1ULL) << 2)
+#define XT_INT_DEST_LO(x)  (((x) & 0xFFULL) << 8)
+#define XT_INT_VEC(x)  (((x) & 0xFFULL) << 32)
+#define XT_INT_DEST_HI(x)  ((((x) >> 24) & 0xFFULL) << 56)
+
+/**
+ * Setup the IntCapXT registers with interrupt routing information
+ * based on the PCI MSI capability block registers, accessed via
+ * MMIO MSI address low/hi and MSI data registers.
+ */
+static void iommu_update_intcapxt(struct amd_iommu *iommu)
+{
+   u64 val;
+   u32 addr_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
+   u32 addr_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
+   u32 data= readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
+   bool dm = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
+   u32 dest= ((addr_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xFF);
+
+   if (x2apic_enabled())
+   dest |= MSI_ADDR_EXT_DEST_ID(addr_hi);
+
+   val = XT_INT_VEC(data & 0xFF) |
+ XT_INT_DEST_MODE(dm) |
+ XT_INT_DEST_LO(dest) |
+ XT_INT_DEST_HI(dest);
+
+   /**
+* Current IOMMU implementation uses the same IRQ for all
+* 3 IOMMU interrupts.
+*/
+   writeq(val, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
+   writeq(val, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
+   writeq(val, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+}
+
+static void _irq_notifier_notify(struct irq_affinity_notify *notify,
+const cpumask_t *mask)
+{
+   struct amd_iommu *iommu;
+
+   for_each_iommu(iommu) {
+   if (iommu->dev->irq == notify->irq) {
+   iommu_update_intcapxt(iommu);
+   break;
+   }
+   }
+}
+
+static void _irq_notifier_release(struct kref *ref)
+{
+}
+
+static int iommu_init_intcapxt(struct amd_iommu *iommu)
+{
+   int ret;
+   struct irq_affinity_notify *notify = &iommu->intcapxt_notify;
+
+   /**
+* IntCapXT requires XTSup=1, which can be inferred
+* from amd_iommu_xt_mode.
+*/
+   if (amd_iommu_xt_mode != IRQ_REMAP_X2APIC_MODE)
+   return 0;
+
+   /**
+* Also, we need to setup notifier to update the IntCapXT registers
+* whenever the irq affinity is changed from user-space.
+*/
+   notify->irq = iommu->dev->irq;
+   notify->notify = _irq_notifier_notify,
+   notify->release = _irq_notifier_release,
+   ret = irq_set_affinity_notifier(iommu->dev->irq, notify);
+   if (ret) {
+   pr_err("Failed to register irq affinity notifier (devid=%#x, 
irq %d)\n",
+  iommu->devid, iommu->dev->irq);
+   return ret;
+   }
+
+   iommu_update_intcapxt(iommu);
+   iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
+   return ret;
+}
+
 static int iommu_init_msi(struct amd_iommu *iommu)
 {
int ret;
@@ -1951,6 +2037,10 @@ static int iommu_init_msi(struct amd_iommu *iommu)
return ret;
 
 enable_faults:
+   ret = iommu_init_intcapxt(iommu);
+   if (ret)
+   return ret;
+
iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);
 
if (iommu->ppr_log != NULL)
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 87965e4..39be804f 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -72,6 +72,12 @@
 #define MMIO_PPR_LOG_OFFSET0x0038
 #define MMIO_GA_LOG_BASE_OFFSET0x00e0
 #define MMIO_GA_LOG_TAIL_OFFSET0x00e8
+#define MMIO_MSI_ADDR_LO_OFFSET0x015C
+#define MMIO_MSI_ADDR_HI_OFFSET0x0160
+#define MMIO_MSI_DATA_OFFSET   0x0164
+#define MMIO_INTCAPXT_EVT_OFFSET   0x0170
+#define MMIO_INTCAPXT_PPR_OFFSET   0x0178
+#define MMIO_INTCAPXT_GALOG_OFFSET 0x0180
 #define MMIO_CMD_HEAD_OFFSET   0x2000
 #define MMIO_CMD_TAIL_OFFSET   0x2008
 #define MMIO_EVT_HEAD_OFFSET   0x2010
@@ -162,6 +168,7 @@
 #define CONTROL_GALOG_EN0x1CULL
 #define CONTROL_GAINT_EN0x1DULL
 #define CONTROL_XT_EN 

[PATCH] svm/avic: iommu/amd: Flush IOMMU IRT after update all entries

2019-03-20 Thread Suthikulpanit, Suravee
When AVIC is enabled and the VM has discrete device assignment,
the interrupt remapping table (IRT) is used to keep track of which
destination APIC ID the IOMMU will inject the device interrupt to.

This means every time a vcpu is blocked or context-switched (i.e.
vcpu_blocking/unblocking() and vcpu_load/put()), the information
in the IRT must be updated and the IOMMU IRT flush command must be
issued.

The current implementation flushes the IOMMU IRT every time an entry
is modified. If the assigned device has a large number of interrupts
(hence a large number of entries), this adds significant overhead
to vcpu context switches. Instead, this can be optimized by only
flushing the IRT once per vcpu context switch per device, after all
IRT entries are modified.

The function amd_iommu_update_ga() is refactored to only update the
IRT entry, while amd_iommu_sync_ga() is introduced to allow the
IRT flush to be done separately.

Cc: Joerg Roedel 
Cc: Radim Krčmář 
Cc: Paolo Bonzini 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kvm/svm.c| 35 ++-
 drivers/iommu/amd_iommu.c | 20 +---
 include/linux/amd-iommu.h | 13 ++---
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 47c4993448c7..a5c7ca5d70d3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -118,6 +118,8 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 #define AVIC_GATAG_TO_VMID(x)  ((x >> AVIC_VCPU_ID_BITS) & AVIC_VM_ID_MASK)
 #define AVIC_GATAG_TO_VCPUID(x)(x & AVIC_VCPU_ID_MASK)
 
+#define AVIC_DEVID_BITMAP_SIZE (1 << 16)
+
 static bool erratum_383_found __read_mostly;
 
 static const u32 host_save_user_msrs[] = {
@@ -251,6 +253,12 @@ struct vcpu_svm {
 
/* which host CPU was used for running this vcpu */
unsigned int last_cpu;
+
+   /*
+* Bitmap used to store PCI devid to sync
+* AMD IOMMU interrupt remapping table
+*/
+   unsigned long *avic_devid_sync_bitmap;
 };
 
 /*
@@ -1984,6 +1992,7 @@ static inline int
 avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 {
int ret = 0;
+   int devid = 0;
unsigned long flags;
struct amd_svm_iommu_ir *ir;
struct vcpu_svm *svm = to_svm(vcpu);
@@ -2001,9 +2010,21 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
goto out;
 
	list_for_each_entry(ir, &svm->ir_list, node) {
-   ret = amd_iommu_update_ga(cpu, r, ir->data);
+   ret = amd_iommu_update_ga(cpu, r, ir->data, &devid);
if (ret)
break;
+   set_bit(devid, svm->avic_devid_sync_bitmap);
+   }
+
+   /* Sync AMD IOMMU interrupt remapping table changes for each device. */
+   devid = find_next_bit(svm->avic_devid_sync_bitmap,
+ AVIC_DEVID_BITMAP_SIZE, 0);
+
+   while (devid < AVIC_DEVID_BITMAP_SIZE) {
+   clear_bit(devid, svm->avic_devid_sync_bitmap);
+   ret = amd_iommu_sync_ga(devid);
+   devid = find_next_bit(svm->avic_devid_sync_bitmap,
+ AVIC_DEVID_BITMAP_SIZE, devid+1);
}
 out:
	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
@@ -2107,6 +2128,13 @@ static int avic_init_vcpu(struct vcpu_svm *svm)
	INIT_LIST_HEAD(&svm->ir_list);
	spin_lock_init(&svm->ir_list_lock);
 
+   svm->avic_devid_sync_bitmap = (void *)__get_free_pages(
+   GFP_KERNEL | __GFP_ZERO,
+   get_order(AVIC_DEVID_BITMAP_SIZE/8));
+   if (svm->avic_devid_sync_bitmap == NULL)
+   return -ENOMEM;
+
return ret;
 }
 
@@ -2221,6 +2249,11 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
__free_page(virt_to_page(svm->nested.hsave));
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
+
+   free_pages((unsigned long)svm->avic_devid_sync_bitmap,
+  get_order(AVIC_DEVID_BITMAP_SIZE/8));
+   svm->avic_devid_sync_bitmap = NULL;
+
kvm_vcpu_uninit(vcpu);
kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.guest_fpu);
kmem_cache_free(kvm_vcpu_cache, svm);
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2a7b78bb98b4..637bcc9192e5 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4499,7 +4499,20 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
return 0;
 }
 
-int amd_iommu_update_ga(int cpu, bool is_run, void *data)
+int amd_iommu_sync_ga(int devid)
+{
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
+   if (!iommu)
+   return -ENODEV;
+
+   iommu_flush_irt(iommu, devid);
+   

Re: [PATCH v3] iommu: amd: Fix IOMMU page flush when detach device from a domain

2019-01-24 Thread Suthikulpanit, Suravee
Joerg,

On 1/24/19 9:11 PM, j...@8bytes.org wrote:
> On Thu, Jan 24, 2019 at 04:16:45AM +0000, Suthikulpanit, Suravee wrote:
>>   drivers/iommu/amd_iommu.c | 15 +++
>>   1 file changed, 11 insertions(+), 4 deletions(-)
> 
> Applied, thanks Suravee.
> 

Thanks. Also, should this be back-ported to the stable tree as well?

Suravee
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3] iommu: amd: Fix IOMMU page flush when detach device from a domain

2019-01-23 Thread Suthikulpanit, Suravee
From: Suravee Suthikulpanit 

When a VM is terminated, the VFIO driver detaches all pass-through
devices from the VFIO domain by clearing the domain id and page table
root pointer from each device table entry (DTE), and then invalidates
the DTE. Then, the VFIO driver unmaps pages and invalidates IOMMU pages.

Currently, the IOMMU driver keeps track of which IOMMUs and how many
devices are attached to the domain. When invalidating IOMMU pages,
the driver checks if the IOMMU is still attached to the domain before
issuing the invalidate page command.

However, since VFIO has already detached all devices from the domain,
the subsequent INVALIDATE_IOMMU_PAGES commands are skipped as
there is no IOMMU attached to the domain. This results in data
corruption and could cause the PCI device to end up in an
indeterminate state.

Fix this by invalidating IOMMU pages when detaching a device, and
before decrementing the per-domain device reference counts.

Cc: Boris Ostrovsky 
Suggested-by: Joerg Roedel 
Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 87ba23a75b38..6cd4a00036c1 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1991,16 +1991,13 @@ static void do_attach(struct iommu_dev_data *dev_data,
 
 static void do_detach(struct iommu_dev_data *dev_data)
 {
+   struct protection_domain *domain = dev_data->domain;
struct amd_iommu *iommu;
u16 alias;
 
iommu = amd_iommu_rlookup_table[dev_data->devid];
alias = dev_data->alias;
 
-   /* decrease reference counters */
-   dev_data->domain->dev_iommu[iommu->index] -= 1;
-   dev_data->domain->dev_cnt -= 1;
-
/* Update data structures */
dev_data->domain = NULL;
	list_del(&dev_data->list);
@@ -2010,6 +2007,16 @@ static void do_detach(struct iommu_dev_data *dev_data)
 
/* Flush the DTE entry */
device_flush_dte(dev_data);
+
+   /* Flush IOTLB */
+   domain_flush_tlb_pde(domain);
+
+   /* Wait for the flushes to finish */
+   domain_flush_complete(domain);
+
+   /* decrease reference counters - needs to happen after the flushes */
+   domain->dev_iommu[iommu->index] -= 1;
+   domain->dev_cnt -= 1;
 }
 
 /*
-- 
2.17.1



Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-23 Thread Suthikulpanit, Suravee
Joerg,

On 1/23/19 2:56 PM, j...@8bytes.org wrote:
> Hi Suravee,
> 
> On Tue, Jan 22, 2019 at 03:53:18PM +, Suthikulpanit, Suravee wrote:
>> Thanks for the detail. Alright then, let's just go with the version you
>> sent on 1/16/19. Do you want me to resend V3 with that changes, or
>> would you be taking care of that?
> 
> Please send me a v3 based on the diff I sent last week. Also add a
> separate patch which adds the missing dte flush for the alias entry.
> 
> Thanks,
> 
>   Joerg
> 

Actually, I just noticed that device_flush_dte() already handles flushing
the DTE for the alias device as well. Please see the link below.

https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd_iommu.c#L1219

So, we don't need to call device_flush_dte() for the alias device in do_detach().

Regards,
Suravee


Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-22 Thread Suthikulpanit, Suravee
Joerg,

On 1/22/19 5:44 PM, j...@8bytes.org wrote:
> Hi Suravee,
> 
> On Thu, Jan 17, 2019 at 08:44:36AM +, Suthikulpanit, Suravee wrote:
>> Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0.
>> This should preserve previous behavior, and only add flushing condition to
>> the specific IOMMU in detached state. Please let me know what you think.
> 
> I think the whole point why VFIO is detaching all devices first and then
> goes into unmapping pages is that there are no flushes needed anymore
> when devices are detached.
> 
> But when we follow your proposal we would still do IOTLB flushes even
> when no devices are attached anymore. So I'd like to avoid this, given
> the implications on unmapping performance. We should just flush the
> IOMMU TLB at detach time and be done with it.
> 
> Makes sense?
> 
> Regards,
> 
>   Joerg
> 

Thanks for the detail. Alright then, let's just go with the version you
sent on 1/16/19. Do you want me to resend V3 with that changes, or
would you be taking care of that?

Thanks,
Suravee


Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-17 Thread Suthikulpanit, Suravee
Joerg,

On 1/17/19 3:44 PM, Suravee Suthikulpanit wrote:
> Joerg,
> 
> On 1/17/19 12:08 AM, j...@8bytes.org wrote:
>> On Wed, Jan 16, 2019 at 02:08:55PM +, Suthikulpanit, Suravee wrote:
>>> Actually, I am not sure how we would be missing the flush on the last 
>>> device.
>>> In my test, I am seeing the flush command being issued correctly during
>>> vfio_unmap_unpin(), which is after all devices are detached.
>>> Although, I might be missing your point here. Could you please elaborate?
>>
>> Okay, you are right, the patch effectively adds an unconditional flush of
>> the domain on all iommus when the last device is detached. So it is
>> correct in that regard. But that code path is also used in the
>> iommu_unmap() path.
>>
>> The problem now is, that with your change we send flush commands to all
>> IOMMUs in the unmap path when no device is attached to the domain.
>> This will hurt performance there, no?
>>
>> Regards,
>>
>> Joerg
>>
> 
> Sounds like we need a way to track state of each IOMMU for a domain.
> What if we define the following:
> 
>    enum IOMMU_DOMAIN_STATES {
>      DOMAIN_FREE = -1,
>      DOMAIN_DETACHED = 0,
>      DOMAIN_ATTACHED >= 1
>    }
> 
> We should be able to use the dev_iommu[] to help track the state.
>      - In amd_iommu_domain_alloc, we initialize the array to DOMAIN_FREE
>      - In do_attach(), we change to DOMAIN_ATTACH or we can increment the 
> count
>    if it is already in DOMAIN_ATTACH state.
>      - In do_detach(). we change to DOMAIN_DETACH.
> 
> Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0.
> This should preserve previous behavior, and only add flushing condition to
> the specific IOMMU in detached state. Please let me know what you think.
> 
> Regards,
> Suravee

By the way, I just sent V2 of this patch since it might be more clear.

Regards,
Suravee

[PATCH v2] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-17 Thread Suthikulpanit, Suravee
From: Suravee Suthikulpanit 

When a VM is terminated, the VFIO driver detaches all pass-through
devices from the VFIO domain by clearing the domain id and page table
root pointer from each device table entry (DTE), and then invalidates
the DTE. Then, the VFIO driver unmaps pages and invalidates IOMMU pages.

Currently, the IOMMU driver keeps track of which IOMMUs and how many
devices are attached to the domain. When invalidating IOMMU pages,
the driver checks if the IOMMU is still attached to the domain before
issuing the invalidate page command.

However, since VFIO has already detached all devices from the domain,
the subsequent INVALIDATE_IOMMU_PAGES commands are skipped as
there is no IOMMU attached to the domain. This results in data
corruption and could cause the PCI device to end up in an
indeterminate state.

Fix this by introducing IOMMU domain states attached, detached,
and free, and only issuing the IOMMU page invalidate command when
the domain is in the attached or detached state.

Cc: Boris Ostrovsky 
Signed-off-by: Brijesh Singh 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 25 -
 drivers/iommu/amd_iommu_types.h |  8 +++-
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 525659b88ade..1c64a26c50fb 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1248,7 +1248,14 @@ static void __domain_flush_pages(struct protection_domain *domain,
	build_inv_iommu_pages(&cmd, address, size, domain->id, pde);
 
for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
-   if (!domain->dev_iommu[i])
+   /*
+* We issue the command only when the domain is
+* in ATTACHED and DETACHED state. This is required
+* since VFIO detaches all devices from the group
+* before invalidating IOMMU pages. Please see
+* vfio_iommu_type1_detach_group().
+*/
+   if (domain->dev_iommu[i] == IOMMU_DOMAIN_STATE_FREE)
continue;
 
/*
@@ -1292,7 +1299,8 @@ static void domain_flush_complete(struct protection_domain *domain)
int i;
 
for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
-   if (domain && !domain->dev_iommu[i])
+   if (domain &&
+   domain->dev_iommu[i] < IOMMU_DOMAIN_STATE_ATTACHED)
continue;
 
/*
@@ -1978,8 +1986,11 @@ static void do_attach(struct iommu_dev_data *dev_data,
	list_add(&dev_data->list, &domain->dev_list);
 
/* Do reference counting */
-   domain->dev_iommu[iommu->index] += 1;
-   domain->dev_cnt += 1;
+   if (domain->dev_iommu[iommu->index] < IOMMU_DOMAIN_STATE_ATTACHED)
+   domain->dev_iommu[iommu->index] = IOMMU_DOMAIN_STATE_ATTACHED;
+   else
+   domain->dev_iommu[iommu->index] += 1;
+   domain->dev_cnt += 1;
 
/* Update device table */
set_dte_entry(dev_data->devid, domain, ats, dev_data->iommu_v2);
@@ -2904,6 +2915,7 @@ static int protection_domain_init(struct protection_domain *domain)
 
 static struct protection_domain *protection_domain_alloc(void)
 {
+   int i;
struct protection_domain *domain;
 
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
@@ -2915,6 +2927,9 @@ static struct protection_domain *protection_domain_alloc(void)
 
add_domain_to_list(domain);
 
+   for (i = 0; i < MAX_IOMMUS; i++)
+   domain->dev_iommu[i] = IOMMU_DOMAIN_STATE_FREE;
+
return domain;
 
 out_err:
@@ -3364,7 +3379,7 @@ static int __flush_pasid(struct protection_domain *domain, int pasid,
 * prevent device TLB refill from IOMMU TLB
 */
for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
-   if (domain->dev_iommu[i] == 0)
+   if (domain->dev_iommu[i] < IOMMU_DOMAIN_STATE_ATTACHED)
continue;
 
	ret = iommu_queue_command(amd_iommus[i], &cmd);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index eae0741f72dc..6d17cdcfa306 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -461,6 +461,12 @@ struct amd_irte_ops;
 
 #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED  (1 << 0)
 
+enum IOMMU_DOMAIN_STATES {
+   IOMMU_DOMAIN_STATE_FREE = -1,
+   IOMMU_DOMAIN_STATE_DETACHED,
+   IOMMU_DOMAIN_STATE_ATTACHED
+};
+
 /*
  * This structure contains generic data for  IOMMU protection domains
  * independent of their use.
@@ -480,7 +486,7 @@ struct protection_domain {
unsigned long flags;/* flags to find out type of domain */
bool updated;   /* complete domain flush required */
unsigned dev_cnt;   /* devices assigned to this domain */
-   unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */

Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-17 Thread Suthikulpanit, Suravee
Joerg,

On 1/17/19 12:08 AM, j...@8bytes.org wrote:
> On Wed, Jan 16, 2019 at 02:08:55PM +0000, Suthikulpanit, Suravee wrote:
>> Actually, I am not sure how we would be missing the flush on the last device.
>> In my test, I am seeing the flush command being issued correctly during
>> vfio_unmap_unpin(), which is after all devices are detached.
>> Although, I might be missing your point here. Could you please elaborate?
> 
> Okay, you are right, the patch effectively adds an unconditional flush of
> the domain on all iommus when the last device is detached. So it is
> correct in that regard. But that code path is also used in the
> iommu_unmap() path.
> 
> The problem now is, that with your change we send flush commands to all
> IOMMUs in the unmap path when no device is attached to the domain.
> This will hurt performance there, no?
> 
> Regards,
> 
>   Joerg
> 

Sounds like we need a way to track the state of each IOMMU for a domain.
What if we define the following:

   enum IOMMU_DOMAIN_STATES {
 DOMAIN_FREE = -1,
 DOMAIN_DETACHED = 0,
 DOMAIN_ATTACHED >= 1
   }

We should be able to use the dev_iommu[] to help track the state.
 - In amd_iommu_domain_alloc, we initialize the array to DOMAIN_FREE
 - In do_attach(), we change to DOMAIN_ATTACH or we can increment the count
   if it is already in DOMAIN_ATTACH state.
 - In do_detach(). we change to DOMAIN_DETACH.

Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0.
This should preserve previous behavior, and only add flushing condition to
the specific IOMMU in detached state. Please let me know what you think.

Regards,
Suravee


Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-16 Thread Suthikulpanit, Suravee
Joerg,

On 1/16/19 8:26 PM, j...@8bytes.org wrote:
> How about the attached diff? If
> I understand the problem correctly, it should fix the problem more
> reliably.
> 
> Thanks,
> 
>   Joerg
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 87ba23a75b38..dc1e2a8a19d7 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1991,25 +1991,36 @@ static void do_attach(struct iommu_dev_data *dev_data,
>   
>   static void do_detach(struct iommu_dev_data *dev_data)
>   {
> + struct protection_domain *domain = dev_data->domain;
>   struct amd_iommu *iommu;
>   u16 alias;
>   
>   iommu = amd_iommu_rlookup_table[dev_data->devid];
>   alias = dev_data->alias;
>   
> - /* decrease reference counters */
> - dev_data->domain->dev_iommu[iommu->index] -= 1;
> - dev_data->domain->dev_cnt -= 1;
> -
>   /* Update data structures */
>   dev_data->domain = NULL;
>   list_del(&dev_data->list);
> - clear_dte_entry(dev_data->devid);
> - if (alias != dev_data->devid)
> - clear_dte_entry(alias);
>   
> + clear_dte_entry(dev_data->devid);
>   /* Flush the DTE entry */
>   device_flush_dte(dev_data);
> +
> + if (alias != dev_data->devid) {
> + clear_dte_entry(alias);
> + /* Flush the Alias DTE entry */
> + device_flush_dte(alias);
> + }
> +

Actually, device_flush_dte(alias) should be needed regardless of this patch.
Are you planning to add this?

Regards,
Suravee


Re: [PATCH] iommu/amd: Mark translation invalid during device detach

2019-01-16 Thread Suthikulpanit, Suravee
Joerg,

On 1/16/19 8:03 PM, j...@8bytes.org wrote:
> Hi Suravee,
> 
> On Wed, Jan 16, 2019 at 04:15:10AM +, Suthikulpanit, Suravee wrote:
>> From: Suravee Suthikulpanit 
>>
>> When a device switches domain, IOMMU driver detach device from the old
>> domain, and attach device to the new domain. During this period
>> the host table root pointer is not set, which means DMA translation
>> should be marked as invalid (clear TV bit).
>>
>> So, clear the TV bit when detach the device. The TV bit will be set
>> again when attaching device to the new domain.
> 
> Is there a specific problem with setting the TV bit?

We are not currently seeing an issue.

> Note that the update will clear all other fields in the first 128 bits
> of the DTE, which means that IR, IW and Mode are all set to 0. This
> effectivly blocks all DMA requests from the device, which is by design.

Ahh.. This makes sense now. I missed the IR/IW/Mode=0 part.
It was not clear to us earlier. Thanks for clarification.

Regards,
Suravee


Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-16 Thread Suthikulpanit, Suravee
Joerg,

On 1/16/19 8:26 PM, j...@8bytes.org wrote:
> On Wed, Jan 16, 2019 at 04:16:25AM +0000, Suthikulpanit, Suravee wrote:
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 525659b88ade..ab31ba75da1b 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -1248,7 +1248,13 @@ static void __domain_flush_pages(struct protection_domain *domain,
>>  	build_inv_iommu_pages(&cmd, address, size, domain->id, pde);
>>   
>>  for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
>> -if (!domain->dev_iommu[i])
>> +/*
>> + * The dev_cnt is zero when all devices are detached
>> + * from the domain. This is the case when VFIO detaches
>> + * all devices from the group before flushing IOMMU pages.
>> + * So, always issue the flush command.
>> + */
>> +if (domain->dev_cnt && !domain->dev_iommu[i])
>>  continue;
> 
> This doesn't look like the correct fix. We still miss the flush if we
> detach the last device from the domain. 

Actually, I am not sure how we would be missing the flush on the last device.
In my test, I am seeing the flush command being issued correctly during
vfio_unmap_unpin(), which is after all devices are detached.
Although, I might be missing your point here. Could you please elaborate?

> How about the attached diff? If
> I understand the problem correctly, it should fix the problem more
> reliably.
> 
> Thanks,
> 
>   Joerg
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 87ba23a75b38..dc1e2a8a19d7 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1991,25 +1991,36 @@ static void do_attach(struct iommu_dev_data *dev_data,
>   
>   static void do_detach(struct iommu_dev_data *dev_data)
>   {
> + struct protection_domain *domain = dev_data->domain;
>   struct amd_iommu *iommu;
>   u16 alias;
>   
>   iommu = amd_iommu_rlookup_table[dev_data->devid];
>   alias = dev_data->alias;
>   
> - /* decrease reference counters */
> - dev_data->domain->dev_iommu[iommu->index] -= 1;
> - dev_data->domain->dev_cnt -= 1;
> -
>   /* Update data structures */
>   dev_data->domain = NULL;
>   list_del(&dev_data->list);
> - clear_dte_entry(dev_data->devid);
> - if (alias != dev_data->devid)
> - clear_dte_entry(alias);
>   
> + clear_dte_entry(dev_data->devid);
>   /* Flush the DTE entry */
>   device_flush_dte(dev_data);
> +
> + if (alias != dev_data->devid) {
> + clear_dte_entry(alias);
> + /* Flush the Alias DTE entry */
> + device_flush_dte(alias);
> + }
> +
> + /* Flush IOTLB */
> + domain_flush_tlb_pde(domain);
> +
> + /* Wait for the flushes to finish */
> + domain_flush_complete(domain);
> +
> + /* decrease reference counters - needs to happen after the flushes */
> + domain->dev_iommu[iommu->index] -= 1;
> + domain->dev_cnt -= 1;
>   }

I have also considered this. This would also work. But since we are already
doing page flushes during page unmapping after all devices are detached,
would this still be necessary? Please see vfio_iommu_type1_detach_group().

Also, consider the case where more than one device shares the domain.
This would issue a page flush every time we detach a device, even while
other devices are still attached to the domain.

Regards,
Suravee



[PATCH] amd/iommu: Fix Guest Virtual APIC Log Tail Address Register

2018-11-12 Thread Suthikulpanit, Suravee
From: Filippo Sironi 

This register should have been programmed with the physical address
of the memory location containing the shadow tail pointer for
the guest virtual APIC log instead of the base address.

Fixes: 8bda0cfbdc1a  ('iommu/amd: Detect and initialize guest vAPIC log')
Signed-off-by: Filippo Sironi 
Signed-off-by: Wei Wang 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu_init.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 9c3d610e1e19..e777fa90b2c2 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -797,7 +797,8 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
entry = iommu_virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
		    &entry, sizeof(entry));
-   entry = (iommu_virt_to_phys(iommu->ga_log) & 0xFFFFFFFFFFFFFULL) & ~7ULL;
+   entry = (iommu_virt_to_phys(iommu->ga_log_tail) &
+(BIT_ULL(52)-1)) & ~7ULL;
memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
		    &entry, sizeof(entry));
writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
-- 
2.17.1
