Re: AMD-Vi: Unable to read/write to IOMMU perf counter
Tj, I have posted RFCv3 in the BZ: https://bugzilla.kernel.org/show_bug.cgi?id=201753

The RFCv3 patch adds logic to retry the check, waiting 20 msec for each retry loop, since I have found that certain platforms take about 10 msec for the power gating to disable. Please give this a try to see if it works better on your platform.

Thanks,
Suravee

On 2/4/21 1:25 PM, Tj (Elloe Linux) wrote:
> On 02/02/2021 05:54, Suravee Suthikulpanit wrote:
>> Could you please try the attached patch to see if the problem still persists.
>
> Tested on top of commit 61556703b610; it doesn't appear to have solved the issue.
>
> Linux version 5.11.0-rc6+ (tj@elloe000) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubunt>
> Command line: BOOT_IMAGE=/vmlinuz-5.11.0-rc6+ root=/dev/mapper/ELLOE000-rootfs ro acpi_osi=! "acpi_osi=Windows 20>
> ...
> DMI: LENOVO 20NECTO1WW/20NECTO1WW, BIOS R11ET32W (1.12 ) 12/23/2019
> ...
> AMD-Vi: ivrs, add hid:PNPD0040, uid:, rdevid:152
> ...
> smpboot: CPU0: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx (family: 0x17, model: 0x18, stepping: 0x1)
> ...
> pci :00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
> pci :00:00.2: can't derive routing for PCI INT A
> pci :00:00.2: PCI INT A: not connected
> pci :00:01.0: Adding to iommu group 0
> pci :00:01.1: Adding to iommu group 1
> ...
> pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
> pci :00:00.2: AMD-Vi: Extended features (0x4f77ef22294ada): PPR NX GT IA GA PC GA_vAPIC
> AMD-Vi: Interrupt remapping enabled
> AMD-Vi: Virtual APIC enabled
> AMD-Vi: Lazy IO/TLB flushing enabled
> amd_uncore: 4 amd_df counters detected
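For reference, the retry approach described above amounts to something like the following minimal sketch. The helper name iommu_pc_check_rw() and the retry count are placeholders, not the actual RFCv3 code:

	/*
	 * Sketch only: retry the perf-counter read/write check while power
	 * gating settles. iommu_pc_check_rw() stands in for the existing
	 * write/read-back test of a performance counter register.
	 */
	#define PC_RETRY_COUNT		5
	#define PC_RETRY_DELAY_MS	20	/* power gating can take ~10 msec to disable */

	static bool iommu_pc_usable(struct amd_iommu *iommu)
	{
		int i;

		for (i = 0; i < PC_RETRY_COUNT; i++) {
			if (iommu_pc_check_rw(iommu))	/* placeholder helper */
				return true;
			msleep(PC_RETRY_DELAY_MS);
		}
		return false;
	}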
[PATCH v3 2/2] PCI: vmd: Disable MSI-X remapping when possible
VMD will retransmit child device MSI-X using its own MSI-X table and requester-id. This limits the number of MSI-X available to the whole child device domain to the number of VMD MSI-X interrupts. Some VMD devices have a mode where this remapping can be disabled, allowing child device interrupts to bypass processing with the VMD MSI-X domain interrupt handler and going straight the child device interrupt handler, allowing for better performance and scaling. The requester-id still gets changed to the VMD endpoint's requester-id, and the interrupt remapping handlers have been updated to properly set IRTE for child device interrupts to the VMD endpoint's context. Some VMD platforms have existing production BIOS which rely on MSI-X remapping and won't explicitly program the MSI-X remapping bit. This re-enables MSI-X remapping on unload. Acked-by: Joerg Roedel Signed-off-by: Jon Derrick --- drivers/pci/controller/vmd.c | 61 +--- 1 file changed, 49 insertions(+), 12 deletions(-) diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 5e80f28f0119..2a585d1b3349 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -28,6 +28,7 @@ #define BUS_RESTRICT_CAP(vmcap)(vmcap & 0x1) #define PCI_REG_VMCONFIG 0x44 #define BUS_RESTRICT_CFG(vmcfg)((vmcfg >> 8) & 0x3) +#define VMCFG_MSI_RMP_DIS 0x2 #define PCI_REG_VMLOCK 0x70 #define MB2_SHADOW_EN(vmlock) (vmlock & 0x2) @@ -59,6 +60,13 @@ enum vmd_features { * be used for MSI remapping */ VMD_FEAT_OFFSET_FIRST_VECTOR= (1 << 3), + + /* +* Device can bypass remapping MSI-X transactions into its MSI-X table, +* avoiding the requirement of a VMD MSI domain for child device +* interrupt handling. +*/ + VMD_FEAT_BYPASS_MSI_REMAP = (1 << 4), }; /* @@ -306,6 +314,15 @@ static struct msi_domain_info vmd_msi_domain_info = { .chip = &vmd_msi_controller, }; +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable) +{ + u16 reg; + + pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, ®); + reg = enable ? (reg & ~VMCFG_MSI_RMP_DIS) : (reg | VMCFG_MSI_RMP_DIS); + pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg); +} + static int vmd_create_irq_domain(struct vmd_dev *vmd) { struct fwnode_handle *fn; @@ -325,6 +342,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd) static void vmd_remove_irq_domain(struct vmd_dev *vmd) { + /* +* Some production BIOS won't enable remapping between soft reboots. +* Ensure remapping is restored before unloading the driver. +*/ + if (!vmd->msix_count) + vmd_enable_msi_remapping(vmd, true); + if (vmd->irq_domain) { struct fwnode_handle *fn = vmd->irq_domain->fwnode; @@ -679,15 +703,31 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) sd->node = pcibus_to_node(vmd->dev->bus); - ret = vmd_create_irq_domain(vmd); - if (ret) - return ret; - /* -* Override the irq domain bus token so the domain can be distinguished -* from a regular PCI/MSI domain. +* Currently MSI remapping must be enabled in guest passthrough mode +* due to some missing interrupt remapping plumbing. This is probably +* acceptable because the guest is usually CPU-limited and MSI +* remapping doesn't become a performance bottleneck. 
*/ - irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); + if (!(features & VMD_FEAT_BYPASS_MSI_REMAP) || offset[0] || offset[1]) { + ret = vmd_alloc_irqs(vmd); + if (ret) + return ret; + + vmd_enable_msi_remapping(vmd, true); + + ret = vmd_create_irq_domain(vmd); + if (ret) + return ret; + + /* +* Override the irq domain bus token so the domain can be +* distinguished from a regular PCI/MSI domain. +*/ + irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); + } else { + vmd_enable_msi_remapping(vmd, false); + } pci_add_resource(&resources, &vmd->resources[0]); pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]); @@ -753,10 +793,6 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) if (features & VMD_FEAT_OFFSET_FIRST_VECTOR) vmd->first_vec = 1; - err = vmd_alloc_irqs(vmd); - if (err) - return err; - spin_lock_init(&vmd->cfg_lock); pci_set_drvdata(dev, vmd); err = vmd_enable_domain(vmd, features); @@ -825,7 +861,8 @@ static const struct pci_device_id vmd_ids[] = {
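For context, per-device features such as VMD_FEAT_BYPASS_MSI_REMAP are advertised through the driver_data field of the matching vmd_ids[] entry. A purely illustrative sketch follows; the device ID below is a placeholder and is not taken from the patch:

	/* Illustrative only: placeholder device ID, feature flags as named in the patch. */
	static const struct pci_device_id vmd_ids[] = {
		{ PCI_VDEVICE(INTEL, 0x0000),	/* placeholder device ID */
		  .driver_data = VMD_FEAT_OFFSET_FIRST_VECTOR |
				 VMD_FEAT_BYPASS_MSI_REMAP, },
		{ 0, }
	};
	MODULE_DEVICE_TABLE(pci, vmd_ids);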
[PATCH v3 0/2] VMD MSI Remapping Bypass
The Intel Volume Management Device acts similar to a PCI-to-PCI bridge in that it changes downstream devices' requester-ids to its own. As VMD supports PCIe devices, it has its own MSI-X table and transmits child device MSI-X by remapping child device MSI-X and handling like a demultiplexer. Some newer VMD devices (Icelake Server) have an option to bypass the VMD MSI-X remapping table. This allows for better performance scaling as the child device MSI-X won't be limited by VMD's MSI-X count and IRQ handler. V2->V3: Trivial comment fixes Added acks V1->V2: Updated for 5.12-next Moved IRQ allocation and remapping enable/disable to a more logical location V1 patches 1-4 were already merged V1, 5/6: https://patchwork.kernel.org/project/linux-pci/patch/20200728194945.14126-6-jonathan.derr...@intel.com/ V1, 6/6: https://patchwork.kernel.org/project/linux-pci/patch/20200728194945.14126-7-jonathan.derr...@intel.com/ Jon Derrick (2): iommu/vt-d: Use Real PCI DMA device for IRTE PCI: vmd: Disable MSI-X remapping when possible drivers/iommu/intel/irq_remapping.c | 3 +- drivers/pci/controller/vmd.c| 61 +++-- 2 files changed, 51 insertions(+), 13 deletions(-) -- 2.27.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 1/2] iommu/vt-d: Use Real PCI DMA device for IRTE
VMD retransmits child device MSI-X with the VMD endpoint's requester-id. In order to support direct interrupt remapping of VMD child devices, ensure that the IRTE is programmed with the VMD endpoint's requester-id using pci_real_dma_dev(). Acked-by: Lu Baolu Acked-by: Joerg Roedel Signed-off-by: Jon Derrick --- drivers/iommu/intel/irq_remapping.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 685200a5cff0..1939e070eec8 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1276,7 +1276,8 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, break; case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: - set_msi_sid(irte, msi_desc_to_pci_dev(info->desc)); + set_msi_sid(irte, + pci_real_dma_dev(msi_desc_to_pci_dev(info->desc))); break; default: BUG_ON(1); -- 2.27.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
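For readers unfamiliar with pci_real_dma_dev(): it resolves the device that actually issues DMA upstream of any aliasing, which is why the IRTE above ends up programmed with the VMD endpoint's requester-id. A conceptual sketch of the weak default, not the literal kernel implementation:

	/*
	 * Conceptual sketch only: the generic version is a no-op, and the x86
	 * override resolves devices behind VMD to the VMD endpoint whose
	 * requester-id the IOMMU actually sees.
	 */
	struct pci_dev __weak *pci_real_dma_dev(struct pci_dev *dev)
	{
		return dev;
	}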
Re: [PATCH v2 2/2] PCI: vmd: Disable MSI/X remapping when possible
On Fri, 2021-02-05 at 15:57 -0600, Bjorn Helgaas wrote: > On Thu, Feb 04, 2021 at 12:09:06PM -0700, Jon Derrick wrote: > > VMD will retransmit child device MSI/X using its own MSI/X table and > > requester-id. This limits the number of MSI/X available to the whole > > child device domain to the number of VMD MSI/X interrupts. > > > > Some VMD devices have a mode where this remapping can be disabled, > > allowing child device interrupts to bypass processing with the VMD MSI/X > > domain interrupt handler and going straight the child device interrupt > > handler, allowing for better performance and scaling. The requester-id > > still gets changed to the VMD endpoint's requester-id, and the interrupt > > remapping handlers have been updated to properly set IRTE for child > > device interrupts to the VMD endpoint's context. > > > > Some VMD platforms have existing production BIOS which rely on MSI/X > > remapping and won't explicitly program the MSI/X remapping bit. This > > re-enables MSI/X remapping on unload. > > Trivial comments below. Would you mind using "MSI-X" instead of > "MSI/X" so it matches the usage in the PCIe specs? Several mentions > above (including subject) and below. Thanks Bjorn, will do > > > Signed-off-by: Jon Derrick > > --- > > drivers/pci/controller/vmd.c | 60 > > 1 file changed, 48 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > index 5e80f28f0119..a319ce49645b 100644 > > --- a/drivers/pci/controller/vmd.c > > +++ b/drivers/pci/controller/vmd.c > > @@ -59,6 +59,13 @@ enum vmd_features { > > * be used for MSI remapping > > */ > > VMD_FEAT_OFFSET_FIRST_VECTOR= (1 << 3), > > + > > + /* > > +* Device can bypass remapping MSI/X transactions into its MSI/X table, > > +* avoding the requirement of a VMD MSI domain for child device > > s/avoding/avoiding/ > > > +* interrupt handling > > Maybe a period at the end of the sentence. > > > +*/ > > + VMD_FEAT_BYPASS_MSI_REMAP = (1 << 4), > > }; > > > > /* > > @@ -306,6 +313,15 @@ static struct msi_domain_info vmd_msi_domain_info = { > > .chip = &vmd_msi_controller, > > }; > > > > +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable) > > +{ > > + u16 reg; > > + > > + pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, ®); > > + reg = enable ? (reg & ~0x2) : (reg | 0x2); > > Would be nice to have a #define for 0x2. > > > + pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg); > > +} > > + > > static int vmd_create_irq_domain(struct vmd_dev *vmd) > > { > > struct fwnode_handle *fn; > > @@ -325,6 +341,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd) > > > > static void vmd_remove_irq_domain(struct vmd_dev *vmd) > > { > > + /* > > +* Some production BIOS won't enable remapping between soft reboots. > > +* Ensure remapping is restored before unloading the driver. > > +*/ > > + if (!vmd->msix_count) > > + vmd_enable_msi_remapping(vmd, true); > > + > > if (vmd->irq_domain) { > > struct fwnode_handle *fn = vmd->irq_domain->fwnode; > > > > @@ -679,15 +702,31 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > unsigned long features) > > > > sd->node = pcibus_to_node(vmd->dev->bus); > > > > - ret = vmd_create_irq_domain(vmd); > > - if (ret) > > - return ret; > > - > > /* > > -* Override the irq domain bus token so the domain can be distinguished > > -* from a regular PCI/MSI domain. > > +* Currently MSI remapping must be enabled in guest passthrough mode > > +* due to some missing interrupt remapping plumbing. 
This is probably > > +* acceptable because the guest is usually CPU-limited and MSI > > +* remapping doesn't become a performance bottleneck. > > */ > > - irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); > > + if (!(features & VMD_FEAT_BYPASS_MSI_REMAP) || offset[0] || offset[1]) { > > + ret = vmd_alloc_irqs(vmd); > > + if (ret) > > + return ret; > > + > > + vmd_enable_msi_remapping(vmd, true); > > + > > + ret = vmd_create_irq_domain(vmd); > > + if (ret) > > + return ret; > > + > > + /* > > +* Override the irq domain bus token so the domain can be > > +* distinguished from a regular PCI/MSI domain. > > +*/ > > + irq_domain_update_bus_token(vmd->irq_domain, > > DOMAIN_BUS_VMD_MSI); > > + } else { > > + vmd_enable_msi_remapping(vmd, false); > > + } > > > > pci_add_resource(&resources, &vmd->resources[0]); > > pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]); > > @@ -753,10 +792,6 @@ static int vmd_probe(struct pci_dev *dev, const struct > > pci_device_id *id) > > if (features & VMD_FEAT_OFFSET_FIRST_V
Re: [PATCH 1/3] dt-bindings: Fix undocumented compatible strings in examples
Quoting Rob Herring (2021-02-02 12:55:42) > Running 'dt-validate -m' will flag any compatible strings missing a schema. > Fix all the errors found in DT binding examples. Most of these are just > typos. > > Cc: Stephen Boyd > Cc: Maxime Ripard > Cc: Chen-Yu Tsai > Cc: Linus Walleij > Cc: Herbert Xu > Cc: "David S. Miller" > Cc: Daniel Palmer > Cc: Bartosz Golaszewski > Cc: Avi Fishman > Cc: Tomer Maimon > Cc: Tali Perry > Cc: Joerg Roedel > Cc: Will Deacon > Cc: Andrew Jeffery > Cc: Joel Stanley > Cc: Wim Van Sebroeck > Cc: Guenter Roeck > Cc: Yoshihiro Shimoda > Cc: Vincent Cheng > Cc: linux-...@vger.kernel.org > Cc: linux-cry...@vger.kernel.org > Cc: linux-g...@vger.kernel.org > Cc: linux-...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org > Cc: linux-watch...@vger.kernel.org > Signed-off-by: Rob Herring > --- Acked-by: Stephen Boyd ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] dt-bindings: Fix undocumented compatible strings in examples
On Tue, Feb 02, 2021 at 02:55:42PM -0600, Rob Herring wrote: > Running 'dt-validate -m' will flag any compatible strings missing a schema. > Fix all the errors found in DT binding examples. Most of these are just > typos. > > Cc: Stephen Boyd > Cc: Maxime Ripard > Cc: Chen-Yu Tsai > Cc: Linus Walleij > Cc: Herbert Xu > Cc: "David S. Miller" > Cc: Daniel Palmer > Cc: Bartosz Golaszewski > Cc: Avi Fishman > Cc: Tomer Maimon > Cc: Tali Perry > Cc: Joerg Roedel > Cc: Will Deacon > Cc: Andrew Jeffery > Cc: Joel Stanley > Cc: Wim Van Sebroeck > Cc: Guenter Roeck > Cc: Yoshihiro Shimoda > Cc: Vincent Cheng > Cc: linux-...@vger.kernel.org > Cc: linux-cry...@vger.kernel.org > Cc: linux-g...@vger.kernel.org > Cc: linux-...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org > Cc: linux-watch...@vger.kernel.org > Signed-off-by: Rob Herring Acked-by: Wolfram Sang # for I2C signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/2] PCI: vmd: Disable MSI/X remapping when possible
On Thu, Feb 04, 2021 at 12:09:06PM -0700, Jon Derrick wrote: > VMD will retransmit child device MSI/X using its own MSI/X table and > requester-id. This limits the number of MSI/X available to the whole > child device domain to the number of VMD MSI/X interrupts. > > Some VMD devices have a mode where this remapping can be disabled, > allowing child device interrupts to bypass processing with the VMD MSI/X > domain interrupt handler and going straight the child device interrupt > handler, allowing for better performance and scaling. The requester-id > still gets changed to the VMD endpoint's requester-id, and the interrupt > remapping handlers have been updated to properly set IRTE for child > device interrupts to the VMD endpoint's context. > > Some VMD platforms have existing production BIOS which rely on MSI/X > remapping and won't explicitly program the MSI/X remapping bit. This > re-enables MSI/X remapping on unload. Trivial comments below. Would you mind using "MSI-X" instead of "MSI/X" so it matches the usage in the PCIe specs? Several mentions above (including subject) and below. > Signed-off-by: Jon Derrick > --- > drivers/pci/controller/vmd.c | 60 > 1 file changed, 48 insertions(+), 12 deletions(-) > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > index 5e80f28f0119..a319ce49645b 100644 > --- a/drivers/pci/controller/vmd.c > +++ b/drivers/pci/controller/vmd.c > @@ -59,6 +59,13 @@ enum vmd_features { >* be used for MSI remapping >*/ > VMD_FEAT_OFFSET_FIRST_VECTOR= (1 << 3), > + > + /* > + * Device can bypass remapping MSI/X transactions into its MSI/X table, > + * avoding the requirement of a VMD MSI domain for child device s/avoding/avoiding/ > + * interrupt handling Maybe a period at the end of the sentence. > + */ > + VMD_FEAT_BYPASS_MSI_REMAP = (1 << 4), > }; > > /* > @@ -306,6 +313,15 @@ static struct msi_domain_info vmd_msi_domain_info = { > .chip = &vmd_msi_controller, > }; > > +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable) > +{ > + u16 reg; > + > + pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, ®); > + reg = enable ? (reg & ~0x2) : (reg | 0x2); Would be nice to have a #define for 0x2. > + pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg); > +} > + > static int vmd_create_irq_domain(struct vmd_dev *vmd) > { > struct fwnode_handle *fn; > @@ -325,6 +341,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd) > > static void vmd_remove_irq_domain(struct vmd_dev *vmd) > { > + /* > + * Some production BIOS won't enable remapping between soft reboots. > + * Ensure remapping is restored before unloading the driver. > + */ > + if (!vmd->msix_count) > + vmd_enable_msi_remapping(vmd, true); > + > if (vmd->irq_domain) { > struct fwnode_handle *fn = vmd->irq_domain->fwnode; > > @@ -679,15 +702,31 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > unsigned long features) > > sd->node = pcibus_to_node(vmd->dev->bus); > > - ret = vmd_create_irq_domain(vmd); > - if (ret) > - return ret; > - > /* > - * Override the irq domain bus token so the domain can be distinguished > - * from a regular PCI/MSI domain. > + * Currently MSI remapping must be enabled in guest passthrough mode > + * due to some missing interrupt remapping plumbing. This is probably > + * acceptable because the guest is usually CPU-limited and MSI > + * remapping doesn't become a performance bottleneck. 
>*/ > - irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); > + if (!(features & VMD_FEAT_BYPASS_MSI_REMAP) || offset[0] || offset[1]) { > + ret = vmd_alloc_irqs(vmd); > + if (ret) > + return ret; > + > + vmd_enable_msi_remapping(vmd, true); > + > + ret = vmd_create_irq_domain(vmd); > + if (ret) > + return ret; > + > + /* > + * Override the irq domain bus token so the domain can be > + * distinguished from a regular PCI/MSI domain. > + */ > + irq_domain_update_bus_token(vmd->irq_domain, > DOMAIN_BUS_VMD_MSI); > + } else { > + vmd_enable_msi_remapping(vmd, false); > + } > > pci_add_resource(&resources, &vmd->resources[0]); > pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]); > @@ -753,10 +792,6 @@ static int vmd_probe(struct pci_dev *dev, const struct > pci_device_id *id) > if (features & VMD_FEAT_OFFSET_FIRST_VECTOR) > vmd->first_vec = 1; > > - err = vmd_alloc_irqs(vmd); > - if (err) > - return err; > - > spin_lock_init(&vmd->cfg_lock); > pci_set_drvdata(dev, vmd); > err = vmd
Re: [git pull] IOMMU Fix for Linux v5.11-rc6
The pull request you sent on Fri, 5 Feb 2021 17:01:57 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.11-rc6

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/97ba0c7413f83ab3b43a5ba05362ecc837fce518

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path
On Wed, Feb 03, 2021 at 02:36:38PM -0500, Konrad Rzeszutek Wilk wrote:
> > So what? If you guys want to provide a new capability you'll have to do
> > work. And designing a new protocol based around the fact that the
> > hardware/hypervisor is not trusted and a copy is always required makes
> > a lot more sense than throwing in band aids all over the place.
>
> If you don't trust the hypervisor, what would this capability be in?

Well, they don't trust the hypervisor not to attack the guest somehow, except through the data read. I never really understood the concept, as it leaves too many holes.

But the point is that these schemes want to force bounce buffering because they think it is more secure. And if that is what you want, you had better have a protocol built around the fact that each I/O needs to use bounce buffers, so you make those buffers the actual shared memory used for communication, and build the protocol around it.

E.g. you don't force the ridiculous NVMe PRP offset rules on the block layer just so that a complicated swiotlb allocation that needs to preserve the alignment can do the I/O. Instead you have a trivial ring buffer or whatever, because you know the I/O will be copied anyway and none of the hard work higher layers do to make the I/O suitable for a normal device applies.
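On the narrower point in the patch subject, the validation itself is conceptually simple. A minimal sketch of the idea, with illustrative names rather than the exact patch: clamp any sync/unmap size to the size recorded when the buffer was bounced, so a bad size cannot make swiotlb copy past the slot.

	/* Sketch only, not the exact patch. */
	static size_t swiotlb_checked_size(size_t size, size_t alloc_size)
	{
		if (WARN_ON_ONCE(size > alloc_size))
			return alloc_size;	/* clamp rather than overrun the bounce slot */
		return size;
	}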
Re: Consult on ARM SMMU debugfs
On 2021-01-15 22:47, Robin Murphy wrote: On 2021-01-15 15:14, Russell King - ARM Linux admin wrote: On Mon, Jan 11, 2021 at 08:01:48PM +, Robin Murphy wrote: On 2021-01-07 02:45, chenxiang (M) wrote: Hi Will,� Robin or other guys, When debugging SMMU/SVA issue on huawei ARM64 board, we find that it lacks of enough debugfs for ARM SMMU driver (such as the value of STE/CD which we need to check sometimes). Currently it creates top-level iommu directory in debugfs, but there is no debugfs for ARM SMMU driver specially. Do you know whether ARM have the plan to do that recently? FWIW I don't think I've ever felt the need to need to inspect the Stream Table on a live system. So far the nature of the STE code has been simple enough that it's very hard for any given STE to be *wrong* - either it's set up as expected and thus works fine, or it's not initialised at all and you get C_BAD_STE, where 99% of the time you then just cross-reference the Stream ID against the firmware and find that the DT/IORT is wrong. Similarly I don't think I've even even *seen* an issue that could be attributed to a context descriptor, although I appreciate that as we start landing more PASID and SVA support the scope for that starts to widen considerably. Feel free to propose a patch if you believe it would be genuinely useful and won't just bit-rot into a maintenance burden, but it's not something that's on our roadmap here. I do think that the IOMMU stuff needs better debugging. I've hit the WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable, so I've resorted to putting the IOMMU into bypass mode permanently to work around the issue. The reason that it's undebuggable is if one puts printk() or trace statements in the code, boots the platform, you get flooded with those debugging messages, because every access to the rootfs generates and tears down a mapping. It would be nice to be able to inspect the IOMMU page tables and state of the IOMMU, rather than having to resort to effectively disabling the IOMMU. Certainly once we get to stuff like unpinned VFIO, having the ability to inspect pagetables for arbitrary IOMMU API usage will indeed be useful. From the DMA mapping perspective, though, unless you're working on the io-pgtable code itself it's not really going to tell you much that dumping the mappings from dma-debug can't already. FWIW whenever I encounter that particular warning in iommu-dma context, I don't care where the existing mapping is pointing, since it's merely a symptom of the damage already having been done. At that point I'd usually go off and audit all the DMA API calls in the offending driver, since it's typically caused by corruption in the IOVA allocator from passing the wrong size in a dma_unmap_*() call, and those can often be spotted by inspection. For active debugging, what you really want to know is the *history* of operations around that IOVA, since you're primarily interested in the request that last mapped it, then the corresponding unmap request for nominally the same buffer (which allowed the IOVA region to be freed for reuse) that for some reason didn't cover one or more pages that it should have. The IOMMU API tracepoints can be a handy tool there. Currently IOMMU trace events are not straight forward to decode if there are multiple devices attached. 
For example, consider the following:

map: IOMMU: iova=0x00035000 paddr=0x000113be2000 size=4096
unmap: IOMMU: iova=0x00034000 size=4096 unmapped_size=4096
unmap: IOMMU: iova=0x00035000 size=4096 unmapped_size=4096
map: IOMMU: iova=0x00036000 paddr=0x0001164d8000 size=4096
map: IOMMU: iova=0x00037000 paddr=0x0001164da000 size=4096
unmap: IOMMU: iova=0x00036000 size=4096 unmapped_size=4096
unmap: IOMMU: iova=0x00037000 size=4096 unmapped_size=4096

How about making it more useful by adding the device name as well? Ex:

map: IOMMU:ae0.mdss iova=0x0002b000 paddr=0x00010a9e6000 size=8192
map: IOMMU:ae0.mdss iova=0x0002d000 paddr=0x00010a9ec000 size=21790
map: IOMMU:ae0.mdss iova=0x00241000 paddr=0x00010c40 size=59392
map: IOMMU:a60.dwc3 iova=0x0004a000 paddr=0x00010a821000 size=4096
map: IOMMU:a60.dwc3 iova=0x00049000 paddr=0x00010a82 size=4096
unmap: IOMMU:a60.dwc3 iova=0x0004a000 size=4096 unmapped_size=4096
unmap: IOMMU:a60.dwc3 iova=0x00049000 size=4096 unmapped_size=4096

We have been carrying a local patch downstream like forever; I can post a patch if you guys think it is useful in general.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
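For reference, extending the existing map tracepoint with the client device would look roughly like the sketch below. This is illustrative only, not the actual downstream patch; it mirrors the map event in include/trace/events/iommu.h and simply threads the struct device through:

	/* Sketch only: add the client device name to the map event. */
	TRACE_EVENT(map,
		TP_PROTO(struct device *dev, unsigned long iova, phys_addr_t paddr,
			 size_t size),
		TP_ARGS(dev, iova, paddr, size),
		TP_STRUCT__entry(
			__string(device, dev_name(dev))
			__field(u64, iova)
			__field(u64, paddr)
			__field(size_t, size)
		),
		TP_fast_assign(
			__assign_str(device, dev_name(dev));
			__entry->iova = iova;
			__entry->paddr = paddr;
			__entry->size = size;
		),
		TP_printk("IOMMU:%s iova=0x%016llx paddr=0x%016llx size=%zu",
			  __get_str(device), __entry->iova, __entry->paddr,
			  __entry->size)
	);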
Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU
On 2021-02-05 11:48, Robin Murphy wrote: On 2021-02-05 09:13, Keqian Zhu wrote: Hi Robin and Jean, On 2021/2/5 3:50, Robin Murphy wrote: On 2021-01-28 15:17, Keqian Zhu wrote: From: jiangkunkun The SMMU which supports HTTU (Hardware Translation Table Update) can update the access flag and the dirty state of TTD by hardware. It is essential to track dirty pages of DMA. This adds feature detection, none functional change. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 include/linux/io-pgtable.h | 1 + 3 files changed, 25 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8ca7415d785d..0f0fe71cc10d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain, .pgsize_bitmap = smmu->pgsize_bitmap, .ias = ias, .oas = oas, + .httu_hd = smmu->features & ARM_SMMU_FEAT_HTTU_HD, .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY, .tlb = &arm_smmu_flush_ops, .iommu_dev = smmu->dev, @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) if (reg & IDR0_HYP) smmu->features |= ARM_SMMU_FEAT_HYP; + switch (FIELD_GET(IDR0_HTTU, reg)) { We need to accommodate the firmware override as well if we need this to be meaningful. Jean-Philippe is already carrying a suitable patch in the SVA stack[1]. Robin, Thanks for pointing it out. Jean, I see that the IORT HTTU flag overrides the hardware register info unconditionally. I have some concern about it: If the override flag has HTTU but hardware doesn't support it, then driver will use this feature but receive access fault or permission fault from SMMU unexpectedly. 1) If IOPF is not supported, then kernel can not work normally. 2) If IOPF is supported, kernel will perform useless actions, such as HTTU based dma dirty tracking (this series). Yes, if the IORT describes the SMMU incorrectly, things will not work well. Just like if it describes the wrong base address or the wrong interrupt numbers, things will also not work well. The point is that incorrect firmware can be updated in the field fairly easily; incorrect hardware can not. Say the SMMU designer hard-codes the ID register field to 0x2 because the SMMU itself is capable of HTTU, and they assume it's always going to be wired up coherently, but then a customer integrates it to a non-coherent interconnect. Firmware needs to override that value to prevent an OS thinking that the claimed HTTU capability is ever going to work. Or say the SMMU *is* integrated correctly, but due to an erratum discovered later in the interconnect or SMMU itself, it turns out DBM doesn't always work reliably, but AF is still OK. Firmware needs to downgrade the indicated level of support from that which was intended to that which works reliably. Or say someone forgets to set an integration tieoff so their SMMU reports 0x0 even though it and the interconnect *can* happily support HTTU. In that case, firmware may want to upgrade the value to *allow* an OS to use HTTU despite the ID register being wrong. As the IORT spec doesn't give an explicit explanation for HTTU override, can we comprehend it as a mask for HTTU related hardware register? 
So the logic becomes: smmu->feature = HTTU override & IDR0_HTTU; No, it literally states that the OS must use the value of the firmware field *instead* of the value from the hardware field. Oops, apologies for an oversight there - I've been reviewing IORT spec updates lately so naturally had the newest version open already. Turns out these descriptions were only clarified in the most recent release, so if you were looking at an older document they *were* horribly vague. Robin. + case IDR0_HTTU_NONE: + break; + case IDR0_HTTU_HA: + smmu->features |= ARM_SMMU_FEAT_HTTU_HA; + break; + case IDR0_HTTU_HAD: + smmu->features |= ARM_SMMU_FEAT_HTTU_HA; + smmu->features |= ARM_SMMU_FEAT_HTTU_HD; + break; + default: + dev_err(smmu->dev, "unknown/unsupported HTTU!\n"); + return -ENXIO; + } + /* * The coherency feature as set by FW is used in preference to the ID * register, but warn on mismatch. diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 96c2e9565e00..e91bea44519e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -33,6 +33,10 @@ #define IDR0_ASID16 (1 << 12) #define IDR0_ATS (1
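To make the override semantics concrete: the firmware value replaces the ID register field, it is not ANDed with it. A sketch of how the probe path could consume such a value (field names are illustrative; the real implementation is in Jean-Philippe's SVA series):

	/* Sketch only: prefer a firmware-described HTTU value over IDR0. */
	u32 httu = FIELD_GET(IDR0_HTTU, reg);

	if (fw_httu_valid)		/* e.g. from the IORT SMMUv3 node */
		httu = fw_httu_override;	/* used *instead of* the ID register */

	switch (httu) {
	case IDR0_HTTU_HAD:
		smmu->features |= ARM_SMMU_FEAT_HTTU_HD;
		fallthrough;
	case IDR0_HTTU_HA:
		smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
		break;
	default:
		break;
	}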
[git pull] IOMMU Fix for Linux v5.11-rc6
Hi Linus,

The following changes since commit 1048ba83fb1c00cd24172e23e8263972f6b5d9ac:

  Linux 5.11-rc6 (2021-01-31 13:50:09 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.11-rc6

for you to fetch changes up to 4c9fb5d9140802db4db9f66c23887f43174e113c:

  iommu: Check dev->iommu in dev_iommu_priv_get() before dereferencing it (2021-02-02 15:57:23 +0100)

IOMMU Fix for Linux v5.11-rc6

- Fix a possible NULL-ptr dereference in dev_iommu_priv_get() which is too easy to accidentally trigger from IOMMU drivers. In the current case the AMD IOMMU driver triggered it on some machines in the IO-page-fault path, so fix it once and for all.

Joerg Roedel (1):
      iommu: Check dev->iommu in dev_iommu_priv_get() before dereferencing it

 include/linux/iommu.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

Please pull.

Thanks,

	Joerg
[PATCH] iommu/mediatek: Fix error code in probe()
This error path is supposed to return -EINVAL. It used to return directly but we added some clean up and accidentally removed the error code. Also I fixed a typo in the error message. Fixes: c0b57581b73b ("iommu/mediatek: Add power-domain operation") Signed-off-by: Dan Carpenter --- drivers/iommu/mtk_iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 0ad14a7604b1..5f78ac0dc30e 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -886,7 +886,8 @@ static int mtk_iommu_probe(struct platform_device *pdev) link = device_link_add(data->smicomm_dev, dev, DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME); if (!link) { - dev_err(dev, "Unable link %s.\n", dev_name(data->smicomm_dev)); + dev_err(dev, "Unable to link %s.\n", dev_name(data->smicomm_dev)); + ret = -EINVAL; goto out_runtime_disable; } -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
On 2021-02-04 03:16, Will Deacon wrote: On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote: On 2021-02-01 23:50, Jordan Crouse wrote: > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote: > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon wrote: > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote: > > > > On 2021-01-29 14:35, Will Deacon wrote: > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote: > > > > > > +#define IOMMU_LLC(1 << 6) > > > > > > > > > > On reflection, I'm a bit worried about exposing this because I think it > > > > > will > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a > > > > > MAIR > > > > > set up for this memory type). Now, we also have that issue for the PTW, > > > > > but > > > > > since we always use cache maintenance (i.e. the streaming API) for > > > > > publishing the page-tables to a non-coheren walker, it works out. > > > > > However, > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent > > > > > allocation, then they're potentially in for a nasty surprise due to the > > > > > mismatched outer-cacheability attributes. > > > > > > > > > > > > > Can't we add the syscached memory type similar to what is done on android? > > > > > > Maybe. How does the GPU driver map these things on the CPU side? > > > > Currently we use writecombine mappings for everything, although there > > are some cases that we'd like to use cached (but have not merged > > patches that would give userspace a way to flush/invalidate) > > > > LLC/system cache doesn't have a relationship with the CPU cache. Its > just a > little accelerator that sits on the connection from the GPU to DDR and > caches > accesses. The hint that Sai is suggesting is used to mark the buffers as > 'no-write-allocate' to prevent GPU write operations from being cached in > the LLC > which a) isn't interesting and b) takes up cache space for read > operations. > > Its easiest to think of the LLC as a bonus accelerator that has no cost > for > us to use outside of the unfortunate per buffer hint. > > We do have to worry about the CPU cache w.r.t I/O coherency (which is a > different hint) and in that case we have all of concerns that Will > identified. > For mismatched outer cacheability attributes which Will mentioned, I was referring to [1] in android kernel. I've lost track of the conversation here :/ When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped into the CPU and with what attributes? Rob said "writecombine for everything" -- does that mean ioremap_wc() / MEMREMAP_WC? Rob answered this. Finally, we need to be careful when we use the word "hint" as "allocation hint" has a specific meaning in the architecture, and if we only mismatch on those then we're actually ok. But I think IOMMU_LLC is more than just a hint, since it actually drives eviction policy (i.e. it enables writeback). Sorry for the pedantry, but I just want to make sure we're all talking about the same things! Sorry for the confusion which probably was caused by my mentioning of android, NWA(no write allocate) is an allocation hint which we can ignore for now as it is not introduced yet in upstream. Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
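For readers skimming the thread, the flag under discussion boils down to selecting a "system cache only" memory type in io-pgtable-arm: inner non-cacheable, outer write-back read/write-allocate. A sketch of the idea, not the exact patch under review:

	/* Sketch only: values and helper are illustrative. */
	#define IOMMU_LLC				(1 << 6)
	#define ARM_LPAE_MAIR_ATTR_INC_OWBRWA		0xf4ULL	/* inner NC, outer WB-RWA */
	#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE	3

	static arm_lpae_iopte arm_lpae_llc_attr(int prot, arm_lpae_iopte pte)
	{
		if (prot & IOMMU_LLC)
			pte |= ARM_LPAE_PTE_ATTRINDX(ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE);
		return pte;
	}

This is what lets GPU traffic allocate in the LLC/system cache without claiming CPU-coherent, inner-cacheable attributes, which is where Will's concern about mismatched aliases comes in.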
Re: [RFC PATCH 3/3] iommu/arm-smmu-v3: Add debug interfaces for SMMUv3
On 2021/2/4 23:58, Robin Murphy wrote: > On 2021-01-29 09:06, Zhou Wang wrote: >> This patch adds debug interfaces for SMMUv3 driver in sysfs. It adds debug >> related files under /sys/kernel/debug/iommu/smmuv3. >> >> User should firstly set device and pasid to pci_dev and pasid by: >> (currently only support PCI device) >> echo ::. > /sys/kernel/debug/iommu/smmuv3/pci_dev >> echo > /sys/kernel/debug/iommu/smmuv3/pasid >> >> Then value in cd and ste can be got by: >> cat /sys/kernel/debug/iommu/smmuv3/ste >> cat /sys/kernel/debug/iommu/smmuv3/cd >> >> S1 and S2 page tables can be got by: >> cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s1 >> cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s2 >> >> For ste, cd and page table, related device and pasid are set in pci_dev and >> pasid files as above. >> >> Signed-off-by: Zhou Wang >> --- >> drivers/iommu/Kconfig | 11 + >> drivers/iommu/arm/arm-smmu-v3/Makefile | 1 + >> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 + >> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 + >> drivers/iommu/arm/arm-smmu-v3/debugfs.c | 398 >> >> 5 files changed, 421 insertions(+) >> create mode 100644 drivers/iommu/arm/arm-smmu-v3/debugfs.c >> >> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig >> index 192ef8f..4822c88 100644 >> --- a/drivers/iommu/Kconfig >> +++ b/drivers/iommu/Kconfig >> @@ -325,6 +325,17 @@ config ARM_SMMU_V3_SVA >> Say Y here if your system supports SVA extensions such as PCIe PASID >> and PRI. >> +config ARM_SMMU_V3_DEBUGFS >> +bool "Export ARM SMMUv3 internals in Debugfs" >> +depends on ARM_SMMU_V3 && IOMMU_DEBUGFS >> +help >> + DO NOT ENABLE THIS OPTION UNLESS YOU REALLY KNOW WHAT YOU ARE DOING! >> + >> + Expose ARM SMMUv3 internals in Debugfs. >> + >> + This option is -NOT- intended for production environments, and should >> + only be enabled for debugging ARM SMMUv3. 
>> + >> config S390_IOMMU >> def_bool y if S390 && PCI >> depends on S390 && PCI >> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile >> b/drivers/iommu/arm/arm-smmu-v3/Makefile >> index 54feb1ec..55b411a 100644 >> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile >> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile >> @@ -3,3 +3,4 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o >> arm_smmu_v3-objs-y += arm-smmu-v3.o >> arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o >> arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) >> +obj-$(CONFIG_ARM_SMMU_V3_DEBUGFS) += debugfs.o >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> index b65f63e2..aac7fdb 100644 >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> @@ -3602,6 +3602,8 @@ static int arm_smmu_device_probe(struct >> platform_device *pdev) >> return ret; >> } >> +arm_smmu_debugfs_init(); >> + >> return arm_smmu_set_bus_ops(&arm_smmu_ops); >> } >> @@ -3610,6 +3612,7 @@ static int arm_smmu_device_remove(struct >> platform_device *pdev) >> struct arm_smmu_device *smmu = platform_get_drvdata(pdev); >> arm_smmu_set_bus_ops(NULL); >> +arm_smmu_debugfs_uninit(); >> iommu_device_unregister(&smmu->iommu); >> iommu_device_sysfs_remove(&smmu->iommu); >> arm_smmu_device_disable(smmu); >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> index 3e7af39..31c4580 100644 >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h >> @@ -752,4 +752,12 @@ static inline u32 arm_smmu_sva_get_pasid(struct >> iommu_sva *handle) >> static inline void arm_smmu_sva_notifier_synchronize(void) {} >> #endif /* CONFIG_ARM_SMMU_V3_SVA */ >> + >> +#ifdef CONFIG_ARM_SMMU_V3_DEBUGFS >> +void arm_smmu_debugfs_init(void); >> +void arm_smmu_debugfs_uninit(void); >> +#else >> +static inline void arm_smmu_debugfs_init(void) {} >> +static inline void arm_smmu_debugfs_uninit(void) {} >> +#endif /* CONFIG_ARM_SMMU_V3_DEBUGFS */ >> #endif /* _ARM_SMMU_V3_H */ >> diff --git a/drivers/iommu/arm/arm-smmu-v3/debugfs.c >> b/drivers/iommu/arm/arm-smmu-v3/debugfs.c >> new file mode 100644 >> index 000..1af219a >> --- /dev/null >> +++ b/drivers/iommu/arm/arm-smmu-v3/debugfs.c >> @@ -0,0 +1,398 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +#include >> +#include >> +#include >> +#include >> +#include "arm-smmu-v3.h" >> +#include "../../io-pgtable-arm.h" >> + >> +#undef pr_fmt >> +#define pr_fmt(fmt)"SMMUv3 debug: " fmt >> + >> +#define NAME_BUF_LEN 32 >> + >> +static struct dentry *arm_smmu_debug; >> +static char dump_pci_dev[NAME_BUF_LEN]; >> +static u32 pasid; >> +static struct mutex lock; >> + >> +static ssize_t master_pdev_read(struct file *filp, char __user *buf, >> +size_t count, loff_t *pos) >> +{ >> +char
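As a rough illustration of how the interface described in the commit message hangs together, the files would typically be registered under the existing top-level iommu debugfs directory. The fops symbols below are placeholders, not the actual patch:

	/* Illustration only: fops names are placeholders. */
	void arm_smmu_debugfs_init(void)
	{
		arm_smmu_debug = debugfs_create_dir("smmuv3", iommu_debugfs_dir);

		debugfs_create_file("pci_dev", 0600, arm_smmu_debug, NULL, &master_pdev_fops);
		debugfs_create_file("pasid", 0600, arm_smmu_debug, NULL, &pasid_fops);
		debugfs_create_file("ste", 0400, arm_smmu_debug, NULL, &ste_fops);
		debugfs_create_file("cd", 0400, arm_smmu_debug, NULL, &cd_fops);
		debugfs_create_file("pt_dump_s1", 0400, arm_smmu_debug, NULL, &pt_dump_s1_fops);
		debugfs_create_file("pt_dump_s2", 0400, arm_smmu_debug, NULL, &pt_dump_s2_fops);
	}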
Re: preserve DMA offsets when using swiotlb
I've pushed the updated series out to:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-offset

but I'm going to wait until next week before patch bombing everyone again.
Re: [PATCH v3 1/2] dma-mapping: benchmark: use u8 for reserved field in uAPI structure
Thanks, applied.
Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU
On 2021-02-05 09:13, Keqian Zhu wrote: Hi Robin and Jean, On 2021/2/5 3:50, Robin Murphy wrote: On 2021-01-28 15:17, Keqian Zhu wrote: From: jiangkunkun The SMMU which supports HTTU (Hardware Translation Table Update) can update the access flag and the dirty state of TTD by hardware. It is essential to track dirty pages of DMA. This adds feature detection, none functional change. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 include/linux/io-pgtable.h | 1 + 3 files changed, 25 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8ca7415d785d..0f0fe71cc10d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain, .pgsize_bitmap= smmu->pgsize_bitmap, .ias= ias, .oas= oas, +.httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD, .coherent_walk= smmu->features & ARM_SMMU_FEAT_COHERENCY, .tlb= &arm_smmu_flush_ops, .iommu_dev= smmu->dev, @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) if (reg & IDR0_HYP) smmu->features |= ARM_SMMU_FEAT_HYP; +switch (FIELD_GET(IDR0_HTTU, reg)) { We need to accommodate the firmware override as well if we need this to be meaningful. Jean-Philippe is already carrying a suitable patch in the SVA stack[1]. Robin, Thanks for pointing it out. Jean, I see that the IORT HTTU flag overrides the hardware register info unconditionally. I have some concern about it: If the override flag has HTTU but hardware doesn't support it, then driver will use this feature but receive access fault or permission fault from SMMU unexpectedly. 1) If IOPF is not supported, then kernel can not work normally. 2) If IOPF is supported, kernel will perform useless actions, such as HTTU based dma dirty tracking (this series). Yes, if the IORT describes the SMMU incorrectly, things will not work well. Just like if it describes the wrong base address or the wrong interrupt numbers, things will also not work well. The point is that incorrect firmware can be updated in the field fairly easily; incorrect hardware can not. Say the SMMU designer hard-codes the ID register field to 0x2 because the SMMU itself is capable of HTTU, and they assume it's always going to be wired up coherently, but then a customer integrates it to a non-coherent interconnect. Firmware needs to override that value to prevent an OS thinking that the claimed HTTU capability is ever going to work. Or say the SMMU *is* integrated correctly, but due to an erratum discovered later in the interconnect or SMMU itself, it turns out DBM doesn't always work reliably, but AF is still OK. Firmware needs to downgrade the indicated level of support from that which was intended to that which works reliably. Or say someone forgets to set an integration tieoff so their SMMU reports 0x0 even though it and the interconnect *can* happily support HTTU. In that case, firmware may want to upgrade the value to *allow* an OS to use HTTU despite the ID register being wrong. As the IORT spec doesn't give an explicit explanation for HTTU override, can we comprehend it as a mask for HTTU related hardware register? So the logic becomes: smmu->feature = HTTU override & IDR0_HTTU; No, it literally states that the OS must use the value of the firmware field *instead* of the value from the hardware field. 
+case IDR0_HTTU_NONE: +break; +case IDR0_HTTU_HA: +smmu->features |= ARM_SMMU_FEAT_HTTU_HA; +break; +case IDR0_HTTU_HAD: +smmu->features |= ARM_SMMU_FEAT_HTTU_HA; +smmu->features |= ARM_SMMU_FEAT_HTTU_HD; +break; +default: +dev_err(smmu->dev, "unknown/unsupported HTTU!\n"); +return -ENXIO; +} + /* * The coherency feature as set by FW is used in preference to the ID * register, but warn on mismatch. diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 96c2e9565e00..e91bea44519e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -33,6 +33,10 @@ #define IDR0_ASID16(1 << 12) #define IDR0_ATS(1 << 10) #define IDR0_HYP(1 << 9) +#define IDR0_HTTUGENMASK(7, 6) +#define IDR0_HTTU_NONE0 +#define IDR0_HTTU_HA1 +#define IDR0_HTTU_HAD2 #define IDR0_COHACC(1 << 4) #define IDR0_TTFGENMASK(3, 2) #define IDR0_TTF_AARCH642 @@ -286,6 +290,8 @@ #def
[PATCH v3 2/2] dma-mapping: benchmark: pretend DMA is transmitting
In a real dma mapping user case, after dma_map is done, data will be transmit. Thus, in multi-threaded user scenario, IOMMU contention should not be that severe. For example, if users enable multiple threads to send network packets through 1G/10G/100Gbps NIC, usually the steps will be: map -> transmission -> unmap. Transmission delay reduces the contention of IOMMU. Here a delay is added to simulate the transmission between map and unmap so that the tested result could be more accurate for TX and simple RX. A typical TX transmission for NIC would be like: map -> TX -> unmap since the socket buffers come from OS. Simple RX model eg. disk driver, is also map -> RX -> unmap, but real RX model in a NIC could be more complicated considering packets can come spontaneously and many drivers are using pre-mapped buffers pool. This is in the TBD list. Signed-off-by: Barry Song --- kernel/dma/map_benchmark.c| 12 ++- .../testing/selftests/dma/dma_map_benchmark.c | 21 --- 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c index da95df381483..e0e64f8b0739 100644 --- a/kernel/dma/map_benchmark.c +++ b/kernel/dma/map_benchmark.c @@ -21,6 +21,7 @@ #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) #define DMA_MAP_MAX_THREADS1024 #define DMA_MAP_MAX_SECONDS300 +#define DMA_MAP_MAX_TRANS_DELAY(10 * NSEC_PER_MSEC) #define DMA_MAP_BIDIRECTIONAL 0 #define DMA_MAP_TO_DEVICE 1 @@ -36,7 +37,8 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ - __u8 expansion[84]; /* For future use */ + __u32 dma_trans_ns; /* time for DMA transmission in ns */ + __u8 expansion[80]; /* For future use */ }; struct map_benchmark_data { @@ -87,6 +89,9 @@ static int map_benchmark_thread(void *data) map_etime = ktime_get(); map_delta = ktime_sub(map_etime, map_stime); + /* Pretend DMA is transmitting */ + ndelay(map->bparam.dma_trans_ns); + unmap_stime = ktime_get(); dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir); unmap_etime = ktime_get(); @@ -218,6 +223,11 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd, return -EINVAL; } + if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) { + pr_err("invalid transmission delay\n"); + return -EINVAL; + } + if (map->bparam.node != NUMA_NO_NODE && !node_possible(map->bparam.node)) { pr_err("invalid numa node\n"); diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c index 537d65968c48..fb23ce9617ea 100644 --- a/tools/testing/selftests/dma/dma_map_benchmark.c +++ b/tools/testing/selftests/dma/dma_map_benchmark.c @@ -12,9 +12,12 @@ #include #include +#define NSEC_PER_MSEC 100L + #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) #define DMA_MAP_MAX_THREADS1024 #define DMA_MAP_MAX_SECONDS 300 +#define DMA_MAP_MAX_TRANS_DELAY(10 * NSEC_PER_MSEC) #define DMA_MAP_BIDIRECTIONAL 0 #define DMA_MAP_TO_DEVICE 1 @@ -36,7 +39,8 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ - __u8 expansion[84]; /* For future use */ + __u32 dma_trans_ns; /* time for DMA transmission in ns */ + __u8 expansion[80]; /* For future use */ }; int main(int argc, char **argv) @@ -46,12 +50,12 @@ int main(int argc, char **argv) /* default single thread, run 20 seconds on NUMA_NO_NODE */ int threads = 1, seconds 
= 20, node = -1; /* default dma mask 32bit, bidirectional DMA */ - int bits = 32, dir = DMA_MAP_BIDIRECTIONAL; + int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL; int cmd = DMA_MAP_BENCHMARK; char *p; - while ((opt = getopt(argc, argv, "t:s:n:b:d:")) != -1) { + while ((opt = getopt(argc, argv, "t:s:n:b:d:x:")) != -1) { switch (opt) { case 't': threads = atoi(optarg); @@ -68,6 +72,9 @@ int main(int argc, char **argv) case 'd': dir = atoi(optarg); break; + case 'x': + xdelay = atoi(optarg); + break; default: return -1; } @@ -85,6 +92,12 @@ int main(int argc, char **argv) exit(1); }
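As a usage note: once a device has been bound to the dma_map_benchmark driver, a run with 16 threads for 20 seconds, DMA_MAP_TO_DEVICE, and a simulated transmission delay of 100 ns would be invoked roughly as

	./dma_map_benchmark -t 16 -s 20 -d 1 -x 100

where -x is the new knob carried in dma_trans_ns. The exact invocation is illustrative and assumes the selftest built from this patch.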
[PATCH v3 1/2] dma-mapping: benchmark: use u8 for reserved field in uAPI structure
The original code put five u32 before a u64 expansion[10] array. Five is odd, this will cause trouble in the extension of the structure by adding new features. This patch moves to use u8 for reserved field to avoid future alignment risk. Meanwhile, it also clears the memory of struct map_benchmark in tools, otherwise, if users use old version to run on newer kernel, the random expansion value will cause side effect on newer kernel. Signed-off-by: Barry Song --- kernel/dma/map_benchmark.c | 2 +- tools/testing/selftests/dma/dma_map_benchmark.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c index 1b1b8ff875cb..da95df381483 100644 --- a/kernel/dma/map_benchmark.c +++ b/kernel/dma/map_benchmark.c @@ -36,7 +36,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ - __u64 expansion[10];/* For future use */ + __u8 expansion[84]; /* For future use */ }; struct map_benchmark_data { diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c index 7065163a8388..537d65968c48 100644 --- a/tools/testing/selftests/dma/dma_map_benchmark.c +++ b/tools/testing/selftests/dma/dma_map_benchmark.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -35,7 +36,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ - __u64 expansion[10];/* For future use */ + __u8 expansion[84]; /* For future use */ }; int main(int argc, char **argv) @@ -102,6 +103,7 @@ int main(int argc, char **argv) exit(1); } + memset(&map, 0, sizeof(map)); map.seconds = seconds; map.threads = threads; map.node = node; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
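The alignment risk is easy to see with a quick layout check. The offsets below assume an LP64 ABI and are an illustration, not part of the patch:

	/*
	 * Illustration only (LP64 offsets):
	 *
	 *   4 x __u64 stats                        -> bytes  0..31
	 *   threads/seconds/node/dma_bits/dma_dir  -> bytes 32..51 (five 32-bit fields)
	 *   old tail __u64 expansion[10]           -> 4 bytes of padding, array at 56,
	 *                                             sizeof() == 136 on 64-bit but 132
	 *                                             on i386, where __u64 is 4-byte aligned
	 *   new tail __u8 expansion[84]            -> array at 52, sizeof() == 136 in both cases
	 */
	_Static_assert(offsetof(struct map_benchmark, expansion) == 52,
		       "unexpected hole before the reserved field");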
Re: [PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting
On Fri, Feb 05, 2021 at 10:52:37AM +, Song Bao Hua (Barry Song) wrote: > I assume there is no need to keep the same size with 5.11-rc, so > could change the struct to: > > struct map_benchmark { > __u64 avg_map_100ns; /* average map latency in 100ns */ > __u64 map_stddev; /* standard deviation of map latency */ > __u64 avg_unmap_100ns; /* as above */ > __u64 unmap_stddev; > __u32 threads; /* how many threads will do map/unmap in parallel */ > __u32 seconds; /* how long the test will last */ > __s32 node; /* which numa node this benchmark will run on */ > __u32 dma_bits; /* DMA addressing capability */ > __u32 dma_dir; /* DMA data direction */ > __u8 expansion[84]; /* For future use */ > }; > > This won't increase size on 64bit system, but it increases 4bytes > on 32bits system comparing to 5.11-rc. How do you think about it? Yes, that sounds good. Please send me a two patch series with the first one changing the alignment, and the second adding the delay. I'll send the first one off to Linus ASAP then. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting
> -Original Message- > From: Christoph Hellwig [mailto:h...@lst.de] > Sent: Friday, February 5, 2021 11:36 PM > To: Song Bao Hua (Barry Song) > Cc: Christoph Hellwig ; m.szyprow...@samsung.com; > robin.mur...@arm.com; iommu@lists.linux-foundation.org; > linux-ker...@vger.kernel.org; linux...@openeuler.org > Subject: Re: [PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting > > On Fri, Feb 05, 2021 at 10:32:26AM +, Song Bao Hua (Barry Song) wrote: > > I can keep the struct size unchanged by changing the struct to > > > > struct map_benchmark { > > __u64 avg_map_100ns; /* average map latency in 100ns */ > > __u64 map_stddev; /* standard deviation of map latency */ > > __u64 avg_unmap_100ns; /* as above */ > > __u64 unmap_stddev; > > __u32 threads; /* how many threads will do map/unmap in parallel */ > > __u32 seconds; /* how long the test will last */ > > __s32 node; /* which numa node this benchmark will run on */ > > __u32 dma_bits; /* DMA addressing capability */ > > __u32 dma_dir; /* DMA data direction */ > > __u32 dma_trans_ns; /* time for DMA transmission in ns */ > > > > __u32 exp; /* For future use */ > > __u64 expansion[9]; /* For future use */ > > }; > > > > But the code is really ugly now. > > Thats why we usually use __u8 fields for reserved field. You might > consider just switching to that instead while you're at it. I guess > we'll just have to get the addition into 5.11 then to make sure we > don't release a kernel with the alignment fix. I assume there is no need to keep the same size with 5.11-rc, so could change the struct to: struct map_benchmark { __u64 avg_map_100ns; /* average map latency in 100ns */ __u64 map_stddev; /* standard deviation of map latency */ __u64 avg_unmap_100ns; /* as above */ __u64 unmap_stddev; __u32 threads; /* how many threads will do map/unmap in parallel */ __u32 seconds; /* how long the test will last */ __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ __u8 expansion[84]; /* For future use */ }; This won't increase size on 64bit system, but it increases 4bytes on 32bits system comparing to 5.11-rc. How do you think about it? Thanks Barry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting
On Fri, Feb 05, 2021 at 10:32:26AM +, Song Bao Hua (Barry Song) wrote: > I can keep the struct size unchanged by changing the struct to > > struct map_benchmark { > __u64 avg_map_100ns; /* average map latency in 100ns */ > __u64 map_stddev; /* standard deviation of map latency */ > __u64 avg_unmap_100ns; /* as above */ > __u64 unmap_stddev; > __u32 threads; /* how many threads will do map/unmap in parallel */ > __u32 seconds; /* how long the test will last */ > __s32 node; /* which numa node this benchmark will run on */ > __u32 dma_bits; /* DMA addressing capability */ > __u32 dma_dir; /* DMA data direction */ > __u32 dma_trans_ns; /* time for DMA transmission in ns */ > > __u32 exp; /* For future use */ > __u64 expansion[9]; /* For future use */ > }; > > But the code is really ugly now. Thats why we usually use __u8 fields for reserved field. You might consider just switching to that instead while you're at it. I guess we'll just have to get the addition into 5.11 then to make sure we don't release a kernel with the alignment fix. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 7/8] swiotlb: respect min_align_mask
On Thu, Feb 04, 2021 at 11:13:45PM +0000, Robin Murphy wrote:
>> + */
>> +static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
>> +{
>> +	unsigned min_align_mask = dma_get_min_align_mask(dev);
>> +
>> +	if (!min_align_mask)
>> +		return 0;
>
> I doubt that's beneficial - even if the compiler can convert it into a
> csel, it'll then be doing unnecessary work to throw away a
> cheaply-calculated 0 in favour of hard-coded 0 in the one case it matters

True, I'll drop the checks.

> ;)
>
>> +	return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1);
>
> (BTW, for readability throughout, "#define IO_TLB_SIZE (1 << IO_TLB_SHIFT)"
> sure wouldn't go amiss...)

I actually had a patch doing just that, but as it is the only patch
touching swiotlb.h it caused endless rebuilds for me, so I dropped it as
it only had a few uses anyway. But I've added it back.

>> -	if (alloc_size >= PAGE_SIZE)
>> +	if (min_align_mask)
>> +		stride = (min_align_mask + 1) >> IO_TLB_SHIFT;
>
> So this can't underflow because "min_align_mask" is actually just the
> high-order bits representing the number of iotlb slots needed to meet the
> requirement, right? (It took a good 5 minutes to realise this wasn't doing
> what I initially thought it did...)

Yes.

> In that case, a) could the local var be called something like
> iotlb_align_mask to clarify that it's *not* just a copy of the device's
> min_align_mask,

Ok.

> and b) maybe just have an unconditional initialisation that
> works either way:
>
> 	stride = (min_align_mask >> IO_TLB_SHIFT) + 1;

Sure.

> In fact with that, I think could just mask orig_addr with ~IO_TLB_SIZE in
> the call to check_alignment() below, or shift everything down by
> IO_TLB_SHIFT in check_alignment() itself, instead of mangling
> min_align_mask at all (I'm assuming we do need to ignore the low-order bits
> of orig_addr at this point).

Yes, we do need to ignore the low bits as they won't ever be set in
tlb_dma_addr. Not sure the shift helps as we need to mask first. I ended
up killing check_alignment entirely, in favor of a new slot_addr helper
that calculates the address based off the base and index and which can
be used in a few other places as this one.
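To make the arithmetic concrete, here is a small userspace sketch of the offset and stride computation being discussed; IO_TLB_SHIFT == 11 (2 KiB slots) matches mainline swiotlb, but the helper names and the example mask are mine, not the final kernel code:

#include <stdint.h>
#include <stdio.h>

#define IO_TLB_SHIFT	11
#define IO_TLB_SIZE	(1u << IO_TLB_SHIFT)

/* low-order bits of orig_addr that must be preserved inside the first slot */
static unsigned int align_offset(uint64_t min_align_mask, uint64_t orig_addr)
{
	return orig_addr & min_align_mask & (IO_TLB_SIZE - 1);
}

/* slot-search stride derived from the device's min_align_mask */
static unsigned int stride(uint64_t min_align_mask)
{
	return (min_align_mask >> IO_TLB_SHIFT) + 1;
}

int main(void)
{
	uint64_t mask = 0xfff;		/* device wants 4 KiB alignment preserved */
	uint64_t addr = 0x12345678;

	/* prints "offset = 0x678, stride = 2 slots" */
	printf("offset = %#x, stride = %u slots\n",
	       align_offset(mask, addr), stride(mask));
	return 0;
}

The unconditional "stride = (mask >> IO_TLB_SHIFT) + 1" form collapses to a stride of one slot when the mask is zero, which is why the special-case check can be dropped.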
RE: [PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting
> -----Original Message-----
> From: Christoph Hellwig [mailto:h...@lst.de]
> Sent: Friday, February 5, 2021 10:21 PM
> To: Song Bao Hua (Barry Song)
> Cc: m.szyprow...@samsung.com; h...@lst.de; robin.mur...@arm.com;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> linux...@openeuler.org
> Subject: Re: [PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting
>
> On Fri, Feb 05, 2021 at 03:00:35PM +1300, Barry Song wrote:
> > +	__u32 dma_trans_ns;	/* time for DMA transmission in ns */
> >  	__u64 expansion[10];	/* For future use */
>
> We need to keep the struct size, so the expansion field needs to
> shrink by the equivalent amount of data that is added in dma_trans_ns.

Unfortunately I didn't put a rsv u32 field after dma_dir in the original
patch. There were five 32-bit fields before expansion[]:

struct map_benchmark {
	__u64 avg_map_100ns;	/* average map latency in 100ns */
	__u64 map_stddev;	/* standard deviation of map latency */
	__u64 avg_unmap_100ns;	/* as above */
	__u64 unmap_stddev;
	__u32 threads;		/* how many threads will do map/unmap in parallel */
	__u32 seconds;		/* how long the test will last */
	__s32 node;		/* which numa node this benchmark will run on */
	__u32 dma_bits;		/* DMA addressing capability */
	__u32 dma_dir;		/* DMA data direction */
	__u64 expansion[10];	/* For future use */
};

My bad. That was really silly. I should have done the below from the very
beginning:

struct map_benchmark {
	__u64 avg_map_100ns;	/* average map latency in 100ns */
	__u64 map_stddev;	/* standard deviation of map latency */
	__u64 avg_unmap_100ns;	/* as above */
	__u64 unmap_stddev;
	__u32 threads;		/* how many threads will do map/unmap in parallel */
	__u32 seconds;		/* how long the test will last */
	__s32 node;		/* which numa node this benchmark will run on */
	__u32 dma_bits;		/* DMA addressing capability */
	__u32 dma_dir;		/* DMA data direction */
	__u32 rsv;
	__u64 expansion[10];	/* For future use */
};

So on a 64-bit system, this patch doesn't change the length of the struct,
as the newly added u32 just fills the gap between dma_dir and expansion.
On a 32-bit system, this patch increases the length by 4 bytes.

I can keep the struct size unchanged by changing the struct to

struct map_benchmark {
	__u64 avg_map_100ns;	/* average map latency in 100ns */
	__u64 map_stddev;	/* standard deviation of map latency */
	__u64 avg_unmap_100ns;	/* as above */
	__u64 unmap_stddev;
	__u32 threads;		/* how many threads will do map/unmap in parallel */
	__u32 seconds;		/* how long the test will last */
	__s32 node;		/* which numa node this benchmark will run on */
	__u32 dma_bits;		/* DMA addressing capability */
	__u32 dma_dir;		/* DMA data direction */
	__u32 dma_trans_ns;	/* time for DMA transmission in ns */

	__u32 exp;		/* For future use */
	__u64 expansion[9];	/* For future use */
};

But the code is really ugly now.

Thanks
Barry
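For illustration only, a userspace sketch of the padding Barry describes (stdint types stand in for the __u* ones, and the four u64 statistics are collapsed into an array): on an ABI where 64-bit members are 8-byte aligned there is a 4-byte hole after dma_dir, so adding dma_trans_ns does not grow the struct there.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct layout_old {		/* 5.11-rc layout, without dma_trans_ns */
	uint64_t stats[4];
	uint32_t threads, seconds;
	int32_t  node;
	uint32_t dma_bits, dma_dir;
	uint64_t expansion[10];
};

struct layout_new {		/* dma_trans_ns dropped into the hole */
	uint64_t stats[4];
	uint32_t threads, seconds;
	int32_t  node;
	uint32_t dma_bits, dma_dir;
	uint32_t dma_trans_ns;
	uint64_t expansion[10];
};

int main(void)
{
	/* on x86-64 both print size 136 with expansion at offset 56 */
	printf("old: size %zu, expansion at %zu\n",
	       sizeof(struct layout_old), offsetof(struct layout_old, expansion));
	printf("new: size %zu, expansion at %zu\n",
	       sizeof(struct layout_new), offsetof(struct layout_new, expansion));
	return 0;
}

On a 32-bit ABI such as i386, where 64-bit members only need 4-byte alignment, the hole does not exist and the same addition grows the struct by four bytes, which is the incompatibility being discussed.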
Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU
Hi Keqian,

On Fri, Feb 05, 2021 at 05:13:50PM +0800, Keqian Zhu wrote:
> > We need to accommodate the firmware override as well if we need this to be
> > meaningful. Jean-Philippe is already carrying a suitable patch in the SVA
> > stack[1].
> Robin, thanks for pointing it out.
>
> Jean, I see that the IORT HTTU flag overrides the hardware register info
> unconditionally. I have some concerns about it:
>
> If the override flag has HTTU but the hardware doesn't support it, then the
> driver will use this feature but receive access faults or permission faults
> from the SMMU unexpectedly.
> 1) If IOPF is not supported, the kernel cannot work normally.
> 2) If IOPF is supported, the kernel will perform useless actions, such as
> HTTU-based DMA dirty tracking (this series).
>
> As the IORT spec doesn't give an explicit explanation for the HTTU override,
> can we interpret it as a mask for the HTTU-related hardware register field?

To me "Overrides the value of SMMU_IDR0.HTTU" is clear enough: disregard
the value of SMMU_IDR0.HTTU and use the one specified by IORT instead.
And that goes both ways, since there is no validity mask for the IORT
value: if there is an IORT table, always ignore SMMU_IDR0.HTTU.

That's how the SMMU driver implements the COHACC bit, which has the same
wording in IORT. So I think we should implement HTTU the same way.

One complication is that there is no equivalent override for device tree.
I think it can be added later if necessary, because unlike IORT it can be
tri-state (property not present, overridden positive, overridden negative).

Thanks,
Jean
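Purely to illustrate the two readings debated above; everything here (the names, the 0/1/2 encodings, the helpers) is a stand-in sketch, not the arm-smmu-v3 driver's code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FEAT_HTTU_HA	(1u << 0)	/* hardware access-flag update */
#define FEAT_HTTU_HD	(1u << 1)	/* hardware dirty-state update */

static uint32_t httu_to_features(unsigned int httu)
{
	switch (httu) {
	case 1:  return FEAT_HTTU_HA;
	case 2:  return FEAT_HTTU_HA | FEAT_HTTU_HD;
	default: return 0;
	}
}

/* Reading 1: the IORT value replaces SMMU_IDR0.HTTU outright (COHACC-style) */
static uint32_t features_override(bool have_iort, unsigned int iort_httu,
				  unsigned int idr0_httu)
{
	return httu_to_features(have_iort ? iort_httu : idr0_httu);
}

/* Reading 2 (the question above): the IORT value only masks the hardware bits */
static uint32_t features_masked(unsigned int iort_httu, unsigned int idr0_httu)
{
	return httu_to_features(iort_httu) & httu_to_features(idr0_httu);
}

int main(void)
{
	/* firmware claims HA+HD (2) while the hardware reports HA only (1) */
	printf("override: %#x, masked: %#x\n",
	       (unsigned int)features_override(true, 2, 1),
	       (unsigned int)features_masked(2, 1));
	return 0;
}

With the override reading the mismatched HD bit is trusted anyway (0x3), which is exactly the fault scenario raised; with the mask reading only HA survives (0x1).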
Re: [PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single
On Thu, Feb 04, 2021 at 10:12:31PM +0000, Robin Murphy wrote:
>> -	mask = dma_get_seg_boundary(hwdev);
>> +	if (boundary_mask + 1 == ~0UL)
>
> Either "mask == ~0UL" or "mask + 1 == 0", surely?

I switched back and forth a few times and ended up with the broken
variant in the middle. Fixed.
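A throwaway check of the three spellings, for anyone who wants to convince themselves which one is the odd one out (sketch only):

#include <stdio.h>

int main(void)
{
	unsigned long mask = ~0UL;	/* "no boundary" segment mask */

	printf("mask == ~0UL     -> %d\n", mask == ~0UL);	/* 1: correct */
	printf("mask + 1 == 0    -> %d\n", mask + 1 == 0);	/* 1: correct */
	printf("mask + 1 == ~0UL -> %d\n", mask + 1 == ~0UL);	/* 0: the broken variant */
	return 0;
}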
Re: [PATCH 3/8] swiotlb: factor out a nr_slots helper
On Thu, Feb 04, 2021 at 10:09:02PM +0000, Robin Murphy wrote:
>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> index 838dbad10ab916..0c0b81799edbdb 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -194,6 +194,11 @@ static inline unsigned long io_tlb_offset(unsigned long val)
>>  	return val & (IO_TLB_SEGSIZE - 1);
>>  }
>> +static unsigned long nr_slots(u64 val)
>> +{
>> +	return ALIGN(val, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
>
> Would DIV_ROUND_UP(val, 1 << IO_TLB_SHIFT) be even clearer?

Not sure it is all that much cleaner, but it does fit a common pattern,
so I'll switch to that.
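A quick sketch showing that the two spellings agree for slot counting (IO_TLB_SHIFT == 11 assumed, as in mainline; the expressions are local stand-ins for the kernel's ALIGN() and DIV_ROUND_UP() macros):

#include <stdint.h>
#include <stdio.h>

#define IO_TLB_SHIFT	11
#define IO_TLB_SIZE	(1ULL << IO_TLB_SHIFT)

int main(void)
{
	uint64_t sizes[] = { 1, IO_TLB_SIZE - 1, IO_TLB_SIZE,
			     IO_TLB_SIZE + 1, 123456 };

	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		uint64_t val = sizes[i];
		/* ALIGN(val, IO_TLB_SIZE) >> IO_TLB_SHIFT */
		uint64_t a = ((val + IO_TLB_SIZE - 1) & ~(IO_TLB_SIZE - 1)) >> IO_TLB_SHIFT;
		/* DIV_ROUND_UP(val, IO_TLB_SIZE) */
		uint64_t b = (val + IO_TLB_SIZE - 1) / IO_TLB_SIZE;

		printf("%llu bytes -> %llu vs %llu slots\n",
		       (unsigned long long)val,
		       (unsigned long long)a,
		       (unsigned long long)b);
	}
	return 0;
}

Both round a byte count up to whole 2 KiB slots; the DIV_ROUND_UP form just states that intent more directly, which is the point of the suggestion.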
Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()
Hi Guillaume,

On Thu, Feb 04, 2021 at 09:24:23PM -0800, Nicolin Chen wrote:
> > Please let us know if you need any help debugging this issue or
> > to try a fix on this platform.
>
> Yes, I don't have any Tegra124 platform to run. It'd be very nice
> if you can run some debugging patch (I can provide you) and a fix
> after I root cause the issue.

Would it be possible for you to run with the given debugging patch?
It'd be nicer if I can get both logs of the vanilla kernel (failing)
and the commit-reverted version (passing), each with this patch applied.

Thanks in advance!
Nicolin

>From 80f288d7101101fca0412c5c200cea7e203a675d Mon Sep 17 00:00:00 2001
From: Nicolin Chen
Date: Fri, 5 Feb 2021 01:41:07 -0800
Subject: [PATCH] iommu: debug tegra-smmu

Signed-off-by: Nicolin Chen
---
 drivers/iommu/tegra-smmu.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 4a3f095a1c26..796b7df54b8f 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -363,6 +363,7 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, unsigned int swgroup,
 		value |= SMMU_ASID_VALUE(asid);
 		value |= SMMU_ASID_ENABLE;
 		smmu_writel(smmu, value, group->reg);
+		pr_alert("%s, swgroup %d: writing %x to reg1 %x\n", __func__, swgroup, value, group->reg);
 	} else {
 		pr_warn("%s group from swgroup %u not found\n", __func__,
 			swgroup);
@@ -379,6 +380,7 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, unsigned int swgroup,
 	value = smmu_readl(smmu, client->smmu.reg);
 	value |= BIT(client->smmu.bit);
 	smmu_writel(smmu, value, client->smmu.reg);
+	pr_alert("%s, swgroup %d: writing %x to reg2 %x\n", __func__, swgroup, value, client->smmu.reg);
 }
@@ -491,13 +493,19 @@ static int tegra_smmu_attach_dev(struct iommu_domain *domain,
 	unsigned int index;
 	int err;
 
+	dev_alert(dev, "---%s: smmu %s\n", __func__, smmu ? "valid" : "NULL");
+	dump_stack();
 	if (!fwspec)
 		return -ENOENT;
 
+	dev_alert(dev, "---%s: fwspec->num_ids %d\n", __func__, fwspec->num_ids);
 	for (index = 0; index < fwspec->num_ids; index++) {
 		err = tegra_smmu_as_prepare(smmu, as);
-		if (err)
+		if (err) {
+			dev_err(dev, "failed to prepare as(%d) for fwspec %d",
+				as->id, fwspec->ids[index]);
 			goto disable;
+		}
 
 		tegra_smmu_enable(smmu, fwspec->ids[index], as->id);
 	}
@@ -805,6 +813,8 @@ static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 	if (!smmu)
 		return ERR_PTR(-ENODEV);
 
+	dev_alert(dev, "%s, %d\n", __func__, __LINE__);
+	dump_stack();
 	return &smmu->iommu;
 }
@@ -904,6 +914,8 @@ static int tegra_smmu_of_xlate(struct device *dev,
 	dev_iommu_priv_set(dev, mc->smmu);
 
+	dev_alert(dev, "---%s: id %d", __func__, id);
+	dump_stack();
 	return iommu_fwspec_add_ids(dev, &id, 1);
 }
-- 
2.17.1
Re: [PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting
On Fri, Feb 05, 2021 at 03:00:35PM +1300, Barry Song wrote:
> +	__u32 dma_trans_ns;	/* time for DMA transmission in ns */
>  	__u64 expansion[10];	/* For future use */

We need to keep the struct size, so the expansion field needs to
shrink by the equivalent amount of data that is added in dma_trans_ns.
Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU
Hi Robin and Jean,

On 2021/2/5 3:50, Robin Murphy wrote:
> On 2021-01-28 15:17, Keqian Zhu wrote:
>> From: jiangkunkun
>>
>> The SMMU which supports HTTU (Hardware Translation Table Update) can
>> update the access flag and the dirty state of TTD by hardware. It is
>> essential to track dirty pages of DMA.
>>
>> This adds feature detection; no functional change.
>>
>> Co-developed-by: Keqian Zhu
>> Signed-off-by: Kunkun Jiang
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8
>>  include/linux/io-pgtable.h | 1 +
>>  3 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 8ca7415d785d..0f0fe71cc10d 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>>  		.pgsize_bitmap	= smmu->pgsize_bitmap,
>>  		.ias		= ias,
>>  		.oas		= oas,
>> +		.httu_hd	= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
>>  		.coherent_walk	= smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>  		.tlb		= &arm_smmu_flush_ops,
>>  		.iommu_dev	= smmu->dev,
>> @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>>  	if (reg & IDR0_HYP)
>>  		smmu->features |= ARM_SMMU_FEAT_HYP;
>> +	switch (FIELD_GET(IDR0_HTTU, reg)) {
>
> We need to accommodate the firmware override as well if we need this to be
> meaningful. Jean-Philippe is already carrying a suitable patch in the SVA
> stack[1].

Robin, thanks for pointing it out.

Jean, I see that the IORT HTTU flag overrides the hardware register info
unconditionally. I have some concerns about it:

If the override flag has HTTU but the hardware doesn't support it, then the
driver will use this feature but receive access faults or permission faults
from the SMMU unexpectedly.
1) If IOPF is not supported, the kernel cannot work normally.
2) If IOPF is supported, the kernel will perform useless actions, such as
HTTU-based DMA dirty tracking (this series).

As the IORT spec doesn't give an explicit explanation for the HTTU override,
can we interpret it as a mask for the HTTU-related hardware register field?
So the logic becomes:

	smmu->feature = HTTU override & IDR0_HTTU;

>
>> +	case IDR0_HTTU_NONE:
>> +		break;
>> +	case IDR0_HTTU_HA:
>> +		smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
>> +		break;
>> +	case IDR0_HTTU_HAD:
>> +		smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
>> +		smmu->features |= ARM_SMMU_FEAT_HTTU_HD;
>> +		break;
>> +	default:
>> +		dev_err(smmu->dev, "unknown/unsupported HTTU!\n");
>> +		return -ENXIO;
>> +	}
>> +
>>  	/*
>>  	 * The coherency feature as set by FW is used in preference to the ID
>>  	 * register, but warn on mismatch.
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 96c2e9565e00..e91bea44519e 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -33,6 +33,10 @@
>>  #define IDR0_ASID16		(1 << 12)
>>  #define IDR0_ATS		(1 << 10)
>>  #define IDR0_HYP		(1 << 9)
>> +#define IDR0_HTTU		GENMASK(7, 6)
>> +#define IDR0_HTTU_NONE		0
>> +#define IDR0_HTTU_HA		1
>> +#define IDR0_HTTU_HAD		2
>>  #define IDR0_COHACC		(1 << 4)
>>  #define IDR0_TTF		GENMASK(3, 2)
>>  #define IDR0_TTF_AARCH64	2
>> @@ -286,6 +290,8 @@
>>  #define CTXDESC_CD_0_TCR_TBI0	(1ULL << 38)
>>  #define CTXDESC_CD_0_AA64	(1UL << 41)
>> +#define CTXDESC_CD_0_HD		(1UL << 42)
>> +#define CTXDESC_CD_0_HA		(1UL << 43)
>>  #define CTXDESC_CD_0_S		(1UL << 44)
>>  #define CTXDESC_CD_0_R		(1UL << 45)
>>  #define CTXDESC_CD_0_A		(1UL << 46)
>> @@ -604,6 +610,8 @@ struct arm_smmu_device {
>>  #define ARM_SMMU_FEAT_RANGE_INV	(1 << 15)
>>  #define ARM_SMMU_FEAT_BTM	(1 << 16)
>>  #define ARM_SMMU_FEAT_SVA	(1 << 17)
>> +#define ARM_SMMU_FEAT_HTTU_HA	(1 << 18)
>> +#define ARM_SMMU_FEAT_HTTU_HD	(1 << 19)
>>  	u32	features;
>>  #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
>> index ea727eb1a1a9..1a00ea8562c7 100644
>> --- a/include/linux/io-pgtable.h
>> +++ b/include/linux/io-pgtable.h
>> @@ -97,6 +97,7 @@ struct io_pgtable_cfg {
>>  	unsigned long	pgsize_bitmap;
>>  	unsigned int	ias;
>>  	unsigned int	oas;
>> +	bool		httu_hd;
>
> This is very specific to the AArch64 stage 1 format, not a generic capability
Re: [PATCH][next][V2] iommu/mediatek: Fix unsigned domid comparison with less than zero
On Thu, Feb 04, 2021 at 03:00:01PM +0000, Colin King wrote:
>  drivers/iommu/mtk_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.
Re: [PATCH v2 0/2] VMD MSI Remapping Bypass
On Thu, Feb 04, 2021 at 12:09:04PM -0700, Jon Derrick wrote:
> Jon Derrick (2):
>   iommu/vt-d: Use Real PCI DMA device for IRTE
>   PCI: vmd: Disable MSI/X remapping when possible
>
>  drivers/iommu/intel/irq_remapping.c |  3 +-
>  drivers/pci/controller/vmd.c        | 60 +++--
>  2 files changed, 50 insertions(+), 13 deletions(-)

This probably goes via Bjorn's tree, so

Acked-by: Joerg Roedel
Re: [RFC 2/3] arm64/hugetlb: Enable HUGETLB_PAGE_SIZE_VARIABLE
On 2/5/21 1:50 PM, David Hildenbrand wrote:
> On 04.02.21 08:01, Anshuman Khandual wrote:
>> MAX_ORDER which invariably depends on FORCE_MAX_ZONEORDER can be a variable
>> for a given page size, depending on whether TRANSPARENT_HUGEPAGE is enabled
>> or not. In certain page size and THP combinations HUGETLB_PAGE_ORDER can be
>> greater than MAX_ORDER, making it unusable as pageblock_order.
>
> Just so I understand correctly, this does not imply that we have THP that
> exceed the pageblock size / MAX_ORDER size, correct?

Correct. MAX_ORDER gets incremented when THP is enabled.

config FORCE_MAX_ZONEORDER
	int
	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
	default "12" if (ARM64_16K_PAGES && TRANSPARENT_HUGEPAGE)
	default "11"
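To see why the Kconfig above matters for pageblock_order, here is a back-of-the-envelope sketch; the MAX_ORDER defaults are the ones quoted in the Kconfig fragment, while the PAGE_SHIFT/PMD_SHIFT values are the usual arm64 geometry and are my assumption, not numbers stated in this thread:

#include <stdio.h>

struct cfg {
	const char *pagesize;
	int page_shift;		/* PAGE_SHIFT */
	int pmd_shift;		/* PMD_SHIFT == HPAGE_SHIFT for PMD hugepages */
	int max_order_nothp;	/* FORCE_MAX_ZONEORDER without THP */
	int max_order_thp;	/* FORCE_MAX_ZONEORDER with THP */
};

int main(void)
{
	struct cfg cfgs[] = {
		{ "4K",  12, 21, 11, 11 },
		{ "16K", 14, 25, 11, 12 },
		{ "64K", 16, 29, 11, 14 },
	};

	for (unsigned int i = 0; i < sizeof(cfgs) / sizeof(cfgs[0]); i++) {
		int hugetlb_order = cfgs[i].pmd_shift - cfgs[i].page_shift;

		printf("%s pages: HUGETLB_PAGE_ORDER=%d, usable as pageblock_order: "
		       "no-THP(%d)=%s THP(%d)=%s\n",
		       cfgs[i].pagesize, hugetlb_order,
		       cfgs[i].max_order_nothp,
		       hugetlb_order < cfgs[i].max_order_nothp ? "yes" : "no",
		       cfgs[i].max_order_thp,
		       hugetlb_order < cfgs[i].max_order_thp ? "yes" : "no");
	}
	return 0;
}

With 16K or 64K pages and THP disabled, the PMD hugepage order (11 or 13) no longer fits below MAX_ORDER, which is why pageblock_order has to become a runtime variable instead of the compile-time constant HUGETLB_PAGE_ORDER.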
Re: [RFC 2/3] arm64/hugetlb: Enable HUGETLB_PAGE_SIZE_VARIABLE
On 04.02.21 08:01, Anshuman Khandual wrote:
> MAX_ORDER which invariably depends on FORCE_MAX_ZONEORDER can be a variable
> for a given page size, depending on whether TRANSPARENT_HUGEPAGE is enabled
> or not. In certain page size and THP combinations HUGETLB_PAGE_ORDER can be
> greater than MAX_ORDER, making it unusable as pageblock_order.

Just so I understand correctly, this does not imply that we have THP that
exceed the pageblock size / MAX_ORDER size, correct?

> This enables HUGETLB_PAGE_SIZE_VARIABLE making pageblock_order a variable
> rather than the compile time constant HUGETLB_PAGE_ORDER which could break
> MAX_ORDER rule for certain configurations.
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual
> ---
>  arch/arm64/Kconfig | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 175914f2f340..c4acf8230f20 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1918,6 +1918,10 @@ config ARCH_ENABLE_THP_MIGRATION
>  	def_bool y
>  	depends on TRANSPARENT_HUGEPAGE
>
> +config HUGETLB_PAGE_SIZE_VARIABLE
> +	def_bool y
> +	depends on HUGETLB_PAGE
> +
>  menu "Power management options"
>
>  source "kernel/power/Kconfig"

-- 
Thanks,

David / dhildenb