RE: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration
Good effort to isolate bus config from smmu drivers. Reviewed-By: Krishna Reddy I have an orthogonal question here. Can the following code handle the case, where different buses have different type of SMMU instances(like one bus has SMMUv2 and another bus has SMMUv3)? If it need to handle the above case, can the smmu device bus be matched with specific bus here and ops set only for that bus? > + for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) { > + struct bus_type *bus = iommu_buses[i]; > + const struct iommu_ops *bus_ops = bus->iommu_ops; > + int err; > + > + WARN_ON(bus_ops && bus_ops != ops); > + bus->iommu_ops = ops; > + err = bus_iommu_probe(bus); > + if (err) { > + bus_for_each_dev(bus, NULL, iommu, > remove_iommu_group); > + bus->iommu_ops = bus_ops; > + return err; > + } > + } -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [Patch v2] iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu
> Tegra194 and Tegra234 SoCs have the erratum that causes walk cache entries to > not be invalidated correctly. The problem is that the walk cache index > generated > for IOVA is not same across translation and invalidation requests. This is > leading > to page faults when PMD entry is released during unmap and populated with > new PTE table during subsequent map request. Disabling large page mappings > avoids the release of PMD entry and avoid translations seeing stale PMD entry > in > walk cache. > Fix this by limiting the page mappings to PAGE_SIZE for Tegra194 and > Tegra234 devices. This is recommended fix from Tegra hardware design team. > > Co-developed-by: Pritesh Raithatha > Signed-off-by: Pritesh Raithatha > Signed-off-by: Ashish Mhetre > --- > Changes in v2: > - Using init_context() to override pgsize_bitmap instead of new function > > drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c | 30 > > 1 file changed, 30 insertions(+) > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c > b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c > index 01e9b50b10a1..87bf522b9d2e 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c > @@ -258,6 +258,34 @@ static void nvidia_smmu_probe_finalize(struct > arm_smmu_device *smmu, struct devi > dev_name(dev), err); > } > > +static int nvidia_smmu_init_context(struct arm_smmu_domain > *smmu_domain, > + struct io_pgtable_cfg *pgtbl_cfg, > + struct device *dev) > +{ > + struct arm_smmu_device *smmu = smmu_domain->smmu; > + const struct device_node *np = smmu->dev->of_node; > + > + /* > + * Tegra194 and Tegra234 SoCs have the erratum that causes walk > cache > + * entries to not be invalidated correctly. The problem is that the walk > + * cache index generated for IOVA is not same across translation and > + * invalidation requests. This is leading to page faults when PMD entry > + * is released during unmap and populated with new PTE table during > + * subsequent map request. Disabling large page mappings avoids the > + * release of PMD entry and avoid translations seeing stale PMD entry in > + * walk cache. > + * Fix this by limiting the page mappings to PAGE_SIZE on Tegra194 and > + * Tegra234. > + */ > + if (of_device_is_compatible(np, "nvidia,tegra234-smmu") || > + of_device_is_compatible(np, "nvidia,tegra194-smmu")) { > + smmu->pgsize_bitmap = PAGE_SIZE; > + pgtbl_cfg->pgsize_bitmap = smmu->pgsize_bitmap; > + } > + > + return 0; > +} > + > static const struct arm_smmu_impl nvidia_smmu_impl = { > .read_reg = nvidia_smmu_read_reg, > .write_reg = nvidia_smmu_write_reg, > @@ -268,10 +296,12 @@ static const struct arm_smmu_impl > nvidia_smmu_impl = { > .global_fault = nvidia_smmu_global_fault, > .context_fault = nvidia_smmu_context_fault, > .probe_finalize = nvidia_smmu_probe_finalize, > + .init_context = nvidia_smmu_init_context, > }; > > static const struct arm_smmu_impl nvidia_smmu_single_impl = { > .probe_finalize = nvidia_smmu_probe_finalize, > + .init_context = nvidia_smmu_init_context, > }; > Reviewed-by: Krishna Reddy -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Eric, > This is based on Jean-Philippe's > [PATCH v14 00/10] iommu: I/O page faults for SMMUv3 > https://www.spinics.net/lists/arm-kernel/msg886518.html > (including the patches that were not pulled for 5.13) > Jean's patches have been merged to v5.14. Do you anticipate IOMMU/VFIO part patches getting into upstream kernel soon? -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] iommu/arm: Cleanup resources in case of probe error path
> + > +err_pm_disable: > + pm_runtime_disable(dev); > return ret; > } Should it be pm_runtime_force_suspend()? qcom_iommu_device_remove() doesn't use pm_runtime_disable(dev). 875 static int qcom_iommu_device_remove(struct platform_device *pdev) 876 { ... 881 >---pm_runtime_force_suspend(&pdev->dev); 882 >---platform_set_drvdata(pdev, NULL); 883 >---iommu_device_sysfs_remove(&qcom_iommu->iommu); 884 >---iommu_device_unregister(&qcom_iommu->iommu); ... 887 } -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list
> Instead of flush_ops in init_context hook, perhaps a io_pgtable quirk since > this is > related to tlb, probably a bad name but IO_PGTABLE_QUIRK_TLB_INV which will > be set in init_context impl hook and the prev condition in > io_pgtable_tlb_flush_walk() > becomes something like below. Seems very minimal and neat instead of poking > into tlb_flush_walk functions or touching dma strict with some flag? > > if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT || > iop->cfg.quirks & IO_PGTABLE_QUIRK_TLB_INV) { > iop->cfg.tlb->tlb_flush_all(iop->cookie); > return; > } Can you name it as IO_PGTABLE_QUIRK_TLB_INV_ASID or IO_PGTABLE_QUIRK_TLB_INV_ALL_ASID? -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v3 3/9] iommu/arm-smmu: Implement ->probe_finalize()
> if (smmu->impl->probe_finalize) The above is the issue. It should be updated as below similar to other instances impl callbacks. if (smmu->impl && smmu->impl->probe_finalize) -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list
> Right but we won't know until we profile the specific usecases or try them in > generic workload to see if they affect the performance. Sure, over > invalidation is > a concern where multiple buffers can be mapped to same context and the cache > is not usable at the time for lookup and such but we don't do it for small > buffers > and only for large buffers which means thousands of TLB entry mappings in > which case TLBIASID is preferred (note: I mentioned the HW team > recommendation to use it for anything greater than 128 TLB entries) in my > earlier reply. And also note that we do this only for partial walk flush, we > are not > arbitrarily changing all the TLBIs to ASID based. Most of the heavy bw use cases does involve processing larger buffers. When the physical memory is allocated dis-contiguously at page_size (let's use 4KB here) granularity, each aligned 2MB chunks IOVA unmap would involve performing a TLBIASID as 2MB is not a leaf. Essentially, It happens all the time during large buffer unmaps and potentially impact active traffic on other large buffers. Depending on how much latency HW engines can absorb, the overflow/underflow issues for ISO engines can be sporadic and vendor specific. Performing TLBIASID as default for all SoCs is not a safe operation. > I am no camera expert but from what the camera team mentioned is that there > is a thread which frees memory(large unused memory buffers) periodically which > ends up taking around 100+ms and causing some camera test failures with > frame drops. Parallel efforts are already being made to optimize this usage of > thread but as I mentioned previously, this is *not a camera specific*, lets > say > someone else invokes such large unmaps, it's going to face the same issue. >From the above, It doesn't look like the root cause of frame drops is fully >understood. Why is 100+ms delay causing camera frame drop? Is the same thread submitting the buffers to camera after unmap is complete? If not, how is the unmap latency causing issue here? > > If unmap is queued and performed on a back ground thread, would it > > resolve the frame drops? > > Not sure I understand what you mean by queuing on background thread but with > that or not, we still do the same number of TLBIs and hop through > iommu->io-pgtable->arm-smmu to perform the the unmap, so how will that > help? I mean adding the unmap requests into a queue and processing them from a different thread. It is not to reduce the TLBIs. But, not to block subsequent buffer allocation, IOVA map requests, if they are being requested from same thread that is performing unmap. If unmap is already performed from a different thread, then the issue still need to be root caused to understand it fully. Check for any serialization issues. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 1/2] iommu: Fix race condition during default domain allocation
> > + mutex_lock(&group->mutex); > > iommu_alloc_default_domain(group, dev); > > + mutex_unlock(&group->mutex); > > It feels wrong to serialise this for everybody just to cater for systems with > aliasing SIDs between devices. Serialization is limited to devices in the same group. Unless devices share SID, they wouldn't be in same group. > Can you provide some more information about exactly what the h/w > configuration is, and the callstack which exhibits the race, please? The failure is an after effect and is a page fault. Don't have a failure call stack here. Ashish has traced it through print messages and he can provide them. >From the prints messages, The following was observed in page fault case: Device1: iommu_probe_device() --> iommu_alloc_default_domain() --> iommu_group_alloc_default_domain() --> __iommu_attach_device(group->default_domain) Device2: iommu_probe_device() --> iommu_alloc_default_domain() --> iommu_group_alloc_default_domain() --> __iommu_attach_device(group->default_domain) Both devices(with same SID) are entering into iommu_group_alloc_default_domain() function and each one getting attached to a different group->default_domain as the second one overwrites group->default_domain after the first one attaches to group->default_domain it has created. SMMU would be setup to use first domain for the context page table. Whereas all the dma map/unamp requests from second device would be performed on a domain that is not used by SMMU for context translations and IOVA (not mapped in first domain) accesses from second device lead to page faults. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list
Hi Sai, > >> > No, the unmap latency is not just in some test case written, the > >> > issue is very real and we have workloads where camera is reporting > >> > frame drops because of this unmap latency in the order of 100s of > milliseconds. > Not exactly, this issue is not specific to camera. If you look at the numbers > in the > commit text, even for the test device its the same observation. It depends on > the buffer size we are unmapping which affects the number of TLBIs issue. I am > not aware of any such HW side bw issues for camera specifically on QCOM > devices. It is clear that reducing number of TLBIs reduces the umap API latency. But, It is at the expense of throwing away valid tlb entries. Quantifying the impact of arbitrary invalidation of valid tlb entries at context level is not straight forward and use case dependent. The side-effects might be rare or won't be known until they are noticed. Can you provide more details on How the unmap latency is causing camera to drop frames? Is unmap performed in the perf path? If unmap is queued and performed on a back ground thread, would it resolve the frame drops? -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list
> > No, the unmap latency is not just in some test case written, the issue > > is very real and we have workloads where camera is reporting frame > > drops because of this unmap latency in the order of 100s of milliseconds. > > And hardware team recommends using ASID based invalidations for > > anything larger than 128 TLB entries. So yes, we have taken note of > > impacts here before going this way and hence feel more inclined to > > make this qcom specific if required. Seems like the real issue here is not the unmap API latency. It should be the high number of back to back SMMU TLB invalidate register writes that is resulting in lower ISO BW to Camera and overflow. Isn't it? Even Tegra186 SoC has similar issue and HW team recommended to rate limit the number of back to back SMMU tlb invalidate registers writes. The subsequent Tegra194 SoC has a dedicated SMMU for ISO clients to avoid the impact of TLB invalidates from NISO clients on ISO BW. >> Thinking some more, I >> wonder if the Tegra folks might have an opinion to add here, given >> that their multiple-SMMU solution was seemingly about trying to get >> enough TLB and pagetable walk bandwidth in the first place? While it is good to reduce the number of tlb register writes, Flushing all TLB entries at context granularity arbitrarily can have negative impact on active traffic and BW. I don't have much data on possible impact at this point. Can the flushing at context granularity be made a quirk than performing it as default? -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions
Hi Dmitry, > Thank you for the answer. Could you please give more information about: > 1) Are you on software or hardware team, or both? I am in the software team and has contributed to initial Tegra SMMU driver in the downstream along with earlier team member Hiroshi Doyu. > 2) Is SMMU a third party IP or developed in-house? Tegra SMMU is developed in-house. > 3) Do you have a direct access to HDL sources? Are you 100% sure that > hardware does what you say? It was discussed with Hardware team before and again today as well. Enabling ASID for display engine while it continues to access the buffer memory is a safe operation. As per HW team, The only side-effect that can happen is additional latency to transaction as SMMU caches get warmed up. > 4) What happens when CPU writes to ASID register? Does SMMU state machine > latch ASID status (making it visible) only at a single "safe" point? MC makes a decision on routing transaction through either SMMU page tables or bypassing based on the ASID register value. It checks the ASID register only once per transaction in the pipeline. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions
> Is it always safe to enable SMMU ASID in a middle of a DMA request made by a > memory client? >From MC point of view, It is safe to enable and has been this way from many >years in downstream code for display engine. It doesn't impact the transactions that have already bypassed SMMU before enabling SMMU ASID. Transactions that are yet to pass SMMU stage would go through SMMU once SMMU ASID is enabled and visible. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
>> Did that patch cause any issue, or is it just not needed on your system? >> It fixes an hypothetical problem with the way ATS is implemented. >> Maybe I actually observed it on an old software model, I don't >> remember. Either way it's unlikely to go upstream but I'd like to know >> if I should drop it from my tree. > Had to revert same patch "mm: notify remote TLBs when dirtying a PTE" to > avoid below crash[1]. I am not sure about the cause yet. I have noticed this issue earlier with patch pointed here and root caused the issue as below. It happens after vfio_mmap request from QEMU for the PCIe device and during the access of VA when PTE access flags are updated. kvm_mmu_notifier_change_pte() --> kvm_set_spte_hve() --> kvm_set_spte_hva() --> clean_dcache_guest_page() The validation model doesn't have FWB capability supported. __clean_dcache_guest_page() attempts to perform dcache flush on pcie bar address(not a valid_pfn()) through page_address(), which doesn't have page table mapping and leads to exception. I have worked around the issue by filtering out the request if the pfn is not valid in __clean_dcache_guest_page(). As the patch wasn't posted in the community, reverted it as well. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 04/10] iommu/arm-smmu: tegra: Detect number of instances at runtime
>+static const struct arm_smmu_impl nvidia_smmu_single_impl = { >+ .probe_finalize = nvidia_smmu_probe_finalize, >+}; nvidia_smmu_probe_finalize is used before it is defined. It is defined in patch 5. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v13 00/10] iommu: I/O page faults for SMMUv3
Tested-by: Krishna Reddy Validated v13 patches in context of nested translations validation for NVMe PCIe device assigned to Guest VM. V12 patches(v13 is yet to be tested) has been tested for SVA functionality on Native OS and is functional. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
Tested-by: Krishna Reddy Validated nested translations with NVMe PCI device assigned to Guest VM. Tested with both v12 and v13 of Jean-Philippe's patches as base. > This is based on Jean-Philippe's > [PATCH v12 00/10] iommu: I/O page faults for SMMUv3 > https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/ With Jean-Philippe's V13, Patch 12 of this series has a conflict that had to be resolved manually. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part)
Tested-by: Krishna Reddy Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM and is functional. This patch series resolved the mismatch(seen with v11 patches) for VFIO_IOMMU_SET_PASID_TABLE and VFIO_IOMMU_CACHE_INVALIDATE Ioctls between linux and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" (v5.2.0-2stage-rfcv8). -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> Hi Krishna, > On 3/15/21 7:04 PM, Krishna Reddy wrote: > > Tested-by: Krishna Reddy > > > >> 1) pass the guest stage 1 configuration > > > > Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM > along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and > QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from > v5.2.0-2stage-rfcv8. > > NVMe PCIe device is functional with 2-stage translations and no issues > observed. > Thank you very much for your testing efforts. For your info, there are more > recent kernel series: > [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23) [PATCH > v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23) > > working along with QEMU RFC > [RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25) > > If you have cycles to test with those, this would be higly appreciated. Thanks Eric for the latest patches. Will validate and update. Feel free to reach out me for validating future patch sets as necessary. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (VFIO part)
Tested-by: Krishna Reddy > 1) pass the guest stage 1 configuration > 3) invalidate stage 1 related caches Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along with patch series "v13 SMMUv3 Nested Stage Setup (IOMMU part)" and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8. NVMe PCIe device is functional with 2-stage translations and no issues observed. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
Tested-by: Krishna Reddy > 1) pass the guest stage 1 configuration Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8. NVMe PCIe device is functional with 2-stage translations and no issues observed. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v10 11/13] iommu/arm-smmu-v3: Add SVA device feature
> I agree that we should let a device driver enable SVA if it supports some > form of IOPF. Right, The PCIe device has capability to handle IO page faults on its own when ATS translation request response indicates that no valid mapping exist. SMMU doesn't involve/handle page faults for ATS translation failures here. Neither the PCIe device nor the SMMU have PRI capability. > Perhaps we could extract the IOPF capability from IOMMU_DEV_FEAT_SVA, into a > new IOMMU_DEV_FEAT_IOPF feature. > Device drivers that rely on PRI or stall can first check FEAT_IOPF, then > FEAT_SVA, and enable both separately. > Enabling FEAT_SVA would require FEAT_IOPF enabled if supported. Let me try to > write this up. Looks good to me. Allowing device driver to enable FEAT_IOPF and subsequently SVA should work. Thanks! -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v10 11/13] iommu/arm-smmu-v3: Add SVA device feature
Hi Jean, > +bool arm_smmu_master_sva_supported(struct arm_smmu_master *master) { > + if (!(master->smmu->features & ARM_SMMU_FEAT_SVA)) > + return false; + > + /* SSID and IOPF support are mandatory for the moment */ > + return master->ssid_bits && arm_smmu_iopf_supported(master); } > + Tegra Next Gen SOC has arm-smmu-v3 and It doesn't have support for PRI interface. However, PCIe client device has capability to handle the page faults on its own when the ATS translation fails. The PCIe device needs SVA feature enable without PRI interface supported at arm-smmu-v3. At present, the SVA feature enable is allowed only if the smmu/client device has PRI support. There seem to be no functional reason to make pri_supported as a pre-requisite for SVA enable. Can SVA enable be supported for pri_supported not set case as well? Also, It is noticed that SVA enable on Intel doesn't need pri_supported set. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features
>> The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support >> for BTM. >> Do you have plan to get your earlier patch to handle invalidate >> notifications into upstream sometime soon? >> Can the dependency on BTM be relaxed with the patch? >> >> PATCH v9 13/13] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops >> https://www.spinics.net/lists/arm-kernel/msg825099.html >This patch (which should be in 5.11) only takes care of sending ATC >invalidations to PCIe endpoints. With this, BTM is still required to >invalidate SMMU TLBs. > However we could enable command-queue invalidation when ARM_SMMU_FEAT_BTM > isn't set. > Invalidations are still a relatively rare event so it may not be outrageously > slow. I can add a patch to my tree if you have hardware to test. Thanks! We can test the patch on emulation platform. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features
> > The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support > > for BTM. > > Do you have plan to get your earlier patch to handle invalidate > > notifications into upstream sometime soon? >Is that a limitation of the SMMU implementation, the interconnect or the >integration? It is the limitation of interconnect. The DVM messages don't reach SMMU. The BTM bit in SMMU IDR does indicate that it doesn't support BTM. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features
Hi Jean, > > Why is BTM mandated for SVA? I couldn't find this requirement in > > SMMU spec (Sorry if I missed it or this got discussed earlier). But > > if performance is the > only concern here, > > is it better just to allow it with a warning rather than limiting > > SMMUs without > BTM? > > It's a performance concern and requires to support multiple > configurations, but the spec allows it. Are there SMMUs without BTM > that need it? The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support for BTM. Do you have plan to get your earlier patch to handle invalidate notifications into upstream sometime soon? Can the dependency on BTM be relaxed with the patch? PATCH v9 13/13] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops https://www.spinics.net/lists/arm-kernel/msg825099.html -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v11 5/5] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow vendor specific implementations override default fault interrupt handlers. Update NVIDIA implementation to override the default global/context fault interrupt handlers and handle interrupts across the two ARM MMU-500s that are programmed identically. Reviewed-by: Jon Hunter Reviewed-by: Nicolin Chen Reviewed-by: Pritesh Raithatha Reviewed-by: Robin Murphy Reviewed-by: Thierry Reding Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 99 + drivers/iommu/arm-smmu.c| 17 +- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 117 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 2f55e5793d34..31368057e9be 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu) return 0; } +static irqreturn_t nvidia_smmu_global_fault_inst(int irq, +struct arm_smmu_device *smmu, +int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0); + + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR); + if (!gfsr) + return IRQ_NONE; + + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2); + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev) +{ + unsigned int inst; + irqreturn_t ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + + for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) { + irqreturn_t irq_ret; + + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + ret = IRQ_HANDLED; + } + + return ret; +} + +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1); + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx); + + fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev) +{ + int idx; + unsigned int inst; + irqreturn_t ret = IRQ_NONE; + struct arm_smmu_device *smmu; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain; + + smmu_domain = container_of(domain, struct arm_smmu_domain, domain); + smmu = smmu_domain->smmu; + + for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) { + irqreturn_t irq_ret; + + /* +* Interrupt line is shared between all contexts. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, +idx, inst); + if (irq_ret == IRQ_HANDLED) + ret = IRQ_HANDLED; + } + } + + return ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nvidia_smmu_read_reg, .write_reg = nvidia_smmu_write_reg, @@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nvidia_smmu_write_reg64, .reset = nvidia_smmu_reset, .tlb_sync = nvidia_sm
[PATCH v11 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU. Reviewed-by: Jon Hunter Reviewed-by: Rob Herring Reviewed-by: Robin Murphy Signed-off-by: Krishna Reddy --- .../devicetree/bindings/iommu/arm,smmu.yaml | 25 ++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index 93fb9fe068b9..503160a7b9a0 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -44,6 +44,11 @@ properties: items: - const: marvell,ap806-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that program two ARM MMU-500s identically +items: + - enum: + - nvidia,tegra194-smmu + - const: nvidia,smmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 @@ -61,7 +66,8 @@ properties: - cavium,smmu-v2 reg: -maxItems: 1 +minItems: 1 +maxItems: 2 '#global-interrupts': description: The number of global interrupts exposed by the device. @@ -144,6 +150,23 @@ required: additionalProperties: false +allOf: + - if: + properties: +compatible: + contains: +enum: + - nvidia,tegra194-smmu +then: + properties: +reg: + minItems: 2 + maxItems: 2 +else: + properties: +reg: + maxItems: 1 + examples: - |+ /* SMMU with stream matching or stream indexing */ -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v11 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init
ioremap smmu mmio region before calling into implementation init. This is necessary to allow mapped address available during vendor specific implementation init. Reviewed-by: Jon Hunter Reviewed-by: Nicolin Chen Reviewed-by: Pritesh Raithatha Reviewed-by: Robin Murphy Reviewed-by: Thierry Reding Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index cdd15ead9bc4..de520115d3df 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -2123,10 +2123,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev) if (err) return err; - smmu = arm_smmu_impl_init(smmu); - if (IS_ERR(smmu)) - return PTR_ERR(smmu); - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); ioaddr = res->start; smmu->base = devm_ioremap_resource(dev, res); @@ -2138,6 +2134,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev) */ smmu->numpage = resource_size(res); + smmu = arm_smmu_impl_init(smmu); + if (IS_ERR(smmu)) + return PTR_ERR(smmu); + num_irqs = 0; while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) { num_irqs++; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v11 0/5] NVIDIA ARM SMMU Implementation
Changes in v11: Addressed Rob comment on DT binding patch to set min/maxItems of reg property in else part. Rebased on top of https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates. Changes in v10: Perform SMMU base ioremap before calling implementation init. Check for Global faults across both ARM MMU-500s during global interrupt. Check for context faults across all contexts of both ARM MMU-500s during context fault interrupt. Add new DT binding nvidia,smmu-500 for NVIDIA implementation. https://lkml.org/lkml/2020/7/8/57 v9 - https://lkml.org/lkml/2020/6/30/1282 v8 - https://lkml.org/lkml/2020/6/29/2385 v7 - https://lkml.org/lkml/2020/6/28/347 v6 - https://lkml.org/lkml/2020/6/4/1018 v5 - https://lkml.org/lkml/2020/5/21/1114 v4 - https://lkml.org/lkml/2019/10/30/1054 v3 - https://lkml.org/lkml/2019/10/18/1601 v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (5): iommu/arm-smmu: move TLB timeout and spin count macros iommu/arm-smmu: ioremap smmu mmio region before implementation init iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage dt-bindings: arm-smmu: add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks .../devicetree/bindings/iommu/arm,smmu.yaml | 25 +- MAINTAINERS | 2 + drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 278 ++ drivers/iommu/arm-smmu.c | 29 +- drivers/iommu/arm-smmu.h | 6 + 7 files changed, 334 insertions(+), 11 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c base-commit: 49fbb25030265c660de732513f18275d88ff99d3 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v11 1/5] iommu/arm-smmu: move TLB timeout and spin count macros
Move TLB timeout and spin count macros to header file to allow using the same from vendor specific implementations. Reviewed-by: Jon Hunter Reviewed-by: Nicolin Chen Reviewed-by: Pritesh Raithatha Reviewed-by: Robin Murphy Reviewed-by: Thierry Reding Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu.c | 3 --- drivers/iommu/arm-smmu.h | 2 ++ 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 19f906de6420..cdd15ead9bc4 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -52,9 +52,6 @@ */ #define QCOM_DUMMY_VAL -1 -#define TLB_LOOP_TIMEOUT 100 /* 1s! */ -#define TLB_SPIN_COUNT 10 - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index d172c024be61..c7d0122a7c6c 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -236,6 +236,8 @@ enum arm_smmu_cbar_type { /* Maximum number of context banks per SMMU */ #define ARM_SMMU_MAX_CBS 128 +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 /* Shared driver definitions */ enum arm_smmu_arch_version { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v11 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances. It uses two of the ARM MMU-500s together to interleave IOVA accesses across them and must be programmed identically. This implementation supports programming the two ARM MMU-500s that must be programmed identically. The third ARM MMU-500 instance is supported by standard arm-smmu.c driver itself. Reviewed-by: Jon Hunter Reviewed-by: Nicolin Chen Reviewed-by: Pritesh Raithatha Reviewed-by: Robin Murphy Reviewed-by: Thierry Reding Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 179 drivers/iommu/arm-smmu.c| 1 + drivers/iommu/arm-smmu.h| 1 + 6 files changed, 187 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 496fd4eafb68..ee2c0ba13a0f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16810,8 +16810,10 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported +F: drivers/iommu/arm-smmu-nvidia.c F: drivers/iommu/tegra* TEGRA KBC DRIVER diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 342190196dfb..2b8203db73ec 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c87d825f651e..f4ff124a1967 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -213,6 +213,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_property_read_bool(np, "calxeda,smmu-secure-config-access")) smmu->impl = &calxeda_impl; + if (of_device_is_compatible(np, "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); + if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || of_device_is_compatible(np, "qcom,sc7180-smmu-500") || of_device_is_compatible(np, "qcom,sm8150-smmu-500") || diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index ..2f55e5793d34 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved. + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* + * Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together and must be programmed identically for + * interleaved IOVA accesses across them and translates accesses from + * non-isochronous HW devices. + * Third one is used for translating accesses from isochronous HW devices. + * This implementation supports programming of the two instances that must + * be programmed identically. + * The third instance usage is through standard arm-smmu driver itself and + * is out of scope of this implementation. + */ +#define NUM_SMMU_INSTANCES 2 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + void __iomem*bases[NUM_SMMU_INSTANCES]; +}; + +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, +unsigned int inst, int page) +{ + struct nvidia_smmu *nvidia_smmu; + + nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu); + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); +} + +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readl_relaxed(reg); +} + +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < NUM_SMMU_INSTANCES; i++) { + void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset; + + writel_relaxed(val, reg); + } +} + +static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return rea
RE: [PATCH v10 0/5] NVIDIA ARM SMMU Implementation
>On Mon, Jul 13, 2020 at 02:50:20PM +0100, Will Deacon wrote: >> On Tue, Jul 07, 2020 at 10:00:12PM -0700, Krishna Reddy wrote: > >> Changes in v10: >> > Perform SMMU base ioremap before calling implementation init. >> > Check for Global faults across both ARM MMU-500s during global interrupt. >> > Check for context faults across all contexts of both ARM MMU-500s during >> > context fault interrupt. > >> Add new DT binding nvidia,smmu-500 for NVIDIA implementation. >> >> Please repost based on my SMMU queue, as this doesn't currently apply. >> >> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h= >> for-joerg/arm-smmu/updates >Any update on this, please? It would be a shame to miss 5.9 now that the >patches look alright. Thanks for pinging. I have been out of the office this week. Just started working on reposting patches. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU
Thanks Rob. One question on setting "minItems: ". Please see below. >> +allOf: >> + - if: >> + properties: >> +compatible: >> + contains: >> +enum: >> + - nvidia,tegra194-smmu >> +then: >> + properties: >> +reg: >> + minItems: 2 >> + maxItems: 2 >This doesn't work. The main part of the schema already said there's only >1 reg region. This part is ANDed with that, not an override. You need to add >an else clause with 'maxItems: 1' and change the base schema to >{minItems: 1, maxItems: 2}. As the earlier version of base schema doesn't have "minItems: " set, should it be set to 0 for backward compatibility? Or can it just be omitted setting in base schema as before? "else" part to set "maxItems: 1" and setting "maxItems: 2" in base schema is clear to me. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU. Signed-off-by: Krishna Reddy --- .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++ 1 file changed, 18 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index d7ceb4c34423..ac1f526c3424 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -38,6 +38,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that program two ARM MMU-500s identically +items: + - enum: + - nvidia,tegra194-smmu + - const: nvidia,smmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 @@ -138,6 +143,19 @@ required: additionalProperties: false +allOf: + - if: + properties: +compatible: + contains: +enum: + - nvidia,tegra194-smmu +then: + properties: +reg: + minItems: 2 + maxItems: 2 + examples: - |+ /* SMMU with stream matching or stream indexing */ -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v10 1/5] iommu/arm-smmu: move TLB timeout and spin count macros
Move TLB timeout and spin count macros to header file to allow using the same from vendor specific implementations. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu.c | 3 --- drivers/iommu/arm-smmu.h | 2 ++ 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705..d2054178df35 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -52,9 +52,6 @@ */ #define QCOM_DUMMY_VAL -1 -#define TLB_LOOP_TIMEOUT 100 /* 1s! */ -#define TLB_SPIN_COUNT 10 - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index d172c024be61..c7d0122a7c6c 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -236,6 +236,8 @@ enum arm_smmu_cbar_type { /* Maximum number of context banks per SMMU */ #define ARM_SMMU_MAX_CBS 128 +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 /* Shared driver definitions */ enum arm_smmu_arch_version { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v10 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init
ioremap smmu mmio region before calling into implementation init. This is necessary to allow mapped address available during vendor specific implementation init. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d2054178df35..e03e873d3bca 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -2120,10 +2120,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev) if (err) return err; - smmu = arm_smmu_impl_init(smmu); - if (IS_ERR(smmu)) - return PTR_ERR(smmu); - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); ioaddr = res->start; smmu->base = devm_ioremap_resource(dev, res); @@ -2135,6 +2131,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev) */ smmu->numpage = resource_size(res); + smmu = arm_smmu_impl_init(smmu); + if (IS_ERR(smmu)) + return PTR_ERR(smmu); + num_irqs = 0; while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) { num_irqs++; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v10 0/5] NVIDIA ARM SMMU Implementation
Changes in v10: Perform SMMU base ioremap before calling implementation init. Check for Global faults across both ARM MMU-500s during global interrupt. Check for context faults across all contexts of both ARM MMU-500s during context fault interrupt. Add new DT binding nvidia,smmu-500 for NVIDIA implementation. v9 - https://lkml.org/lkml/2020/6/30/1282 v8 - https://lkml.org/lkml/2020/6/29/2385 v7 - https://lkml.org/lkml/2020/6/28/347 v6 - https://lkml.org/lkml/2020/6/4/1018 v5 - https://lkml.org/lkml/2020/5/21/1114 v4 - https://lkml.org/lkml/2019/10/30/1054 v3 - https://lkml.org/lkml/2019/10/18/1601 v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (5): iommu/arm-smmu: move TLB timeout and spin count macros iommu/arm-smmu: ioremap smmu mmio region before implementation init iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage dt-bindings: arm-smmu: add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks .../devicetree/bindings/iommu/arm,smmu.yaml | 18 ++ MAINTAINERS | 2 + drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 278 ++ drivers/iommu/arm-smmu.c | 29 +- drivers/iommu/arm-smmu.h | 6 + 7 files changed, 328 insertions(+), 10 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c base-commit: e5640f21b63d2a5d3e4e0c4111b2b38e99fe5164 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow vendor specific implementations override default fault interrupt handlers. Update NVIDIA implementation to override the default global/context fault interrupt handlers and handle interrupts across the two ARM MMU-500s that are programmed identically. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 99 + drivers/iommu/arm-smmu.c| 17 +- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 117 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 2f55e5793d34..31368057e9be 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu) return 0; } +static irqreturn_t nvidia_smmu_global_fault_inst(int irq, +struct arm_smmu_device *smmu, +int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0); + + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR); + if (!gfsr) + return IRQ_NONE; + + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2); + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev) +{ + unsigned int inst; + irqreturn_t ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + + for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) { + irqreturn_t irq_ret; + + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + ret = IRQ_HANDLED; + } + + return ret; +} + +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1); + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx); + + fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev) +{ + int idx; + unsigned int inst; + irqreturn_t ret = IRQ_NONE; + struct arm_smmu_device *smmu; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain; + + smmu_domain = container_of(domain, struct arm_smmu_domain, domain); + smmu = smmu_domain->smmu; + + for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) { + irqreturn_t irq_ret; + + /* +* Interrupt line is shared between all contexts. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, +idx, inst); + if (irq_ret == IRQ_HANDLED) + ret = IRQ_HANDLED; + } + } + + return ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nvidia_smmu_read_reg, .write_reg = nvidia_smmu_write_reg, @@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nvidia_smmu_write_reg64, .reset = nvidia_smmu_reset, .tlb_sync = nvidia_smmu_tlb_sync, + .global_fault = nvidia_smmu_global_fault, + .context_fault = nvidia_smmu_context_fault, }; struct arm_smmu_device *nvidia_smmu
[PATCH v10 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances. It uses two of the ARM MMU-500s together to interleave IOVA accesses across them and must be programmed identically. This implementation supports programming the two ARM MMU-500s that must be programmed identically. The third ARM MMU-500 instance is supported by standard arm-smmu.c driver itself. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 179 drivers/iommu/arm-smmu.c| 1 + drivers/iommu/arm-smmu.h| 1 + 6 files changed, 187 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index c23352059a6b..534cedaf8e55 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16811,8 +16811,10 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported +F: drivers/iommu/arm-smmu-nvidia.c F: drivers/iommu/tegra* TEGRA KBC DRIVER diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 342190196dfb..2b8203db73ec 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b70..f15571d05474 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_property_read_bool(np, "calxeda,smmu-secure-config-access")) smmu->impl = &calxeda_impl; + if (of_device_is_compatible(np, "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); + if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || of_device_is_compatible(np, "qcom,sc7180-smmu-500")) return qcom_smmu_impl_init(smmu); diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index ..2f55e5793d34 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved. + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* + * Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together and must be programmed identically for + * interleaved IOVA accesses across them and translates accesses from + * non-isochronous HW devices. + * Third one is used for translating accesses from isochronous HW devices. + * This implementation supports programming of the two instances that must + * be programmed identically. + * The third instance usage is through standard arm-smmu driver itself and + * is out of scope of this implementation. + */ +#define NUM_SMMU_INSTANCES 2 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + void __iomem*bases[NUM_SMMU_INSTANCES]; +}; + +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, +unsigned int inst, int page) +{ + struct nvidia_smmu *nvidia_smmu; + + nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu); + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); +} + +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readl_relaxed(reg); +} + +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < NUM_SMMU_INSTANCES; i++) { + void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset; + + writel_relaxed(val, reg); + } +} + +static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readq_relaxed(reg); +} + +static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 va
RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
On 01/07/2020 20:00, Krishna Reddy wrote: >>>>>> +items: >>>>>> + - enum: >>>>>> + - nvdia,tegra194-smmu >>>>>> + - const: arm,mmu-500 >>>> >>>>> Is the fallback compatible appropriate here? If software treats this as a >>>>> standard MMU-500 it will only program the first instance (because the >>>>> second isn't presented as a separate MMU-500) - is there any way that >>>>> isn't going to blow up? >>>> >>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, >>>> implementation override ensure that both instances are programmed. Isn't >>>> it? I am not sure I follow your comment fully. >> >>> The problem is, if for some reason someone had a Tegra194, but only set the >>> compatible string to 'arm,mmu-500' it would assume that it was a normal >>> arm,mmu-500 and only one instance would be programmed. We always want at >>> least 2 of the 3 instances >>programmed and so we should only match >>> 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the >>> arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to >>> &arm_mmu500. >> >> In that case, new binding "nvidia,smmu-v2" can be added with data set to >> &arm_mmu500 and enumeration would have nvidia,tegra194-smmu and another >> variant for next generation SoC in future. >I think you would be better off with nvidia,smmu-500 as smmu-v2 appears to be >something different. I see others have a smmu-v2 but I am not sure if that is >legacy. We have an smmu-500 and so that would seem more appropriate. I tried to use the binding synonymous to other vendors. V2 is the architecture version. MMU-500 is the actual implementation from ARM based on V2 arch. As we just use the MMU-500 IP as it is, It can be named as nvidia,smmu-500 or similar as well. Others probably having their own implementation based on V2 arch. KR -- nvpublic ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
>> With shared irq line, the context fault identification is not optimal >> already. Reading all the context banks all the time can be additional mmio >> read overhead. But, it may not hurt the real use cases as these happen only >> when there are bugs. >Right, I did ponder the idea of a whole programmatic "request_context_irq" >hook that would allow registering the handler for both interrupts with the >appropriate context bank and instance data, but since all interrupts are >currently unexpected it seems somewhat hard to justify the extra complexity. >Obviously we can revisit this in future if you want to start actually doing >something with faults like the qcom GPU folks do. Thanks, I would just avoid making changes to interrupt handlers till it is really necessary in future. The current code would just be simple and functional with more interrupts when there are multiple faults. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>Yeah, I realised later last night that this probably originated from forking >the whole driver downstream. But even then you could have treated the other >one as a separate nsmmu with a single instance ;) True, But the initial nvidia implementation had limitation that it can only handle one instance of usage. With your implementation hooks design, it should be able to handle multiple instances of usage now. >Since it does add a bit of confusion to the code and comments, let's just keep >things simple. I do like Jon's suggestion of actually enforcing that the >number of "reg" regions exactly matches the number expected for the given >compatible - I guess for now that means just hard-coding 2 and hoping the >hardware folks don't cook up any more of these... For T194, reg can just be forced to 2. No future plan to use more than two MMU-500s together as of now. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
+items: + - enum: + - nvdia,tegra194-smmu + - const: arm,mmu-500 > > >>> Is the fallback compatible appropriate here? If software treats this as a >>> standard MMU-500 it will only program the first instance (because the >>> second isn't presented as a separate MMU-500) - is there any way that isn't >>> going to blow up? > > >> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, >> implementation override ensure that both instances are programmed. Isn't it? >> I am not sure I follow your comment fully. >The problem is, if for some reason someone had a Tegra194, but only set the >compatible string to 'arm,mmu-500' it would assume that it was a normal >arm,mmu-500 and only one instance would be programmed. We always want at least >2 of the 3 instances >programmed and so we should only match >'nvidia,tegra194-smmu'. In fact, I think that we also need to update the >arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to >&arm_mmu500. In that case, new binding "nvidia,smmu-v2" can be added with data set to &arm_mmu500 and enumeration would have nvidia,tegra194-smmu and another variant for next generation SoC in future. -KR -- nvpublic ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
>>> +for (inst = 0; inst < nvidia_smmu->num_inst; inst++) { >>> +irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); >>> +if (irq_ret == IRQ_HANDLED) >>> +return irq_ret; >> >> Any chance there could be more than one SMMU faulting by the time we >> service the interrupt? >It certainly seems plausible if the interconnect is automatically >load-balancing requests across the SMMU instances - say a driver bug caused a >buffer to be unmapped too early, there could be many in-flight accesses to >parts of that buffer that aren't all taking the same path and thus could now >fault in parallel. >[ And anyone inclined to nitpick global vs. context faults, s/unmap a >buffer/tear down a domain/ ;) ] >Either way I think it would be easier to reason about if we just handled these >like a typical shared interrupt and always checked all the instances. It would be optimal to check at the same time across all instances. >>> +for (idx = 0; idx < smmu->num_context_banks; idx++) { >>> +irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, >>> + idx, >>> + inst); >>> + >>> +if (irq_ret == IRQ_HANDLED) >>> +return irq_ret; >> >> Any reason why we don't check all banks? >As above, we certainly shouldn't bail out without checking the bank for the >offending domain across all of its instances, and I guess the way this works >means that we would have to iterate all the banks to achieve that. With shared irq line, the context fault identification is not optimal already. Reading all the context banks all the time can be additional mmio read overhead. But, it may not hurt the real use cases as these happen only when there are bugs. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
>> + - description: NVIDIA SoCs that use more than one "arm,mmu-500" > Hmm, there must be a better way to word that to express that it only applies > to the sets of SMMUs that must be programmed identically, and not any other > independent MMU-500s that might also happen to be in the same SoC. Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s identically". >> +items: >> + - enum: >> + - nvdia,tegra194-smmu >> + - const: arm,mmu-500 >Is the fallback compatible appropriate here? If software treats this as a >standard MMU-500 it will only program the first instance (because the second >isn't presented as a separate MMU-500) - is there any way that isn't going to >blow up? When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, implementation override ensure that both instances are programmed. Isn't it? I am not sure I follow your comment fully. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> + * When Linux kernel supports multiple SMMU devices, the SMMU device >> +used for >> + * isochornous HW devices should be added as a separate ARM MMU-500 >> +device >> + * in DT and be programmed independently for efficient TLB invalidates. >I don't understand the "When" there - the driver has always supported multiple >independent SMMUs, and it's not something that could be configured out or >otherwise disabled. Plus I really don't see why you would ever want to force >unrelated SMMUs to be >programmed together - beyond the TLB thing mentioned it >would also waste precious context bank resources and might lead to weird >device grouping via false stream ID aliasing, with no obvious upside at all. Sorry, I missed this comment. During the initial patches, when the iommu_ops were different between, support multiple SMMU drivers at the same is not possible as one of them(that gets probed last) overwrites the platform bus ops. On revisiting the original issue, This problem is no longer relevant. At this point, It makes more sense to just get rid of 3rd instance programming in arm-smmu-nvidia.c and just limit it to the SMMU instances that need identical programming. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v9 1/4] iommu/arm-smmu: move TLB timeout and spin count macros
Move TLB timeout and spin count macros to header file to allow using the same values from vendor specific implementations. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu.c | 3 --- drivers/iommu/arm-smmu.h | 2 ++ 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705b..d2054178df357 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -52,9 +52,6 @@ */ #define QCOM_DUMMY_VAL -1 -#define TLB_LOOP_TIMEOUT 100 /* 1s! */ -#define TLB_SPIN_COUNT 10 - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index d172c024be618..c7d0122a7c6ca 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -236,6 +236,8 @@ enum arm_smmu_cbar_type { /* Maximum number of context banks per SMMU */ #define ARM_SMMU_MAX_CBS 128 +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 /* Shared driver definitions */ enum arm_smmu_arch_version { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v9 3/4] dt-bindings: arm-smmu: add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++ 1 file changed, 18 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index d7ceb4c34423b..662c46e16f07d 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -38,6 +38,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that use more than one "arm,mmu-500" +items: + - enum: + - nvidia,tegra194-smmu + - const: arm,mmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 @@ -138,6 +143,19 @@ required: additionalProperties: false +allOf: + - if: + properties: +compatible: + contains: +enum: + - nvidia,tegra194-smmu +then: + properties: +reg: + minItems: 2 + maxItems: 3 + examples: - |+ /* SMMU with stream matching or stream indexing */ -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v9 0/4] NVIDIA ARM SMMUv2 Implementation
Changes in v9: Move TLB Timeout and spin count macros to arm-smmu.h header to share with implementation. Set minItems and maxItems for reg property when compatible contains nvidia,tegra194-smmu. Update commit message for NVIDIA implementation patch. Fail single SMMU instance usage through NVIDIA implementation to limit the usage to two or three instances. Fix checkpatch warnings with --strict checking. v8 - https://lkml.org/lkml/2020/6/29/2385 v7 - https://lkml.org/lkml/2020/6/28/347 v6 - https://lkml.org/lkml/2020/6/4/1018 v5 - https://lkml.org/lkml/2020/5/21/1114 v4 - https://lkml.org/lkml/2019/10/30/1054 v3 - https://lkml.org/lkml/2019/10/18/1601 v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (4): iommu/arm-smmu: move TLB timeout and spin count macros iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage dt-bindings: arm-smmu: add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks .../devicetree/bindings/iommu/arm,smmu.yaml | 18 ++ MAINTAINERS | 2 + drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 304 ++ drivers/iommu/arm-smmu.c | 20 +- drivers/iommu/arm-smmu.h | 6 + 7 files changed, 349 insertions(+), 6 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v9 2/4] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances. It uses two of ARM MMU-500s together to interleave IOVA accesses across them and must be programmed identically. The third SMMU instance is used as a regular ARM MMU-500 and it can either be programmed independently or identical to other two ARM MMU-500s. This implementation supports programming two or three ARM MMU-500s identically as per DT config. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 206 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 213 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 7b5ffd646c6b9..64c37dbdd4426 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported +F: drivers/iommu/arm-smmu-nvidia.c F: drivers/iommu/tegra* TEGRA KBC DRIVER diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 342190196dfb0..2b8203db73ec3 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..f15571d05474e 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_property_read_bool(np, "calxeda,smmu-secure-config-access")) smmu->impl = &calxeda_impl; + if (of_device_is_compatible(np, "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); + if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || of_device_is_compatible(np, "qcom,sc7180-smmu-500")) return qcom_smmu_impl_init(smmu); diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 0..5c874912e1c1a --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,206 @@ +// SPDX-License-Identifier: GPL-2.0-only +// NVIDIA ARM SMMU v2 implementation quirks +// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved. + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* + * Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for interleaved IOVA accesses and + * used by non-isochronous HW devices for SMMU translations. + * Third one is used for SMMU translations from isochronous HW devices. + * It is possible to use this implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant TLB + * invalidations as all three never need to be TLB invalidated for a HW device. + * + * When Linux kernel supports multiple SMMU devices, the SMMU device used for + * isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient TLB invalidates. + */ +#define MAX_SMMU_INSTANCES 3 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu) +{ + return container_of(smmu, struct nvidia_smmu, smmu); +} + +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, +unsigned int inst, int page) +{ + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + if (!nvidia_smmu->bases[0]) + nvidia_smmu->bases[0] = smmu->base; + + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); +} + +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readl_relaxed(reg); +} + +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; +
[PATCH v9 4/4] iommu/arm-smmu: add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 98 + drivers/iommu/arm-smmu.c| 17 +- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 116 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 5c874912e1c1a..d279788eab954 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -144,6 +144,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nvidia_smmu_global_fault_inst(int irq, +struct arm_smmu_device *smmu, +int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0); + + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR); + if (!gfsr) + return IRQ_NONE; + + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2); + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (inst = 0; inst < nvidia_smmu->num_inst; inst++) { + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1); + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx); + + fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* +* Interrupt line is shared between all contexts. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, +idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nvidia_smmu_read_reg, .write_reg = nvidia_smmu_write_reg, @@ -151,6 +247,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nvidia_smmu_write_reg64, .reset = nvidia_smmu_reset, .tlb_sync = nvidia_smmu_tlb_sync, + .global_fault = nvidia_smmu_global_fault, + .context_fault = nvidia_smmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d2054
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> The driver intend to support up to 3 instances. It doesn't really mandate >> that all three instances be present in same DT node. >> Each mmio aperture in "reg" property is an instance here. reg = >> , , ; The reg can have >> all three or less and driver just configures based on reg and it works fine. >So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we have >up to 3 (for Tegra194). So the question is do we have a use-case where we only >use 2 and not 3? If not, then it still seems that we should require that all 3 >are present. It can be either 2 SMMUs (for non-iso) or 3 SMMUs (for non-iso and iso). Let me fail the one instance case as it can use regular arm smmu implementation and don't need nvidia implementation explicitly. >The other problem I see here is that currently the arm-smmu binding defines >the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe that this will >get caught by the 'dt_binding_check' when we try to populate the binding. Thanks for pointing it out! Will update the binding doc. -KR -- nvpublic ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>OK, well I see what you are saying, but if we intended to support all 3 for >Tegra194, then we should ensure all 3 are initialised correctly. The driver intend to support up to 3 instances. It doesn't really mandate that all three instances be present in same DT node. Each mmio aperture in "reg" property is an instance here. reg = , , ; The reg can have all three or less and driver just configures based on reg and it works fine. >It would be better to query the number of SMMUs populated in device-tree and >then ensure that all are initialised correctly. Getting the IORESOURCE_MEM is the way to count the instances driver need to support. In a way, It is already querying through IORESOURCE_MEM here. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave >> IOVA accesses across them. >> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible >> string for Tegra194 SoC SMMU topology. >There is no description here of the 3rd SMMU that you mention below. >I think that we should describe the full picture here. This driver is primarily for dual SMMU config. So, It is avoided in the commit message. However, Implementation supports option to configure 3 instances identically with one SMMU DT node and is documented in the implementation. >> + >> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, >> + unsigned int inst, int page) >If you run checkpatch --strict on these you will get a lot of ... >CHECK: Alignment should match open parenthesis >#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46: >+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, >+ unsigned int inst, int page) >We should fix these. I will fix these if I need to push a new patch set. >> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, >> +int page, int offset, u32 val) { >> +unsigned int i; >> +struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); >> + >> +for (i = 0; i < nvidia_smmu->num_inst; i++) { >> +void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset; >Personally, I would declare 'reg' outside of the loop as I feel it will make >the code cleaner and easier to read. It was like that before and is updated to its current form to limit the scope of variables as per Thierry's comments in v6. We can just leave it as it is as there is no technical issue here. -KR -- nvpublic ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device >> +*smmu) { >> +unsigned int i; >> +for (i = 1; i < MAX_SMMU_INSTANCES; i++) { >> +struct resource *res; >> + >> +res = platform_get_resource(pdev, IORESOURCE_MEM, i); >> +if (!res) >> +break; >Currently this driver is only supported for Tegra194 which I understand has 3 >SMMUs. Therefore, I don't feel that we should fail silently here, I think it >is better to return an error if all 3 cannot be initialised. Initialization of all the three SMMU instances is not necessary here. The driver can work with all the possible number of instances 1, 2 and 3 based on the DT config though it doesn't make much sense to use it with 1 instance. There is no silent failure here from driver point of view. If there is misconfig in DT, SMMU faults would catch issues. >> +nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res); >> +if (IS_ERR(nvidia_smmu->bases[i])) >> +return ERR_CAST(nvidia_smmu->bases[i]); >You want to use PTR_ERR() here. PTR_ERR() returns long integer. This function returns a pointer. ERR_CAST is the right one to use here. -- nvpublic ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 98 + drivers/iommu/arm-smmu.c| 17 +- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 116 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 1124f0ac1823a..c9423b4199c65 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nvidia_smmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0); + + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR); + if (!gfsr) + return IRQ_NONE; + + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2); + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (inst = 0; inst < nvidia_smmu->num_inst; inst++) { + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1); + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx); + + fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* +* Interrupt line shared between all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, +idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nvidia_smmu_read_reg, .write_reg = nvidia_smmu_write_reg, @@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nvidia_smmu_write_reg64, .reset = nvidia_smmu_reset, .tlb_sync = nvidia_smmu_tlb_sync, + .global_fault = nvidia_smmu_global_fault, + .context_fault = nvidia_smmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705b..3bb0aba15a356 100644 --- a/drivers/iommu/arm-s
[PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index d7ceb4c34423b..5b2586ac715ed 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -38,6 +38,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that use more than one "arm,mmu-500" +items: + - enum: + - nvdia,tegra194-smmu + - const: arm,mmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation
Changes in v8: Fixed incorrect CB_FSR read issue during context bank fault. Rebased and validated patches on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next v7 - https://lkml.org/lkml/2020/6/28/347 v6 - https://lkml.org/lkml/2020/6/4/1018 v5 - https://lkml.org/lkml/2020/5/21/1114 v4 - https://lkml.org/lkml/2019/10/30/1054 v3 - https://lkml.org/lkml/2019/10/18/1601 v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (3): iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage dt-bindings: arm-smmu: Add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks .../devicetree/bindings/iommu/arm,smmu.yaml | 5 + MAINTAINERS | 2 + drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 294 ++ drivers/iommu/arm-smmu.c | 17 +- drivers/iommu/arm-smmu.h | 4 + 7 files changed, 324 insertions(+), 3 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 SoC SMMU topology. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 196 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 203 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 7b5ffd646c6b9..64c37dbdd4426 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported +F: drivers/iommu/arm-smmu-nvidia.c F: drivers/iommu/tegra* TEGRA KBC DRIVER diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 342190196dfb0..2b8203db73ec3 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..70f7318017617 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_property_read_bool(np, "calxeda,smmu-secure-config-access")) smmu->impl = &calxeda_impl; + if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); + if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || of_device_is_compatible(np, "qcom,sc7180-smmu-500")) return qcom_smmu_impl_init(smmu); diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 0..1124f0ac1823a --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,196 @@ +// SPDX-License-Identifier: GPL-2.0-only +// NVIDIA ARM SMMU v2 implementation quirks +// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved. + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* + * Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for interleaved IOVA accesses and + * used by non-isochronous HW devices for SMMU translations. + * Third one is used for SMMU translations from isochronous HW devices. + * It is possible to use this implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant TLB + * invalidations as all three never need to be TLB invalidated for a HW device. + * + * When Linux kernel supports multiple SMMU devices, the SMMU device used for + * isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient TLB invalidates. + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu) +{ + return container_of(smmu, struct nvidia_smmu, smmu); +} + +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, + unsigned int inst, int page) +{ + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + if (!nvidia_smmu->bases[0]) + nvidia_smmu->bases[0] = smmu->base; + + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); +} + +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readl_relaxed(reg); +} + +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (i = 0; i < nvidia_smmu-&g
RE: [PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
>> +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, >> + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, >> + smmu->numpage + idx); [...] >> + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); [...] >> + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); >It reads FSR of the default inst (1st), but clears the FSR of corresponding >inst -- just want to make sure that this is okay and intended. FSR should be read from corresponding inst. Not from instance 0. Let me post updated patch. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> + if (!nvidia_smmu->bases[0]) >> + nvidia_smmu->bases[0] = smmu->base; >> + >> + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); } >Not critical -- just a nit: why not put the bases[0] in init()? smmu->base is not available during nvidia_smmu_impl_init() call. It is set afterwards in arm-smmu.c. It can't be avoided without changing the devm_ioremap() and impl_init() call order in arm-smmu.c. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index d7ceb4c34423b..5b2586ac715ed 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -38,6 +38,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that use more than one "arm,mmu-500" +items: + - enum: + - nvdia,tegra194-smmu + - const: arm,mmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 101 +++- drivers/iommu/arm-smmu.c| 17 +- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 118 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index b73c483fa3376..7276bb203ae79 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nvidia_smmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0); + + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (inst = 0; inst < nvidia_smmu->num_inst; inst++) { + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1); + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx); + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* +* Interrupt line shared between all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, +idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nvidia_smmu_read_reg, .write_reg = nvidia_smmu_write_reg, @@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nvidia_smmu_write_reg64, .reset = nvidia_smmu_reset, .tlb_sync = nvidia_smmu_tlb_sync, + .global_fault = nvidia_smmu_global_fault, + .context_fault = nvidia_smmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) @@ -185,7 +283,8 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) } nvidia_smmu->smm
[PATCH v7 0/3] Nvidia Arm SMMUv2 Implementation
Changes in v7: Incorporated the review feedback from Nicolin Chen, Robin Murphy and Thierry Reding. Rebased and validated patches on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next v6- https://lkml.org/lkml/2020/6/4/1018 v5 - https://lkml.org/lkml/2020/5/21/1114 v4 - https://lkml.org/lkml/2019/10/30/1054 v3 - https://lkml.org/lkml/2019/10/18/1601 v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (3): iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage dt-bindings: arm-smmu: Add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks .../devicetree/bindings/iommu/arm,smmu.yaml | 5 + MAINTAINERS | 2 + drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 294 ++ drivers/iommu/arm-smmu.c | 17 +- drivers/iommu/arm-smmu.h | 4 + 7 files changed, 324 insertions(+), 3 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 SoC SMMU topology. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 195 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 202 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 7b5ffd646c6b9..64c37dbdd4426 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported +F: drivers/iommu/arm-smmu-nvidia.c F: drivers/iommu/tegra* TEGRA KBC DRIVER diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 342190196dfb0..2b8203db73ec3 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..70f7318017617 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_property_read_bool(np, "calxeda,smmu-secure-config-access")) smmu->impl = &calxeda_impl; + if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); + if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || of_device_is_compatible(np, "qcom,sc7180-smmu-500")) return qcom_smmu_impl_init(smmu); diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 0..b73c483fa3376 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,195 @@ +// SPDX-License-Identifier: GPL-2.0-only +// NVIDIA ARM SMMU v2 implementation quirks +// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved. + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* + * Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for interleaved IOVA accesses and + * used by non-isochronous HW devices for SMMU translations. + * Third one is used for SMMU translations from isochronous HW devices. + * It is possible to use this implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant TLB + * invalidations as all three never need to be TLB invalidated for a HW device. + * + * When Linux kernel supports multiple SMMU devices, the SMMU device used for + * isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient TLB invalidates. + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu) +{ + return container_of(smmu, struct nvidia_smmu, smmu); +} + +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, + unsigned int inst, int page) +{ + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + if (!nvidia_smmu->bases[0]) + nvidia_smmu->bases[0] = smmu->base; + + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); +} + +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readl_relaxed(reg); +} + +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (i = 0; i < nvidia_smmu-&g
RE: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, perhaps? That >way we avoid this somewhat odd check here. NVIDIA haven't made any changes to arm,mmu-500. It is only used in different topology. New model would be mis-leading here. As suggested by Robin, It can just be moved to end of function. >> diff --git a/drivers/iommu/arm-smmu-nvidia.c >> b/drivers/iommu/arm-smmu-nvidia.c >I wonder if it would be better to name this arm-smmu-tegra.c to make it >clearer that this is for a Tegra chip. We do have regular expressions in >MAINTAINERS that catch anything with "tegra" in it to make this easier. >Also, the nsmmu_ prefix looks somewhat odd here. You already use struct >nvidia_smmu as the name of the structure, so why not be consistent and >continue to use nvidia_smmu_ as the prefix for function names? >Or perhaps even use tegra_smmu_ as the prefix to match the filename change I >suggested earlier. Prefix can be updated to nvidia_smmu as we seem to be okay for now to keep file name as arm-smmu-nvidia.c after the vendor name. >> +#define TLB_LOOP_TIMEOUT100 /* 1s! */ >USEC_PER_SEC? It is not meant for a conversion. Reused Timeout variable from arm-smmu.c for tlb_sync implementation. Can rename it to TLB_LOOP_TIMEOUT_IN_US. >> +} >> +dev_err_ratelimited(smmu->dev, >> +"TLB sync timed out -- SMMU may be deadlocked\n"); >Same here. >Also, is there anything we can do when this happens? This is never expected to happen on Silicon. This code and message is reused from arm-smmu.c. >> +#define nsmmu_page(smmu, inst, page) \ >> +(((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \ >> +((page) << smmu->pgshift)) >Can we simply define to_nvidia_smmu(smmu)->bases[0] = smmu->base in >nvidia_smmu_impl_init()? Then this would become just: > to_nvidia_smmu(smmu)->bases[inst] + ((page) << (smmu)->pgshift) > + >Maybe add this here to simplify the nsmmu_page() macro above: > nsmmu->bases[0] = smmu->base; This preferred to avoid the check in nsmmu_page(). But, smmu->base is not yet populated when nvidia_smmu_impl_init() is called. Let me look at the alternative place to set it. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
>> + - nvdia,tegra194-smmu-500 >The -500 suffix here seems a bit redundant since there's no other type of SMMU >in Tegra194, correct? Yeah, there is only one type of SMMU supported in T194. It was added to be synonymous with mmu-500. Can be removed. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 3/4] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 100 drivers/iommu/arm-smmu.c| 11 +++- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 112 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index dafc293a45217..5999b6a770992 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nsmmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + + gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + irq_ret = nsmmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nsmmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + +ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) + + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* Interrupt line shared between all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nsmmu_context_fault_bank(irq, smmu, + idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nsmmu_read_reg, .write_reg = nsmmu_write_reg, @@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nsmmu_write_reg64, .reset = nsmmu_reset, .tlb_sync = nsmmu_tlb_sync, + .global_fault = nsmmu_global_fault, + .context_fault = nsmmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705b..d7
[PATCH v6 4/4] iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot
>> drivers/iommu/arm-smmu-nvidia.c:151:33: sparse: sparse: cast removes >> address space '' of expression Reported-by: kbuild test robot Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 5999b6a770992..6348d8dc17fc2 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -248,7 +248,7 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) break; nsmmu->bases[i] = devm_ioremap_resource(dev, res); if (IS_ERR(nsmmu->bases[i])) - return (struct arm_smmu_device *)nsmmu->bases[i]; + return ERR_CAST(nsmmu->bases[i]); nsmmu->num_inst++; } -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 0/4] Nvidia Arm SMMUv2 Implementation
Changes in v6: Restricted the patch set to driver specific patches. Fixed the cast warning reported by kbuild test robot. Rebased on git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next v5 - https://lkml.org/lkml/2020/5/21/1114 v4 - https://lkml.org/lkml/2019/10/30/1054 v3 - https://lkml.org/lkml/2019/10/18/1601 v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (4): iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage dt-bindings: arm-smmu: Add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot .../devicetree/bindings/iommu/arm,smmu.yaml | 5 + MAINTAINERS | 2 + drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 261 ++ drivers/iommu/arm-smmu.c | 11 +- drivers/iommu/arm-smmu.h | 4 + 7 files changed, 285 insertions(+), 3 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c base-commit: 431275afdc7155415254aef4bd3816a1b8a2ead0 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 soc. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 161 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 168 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 50659d76976b7..118da0893c964 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16572,9 +16572,11 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported F: drivers/iommu/tegra* +F: drivers/iommu/arm-smmu-nvidia.c TEGRA KBC DRIVER M: Laxman Dewangan diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 57cf4ba5e27cb..35542df00da72 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..52c84c30f83e4 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -160,6 +160,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) */ switch (smmu->model) { case ARM_MMU500: + if (of_device_is_compatible(smmu->dev->of_node, + "nvidia,tegra194-smmu-500")) + return nvidia_smmu_impl_init(smmu); smmu->impl = &arm_mmu500_impl; break; case CAVIUM_SMMUV2: diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 0..dafc293a45217 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,161 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Nvidia ARM SMMU v2 implementation quirks +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for Interleaved IOVA accesses and + * used by Non-Isochronous Hw devices for SMMU translations. + * Third one is used for SMMU translations from Isochronous HW devices. + * It is possible to use this Implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant tlb + * invalidations as all three never need to be tlb invalidated for a HW device. + * + * When Linux Kernel supports multiple SMMU devices, The SMMU device used for + * Isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient tlb invalidates. + * + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu) + +#define nsmmu_page(smmu, inst, page) \ + (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \ + ((page) << smmu->pgshift)) + +static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readl_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writel_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readq_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 val) +{ + uns
[PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 Soc SMMU that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index e3ef1c69d1326..8f7ffd248f303 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -37,6 +37,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that use more than one "arm,mmu-500" +items: + - enum: + - nvdia,tegra194-smmu-500 + - const: arm,mmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation
>For the record: I don't think we should apply these because we don't have a >good way of testing them. We currently have three problems that prevent us >from enabling SMMU on Tegra194: Out of three issues pointed here, I see that only issue 2) is a real blocker for enabling SMMU HW by default in upstream. >That said, I have tested earlier versions of this patchset on top of my local >branch with fixes for the above and they do seem to work as expected. >So I'll leave it up to the IOMMU maintainers whether they're willing to merge >the driver patches as is. > But I want to clarify that I won't be applying the DTS patches until we've > solved all of the above issues and therefore it should be clear that these > won't be runtime tested until then. SMMU driver patches as such are complete and can be used by nvidia with a local config change(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n) to disable_bypass and Protects the driver patches against kernel changes. This config disable option is tested already by Nicolin Chen and me. Robin/Will, Can you comment if smmu driver patches alone(1,2,3 out of 5 patches) can be merged without DT enable patches? Is it reasonable to merge the driver patches alone? >1) If we enable SMMU support, then the DMA API will automatically try > to use SMMU domains for allocations. This means that translations > will happen as soon as a device's IOMMU operations are initialized > and that is typically a long time (in kernel time at least) before > a driver is bound and has a chance of configuring the device. > This causes problems for non-quiesced devices like display > controllers that the bootloader might have set up to scan out a > boot splash. > What we're missing here is a way to: > a) advertise reserved memory regions for boot splash framebuffers > b) map reserved memory regions early during SMMU setup > Patches have been floating on the public mailing lists for b) but > a) requires changes to the bootloader (both proprietary ones and > U-Boot for SoCs prior to Tegra194). This happens if SMMU translations is enabled for display before reserved Memory regions issue is fixed. This issue is not a real blocker for SMMU enable. > 2) Even if we don't enable SMMU for a given device (by not hooking up > the iommus property), with a default kernel configuration we get a > bunch of faults during boot because the ARM SMMU driver faults by > default (rather than bypass) for masters which aren't hooked up to > the SMMU. > We could work around that by changing the default configuration or > overriding it on the command-line, but that's not really an option > because it decreases security and means that Tegra194 won't work > out-of-the-box. This is the real issue that blocks enabling SMMU. The USF faults for devices that don't have SMMU translations enabled should be fixed or WAR'ed before SMMU can be enabled. We should look at keeping SID as 0x7F for the devices that can't have SMMU enabled yet. SID 0x7f bypasses SMMU externally. > 3) We don't properly describe the DMA hierarchy, which causes the DMA > masks to be improperly set. As a bit of background: Tegra194 has a > special address bit (bit 39) that causes some swizzling to happen > within the memory controller. As a result, any I/O virtual address > that has bit 39 set will cause this swizzling to happen on access. > The DMA/IOMMU allocator always starts allocating from the top of > the IOVA space, which means that the first couple of gigabytes of > allocations will cause most devices to fail because of the > undesired swizzling that occurs. > We had an initial patch for SDHCI merged that hard-codes the DMA > mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However, > the devices all do support addressing 40 bits and the restriction > on bit 39 is really a property of the bus rather than a capability > of the device. This means that we would have to work around this > for every device driver by adding similar hacks. A better option is > to properly describe the DMA hierarchy (using dma-ranges) because > that will then automatically be applied as a constraint on each > device's DMA mask. > I have been working on patches to address this, but they are fairly > involved because they require device tree bindings changes and so > on. Dma_mask issue is again outside SMMU driver and as long as the clients with Dma_mask issue don't have SMMU enabled, it would be fine. SDHCI can have SMMU enabled in upstream as soon as issue 2 is taken care. >So before we solve all of the above issues we can't really enable SMMU on >Tegra194 and hence won't be able to test it. As such we don't know if these >patches even work, nor can we validate that they continue to work. >As such, I don't think there's any use in applying these patches upstream >since they
[PATCH v5 1/5] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 soc. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 161 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 168 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index ecc0749810b0..0d8c966ecf17 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16560,9 +16560,11 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported F: drivers/iommu/tegra* +F: drivers/iommu/arm-smmu-nvidia.c TEGRA KBC DRIVER M: Laxman Dewangan diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 57cf4ba5e27c..35542df00da7 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 74d97a886e93..dcdd513323aa 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) */ switch (smmu->model) { case ARM_MMU500: + if (of_device_is_compatible(smmu->dev->of_node, + "nvidia,tegra194-smmu-500")) + return nvidia_smmu_impl_init(smmu); smmu->impl = &arm_mmu500_impl; break; case CAVIUM_SMMUV2: diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index ..dafc293a4521 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,161 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Nvidia ARM SMMU v2 implementation quirks +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for Interleaved IOVA accesses and + * used by Non-Isochronous Hw devices for SMMU translations. + * Third one is used for SMMU translations from Isochronous HW devices. + * It is possible to use this Implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant tlb + * invalidations as all three never need to be tlb invalidated for a HW device. + * + * When Linux Kernel supports multiple SMMU devices, The SMMU device used for + * Isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient tlb invalidates. + * + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu) + +#define nsmmu_page(smmu, inst, page) \ + (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \ + ((page) << smmu->pgshift)) + +static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readl_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writel_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readq_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 val) +{ + uns
[PATCH v5 3/5] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 100 drivers/iommu/arm-smmu.c| 11 +++- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 112 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index dafc293a4521..5999b6a77099 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nsmmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + + gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + irq_ret = nsmmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nsmmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + +ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) + + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* Interrupt line shared between all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nsmmu_context_fault_bank(irq, smmu, + idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nsmmu_read_reg, .write_reg = nsmmu_write_reg, @@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nsmmu_write_reg64, .reset = nsmmu_reset, .tlb_sync = nsmmu_tlb_sync, + .global_fault = nsmmu_global_fault, + .context_fault = nsmmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e622f4e33379..9
[PATCH v5 5/5] arm64: tegra: enable SMMU for SDHCI and EQOS on T194
Enable SMMU translations for SDHCI and EQOS transactions on T194. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index f7c4399afb55..706bbb439dcd 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -59,6 +59,7 @@ ethernet@249 { clock-names = "master_bus", "slave_bus", "rx", "tx", "ptp_ref"; resets = <&bpmp TEGRA194_RESET_EQOS>; reset-names = "eqos"; + iommus = <&smmu TEGRA194_SID_EQOS>; status = "disabled"; snps,write-requests = <1>; @@ -457,6 +458,7 @@ sdmmc1: sdhci@340 { clock-names = "sdhci"; resets = <&bpmp TEGRA194_RESET_SDMMC1>; reset-names = "sdhci"; + iommus = <&smmu TEGRA194_SID_SDMMC1>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; nvidia,pad-autocal-pull-down-offset-3v3-timeout = @@ -479,6 +481,7 @@ sdmmc3: sdhci@344 { clock-names = "sdhci"; resets = <&bpmp TEGRA194_RESET_SDMMC3>; reset-names = "sdhci"; + iommus = <&smmu TEGRA194_SID_SDMMC3>; nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>; nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; @@ -506,6 +509,7 @@ sdmmc4: sdhci@346 { <&bpmp TEGRA194_CLK_PLLC4>; resets = <&bpmp TEGRA194_RESET_SDMMC4>; reset-names = "sdhci"; + iommus = <&smmu TEGRA194_SID_SDMMC4>; nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation
Changes in v5: Rebased on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next v4 - https://lkml.org/lkml/2019/10/30/1054 v3 - https://lkml.org/lkml/2019/10/18/1601 v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (5): iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage dt-bindings: arm-smmu: Add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks arm64: tegra: Add DT node for T194 SMMU arm64: tegra: enable SMMU for SDHCI and EQOS on T194 .../devicetree/bindings/iommu/arm,smmu.yaml | 5 + MAINTAINERS | 2 + arch/arm64/boot/dts/nvidia/tegra194.dtsi | 81 ++ drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 261 ++ drivers/iommu/arm-smmu.c | 11 +- drivers/iommu/arm-smmu.h | 4 + 8 files changed, 366 insertions(+), 3 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c base-commit: 365f8d504da50feaebf826d180113529c9383670 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 4/5] arm64: tegra: Add DT node for T194 SMMU
Add DT node for T194 SMMU to enable SMMU support. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 1 file changed, 77 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index f4ede86e32b4..f7c4399afb55 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -1620,6 +1620,83 @@ pcie@141a { 0x8200 0x0 0x4000 0x1f 0x4000 0x0 0xc000>; /* non-prefetchable memory (3GB) */ }; + smmu: iommu@1200 { + compatible = "arm,mmu-500","nvidia,tegra194-smmu-500"; + reg = <0 0x1200 0 0x80>, + <0 0x1100 0 0x80>, + <0 0x1000 0 0x80>; + interrupts = , +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +; + stream-match-mask = <0x7f80>; + #global-interrupts = <3>; + #iommu-cells = <1>; + }; + pcie_ep@1416 { compatible = "nvidia,tegra194-pcie-ep", "snps,dw-pcie-ep"; power-domains = <&bpmp TEGRA194_POWER_DOMAIN_PCIEX4A>; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 2/5] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 Soc SMMU that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index 6515dbe47508..78aba7dd5a61 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -37,6 +37,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that use more than one "arm,mmu-500" +items: + - enum: + - nvdia,tegra194-smmu-500 + - const: arm,mmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 3/6] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 100 drivers/iommu/arm-smmu.c| 11 - drivers/iommu/arm-smmu.h| 3 ++ 3 files changed, 112 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 05d66ac..0ba7abc 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nsmmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + + gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + irq_ret = nsmmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nsmmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + +ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) + + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* Interrupt line shared between all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nsmmu_context_fault_bank(irq, smmu, + idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nsmmu_read_reg, .write_reg = nsmmu_write_reg, @@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nsmmu_write_reg64, .reset = nsmmu_reset, .tlb_sync = nsmmu_tlb_sync, + .global_fault = nsmmu_global_fault, + .context_fault = nsmmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 080af03..829fb4c 100644 --- a
[PATCH v4 5/6] arm64: tegra: Add DT node for T194 SMMU
Add DT node for T194 SMMU to enable SMMU support. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 1 file changed, 77 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 1e0b54b..6f81e90 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -1436,6 +1436,83 @@ 0x8200 0x0 0x4000 0x1f 0x4000 0x0 0xc000>; /* non-prefetchable memory (3GB) */ }; + smmu: iommu@1200 { + compatible = "arm,mmu-500","nvidia,tegra194-smmu"; + reg = <0 0x1200 0 0x80>, + <0 0x1100 0 0x80>, + <0 0x1000 0 0x80>; + interrupts = , +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +; + stream-match-mask = <0x7f80>; + #global-interrupts = <3>; + #iommu-cells = <1>; + }; + sysram@4000 { compatible = "nvidia,tegra194-sysram", "mmio-sram"; reg = <0x0 0x4000 0x0 0x5>; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 6/6] arm64: tegra: enable SMMU for SDHCI and EQOS on T194
Enable SMMU translations for SDHCI and EQOS transactions on T194. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6f81e90..bf8ed7a 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -51,6 +52,7 @@ clock-names = "master_bus", "slave_bus", "rx", "tx", "ptp_ref"; resets = <&bpmp TEGRA194_RESET_EQOS>; reset-names = "eqos"; + iommus = <&smmu TEGRA186_SID_EQOS>; status = "disabled"; snps,write-requests = <1>; @@ -413,6 +415,7 @@ clock-names = "sdhci"; resets = <&bpmp TEGRA194_RESET_SDMMC1>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC1>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; nvidia,pad-autocal-pull-down-offset-3v3-timeout = @@ -435,6 +438,7 @@ clock-names = "sdhci"; resets = <&bpmp TEGRA194_RESET_SDMMC3>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC3>; nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>; nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; @@ -462,6 +466,7 @@ <&bpmp TEGRA194_CLK_PLLC4>; resets = <&bpmp TEGRA194_RESET_SDMMC4>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC4>; nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 2/6] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 Soc SMMU that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 3133f3b..1d72fac 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -31,6 +31,10 @@ conditions. as below, SoC-specific compatibles: "qcom,sdm845-smmu-500", "arm,mmu-500" + NVIDIA SoCs that use more than one ARM MMU-500 together + needs following SoC-specific compatibles along with "arm,mmu-500": + "nvidia,tegra194-smmu" + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 0/6] Nvidia Arm SMMUv2 Implementation
Changes in v4: Rebased on top of https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/ for-joerg/arm-smmu/updates. Updated arm-smmu-nvidia.c to use the tlb_sync implementation hook. Dropped patch that updates arm_smmu_flush_ops for override as it is no longer necessary. v3 - https://lkml.org/lkml/2019/10/18/1601 v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (6): iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage dt-bindings: arm-smmu: Add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks arm64: tegra: Add Memory controller DT node on T194 arm64: tegra: Add DT node for T194 SMMU arm64: tegra: enable SMMU for SDHCI and EQOS on T194 .../devicetree/bindings/iommu/arm,smmu.txt | 4 + MAINTAINERS| 2 + arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 + arch/arm64/boot/dts/nvidia/tegra194.dtsi | 88 +++ drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c| 261 + drivers/iommu/arm-smmu.c | 11 +- drivers/iommu/arm-smmu.h | 4 + 9 files changed, 376 insertions(+), 3 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 4/6] arm64: tegra: Add Memory controller DT node on T194
Add Memory controller DT node on T194 and enable it. This patch is a prerequisite for SMMU enable on T194. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 6 ++ 2 files changed, 10 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi index 4c38426..82a02490 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi @@ -47,6 +47,10 @@ }; }; + memory-controller@2c0 { + status = "okay"; + }; + serial@311 { status = "okay"; }; diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 3c0cf54..1e0b54b 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -163,6 +163,12 @@ }; }; + memory-controller@2c0 { + compatible = "nvidia,tegra186-mc"; + reg = <0x02c0 0xb>; + status = "disabled"; + }; + uarta: serial@310 { compatible = "nvidia,tegra194-uart", "nvidia,tegra20-uart"; reg = <0x0310 0x40>; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 1/6] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 soc. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 161 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 168 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 296de2b..0519024 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15972,9 +15972,11 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported F: drivers/iommu/tegra* +F: drivers/iommu/arm-smmu-nvidia.c TEGRA KBC DRIVER M: Laxman Dewangan diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 4f405f9..7baeade 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o -obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o +obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 5c87a38..1a19687 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) */ switch (smmu->model) { case ARM_MMU500: + if (of_device_is_compatible(smmu->dev->of_node, + "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); smmu->impl = &arm_mmu500_impl; break; case CAVIUM_SMMUV2: diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 000..05d66ac --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,161 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Nvidia ARM SMMU v2 implementation quirks +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for Interleaved IOVA accesses and + * used by Non-Isochronous Hw devices for SMMU translations. + * Third one is used for SMMU translations from Isochronous HW devices. + * It is possible to use this Implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant tlb + * invalidations as all three never need to be tlb invalidated for a HW device. + * + * When Linux Kernel supports multiple SMMU devices, The SMMU device used for + * Isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient tlb invalidates. + * + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu) + +#define nsmmu_page(smmu, inst, page) \ + (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \ + ((page) << smmu->pgshift)) + +static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readl_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writel_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readq_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_ins
RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation
>>https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates Thanks Will! Let me rebase my patches on top of this branch and send it out. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation
Hi Robin, >>Apologies for crossed wires, but I had a series getting rid of >>arm_smmu_flush_ops which was also meant to end up making things a bit easier >>for you: I was looking to rebase on top of your changes first. Then I read Will's reply that said your work is queued for 5.5. Let me know if these patches need to rebased on top of iommu/devel or a different branch. I can resend the patch set on top of necessary branch. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 3/7] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 Soc SMMU that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 3133f3b..1d72fac 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -31,6 +31,10 @@ conditions. as below, SoC-specific compatibles: "qcom,sdm845-smmu-500", "arm,mmu-500" + NVIDIA SoCs that use more than one ARM MMU-500 together + needs following SoC-specific compatibles along with "arm,mmu-500": + "nvidia,tegra194-smmu" + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation
Changes in v3: Rebased on top of https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/ next. Resolved compile error seen with tegra194.dtsi changes after rebase. v2 - https://lkml.org/lkml/2019/9/2/980 v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (7): iommu/arm-smmu: prepare arm_smmu_flush_ops for override iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage dt-bindings: arm-smmu: Add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks arm64: tegra: Add Memory controller DT node on T194 arm64: tegra: Add DT node for T194 SMMU arm64: tegra: enable SMMU for SDHCI and EQOS on T194 .../devicetree/bindings/iommu/arm,smmu.txt | 4 + MAINTAINERS| 2 + arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 + arch/arm64/boot/dts/nvidia/tegra194.dtsi | 88 +++ drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c| 287 + drivers/iommu/arm-smmu.c | 27 +- drivers/iommu/arm-smmu.h | 8 +- 9 files changed, 413 insertions(+), 12 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c -- 2.7.4
[PATCH v3 2/7] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 soc. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 187 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 194 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index a69e6db..b61dbda 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15974,9 +15974,11 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported F: drivers/iommu/tegra* +F: drivers/iommu/arm-smmu-nvidia.c TEGRA KBC DRIVER M: Laxman Dewangan diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 35d1709..4ef8b74 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -14,7 +14,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o -obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o +obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 5c87a38..1a19687 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) */ switch (smmu->model) { case ARM_MMU500: + if (of_device_is_compatible(smmu->dev->of_node, + "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); smmu->impl = &arm_mmu500_impl; break; case CAVIUM_SMMUV2: diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 000..ca871dc --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,187 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Nvidia ARM SMMU v2 implementation quirks +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for Interleaved IOVA accesses and + * used by Non-Isochronous Hw devices for SMMU translations. + * Third one is used for SMMU translations from Isochronous HW devices. + * It is possible to use this Implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant tlb + * invalidations as all three never need to be tlb invalidated for a HW device. + * + * When Linux Kernel supports multiple SMMU devices, The SMMU device used for + * Isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient tlb invalidates. + * + */ +#define MAX_SMMU_INSTANCES 3 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu) + +#define nsmmu_page(smmu, inst, page) \ + (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \ + ((page) << smmu->pgshift)) + +static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readl_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writel_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readq_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static void nsmmu_t
[PATCH v3 5/7] arm64: tegra: Add Memory controller DT node on T194
Add Memory controller DT node on T194 and enable it. This patch is a prerequisite for SMMU enable on T194. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 6 ++ 2 files changed, 10 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi index 4c38426..82a02490 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi @@ -47,6 +47,10 @@ }; }; + memory-controller@2c0 { + status = "okay"; + }; + serial@311 { status = "okay"; }; diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 3c0cf54..1e0b54b 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -163,6 +163,12 @@ }; }; + memory-controller@2c0 { + compatible = "nvidia,tegra186-mc"; + reg = <0x02c0 0xb>; + status = "disabled"; + }; + uarta: serial@310 { compatible = "nvidia,tegra194-uart", "nvidia,tegra20-uart"; reg = <0x0310 0x40>; -- 2.7.4
[PATCH v3 1/7] iommu/arm-smmu: prepare arm_smmu_flush_ops for override
Remove const keyword for arm_smmu_flush_ops in arm_smmu_domain and replace direct references to arm_smmu_tlb_sync* functions with arm_smmu_flush_ops->tlb_sync(). This is necessary for vendor specific implementations that need to override arm_smmu_flush_ops in part or full. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu.c | 16 drivers/iommu/arm-smmu.h | 4 +++- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 91af695..fc0b27d 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -52,9 +52,6 @@ */ #define QCOM_DUMMY_VAL -1 -#define TLB_LOOP_TIMEOUT 100 /* 1s! */ -#define TLB_SPIN_COUNT 10 - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -290,6 +287,8 @@ static void arm_smmu_tlb_sync_vmid(void *cookie) static void arm_smmu_tlb_inv_context_s1(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; + const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops; + /* * The TLBI write may be relaxed, so ensure that PTEs cleared by the * current CPU are visible beforehand. @@ -297,18 +296,19 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie) wmb(); arm_smmu_cb_write(smmu_domain->smmu, smmu_domain->cfg.cbndx, ARM_SMMU_CB_S1_TLBIASID, smmu_domain->cfg.asid); - arm_smmu_tlb_sync_context(cookie); + ops->tlb_sync(cookie); } static void arm_smmu_tlb_inv_context_s2(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; struct arm_smmu_device *smmu = smmu_domain->smmu; + const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops; /* See above */ wmb(); arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid); - arm_smmu_tlb_sync_global(smmu); + ops->tlb_sync(cookie); } static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size, @@ -410,7 +410,7 @@ static void arm_smmu_tlb_add_page(struct iommu_iotlb_gather *gather, ops->tlb_inv_range(iova, granule, granule, true, cookie); } -static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = { +static struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = { .tlb = { .tlb_flush_all = arm_smmu_tlb_inv_context_s1, .tlb_flush_walk = arm_smmu_tlb_inv_walk, @@ -421,7 +421,7 @@ static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = { .tlb_sync = arm_smmu_tlb_sync_context, }; -static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = { +static struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = { .tlb = { .tlb_flush_all = arm_smmu_tlb_inv_context_s2, .tlb_flush_walk = arm_smmu_tlb_inv_walk, @@ -432,7 +432,7 @@ static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = { .tlb_sync = arm_smmu_tlb_sync_context, }; -static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = { +static struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = { .tlb = { .tlb_flush_all = arm_smmu_tlb_inv_context_s2, .tlb_flush_walk = arm_smmu_tlb_inv_walk, diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index b19b6ca..b2d6c7f 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -207,6 +207,8 @@ enum arm_smmu_cbar_type { /* Maximum number of context banks per SMMU */ #define ARM_SMMU_MAX_CBS 128 +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 /* Shared driver definitions */ enum arm_smmu_arch_version { @@ -314,7 +316,7 @@ struct arm_smmu_flush_ops { struct arm_smmu_domain { struct arm_smmu_device *smmu; struct io_pgtable_ops *pgtbl_ops; - const struct arm_smmu_flush_ops *flush_ops; + struct arm_smmu_flush_ops *flush_ops; struct arm_smmu_cfg cfg; enum arm_smmu_domain_stage stage; boolnon_strict; -- 2.7.4
[PATCH v3 6/7] arm64: tegra: Add DT node for T194 SMMU
Add DT node for T194 SMMU to enable SMMU support. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 1 file changed, 77 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 1e0b54b..6f81e90 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -1436,6 +1436,83 @@ 0x8200 0x0 0x4000 0x1f 0x4000 0x0 0xc000>; /* non-prefetchable memory (3GB) */ }; + smmu: iommu@1200 { + compatible = "arm,mmu-500","nvidia,tegra194-smmu"; + reg = <0 0x1200 0 0x80>, + <0 0x1100 0 0x80>, + <0 0x1000 0 0x80>; + interrupts = , +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +; + stream-match-mask = <0x7f80>; + #global-interrupts = <3>; + #iommu-cells = <1>; + }; + sysram@4000 { compatible = "nvidia,tegra194-sysram", "mmio-sram"; reg = <0x0 0x4000 0x0 0x5>; -- 2.7.4
[PATCH v3 4/7] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 100 drivers/iommu/arm-smmu.c| 11 - drivers/iommu/arm-smmu.h| 3 ++ 3 files changed, 112 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index ca871dc..2a19d41 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -143,6 +143,104 @@ static int nsmmu_init_context(struct arm_smmu_domain *smmu_domain) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nsmmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + + gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + irq_ret = nsmmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nsmmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + +ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) + + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* Interrupt line shared between all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nsmmu_context_fault_bank(irq, smmu, + idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nsmmu_read_reg, .write_reg = nsmmu_write_reg, @@ -150,6 +248,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nsmmu_write_reg64, .reset = nsmmu_reset, .init_context = nsmmu_init_context, + .global_fault = nsmmu_global_fault, + .context_fault = nsmmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index fc0b2
[PATCH v3 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS on T194
Enable SMMU translations for SDHCI and EQOS transactions on T194. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6f81e90..bf8ed7a 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -51,6 +52,7 @@ clock-names = "master_bus", "slave_bus", "rx", "tx", "ptp_ref"; resets = <&bpmp TEGRA194_RESET_EQOS>; reset-names = "eqos"; + iommus = <&smmu TEGRA186_SID_EQOS>; status = "disabled"; snps,write-requests = <1>; @@ -413,6 +415,7 @@ clock-names = "sdhci"; resets = <&bpmp TEGRA194_RESET_SDMMC1>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC1>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; nvidia,pad-autocal-pull-down-offset-3v3-timeout = @@ -435,6 +438,7 @@ clock-names = "sdhci"; resets = <&bpmp TEGRA194_RESET_SDMMC3>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC3>; nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>; nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; @@ -462,6 +466,7 @@ <&bpmp TEGRA194_CLK_PLLC4>; resets = <&bpmp TEGRA194_RESET_SDMMC4>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC4>; nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>; -- 2.7.4
[PATCH v2 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS on T194
Enable SMMU translations for SDHCI and EQOS transactions on T194. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 5ae3bbf..cac3462 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -51,6 +51,7 @@ clock-names = "master_bus", "slave_bus", "rx", "tx", "ptp_ref"; resets = <&bpmp TEGRA194_RESET_EQOS>; reset-names = "eqos"; + iommus = <&smmu TEGRA186_SID_EQOS>; status = "disabled"; snps,write-requests = <1>; @@ -381,6 +382,7 @@ clock-names = "sdhci"; resets = <&bpmp TEGRA194_RESET_SDMMC1>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC1>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; nvidia,pad-autocal-pull-down-offset-3v3-timeout = @@ -403,6 +405,7 @@ clock-names = "sdhci"; resets = <&bpmp TEGRA194_RESET_SDMMC3>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC3>; nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>; nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; @@ -430,6 +433,7 @@ <&bpmp TEGRA194_CLK_PLLC4>; resets = <&bpmp TEGRA194_RESET_SDMMC4>; reset-names = "sdhci"; + iommus = <&smmu TEGRA186_SID_SDMMC4>; nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>; -- 2.1.4
[PATCH v2 3/7] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 Soc SMMU that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 3133f3b..1d72fac 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -31,6 +31,10 @@ conditions. as below, SoC-specific compatibles: "qcom,sdm845-smmu-500", "arm,mmu-500" + NVIDIA SoCs that use more than one ARM MMU-500 together + needs following SoC-specific compatibles along with "arm,mmu-500": + "nvidia,tegra194-smmu" + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the -- 2.1.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 0/7] Nvidia Arm SMMUv2 Implementation
Changes in v2: - Prepare arm_smu_flush_ops for override. - Remove NVIDIA_SMMUv2 and use ARM_SMMUv2 model as T194 SMMU hasn't modified ARM MMU-500. - Add T194 specific compatible string - "nvidia,tegra194-smmu" - Remove tlb_sync hook added in v1 and Override arm_smmu_flush_ops->tlb_sync() from implementation. - Register implementation specific context/global fault hooks directly for irq handling. - Update global/context interrupt list in DT and releant fault handling code in arm-smmu-nvidia.c. - Implement reset hook in arm-smmu-nvidia.c to clear irq status and sync tlb. v1 - https://lkml.org/lkml/2019/8/29/1588 Krishna Reddy (7): iommu/arm-smmu: prepare arm_smmu_flush_ops for override iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage dt-bindings: arm-smmu: Add binding for Tegra194 SMMU iommu/arm-smmu: Add global/context fault implementation hooks arm64: tegra: Add Memory controller DT node on T194 arm64: tegra: Add DT node for T194 SMMU arm64: tegra: enable SMMU for SDHCI and EQOS on T194 .../devicetree/bindings/iommu/arm,smmu.txt | 4 + MAINTAINERS| 2 + arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 + arch/arm64/boot/dts/nvidia/tegra194.dtsi | 88 +++ drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c| 287 + drivers/iommu/arm-smmu.c | 27 +- drivers/iommu/arm-smmu.h | 8 +- 9 files changed, 413 insertions(+), 12 deletions(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c -- 2.1.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/7] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 soc. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 187 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 194 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 74e9d9c..c9b802a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15807,9 +15807,11 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported F: drivers/iommu/tegra* +F: drivers/iommu/arm-smmu-nvidia.c TEGRA KBC DRIVER M: Laxman Dewangan diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 7caad48..556b94c 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o -obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o +obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 5c87a38..1a19687 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) */ switch (smmu->model) { case ARM_MMU500: + if (of_device_is_compatible(smmu->dev->of_node, + "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); smmu->impl = &arm_mmu500_impl; break; case CAVIUM_SMMUV2: diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 000..ca871dc --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,187 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Nvidia ARM SMMU v2 implementation quirks +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for Interleaved IOVA accesses and + * used by Non-Isochronous Hw devices for SMMU translations. + * Third one is used for SMMU translations from Isochronous HW devices. + * It is possible to use this Implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant tlb + * invalidations as all three never need to be tlb invalidated for a HW device. + * + * When Linux Kernel supports multiple SMMU devices, The SMMU device used for + * Isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient tlb invalidates. + * + */ +#define MAX_SMMU_INSTANCES 3 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu) + +#define nsmmu_page(smmu, inst, page) \ + (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \ + ((page) << smmu->pgshift)) + +static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readl_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writel_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readq_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static void nsmmu_t