RE: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration

2022-04-22 Thread Krishna Reddy via iommu
Good effort to isolate bus config from smmu drivers.
Reviewed-By: Krishna Reddy 

I have an orthogonal question here.
Can the following code handle the case, where different buses have different 
type of SMMU instances(like one bus has SMMUv2 and another bus has SMMUv3)?
If it need to handle the above case, can the smmu device bus be matched with 
specific bus here and ops set only for that bus? 


> +   for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
> +   struct bus_type *bus = iommu_buses[i];
> +   const struct iommu_ops *bus_ops = bus->iommu_ops;
> +   int err;
> +
> +   WARN_ON(bus_ops && bus_ops != ops);
> +   bus->iommu_ops = ops;
> +   err = bus_iommu_probe(bus);
> +   if (err) {
> +   bus_for_each_dev(bus, NULL, iommu,
> remove_iommu_group);
> +   bus->iommu_ops = bus_ops;
> +   return err;
> +   }
> +   }


-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [Patch v2] iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu

2022-04-21 Thread Krishna Reddy via iommu
> Tegra194 and Tegra234 SoCs have the erratum that causes walk cache entries to
> not be invalidated correctly. The problem is that the walk cache index 
> generated
> for IOVA is not same across translation and invalidation requests. This is 
> leading
> to page faults when PMD entry is released during unmap and populated with
> new PTE table during subsequent map request. Disabling large page mappings
> avoids the release of PMD entry and avoid translations seeing stale PMD entry 
> in
> walk cache.
> Fix this by limiting the page mappings to PAGE_SIZE for Tegra194 and
> Tegra234 devices. This is recommended fix from Tegra hardware design team.
> 
> Co-developed-by: Pritesh Raithatha 
> Signed-off-by: Pritesh Raithatha 
> Signed-off-by: Ashish Mhetre 
> ---
> Changes in v2:
> - Using init_context() to override pgsize_bitmap instead of new function
> 
>  drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c | 30
> 
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> index 01e9b50b10a1..87bf522b9d2e 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> @@ -258,6 +258,34 @@ static void nvidia_smmu_probe_finalize(struct
> arm_smmu_device *smmu, struct devi
>   dev_name(dev), err);
>  }
> 
> +static int nvidia_smmu_init_context(struct arm_smmu_domain
> *smmu_domain,
> + struct io_pgtable_cfg *pgtbl_cfg,
> + struct device *dev)
> +{
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
> + const struct device_node *np = smmu->dev->of_node;
> +
> + /*
> +  * Tegra194 and Tegra234 SoCs have the erratum that causes walk
> cache
> +  * entries to not be invalidated correctly. The problem is that the walk
> +  * cache index generated for IOVA is not same across translation and
> +  * invalidation requests. This is leading to page faults when PMD entry
> +  * is released during unmap and populated with new PTE table during
> +  * subsequent map request. Disabling large page mappings avoids the
> +  * release of PMD entry and avoid translations seeing stale PMD entry in
> +  * walk cache.
> +  * Fix this by limiting the page mappings to PAGE_SIZE on Tegra194 and
> +  * Tegra234.
> +  */
> + if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
> + of_device_is_compatible(np, "nvidia,tegra194-smmu")) {
> + smmu->pgsize_bitmap = PAGE_SIZE;
> + pgtbl_cfg->pgsize_bitmap = smmu->pgsize_bitmap;
> + }
> +
> + return 0;
> +}
> +
>  static const struct arm_smmu_impl nvidia_smmu_impl = {
>   .read_reg = nvidia_smmu_read_reg,
>   .write_reg = nvidia_smmu_write_reg,
> @@ -268,10 +296,12 @@ static const struct arm_smmu_impl
> nvidia_smmu_impl = {
>   .global_fault = nvidia_smmu_global_fault,
>   .context_fault = nvidia_smmu_context_fault,
>   .probe_finalize = nvidia_smmu_probe_finalize,
> + .init_context = nvidia_smmu_init_context,
>  };
> 
>  static const struct arm_smmu_impl nvidia_smmu_single_impl = {
>   .probe_finalize = nvidia_smmu_probe_finalize,
> + .init_context = nvidia_smmu_init_context,
>  };
> 

Reviewed-by: Krishna Reddy 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)

2021-09-27 Thread Krishna Reddy via iommu
Hi Eric,
> This is based on Jean-Philippe's
> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> https://www.spinics.net/lists/arm-kernel/msg886518.html
> (including the patches that were not pulled for 5.13)
>

Jean's patches have been merged to v5.14.
Do you anticipate IOMMU/VFIO part patches getting into upstream kernel soon?

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-06-30 Thread Krishna Reddy
> +
> +err_pm_disable:
> +   pm_runtime_disable(dev);
>  return ret;
>   }

Should it be pm_runtime_force_suspend()?  qcom_iommu_device_remove() doesn't 
use pm_runtime_disable(dev).

875 static int qcom_iommu_device_remove(struct platform_device *pdev)
876 {
...
881 >---pm_runtime_force_suspend(>dev);
882 >---platform_set_drvdata(pdev, NULL);
883 >---iommu_device_sysfs_remove(_iommu->iommu);
884 >---iommu_device_unregister(_iommu->iommu);
...
887 }

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-17 Thread Krishna Reddy
> Instead of flush_ops in init_context hook, perhaps a io_pgtable quirk since 
> this is
> related to tlb, probably a bad name but IO_PGTABLE_QUIRK_TLB_INV which will
> be set in init_context impl hook and the prev condition in
> io_pgtable_tlb_flush_walk()
> becomes something like below. Seems very minimal and neat instead of poking
> into tlb_flush_walk functions or touching dma strict with some flag?
> 
> if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT ||
>  iop->cfg.quirks & IO_PGTABLE_QUIRK_TLB_INV) {
>  iop->cfg.tlb->tlb_flush_all(iop->cookie);
>  return;
> }

Can you name it as IO_PGTABLE_QUIRK_TLB_INV_ASID or 
IO_PGTABLE_QUIRK_TLB_INV_ALL_ASID?

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 3/9] iommu/arm-smmu: Implement ->probe_finalize()

2021-06-15 Thread Krishna Reddy
> if (smmu->impl->probe_finalize)

The above is the issue. It should be updated as below similar to other 
instances impl callbacks.
if (smmu->impl && smmu->impl->probe_finalize)

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-14 Thread Krishna Reddy
> Right but we won't know until we profile the specific usecases or try them in
> generic workload to see if they affect the performance. Sure, over 
> invalidation is
> a concern where multiple buffers can be mapped to same context and the cache
> is not usable at the time for lookup and such but we don't do it for small 
> buffers
> and only for large buffers which means thousands of TLB entry mappings in
> which case TLBIASID is preferred (note: I mentioned the HW team
> recommendation to use it for anything greater than 128 TLB entries) in my
> earlier reply. And also note that we do this only for partial walk flush, we 
> are not
> arbitrarily changing all the TLBIs to ASID based.

Most of the heavy bw use cases does involve processing larger buffers.
When the physical memory is allocated dis-contiguously at page_size (let's use 
4KB here)
granularity, each aligned 2MB chunks IOVA unmap would involve performing a 
TLBIASID
as 2MB is not a leaf. Essentially, It happens all the time during large buffer 
unmaps and
potentially impact active traffic on other large buffers. Depending on how much
latency HW engines can absorb, the overflow/underflow issues for ISO engines 
can be
sporadic and vendor specific. 
Performing TLBIASID as default for all SoCs is not a safe operation.


> I am no camera expert but from what the camera team mentioned is that there
> is a thread which frees memory(large unused memory buffers) periodically which
> ends up taking around 100+ms and causing some camera test failures with
> frame drops. Parallel efforts are already being made to optimize this usage of
> thread but as I mentioned previously, this is *not a camera specific*, lets 
> say
> someone else invokes such large unmaps, it's going to face the same issue.

>From the above, It doesn't look like the root cause of frame drops is fully 
>understood.
Why is 100+ms delay causing camera frame drop?  Is the same thread submitting 
the buffers
to camera after unmap is complete? If not, how is the unmap latency causing 
issue here?
 

> > If unmap is queued and performed on a back ground thread, would it
> > resolve the frame drops?
> 
> Not sure I understand what you mean by queuing on background thread but with
> that or not, we still do the same number of TLBIs and hop through
> iommu->io-pgtable->arm-smmu to perform the the unmap, so how will that
> help?

I mean adding the unmap requests into a queue and processing them from a 
different thread.
It is not to reduce the TLBIs. But, not to block subsequent buffer allocation, 
IOVA map requests, if they
are being requested from same thread that is performing unmap. If unmap is 
already performed from
a different thread, then the issue still need to be root caused to understand 
it fully. Check for any
serialization issues. 


-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 1/2] iommu: Fix race condition during default domain allocation

2021-06-11 Thread Krishna Reddy
> > + mutex_lock(>mutex);
> >   iommu_alloc_default_domain(group, dev);
> > + mutex_unlock(>mutex);
> 
> It feels wrong to serialise this for everybody just to cater for systems with
> aliasing SIDs between devices.

Serialization is limited to devices in the same group. Unless devices share 
SID, they wouldn't be in same group.

> Can you provide some more information about exactly what the h/w
> configuration is, and the callstack which exhibits the race, please?

The failure is an after effect and is a page fault.  Don't have a failure call 
stack here. Ashish has traced it through print messages and he can provide them.

>From the prints messages, The following was observed in page fault case:

Device1:  iommu_probe_device() --> iommu_alloc_default_domain() --> 
iommu_group_alloc_default_domain() --> 
__iommu_attach_device(group->default_domain)
Device2:  iommu_probe_device() --> iommu_alloc_default_domain() --> 
iommu_group_alloc_default_domain() --> 
__iommu_attach_device(group->default_domain)

Both devices(with same SID) are entering into 
iommu_group_alloc_default_domain() function and each one getting attached to a 
different group->default_domain 
as the second one overwrites  group->default_domain after the first one 
attaches to group->default_domain it has created.

SMMU would be setup to use first domain for the context page table. Whereas all 
the dma map/unamp requests from second device would
be performed on a domain that is not used by SMMU for context translations and 
IOVA (not mapped in first domain) accesses from second device lead to page 
faults. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-11 Thread Krishna Reddy
Hi Sai,
> >> > No, the unmap latency is not just in some test case written, the
> >> > issue is very real and we have workloads where camera is reporting
> >> > frame drops because of this unmap latency in the order of 100s of
> milliseconds.

> Not exactly, this issue is not specific to camera. If you look at the numbers 
> in the
> commit text, even for the test device its the same observation. It depends on
> the buffer size we are unmapping which affects the number of TLBIs issue. I am
> not aware of any such HW side bw issues for camera specifically on QCOM
> devices.

It is clear that reducing number of TLBIs  reduces the umap API latency. But, 
It is
at the expense of throwing away valid tlb entries. 
Quantifying the impact of arbitrary invalidation of valid tlb entries at 
context level is not straight forward and
use case dependent. The side-effects might be rare or won't be known until they 
are noticed.
Can you provide more details on How the unmap latency is causing camera to drop 
frames?
Is unmap performed in the perf path?
If unmap is queued and performed on a back ground thread, would it resolve the 
frame drops?

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-10 Thread Krishna Reddy
> > No, the unmap latency is not just in some test case written, the issue
> > is very real and we have workloads where camera is reporting frame
> > drops because of this unmap latency in the order of 100s of milliseconds.
> > And hardware team recommends using ASID based invalidations for
> > anything larger than 128 TLB entries. So yes, we have taken note of
> > impacts here before going this way and hence feel more inclined to
> > make this qcom specific if required.

Seems like the real issue here is not the unmap API latency.
It should be the high number of back to back SMMU TLB invalidate register 
writes that is resulting
in lower ISO BW to Camera and overflow. Isn't it?
Even Tegra186 SoC has similar issue and HW team recommended to rate limit the 
number of
back to back SMMU tlb invalidate registers writes. The subsequent Tegra194 SoC 
has a dedicated SMMU for
ISO clients to avoid the impact of TLB invalidates from NISO clients on ISO BW.

>> Thinking some more, I
>> wonder if the Tegra folks might have an opinion to add here, given 
>> that their multiple-SMMU solution was seemingly about trying to get 
>> enough TLB and pagetable walk bandwidth in the first place?

While it is good to reduce the number of tlb register writes, Flushing all TLB 
entries at context granularity arbitrarily
can have negative impact on active traffic and BW. I don't have much data on 
possible impact at this point.
Can the flushing at context granularity be made a quirk than performing it as 
default? 

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions

2021-04-28 Thread Krishna Reddy
Hi Dmitry,

> Thank you for the answer. Could you please give more information about:
> 1) Are you on software or hardware team, or both?

I am in the software team and has contributed to initial Tegra SMMU driver in 
the downstream along with earlier team member Hiroshi Doyu.

> 2) Is SMMU a third party IP or developed in-house?

Tegra SMMU is developed in-house. 

> 3) Do you have a direct access to HDL sources? Are you 100% sure that
> hardware does what you say?

It was discussed with Hardware team before and again today as well.
Enabling ASID for display engine while it continues to access the buffer memory 
is a safe operation.
As per HW team, The only side-effect that can happen is additional latency to 
transaction as SMMU caches get warmed up.

> 4) What happens when CPU writes to ASID register? Does SMMU state machine
> latch ASID status (making it visible) only at a single "safe" point?

MC makes a decision on routing transaction through either SMMU page tables or 
bypassing based on the ASID register value.  It
checks the ASID register only once per transaction in the pipeline.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions

2021-04-27 Thread Krishna Reddy
> Is it always safe to enable SMMU ASID in a middle of a DMA request made by a
> memory client?

>From MC point of view, It is safe to enable and has been this way from many 
>years in downstream code for display engine.
It doesn't impact the transactions that have already bypassed SMMU before 
enabling SMMU ASID. 
Transactions that are yet to pass SMMU stage would go through SMMU once SMMU 
ASID is enabled and visible.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-23 Thread Krishna Reddy
>> Did that patch cause any issue, or is it just not needed on your system?
>> It fixes an hypothetical problem with the way ATS is implemented. 
>> Maybe I actually observed it on an old software model, I don't 
>> remember. Either way it's unlikely to go upstream but I'd like to know 
>> if I should drop it from my tree.

> Had to revert same patch "mm: notify remote TLBs when dirtying a PTE" to
> avoid below crash[1]. I am not sure about the cause yet.

I have noticed this issue earlier with patch pointed here and root caused the 
issue as below.
It happens after vfio_mmap request from QEMU for the PCIe device and during the 
access of VA when
PTE access flags are updated. 

kvm_mmu_notifier_change_pte() --> kvm_set_spte_hve() --> kvm_set_spte_hva() --> 
clean_dcache_guest_page()

The validation model doesn't have FWB capability supported.
__clean_dcache_guest_page() attempts to perform dcache flush on pcie bar 
address(not a valid_pfn()) through page_address(),
which doesn't have page table mapping and leads to exception.

I have worked around the issue by filtering out the request if the pfn is not 
valid in  __clean_dcache_guest_page(). 
As the patch wasn't posted in the community, reverted it as well.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 04/10] iommu/arm-smmu: tegra: Detect number of instances at runtime

2021-04-20 Thread Krishna Reddy
>+static const struct arm_smmu_impl nvidia_smmu_single_impl = {
>+   .probe_finalize = nvidia_smmu_probe_finalize,
>+};

nvidia_smmu_probe_finalize is used before it is defined. It is defined in patch 
5.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v13 00/10] iommu: I/O page faults for SMMUv3

2021-03-17 Thread Krishna Reddy
Tested-by: Krishna Reddy 

Validated v13 patches in context of nested translations validation for NVMe 
PCIe device assigned to Guest VM.
V12 patches(v13 is yet to be tested) has been tested for SVA functionality on 
Native OS and is functional. 

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-17 Thread Krishna Reddy
Tested-by: Krishna Reddy 

Validated nested translations with NVMe PCI device assigned to Guest VM. 
Tested with both v12 and v13 of Jean-Philippe's patches as base.

> This is based on Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/

With Jean-Philippe's V13, Patch 12 of this series has a conflict that had to be 
resolved manually.

-KR


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part)

2021-03-17 Thread Krishna Reddy
Tested-by: Krishna Reddy 

Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM and is 
functional.

This patch series resolved the mismatch(seen with v11 patches) for 
VFIO_IOMMU_SET_PASID_TABLE and VFIO_IOMMU_CACHE_INVALIDATE Ioctls between linux 
and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" 
(v5.2.0-2stage-rfcv8). 

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-16 Thread Krishna Reddy
> Hi Krishna,
> On 3/15/21 7:04 PM, Krishna Reddy wrote:
> > Tested-by: Krishna Reddy 
> >
> >> 1) pass the guest stage 1 configuration
> >
> > Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM
> along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and
> QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from
> v5.2.0-2stage-rfcv8.
> > NVMe PCIe device is functional with 2-stage translations and no issues
> observed.
> Thank you very much for your testing efforts. For your info, there are more
> recent kernel series:
> [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23) [PATCH
> v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)
> 
> working along with QEMU RFC
> [RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)
> 
> If you have cycles to test with those, this would be higly appreciated.
 
Thanks Eric for the latest patches. Will validate and update. 
Feel free to reach out me for validating future patch sets as necessary.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (VFIO part)

2021-03-15 Thread Krishna Reddy
Tested-by: Krishna Reddy 

> 1) pass the guest stage 1 configuration
> 3) invalidate stage 1 related caches

Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along 
with patch series "v13 SMMUv3 Nested Stage Setup (IOMMU part)" and QEMU patch 
series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8. 
NVMe PCIe device is functional with 2-stage translations and no issues observed.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-15 Thread Krishna Reddy
Tested-by: Krishna Reddy 

> 1) pass the guest stage 1 configuration

Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along 
with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch 
series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8. 
NVMe PCIe device is functional with 2-stage translations and no issues observed.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 11/13] iommu/arm-smmu-v3: Add SVA device feature

2021-01-06 Thread Krishna Reddy
> I agree that we should let a device driver enable SVA if it supports some 
> form of IOPF. 

Right, The PCIe device has capability to handle  IO page faults on its own when 
ATS translation request response indicates that no valid mapping exist.
SMMU doesn't involve/handle page faults for ATS translation failures here.
Neither the PCIe device nor the SMMU have PRI capability. 

> Perhaps we could extract the IOPF capability from IOMMU_DEV_FEAT_SVA, into a 
> new IOMMU_DEV_FEAT_IOPF feature.
> Device drivers that rely on PRI or stall can first check FEAT_IOPF, then 
> FEAT_SVA, and enable both separately.
> Enabling FEAT_SVA would require FEAT_IOPF enabled if supported. Let me try to 
> write this up.

Looks good to me. Allowing device driver to enable FEAT_IOPF and subsequently 
SVA should work. Thanks!

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 11/13] iommu/arm-smmu-v3: Add SVA device feature

2020-12-14 Thread Krishna Reddy
Hi Jean,

> +bool arm_smmu_master_sva_supported(struct arm_smmu_master *master) {
> +   if (!(master->smmu->features & ARM_SMMU_FEAT_SVA))
> +   return false;
+
> +   /* SSID and IOPF support are mandatory for the moment */
> +   return master->ssid_bits && arm_smmu_iopf_supported(master); }
> +

Tegra Next Gen SOC has arm-smmu-v3 and It doesn't have support for PRI 
interface.
However, PCIe client device has capability to handle the page faults on its own 
when the ATS translation fails.
The PCIe device needs SVA feature enable without PRI interface supported at 
arm-smmu-v3.
At present, the SVA feature enable is allowed only if the smmu/client device 
has PRI support. 
There seem to be no functional reason to make pri_supported as a pre-requisite 
for SVA enable.
Can SVA enable be supported for pri_supported not set case as well? 
Also, It is noticed that SVA  enable on Intel doesn't need pri_supported set. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-12-14 Thread Krishna Reddy
>> The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support 
>> for BTM.
>> Do you have plan to get your earlier patch to handle invalidate 
>> notifications into upstream sometime soon?
>> Can the dependency on BTM be relaxed with the patch?
>>
>> PATCH v9 13/13] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops 
>> https://www.spinics.net/lists/arm-kernel/msg825099.html

>This patch (which should be in 5.11) only takes care of sending ATC 
>invalidations to PCIe endpoints. With this, BTM is still required to 
>invalidate SMMU TLBs.
> However we could enable command-queue invalidation when ARM_SMMU_FEAT_BTM 
> isn't set.
> Invalidations are still a relatively rare event so it may not be outrageously 
> slow. I can add a patch to my tree if you have hardware to test. 

Thanks! We can test the patch on emulation platform. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-12-09 Thread Krishna Reddy
> > The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support 
> > for BTM.
> > Do you have plan to get your earlier patch to handle invalidate 
> > notifications into upstream sometime soon?

>Is that a limitation of the SMMU implementation, the interconnect or the 
>integration?

It is the limitation of interconnect. The DVM messages don't reach SMMU. 
The BTM bit in SMMU IDR does indicate that it doesn't support BTM. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-12-09 Thread Krishna Reddy
Hi Jean,
> > Why is BTM mandated for SVA? I couldn't find this requirement in 
> > SMMU spec (Sorry if I missed it or this got discussed earlier). But 
> > if performance is the
> only concern here,
> > is it better just to allow it with a warning rather than limiting 
> > SMMUs without
> BTM?
>
> It's a performance concern and requires to support multiple 
> configurations, but the spec allows it. Are there SMMUs without BTM 
> that need it?

The Tegra Next Generation SOC uses arm-smmu-v3, but it doesn't have support for 
BTM.
Do you have plan to get your earlier patch to handle invalidate notifications 
into upstream sometime soon? 
Can the dependency on BTM be relaxed with the patch?

PATCH v9 13/13] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops
https://www.spinics.net/lists/arm-kernel/msg825099.html

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-18 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU.

Reviewed-by: Jon Hunter 
Reviewed-by: Rob Herring 
Reviewed-by: Robin Murphy 
Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml   | 25 ++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 93fb9fe068b9..503160a7b9a0 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -44,6 +44,11 @@ properties:
 items:
   - const: marvell,ap806-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that program two ARM MMU-500s identically
+items:
+  - enum:
+  - nvidia,tegra194-smmu
+  - const: nvidia,smmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
@@ -61,7 +66,8 @@ properties:
   - cavium,smmu-v2
 
   reg:
-maxItems: 1
+minItems: 1
+maxItems: 2
 
   '#global-interrupts':
 description: The number of global interrupts exposed by the device.
@@ -144,6 +150,23 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - nvidia,tegra194-smmu
+then:
+  properties:
+reg:
+  minItems: 2
+  maxItems: 2
+else:
+  properties:
+reg:
+  maxItems: 1
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-18 Thread Krishna Reddy
Add global/context fault hooks to allow vendor specific implementations
override default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+
+   return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int idx;
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain;
+
+   smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+   smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   }
+
+   return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_sm

[PATCH v11 0/5] NVIDIA ARM SMMU Implementation

2020-07-18 Thread Krishna Reddy
Changes in v11:
Addressed Rob comment on DT binding patch to set min/maxItems of reg property 
in else part.
Rebased on top of 
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates.

Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during 
context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
https://lkml.org/lkml/2020/7/8/57

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588


Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  25 +-
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 278 ++
 drivers/iommu/arm-smmu.c  |  29 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 334 insertions(+), 11 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 49fbb25030265c660de732513f18275d88ff99d3
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-18 Thread Krishna Reddy
ioremap smmu mmio region before calling into implementation init.
This is necessary to allow mapped address available during vendor
specific implementation init.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index cdd15ead9bc4..de520115d3df 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2123,10 +2123,6 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (err)
return err;
 
-   smmu = arm_smmu_impl_init(smmu);
-   if (IS_ERR(smmu))
-   return PTR_ERR(smmu);
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
ioaddr = res->start;
smmu->base = devm_ioremap_resource(dev, res);
@@ -2138,6 +2134,10 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
 */
smmu->numpage = resource_size(res);
 
+   smmu = arm_smmu_impl_init(smmu);
+   if (IS_ERR(smmu))
+   return PTR_ERR(smmu);
+
num_irqs = 0;
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
num_irqs++;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-18 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA
accesses across them and must be programmed identically.
This implementation supports programming the two ARM MMU-500s
that must be programmed identically.

The third ARM MMU-500 instance is supported by standard
arm-smmu.c driver itself.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 179 
 drivers/iommu/arm-smmu.c|   1 +
 drivers/iommu/arm-smmu.h|   1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 496fd4eafb68..ee2c0ba13a0f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16810,8 +16810,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c87d825f651e..f4ff124a1967 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -213,6 +213,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500") ||
of_device_is_compatible(np, "qcom,sm8150-smmu-500") ||
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together and must be programmed identically for
+ * interleaved IOVA accesses across them and translates accesses from
+ * non-isochronous HW devices.
+ * Third one is used for translating accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * The third instance usage is through standard arm-smmu driver itself and
+ * is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu;
+
+   nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+   void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+   writel_relaxed(val, reg);
+   }
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readq_relaxe

[PATCH v11 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-18 Thread Krishna Reddy
Move TLB timeout and spin count macros to header file to
allow using the same from vendor specific implementations.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 19f906de6420..cdd15ead9bc4 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-17 Thread Krishna Reddy
>On Mon, Jul 13, 2020 at 02:50:20PM +0100, Will Deacon wrote:
>> On Tue, Jul 07, 2020 at 10:00:12PM -0700, Krishna Reddy wrote:
> >> Changes in v10:
>> > Perform SMMU base ioremap before calling implementation init.
>> > Check for Global faults across both ARM MMU-500s during global interrupt.
>> > Check for context faults across all contexts of both ARM MMU-500s during 
>> > context fault interrupt.
> >> Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
>>
>> Please repost based on my SMMU queue, as this doesn't currently apply.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=
>> for-joerg/arm-smmu/updates

>Any update on this, please? It would be a shame to miss 5.9 now that the 
>patches look alright.

Thanks for pinging.
I have been out of the office this week. Just started working on reposting 
patches.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-10 Thread Krishna Reddy
Thanks Rob. One question on setting "minItems: ". Please see below.

>> +allOf:
>> +  - if:
>> +  properties:
>> +compatible:
>> +  contains:
>> +enum:
>> +  - nvidia,tegra194-smmu
>> +then:
>> +  properties:
>> +reg:
>> +  minItems: 2
>> +  maxItems: 2

>This doesn't work. The main part of the schema already said there's only
>1 reg region. This part is ANDed with that, not an override. You need to add 
>an else clause with 'maxItems: 1' and change the base schema to
>{minItems: 1, maxItems: 2}.

As the earlier version of base schema doesn't have "minItems: " set, should it 
be set to 0 for backward compatibility?  Or can it just be omitted setting in 
base schema as before?

"else" part to set "maxItems: 1" and setting "maxItems: 2" in base schema is 
clear to me.


-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-07 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU.

Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..ac1f526c3424 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that program two ARM MMU-500s identically
+items:
+  - enum:
+  - nvidia,tegra194-smmu
+  - const: nvidia,smmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - nvidia,tegra194-smmu
+then:
+  properties:
+reg:
+  minItems: 2
+  maxItems: 2
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-07 Thread Krishna Reddy
Move TLB timeout and spin count macros to header file to
allow using the same from vendor specific implementations.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..d2054178df35 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-07 Thread Krishna Reddy
ioremap smmu mmio region before calling into implementation init.
This is necessary to allow mapped address available during vendor
specific implementation init.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054178df35..e03e873d3bca 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2120,10 +2120,6 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (err)
return err;
 
-   smmu = arm_smmu_impl_init(smmu);
-   if (IS_ERR(smmu))
-   return PTR_ERR(smmu);
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
ioaddr = res->start;
smmu->base = devm_ioremap_resource(dev, res);
@@ -2135,6 +2131,10 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
 */
smmu->numpage = resource_size(res);
 
+   smmu = arm_smmu_impl_init(smmu);
+   if (IS_ERR(smmu))
+   return PTR_ERR(smmu);
+
num_irqs = 0;
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
num_irqs++;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-07 Thread Krishna Reddy
Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during 
context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588


Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 278 ++
 drivers/iommu/arm-smmu.c  |  29 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 328 insertions(+), 10 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: e5640f21b63d2a5d3e4e0c4111b2b38e99fe5164
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-07 Thread Krishna Reddy
Add global/context fault hooks to allow vendor specific implementations
override default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+
+   return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int idx;
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain;
+
+   smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+   smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   }
+
+   return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu

[PATCH v10 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-07 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA
accesses across them and must be programmed identically.
This implementation supports programming the two ARM MMU-500s
that must be programmed identically.

The third ARM MMU-500 instance is supported by standard
arm-smmu.c driver itself.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 179 
 drivers/iommu/arm-smmu.c|   1 +
 drivers/iommu/arm-smmu.h|   1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c23352059a6b..534cedaf8e55 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16811,8 +16811,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..f15571d05474 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together and must be programmed identically for
+ * interleaved IOVA accesses across them and translates accesses from
+ * non-isochronous HW devices.
+ * Third one is used for translating accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * The third instance usage is through standard arm-smmu driver itself and
+ * is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu;
+
+   nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+   void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+   writel_relaxed(val, reg);
+   }
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readq_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu,
+   int page, int offset, u64 val)
+{
+   unsigned int i;
+
+  

RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
On 01/07/2020 20:00, Krishna Reddy wrote:
>>>>>> +items:
>>>>>> +  - enum:
>>>>>> +  - nvdia,tegra194-smmu
>>>>>> +  - const: arm,mmu-500
>>>>
>>>>> Is the fallback compatible appropriate here? If software treats this as a 
>>>>> standard MMU-500 it will only program the first instance (because the 
>>>>> second isn't presented as a separate MMU-500) - is there any way that 
>>>>> isn't going to blow up?
>>>>
>>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
>>>> implementation override ensure that both instances are programmed. Isn't 
>>>> it? I am not sure I follow your comment fully.
>> 
>>> The problem is, if for some reason someone had a Tegra194, but only set the 
>>> compatible string to 'arm,mmu-500' it would assume that it was a normal 
>>> arm,mmu-500 and only one instance would be programmed. We always want at 
>>> least 2 of the 3 instances >>programmed and so we should only match 
>>> 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>>> arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>>> _mmu500.
>> 
>> In that case, new binding "nvidia,smmu-v2" can be added with data set to 
>> _mmu500 and enumeration would have nvidia,tegra194-smmu and another 
>> variant for next generation SoC in future. 

>I think you would be better off with nvidia,smmu-500 as smmu-v2 appears to be 
>something different. I see others have a smmu-v2 but I am not sure if that is 
>legacy. We have an smmu-500 and so that would seem more appropriate.

I tried to use the binding synonymous to other vendors. 
V2 is the architecture version.  MMU-500 is the actual implementation from ARM 
based on V2 arch.  As we just use the MMU-500 IP as it is, It can be named as 
nvidia,smmu-500 or similar as well.
Others probably having their own implementation based on V2 arch.

KR
--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Krishna Reddy
>> With shared irq line, the context fault identification is not optimal 
>> already.  Reading all the context banks all the time can be additional mmio 
>> read overhead. But, it may not hurt the real use cases as these happen only 
>> when there are bugs.

>Right, I did ponder the idea of a whole programmatic "request_context_irq" 
>hook that would allow registering the handler for both interrupts with the 
>appropriate context bank and instance data, but since all interrupts are 
>currently unexpected it seems somewhat hard to justify the extra complexity. 
>Obviously we can revisit this in future if you want to start actually doing 
>something with faults like the qcom GPU folks do.

Thanks, I would just avoid making changes to interrupt handlers till it is 
really necessary in future. The current code would just be simple and 
functional with more interrupts when there are multiple faults.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>Yeah, I realised later last night that this probably originated from forking 
>the whole driver downstream. But even then you could have treated the other 
>one as a separate nsmmu with a single instance ;)

True, But the initial nvidia implementation had limitation that it can only 
handle one instance of usage. With your implementation hooks design, it should 
be able to handle multiple instances of usage now.

>Since it does add a bit of confusion to the code and comments, let's just keep 
>things simple. I do like Jon's suggestion of actually enforcing that the 
>number of "reg" regions exactly matches the number expected for the given 
>compatible - I guess for now that means just hard-coding 2 and hoping the 
>hardware folks don't cook up any more of these...

For T194, reg can just be forced to 2. No future plan to use more than two 
MMU-500s together as of now. 

-KR 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
 +items:
 +  - enum:
 +  - nvdia,tegra194-smmu
 +  - const: arm,mmu-500
> >
>>> Is the fallback compatible appropriate here? If software treats this as a 
>>> standard MMU-500 it will only program the first instance (because the 
>>> second isn't presented as a separate MMU-500) - is there any way that isn't 
>>> going to blow up?
> >
>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
>> implementation override ensure that both instances are programmed. Isn't it? 
>> I am not sure I follow your comment fully.

>The problem is, if for some reason someone had a Tegra194, but only set the 
>compatible string to 'arm,mmu-500' it would assume that it was a normal 
>arm,mmu-500 and only one instance would be programmed. We always want at least 
>2 of the 3 instances >programmed and so we should only match 
>'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>_mmu500.

In that case, new binding "nvidia,smmu-v2" can be added with data set to 
_mmu500 and enumeration would have nvidia,tegra194-smmu and another variant 
for next generation SoC in future. 

-KR
--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Krishna Reddy
>>> +for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>>> +irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>>> +if (irq_ret == IRQ_HANDLED)
>>> +return irq_ret;
>>
>> Any chance there could be more than one SMMU faulting by the time we 
>> service the interrupt?

>It certainly seems plausible if the interconnect is automatically 
>load-balancing requests across the SMMU instances - say a driver bug caused a 
>buffer to be unmapped too early, there could be many in-flight accesses to 
>parts of that buffer that aren't all taking the same path and thus could now 
>fault in parallel.
>[ And anyone inclined to nitpick global vs. context faults, s/unmap a 
>buffer/tear down a domain/ ;) ]
>Either way I think it would be easier to reason about if we just handled these 
>like a typical shared interrupt and always checked all the instances.

It would be optimal to check at the same time across all instances. 

>>> +for (idx = 0; idx < smmu->num_context_banks; idx++) {
>>> +irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>>> + idx, 
>>> + inst);
>>> +
>>> +if (irq_ret == IRQ_HANDLED)
>>> +return irq_ret;
>>
>> Any reason why we don't check all banks?

>As above, we certainly shouldn't bail out without checking the bank for the 
>offending domain across all of its instances, and I guess the way this works 
>means that we would have to iterate all the banks to achieve that.

With shared irq line, the context fault identification is not optimal already.  
Reading all the context banks all the time can be additional mmio read 
overhead. But, it may not hurt the real use cases as these happen only when 
there are bugs. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
>> +  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
> Hmm, there must be a better way to word that to express that it only applies 
> to the sets of SMMUs that must be programmed identically, and not any other 
> independent MMU-500s that might also happen to be in the same SoC.

Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s 
identically".

>> +items:
>> +  - enum:
>> +  - nvdia,tegra194-smmu
>> +  - const: arm,mmu-500

>Is the fallback compatible appropriate here? If software treats this as a 
>standard MMU-500 it will only program the first instance (because the second 
>isn't presented as a separate MMU-500) - is there any way that isn't going to 
>blow up?

When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
implementation override ensure that both instances are programmed. Isn't it? I 
am not sure I follow your comment fully.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>> + * When Linux kernel supports multiple SMMU devices, the SMMU device 
>> +used for
>> + * isochornous HW devices should be added as a separate ARM MMU-500 
>> +device
>> + * in DT and be programmed independently for efficient TLB invalidates.

>I don't understand the "When" there - the driver has always supported multiple 
>independent SMMUs, and it's not something that could be configured out or 
>otherwise disabled. Plus I really don't see why you would ever want to force 
>unrelated SMMUs to be >programmed together - beyond the TLB thing mentioned it 
>would also waste precious context bank resources and might lead to weird 
>device grouping via false stream ID aliasing, with no obvious upside at all.

Sorry, I missed this comment.
During the initial patches, when the iommu_ops were different between, support 
multiple SMMU drivers at the same is not possible as one of them(that gets 
probed last) overwrites the platform bus ops. 
On revisiting the original issue, This problem is no longer relevant. At this 
point, It makes more sense to just get rid of 3rd instance programming in 
arm-smmu-nvidia.c and just limit it to the SMMU instances that need identical 
programming.

-KR



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 1/4] iommu/arm-smmu: move TLB timeout and spin count macros

2020-06-30 Thread Krishna Reddy
Move TLB timeout and spin count macros to header file to
allow using the same values from vendor specific implementations.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..d2054178df357 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be618..c7d0122a7c6ca 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 3/4] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-06-30 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..662c46e16f07d 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvidia,tegra194-smmu
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - nvidia,tegra194-smmu
+then:
+  properties:
+reg:
+  minItems: 2
+  maxItems: 3
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 0/4] NVIDIA ARM SMMUv2 Implementation

2020-06-30 Thread Krishna Reddy
Changes in v9:
Move TLB Timeout and spin count macros to arm-smmu.h header to share with 
implementation.
Set minItems and maxItems for reg property when compatible contains 
nvidia,tegra194-smmu.
Update commit message for NVIDIA implementation patch.
Fail single SMMU instance usage through NVIDIA implementation to limit the 
usage to two or three instances.
Fix checkpatch warnings with --strict checking.

v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 304 ++
 drivers/iommu/arm-smmu.c  |  20 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 349 insertions(+), 6 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 2/4] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of ARM MMU-500s together to interleave IOVA accesses
across them and must be programmed identically.
The third SMMU instance is used as a regular ARM MMU-500 and it
can either be programmed independently or identical to other
two ARM MMU-500s.

This implementation supports programming two or three ARM MMU-500s
identically as per DT config.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 206 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..f15571d05474e 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..5c874912e1c1a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+   struc

[PATCH v9 4/4] iommu/arm-smmu: add global/context fault implementation hooks

2020-06-30 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 98 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 5c874912e1c1a..d279788eab954 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -144,6 +144,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -151,6 +247,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054

RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> The driver intend to support up to 3 instances. It doesn't really mandate 
>> that all three instances be present in same DT node.
>> Each mmio aperture in "reg" property is an instance here. reg = 
>> , , ; The reg can have 
>> all three or less and driver just configures based on reg and it works fine.

>So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we have 
>up to 3 (for Tegra194). So the question is do we have a use-case where we only 
>use 2 and not 3? If not, then it still seems that we should require that all 3 
>are present.

It can be either 2 SMMUs (for non-iso) or 3 SMMUs (for non-iso and iso).  Let 
me fail the one instance case as it can use regular arm smmu implementation and 
don't  need nvidia implementation explicitly.
 
>The other problem I see here is that currently the arm-smmu binding defines 
>the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe that this will 
>get caught by the 'dt_binding_check' when we try to populate the binding.

Thanks for pointing it out! Will update the binding doc.

-KR

--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>OK, well I see what you are saying, but if we intended to support all 3 for 
>Tegra194, then we should ensure all 3 are initialised correctly.

The driver intend to support up to 3 instances. It doesn't really mandate that 
all three instances be present in same DT node.
Each mmio aperture in "reg" property is an instance here. reg = , , ;
The reg can have all three or less and driver just configures based on reg and 
it works fine.

>It would be better to query the number of SMMUs populated in device-tree and 
>then ensure that all are initialised correctly.

Getting the IORESOURCE_MEM is the way to count the instances driver need to 
support.  
In a way, It is already querying through IORESOURCE_MEM here. 


-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave 
>> IOVA accesses across them.
>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible 
>> string for Tegra194 SoC SMMU topology.

>There is no description here of the 3rd SMMU that you mention below.
>I think that we should describe the full picture here.

This driver is primarily for dual SMMU config.  So, It is avoided in the commit 
message.
However, Implementation supports option to configure 3 instances identically 
with one SMMU DT node
and is documented in the implementation.

>> +
>> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>> +   unsigned int inst, int page)

>If you run checkpatch --strict on these you will get a lot of ...

>CHECK: Alignment should match open parenthesis
>#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
>+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>+ unsigned int inst, int page)

>We should fix these.

I will fix these if I need to push a new patch set.

>> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
>> +int page, int offset, u32 val) {
>> +unsigned int i;
>> +struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;

>Personally, I would declare 'reg' outside of the loop as I feel it will make 
>the code cleaner and easier to read.

It was like that before and is updated to its current form to limit the scope 
of variables as per Thierry's comments in v6. 
We can just leave it as it is as there is no technical issue here.

-KR

--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>> +*smmu) {
>> +unsigned int i;

>> +for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>> +struct resource *res;
>> +
>> +res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +if (!res)
>> +break;

>Currently this driver is only supported for Tegra194 which I understand has 3 
>SMMUs. Therefore, I don't feel that we should fail silently here, I think it 
>is better to return an error if all 3 cannot be initialised.

Initialization of all the three SMMU instances is not necessary here.
The driver can work with all the possible number of instances 1, 2 and 3 based 
on the DT config though it doesn't make much sense to use it with 1 instance.
There is no silent failure here from driver point of view. If there is 
misconfig in DT, SMMU faults would catch issues.

>> +nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>> +if (IS_ERR(nvidia_smmu->bases[i]))
>> +return ERR_CAST(nvidia_smmu->bases[i]);

>You want to use PTR_ERR() here.

PTR_ERR() returns long integer. 
This function returns a pointer. ERR_CAST is the right one to use here. 


--
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-29 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 98 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 1124f0ac1823a..c9423b4199c65 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..3bb0aba15a356 100644
--- a/drivers/iommu/arm-s

[PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-29 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvdia,tegra194-smmu
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation

2020-06-29 Thread Krishna Reddy
Changes in v8:
Fixed incorrect CB_FSR read issue during context bank fault.
Rebased and validated patches on top of 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 294 ++
 drivers/iommu/arm-smmu.c  |  17 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Krishna Reddy
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 196 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 203 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..1124f0ac1823a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (i = 0; i < nvidia_smmu->num_ins

RE: [PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-29 Thread Krishna Reddy
>> +static irqreturn_t nvidia_smmu_context_fault_bank(int irq,

>> + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, 
>> + smmu->numpage + idx);
[...]
>> + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
[...]
>> + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);

>It reads FSR of the default inst (1st), but clears the FSR of corresponding 
>inst -- just want to make sure that this is okay and intended.

FSR should be read from corresponding inst. Not from instance 0. 
Let me post updated patch.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Krishna Reddy
>> + if (!nvidia_smmu->bases[0])
>> + nvidia_smmu->bases[0] = smmu->base;
>> +
>> + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); }

>Not critical -- just a nit: why not put the bases[0] in init()?

smmu->base is not available during nvidia_smmu_impl_init() call. It is set 
afterwards in arm-smmu.c.
It can't be avoided without changing the devm_ioremap() and impl_init() call 
order in arm-smmu.c. 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-28 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvdia,tegra194-smmu
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-28 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 101 +++-
 drivers/iommu/arm-smmu.c|  17 +-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index b73c483fa3376..7276bb203ae79 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
idx);
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
@@ -185,7 +283,8 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct 
arm_smmu_device *smmu)
}
 
nvidia_smmu->smmu.i

[PATCH v7 0/3] Nvidia Arm SMMUv2 Implementation

2020-06-28 Thread Krishna Reddy
Changes in v7:
Incorporated the review feedback from Nicolin Chen, Robin Murphy and Thierry 
Reding.
Rebased and validated patches on top of 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v6- https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 294 ++
 drivers/iommu/arm-smmu.c  |  17 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-28 Thread Krishna Reddy
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 195 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
 
+   if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..b73c483fa3376
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (i = 0; i < nvidia_smmu->num_ins

RE: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-25 Thread Krishna Reddy
>Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, perhaps? That 
>way we avoid this somewhat odd check here.

NVIDIA haven't made any changes to arm,mmu-500. It is only used in different 
topology.  New model would be mis-leading here.
As suggested by Robin, It can just be moved to end of function.

>> diff --git a/drivers/iommu/arm-smmu-nvidia.c 
>> b/drivers/iommu/arm-smmu-nvidia.c
>I wonder if it would be better to name this arm-smmu-tegra.c to make it 
>clearer that this is for a Tegra chip. We do have regular expressions in 
>MAINTAINERS that catch anything with "tegra" in it to make this easier.
>Also, the nsmmu_ prefix looks somewhat odd here. You already use struct 
>nvidia_smmu as the name of the structure, so why not be consistent and 
>continue to use nvidia_smmu_ as the prefix for function names?
>Or perhaps even use tegra_smmu_ as the prefix to match the filename change I 
>suggested earlier.

Prefix can be updated to nvidia_smmu as we seem to be okay for now to keep file 
name as arm-smmu-nvidia.c after the vendor name.  

>> +#define TLB_LOOP_TIMEOUT100 /* 1s! */
>USEC_PER_SEC?

It is not meant for a conversion. Reused Timeout variable from arm-smmu.c for 
tlb_sync implementation.  Can rename it to TLB_LOOP_TIMEOUT_IN_US.


>> +}
>> +dev_err_ratelimited(smmu->dev,
>> +"TLB sync timed out -- SMMU may be deadlocked\n");
>Same here.
>Also, is there anything we can do when this happens?

This is never expected to happen on Silicon. This code and message is reused 
from arm-smmu.c.


>> +#define nsmmu_page(smmu, inst, page) \
>> +(((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
>> +((page) << smmu->pgshift))

>Can we simply define to_nvidia_smmu(smmu)->bases[0] = smmu->base in 
>nvidia_smmu_impl_init()? Then this would become just:
>   to_nvidia_smmu(smmu)->bases[inst] + ((page) << (smmu)->pgshift)
> +
>Maybe add this here to simplify the nsmmu_page() macro above:
>   nsmmu->bases[0] = smmu->base;

This preferred to avoid the check in nsmmu_page(). But, smmu->base is not yet 
populated when nvidia_smmu_impl_init() is called.  
Let me look at the alternative place to set it.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-25 Thread Krishna Reddy
>> +  - nvdia,tegra194-smmu-500
>The -500 suffix here seems a bit redundant since there's no other type of SMMU 
>in Tegra194, correct?

Yeah, there is only one type of SMMU supported in T194. It was added to be 
synonymous with mmu-500.  Can be removed.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 3/4] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-04 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 +++-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index dafc293a45217..5999b6a770992 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.tlb_sync = nsmmu_tlb_sync,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..d7

[PATCH v6 4/4] iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot

2020-06-04 Thread Krishna Reddy
>> drivers/iommu/arm-smmu-nvidia.c:151:33: sparse: sparse: cast removes
>> address space '' of expression

Reported-by: kbuild test robot 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 5999b6a770992..6348d8dc17fc2 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -248,7 +248,7 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct 
arm_smmu_device *smmu)
break;
nsmmu->bases[i] = devm_ioremap_resource(dev, res);
if (IS_ERR(nsmmu->bases[i]))
-   return (struct arm_smmu_device *)nsmmu->bases[i];
+   return ERR_CAST(nsmmu->bases[i]);
nsmmu->num_inst++;
}
 
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 0/4] Nvidia Arm SMMUv2 Implementation

2020-06-04 Thread Krishna Reddy
Changes in v6:
Restricted the patch set to driver specific patches.
Fixed the cast warning reported by kbuild test robot.
Rebased on git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 261 ++
 drivers/iommu/arm-smmu.c  |  11 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 285 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 431275afdc7155415254aef4bd3816a1b8a2ead0
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-04 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 161 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 50659d76976b7..118da0893c964 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16572,9 +16572,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 57cf4ba5e27cb..35542df00da72 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o 
amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..52c84c30f83e4 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -160,6 +160,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu-500"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..dafc293a45217
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int

[PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-04 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index e3ef1c69d1326..8f7ffd248f303 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -37,6 +37,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvdia,tegra194-smmu-500
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation

2020-05-22 Thread Krishna Reddy
>For the record: I don't think we should apply these because we don't have a 
>good way of testing them. We currently have three problems that prevent us 
>from enabling SMMU on Tegra194:

Out of three issues pointed here, I see that only issue 2) is a real blocker 
for enabling SMMU HW by default in upstream.

>That said, I have tested earlier versions of this patchset on top of my local 
>branch with fixes for the above and they do seem to work as expected.
>So I'll leave it up to the IOMMU maintainers whether they're willing to merge 
>the driver patches as is.
> But I want to clarify that I won't be applying the DTS patches until we've 
> solved all of the above issues and therefore it should be clear that these 
> won't be runtime tested until then.

SMMU driver patches as such are complete and can be used by nvidia with a local 
config change(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n) to disable_bypass and
Protects the driver patches against kernel changes. This config disable option 
is tested already by Nicolin Chen and me.

Robin/Will, Can you comment if smmu driver patches alone(1,2,3 out of 5 
patches) can be merged without DT enable patches? Is it reasonable to merge the 
driver patches alone?

>1) If we enable SMMU support, then the DMA API will automatically try
> to use SMMU domains for allocations. This means that translations
> will happen as soon as a device's IOMMU operations are initialized
> and that is typically a long time (in kernel time at least) before
> a driver is bound and has a chance of configuring the device.

> This causes problems for non-quiesced devices like display
> controllers that the bootloader might have set up to scan out a
> boot splash.

> What we're missing here is a way to:

> a) advertise reserved memory regions for boot splash framebuffers
> b) map reserved memory regions early during SMMU setup

> Patches have been floating on the public mailing lists for b) but
> a) requires changes to the bootloader (both proprietary ones and
> U-Boot for SoCs prior to Tegra194).

This happens if SMMU translations is enabled for display before reserved
 Memory regions issue is fixed. This issue is not a real blocker for SMMU 
enable.


>  2) Even if we don't enable SMMU for a given device (by not hooking up
> the iommus property), with a default kernel configuration we get a
> bunch of faults during boot because the ARM SMMU driver faults by
> default (rather than bypass) for masters which aren't hooked up to
> the SMMU.

> We could work around that by changing the default configuration or
> overriding it on the command-line, but that's not really an option
> because it decreases security and means that Tegra194 won't work
> out-of-the-box.

This is the real issue that blocks enabling SMMU.  The USF faults for devices
that don't have SMMU translations enabled should be fixed or WAR'ed before
SMMU can be enabled. We should look at keeping SID as 0x7F for the devices
that can't have SMMU enabled yet. SID 0x7f bypasses SMMU externally.

>  3) We don't properly describe the DMA hierarchy, which causes the DMA
> masks to be improperly set. As a bit of background: Tegra194 has a
> special address bit (bit 39) that causes some swizzling to happen
> within the memory controller. As a result, any I/O virtual address
> that has bit 39 set will cause this swizzling to happen on access.
> The DMA/IOMMU allocator always starts allocating from the top of
> the IOVA space, which means that the first couple of gigabytes of
> allocations will cause most devices to fail because of the
> undesired swizzling that occurs.

> We had an initial patch for SDHCI merged that hard-codes the DMA
> mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However,
> the devices all do support addressing 40 bits and the restriction
> on bit 39 is really a property of the bus rather than a capability
> of the device. This means that we would have to work around this
> for every device driver by adding similar hacks. A better option is
> to properly describe the DMA hierarchy (using dma-ranges) because
> that will then automatically be applied as a constraint on each
> device's DMA mask.

> I have been working on patches to address this, but they are fairly
> involved because they require device tree bindings changes and so
> on.

Dma_mask issue is again outside SMMU driver and as long as the clients with
Dma_mask issue don't have SMMU enabled, it would be fine.
SDHCI can have SMMU enabled in upstream as soon as issue 2 is taken care.

>So before we solve all of the above issues we can't really enable SMMU on 
>Tegra194 and hence won't be able to test it. As such we don't know if these 
>patches even work, nor can we validate that they continue to work.
>As such, I don't think there's any use in applying these patches upstream 
>since they 

[PATCH v5 1/5] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-05-21 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 161 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index ecc0749810b0..0d8c966ecf17 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16560,9 +16560,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 57cf4ba5e27c..35542df00da7 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o 
amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 74d97a886e93..dcdd513323aa 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu-500"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..dafc293a4521
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int

[PATCH v5 3/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-05-21 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 +++-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index dafc293a4521..5999b6a77099 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.tlb_sync = nsmmu_tlb_sync,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e622f4e33379..9

[PATCH v5 5/5] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2020-05-21 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index f7c4399afb55..706bbb439dcd 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -59,6 +59,7 @@ ethernet@249 {
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA194_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -457,6 +458,7 @@ sdmmc1: sdhci@340 {
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -479,6 +481,7 @@ sdmmc3: sdhci@344 {
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -506,6 +509,7 @@ sdmmc4: sdhci@346 {
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation

2020-05-21 Thread Krishna Reddy
Changes in v5:
Rebased on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
next

v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (5):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi  |  81 ++
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 261 ++
 drivers/iommu/arm-smmu.c  |  11 +-
 drivers/iommu/arm-smmu.h  |   4 +
 8 files changed, 366 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 365f8d504da50feaebf826d180113529c9383670
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 4/5] arm64: tegra: Add DT node for T194 SMMU

2020-05-21 Thread Krishna Reddy
Add DT node for T194 SMMU to enable SMMU support.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 
 1 file changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index f4ede86e32b4..f7c4399afb55 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1620,6 +1620,83 @@ pcie@141a {
  0x8200 0x0  0x4000 0x1f 0x4000 0x0 
0xc000>; /* non-prefetchable memory (3GB) */
};
 
+   smmu: iommu@1200 {
+   compatible = "arm,mmu-500","nvidia,tegra194-smmu-500";
+   reg = <0 0x1200 0 0x80>,
+ <0 0x1100 0 0x80>,
+ <0 0x1000 0 0x80>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   stream-match-mask = <0x7f80>;
+   #global-interrupts = <3>;
+   #iommu-cells = <1>;
+   };
+
pcie_ep@1416 {
compatible = "nvidia,tegra194-pcie-ep", "snps,dw-pcie-ep";
power-domains = < TEGRA194_POWER_DOMAIN_PCIEX4A>;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 2/5] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-05-21 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 6515dbe47508..78aba7dd5a61 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -37,6 +37,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvdia,tegra194-smmu-500
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 3/6] iommu/arm-smmu: Add global/context fault implementation hooks

2019-10-30 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 -
 drivers/iommu/arm-smmu.h|   3 ++
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 05d66ac..0ba7abc 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.tlb_sync = nsmmu_tlb_sync,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 080af03..829fb4c 100644
--- a

[PATCH v4 5/6] arm64: tegra: Add DT node for T194 SMMU

2019-10-30 Thread Krishna Reddy
Add DT node for T194 SMMU to enable SMMU support.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 
 1 file changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 1e0b54b..6f81e90 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1436,6 +1436,83 @@
  0x8200 0x0  0x4000 0x1f 0x4000 0x0 
0xc000>; /* non-prefetchable memory (3GB) */
};
 
+   smmu: iommu@1200 {
+   compatible = "arm,mmu-500","nvidia,tegra194-smmu";
+   reg = <0 0x1200 0 0x80>,
+ <0 0x1100 0 0x80>,
+ <0 0x1000 0 0x80>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   stream-match-mask = <0x7f80>;
+   #global-interrupts = <3>;
+   #iommu-cells = <1>;
+   };
+
sysram@4000 {
compatible = "nvidia,tegra194-sysram", "mmio-sram";
reg = <0x0 0x4000 0x0 0x5>;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 6/6] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2019-10-30 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6f81e90..bf8ed7a 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,6 +52,7 @@
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA186_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -413,6 +415,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -435,6 +438,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -462,6 +466,7 @@
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 2/6] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2019-10-30 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..1d72fac 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -31,6 +31,10 @@ conditions.
   as below, SoC-specific compatibles:
   "qcom,sdm845-smmu-500", "arm,mmu-500"
 
+  NVIDIA SoCs that use more than one ARM MMU-500 together
+  needs following SoC-specific compatibles along with 
"arm,mmu-500":
+  "nvidia,tegra194-smmu"
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 0/6] Nvidia Arm SMMUv2 Implementation

2019-10-30 Thread Krishna Reddy
Changes in v4:
Rebased on top of 
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/ 
for-joerg/arm-smmu/updates.
Updated arm-smmu-nvidia.c to use the tlb_sync implementation hook.
Dropped patch that updates arm_smmu_flush_ops for override as it is no longer 
necessary.

v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (6):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.txt |   4 +
 MAINTAINERS|   2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi |   4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |  88 +++
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/arm-smmu-impl.c  |   3 +
 drivers/iommu/arm-smmu-nvidia.c| 261 +
 drivers/iommu/arm-smmu.c   |  11 +-
 drivers/iommu/arm-smmu.h   |   4 +
 9 files changed, 376 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 1/6] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2019-10-30 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 161 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 296de2b..0519024 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15972,9 +15972,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 4f405f9..7baeade 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 5c87a38..1a19687 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000..05d66ac
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+

[PATCH v4 4/6] arm64: tegra: Add Memory controller DT node on T194

2019-10-30 Thread Krishna Reddy
Add Memory controller DT node on T194 and enable it.
This patch is a prerequisite for SMMU enable on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 4c38426..82a02490 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -47,6 +47,10 @@
};
};
 
+   memory-controller@2c0 {
+   status = "okay";
+   };
+
serial@311 {
status = "okay";
};
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 3c0cf54..1e0b54b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -163,6 +163,12 @@
};
};
 
+   memory-controller@2c0 {
+   compatible = "nvidia,tegra186-mc";
+   reg = <0x02c0 0xb>;
+   status = "disabled";
+   };
+
uarta: serial@310 {
compatible = "nvidia,tegra194-uart", 
"nvidia,tegra20-uart";
reg = <0x0310 0x40>;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation

2019-10-23 Thread Krishna Reddy
>>https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

Thanks Will! Let me rebase my patches on top of this branch and send it out.

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation

2019-10-22 Thread Krishna Reddy
Hi Robin,
>>Apologies for crossed wires, but I had a series getting rid of 
>>arm_smmu_flush_ops which was also meant to end up making things a bit easier 
>>for you:

I was looking to rebase on top of your changes first.  Then I read Will's reply 
that said your work is queued for 5.5. 
Let me know if these patches need to rebased on top of iommu/devel or a 
different branch. I can resend the patch set on top of necessary branch.

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/7] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2019-10-18 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..1d72fac 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -31,6 +31,10 @@ conditions.
   as below, SoC-specific compatibles:
   "qcom,sdm845-smmu-500", "arm,mmu-500"
 
+  NVIDIA SoCs that use more than one ARM MMU-500 together
+  needs following SoC-specific compatibles along with 
"arm,mmu-500":
+  "nvidia,tegra194-smmu"
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation

2019-10-18 Thread Krishna Reddy
Changes in v3:
Rebased on top of 
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/ next.
Resolved compile error seen with tegra194.dtsi changes after rebase.

v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (7):
  iommu/arm-smmu: prepare arm_smmu_flush_ops for override
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.txt |   4 +
 MAINTAINERS|   2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi |   4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |  88 +++
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/arm-smmu-impl.c  |   3 +
 drivers/iommu/arm-smmu-nvidia.c| 287 +
 drivers/iommu/arm-smmu.c   |  27 +-
 drivers/iommu/arm-smmu.h   |   8 +-
 9 files changed, 413 insertions(+), 12 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

-- 
2.7.4



[PATCH v3 2/7] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2019-10-18 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 187 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a69e6db..b61dbda 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15974,9 +15974,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 35d1709..4ef8b74 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 5c87a38..1a19687 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000..ca871dc
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static void nsmmu_tlb_sync(struct

[PATCH v3 5/7] arm64: tegra: Add Memory controller DT node on T194

2019-10-18 Thread Krishna Reddy
Add Memory controller DT node on T194 and enable it.
This patch is a prerequisite for SMMU enable on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 4c38426..82a02490 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -47,6 +47,10 @@
};
};
 
+   memory-controller@2c0 {
+   status = "okay";
+   };
+
serial@311 {
status = "okay";
};
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 3c0cf54..1e0b54b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -163,6 +163,12 @@
};
};
 
+   memory-controller@2c0 {
+   compatible = "nvidia,tegra186-mc";
+   reg = <0x02c0 0xb>;
+   status = "disabled";
+   };
+
uarta: serial@310 {
compatible = "nvidia,tegra194-uart", 
"nvidia,tegra20-uart";
reg = <0x0310 0x40>;
-- 
2.7.4



[PATCH v3 4/7] iommu/arm-smmu: Add global/context fault implementation hooks

2019-10-18 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 -
 drivers/iommu/arm-smmu.h|   3 ++
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index ca871dc..2a19d41 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -143,6 +143,104 @@ static int nsmmu_init_context(struct arm_smmu_domain 
*smmu_domain)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -150,6 +248,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.init_context = nsmmu_init_context,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fc0b2

[PATCH v3 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2019-10-18 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6f81e90..bf8ed7a 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,6 +52,7 @@
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA186_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -413,6 +415,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -435,6 +438,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -462,6 +466,7 @@
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.7.4



[PATCH v3 1/7] iommu/arm-smmu: prepare arm_smmu_flush_ops for override

2019-10-18 Thread Krishna Reddy
Remove const keyword for arm_smmu_flush_ops in arm_smmu_domain
and replace direct references to arm_smmu_tlb_sync* functions with
arm_smmu_flush_ops->tlb_sync().
This is necessary for vendor specific implementations that
need to override arm_smmu_flush_ops in part or full.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 16 
 drivers/iommu/arm-smmu.h |  4 +++-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 91af695..fc0b27d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
@@ -290,6 +287,8 @@ static void arm_smmu_tlb_sync_vmid(void *cookie)
 static void arm_smmu_tlb_inv_context_s1(void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
+
/*
 * The TLBI write may be relaxed, so ensure that PTEs cleared by the
 * current CPU are visible beforehand.
@@ -297,18 +296,19 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
wmb();
arm_smmu_cb_write(smmu_domain->smmu, smmu_domain->cfg.cbndx,
  ARM_SMMU_CB_S1_TLBIASID, smmu_domain->cfg.asid);
-   arm_smmu_tlb_sync_context(cookie);
+   ops->tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_inv_context_s2(void *cookie)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
+   const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops;
 
/* See above */
wmb();
arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid);
-   arm_smmu_tlb_sync_global(smmu);
+   ops->tlb_sync(cookie);
 }
 
 static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
@@ -410,7 +410,7 @@ static void arm_smmu_tlb_add_page(struct iommu_iotlb_gather 
*gather,
ops->tlb_inv_range(iova, granule, granule, true, cookie);
 }
 
-static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
+static struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s1,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
@@ -421,7 +421,7 @@ static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops 
= {
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
-static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
+static struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
@@ -432,7 +432,7 @@ static const struct arm_smmu_flush_ops 
arm_smmu_s2_tlb_ops_v2 = {
.tlb_sync   = arm_smmu_tlb_sync_context,
 };
 
-static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
+static struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v1 = {
.tlb = {
.tlb_flush_all  = arm_smmu_tlb_inv_context_s2,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index b19b6ca..b2d6c7f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -207,6 +207,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
@@ -314,7 +316,7 @@ struct arm_smmu_flush_ops {
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct io_pgtable_ops   *pgtbl_ops;
-   const struct arm_smmu_flush_ops *flush_ops;
+   struct arm_smmu_flush_ops   *flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
boolnon_strict;
-- 
2.7.4



[PATCH v3 6/7] arm64: tegra: Add DT node for T194 SMMU

2019-10-18 Thread Krishna Reddy
Add DT node for T194 SMMU to enable SMMU support.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 
 1 file changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 1e0b54b..6f81e90 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1436,6 +1436,83 @@
  0x8200 0x0  0x4000 0x1f 0x4000 0x0 
0xc000>; /* non-prefetchable memory (3GB) */
};
 
+   smmu: iommu@1200 {
+   compatible = "arm,mmu-500","nvidia,tegra194-smmu";
+   reg = <0 0x1200 0 0x80>,
+ <0 0x1100 0 0x80>,
+ <0 0x1000 0 0x80>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   stream-match-mask = <0x7f80>;
+   #global-interrupts = <3>;
+   #iommu-cells = <1>;
+   };
+
sysram@4000 {
compatible = "nvidia,tegra194-sysram", "mmio-sram";
reg = <0x0 0x4000 0x0 0x5>;
-- 
2.7.4



[PATCH v2 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2019-09-02 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 5ae3bbf..cac3462 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -51,6 +51,7 @@
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA186_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -381,6 +382,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -403,6 +405,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -430,6 +433,7 @@
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.1.4



[PATCH v2 3/7] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2019-09-02 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..1d72fac 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -31,6 +31,10 @@ conditions.
   as below, SoC-specific compatibles:
   "qcom,sdm845-smmu-500", "arm,mmu-500"
 
+  NVIDIA SoCs that use more than one ARM MMU-500 together
+  needs following SoC-specific compatibles along with 
"arm,mmu-500":
+  "nvidia,tegra194-smmu"
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
-- 
2.1.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/7] Nvidia Arm SMMUv2 Implementation

2019-09-02 Thread Krishna Reddy
Changes in v2:
- Prepare arm_smu_flush_ops for override.
- Remove NVIDIA_SMMUv2 and use ARM_SMMUv2 model as T194 SMMU hasn't modified 
ARM MMU-500.
- Add T194 specific compatible string - "nvidia,tegra194-smmu"
- Remove tlb_sync hook added in v1 and Override arm_smmu_flush_ops->tlb_sync() 
from implementation.
- Register implementation specific context/global fault hooks directly for irq 
handling.
- Update global/context interrupt list in DT and releant fault handling code in 
arm-smmu-nvidia.c.
- Implement reset hook in arm-smmu-nvidia.c to clear irq status and sync tlb.

v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (7):
  iommu/arm-smmu: prepare arm_smmu_flush_ops for override
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.txt |   4 +
 MAINTAINERS|   2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi |   4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |  88 +++
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/arm-smmu-impl.c  |   3 +
 drivers/iommu/arm-smmu-nvidia.c| 287 +
 drivers/iommu/arm-smmu.c   |  27 +-
 drivers/iommu/arm-smmu.h   |   8 +-
 9 files changed, 413 insertions(+), 12 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

-- 
2.1.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/7] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2019-09-02 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 187 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 74e9d9c..c9b802a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15807,9 +15807,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 7caad48..556b94c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 5c87a38..1a19687 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = _mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000..ca871dc
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static void nsmmu_tlb_sync(struct

  1   2   >