Re: [Freedreno] [PATCH] iommu/arm-smmu: Add a init_context_bank implementation hook

2020-07-13 Thread Jordan Crouse
On Mon, Jul 13, 2020 at 08:03:32PM +0100, Will Deacon wrote:
> On Mon, Jul 13, 2020 at 11:00:32AM -0600, Jordan Crouse wrote:
> > On Mon, Jul 13, 2020 at 04:11:23PM +0100, Will Deacon wrote:
> > > On Thu, Jun 11, 2020 at 04:36:56PM -0600, Jordan Crouse wrote:
> > > > Add a new implementation hook to allow the implementation specific code
> > > > to tweak the context bank configuration just before it gets written.
> > > > The first user will be the Adreno GPU implementation to turn on
> > > > SCTLR.HUPCF to ensure that a page fault doesn't terminate pending
> > > > transactions. Doing so could hang the GPU if one of the terminated
> > > > transactions is a CP read.
> > > > 
> > > > This depends on the arm-smmu adreno SMMU implementation [1].
> > > > 
> > > > [1] https://patchwork.kernel.org/patch/11600943/
> > > > 
> > > > Signed-off-by: Jordan Crouse 
> > > > ---
> > > > 
> > > >  drivers/iommu/arm-smmu-qcom.c | 13 +
> > > >  drivers/iommu/arm-smmu.c  | 28 +---
> > > >  drivers/iommu/arm-smmu.h  | 11 +++
> > > >  3 files changed, 37 insertions(+), 15 deletions(-)
> > > 
> > > This looks straightforward enough, but I don't want to merge this without
> > > a user and Sai's series has open questions afaict.
> > 
> > Not sure what you mean by a user in this context?
> > Are you referring to https://patchwork.kernel.org/patch/11628541/?
> 
> Right, this post was just a single patch in isolation, whereas it was
> reposted over at:
> 
> https://lore.kernel.org/r/cdcc6a1c95a84e774790389dc8b3b7f490dc.1593344119.git.saiprakash.ran...@codeaurora.org
> 
> so I'll ignore this one. Sorry, I'm just really struggling to keep track
> of what is targetting 5.9, and I don't have tonnes of time to sift through
> the backlog of duplicate postings :(

Yeah, that is our fault. There are too many cooks in the kitchen.

We need to pick either system cache or split pagetable and serialize
the other on top of it to get the impl code going and then build from there. 
This particular patch can happily hang out in the background until the rest is
resolved.
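
For reference, the hook pattern described in the quoted patch can be sketched
in plain C roughly as below. This is a standalone model, not the arm-smmu code:
every sketch_* name is made up for illustration, and the HUPCF bit position
(bit 8 of SCTLR) is an assumption that should be checked against arm-smmu.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position of SCTLR.HUPCF; verify against arm-smmu.h. */
#define SKETCH_SCTLR_HUPCF	(UINT32_C(1) << 8)

/* Hypothetical stand-in for the cached context bank state. */
struct sketch_cb {
	uint32_t sctlr;
};

/* Hypothetical implementation ops; a NULL hook means "nothing to tweak". */
struct sketch_impl {
	void (*init_context_bank)(struct sketch_cb *cb);
};

/* What an Adreno-style implementation might do in its hook. */
void sketch_adreno_init_context_bank(struct sketch_cb *cb)
{
	cb->sctlr |= SKETCH_SCTLR_HUPCF;
}

/*
 * Core path: let the implementation adjust the bank just before it is
 * written, then return the value that would go to the hardware.
 */
uint32_t sketch_write_context_bank(const struct sketch_impl *impl,
				   struct sketch_cb *cb)
{
	assert(cb != NULL);

	if (impl && impl->init_context_bank)
		impl->init_context_bank(cb);

	return cb->sctlr;
}
```

An implementation without the hook leaves the register value untouched, so
the generic path does not change for other SMMUs.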

Jordan

> Will
> ___
> Freedreno mailing list
> freedr...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH v2 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2

2020-07-13 Thread Jordan Crouse
On Tue, Jul 07, 2020 at 08:09:41AM -0700, Rob Clark wrote:
> On Tue, Jul 7, 2020 at 5:34 AM Robin Murphy  wrote:
> >
> > On 2020-06-26 21:04, Jordan Crouse wrote:
> > > Support auxiliary domains for arm-smmu-v2 to initialize and support
> > > multiple pagetables for a single SMMU context bank. Since the smmu-v2
> > > hardware doesn't have any built in support for switching the pagetable
> > > base it is left as an exercise to the caller to actually use the 
> > > pagetable.
> >
> > Hmm, I've still been thinking that we could model this as supporting
> > exactly 1 aux domain iff the device is currently attached to a primary
> > domain with TTBR1 enabled. Then supporting multiple aux domains with
> > magic TTBR0 switching is the Adreno-specific extension on top of that.
> >
> > And if we don't want to go to that length, then - as I think Will was
> > getting at - I'm not sure it's worth bothering at all. There doesn't
> > seem to be any point in half-implementing a pretend aux domain interface
> > while still driving a bus through the rest of the abstractions - it's
> > really the worst of both worlds. If we're going to hand over the guts of
> > io-pgtable to the GPU driver then couldn't it just use
> > DOMAIN_ATTR_PGTABLE_CFG bidirectionally to inject a TTBR0 table straight
> > into the TTBR1-ified domain?
> 
> So, something along the lines of:
> 
> 1) qcom_adreno_smmu_impl somehow tells core arm-smmu that we want
>to use TTBR1 instead of TTBR0
> 
> 2) gpu driver uses iommu_domain_get_attr(PGTABLE_CFG) to snapshot
>the initial pgtable cfg.  (Btw, I kinda feel like we should add
>io_pgtable_fmt to io_pgtable_cfg to make it self contained.)
> 
> 3) gpu driver constructs pgtable_ops for TTBR0, and then kicks
>arm-smmu to do the initial setup to enable TTBR0 with
>iommu_domain_set_attr(PGTABLE_CFG, &ttbr0_pgtable_cfg)


There being no objections, I'm going to start going in this direction.
I think we should have a quirk on the arm-smmu device to allow the PGTABLE_CFG
to be set; otherwise there is a chance for abuse. Ideally we would filter this
behavior on a stream ID basis if we come up with a scheme to do that cleanly
based on Will's comments in [1].

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-July/046488.html

Jordan

> if I understood you properly, that sounds simpler.
> 
> > Much as I like the idea of the aux domain abstraction and making this
> > fit semi-transparently into the IOMMU API, if an almost entirely private
> > interface will be the simplest and cleanest way to get it done then at
> > this point also I'm starting to lean towards just getting it done. But
> > if some other mediated-device type case then turns up that doesn't quite
> > fit that private interface, we revisit the proper abstraction again and
> > I reserve the right to say "I told you so" ;)
> 
> I'm on board with not trying to design this too generically until
> there is a second user
> 
> BR,
> -R
> 
> 
> >
> > Robin.
> >
> > > Aux domains are supported if split pagetable (TTBR1) support has been
> > > enabled on the master domain.  Each auxiliary domain will reuse the
> > > > configuration of the master domain. By default a domain with TTBR1
> > > > support will have the TTBR0 region disabled so the first attached aux
> > > > domain will enable the TTBR0 region in the hardware and conversely the
> > > > last domain to be detached will disable TTBR0 translations.  All subsequent
> > > > auxiliary domains create a pagetable but do not touch the hardware.
> > >
> > > The leaf driver will be able to query the physical address of the
> > > pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
> > > address with whatever means it has to switch the pagetable base.
> > >
> > > Following is a pseudo code example of how a domain can be created
> > >
> > > >   /* Check to see if aux domains are supported */
> > > >   if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
> > > >           domain = iommu_domain_alloc(...);
> > > >
> > > >           if (iommu_aux_attach_device(domain, dev))
> > > >                   return FAIL;
> > > >
> > > >           /* Save the base address of the pagetable for use by the driver */
> > > >           iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
> > > >   }
> > >
> > > Then 'domain' can be used like any other iommu domain to map and
> > > unmap iova addresses in the page

Re: [Freedreno] [PATCH v9 4/7] iommu/arm-smmu: Add a pointer to the attached device to smmu_domain

2020-07-13 Thread Jordan Crouse
On Mon, Jul 13, 2020 at 04:09:02PM +0100, Will Deacon wrote:
> On Fri, Jun 26, 2020 at 02:00:38PM -0600, Jordan Crouse wrote:
> > Add a link to the pointer to the struct device that is attached to a
> > domain. This makes it easy to get the pointer if it is needed in the
> > implementation specific code.
> > 
> > Signed-off-by: Jordan Crouse 
> > ---
> > 
> >  drivers/iommu/arm-smmu.c | 6 --
> >  drivers/iommu/arm-smmu.h | 1 +
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 048de2681670..060139452c54 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -668,7 +668,8 @@ static void arm_smmu_write_context_bank(struct 
> > arm_smmu_device *smmu, int idx)
> >  }
> >  
> >  static int arm_smmu_init_domain_context(struct iommu_domain *domain,
> > -   struct arm_smmu_device *smmu)
> > +   struct arm_smmu_device *smmu,
> > +   struct device *dev)
> >  {
> > int irq, start, ret = 0;
> > unsigned long ias, oas;
> > @@ -801,6 +802,7 @@ static int arm_smmu_init_domain_context(struct 
> > iommu_domain *domain,
> > cfg->asid = cfg->cbndx;
> >  
> > smmu_domain->smmu = smmu;
> > +   smmu_domain->dev = dev;
> >  
> > pgtbl_cfg = (struct io_pgtable_cfg) {
> > .pgsize_bitmap  = smmu->pgsize_bitmap,
> > @@ -1190,7 +1192,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
> > *domain, struct device *dev)
> > return ret;
> >  
> > /* Ensure that the domain is finalised */
> > -   ret = arm_smmu_init_domain_context(domain, smmu);
> > +   ret = arm_smmu_init_domain_context(domain, smmu, dev);
> > if (ret < 0)
> > goto rpm_put;
> >  
> > diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> > index 5f2de20e883b..d33cfe26b2f5 100644
> > --- a/drivers/iommu/arm-smmu.h
> > +++ b/drivers/iommu/arm-smmu.h
> > @@ -345,6 +345,7 @@ struct arm_smmu_domain {
> > > struct mutex init_mutex; /* Protects smmu pointer */
> > spinlock_t  cb_lock; /* Serialises ATS1* ops and 
> > TLB syncs */
> > struct iommu_domain domain;
> > +   struct device   *dev;   /* Device attached to this 
> > domain */
> 
> This really doesn't feel right to me -- you can generally have multiple
> devices attached to a domain and they can come and go without the domain
> being destroyed. Perhaps you could instead identify the GPU during
> cfg_probe() and squirrel that information away somewhere?

I need some help here. The SMMU device (qcom,adreno-smmu) will have at least two
stream ids from two different platform devices (GPU and GMU) and I need to
configure split-pagetable and stall/terminate differently on the two domains.

I couldn't figure out a way to identify the platform device before it attached
itself with iommu_attach_device. I tried poking around in fwspec but got lost.

If there is a way we can uniquely identify the devices (by stream id maybe) then
we could use that though I have reservations about hard coding stream IDs in the
impl driver. That said, the stream IDs have never changed in the life of the
GPU so maybe it's not a problem that needs solving.

Jordan

> The rest of the series looks ok to me.
> 
> Will


Re: [PATCH] iommu/arm-smmu: Add a init_context_bank implementation hook

2020-07-13 Thread Jordan Crouse
On Mon, Jul 13, 2020 at 04:11:23PM +0100, Will Deacon wrote:
> On Thu, Jun 11, 2020 at 04:36:56PM -0600, Jordan Crouse wrote:
> > Add a new implementation hook to allow the implementation specific code
> > to tweak the context bank configuration just before it gets written.
> > The first user will be the Adreno GPU implementation to turn on
> > SCTLR.HUPCF to ensure that a page fault doesn't terminate pending
> > transactions. Doing so could hang the GPU if one of the terminated
> > transactions is a CP read.
> > 
> > This depends on the arm-smmu adreno SMMU implementation [1].
> > 
> > [1] https://patchwork.kernel.org/patch/11600943/
> > 
> > Signed-off-by: Jordan Crouse 
> > ---
> > 
> >  drivers/iommu/arm-smmu-qcom.c | 13 +
> >  drivers/iommu/arm-smmu.c  | 28 +---
> >  drivers/iommu/arm-smmu.h  | 11 +++
> >  3 files changed, 37 insertions(+), 15 deletions(-)
> 
> This looks straightforward enough, but I don't want to merge this without
> a user and Sai's series has open questions afaict.

Not sure what you mean by a user in this context?
Are you referring to https://patchwork.kernel.org/patch/11628541/?

> Will



Re: [PATCHv3 7/7] drm/msm/a6xx: Add support for using system cache(LLC)

2020-07-09 Thread Jordan Crouse
On Fri, Jul 03, 2020 at 09:04:49AM -0700, Rob Clark wrote:
> On Fri, Jul 3, 2020 at 7:53 AM Sai Prakash Ranjan
>  wrote:
> >
> > Hi Will,
> >
> > On 2020-07-03 19:07, Will Deacon wrote:
> > > On Mon, Jun 29, 2020 at 09:22:50PM +0530, Sai Prakash Ranjan wrote:
> > >> diff --git a/drivers/gpu/drm/msm/msm_iommu.c
> > >> b/drivers/gpu/drm/msm/msm_iommu.c
> > >> index f455c597f76d..bd1d58229cc2 100644
> > >> --- a/drivers/gpu/drm/msm/msm_iommu.c
> > >> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > >> @@ -218,6 +218,9 @@ static int msm_iommu_map(struct msm_mmu *mmu,
> > >> uint64_t iova,
> > >>  iova |= GENMASK_ULL(63, 49);
> > >>
> > >>
> > >> +if (mmu->features & MMU_FEATURE_USE_SYSTEM_CACHE)
> > >> +prot |= IOMMU_SYS_CACHE_ONLY;
> > >
> > > Given that I think this is the only user of IOMMU_SYS_CACHE_ONLY, then
> > > it
> > > looks like it should actually be a property on the domain because we
> > > never
> > > need to configure it on a per-mapping basis within a domain, and
> > > therefore
> > > it shouldn't be exposed by the IOMMU API as a prot flag.
> > >
> > > Do you agree?
> > >
> >
> > GPU being the only user is for now, but there are other clients which
> > can use this.
> > Plus how do we set the memory attributes if we do not expose this as
> > prot flag?
> 
> It does appear that the downstream kgsl driver sets this for basically
> all mappings.. well there is some conditional stuff around
> DOMAIN_ATTR_USE_LLC_NWA but it seems based on the property of the
> domain.  (Jordan may know more about what that is about.)  But looks
> like there are a lot of different paths into iommu_map in kgsl so I
> might have missed something.

Downstream does set it universally. There are some theoretical use cases
where it might be beneficial to set it on a per-mapping basis with a bunch
of hinting from userspace, but nobody has tried to characterize this on real
hardware so it is not clear to me if it is worth it.

I think a domain-wide attribute works for now, but if a compelling per-mapping
use case does come down the pipeline we need to have a backup in mind -
possibly a prot flag to disable NWA?

Jordan

> Assuming there isn't some case where we specifically don't want to use
> the system cache for some mapping, I think it could be a domain
> attribute that sets an io_pgtable_cfg::quirks flag
> 
> BR,
> -R



Re: [PATCH v2 4/6] drm/msm: Add support to create a local pagetable

2020-07-08 Thread Jordan Crouse
On Tue, Jul 07, 2020 at 12:36:42PM +0100, Robin Murphy wrote:
> On 2020-06-26 21:04, Jordan Crouse wrote:
> >Add support to create a io-pgtable for use by targets that support
> >per-instance pagetables.  In order to support per-instance pagetables the
> >GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
> >split pagetables and auxiliary domains need to be supported and enabled.
> >
> >Signed-off-by: Jordan Crouse 
> >---
> >
> >  drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
> >  drivers/gpu/drm/msm/msm_iommu.c  | 180 ++-
> >  drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
> >  3 files changed, 195 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/msm/msm_gpummu.c 
> >b/drivers/gpu/drm/msm/msm_gpummu.c
> >index 310a31b05faa..aab121f4beb7 100644
> >--- a/drivers/gpu/drm/msm/msm_gpummu.c
> >+++ b/drivers/gpu/drm/msm/msm_gpummu.c
> >@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, 
> >struct msm_gpu *gpu)
> > }
> > gpummu->gpu = gpu;
> >-msm_mmu_init(&gpummu->base, dev, &funcs);
> >+msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU);
> > return &gpummu->base;
> >  }
> >diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
> >b/drivers/gpu/drm/msm/msm_iommu.c
> >index 1b6635504069..f455c597f76d 100644
> >--- a/drivers/gpu/drm/msm/msm_iommu.c
> >+++ b/drivers/gpu/drm/msm/msm_iommu.c
> >@@ -4,15 +4,192 @@
> >   * Author: Rob Clark 
> >   */
> >+#include 
> >  #include "msm_drv.h"
> >  #include "msm_mmu.h"
> >  struct msm_iommu {
> > struct msm_mmu base;
> > struct iommu_domain *domain;
> >+struct iommu_domain *aux_domain;
> >  };
> >+
> >  #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
> >+struct msm_iommu_pagetable {
> >+struct msm_mmu base;
> >+struct msm_mmu *parent;
> >+struct io_pgtable_ops *pgtbl_ops;
> >+phys_addr_t ttbr;
> >+u32 asid;
> >+};
> >+
> >+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
> >+{
> >+return container_of(mmu, struct msm_iommu_pagetable, base);
> >+}
> >+
> >+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
> >+size_t size)
> >+{
> >+struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> >+struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
> >+size_t unmapped = 0;
> >+
> >+/* Unmap the block one page at a time */
> >+while (size) {
> >+unmapped += ops->unmap(ops, iova, 4096, NULL);
> >+iova += 4096;
> >+size -= 4096;
> >+}
> >+
> >+iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
> >+
> >+return (unmapped == size) ? 0 : -EINVAL;
> >+}
> 
> Remember in patch #1 when you said "Then 'domain' can be used like any other
> iommu domain to map and unmap iova addresses in the pagetable."?
> 
> This appears to be very much not that :/
 
The code changed but the commit log stayed the same.  I'll reword.

Jordan

> Robin.
> 
> >+
> >+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
> >+struct sg_table *sgt, size_t len, int prot)
> >+{
> >+struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> >+struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
> >+struct scatterlist *sg;
> >+size_t mapped = 0;
> >+u64 addr = iova;
> >+unsigned int i;
> >+
> >+for_each_sg(sgt->sgl, sg, sgt->nents, i) {
> >+size_t size = sg->length;
> >+phys_addr_t phys = sg_phys(sg);
> >+
> >+/* Map the block one page at a time */
> >+while (size) {
> >+if (ops->map(ops, addr, phys, 4096, prot)) {
> >+msm_iommu_pagetable_unmap(mmu, iova, mapped);
> >+return -EINVAL;
> >+}
> >+
> >+phys += 4096;
> >+addr += 4096;
> >+size -= 4096;
> >+mapped += 4096;
> >+}
> >+}
> >+
> >+return 0;
> >+}
> >+
> >+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
> >+{
> >+struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> >+
> >+ 

Re: [Freedreno] [PATCH v2 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations

2020-07-08 Thread Jordan Crouse
On Tue, Jul 07, 2020 at 07:58:18AM -0700, Rob Clark wrote:
> On Tue, Jul 7, 2020 at 7:25 AM Rob Clark  wrote:
> >
> > On Tue, Jul 7, 2020 at 4:34 AM Robin Murphy  wrote:
> > >
> > > On 2020-06-26 21:04, Jordan Crouse wrote:
> > > > Allow a io-pgtable implementation to skip TLB operations by checking for
> > > > NULL pointers in the helper functions. It will be up to the owner
> > > > of the io-pgtable instance to make sure that they independently handle
> > > > the TLB correctly.
> > >
> > > I don't really understand what this is for - tricking the IOMMU driver
> > > into not performing its TLB maintenance at points when that maintenance
> > > has been deemed necessary doesn't seem like the appropriate way to
> > > achieve anything good :/
> >
> > No, for triggering the io-pgtable helpers into not performing TLB
> > maintenance.  But seriously, since we are creating pgtables ourselves,
> > and we don't want to be ioremap'ing the GPU's SMMU instance, the
> > alternative is plugging in no-op helpers.  Which amounts to the same
> > thing.
> 
> Hmm, that said, since we are just memcpy'ing the io_pgtable_cfg from
> arm-smmu, it will already be populated with arm-smmu's fxn ptrs.  I
> guess we could maybe make it work without no-op helpers, although in
> that case it looks like we need to fix something about aux-domain vs
> tlb helpers:

I had a change that handled these correctly but I abandoned it because the
TLB functions didn't kick the power and I didn't think that would be desirable
at the generic level for performance reasons. Since the GPU SMMU is on the same
power domain as the GMU we could enable it in the GPU driver before calling
the TLB operations but we would need to be clever about it to prevent bringing
up the GMU just to unmap memory.
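
To make the NULL-pointer idea from the quoted commit message concrete, here is
a small standalone sketch of a helper that skips the TLB operation when no
callback is installed. It only models the pattern: the sketch_* types are
invented for illustration and are not the io-pgtable structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the table of TLB callbacks. */
struct sketch_flush_ops {
	void (*tlb_flush_all)(void *cookie);
};

/* Hypothetical stand-in for a pagetable instance. */
struct sketch_pgtable {
	const struct sketch_flush_ops *tlb;	/* may be NULL */
	void *cookie;
};

/*
 * Flush helper that tolerates a missing ops table or a missing callback.
 * Returns 1 if the callback ran and 0 if the flush was skipped, in which
 * case the owner of the pagetable must handle the TLB on its own.
 */
int sketch_tlb_flush_all(struct sketch_pgtable *iop)
{
	assert(iop != NULL);

	if (!iop->tlb || !iop->tlb->tlb_flush_all)
		return 0;

	iop->tlb->tlb_flush_all(iop->cookie);
	return 1;
}

/* Example callback: counts how many flushes were requested. */
void sketch_count_flush(void *cookie)
{
	int *count = cookie;

	(*count)++;
}
```

The return value exists only so the sketch is easy to check; a real helper
would most likely stay void.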

Jordan

> [  +0.004373] Unable to handle kernel NULL pointer dereference at
> virtual address 0019
> [  +0.004086] Mem abort info:
> [  +0.004319]   ESR = 0x9604
> [  +0.003462]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  +0.003494]   SET = 0, FnV = 0
> [  +0.002812]   EA = 0, S1PTW = 0
> [  +0.002873] Data abort info:
> [  +0.003031]   ISV = 0, ISS = 0x0004
> [  +0.003785]   CM = 0, WnR = 0
> [  +0.003641] user pgtable: 4k pages, 48-bit VAs, pgdp=000261d65000
> [  +0.003383] [0019] pgd=, p4d=
> [  +0.003715] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [  +0.002744] Modules linked in: xt_CHECKSUM xt_MASQUERADE
> xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle
> ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack
> nf_defrag_ipv4 libcrc32c bridge stp llc ip6table_filter ip6_tables
> iptable_filter ax88179_178a usbnet uvcvideo videobuf2_vmalloc
> videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc
> hid_multitouch i2c_hid some_battery ti_sn65dsi86 hci_uart btqca btbcm
> qcom_spmi_adc5 bluetooth qcom_spmi_temp_alarm qcom_vadc_common
> ecdh_generic ecc snd_soc_sdm845 snd_soc_rt5663 snd_soc_qcom_common
> ath10k_snoc ath10k_core crct10dif_ce ath mac80211 snd_soc_rl6231
> soundwire_bus i2c_qcom_geni libarc4 qcom_rng msm phy_qcom_qusb2
> reset_qcom_pdc drm_kms_helper cfg80211 rfkill qcom_q6v5_mss
> qcom_q6v5_ipa_notify socinfo qrtr ns panel_simple qcom_q6v5_pas
> qcom_common qcom_glink_smem slim_qcom_ngd_ctrl qcom_sysmon drm
> qcom_q6v5 slimbus qmi_helpers qcom_wdt mdt_loader rmtfs_mem be2iscsi
> bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio
> [  +0.000139]  libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi fuse ip_tables x_tables
> ipv6 nf_defrag_ipv6
> [  +0.020933] CPU: 3 PID: 168 Comm: kworker/u16:7 Not tainted
> 5.8.0-rc1-c630+ #31
> [  +0.003828] Hardware name: LENOVO 81JL/LNVNB161216, BIOS
> 9UCN33WW(V2.06) 06/ 4/2019
> [  +0.004039] Workqueue: msm msm_gem_free_work [msm]
> [  +0.003885] pstate: 60c5 (nZCv daif +PAN +UAO BTYPE=--)
> [  +0.003859] pc : arm_smmu_tlb_inv_range_s1+0x30/0x148
> [  +0.003742] lr : arm_smmu_tlb_add_page_s1+0x1c/0x28
> [  +0.003887] sp : 800011cdb970
> [  +0.003868] x29: 800011cdb970 x28: 0003
> [  +0.003930] x27: 0001f1882f80 x26: 0001
> [  +0.003886] x25: 0003 x24: 0620
> [  +0.003932] x23:  x22: 1000
> [  +0.003886] x21: 1000 x20: 0001cf857300
> [  +0.003916] x19: 0001 x18: 
> [  +0.003921] x17: d9e6a24ae0e8 x16: 00012577
> [  +0.003843] x15: 00012578 x14: 
> [  +0.003884] x13: 00012574 x12: d9e6a2550180
> [  +0.

Re: [Freedreno] [PATCH v2 6/6] drm/msm/a6xx: Add support for per-instance pagetables

2020-06-29 Thread Jordan Crouse
On Sat, Jun 27, 2020 at 01:11:14PM -0700, Rob Clark wrote:
> On Sat, Jun 27, 2020 at 12:56 PM Rob Clark  wrote:
> >
> > On Fri, Jun 26, 2020 at 1:04 PM Jordan Crouse  
> > wrote:
> > >
> > > Add support for using per-instance pagetables if all the dependencies are
> > > available.
> > >
> > > Signed-off-by: Jordan Crouse 
> > > ---
> > >
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 43 +++
> > >  drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
> > >  2 files changed, 44 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index aa53f47b7e8b..95ed2ceac121 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -79,6 +79,34 @@ static void get_stats_counter(struct msm_ringbuffer 
> > > *ring, u32 counter,
> > > OUT_RING(ring, upper_32_bits(iova));
> > >  }
> > >
> > > +static void a6xx_set_pagetable(struct msm_gpu *gpu, struct 
> > > msm_ringbuffer *ring,
> > > +   struct msm_file_private *ctx)
> > > +{
> > > +   phys_addr_t ttbr;
> > > +   u32 asid;
> > > +
> > > +   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
> > > +   return;
> > > +
> > > +   /* Execute the table update */
> > > +   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
> > > +   OUT_RING(ring, lower_32_bits(ttbr));
> > > +   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
> > > +   /* CONTEXTIDR is currently unused */
> > > +   OUT_RING(ring, 0);
> > > +   /* CONTEXTBANK is currently unused */
> > > +   OUT_RING(ring, 0);
> > > +
> > > +   /*
> > > +* Write the new TTBR0 to the memstore. This is good for 
> > > debugging.
> > > +*/
> > > +   OUT_PKT7(ring, CP_MEM_WRITE, 4);
> > > +   OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
> > > +   OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
> > > +   OUT_RING(ring, lower_32_bits(ttbr));
> > > +   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
> > > +}
> > > +
> > >  static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit 
> > > *submit,
> > > struct msm_file_private *ctx)
> > >  {
> > > @@ -89,6 +117,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
> > > msm_gem_submit *submit,
> > > struct msm_ringbuffer *ring = submit->ring;
> > > unsigned int i;
> > >
> > > +   a6xx_set_pagetable(gpu, ring, ctx);
> > > +
> > > get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
> > > rbmemptr_stats(ring, index, cpcycles_start));
> > >
> > > @@ -872,6 +902,18 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu 
> > > *gpu)
> > > return (unsigned long)busy_time;
> > >  }
> > >
> > > +struct msm_gem_address_space *a6xx_address_space_instance(struct msm_gpu 
> > > *gpu)
> > > +{
> > > +   struct msm_mmu *mmu;
> > > +
> > > +   mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
> > > +   if (IS_ERR(mmu))
> > > +   return msm_gem_address_space_get(gpu->aspace);
> > > +
> > > +   return msm_gem_address_space_create(mmu,
> > > +   "gpu", 0x1ULL, 0x1ULL);
> > > +}
> > > +
> > >  static const struct adreno_gpu_funcs funcs = {
> > > .base = {
> > > .get_param = adreno_get_param,
> > > @@ -895,6 +937,7 @@ static const struct adreno_gpu_funcs funcs = {
> > > .gpu_state_put = a6xx_gpu_state_put,
> > >  #endif
> > > .create_address_space = adreno_iommu_create_address_space,
> > > +   .address_space_instance = a6xx_address_space_instance,
> >
> > Hmm, maybe instead of .address_space_instance, something like
> > .create_context_address_space?
> >
> > Since like .create_address_space, it is creating an address space..
> > the difference is that it is a per context/process aspace..
> >

This is a good suggestion. I'm always open to changing function names.

> 
> 
> or maybe ju

Re: [Freedreno] [PATCH v9 6/7] drm/msm: Set the global virtual address range from the IOMMU domain

2020-06-29 Thread Jordan Crouse
On Sat, Jun 27, 2020 at 10:10:14AM -0700, Rob Clark wrote:
> On Fri, Jun 26, 2020 at 1:01 PM Jordan Crouse  wrote:
> >
> > Use the aperture settings from the IOMMU domain to set up the virtual
> > address range for the GPU. This allows us to transparently deal with
> > IOMMU side features (like split pagetables).
> >
> > Signed-off-by: Jordan Crouse 
> > ---
> >
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++--
> >  drivers/gpu/drm/msm/msm_iommu.c |  7 +++
> >  2 files changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > index 5db06b590943..3e717c1ebb7f 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > @@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu,
> > struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
> > struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu);
> > struct msm_gem_address_space *aspace;
> > +   u64 start, size;
> >
> > -   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
> > -   0x - SZ_16M);
> > +   /*
> > +* Use the aperture start or SZ_16M, whichever is greater. This will
> > +* ensure that we align with the allocated pagetable range while 
> > still
> > +* allowing room in the lower 32 bits for GMEM and whatnot
> > +*/
> > +   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
> > +   size = iommu->geometry.aperture_end - start + 1;
> > +
> > +   aspace = msm_gem_address_space_create(mmu, "gpu",
> > +   start & GENMASK(48, 0), size);
> 
> hmm, I kinda think this isn't going to play well for the 32b gpus
> (pre-a5xx).. possibly we should add address space size to 'struct
> adreno_info'?

I checked and qcom-iommu sets the aperture correctly so this should be okay for
everybody. To be honest, I'm not sure if we even need to mask the start to 49
bits. It seems that all of the iommu implementations do the right thing.  Of
course it would be worth a check if you have a 4xx handy.
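
As a rough illustration of the range calculation in the quoted hunk, the
following standalone sketch computes the start and size from an aperture. The
sketch_* names are hypothetical and the aperture values in any test are made
up; only the max()/size arithmetic mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_16M	UINT64_C(0x1000000)

/* Hypothetical result type: the GPU virtual address range. */
struct sketch_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Use the aperture start or 16M, whichever is greater, and size the
 * range so that it ends at the (inclusive) aperture end.
 */
struct sketch_range sketch_gpu_va_range(uint64_t aperture_start,
					uint64_t aperture_end)
{
	struct sketch_range r;

	r.start = aperture_start > SKETCH_SZ_16M ? aperture_start : SKETCH_SZ_16M;

	/* The subtraction below would wrap if the aperture ended too early. */
	assert(aperture_end >= r.start);
	r.size = aperture_end - r.start + 1;

	return r;
}
```

For a 32-bit aperture starting at zero this reserves the low 16M and leaves
the rest of the 4G space, which is the same layout the old hard-coded values
appear to have aimed for.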

> Or I guess it is always going to be the same for all devices within a
> generation?  So it could just be passed in to adreno_gpu_init()

We can do that easily if we are worried about it (see also: a2xx). I just
figured this might save us a bit of code.

> Hopefully that makes things smoother if we someday had more than 48bits..

We'll be at 49 bits for as far ahead as I can see. 49 bits has a special
meaning in the SMMU so it is a natural fit for the GPU hardware. If we change in
N generations we can just shift to a family specific function at that point.
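
The sign extension that goes with the 49-bit layout can be sketched as below.
This is a minimal userspace model of the check in the quoted msm_iommu.c hunk;
the function name is made up, and the precondition that the input is a raw
49-bit address is an assumption of the sketch, not something the patch checks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend a 49-bit IOVA to 64 bits: if bit 48 is set, copy it into
 * bits 63..49 so the address lands in the upper (TTBR1) half.
 */
uint64_t sketch_sign_extend_iova(uint64_t iova)
{
	/* This sketch only accepts a raw address with bits 63..49 clear. */
	assert((iova >> 49) == 0);

	if (iova & (UINT64_C(1) << 48))
		iova |= ~UINT64_C(0) << 49;

	return iova;
}
```

Addresses with bit 48 clear pass through unchanged, so the lower (TTBR0)
half is not affected.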

Jordan

> BR,
> -R
> 
> >
> > if (IS_ERR(aspace) && !IS_ERR(mmu))
> > mmu->funcs->destroy(mmu);
> > diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
> > b/drivers/gpu/drm/msm/msm_iommu.c
> > index 3a381a9674c9..1b6635504069 100644
> > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > @@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t 
> > iova,
> > struct msm_iommu *iommu = to_msm_iommu(mmu);
> > size_t ret;
> >
> > +   /* The arm-smmu driver expects the addresses to be sign extended */
> > +   if (iova & BIT_ULL(48))
> > +   iova |= GENMASK_ULL(63, 49);
> > +
> > ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
> > WARN_ON(!ret);
> >
> > @@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t 
> > iova, size_t len)
> >  {
> > struct msm_iommu *iommu = to_msm_iommu(mmu);
> >
> > +   if (iova & BIT_ULL(48))
> > +   iova |= GENMASK_ULL(63, 49);
> > +
> > iommu_unmap(iommu->domain, iova, len);
> >
> > return 0;
> > --
> > 2.17.1
> >


[PATCH v2 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2

2020-06-26 Thread Jordan Crouse
Support auxiliary domains for arm-smmu-v2 to initialize and support
multiple pagetables for a single SMMU context bank. Since the smmu-v2
hardware doesn't have any built in support for switching the pagetable
base it is left as an exercise to the caller to actually use the pagetable.

Aux domains are supported if split pagetable (TTBR1) support has been
enabled on the master domain.  Each auxiliary domain will reuse the
configuration of the master domain. By default a domain with TTBR1
support will have the TTBR0 region disabled so the first attached aux
domain will enable the TTBR0 region in the hardware and conversely the
last domain to be detached will disable TTBR0 translations.  All subsequent
auxiliary domains create a pagetable but do not touch the hardware.
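
The first-attach/last-detach behaviour can be modelled with a simple counter.
The sketch below is a userspace approximation using C11 atomics; the sketch_*
names are hypothetical, the boolean stands in for the hardware state, and the
flag update is not atomic with the counter, so real code would need its own
locking around the hardware write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a context bank shared by several aux domains. */
struct sketch_aux_cb {
	atomic_int aux;		/* number of attached aux domains */
	bool ttbr0_enabled;	/* stands in for the TTBR0 hardware state */
};

void sketch_aux_init(struct sketch_aux_cb *cb)
{
	atomic_init(&cb->aux, 0);
	cb->ttbr0_enabled = false;
}

void sketch_aux_attach(struct sketch_aux_cb *cb)
{
	/* fetch_add returns the old value: 0 means this is the first domain. */
	if (atomic_fetch_add(&cb->aux, 1) == 0)
		cb->ttbr0_enabled = true;
}

void sketch_aux_detach(struct sketch_aux_cb *cb)
{
	assert(atomic_load(&cb->aux) > 0);

	/* fetch_sub returns the old value: 1 means this was the last domain. */
	if (atomic_fetch_sub(&cb->aux, 1) == 1)
		cb->ttbr0_enabled = false;
}
```

Domains attached between the first and the last only change the counter,
which matches the "do not touch the hardware" rule above.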

The leaf driver will be able to query the physical address of the
pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
address with whatever means it has to switch the pagetable base.

Following is a pseudocode example of how a domain can be created:

 /* Check to see if aux domains are supported */
 if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
     domain = iommu_domain_alloc(...);

     if (iommu_aux_attach_device(domain, dev))
         return FAIL;

     /* Save the base address of the pagetable for use by the driver */
     iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
 }

Then 'domain' can be used like any other iommu domain to map and
unmap iova addresses in the pagetable.
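
As a rough illustration of the attach/detach rule described above (this is
not code from the patch), the "first in enables, last out disables"
behaviour can be modelled with an atomic counter. The struct and function
names are made up for the sketch, and a real driver would also have to
serialise the hardware update itself:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a context bank; not the driver's arm_smmu_cb. */
struct ctx_bank {
	atomic_int aux;		/* number of attached aux domains */
	bool ttbr0_enabled;	/* models the TTBR0 region being enabled */
};

/* Attach one aux domain; only the 0 -> 1 transition enables TTBR0. */
static void aux_attach(struct ctx_bank *cb)
{
	/* atomic_fetch_add() returns the value before the increment */
	if (atomic_fetch_add(&cb->aux, 1) == 0)
		cb->ttbr0_enabled = true;
}

/* Detach one aux domain; only the 1 -> 0 transition disables TTBR0. */
static void aux_detach(struct ctx_bank *cb)
{
	int old = atomic_fetch_sub(&cb->aux, 1);

	assert(old > 0);	/* detach without a matching attach */
	if (old == 1)
		cb->ttbr0_enabled = false;
}
```

Every attach between the first and the last only bumps the count and
leaves the modelled hardware state alone.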

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 219 ---
 drivers/iommu/arm-smmu.h |   1 +
 2 files changed, 204 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 060139452c54..ce6d654301bf 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -91,6 +91,7 @@ struct arm_smmu_cb {
u32 tcr[2];
u32 mair[2];
struct arm_smmu_cfg *cfg;
+   atomic_t aux;
 };
 
 struct arm_smmu_master_cfg {
@@ -667,6 +668,86 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
+/*
+ * Update the context bank to enable TTBR0. Assumes AARCH64 S1
+ * configuration.
+ */
+static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   u32 tcr = cb->tcr[0];
+
+   /* Add the TCR configuration from the new pagetable config */
+   tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+
+   /* Make sure that both TTBR0 and TTBR1 are enabled */
+   tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+   /* Update the TCR register */
+   cb->tcr[0] = tcr;
+
+   /* Program the new TTBR0 */
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+}
+
+/*
+ * This function assumes that the current model only allows aux domains for
+ * AARCH64 S1 configurations
+ */
+static int arm_smmu_aux_init_domain_context(struct iommu_domain *domain,
+   struct arm_smmu_device *smmu, struct arm_smmu_cfg *master)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *pgtbl_ops;
+   struct io_pgtable_cfg pgtbl_cfg;
+
+   mutex_lock(&smmu_domain->init_mutex);
+
+   /* Copy the configuration from the master */
+   memcpy(&smmu_domain->cfg, master, sizeof(smmu_domain->cfg));
+
+   smmu_domain->flush_ops = &arm_smmu_s1_tlb_ops;
+   smmu_domain->smmu = smmu;
+
+   pgtbl_cfg = (struct io_pgtable_cfg) {
+   .pgsize_bitmap = smmu->pgsize_bitmap,
+   .ias = smmu->va_size,
+   .oas = smmu->ipa_size,
+   .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
+   .tlb = smmu_domain->flush_ops,
+   .iommu_dev = smmu->dev,
+   .quirks = 0,
+   };
+
+   if (smmu_domain->non_strict)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
+   pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &pgtbl_cfg,
+   smmu_domain);
+   if (!pgtbl_ops) {
+   mutex_unlock(&smmu_domain->init_mutex);
+   return -ENOMEM;
+   }
+
+   domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
+
+   domain->geometry.aperture_end = (1UL << smmu->va_size) - 1;
+   domain->geometry.force_aperture = true;
+
+   /* enable TTBR0 when the first aux domain is attached */
+   if (atomic_inc_return(&smmu->cbs[master->cbndx].aux) == 1) {
+   arm_smmu_context_set_ttbr0(&smmu->cbs[master->cbndx],
+   

[PATCH v2 0/6] iommu-arm-smmu: Add auxiliary domains and per-instance pagetables

2020-06-26 Thread Jordan Crouse


This is a new refresh of support for auxiliary domains for arm-smmu-v2
and per-instance pagetables for drm/msm. The big change here from past
efforts is that outside of creating a single aux-domain to enable TTBR0
all of the per-instance pagetables are created and managed exclusively
in drm/msm without involving the arm-smmu driver. This fits in with the
suggested model of letting the GPU hardware do what it needs and leave the
arm-smmu driver blissfully unaware.

Almost. In order to set up the io-pgtable properly in drm/msm we need to
query the pagetable configuration from the current active domain and we need to
rely on the iommu API to flush TLBs after an unmap. In the future we can optimize
this in the drm/msm driver to track the state of the TLBs but for now the big
hammer lets us get off the ground.

This series is built on the split pagetable support [1].

[1] https://patchwork.kernel.org/patch/11628543/

v2: Remove unneeded cruft in the a6xx page switch sequence

Jordan Crouse (6):
  iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
  iommu/io-pgtable: Allow a pgtable implementation to skip TLB
operations
  iommu/arm-smmu: Add a domain attribute to pass the pagetable config
  drm/msm: Add support to create a local pagetable
  drm/msm: Add support for address space instances
  drm/msm/a6xx: Add support for per-instance pagetables

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  43 +
 drivers/gpu/drm/msm/msm_drv.c |  15 +-
 drivers/gpu/drm/msm/msm_drv.h |   4 +
 drivers/gpu/drm/msm/msm_gem_vma.c |   9 +
 drivers/gpu/drm/msm/msm_gpu.c |  17 ++
 drivers/gpu/drm/msm/msm_gpu.h |   5 +
 drivers/gpu/drm/msm/msm_gpummu.c  |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c   | 180 +++-
 drivers/gpu/drm/msm/msm_mmu.h |  16 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h  |   1 +
 drivers/iommu/arm-smmu.c  | 231 --
 drivers/iommu/arm-smmu.h  |   1 +
 include/linux/io-pgtable.h|  11 +-
 include/linux/iommu.h |   1 +
 14 files changed, 507 insertions(+), 29 deletions(-)

-- 
2.17.1



[PATCH v2 5/6] drm/msm: Add support for address space instances

2020-06-26 Thread Jordan Crouse
Add support for allocating an address space instance. Targets that support
per-instance pagetables should implement their own function to allocate a
new instance. The default will return the existing generic address space.
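
For illustration only, the "optional per-target hook with a default
fallback" pattern described above can be sketched in standalone C. The
types here are simplified, hypothetical stand-ins for the drm/msm
structures, and the plain integer refcount replaces the kref used in the
driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the drm/msm types. */
struct aspace {
	int refcount;
};

struct gpu;

struct gpu_funcs {
	/* Optional: NULL on targets without per-instance pagetables */
	struct aspace *(*address_space_instance)(struct gpu *gpu);
};

struct gpu {
	const struct gpu_funcs *funcs;
	struct aspace *aspace;	/* the shared, generic address space */
};

/* Take a reference on an existing address space (NULL is passed through). */
static struct aspace *aspace_get(struct aspace *as)
{
	if (as)
		as->refcount++;
	return as;
}

/* Return a per-instance address space, or a new reference to the default. */
static struct aspace *gpu_address_space_instance(struct gpu *gpu)
{
	if (!gpu)
		return NULL;

	assert(gpu->funcs != NULL);	/* a GPU is expected to carry a funcs table */

	if (!gpu->funcs->address_space_instance)
		return aspace_get(gpu->aspace);

	return gpu->funcs->address_space_instance(gpu);
}
```

Handing out a counted reference in the fallback case means the caller can
drop its address space the same way whether it got a private instance or
the shared one.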

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_drv.c | 15 +--
 drivers/gpu/drm/msm/msm_drv.h |  4 
 drivers/gpu/drm/msm/msm_gem_vma.c |  9 +
 drivers/gpu/drm/msm/msm_gpu.c | 17 +
 drivers/gpu/drm/msm/msm_gpu.h |  5 +
 5 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 6c57cc72d627..092c49552ddd 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -588,7 +588,7 @@ static int context_init(struct drm_device *dev, struct 
drm_file *file)
 
msm_submitqueue_init(dev, ctx);
 
-   ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
+   ctx->aspace = msm_gpu_address_space_instance(priv->gpu);
file->driver_priv = ctx;
 
return 0;
@@ -607,6 +607,8 @@ static int msm_open(struct drm_device *dev, struct drm_file 
*file)
 static void context_close(struct msm_file_private *ctx)
 {
msm_submitqueue_close(ctx);
+
+   msm_gem_address_space_put(ctx->aspace);
kfree(ctx);
 }
 
@@ -771,18 +773,19 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, 
void *data,
 }
 
 static int msm_ioctl_gem_info_iova(struct drm_device *dev,
-   struct drm_gem_object *obj, uint64_t *iova)
+   struct drm_file *file, struct drm_gem_object *obj,
+   uint64_t *iova)
 {
-   struct msm_drm_private *priv = dev->dev_private;
+   struct msm_file_private *ctx = file->driver_priv;
 
-   if (!priv->gpu)
+   if (!ctx->aspace)
return -EINVAL;
 
/*
 * Don't pin the memory here - just get an address so that userspace can
 * be productive
 */
-   return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
+   return msm_gem_get_iova(obj, ctx->aspace, iova);
 }
 
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
@@ -821,7 +824,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
args->value = msm_gem_mmap_offset(obj);
break;
case MSM_INFO_GET_IOVA:
-   ret = msm_ioctl_gem_info_iova(dev, obj, &args->value);
+   ret = msm_ioctl_gem_info_iova(dev, file, obj, &args->value);
break;
case MSM_INFO_SET_NAME:
/* length check should leave room for terminating null: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index e2d6a6056418..983a8b7e5a74 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -249,6 +249,10 @@ int msm_gem_map_vma(struct msm_gem_address_space *aspace,
 void msm_gem_close_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma);
 
+
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace);
+
 void msm_gem_address_space_put(struct msm_gem_address_space *aspace);
 
 struct msm_gem_address_space *
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
b/drivers/gpu/drm/msm/msm_gem_vma.c
index 5f6a11211b64..29cc1305cf37 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -27,6 +27,15 @@ void msm_gem_address_space_put(struct msm_gem_address_space 
*aspace)
kref_put(&aspace->kref, msm_gem_address_space_destroy);
 }
 
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace)
+{
+   if (!IS_ERR_OR_NULL(aspace))
+   kref_get(&aspace->kref);
+
+   return aspace;
+}
+
 /* Actually unmap memory for the vma */
 void msm_gem_purge_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma)
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 86a138641477..0fa614430799 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -821,6 +821,23 @@ static int get_clocks(struct platform_device *pdev, struct 
msm_gpu *gpu)
return 0;
 }
 
+/* Return a new address space instance */
+struct msm_gem_address_space *
+msm_gpu_address_space_instance(struct msm_gpu *gpu)
+{
+   if (!gpu)
+   return NULL;
+
+   /*
+* If the GPU doesn't support instanced address spaces return the
+* default address space
+*/
+   if (!gpu->funcs->address_space_instance)
+   return msm_gem_address_space_get(gpu->aspace);
+
+   return gpu->funcs->address_space_instance(gpu);
+}
+
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,

[PATCH v2 4/6] drm/msm: Add support to create a local pagetable

2020-06-26 Thread Jordan Crouse
Add support to create an io-pgtable for use by targets that support
per-instance pagetables.  In order to support per-instance pagetables the
GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
split pagetables and auxiliary domains need to be supported and enabled.
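
The local pagetable in the diff below is mapped and unmapped one 4K page
at a time. As a hedged, standalone sketch of that unmap loop (the toy
pagetable and every name here are hypothetical, not the io-pgtable API),
note that the loop in the diff counts 'size' down before the final
comparison; the sketch keeps a separate counter so that the comparison
still sees the requested length:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ   4096u
#define NUM_PAGES 16u

/* Toy pagetable: one flag per 4K page. Hypothetical, for illustration only. */
struct toy_pgtable {
	bool mapped[NUM_PAGES];
};

/* Unmap one page; returns the number of bytes unmapped (0 if not mapped). */
static size_t toy_unmap_page(struct toy_pgtable *pt, uint64_t iova)
{
	uint64_t idx = iova / PAGE_SZ;

	if (idx >= NUM_PAGES || !pt->mapped[idx])
		return 0;

	pt->mapped[idx] = false;
	return PAGE_SZ;
}

/* Unmap [iova, iova + size) one page at a time; 0 on success, -1 otherwise. */
static int toy_unmap_range(struct toy_pgtable *pt, uint64_t iova, size_t size)
{
	size_t remaining = size;	/* keep 'size' intact for the final check */
	size_t unmapped = 0;

	assert(size % PAGE_SZ == 0);

	while (remaining) {
		unmapped += toy_unmap_page(pt, iova);
		iova += PAGE_SZ;
		remaining -= PAGE_SZ;
	}

	return (unmapped == size) ? 0 : -1;
}
```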

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c  | 180 ++-
 drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
 3 files changed, 195 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c
index 310a31b05faa..aab121f4beb7 100644
--- a/drivers/gpu/drm/msm/msm_gpummu.c
+++ b/drivers/gpu/drm/msm/msm_gpummu.c
@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct 
msm_gpu *gpu)
}
 
gpummu->gpu = gpu;
-   msm_mmu_init(&gpummu->base, dev, &funcs);
+   msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU);
 
return &gpummu->base;
 }
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 1b6635504069..f455c597f76d 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -4,15 +4,192 @@
  * Author: Rob Clark 
  */
 
+#include 
 #include "msm_drv.h"
 #include "msm_mmu.h"
 
 struct msm_iommu {
struct msm_mmu base;
struct iommu_domain *domain;
+   struct iommu_domain *aux_domain;
 };
+
 #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
 
+struct msm_iommu_pagetable {
+   struct msm_mmu base;
+   struct msm_mmu *parent;
+   struct io_pgtable_ops *pgtbl_ops;
+   phys_addr_t ttbr;
+   u32 asid;
+};
+
+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
+{
+   return container_of(mmu, struct msm_iommu_pagetable, base);
+}
+
+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
+   size_t size)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   size_t unmapped = 0;
+
+   /* Unmap the block one page at a time */
+   while (size) {
+   unmapped += ops->unmap(ops, iova, 4096, NULL);
+   iova += 4096;
+   size -= 4096;
+   }
+
+   iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
+
+   return (unmapped == size) ? 0 : -EINVAL;
+}
+
+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
+   struct sg_table *sgt, size_t len, int prot)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   struct scatterlist *sg;
+   size_t mapped = 0;
+   u64 addr = iova;
+   unsigned int i;
+
+   for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+   size_t size = sg->length;
+   phys_addr_t phys = sg_phys(sg);
+
+   /* Map the block one page at a time */
+   while (size) {
+   if (ops->map(ops, addr, phys, 4096, prot)) {
+   msm_iommu_pagetable_unmap(mmu, iova, mapped);
+   return -EINVAL;
+   }
+
+   phys += 4096;
+   addr += 4096;
+   size -= 4096;
+   mapped += 4096;
+   }
+   }
+
+   return 0;
+}
+
+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+
+   free_io_pgtable_ops(pagetable->pgtbl_ops);
+   kfree(pagetable);
+}
+
+/*
+ * Given a parent device, create and return an aux domain. This will enable the
+ * TTBR0 region
+ */
+static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu *parent)
+{
+   struct msm_iommu *iommu = to_msm_iommu(parent);
+   struct iommu_domain *domain;
+   int ret;
+
+   if (iommu->aux_domain)
+   return iommu->aux_domain;
+
+   if (!iommu_dev_has_feature(parent->dev, IOMMU_DEV_FEAT_AUX))
+   return ERR_PTR(-ENODEV);
+
+   domain = iommu_domain_alloc(&platform_bus_type);
+   if (!domain)
+   return ERR_PTR(-ENODEV);
+
+   ret = iommu_aux_attach_device(domain, parent->dev);
+   if (ret) {
+   iommu_domain_free(domain);
+   return ERR_PTR(ret);
+   }
+
+   iommu->aux_domain = domain;
+   return domain;
+}
+
+int msm_iommu_pagetable_params(struct msm_mmu *mmu,
+   phys_addr_t *ttbr, int *asid)
+{
+   struct msm_iommu_pagetable *pagetable;
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (ttbr)
+   *ttbr = pagetable->ttbr;
+
+   if (asid)
+

[PATCH v2 6/6] drm/msm/a6xx: Add support for per-instance pagetables

2020-06-26 Thread Jordan Crouse
Add support for using per-instance pagetables if all the dependencies are
available.
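
For illustration only: the page switch hands the pagetable base and the
ASID to the CP as 32-bit dwords. A minimal standalone sketch of one way to
derive them, assuming the ASID occupies bits 63:48 of the 64-bit TTBR0
value (as the shift by 48 in the diff suggests) and assuming each ring
write carries 32 bits. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the ASID lives in bits 63:48 of the 64-bit TTBR0 value. */
#define TTBR_ASID_SHIFT 48

/* Combine a pagetable base address and an ASID into one 64-bit value. */
static uint64_t ttbr0_pack(uint64_t ttbr, uint16_t asid)
{
	/* The base address must not overlap the ASID field */
	assert((ttbr >> TTBR_ASID_SHIFT) == 0);
	return ttbr | ((uint64_t)asid << TTBR_ASID_SHIFT);
}

/* Low and high 32-bit halves, in the order they would go into the ring. */
static uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }
```

Under these assumptions the ASID ends up in bits 31:16 of the high dword,
because the 64-bit value is formed first and only split afterwards.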

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 43 +++
 drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index aa53f47b7e8b..95ed2ceac121 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -79,6 +79,34 @@ static void get_stats_counter(struct msm_ringbuffer *ring, 
u32 counter,
OUT_RING(ring, upper_32_bits(iova));
 }
 
+static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer 
*ring,
+   struct msm_file_private *ctx)
+{
+   phys_addr_t ttbr;
+   u32 asid;
+
+   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
+   return;
+
+   /* Execute the table update */
+   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
+   /* CONTEXTIDR is currently unused */
+   OUT_RING(ring, 0);
+   /* CONTEXTBANK is currently unused */
+   OUT_RING(ring, 0);
+
+   /*
+* Write the new TTBR0 to the memstore. This is good for debugging.
+*/
+   OUT_PKT7(ring, CP_MEM_WRITE, 4);
+   OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
+   OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
+}
+
 static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
struct msm_file_private *ctx)
 {
@@ -89,6 +117,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit,
struct msm_ringbuffer *ring = submit->ring;
unsigned int i;
 
+   a6xx_set_pagetable(gpu, ring, ctx);
+
get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
rbmemptr_stats(ring, index, cpcycles_start));
 
@@ -872,6 +902,18 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+struct msm_gem_address_space *a6xx_address_space_instance(struct msm_gpu *gpu)
+{
+   struct msm_mmu *mmu;
+
+   mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
+   if (IS_ERR(mmu))
+   return msm_gem_address_space_get(gpu->aspace);
+
+   return msm_gem_address_space_create(mmu,
+   "gpu", 0x1ULL, 0x1ULL);
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -895,6 +937,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_put = a6xx_gpu_state_put,
 #endif
.create_address_space = adreno_iommu_create_address_space,
+   .address_space_instance = a6xx_address_space_instance,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h 
b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 7764373d0ed2..0987d6bf848c 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -31,6 +31,7 @@ struct msm_rbmemptrs {
volatile uint32_t fence;
 
volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
+   volatile u64 ttbr0;
 };
 
 struct msm_ringbuffer {
-- 
2.17.1



[PATCH v2 3/6] iommu/arm-smmu: Add a domain attribute to pass the pagetable config

2020-06-26 Thread Jordan Crouse
The Adreno GPU has the capacity to manage its own pagetables and switch
them dynamically from the hardware. Add a domain attribute for arm-smmu-v2
to get the default pagetable configuration so that the GPU driver can match
the format for its own pagetables.
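
A rough standalone sketch of the idea, not the driver code: an attribute
getter that copies the live pagetable configuration into a caller-provided
buffer and refuses if the domain has not been finalised yet. The cut-down
config struct and all names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down config; the real io_pgtable_cfg has more fields. */
struct pgtable_cfg {
	unsigned long pgsize_bitmap;
	unsigned int ias;
	unsigned int oas;
};

enum domain_attr {
	ATTR_NESTING,
	ATTR_PGTABLE_CFG,
};

struct domain {
	const struct pgtable_cfg *cfg;	/* NULL until the domain is finalised */
};

/* Copy the requested attribute into 'data'; returns 0 or -ENODEV. */
static int domain_get_attr(const struct domain *d, enum domain_attr attr,
			   void *data)
{
	assert(d && data);

	switch (attr) {
	case ATTR_PGTABLE_CFG:
		if (!d->cfg)
			return -ENODEV;
		memcpy(data, d->cfg, sizeof(*d->cfg));
		return 0;
	default:
		return -ENODEV;
	}
}
```

Returning a copy rather than a pointer keeps the caller from modifying the
configuration that the owning driver is still using.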

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 12 
 include/linux/iommu.h|  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ce6d654301bf..4bd247dfd703 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1714,6 +1714,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_PGTABLE_CFG: {
+   struct io_pgtable *pgtable;
+   struct io_pgtable_cfg *dest = data;
+
+   if (!smmu_domain->pgtbl_ops)
+   return -ENODEV;
+
+   pgtable = 
io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+
+   memcpy(dest, &pgtable->cfg, sizeof(*dest));
+   return 0;
+   }
default:
return -ENODEV;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..2388117641f1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_PGTABLE_CFG,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.17.1



[PATCH v2 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations

2020-06-26 Thread Jordan Crouse
Allow an io-pgtable implementation to skip TLB operations by checking for
NULL pointers in the helper functions. It will be up to the owner
of the io-pgtable instance to make sure that they independently handle
the TLB correctly.
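
The pattern can be sketched outside the kernel as follows. This is a
reduced, hypothetical ops table rather than the real iommu_flush_ops, and
the counting callback exists only to make the effect visible:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of a TLB flush ops table. */
struct flush_ops {
	void (*tlb_flush_all)(void *cookie);
	void (*tlb_add_page)(unsigned long iova, void *cookie);	/* optional */
};

struct pgtable {
	const struct flush_ops *tlb;	/* NULL: the owner handles the TLB itself */
	void *cookie;
};

/* Flush everything, unless the owner opted out by leaving 'tlb' NULL. */
static inline void pgtable_tlb_flush_all(struct pgtable *iop)
{
	if (iop->tlb)
		iop->tlb->tlb_flush_all(iop->cookie);
}

/* Both the table and the optional callback have to be checked here. */
static inline void pgtable_tlb_add_page(struct pgtable *iop, unsigned long iova)
{
	if (iop->tlb && iop->tlb->tlb_add_page)
		iop->tlb->tlb_add_page(iova, iop->cookie);
}

/* Example callback: counts how many full flushes were requested. */
static void count_flush(void *cookie)
{
	assert(cookie);
	(*(int *)cookie)++;
}
```

With the table pointer left NULL, the helpers become no-ops and the code
that owns the pagetable is the only place where TLB maintenance happens.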

Signed-off-by: Jordan Crouse 
---

 include/linux/io-pgtable.h | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 53d53c6c2be9..bbed1d3925ba 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -210,21 +210,24 @@ struct io_pgtable {
 
 static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
 {
-   iop->cfg.tlb->tlb_flush_all(iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_all(iop->cookie);
 }
 
 static inline void
 io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
 {
-   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
 }
 
 static inline void
 io_pgtable_tlb_flush_leaf(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
 {
-   iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
 }
 
 static inline void
@@ -232,7 +235,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
struct iommu_iotlb_gather * gather, unsigned long iova,
size_t granule)
 {
-   if (iop->cfg.tlb->tlb_add_page)
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
 }
 
-- 
2.17.1



[PATCH v9 6/7] drm/msm: Set the global virtual address range from the IOMMU domain

2020-06-26 Thread Jordan Crouse
Use the aperture settings from the IOMMU domain to set up the virtual
address range for the GPU. This allows us to transparently deal with
IOMMU side features (like split pagetables).
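
As a worked example of the arithmetic (a standalone sketch with
hypothetical names; the aperture values used to exercise it are examples,
not values read from hardware), the start is clamped to at least 16M and
the size is computed from the inclusive aperture end:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_16M 0x01000000ULL

struct va_range {
	uint64_t start;
	uint64_t size;
};

/* Derive the GPU VA range from an aperture [aperture_start, aperture_end]. */
static struct va_range range_from_aperture(uint64_t aperture_start,
					   uint64_t aperture_end)
{
	struct va_range r;

	/* Never start below 16M, so the lowest addresses stay free */
	r.start = aperture_start > SZ_16M ? aperture_start : SZ_16M;
	assert(aperture_end >= r.start);

	/* The aperture end is inclusive, hence the +1 */
	r.size = aperture_end - r.start + 1;
	return r;
}
```

For an aperture of 0 to 0xffffffff this gives a start of 16M and a size of
0xff000000. For an aperture that begins above 16M the start is taken from
the aperture unchanged.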

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++--
 drivers/gpu/drm/msm/msm_iommu.c |  7 +++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 5db06b590943..3e717c1ebb7f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu,
struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu);
struct msm_gem_address_space *aspace;
+   u64 start, size;
 
-   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
-   0x - SZ_16M);
+   /*
+* Use the aperture start or SZ_16M, whichever is greater. This will
+* ensure that we align with the allocated pagetable range while still
+* allowing room in the lower 32 bits for GMEM and whatnot
+*/
+   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
+   size = iommu->geometry.aperture_end - start + 1;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu",
+   start & GENMASK(48, 0), size);
 
if (IS_ERR(aspace) && !IS_ERR(mmu))
mmu->funcs->destroy(mmu);
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 3a381a9674c9..1b6635504069 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
struct msm_iommu *iommu = to_msm_iommu(mmu);
size_t ret;
 
+   /* The arm-smmu driver expects the addresses to be sign extended */
+   if (iova & BIT_ULL(48))
+   iova |= GENMASK_ULL(63, 49);
+
ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
WARN_ON(!ret);
 
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t 
iova, size_t len)
 {
struct msm_iommu *iommu = to_msm_iommu(mmu);
 
+   if (iova & BIT_ULL(48))
+   iova |= GENMASK_ULL(63, 49);
+
iommu_unmap(iommu->domain, iova, len);
 
return 0;
-- 
2.17.1



[PATCH v9 4/7] iommu/arm-smmu: Add a pointer to the attached device to smmu_domain

2020-06-26 Thread Jordan Crouse
Add a pointer to the struct device that is attached to a domain. This
makes it easy to get the pointer if it is needed in the implementation
specific code.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 6 --
 drivers/iommu/arm-smmu.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 048de2681670..060139452c54 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -668,7 +668,8 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
 }
 
 static int arm_smmu_init_domain_context(struct iommu_domain *domain,
-   struct arm_smmu_device *smmu)
+   struct arm_smmu_device *smmu,
+   struct device *dev)
 {
int irq, start, ret = 0;
unsigned long ias, oas;
@@ -801,6 +802,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
+   smmu_domain->dev = dev;
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -1190,7 +1192,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
return ret;
 
/* Ensure that the domain is finalised */
-   ret = arm_smmu_init_domain_context(domain, smmu);
+   ret = arm_smmu_init_domain_context(domain, smmu, dev);
if (ret < 0)
goto rpm_put;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 5f2de20e883b..d33cfe26b2f5 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -345,6 +345,7 @@ struct arm_smmu_domain {
struct mutex init_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and 
TLB syncs */
struct iommu_domain domain;
+   struct device   *dev;   /* Device attached to this 
domain */
 };
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
-- 
2.17.1



[PATCH v9 3/7] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU

2020-06-26 Thread Jordan Crouse
Every Qcom Adreno GPU has an embedded SMMU for its own use. These
devices depend on unique features such as split pagetables,
different stall/halt requirements and other settings. Give them a
dedicated compatible string so that they can be identified in the
arm-smmu implementation specific code.

Reviewed-by: Rob Herring 
Signed-off-by: Jordan Crouse 
---

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..e52a1b146c97 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,10 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
+items:
+  - const: qcom,adreno-smmu
+  - const: qcom,smmu-v2
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.17.1



[PATCH v9 5/7] iommu/arm-smmu: Add implementation for the adreno GPU SMMU

2020-06-26 Thread Jordan Crouse
Add a special implementation for the SMMU attached to most Adreno GPU
targets, triggered by the qcom,adreno-smmu compatible string. When
selected, the driver will attempt to enable split pagetables.
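
A simplified, standalone sketch of the decision made in the init_context
hook, for illustration only. The struct is a hypothetical flattening of
the state the hook looks at, a plain strcmp() stands in for
of_device_is_compatible(), and the quirk value is made up:

```c
#include <assert.h>
#include <string.h>

enum stage { STAGE_1, STAGE_2 };
enum ctx_fmt { FMT_AARCH32, FMT_AARCH64 };

#define QUIRK_TTBR1 (1u << 0)	/* hypothetical flag value */

/* Hypothetical, flattened view of what the init_context hook looks at. */
struct domain_info {
	const char *client_compatible;	/* compatible of the attached device */
	enum stage stage;
	enum ctx_fmt fmt;
};

/* Return the quirks to set: TTBR1 only for an AArch64 stage 1 GPU domain. */
static unsigned int adreno_quirks(const struct domain_info *d)
{
	assert(d && d->client_compatible);

	/* Other clients behind the same SMMU (such as the GMU) are left alone */
	if (strcmp(d->client_compatible, "qcom,adreno") != 0)
		return 0;

	if (d->stage == STAGE_1 && d->fmt == FMT_AARCH64)
		return QUIRK_TTBR1;

	return 0;
}
```

Filtering on the attached device first means a second client on the same
SMMU keeps the default single-pagetable setup.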

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 +++
 drivers/iommu/arm-smmu-qcom.c | 45 +--
 drivers/iommu/arm-smmu.h  |  1 +
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index a20e426d81ac..309675cf6699 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -176,5 +176,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
+   return qcom_adreno_smmu_impl_init(smmu);
+
return smmu;
 }
diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index cf01d0215a39..3248d44ec6d5 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -12,6 +12,29 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
+static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain *smmu_domain)
+{
+   return of_device_is_compatible(smmu_domain->dev->of_node, 
"qcom,adreno");
+}
+
+static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   /* TTBR1 is only for the GPU stream ID and not the GMU */
+   if (!qcom_adreno_smmu_is_gpu_device(smmu_domain))
+   return 0;
+   /*
+* All targets that use the qcom,adreno-smmu compatible string *should*
+* be AARCH64 stage 1 but double check because the arm-smmu code assumes
+* that is the case when the TTBR1 quirk is enabled
+*/
+   if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
+   (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
+   pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+   return 0;
+}
+
 static const struct of_device_id qcom_smmu_client_of_match[] = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -65,7 +88,15 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.reset = qcom_smmu500_reset,
 };
 
-struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
+   .init_context = qcom_adreno_smmu_init_context,
+   .def_domain_type = qcom_smmu_def_domain_type,
+   .reset = qcom_smmu500_reset,
+};
+
+
+static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
+   const struct arm_smmu_impl *impl)
 {
struct qcom_smmu *qsmmu;
 
@@ -75,8 +106,18 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct 
arm_smmu_device *smmu)
 
qsmmu->smmu = *smmu;
 
-   qsmmu->smmu.impl = &qcom_smmu_impl;
+   qsmmu->smmu.impl = impl;
devm_kfree(smmu->dev, smmu);
 
return &qsmmu->smmu;
 }
+
+struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+   return qcom_smmu_create(smmu, &qcom_smmu_impl);
+}
+
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device 
*smmu)
+{
+   return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d33cfe26b2f5..c417814f1d98 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -466,6 +466,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device 
*smmu, int page,
 
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device 
*smmu);
 
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
-- 
2.17.1



[PATCH v9 0/7] iommu/arm-smmu: Enable split pagetable support

2020-06-26 Thread Jordan Crouse
Another iteration of the split-pagetable support for arm-smmu and the Adreno GPU
SMMU. After email discussions [1] we opted to make an arm-smmu implementation
specifically for the Adreno GPU and use that to enable split pagetable support
and later other implementation specific bits that we need.

On the hardware side this is very close to the same code from before [2] only
the TTBR1 quirk is turned on by the implementation and not a domain attribute.
In drm/msm we use the returned size of the aperture as a clue to let us know
which virtual address space we should use for global memory objects.

There are two open items that you should be aware of. First, in the
implementation specific code we have to check the compatible string of the
device so that we only enable TTBR1 for the GPU (SID 0) and not the GMU (SID 4).
I went back and forth trying to decide if I wanted to use the compatible string
or the SID as the filter and settled on the compatible string but I could be
talked out of it.

The other open item is that in drm/msm the hardware only uses 49 bits of the
address space but arm-smmu expects the address to be sign extended all the way
to 64 bits. This normally isn't a problem unless you look at the hardware
registers that contain an IOVA, where the upper bits will be zero. I opted to
restrict the internal drm/msm IOVA range to only 49 bits and then sign extend
right before calling iommu_map / iommu_unmap. This is a bit wonky but I thought
that matching the hardware would be less confusing when debugging a hang.
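
To make the sign extension described above concrete, here is a minimal sketch
in plain C. It assumes 49 significant address bits, so bit 48 acts as the sign
bit. The helper names are hypothetical and are not taken from drm/msm.

```c
#include <stdint.h>

/* Assumption: 49 significant bits, so bit 48 is the sign bit. */
#define IOVA_BITS 49

/* Hypothetical helper: sign extend a 49-bit IOVA to 64 bits. */
static uint64_t iova_sign_extend(uint64_t iova)
{
	uint64_t sign = 1ULL << (IOVA_BITS - 1);
	uint64_t mask = (1ULL << IOVA_BITS) - 1;

	/* Drop anything above bit 48, then propagate bit 48 upwards. */
	iova &= mask;
	return (iova ^ sign) - sign;
}

/* Hypothetical helper: reduce a 64-bit address to its low 49 bits. */
static uint64_t iova_truncate(uint64_t addr)
{
	return addr & ((1ULL << IOVA_BITS) - 1);
}
```

With these helpers an address in the upper half, such as 0x1000000001000,
becomes 0xFFFF000000001000 before it is handed to the IOMMU API, and truncating
that value gives back the 49-bit form that shows up in the hardware registers.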

v9: Fix bot-detected merge conflict
v7: Add attached device to smmu_domain to pass to implementation specific
functions

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-May/044537.html
[2] https://patchwork.kernel.org/patch/11482591/


Jordan Crouse (7):
  iommu/arm-smmu: Pass io-pgtable config to implementation specific
function
  iommu/arm-smmu: Add support for split pagetables
  dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  iommu/arm-smmu: Add a pointer to the attached device to smmu_domain
  iommu/arm-smmu: Add implementation for the adreno GPU SMMU
  drm/msm: Set the global virtual address range from the IOMMU domain
  arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  4 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi  |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 13 +-
 drivers/gpu/drm/msm/msm_iommu.c   |  7 +++
 drivers/iommu/arm-smmu-impl.c |  6 ++-
 drivers/iommu/arm-smmu-qcom.c | 45 ++-
 drivers/iommu/arm-smmu.c  | 38 +++-
 drivers/iommu/arm-smmu.h  | 30 ++---
 8 files changed, 120 insertions(+), 25 deletions(-)

-- 
2.17.1



[PATCH v9 2/7] iommu/arm-smmu: Add support for split pagetables

2020-06-26 Thread Jordan Crouse
Enable TTBR1 for a context bank if IO_PGTABLE_QUIRK_ARM_TTBR1 is selected
by the io-pgtable configuration.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 21 -
 drivers/iommu/arm-smmu.h | 25 +++--
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 8a3a6c8c887a..048de2681670 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -555,11 +555,15 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+   cfg->asid);
cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-cfg->asid);
+   cfg->asid);
+
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   cb->ttbr[1] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   else
+   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -824,7 +828,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   }
+
domain->geometry.force_aperture = true;
 
/* Initialise the context bank with our page table cfg */
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 38b041530a4f..5f2de20e883b 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -168,10 +168,12 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_TCR0x30
 #define ARM_SMMU_TCR_EAE   BIT(31)
 #define ARM_SMMU_TCR_EPD1  BIT(23)
+#define ARM_SMMU_TCR_A1BIT(22)
 #define ARM_SMMU_TCR_TG0   GENMASK(15, 14)
 #define ARM_SMMU_TCR_SH0   GENMASK(13, 12)
 #define ARM_SMMU_TCR_ORGN0 GENMASK(11, 10)
 #define ARM_SMMU_TCR_IRGN0 GENMASK(9, 8)
+#define ARM_SMMU_TCR_EPD0  BIT(7)
 #define ARM_SMMU_TCR_T0SZ  GENMASK(5, 0)
 
 #define ARM_SMMU_VTCR_RES1 BIT(31)
@@ -347,12 +349,23 @@ struct arm_smmu_domain {
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
-   return ARM_SMMU_TCR_EPD1 |
-  FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
-  FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
-  FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
-  FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
-  FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+   u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+   FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+   FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+   FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+   FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+
+   /*
+   * When TTBR1 is selected shift the TCR fields by 16 bits and disable
+   * translation in TTBR0
+   */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   tcr = (tcr << 16) & ~ARM_SMMU_TCR_A1;
+   tcr |= ARM_SMMU_TCR_EPD0;
+   } else
+   tcr |= ARM_SMMU_TCR_EPD1;
+
+   return tcr;
 }
 
 static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
-- 
2.17.1
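
As a rough illustration of the two changes in this patch, the sketch below
mirrors the TCR selection and the aperture calculation in standalone C. The
bit positions for EPD1, A1 and EPD0 are the ones defined in the diff above.
The function names, the 'use_ttbr1' flag and the standalone layout are
assumptions made for the example and are not driver code.

```c
#include <stdint.h>

#define BIT(n)		(1U << (n))
#define TCR_EPD1	BIT(23)	/* disable walks through TTBR1 */
#define TCR_A1		BIT(22)	/* ASID select */
#define TCR_EPD0	BIT(7)	/* disable walks through TTBR0 */

/*
 * 'fields' holds the already packed TTBR0 fields (TG0, SH0, ORGN0, IRGN0,
 * T0SZ). For TTBR1 they move up by 16 bits, A1 is cleared again because the
 * shift can land a bit on it, and TTBR0 walks are disabled.
 */
static uint32_t tcr_select(uint32_t fields, int use_ttbr1)
{
	if (use_ttbr1)
		return ((fields << 16) & ~TCR_A1) | TCR_EPD0;

	return fields | TCR_EPD1;
}

/* Aperture for an input address size of 'ias' bits (ias must be < 64). */
static void aperture(unsigned int ias, int use_ttbr1,
		     uint64_t *start, uint64_t *end)
{
	if (use_ttbr1) {
		/* Upper half of the address space: all high bits set. */
		*start = ~0ULL << ias;
		*end = ~0ULL;
	} else {
		*start = 0;
		*end = (1ULL << ias) - 1;
	}
}
```

For example, with ias = 48 the TTBR1 aperture runs from 0xFFFF000000000000 to
0xFFFFFFFFFFFFFFFF, while the TTBR0 aperture ends at 0x0000FFFFFFFFFFFF.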



[PATCH v9 7/7] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

2020-06-26 Thread Jordan Crouse
Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables.

Signed-off-by: Jordan Crouse 
---

 arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 8eb5a31346d2..8b15cd74e9ba 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -3556,7 +3556,7 @@
};
 
adreno_smmu: iommu@504 {
-   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,adreno-smmu", "qcom,smmu-v2";
reg = <0 0x504 0 0x1>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.17.1



[PATCH v9 1/7] iommu/arm-smmu: Pass io-pgtable config to implementation specific function

2020-06-26 Thread Jordan Crouse
Construct the io-pgtable config before calling the implementation specific
init_context function and pass it so the implementation specific function
can get a chance to change it before the io-pgtable is created.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c  | 11 ++-
 drivers/iommu/arm-smmu.h  |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..a20e426d81ac 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..8a3a6c8c887a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -797,11 +797,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
-   if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain);
-   if (ret)
-   goto out_unlock;
-   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -812,6 +807,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.iommu_dev  = smmu->dev,
};
 
+   if (smmu->impl && smmu->impl->init_context) {
+   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+   if (ret)
+   goto out_unlock;
+   }
+
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..38b041530a4f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -383,7 +383,8 @@ struct arm_smmu_impl {
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
-   int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   int (*init_context)(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *cfg);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
int (*def_domain_type)(struct device *dev);
-- 
2.17.1



Re: [PATCH v2] iommu/arm-smmu: Mark qcom_smmu_client_of_match as possibly unused

2020-06-26 Thread Jordan Crouse
On Mon, Jun 08, 2020 at 04:13:08PM +0100, Will Deacon wrote:
> On Thu, Jun 04, 2020 at 02:39:04PM -0600, Jordan Crouse wrote:
> > When CONFIG_OF=n of_match_device() gets pre-processed out of existence
> > leaving qcom_smmu_client_of_match unused. Mark it as possibly unused to
> > keep the compiler from warning in that case.
> > 
> > Fixes: 0e764a01015d ("iommu/arm-smmu: Allow client devices to select direct mapping")
> > Reported-by: kbuild test robot 
> > Acked-by: Will Deacon 
> > Signed-off-by: Jordan Crouse 
> > ---
> > 
> >  drivers/iommu/arm-smmu-qcom.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
> > index cf01d0215a39..be4318044f96 100644
> > --- a/drivers/iommu/arm-smmu-qcom.c
> > +++ b/drivers/iommu/arm-smmu-qcom.c
> > @@ -12,7 +12,7 @@ struct qcom_smmu {
> > struct arm_smmu_device smmu;
> >  };
> >  
> > -static const struct of_device_id qcom_smmu_client_of_match[] = {
> > +static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
> > { .compatible = "qcom,adreno" },
> > { .compatible = "qcom,mdp4" },
> > { .compatible = "qcom,mdss" },
> 
> Thanks. Joerg -- can you pick this one up, please? I don't have any other
> SMMU fixes pending at the moment.
> 
> Cheers,
> 
> Will

Quick ping to pick this up for 5.8 fixes.

Thanks,
Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 6/6] drm/msm/a6xx: Add support for per-instance pagetables

2020-06-12 Thread Jordan Crouse
On Thu, Jun 11, 2020 at 08:22:29PM -0700, Rob Clark wrote:
> On Thu, Jun 11, 2020 at 3:29 PM Jordan Crouse  wrote:
> >
> > Add support for using per-instance pagetables if all the dependencies are
> > available.
> >
> > Signed-off-by: Jordan Crouse 
> > ---
> >
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 69 ++-
> >  drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
> >  2 files changed, 69 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index a1589e040c57..5e82b85d4d55 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -79,6 +79,58 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
> > OUT_RING(ring, upper_32_bits(iova));
> >  }
> >
> > +static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
> > +   struct msm_file_private *ctx)
> > +{
> > +   phys_addr_t ttbr;
> > +   u32 asid;
> > +
> > +   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
> > +   return;
> > +
> > +   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> > +   OUT_RING(ring, 0);
> > +
> > +   /* Turn on APRIV mode to access critical regions */
> > +   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
> > +   OUT_RING(ring, 1);
> > +
> > +   /* Make sure the ME is synchronized before starting the update */
> > +   OUT_PKT7(ring, CP_WAIT_FOR_ME, 0);
> > +
> > +   /* Execute the table update */
> > +   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
> > +   OUT_RING(ring, lower_32_bits(ttbr));
> > +   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
> > +   /* CONTEXTIDR is currently unused */
> > +   OUT_RING(ring, 0);
> > +   /* CONTEXTBANK is currently unused */
> > +   OUT_RING(ring, 0);
> 
> I can add this to xml (on userspace side, we've been describing packet
> payload in xml and using the generated builders), and update generated
> headers, if you agree to not add more open-coded pkt7 building ;-)

But open coding opcodes is so much fun! :)  It's fine to put this in the XML. It
can only be executed from the ringbuffer FWIW.

> > +
> > +   /*
> > +* Write the new TTBR0 to the memstore. This is good for debugging.
> > +*/
> > +   OUT_PKT7(ring, CP_MEM_WRITE, 4);
> > +   OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
> > +   OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
> > +   OUT_RING(ring, lower_32_bits(ttbr));
> > +   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
> > +
> > +   /* Invalidate the draw state so we start off fresh */
> > +   OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
> > +   OUT_RING(ring, 0x4);
> > +   OUT_RING(ring, 1);
> > +   OUT_RING(ring, 0);
> 
> Ie, this would look like:
> 
> OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
> OUT_RING(ring, CP_SET_DRAW_STATE__0_COUNT(0) |
> CP_SET_DRAW_STATE__0_DISABLE_ALL_GROUPS |
> CP_SET_DRAW_STATE__0_GROUP_ID(0));
> OUT_RING(ring, CP_SET_DRAW_STATE__1_ADDR_LO(1));
> OUT_RING(ring, CP_SET_DRAW_STATE__2_ADDR_HI(0));
> 
> .. but written that way, I think you meant ADDR_LO to be zero?
> 
> (it is possible we need to regen headers for that to work, the kernel
> headers are somewhat out of date by now)

As we discussed on IRC this bit isn't needed because the CP_SMMU_TABLE_UPDATE
handles it for us.  I'll remove that.

> BR,
> -R

Jordan

> > +
> > +   /* Turn off APRIV */
> > +   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
> > +   OUT_RING(ring, 0);
> > +
> > +   /* Turn off protected mode */
> > +   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> > +   OUT_RING(ring, 1);
> > +}
> > +
> >  static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> > struct msm_file_private *ctx)
> >  {
> > @@ -89,6 +141,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> > struct msm_ringbuffer *ring = submit->ring;
> > unsigned int i;
> >
> > +   a6xx_set_pagetable(gpu, ring, ctx);
> > +
> > get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
> > rbmemptr_stats(ring, index, cpcycles_start));
> >
>

[PATCH] iommu/arm-smmu: Add a init_context_bank implementation hook

2020-06-11 Thread Jordan Crouse
Add a new implementation hook to allow the implementation specific code
to tweak the context bank configuration just before it gets written.
The first user will be the Adreno GPU implementation to turn on
SCTLR.HUPCF to ensure that a page fault doesn't terminate pending
transactions. Doing so could hang the GPU if one of the terminated
transactions is a CP read.

This depends on the arm-smmu adreno SMMU implementation [1].

[1] https://patchwork.kernel.org/patch/11600943/

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-qcom.c | 13 +
 drivers/iommu/arm-smmu.c  | 28 +---
 drivers/iommu/arm-smmu.h  | 11 +++
 3 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 6d0ab4865fc7..e5c6345da6fc 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -17,6 +17,18 @@ static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain *smmu_domain)
return of_device_is_compatible(smmu_domain->dev.of_node, "qcom,adreno");
 }
 
+static void qcom_adreno_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
+   struct arm_smmu_cb *cb)
+{
+   /*
+* On the GPU device we want to process subsequent transactions after a
+* fault to keep the GPU from hanging
+*/
+
+   if (qcom_adreno_smmu_is_gpu_device(smmu_domain))
+   cb->sctlr |= ARM_SMMU_SCTLR_HUPCF;
+}
+
 static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
struct io_pgtable_cfg *pgtbl_cfg)
 {
@@ -92,6 +104,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
.init_context = qcom_adreno_smmu_init_context,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
+   .init_context_bank = qcom_adreno_smmu_init_context_bank,
 };
 
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a06cbcaec247..f0f201ece3a0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -86,13 +86,6 @@ struct arm_smmu_smr {
boolvalid;
 };
 
-struct arm_smmu_cb {
-   u64 ttbr[2];
-   u32 tcr[2];
-   u32 mair[2];
-   struct arm_smmu_cfg *cfg;
-};
-
 struct arm_smmu_master_cfg {
struct arm_smmu_device  *smmu;
s16 smendx[];
@@ -579,6 +572,18 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
}
}
+
+   cb->sctlr = ARM_SMMU_SCTLR_CFIE | ARM_SMMU_SCTLR_CFRE | ARM_SMMU_SCTLR_AFE |
+   ARM_SMMU_SCTLR_TRE | ARM_SMMU_SCTLR_M;
+
+   if (stage1)
+   cb->sctlr |= ARM_SMMU_SCTLR_S1_ASIDPNE;
+   if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
+   cb->sctlr |= ARM_SMMU_SCTLR_E;
+
+   /* Give the implementation a chance to adjust the configuration */
+   if (smmu_domain->smmu->impl && smmu_domain->smmu->impl->init_context_bank)
+   smmu_domain->smmu->impl->init_context_bank(smmu_domain, cb);
 }
 
 static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
@@ -657,14 +662,7 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
}
 
/* SCTLR */
-   reg = ARM_SMMU_SCTLR_CFIE | ARM_SMMU_SCTLR_CFRE | ARM_SMMU_SCTLR_AFE |
- ARM_SMMU_SCTLR_TRE | ARM_SMMU_SCTLR_M;
-   if (stage1)
-   reg |= ARM_SMMU_SCTLR_S1_ASIDPNE;
-   if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
-   reg |= ARM_SMMU_SCTLR_E;
-
-   arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
+   arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, cb->sctlr);
 }
 
 /*
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 79d441024043..9b539820997b 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -142,6 +142,7 @@ enum arm_smmu_cbar_type {
 
 #define ARM_SMMU_CB_SCTLR  0x0
 #define ARM_SMMU_SCTLR_S1_ASIDPNE  BIT(12)
+#define ARM_SMMU_SCTLR_HUPCF   BIT(8)
 #define ARM_SMMU_SCTLR_CFCFG   BIT(7)
 #define ARM_SMMU_SCTLR_CFIEBIT(6)
 #define ARM_SMMU_SCTLR_CFREBIT(5)
@@ -349,6 +350,14 @@ struct arm_smmu_domain {
boolaux;
 };
 
+struct arm_smmu_cb {
+   u64 ttbr[2];
+   u32 tcr[2];
+   u32 mair[2];
+   u32 sctlr;
+   struct arm_smmu_cfg *cfg;
+};
+
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
@@ -403,6 
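
The idea of the hook can be sketched outside the kernel as follows: the common
code computes a default SCTLR value and caches it, and an optional
implementation callback may then adjust the cached value before it is written
to the hardware. The bit positions for S1_ASIDPNE, HUPCF, CFIE and CFRE are the
ones shown in the diff above. The struct layout, the 'is_gpu' flag and the
function names are simplifications assumed for this example only.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT(n)			(1U << (n))
#define SCTLR_S1_ASIDPNE	BIT(12)
#define SCTLR_HUPCF		BIT(8)
#define SCTLR_CFIE		BIT(6)
#define SCTLR_CFRE		BIT(5)

/* Simplified stand-in for the cached context bank state. */
struct ctx_bank {
	uint32_t sctlr;
};

/* Simplified stand-in for the implementation hooks. */
struct impl_ops {
	void (*init_context_bank)(struct ctx_bank *cb, int is_gpu);
};

/* Adreno-style hook: keep processing transactions after a fault. */
static void adreno_init_context_bank(struct ctx_bank *cb, int is_gpu)
{
	if (is_gpu)
		cb->sctlr |= SCTLR_HUPCF;
}

/* Common code: build the default value, then let the hook adjust it. */
static void init_context_bank(struct ctx_bank *cb, const struct impl_ops *impl,
			      int stage1, int is_gpu)
{
	cb->sctlr = SCTLR_CFIE | SCTLR_CFRE;
	if (stage1)
		cb->sctlr |= SCTLR_S1_ASIDPNE;

	/* The hook is optional, so both pointers are checked. */
	if (impl && impl->init_context_bank)
		impl->init_context_bank(cb, is_gpu);
}
```

Keeping the value in the cached structure means the later register write only
has to copy cb->sctlr, which is what the change to
arm_smmu_write_context_bank() above does.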

[PATCH 3/6] iommu/arm-smmu: Add a domain attribute to pass the pagetable config

2020-06-11 Thread Jordan Crouse
The Adreno GPU has the capacity to manage its own pagetables and switch
them dynamically from the hardware. Add a domain attribute for arm-smmu-v2
to get the default pagetable configuration so that the GPU driver can match
the format for its own pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 12 
 include/linux/iommu.h|  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 46a96c578592..a06cbcaec247 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1710,6 +1710,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_PGTABLE_CFG: {
+   struct io_pgtable *pgtable;
+   struct io_pgtable_cfg *dest = data;
+
+   if (!smmu_domain->pgtbl_ops)
+   return -ENODEV;
+
+   pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+
+   memcpy(dest, &pgtable->cfg, sizeof(*dest));
+   return 0;
+   }
default:
return -ENODEV;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..2388117641f1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_PGTABLE_CFG,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.17.1



[PATCH 6/6] drm/msm/a6xx: Add support for per-instance pagetables

2020-06-11 Thread Jordan Crouse
Add support for using per-instance pagetables if all the dependencies are
available.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 69 ++-
 drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index a1589e040c57..5e82b85d4d55 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -79,6 +79,58 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
OUT_RING(ring, upper_32_bits(iova));
 }
 
+static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
+   struct msm_file_private *ctx)
+{
+   phys_addr_t ttbr;
+   u32 asid;
+
+   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
+   return;
+
+   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn on APRIV mode to access critical regions */
+   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
+   OUT_RING(ring, 1);
+
+   /* Make sure the ME is synchronized before starting the update */
+   OUT_PKT7(ring, CP_WAIT_FOR_ME, 0);
+
+   /* Execute the table update */
+   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
+   /* CONTEXTIDR is currently unused */
+   OUT_RING(ring, 0);
+   /* CONTEXTBANK is currently unused */
+   OUT_RING(ring, 0);
+
+   /*
+* Write the new TTBR0 to the memstore. This is good for debugging.
+*/
+   OUT_PKT7(ring, CP_MEM_WRITE, 4);
+   OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
+   OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, (((u64) asid) << 48) | upper_32_bits(ttbr));
+
+   /* Invalidate the draw state so we start off fresh */
+   OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
+   OUT_RING(ring, 0x4);
+   OUT_RING(ring, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn off APRIV */
+   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn off protected mode */
+   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+   OUT_RING(ring, 1);
+}
+
 static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
struct msm_file_private *ctx)
 {
@@ -89,6 +141,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
struct msm_ringbuffer *ring = submit->ring;
unsigned int i;
 
+   a6xx_set_pagetable(gpu, ring, ctx);
+
get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
rbmemptr_stats(ring, index, cpcycles_start));
 
@@ -872,6 +926,18 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+struct msm_gem_address_space *a6xx_address_space_instance(struct msm_gpu *gpu)
+{
+   struct msm_mmu *mmu;
+
+   mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
+   if (IS_ERR(mmu))
+   return msm_gem_address_space_get(gpu->aspace);
+
+   return msm_gem_address_space_create(mmu,
+   "gpu", 0x1ULL, 0x1ULL);
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -893,8 +959,9 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
-   .create_address_space = adreno_iommu_create_address_space,
 #endif
+   .create_address_space = adreno_iommu_create_address_space,
+   .address_space_instance = a6xx_address_space_instance,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 7764373d0ed2..0987d6bf848c 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -31,6 +31,7 @@ struct msm_rbmemptrs {
volatile uint32_t fence;
 
volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
+   volatile u64 ttbr0;
 };
 
 struct msm_ringbuffer {
-- 
2.17.1
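
For reference, a small sketch of how a 64-bit TTBR0 value carrying an ASID can
be split into the two 32-bit dwords that a ringbuffer packet takes. It assumes
that the ASID lives in bits 63:48 of the register value (the position of the
ARM_SMMU_TTBRn_ASID field used by arm-smmu) and that each ring entry is 32 bits
wide. Under those assumptions the ASID ends up at bit 16 of the upper dword.
The helper names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Combine a pagetable base and an ASID into one 64-bit TTBR0 value. */
static uint64_t ttbr0_with_asid(uint64_t ttbr, uint16_t asid)
{
	/* Assumption: the base address occupies bits 47:0. */
	return ((uint64_t)asid << 48) | (ttbr & ((1ULL << 48) - 1));
}

/* Lower 32 bits, as written to the first dword. */
static uint32_t lo32(uint64_t v)
{
	return (uint32_t)v;
}

/* Upper 32 bits, as written to the second dword (ASID at bit 16). */
static uint32_t hi32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}
```

Building the full 64-bit value first and splitting it afterwards avoids
shifting the ASID by 48 inside an expression that is later narrowed to 32 bits.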



[PATCH 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2

2020-06-11 Thread Jordan Crouse
Support auxiliary domains for arm-smmu-v2 to initialize and support
multiple pagetables for a single SMMU context bank. Since the smmu-v2
hardware doesn't have any built in support for switching the pagetable
base it is left as an exercise to the caller to actually use the pagetable.

Aux domains are supported if split pagetable (TTBR1) support has been
enabled on the master domain.  Each auxiliary domain will reuse the
configuration of the master domain. By default a domain with TTBR1
support will have the TTBR0 region disabled, so the first attached aux
domain will enable the TTBR0 region in the hardware and conversely the
last domain to be detached will disable TTBR0 translations.  All subsequent
auxiliary domains create a pagetable but do not touch the hardware.

The leaf driver will be able to query the physical address of the
pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
address with whatever means it has to switch the pagetable base.

Following is a pseudo code example of how a domain can be created:

 /* Check to see if aux domains are supported */
 if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
         domain = iommu_domain_alloc(...);

         if (iommu_aux_attach_device(domain, dev))
                 return FAIL;

         /* Save the base address of the pagetable for use by the driver */
         iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
 }

Then 'domain' can be used like any other iommu domain to map and
unmap iova addresses in the pagetable.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 216 ---
 drivers/iommu/arm-smmu.h |   1 +
 2 files changed, 201 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 743d75b9ff3f..46a96c578592 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -667,6 +667,84 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
+/*
+ * Update the context bank to enable TTBR0. Assumes AARCH64 S1
+ * configuration.
+ */
+static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   u32 tcr = cb->tcr[0];
+
+   /* Add the TCR configuration from the new pagetable config */
+   tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+
+   /* Make sure that both TTBR0 and TTBR1 are enabled */
+   tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+   /* Update the TCR register */
+   cb->tcr[0] = tcr;
+
+   /* Program the new TTBR0 */
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+}
+
+/*
+ * This function assumes that the current model only allows aux domains for
+ * AARCH64 S1 configurations
+ */
+static int arm_smmu_aux_init_domain_context(struct iommu_domain *domain,
+   struct arm_smmu_device *smmu, struct arm_smmu_cfg *master)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *pgtbl_ops;
+   struct io_pgtable_cfg pgtbl_cfg;
+
+   mutex_lock(&smmu_domain->init_mutex);
+
+   /* Copy the configuration from the master */
+   memcpy(&smmu_domain->cfg, master, sizeof(smmu_domain->cfg));
+
+   smmu_domain->flush_ops = &arm_smmu_s1_tlb_ops;
+   smmu_domain->smmu = smmu;
+
+   pgtbl_cfg = (struct io_pgtable_cfg) {
+   .pgsize_bitmap = smmu->pgsize_bitmap,
+   .ias = smmu->va_size,
+   .oas = smmu->ipa_size,
+   .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
+   .tlb = smmu_domain->flush_ops,
+   .iommu_dev = smmu->dev,
+   .quirks = 0,
+   };
+
+   if (smmu_domain->non_strict)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
+   pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &pgtbl_cfg,
+   smmu_domain);
+   if (!pgtbl_ops) {
+   mutex_unlock(&smmu_domain->init_mutex);
+   return -ENOMEM;
+   }
+
+   domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
+
+   domain->geometry.aperture_end = (1UL << smmu->va_size) - 1;
+   domain->geometry.force_aperture = true;
+
+   /* enable TTBR0 when the first aux domain is attached */
+   if (atomic_inc_return(&smmu->cbs[master->cbndx].aux) == 1) {
+   arm_smmu_context_set_ttbr0(&smmu->cbs[master->cbndx],
+   &pgtbl_cfg);
+   arm_smmu_write_context_bank(smmu, master->cbndx);
+   }
+
+   smmu_domain->pgtbl_ops = pgtbl_ops;
+   return 0;
+}
+
 static int arm_smmu_init_domain_context(struct iommu_domain *domain,
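
The first-attach / last-detach behaviour described in the commit message can be
sketched with a plain counter. This is a simplified userspace model using C11
atomics: 'ttbr0_enabled' stands in for the hardware state, the names are
hypothetical, and a real driver would also need to serialize the hardware
update itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for a context bank shared by aux domains. */
struct ctx_bank {
	atomic_int aux;		/* number of attached aux domains */
	bool ttbr0_enabled;	/* models whether TTBR0 walks are enabled */
};

static void aux_attach(struct ctx_bank *cb)
{
	/* atomic_fetch_add() returns the old count: 0 means first attach. */
	if (atomic_fetch_add(&cb->aux, 1) == 0)
		cb->ttbr0_enabled = true;
}

static void aux_detach(struct ctx_bank *cb)
{
	/* An old count of 1 means the last aux domain is going away. */
	if (atomic_fetch_sub(&cb->aux, 1) == 1)
		cb->ttbr0_enabled = false;
}
```

Only the transitions 0 -> 1 and 1 -> 0 change the modelled hardware state, so
any aux domains attached in between create their own pagetable without touching
the context bank.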

[PATCH 4/6] drm/msm: Add support to create a local pagetable

2020-06-11 Thread Jordan Crouse
Add support to create an io-pgtable for use by targets that support
per-instance pagetables.  In order to support per-instance pagetables the
GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
split pagetables and auxiliary domains need to be supported and enabled.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c  | 180 ++-
 drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
 3 files changed, 195 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c
index 310a31b05faa..aab121f4beb7 100644
--- a/drivers/gpu/drm/msm/msm_gpummu.c
+++ b/drivers/gpu/drm/msm/msm_gpummu.c
@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu)
}
 
gpummu->gpu = gpu;
-   msm_mmu_init(&gpummu->base, dev, &funcs);
+   msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU);
 
return &gpummu->base;
 }
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index bbe129867590..c7efe43388e3 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -4,15 +4,192 @@
  * Author: Rob Clark 
  */
 
+#include 
 #include "msm_drv.h"
 #include "msm_mmu.h"
 
 struct msm_iommu {
struct msm_mmu base;
struct iommu_domain *domain;
+   struct iommu_domain *aux_domain;
 };
+
 #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
 
+struct msm_iommu_pagetable {
+   struct msm_mmu base;
+   struct msm_mmu *parent;
+   struct io_pgtable_ops *pgtbl_ops;
+   phys_addr_t ttbr;
+   u32 asid;
+};
+
+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
+{
+   return container_of(mmu, struct msm_iommu_pagetable, base);
+}
+
+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
+   size_t size)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   size_t unmapped = 0, remaining = size;
+
+   /* Unmap the block one page at a time */
+   while (remaining) {
+   unmapped += ops->unmap(ops, iova, 4096, NULL);
+   iova += 4096;
+   remaining -= 4096;
+   }
+
+   iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
+
+   return (unmapped == size) ? 0 : -EINVAL;
+}
+
+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
+   struct sg_table *sgt, size_t len, int prot)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   struct scatterlist *sg;
+   size_t mapped = 0;
+   u64 addr = iova;
+   unsigned int i;
+
+   for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+   size_t size = sg->length;
+   phys_addr_t phys = sg_phys(sg);
+
+   /* Map the block one page at a time */
+   while (size) {
+   if (ops->map(ops, addr, phys, 4096, prot)) {
+   msm_iommu_pagetable_unmap(mmu, iova, mapped);
+   return -EINVAL;
+   }
+
+   phys += 4096;
+   addr += 4096;
+   size -= 4096;
+   mapped += 4096;
+   }
+   }
+
+   return 0;
+}
+
+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+
+   free_io_pgtable_ops(pagetable->pgtbl_ops);
+   kfree(pagetable);
+}
+
+/*
+ * Given a parent device, create and return an aux domain. This will enable the
+ * TTBR0 region
+ */
+static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu *parent)
+{
+   struct msm_iommu *iommu = to_msm_iommu(parent);
+   struct iommu_domain *domain;
+   int ret;
+
+   if (iommu->aux_domain)
+   return iommu->aux_domain;
+
+   if (!iommu_dev_has_feature(parent->dev, IOMMU_DEV_FEAT_AUX))
+   return ERR_PTR(-ENODEV);
+
+   domain = iommu_domain_alloc(&platform_bus_type);
+   if (!domain)
+   return ERR_PTR(-ENODEV);
+
+   ret = iommu_aux_attach_device(domain, parent->dev);
+   if (ret) {
+   iommu_domain_free(domain);
+   return ERR_PTR(ret);
+   }
+
+   iommu->aux_domain = domain;
+   return domain;
+}
+
+int msm_iommu_pagetable_params(struct msm_mmu *mmu,
+   phys_addr_t *ttbr, int *asid)
+{
+   struct msm_iommu_pagetable *pagetable;
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (ttbr)
+   *ttbr = pagetable->ttbr;
+
+   if (asid)
+

[PATCH 5/6] drm/msm: Add support for address space instances

2020-06-11 Thread Jordan Crouse
Add support for allocating an address space instance. Targets that support
per-instance pagetables should implement their own function to allocate a
new instance. The default will return the existing generic address space.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_drv.c | 15 +--
 drivers/gpu/drm/msm/msm_drv.h |  4 
 drivers/gpu/drm/msm/msm_gem_vma.c |  9 +
 drivers/gpu/drm/msm/msm_gpu.c | 17 +
 drivers/gpu/drm/msm/msm_gpu.h |  5 +
 5 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index f6ce40bf3699..0c219b954943 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -599,7 +599,7 @@ static int context_init(struct drm_device *dev, struct drm_file *file)
 
msm_submitqueue_init(dev, ctx);
 
-   ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
+   ctx->aspace = msm_gpu_address_space_instance(priv->gpu);
file->driver_priv = ctx;
 
return 0;
@@ -618,6 +618,8 @@ static int msm_open(struct drm_device *dev, struct drm_file *file)
 static void context_close(struct msm_file_private *ctx)
 {
msm_submitqueue_close(ctx);
+
+   msm_gem_address_space_put(ctx->aspace);
kfree(ctx);
 }
 
@@ -782,18 +784,19 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
 }
 
 static int msm_ioctl_gem_info_iova(struct drm_device *dev,
-   struct drm_gem_object *obj, uint64_t *iova)
+   struct drm_file *file, struct drm_gem_object *obj,
+   uint64_t *iova)
 {
-   struct msm_drm_private *priv = dev->dev_private;
+   struct msm_file_private *ctx = file->driver_priv;
 
-   if (!priv->gpu)
+   if (!ctx->aspace)
return -EINVAL;
 
/*
 * Don't pin the memory here - just get an address so that userspace can
 * be productive
 */
-   return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
+   return msm_gem_get_iova(obj, ctx->aspace, iova);
 }
 
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
@@ -832,7 +835,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
args->value = msm_gem_mmap_offset(obj);
break;
case MSM_INFO_GET_IOVA:
-   ret = msm_ioctl_gem_info_iova(dev, obj, &args->value);
+   ret = msm_ioctl_gem_info_iova(dev, file, obj, &args->value);
break;
case MSM_INFO_SET_NAME:
/* length check should leave room for terminating null: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index e2d6a6056418..983a8b7e5a74 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -249,6 +249,10 @@ int msm_gem_map_vma(struct msm_gem_address_space *aspace,
 void msm_gem_close_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma);
 
+
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace);
+
 void msm_gem_address_space_put(struct msm_gem_address_space *aspace);
 
 struct msm_gem_address_space *
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 5f6a11211b64..29cc1305cf37 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -27,6 +27,15 @@ void msm_gem_address_space_put(struct msm_gem_address_space *aspace)
kref_put(&aspace->kref, msm_gem_address_space_destroy);
 }
 
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace)
+{
+   if (!IS_ERR_OR_NULL(aspace))
+   kref_get(&aspace->kref);
+
+   return aspace;
+}
+
 /* Actually unmap memory for the vma */
 void msm_gem_purge_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma)
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index a22d30622306..b4f31460 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -821,6 +821,23 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
return 0;
 }
 
+/* Return a new address space instance */
+struct msm_gem_address_space *
+msm_gpu_address_space_instance(struct msm_gpu *gpu)
+{
+   if (!gpu)
+   return NULL;
+
+   /*
+* If the GPU doesn't support instanced address spaces return the
+* default address space
+*/
+   if (!gpu->funcs->address_space_instance)
+   return msm_gem_address_space_get(gpu->aspace);
+
+   return gpu->funcs->address_space_instance(gpu);
+}
+
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
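
The fallback described in this patch (use a target-specific hook when one exists, otherwise share the default address space and take a reference on it) can be sketched outside the kernel as follows. This is only an illustration: struct aspace, struct gpu and the function names are hypothetical stand-ins, and a plain integer replaces the kref used by the real code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; names are illustrative only. */
struct aspace {
    int refcount;   /* plain counter standing in for a kref */
};

struct gpu {
    struct aspace *default_aspace;
    /* Optional hook: NULL when the target has no per-instance pagetables. */
    struct aspace *(*new_instance)(struct gpu *gpu);
};

/* Take a reference; a NULL address space is passed through untouched. */
static struct aspace *aspace_get(struct aspace *as)
{
    if (as)
        as->refcount++;
    return as;
}

/* Drop a reference; returns 1 when the last reference went away. */
static int aspace_put(struct aspace *as)
{
    if (!as)
        return 0;
    return --as->refcount == 0;
}

/* Return a private instance when supported, else share the default one. */
static struct aspace *aspace_instance(struct gpu *gpu)
{
    if (!gpu)
        return NULL;
    if (!gpu->new_instance)
        return aspace_get(gpu->default_aspace);
    return gpu->new_instance(gpu);
}
```

Because the shared path takes a reference, the caller can drop its reference on close in the same way whether it received a private instance or the default address space.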

[PATCH 0/6] iommu-arm-smmu: Add auxiliary domains and per-instance pagetables

2020-06-11 Thread Jordan Crouse
This is a new refresh of support for auxiliary domains for arm-smmu-v2
and per-instance pagetables for drm/msm. The big change here from past
efforts is that outside of creating a single aux-domain to enable TTBR0
all of the per-instance pagetables are created and managed exclusively
in drm/msm without involving the arm-smmu driver. This fits in with the
suggested model of letting the GPU hardware do what it needs and leaving the
arm-smmu driver blissfully unaware.

Almost. In order to set up the io-pgtable properly in drm/msm we need to
query the pagetable configuration from the current active domain and we need to
rely on the iommu API to flush TLBs after an unmap. In the future we can optimize
this in the drm/msm driver to track the state of the TLBs but for now the big
hammer lets us get off the ground.
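
As a rough illustration of the "big hammer" approach, the sketch below unmaps a range one page at a time and then invalidates everything once. It is not the drm/msm code: the callback types, unmap_range() and the counting backend are hypothetical, and the 4096-byte page size is an assumption. The requested size is kept in its own variable so the final check compares against the original length.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u   /* assumed page size for the sketch */

/* Hypothetical page-table backend: unmap one page, return bytes unmapped. */
typedef size_t (*unmap_page_fn)(void *cookie, uint64_t iova);
/* Hypothetical full invalidation of every TLB entry for the domain. */
typedef void (*flush_all_fn)(void *cookie);

static int unmap_range(void *cookie, uint64_t iova, size_t size,
                       unmap_page_fn unmap_page, flush_all_fn flush_all)
{
    size_t remaining = size;   /* 'size' stays intact for the final check */
    size_t unmapped = 0;

    while (remaining >= PAGE_SZ) {
        unmapped += unmap_page(cookie, iova);
        iova += PAGE_SZ;
        remaining -= PAGE_SZ;
    }

    /* One global flush instead of tracking per-page TLB state. */
    flush_all(cookie);

    return unmapped == size ? 0 : -1;
}

/* Trivial backend used for illustration: counts calls in the cookie. */
struct counters {
    unsigned int pages;
    unsigned int flushes;
};

static size_t count_unmap(void *cookie, uint64_t iova)
{
    (void)iova;
    ((struct counters *)cookie)->pages++;
    return PAGE_SZ;
}

static void count_flush(void *cookie)
{
    ((struct counters *)cookie)->flushes++;
}
```

A length that is not a multiple of the page size leaves a remainder that is never unmapped, so the function reports failure in that case.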

This series is built on the split pagetable support [1].

[1] https://patchwork.kernel.org/patch/11600949/

Jordan Crouse (6):
  iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
  iommu/io-pgtable: Allow a pgtable implementation to skip TLB
operations
  iommu/arm-smmu: Add a domain attribute to pass the pagetable config
  drm/msm: Add support to create a local pagetable
  drm/msm: Add support for address space instances
  drm/msm/a6xx: Add support for per-instance pagetables

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  69 +++-
 drivers/gpu/drm/msm/msm_drv.c |  15 +-
 drivers/gpu/drm/msm/msm_drv.h |   4 +
 drivers/gpu/drm/msm/msm_gem_vma.c |   9 +
 drivers/gpu/drm/msm/msm_gpu.c |  17 ++
 drivers/gpu/drm/msm/msm_gpu.h |   5 +
 drivers/gpu/drm/msm/msm_gpummu.c  |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c   | 180 +++-
 drivers/gpu/drm/msm/msm_mmu.h |  16 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h  |   1 +
 drivers/iommu/arm-smmu.c  | 228 --
 drivers/iommu/arm-smmu.h  |   1 +
 include/linux/io-pgtable.h|  11 +-
 include/linux/iommu.h |   1 +
 14 files changed, 529 insertions(+), 30 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations

2020-06-11 Thread Jordan Crouse
Allow an io-pgtable implementation to skip TLB operations by checking for
NULL pointers in the helper functions. It is up to the owner of the
io-pgtable instance to make sure that they independently handle the TLB
correctly.

Signed-off-by: Jordan Crouse 
---

 include/linux/io-pgtable.h | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 53d53c6c2be9..bbed1d3925ba 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -210,21 +210,24 @@ struct io_pgtable {
 
 static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
 {
-   iop->cfg.tlb->tlb_flush_all(iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_all(iop->cookie);
 }
 
 static inline void
 io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
 {
-   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
 }
 
 static inline void
 io_pgtable_tlb_flush_leaf(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
 {
-   iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
 }
 
 static inline void
@@ -232,7 +235,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
struct iommu_iotlb_gather * gather, unsigned long iova,
size_t granule)
 {
-   if (iop->cfg.tlb->tlb_add_page)
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
 }
 
-- 
2.17.1
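
The guard pattern used by this patch can be shown in isolation. The sketch below is a userspace approximation, not the kernel header: struct flush_ops, struct pgtable and the helper names are hypothetical. A NULL ops table means the owner of the page table takes care of TLB maintenance itself.

```c
#include <stddef.h>

/* Hypothetical reduced version of a TLB ops table. */
struct flush_ops {
    void (*tlb_flush_all)(void *cookie);
    void (*tlb_add_page)(void *cookie, unsigned long iova);
};

struct pgtable {
    const struct flush_ops *tlb;   /* NULL: owner handles TLB maintenance */
    void *cookie;
};

/* Skip the flush entirely when no ops table was supplied. */
static void pgtable_flush_all(struct pgtable *pt)
{
    if (pt->tlb)
        pt->tlb->tlb_flush_all(pt->cookie);
}

/* Both the table and the individual (optional) callback may be absent. */
static void pgtable_add_page(struct pgtable *pt, unsigned long iova)
{
    if (pt->tlb && pt->tlb->tlb_add_page)
        pt->tlb->tlb_add_page(pt->cookie, iova);
}

/* Illustrative callback: counts how many flushes were requested. */
static void counting_flush(void *cookie)
{
    ++*(int *)cookie;
}
```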



[PATCH v8 3/7] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU

2020-06-11 Thread Jordan Crouse
Every Qcom Adreno GPU has an embedded SMMU for its own use. These
devices depend on unique features such as split pagetables,
different stall/halt requirements and other settings. Identify them
with a compatible string so that they can be identified in the
arm-smmu implementation specific code.

Signed-off-by: Jordan Crouse 
---

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..e52a1b146c97 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,10 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
+items:
+  - const: qcom,adreno-smmu
+  - const: qcom,smmu-v2
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.17.1



[PATCH v8 5/7] iommu/arm-smmu: Add implementation for the adreno GPU SMMU

2020-06-11 Thread Jordan Crouse
Add a special implementation for the SMMU attached to most Adreno GPU
targets, triggered by the qcom,adreno-smmu compatible string. When
selected, the driver will attempt to enable split pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 +++
 drivers/iommu/arm-smmu-qcom.c | 45 +--
 drivers/iommu/arm-smmu.h  |  1 +
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index a20e426d81ac..309675cf6699 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -176,5 +176,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
+   return qcom_adreno_smmu_impl_init(smmu);
+
return smmu;
 }
diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index cf01d0215a39..6d0ab4865fc7 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -12,6 +12,29 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
+static bool qcom_adreno_smmu_is_gpu_device(struct arm_smmu_domain *smmu_domain)
+{
+   return of_device_is_compatible(smmu_domain->dev.of_node, "qcom,adreno");
+}
+
+static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   /* TTBR1 is only for the GPU stream ID and not the GMU */
+   if (!qcom_adreno_smmu_is_gpu_device(smmu_domain))
+   return 0;
+   /*
+* All targets that use the qcom,adreno-smmu compatible string *should*
+* be AARCH64 stage 1 but double check because the arm-smmu code assumes
+* that is the case when the TTBR1 quirk is enabled
+*/
+   if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
+   (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
+   pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+   return 0;
+}
+
 static const struct of_device_id qcom_smmu_client_of_match[] = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -65,7 +88,15 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.reset = qcom_smmu500_reset,
 };
 
-struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
+   .init_context = qcom_adreno_smmu_init_context,
+   .def_domain_type = qcom_smmu_def_domain_type,
+   .reset = qcom_smmu500_reset,
+};
+
+
+static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
+   const struct arm_smmu_impl *impl)
 {
struct qcom_smmu *qsmmu;
 
@@ -75,8 +106,18 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 
qsmmu->smmu = *smmu;
 
-   qsmmu->smmu.impl = &qcom_smmu_impl;
+   qsmmu->smmu.impl = impl;
devm_kfree(smmu->dev, smmu);
 
return &qsmmu->smmu;
 }
+
+struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+   return qcom_smmu_create(smmu, &qcom_smmu_impl);
+}
+
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+   return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d33cfe26b2f5..c417814f1d98 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -466,6 +466,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
 
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
 
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
-- 
2.17.1
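
Selecting an implementation from the compatible string can also be expressed as a lookup table. The sketch below is a userspace illustration of that idea, not what the patch does (the patch chains of_device_is_compatible() checks). struct smmu_impl, impl_table and pick_impl() are hypothetical names; the compatible strings are the ones that appear in this series.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical implementation descriptor; the real one carries callbacks. */
struct smmu_impl {
    const char *name;
};

static const struct smmu_impl generic_impl = { "generic" };
static const struct smmu_impl qcom_impl = { "qcom" };
static const struct smmu_impl adreno_impl = { "qcom-adreno" };

struct impl_match {
    const char *compatible;
    const struct smmu_impl *impl;
};

static const struct impl_match impl_table[] = {
    { "qcom,sdm845-smmu-500", &qcom_impl },
    { "qcom,sc7180-smmu-500", &qcom_impl },
    { "qcom,adreno-smmu",     &adreno_impl },
};

/* Return the first table entry matching any of the node's compatibles. */
static const struct smmu_impl *pick_impl(const char *const *compat, size_t n)
{
    size_t entries = sizeof(impl_table) / sizeof(impl_table[0]);

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < entries; j++)
            if (strcmp(compat[i], impl_table[j].compatible) == 0)
                return impl_table[j].impl;

    return &generic_impl;   /* no match: keep the generic behaviour */
}
```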



[PATCH v8 2/7] iommu/arm-smmu: Add support for split pagetables

2020-06-11 Thread Jordan Crouse
Enable TTBR1 for a context bank if IO_PGTABLE_QUIRK_ARM_TTBR1 is selected
by the io-pgtable configuration.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 21 -
 drivers/iommu/arm-smmu.h | 25 +++--
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 8a3a6c8c887a..048de2681670 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -555,11 +555,15 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+   cfg->asid);
cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-cfg->asid);
+   cfg->asid);
+
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   cb->ttbr[1] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   else
+   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -824,7 +828,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   }
+
domain->geometry.force_aperture = true;
 
/* Initialise the context bank with our page table cfg */
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 38b041530a4f..5f2de20e883b 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -168,10 +168,12 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_TCR0x30
 #define ARM_SMMU_TCR_EAE   BIT(31)
 #define ARM_SMMU_TCR_EPD1  BIT(23)
+#define ARM_SMMU_TCR_A1BIT(22)
 #define ARM_SMMU_TCR_TG0   GENMASK(15, 14)
 #define ARM_SMMU_TCR_SH0   GENMASK(13, 12)
 #define ARM_SMMU_TCR_ORGN0 GENMASK(11, 10)
 #define ARM_SMMU_TCR_IRGN0 GENMASK(9, 8)
+#define ARM_SMMU_TCR_EPD0  BIT(7)
 #define ARM_SMMU_TCR_T0SZ  GENMASK(5, 0)
 
 #define ARM_SMMU_VTCR_RES1 BIT(31)
@@ -347,12 +349,23 @@ struct arm_smmu_domain {
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
-   return ARM_SMMU_TCR_EPD1 |
-  FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
-  FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
-  FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
-  FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
-  FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+   u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+   FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+   FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+   FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+   FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+
+   /*
+   * When TTBR1 is selected shift the TCR fields by 16 bits and disable
+   * translation in TTBR0
+   */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   tcr = (tcr << 16) & ~ARM_SMMU_TCR_A1;
+   tcr |= ARM_SMMU_TCR_EPD0;
+   } else
+   tcr |= ARM_SMMU_TCR_EPD1;
+
+   return tcr;
 }
 
 static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
-- 
2.17.1
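
To make the TCR handling in this patch concrete, here is a standalone sketch of the same computation with plain shifts instead of FIELD_PREP. The bit positions mirror the ARM_SMMU_TCR_* definitions in the diff; struct tcr_fields and lpae_tcr() are hypothetical names introduced for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCR_EPD1 (UINT32_C(1) << 23)   /* disable walks through TTBR1 */
#define TCR_A1   (UINT32_C(1) << 22)   /* ASID select */
#define TCR_EPD0 (UINT32_C(1) << 7)    /* disable walks through TTBR0 */

/* Hypothetical container for the per-table fields of the config. */
struct tcr_fields {
    uint32_t tg, sh, orgn, irgn, tsz;
};

static uint32_t lpae_tcr(const struct tcr_fields *f, bool use_ttbr1)
{
    /* Lay the fields out in their TTBR0 positions first. */
    uint32_t tcr = ((f->tg & 0x3) << 14) | ((f->sh & 0x3) << 12) |
                   ((f->orgn & 0x3) << 10) | ((f->irgn & 0x3) << 8) |
                   (f->tsz & 0x3f);

    if (use_ttbr1) {
        /* The TTBR1 fields sit 16 bits above their TTBR0 counterparts. */
        tcr = (tcr << 16) & ~TCR_A1;
        tcr |= TCR_EPD0;
    } else {
        tcr |= TCR_EPD1;
    }

    return tcr;
}
```

With sh = 3, orgn = 1, irgn = 1 and tsz = 16, the TTBR0 layout is 0x00803510 and the TTBR1 layout is 0x35100080.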



[PATCH v8 1/7] iommu/arm-smmu: Pass io-pgtable config to implementation specific function

2020-06-11 Thread Jordan Crouse
Construct the io-pgtable config before calling the implementation specific
init_context function and pass it so the implementation specific function
can get a chance to change it before the io-pgtable is created.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c  | 11 ++-
 drivers/iommu/arm-smmu.h  |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..a20e426d81ac 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..8a3a6c8c887a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -797,11 +797,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
-   if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain);
-   if (ret)
-   goto out_unlock;
-   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -812,6 +807,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.iommu_dev  = smmu->dev,
};
 
+   if (smmu->impl && smmu->impl->init_context) {
+   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+   if (ret)
+   goto out_unlock;
+   }
+
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..38b041530a4f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -383,7 +383,8 @@ struct arm_smmu_impl {
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
-   int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   int (*init_context)(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *cfg);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
int (*def_domain_type)(struct device *dev);
-- 
2.17.1
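
The ordering this patch establishes (build the config, let the implementation hook adjust it, then create the page table) can be sketched as below. All names here are hypothetical: struct pgtable_cfg, struct impl_ops, QUIRK_TTBR1 and init_domain() only stand in for the kernel structures, and copying the config out takes the place of allocating the io-pgtable.

```c
#include <stddef.h>

#define QUIRK_TTBR1 (1ul << 0)   /* hypothetical quirk bit */

struct pgtable_cfg {
    unsigned long quirks;
    unsigned int ias;
};

struct impl_ops {
    /* Optional; may adjust *cfg before the table is built. */
    int (*init_context)(struct pgtable_cfg *cfg);
};

/* Example hook: request the split-pagetable quirk. */
static int enable_ttbr1(struct pgtable_cfg *cfg)
{
    cfg->quirks |= QUIRK_TTBR1;
    return 0;
}

/* Build the config first, let the hook edit it, then "create" the table. */
static int init_domain(const struct impl_ops *impl, unsigned int ias,
                       struct pgtable_cfg *out)
{
    struct pgtable_cfg cfg = { .quirks = 0, .ias = ias };

    if (impl && impl->init_context) {
        int ret = impl->init_context(&cfg);

        if (ret)
            return ret;
    }

    *out = cfg;   /* stand-in for allocating the page table from cfg */
    return 0;
}
```

Had the hook run before the config existed, as in the old ordering, it would have had nothing to modify.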



[PATCH v8 7/7] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

2020-06-11 Thread Jordan Crouse
Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables.

Signed-off-by: Jordan Crouse 
---

 arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 8eb5a31346d2..8b15cd74e9ba 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -3556,7 +3556,7 @@
};
 
adreno_smmu: iommu@504 {
-   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,adreno-smmu", "qcom,smmu-v2";
reg = <0 0x504 0 0x1>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.17.1



[PATCH v8 4/7] iommu/arm-smmu: Add a pointer to the attached device to smmu_domain

2020-06-11 Thread Jordan Crouse
Add a pointer to the struct device that is attached to a domain. This
makes it easy to get the device if it is needed in the implementation
specific code.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 1 +
 drivers/iommu/arm-smmu.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 048de2681670..743d75b9ff3f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -801,6 +801,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
+   smmu_domain->dev = dev;
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 5f2de20e883b..d33cfe26b2f5 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -345,6 +345,7 @@ struct arm_smmu_domain {
struct mutexinit_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and TLB syncs */
struct iommu_domain domain;
+   struct device   *dev;   /* Device attached to this domain */
 };
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
-- 
2.17.1



[PATCH v8 0/7] iommu/arm-smmu: Enable split pagetable support

2020-06-11 Thread Jordan Crouse


Another iteration of the split-pagetable support for arm-smmu and the Adreno GPU
SMMU. After email discussions [1] we opted to make an arm-smmu implementation
specifically for the Adreno GPU and use that to enable split pagetable support
and later other implementation specific bits that we need.

On the hardware side this is very close to the same code from before [2] only
the TTBR1 quirk is turned on by the implementation and not a domain attribute.
In drm/msm we use the returned size of the aperture as a clue to let us know
which virtual address space we should use for global memory objects.

There are two open items that you should be aware of. First, in the
implementation specific code we have to check the compatible string of the
device so that we only enable TTBR1 for the GPU (SID 0) and not the GMU (SID 4).
I went back and forth trying to decide if I wanted to use the compatible string
or the SID as the filter and settled on the compatible string but I could be
talked out of it.

The other open item is that in drm/msm the hardware only uses 49 bits of the
address space but arm-smmu expects the address to be sign extended all the way
to 64 bits. This isn't normally a problem unless you look at the hardware
registers that contain an IOVA, where the upper bits will be zero. I opted to
restrict the internal drm/msm IOVA range to only 49 bits and then sign extend
right before calling iommu_map / iommu_unmap. This is a bit wonky but I thought
that matching the hardware would be less confusing when debugging a hang.
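
The 49-bit sign extension described above comes down to a few lines of bit manipulation. The sketch below is a standalone illustration; VA_BITS, iova_sign_extend() and iova_truncate() are hypothetical names, and the width of 49 bits is taken from the description in this cover letter.

```c
#include <stdint.h>

#define VA_BITS 49   /* address bits the GPU hardware actually uses */

/* Replicate bit 48 into bits 63..49, as the SMMU driver expects. */
static uint64_t iova_sign_extend(uint64_t iova)
{
    const uint64_t sign = UINT64_C(1) << (VA_BITS - 1);
    const uint64_t upper = ~((UINT64_C(1) << VA_BITS) - 1);

    return (iova & sign) ? (iova | upper) : iova;
}

/* Drop the extension again to get the value the GPU registers would show. */
static uint64_t iova_truncate(uint64_t iova)
{
    return iova & ((UINT64_C(1) << VA_BITS) - 1);
}
```

For example, an internal address of 0x0001000000000000 (bit 48 set) becomes 0xFFFF000000000000 before it is handed to the IOMMU API, while an address with only bit 47 set passes through unchanged.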

v8: Pass the attached device in the smmu_domain to the implementation
specific functions

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-May/044537.html
[2] https://patchwork.kernel.org/patch/11482591/


Jordan Crouse (7):
  iommu/arm-smmu: Pass io-pgtable config to implementation specific
function
  iommu/arm-smmu: Add support for split pagetables
  dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  iommu/arm-smmu: Add a pointer to the attached device to smmu_domain
  iommu/arm-smmu: Add implementation for the adreno GPU SMMU
  drm/msm: Set the global virtual address range from the IOMMU domain
  arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  4 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi  |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 13 +-
 drivers/gpu/drm/msm/msm_iommu.c   |  7 +++
 drivers/iommu/arm-smmu-impl.c |  6 ++-
 drivers/iommu/arm-smmu-qcom.c | 45 ++-
 drivers/iommu/arm-smmu.c  | 33 +-
 drivers/iommu/arm-smmu.h  | 30 ++---
 8 files changed, 117 insertions(+), 23 deletions(-)

-- 
2.17.1



[PATCH v8 6/7] drm/msm: Set the global virtual address range from the IOMMU domain

2020-06-11 Thread Jordan Crouse
Use the aperture settings from the IOMMU domain to set up the virtual
address range for the GPU. This allows us to transparently deal with
IOMMU side features (like split pagetables).

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++--
 drivers/gpu/drm/msm/msm_iommu.c |  7 +++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 89673c7ed473..3e717c1ebb7f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu,
struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu);
struct msm_gem_address_space *aspace;
+   u64 start, size;
 
-   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
-   0xfff);
+   /*
+* Use the aperture start or SZ_16M, whichever is greater. This will
+* ensure that we align with the allocated pagetable range while still
+* allowing room in the lower 32 bits for GMEM and whatnot
+*/
+   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
+   size = iommu->geometry.aperture_end - start + 1;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu",
+   start & GENMASK(48, 0), size);
 
if (IS_ERR(aspace) && !IS_ERR(mmu))
mmu->funcs->destroy(mmu);
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 3a381a9674c9..bbe129867590 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
struct msm_iommu *iommu = to_msm_iommu(mmu);
size_t ret;
 
+   /* The arm-smmu driver expects the addresses to be sign extended */
+   if (iova & BIT(48))
+   iova |= GENMASK(63, 49);
+
ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
WARN_ON(!ret);
 
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len)
 {
struct msm_iommu *iommu = to_msm_iommu(mmu);
 
+   if (iova & BIT(48))
+   iova |= GENMASK(63, 49);
+
iommu_unmap(iommu->domain, iova, len);
 
return 0;
-- 
2.17.1
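
The range calculation in this patch can be checked with a small standalone sketch. struct va_range and gpu_va_range() are hypothetical names; the arithmetic follows the diff (the larger of the aperture start and 16 MB, with the start masked down to 49 bits).

```c
#include <stdint.h>

#define SZ_16M (UINT64_C(16) << 20)

/* Hypothetical result type: where the GPU address space begins and its size. */
struct va_range {
    uint64_t start;
    uint64_t size;
};

static struct va_range gpu_va_range(uint64_t aperture_start,
                                    uint64_t aperture_end)
{
    struct va_range r;
    uint64_t start = aperture_start > SZ_16M ? aperture_start : SZ_16M;

    r.size = aperture_end - start + 1;
    /* Keep only bits 48..0, the range the hardware registers can show. */
    r.start = start & ((UINT64_C(1) << 49) - 1);
    return r;
}
```

With a TTBR1 aperture for a 48-bit input size (0xFFFF000000000000 to 0xFFFFFFFFFFFFFFFF) this yields a start of 0x0001000000000000 and a size of 2^48. With a TTBR0-only aperture (0 to 2^48 - 1) it yields a start of 16 MB and a size of 2^48 - 2^24.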



[PATCH] iommu/arm-smmu: Don't bypass pinned stream mappings

2020-06-09 Thread Jordan Crouse
Commit 0e764a01015d ("iommu/arm-smmu: Allow client devices to select
direct mapping") sets the initial domain type to IOMMU_DOMAIN_IDENTITY
for devices that select direct mapping. This ends up setting the domain
as ARM_SMMU_DOMAIN_BYPASS which causes the stream ID mappings
for the device to be programmed to S2CR_TYPE_BYPASS.

This causes a problem for stream mappings that are inherited from
the bootloader since rewriting the stream to BYPASS will disrupt the
display controller access to DDR.

This is an extension to ("iommu/arm-smmu: Allow inheriting stream mapping
from bootloader") [1], which identifies streams that are already configured
and marks them as pinned. This patch extends that to not rewrite pinned
stream mappings for ARM_SMMU_DOMAIN_BYPASS domains.

[1] 
https://lore.kernel.org/r/20191226221709.3844244-4-bjorn.anders...@linaro.org

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c7add09f11c1..9c1e5ba948a7 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1143,6 +1143,10 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
if (type == s2cr[idx].type && cbndx == s2cr[idx].cbndx)
continue;
 
+   /* Don't bypass pinned streams; leave them as they are */
+   if (type == S2CR_TYPE_BYPASS && s2cr[idx].pinned)
+   continue;
+
s2cr[idx].type = type;
s2cr[idx].privcfg = S2CR_PRIVCFG_DEFAULT;
s2cr[idx].cbndx = cbndx;
-- 
2.17.1
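
The effect of the added check can be illustrated with a reduced version of the loop. This is a sketch under assumptions: enum s2cr_type, struct s2cr_entry and assign_streams() are hypothetical simplifications of the driver's data, and the return value (number of entries rewritten) exists only to make the behaviour observable.

```c
#include <stdbool.h>
#include <stddef.h>

enum s2cr_type { TYPE_TRANS, TYPE_BYPASS, TYPE_FAULT };

/* Hypothetical reduced stream-to-context entry. */
struct s2cr_entry {
    enum s2cr_type type;
    unsigned int cbndx;
    bool pinned;   /* mapping inherited from the bootloader */
};

/* Apply (type, cbndx) to every entry; returns how many were rewritten. */
static size_t assign_streams(struct s2cr_entry *s2cr, size_t n,
                             enum s2cr_type type, unsigned int cbndx)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        /* Already in the requested state. */
        if (s2cr[i].type == type && s2cr[i].cbndx == cbndx)
            continue;

        /* Never switch a pinned stream to bypass. */
        if (type == TYPE_BYPASS && s2cr[i].pinned)
            continue;

        s2cr[i].type = type;
        s2cr[i].cbndx = cbndx;
        changed++;
    }

    return changed;
}
```

A pinned entry therefore keeps its translation context when the domain is a bypass domain, while unpinned entries are reprogrammed as before.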



[PATCH v7 3/6] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU

2020-06-04 Thread Jordan Crouse
Every Qcom Adreno GPU has an embedded SMMU for its own use. These
devices depend on unique features such as split pagetables,
different stall/halt requirements and other settings. Identify them
with a compatible string so that they can be identified in the
arm-smmu implementation specific code.

Signed-off-by: Jordan Crouse 
---

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..e52a1b146c97 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,10 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
+items:
+  - const: qcom,adreno-smmu
+  - const: qcom,smmu-v2
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.17.1



[PATCH v7 4/6] iommu/arm-smmu: Add implementation for the adreno GPU SMMU

2020-06-04 Thread Jordan Crouse
Add a special implementation for the SMMU attached to most Adreno GPU
targets, triggered by the qcom,adreno-smmu compatible string. When
selected, the driver will attempt to enable split pagetables.

Signed-off-by: Jordan Crouse 
---
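Note for readers: the condition used by qcom_adreno_smmu_init_context() in
the diff below can be sketched as a standalone predicate. This is a minimal
model, assuming a single compatible string per client (the kernel's
of_device_is_compatible() matches against the node's compatible list); the
names wants_ttbr1, enum stage and enum ctx_fmt are hypothetical, and the GMU
string in the usage note is only an example of a non-matching client.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the domain stage and context format. */
enum stage { STAGE_S1, STAGE_S2 };
enum ctx_fmt { FMT_AARCH32_L, FMT_AARCH64 };

/*
 * Return true when the TTBR1 quirk should be requested: only for the GPU
 * client itself, and only for an AArch64 stage 1 context.
 */
static bool wants_ttbr1(const char *client_compat, enum stage stage,
                        enum ctx_fmt fmt)
{
    assert(client_compat != NULL);
    return strcmp(client_compat, "qcom,adreno") == 0 &&
           stage == STAGE_S1 && fmt == FMT_AARCH64;
}
```

For example, wants_ttbr1("qcom,adreno", STAGE_S1, FMT_AARCH64) is true, while
a GMU client string or a stage 2 domain yields false, so those contexts keep
the default TTBR0 layout.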

 drivers/iommu/arm-smmu-impl.c |  5 -
 drivers/iommu/arm-smmu-qcom.c | 38 +--
 drivers/iommu/arm-smmu.c  |  2 +-
 drivers/iommu/arm-smmu.h  |  3 ++-
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index a20e426d81ac..3bb1ef4e85f7 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -69,7 +69,7 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 }
 
 static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
-   struct io_pgtable_cfg *pgtbl_cfg)
+   struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
@@ -176,5 +176,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
+   return qcom_adreno_smmu_impl_init(smmu);
+
return smmu;
 }
diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index be4318044f96..cc03f94fa458 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -12,6 +12,22 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
+static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
+{
+   /*
+* All targets that use the qcom,adreno-smmu compatible string *should*
+* be AARCH64 stage 1 but double check because the arm-smmu code assumes
+* that is the case when the TTBR1 quirk is enabled
+*/
+   if (of_device_is_compatible(dev->of_node, "qcom,adreno") &&
+   (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
+   (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
+   pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+   return 0;
+}
+
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -65,7 +81,15 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.reset = qcom_smmu500_reset,
 };
 
-struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
+   .init_context = qcom_adreno_smmu_init_context,
+   .def_domain_type = qcom_smmu_def_domain_type,
+   .reset = qcom_smmu500_reset,
+};
+
+
+static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
+   const struct arm_smmu_impl *impl)
 {
struct qcom_smmu *qsmmu;
 
@@ -75,8 +99,18 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 
qsmmu->smmu = *smmu;
 
-   qsmmu->smmu.impl = &qcom_smmu_impl;
+   qsmmu->smmu.impl = impl;
devm_kfree(smmu->dev, smmu);
 
return &qsmmu->smmu;
 }
+
+struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+   return qcom_smmu_create(smmu, &qcom_smmu_impl);
+}
+
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+   return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+}
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 048de2681670..f14dc4ecb422 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -812,7 +812,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
};
 
if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev);
if (ret)
goto out_unlock;
}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 5f2de20e883b..df70d410f77d 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -397,7 +397,7 @@ struct arm_smmu_impl {
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
int (*init_context)(struct arm_smmu_domain *smmu_domain,
-   struct io_pgtable_cfg *cfg);
+   struct io_pgtable_cfg *cfg, struct device *dev);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sy

[PATCH v7 2/6] iommu/arm-smmu: Add support for split pagetables

2020-06-04 Thread Jordan Crouse
Enable TTBR1 for a context bank if IO_PGTABLE_QUIRK_ARM_TTBR1 is selected
by the io-pgtable configuration.

Signed-off-by: Jordan Crouse 
---
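Note for readers: the TCR handling in the arm-smmu.h hunk below can be
checked in isolation. This is a minimal sketch, assuming the field positions
shown in the diff (TG0 at 15:14, SH0 at 13:12, ORGN0 at 11:10, IRGN0 at 9:8,
T0SZ at 5:0, EPD1 at bit 23, A1 at bit 22, EPD0 at bit 7); struct tcr_fields
and lpae_tcr() are illustrative names, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCR_EPD1    (UINT32_C(1) << 23) /* disable walks through TTBR1 */
#define TCR_A1      (UINT32_C(1) << 22) /* ASID select, kept clear here */
#define TCR_EPD0    (UINT32_C(1) << 7)  /* disable walks through TTBR0 */

/* Illustrative container for the values io-pgtable would provide. */
struct tcr_fields {
    uint32_t tg, sh, orgn, irgn, tsz;
};

static uint32_t lpae_tcr(const struct tcr_fields *f, bool ttbr1)
{
    uint32_t tcr;

    assert(f->tg < 4 && f->sh < 4 && f->orgn < 4 && f->irgn < 4 &&
           f->tsz < 64);

    /* Build the TTBR0-side fields in the low half of the register. */
    tcr = (f->tg << 14) | (f->sh << 12) | (f->orgn << 10) |
          (f->irgn << 8) | f->tsz;

    if (ttbr1) {
        /* Move the fields to the TTBR1 side and turn TTBR0 off. */
        tcr = (tcr << 16) & ~TCR_A1;
        tcr |= TCR_EPD0;
    } else {
        tcr |= TCR_EPD1;
    }
    return tcr;
}
```

With tg=2, sh=3, orgn=1, irgn=1 and tsz=16 the low-half value is 0xB510, so
the TTBR0 variant is 0x0080B510 and the TTBR1 variant is 0xB5100080.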

 drivers/iommu/arm-smmu.c | 21 -
 drivers/iommu/arm-smmu.h | 25 +++--
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 8a3a6c8c887a..048de2681670 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -555,11 +555,15 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+   cfg->asid);
cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-cfg->asid);
+   cfg->asid);
+
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   cb->ttbr[1] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   else
+   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -824,7 +828,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   }
+
domain->geometry.force_aperture = true;
 
/* Initialise the context bank with our page table cfg */
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 38b041530a4f..5f2de20e883b 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -168,10 +168,12 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_TCR 0x30
 #define ARM_SMMU_TCR_EAE   BIT(31)
 #define ARM_SMMU_TCR_EPD1  BIT(23)
+#define ARM_SMMU_TCR_A1 BIT(22)
 #define ARM_SMMU_TCR_TG0   GENMASK(15, 14)
 #define ARM_SMMU_TCR_SH0   GENMASK(13, 12)
 #define ARM_SMMU_TCR_ORGN0 GENMASK(11, 10)
 #define ARM_SMMU_TCR_IRGN0 GENMASK(9, 8)
+#define ARM_SMMU_TCR_EPD0  BIT(7)
 #define ARM_SMMU_TCR_T0SZ  GENMASK(5, 0)
 
 #define ARM_SMMU_VTCR_RES1 BIT(31)
@@ -347,12 +349,23 @@ struct arm_smmu_domain {
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
-   return ARM_SMMU_TCR_EPD1 |
-  FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
-  FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
-  FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
-  FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
-  FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+   u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+   FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+   FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+   FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+   FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+
+   /*
+   * When TTBR1 is selected shift the TCR fields by 16 bits and disable
+   * translation in TTBR0
+   */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   tcr = (tcr << 16) & ~ARM_SMMU_TCR_A1;
+   tcr |= ARM_SMMU_TCR_EPD0;
+   } else
+   tcr |= ARM_SMMU_TCR_EPD1;
+
+   return tcr;
 }
 
 static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
-- 
2.17.1



[PATCH v7 1/6] iommu/arm-smmu: Pass io-pgtable config to implementation specific function

2020-06-04 Thread Jordan Crouse
Construct the io-pgtable config before calling the implementation specific
init_context function and pass it so the implementation specific function
can get a chance to change it before the io-pgtable is created.

Signed-off-by: Jordan Crouse 
---
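Note for readers: the reordering in this patch can be modeled with a small
standalone example. This is a sketch under stated assumptions: struct
pgtbl_cfg, struct domain, the QUIRK_* values and init_domain_context() are
simplified, hypothetical stand-ins, and storing the final quirks stands in
for the point where the real driver allocates the io-pgtable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUIRK_TTBR1         (1UL << 0)  /* hypothetical bit values */
#define QUIRK_NON_STRICT    (1UL << 1)

struct pgtbl_cfg {
    unsigned long quirks;
    unsigned int ias;
};

struct domain;

struct impl {
    int (*init_context)(struct domain *d, struct pgtbl_cfg *cfg);
};

struct domain {
    const struct impl *impl;
    unsigned long quirks_used;  /* what the page table was built with */
};

static int init_domain_context(struct domain *d, bool non_strict)
{
    /* 1. Construct the config first. */
    struct pgtbl_cfg cfg = { .quirks = 0, .ias = 48 };

    assert(d != NULL);

    /* 2. Let the implementation adjust it before it is consumed. */
    if (d->impl && d->impl->init_context) {
        int ret = d->impl->init_context(d, &cfg);

        if (ret)
            return ret;
    }

    if (non_strict)
        cfg.quirks |= QUIRK_NON_STRICT;

    /* 3. Consume the config (stand-in for creating the io-pgtable). */
    d->quirks_used = cfg.quirks;
    return 0;
}

/* Example hook: request split pagetables. */
static int gpu_init_context(struct domain *d, struct pgtbl_cfg *cfg)
{
    (void)d;
    cfg->quirks |= QUIRK_TTBR1;
    return 0;
}

static const struct impl gpu_impl = { .init_context = gpu_init_context };
```

Because the hook runs after the config exists and before it is consumed, a
quirk set by the hook is visible to the page table setup; a domain without
an implementation is unaffected.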

 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c  | 11 ++-
 drivers/iommu/arm-smmu.h  |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..a20e426d81ac 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..8a3a6c8c887a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -797,11 +797,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
-   if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain);
-   if (ret)
-   goto out_unlock;
-   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -812,6 +807,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.iommu_dev  = smmu->dev,
};
 
+   if (smmu->impl && smmu->impl->init_context) {
+   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+   if (ret)
+   goto out_unlock;
+   }
+
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..38b041530a4f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -383,7 +383,8 @@ struct arm_smmu_impl {
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
-   int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   int (*init_context)(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *cfg);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
int (*def_domain_type)(struct device *dev);
-- 
2.17.1



[PATCH v7 0/6] iommu/arm-smmu: Enable split pagetable support

2020-06-04 Thread Jordan Crouse
Another iteration of the split-pagetable support for arm-smmu and the Adreno GPU
SMMU. After email discussions [1] we opted to make an arm-smmu implementation
specifically for the Adreno GPU and use that to enable split pagetable support
and later other implementation specific bits that we need.

On the hardware side this is very close to the same code from before [2] only
the TTBR1 quirk is turned on by the implementation and not a domain attribute.
In drm/msm we use the returned size of the aperture as a clue to let us know
which virtual address space we should use for global memory objects.
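
Note for readers: the aperture values set by patch 2/6 can be reproduced
with a short standalone sketch. It assumes a 64-bit address type and an
input address size (ias) below 64; struct aperture, domain_aperture() and
aperture_is_upper() are hypothetical names, and the check on the start
address is only one possible way a client could tell the two layouts apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct aperture {
    uint64_t start;
    uint64_t end;
};

/* Compute the domain geometry for a given input address size. */
static struct aperture domain_aperture(unsigned int ias, bool ttbr1)
{
    struct aperture a = { 0, 0 };

    assert(ias > 0 && ias < 64);
    if (ttbr1) {
        /* TTBR1 covers the top of the address space. */
        a.start = ~UINT64_C(0) << ias;
        a.end = ~UINT64_C(0);
    } else {
        /* TTBR0 covers the bottom of the address space. */
        a.end = (UINT64_C(1) << ias) - 1;
    }
    return a;
}

/* A client can treat a non-zero start as "split pagetables are active". */
static bool aperture_is_upper(const struct aperture *a)
{
    return a->start != 0;
}
```

For ias = 48 the TTBR1 aperture is 0xFFFF000000000000 through
0xFFFFFFFFFFFFFFFF and the TTBR0 aperture is 0 through 0x0000FFFFFFFFFFFF.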

There are two open items that you should be aware of. First, in the
implementation specific code we have to check the compatible string of the
device so that we only enable TTBR1 for the GPU (SID 0) and not the GMU (SID 4).
I went back and forth trying to decide if I wanted to use the compatible string
or the SID as the filter and settled on the compatible string but I could be
talked out of it.

The other open item is that in drm/msm the hardware only uses 49 bits of the
address space but arm-smmu expects the address to be sign extended all the way
to 64 bits. This isn't a problem normally unless you look at the hardware
registers that contain an IOVA and then the upper bits will be zero. I opted to
restrict the internal drm/msm IOVA range to only 49 bits and then sign extend
right before calling iommu_map / iommu_unmap. This is a bit wonky but I thought
that matching the hardware would be less confusing when debugging a hang.
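
Note for readers: the sign extension described above is a standard bit
trick and can be sketched on its own. This assumes a 49-bit hardware
address (sign bit at position 48); iova_sign_extend() and iova_truncate()
are hypothetical helpers written for illustration, not the functions used
in drm/msm.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 'bits' bits, as the hardware registers would show. */
static uint64_t iova_truncate(uint64_t iova, unsigned int bits)
{
    assert(bits >= 1 && bits < 64);
    return iova & ((UINT64_C(1) << bits) - 1);
}

/* Extend a 'bits'-wide address to 64 bits by replicating its top bit. */
static uint64_t iova_sign_extend(uint64_t iova, unsigned int bits)
{
    uint64_t sign;

    assert(bits >= 1 && bits < 64);
    sign = UINT64_C(1) << (bits - 1);
    iova = iova_truncate(iova, bits);
    /* XOR then subtract propagates the sign bit into the upper bits. */
    return (iova ^ sign) - sign;
}
```

With bits = 49, an internal address of 0x0001000000000000 becomes
0xFFFF000000000000 before it is handed to the IOMMU, while addresses with
bit 48 clear pass through unchanged.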

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-May/044537.html
[2] https://patchwork.kernel.org/patch/11482591/


Jordan Crouse (6):
  iommu/arm-smmu: Pass io-pgtable config to implementation specific
function
  iommu/arm-smmu: Add support for split pagetables
  dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  iommu/arm-smmu: Add implementation for the adreno GPU SMMU
  drm/msm: Set the global virtual address range from the IOMMU domain
  arm64: dts: qcom: sdm845: Set the compatible string for the GPU SMMU

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  4 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi  |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 13 ++-
 drivers/gpu/drm/msm/msm_iommu.c   |  7 
 drivers/iommu/arm-smmu-impl.c |  6 ++-
 drivers/iommu/arm-smmu-qcom.c | 38 ++-
 drivers/iommu/arm-smmu.c  | 32 +++-
 drivers/iommu/arm-smmu.h  | 29 ++
 8 files changed, 108 insertions(+), 23 deletions(-)

-- 
2.17.1



[PATCH v2] iommu/arm-smmu: Mark qcom_smmu_client_of_match as possibly unused

2020-06-04 Thread Jordan Crouse
When CONFIG_OF=n of_match_device() gets pre-processed out of existence
leaving qcom_smmu_client_of_match unused. Mark it as possibly unused to
keep the compiler from warning in that case.

Fixes: 0e764a01015d ("iommu/arm-smmu: Allow client devices to select direct mapping")
Reported-by: kbuild test robot 
Acked-by: Will Deacon 
Signed-off-by: Jordan Crouse 
---
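Note for readers: the effect of the annotation can be tried in user space.
This is a minimal sketch, assuming GCC or Clang; MAYBE_UNUSED is a local
stand-in for the kernel's __maybe_unused, struct of_id is a simplified
stand-in for struct of_device_id, and HAVE_OF is a hypothetical stand-in for
CONFIG_OF. The attribute follows the declarator, which is the placement used
in this version of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for the kernel's __maybe_unused (GCC/Clang attribute). */
#define MAYBE_UNUSED __attribute__((__unused__))

/* Simplified stand-in for struct of_device_id. */
struct of_id {
    const char *compatible;
};

/* The attribute follows the declarator and applies to the variable. */
static const struct of_id client_of_match[] MAYBE_UNUSED = {
    { .compatible = "qcom,adreno" },
    { .compatible = "qcom,mdp4" },
    { .compatible = "qcom,mdss" },
    { .compatible = NULL },     /* sentinel */
};

#ifdef HAVE_OF  /* hypothetical stand-in for CONFIG_OF */
static bool is_client(const char *compat)
{
    assert(compat != NULL);
    for (const struct of_id *id = client_of_match; id->compatible; id++)
        if (strcmp(id->compatible, compat) == 0)
            return true;
    return false;
}
#else
/* Compiled out: the table above has no remaining users. */
#define is_client(compat) ((void)(compat), false)
#endif
```

Built without HAVE_OF and with unused-variable warnings enabled, the table
has no users; removing MAYBE_UNUSED from the declaration is expected to
bring the warning back.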

 drivers/iommu/arm-smmu-qcom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index cf01d0215a39..be4318044f96 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -12,7 +12,7 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
-static const struct of_device_id qcom_smmu_client_of_match[] = {
+static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
{ .compatible = "qcom,mdss" },
-- 
2.17.1



[PATCH] iommu/arm-smmu: Mark qcom_smmu_client_of_match as possibly unused

2020-06-03 Thread Jordan Crouse
When CONFIG_OF=n of_match_device() gets pre-processed out of existence
leaving qcom_smmu_client_of_match unused. Mark it as possibly unused to
keep the compiler from warning in that case.

Fixes: 0e764a01015d ("iommu/arm-smmu: Allow client devices to select direct mapping")
Reported-by: kbuild test robot 
Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-qcom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index cf01d0215a39..063b4388b0ff 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -12,7 +12,7 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
-static const struct of_device_id qcom_smmu_client_of_match[] = {
+static const struct __maybe_unused of_device_id qcom_smmu_client_of_match[] = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
{ .compatible = "qcom,mdss" },
-- 
2.17.1



Re: [PATCH v1 2/6] arm/smmu: Add auxiliary domain support for arm-smmuv2

2020-05-20 Thread Jordan Crouse
On Wed, May 20, 2020 at 01:57:01PM +0100, Will Deacon wrote:
> On Mon, May 18, 2020 at 08:50:27AM -0700, Rob Clark wrote:
> > On Mon, May 18, 2020 at 8:18 AM Will Deacon  wrote:
> > > On Wed, Mar 18, 2020 at 04:43:07PM -0700, Rob Clark wrote:
> > > > We do in fact need live domain switching, that is really the whole
> > > > point.  The GPU CP (command processor/parser) is directly updating
> > > > TTBR0 and triggering TLB flush, asynchronously from the CPU.
> > > >
> > > > And I think the answer about ASID is easy (on current hw).. it must be 
> > > > zero[*].
> > >
> > > Using ASID zero is really bad, because it means that you will end up
> > > sharing TLB entries with whichever device is using context bank 0.
> > >
> > > Is the SMMU only used by the GPU in your SoC?
> > >
> > 
> > yes, the snapdragon SoCs have two SMMU instances, one used by the GPU,
> > where ASID0/cb0 is the gpu itself, and another cb is the GMU
> > (basically power control for the gpu), and the second SMMU is
> > everything else.
> 
> Right, in which case I'm starting to think that we should treat this GPU
> SMMU instance specially. Give it its own compatible string (looks like you
> need this for HUPCF anyway) and hook in via arm_smmu_impl_init(). You can
> then set IO_PGTABLE_QUIRK_ARM_TTBR1 when talking to the io-pgtable code
> without having to add a domain attribute.

If we did this via a special GPU SMMU instance then we could also create and
register a dummy TTBR0 instance along with the TTBR1 instance and then we
wouldn't need to worry about the aux domains at all.

> With that, you'll need to find a way to allow the GPU driver to call into
> your own hooks for getting at the TTBR0 tables -- given that you're
> programming these in the hardware, I don't think it makes sense to expose
> that in the IOMMU API, since most devices won't be able to do anything with
> that data. Perhaps you could install a couple of function pointers
> (subdomain_alloc/subdomain_free) in the GPU device when you see it appear
> from the SMMU driver? Alternatively, you could make an io_pgtable_cfg
> available so that the GPU driver can interface with io-pgtable directly.
 
I don't want to speak for Rob but I think that this is the same direction we've
landed on. If we use the implementation specific code to initialize the base
pagetables then the GPU driver can use io-pgtable directly. We can easily
construct an io_pgtable_cfg. This feature will only be available for opt-in
GPU targets that will have a known configuration.

The only gotcha is TLB maintenance but Rob and I have ideas about coordinating
with the GPU hardware (which has to do a TLBIALL during a switch anyway) and we
can always use the iommu_tlb_flush_all() hammer from software if we really need
it. It might take a bit of thought, but it is doable.

> Yes, it's ugly, but I don't think it's worth trying to abstract this.

I'm not sure how ugly it is. I've always operated under the assumption that the
GPU SMMU was special (though it had generic registers) just because of where it
was and how it was used.  In the long run baking in an implementation specific
solution would probably be preferable to lots of domain attributes and aux
domains that would never be used except by us.

> Thoughts? It's taken me a long time to figure out what's going on here,
> so sorry if it feels like I'm leading you round the houses.

I'll hack on this and try to get something in place. It might be dumber on the
GPU side than we would like but it would at least spur some more conversation.

Jordan

> Will

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v6 2/5] iommu/arm-smmu: Add support for TTBR1

2020-05-19 Thread Jordan Crouse
On Mon, May 18, 2020 at 03:59:59PM +0100, Will Deacon wrote:
> On Thu, Apr 09, 2020 at 05:33:47PM -0600, Jordan Crouse wrote:
> > Add support to enable TTBR1 if the domain requests it via the
> > DOMAIN_ATTR_SPLIT_TABLES attribute. If enabled by the hardware
> > and pagetable configuration the driver will configure the TTBR1 region
> > and program the domain pagetable on TTBR1. TTBR0 will be disabled.
> > 
> > After attaching the device the value of the domain attribute can
> > be queried to see if the split pagetables were successfully programmed.
> > The domain geometry will be updated as well so that the caller can
> > determine the active region for the pagetable that was programmed.
> > 
> > Signed-off-by: Jordan Crouse 
> > ---
> > 
> >  drivers/iommu/arm-smmu.c | 48 ++--
> >  drivers/iommu/arm-smmu.h | 24 +++-
> >  2 files changed, 59 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index a6a5796e9c41..db6d503c1673 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -555,11 +555,16 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> > cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
> > cb->ttbr[1] = 0;
> > } else {
> > -   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> > -   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
> > - cfg->asid);
> > -   cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
> > -cfg->asid);
> > +   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
> > +   cfg->asid);
> > +
> > +   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
> > +   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> > +   } else {
> > +   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> > +   cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
> > +cfg->asid);
> > +   }
> 
> This looks odd to me. As I mentioned before, the SMMU driver absolutely has
> to manage the ASID space, so we should be setting it in both TTBRs here.

Somebody had suggested a while back to only do TTBR0 but I agree that it makes
more sense for it to be on both.
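
Note for readers: the agreed behaviour (ASID in both TTBRs, table address
only in the active one) can be sketched standalone. This assumes the ASID
field occupies bits 63:48 of TTBRn, which should be checked against the
driver's ARM_SMMU_TTBRn_ASID definition; init_ttbrs() and TTBRN_ASID_SHIFT
are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TTBRN_ASID_SHIFT    48  /* assumed position of the ASID field */

/*
 * Fill both TTBR values: the ASID goes in both registers, the page table
 * base address only in the register that is actually walked.
 */
static void init_ttbrs(uint64_t ttbr[2], uint64_t pgd_phys, uint16_t asid,
                       bool use_ttbr1)
{
    uint64_t asid_field = (uint64_t)asid << TTBRN_ASID_SHIFT;

    /* The base address must not overlap the ASID field. */
    assert((pgd_phys >> TTBRN_ASID_SHIFT) == 0);

    ttbr[0] = asid_field;
    ttbr[1] = asid_field;
    ttbr[use_ttbr1 ? 1 : 0] |= pgd_phys;
}
```

With asid = 5 and a table at 0x80001000, selecting TTBR1 gives
ttbr[0] = 0x0005000000000000 and ttbr[1] = 0x0005000080001000.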

> > diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> > index 8d1cd54d82a6..5f6d0af7c8c8 100644
> > --- a/drivers/iommu/arm-smmu.h
> > +++ b/drivers/iommu/arm-smmu.h
> > @@ -172,6 +172,7 @@ enum arm_smmu_cbar_type {
> >  #define ARM_SMMU_TCR_SH0   GENMASK(13, 12)
> >  #define ARM_SMMU_TCR_ORGN0 GENMASK(11, 10)
> >  #define ARM_SMMU_TCR_IRGN0 GENMASK(9, 8)
> > +#define ARM_SMMU_TCR_EPD0  BIT(7)
> >  #define ARM_SMMU_TCR_T0SZ  GENMASK(5, 0)
> >  
> >  #define ARM_SMMU_VTCR_RES1 BIT(31)
> > @@ -343,16 +344,27 @@ struct arm_smmu_domain {
> > struct mutex init_mutex; /* Protects smmu pointer */
> > spinlock_t  cb_lock; /* Serialises ATS1* ops and TLB syncs */
> > struct iommu_domain domain;
> > +   bool split_pagetables;
> >  };
> >  
> >  static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
> >  {
> > -   return ARM_SMMU_TCR_EPD1 |
> > -  FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
> > -  FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
> > -  FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
> > -  FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
> > -  FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
> > +   u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
> > +   FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
> > +   FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
> > +   FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
> > +   FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
> > +
> > +   /*
> > +   * When TTBR1 is selected shift the TCR fields by 16 bits and disable
> > +  

Re: [PATCH] iommu/arm-smmu: Add stall implementation hook

2020-05-11 Thread Jordan Crouse
On Fri, May 08, 2020 at 08:40:40AM -0700, Rob Clark wrote:
> On Fri, May 8, 2020 at 8:32 AM Rob Clark  wrote:
> >
> > On Thu, May 7, 2020 at 5:54 AM Will Deacon  wrote:
> > >
> > > On Thu, May 07, 2020 at 11:55:54AM +0100, Robin Murphy wrote:
> > > > On 2020-05-07 11:14 am, Sai Prakash Ranjan wrote:
> > > > > On 2020-04-22 01:50, Sai Prakash Ranjan wrote:
> > > > > > Add stall implementation hook to enable stalling
> > > > > > faults on QCOM platforms which support it without
> > > > > > causing any kind of hardware mishaps. Without this
> > > > > > on QCOM platforms, GPU faults can cause unrelated
> > > > > > GPU memory accesses to return zeroes. This has the
> > > > > > unfortunate result of command-stream reads from CP
> > > > > > getting invalid data, causing a cascade of fail.
> > > >
> > > > > I think this came up before, but something about this rationale doesn't
> > > > > add up - we're not *using* stalls at all, we're still terminating
> > > > > faulting transactions unconditionally; we're just using CFCFG to
> > > > > terminate them with a slight delay, rather than immediately. It's really
> > > > > not clear how or why that makes a difference. Is it a GPU bug? Or an
> > > > > SMMU bug? Is this reliable (or even a documented workaround for
> > > > > something), or might things start blowing up again if any other
> > > > > behaviour subtly changes? I'm not dead set against adding this, but I'd
> > > > > *really* like to have a lot more confidence in it.
> > >
> > > > Rob mentioned something about the "bus returning zeroes" before, but I
> > > > agree that we need more information so that we can reason about this and
> > > > maintain the code as the driver continues to change. That needs to be a
> > > > comment in the driver, and I don't think "but android seems to work" is a
> > > > good enough justification. There was some interaction with HUPCF as well.
> >
> > The issue is that there are multiple parallel memory accesses
> > happening at the same time, for example CP (the cmdstream processor)
> > will be reading ahead and setting things up for the next draw or
> > compute grid, in parallel with some memory accesses from the shader
> > which could trigger a fault.  (And with faults triggered by something
> > in the shader, there are *many* shader threads running in parallel so
> > those tend to generate a big number of faults at the same time.)
> >
> > We need either CFCFG or HUPCF, otherwise what I have observed is that
> > while the fault happens, CP's memory access will start returning
> > zero's instead of valid cmdstream data, which triggers a GPU hang.  I
> > can't say whether this is something unique to qcom's implementation of
> > the smmu spec or not.
> >
> > *Often* a fault is the result of the usermode gl/vk/cl driver bug,
> > although I don't think that is an argument against fixing this in the
> > smmu driver.. I've been carrying around a local patch to set HUPCF for
> > *years* because debugging usermode driver issues is so much harder
> > without.  But there are some APIs where faults can be caused by the
> > user's app on top of the usermode driver.
> >
> 
> Also, I'll add to that, a big wish of mine is to have stall with the
> ability to resume later from a wq context.  That would enable me to
> hook in the gpu crash dump handling for faults, which would make
> debugging these sorts of issues much easier.  I think I posted a
> prototype of this quite some time back, which would schedule a worker
> on the first fault (since there are cases where you see 1000's of
> faults at once), which grabbed some information about the currently
> executing submit and some gpu registers to indicate *where* in the
> submit (a single submit could have 100's or 1000's of draws), and then
> resumed the iommu cb.
> 
> (This would ofc eventually be useful for svm type things.. I expect
> we'll eventually care about that too.)

Rob is right about HUPCF. Due to the parallel nature of the command processor
there is always a very good chance that a CP access is somewhere in the bus so
any pagefault is usually a death sentence. The GPU context bank would always
want HUPCF set to 1.

Downstream also uses CFCFG for the stall-on-fault debug case. You wouldn't want
this on all the time in production since bringing down the world for every user
pagefault is less than desirable so it needs to be modifiable at run time (or at
the very least kernel command line selectable).

Jordan

PS: Interestingly, the GMU does not want HUPCF set to 1 because it wants to
crash immediately on all invalid accesses so ideally this combination of bits
would be configurable on a per-context basis.
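
Note for readers: a per-context policy along these lines could look like the
following sketch. It is an illustration only, assuming SCTLR.CFCFG at bit 7
and SCTLR.HUPCF at bit 8 (to be checked against the SMMU specification and
arm-smmu.h); enum cb_client and cb_fault_bits() are hypothetical and do not
correspond to an existing driver interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_CFCFG (UINT32_C(1) << 7)  /* assumed: stall on context fault */
#define SCTLR_HUPCF (UINT32_C(1) << 8)  /* assumed: hit under previous fault */

/* Hypothetical identifiers for the two context banks on the GPU SMMU. */
enum cb_client { CB_GPU, CB_GMU };

/*
 * Pick the fault-handling bits for one context bank:
 *  - GPU: always HUPCF, plus CFCFG only when stall-on-fault debug is on.
 *  - GMU: neither, so invalid accesses fail immediately.
 */
static uint32_t cb_fault_bits(enum cb_client client, bool stall_debug)
{
    uint32_t bits = 0;

    assert(client == CB_GPU || client == CB_GMU);
    if (client == CB_GPU) {
        bits |= SCTLR_HUPCF;
        if (stall_debug)
            bits |= SCTLR_CFCFG;
    }
    return bits;
}
```

The stall_debug flag stands in for whatever run-time or command-line switch
would select the debug behaviour.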

> > >
> > > As a template, I'd suggest:
> > >
> > > > > > diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> > > > > > index 8d1cd54d82a6..d5134e0d5cce 100644
> > > > > > --- a/drivers/iommu/arm-smmu.h
> > > > > > +++ b/drivers/iommu/arm-smmu.h
> > > > > > @@ -386,6 +386,7 @@ struct arm_smmu_im

Re: [PATCH 0/2] iommu/arm-smmu: Allow client devices to select direct mapping

2020-04-13 Thread Jordan Crouse
On Thu, Apr 09, 2020 at 04:31:24PM -0700, Matthias Kaehlcke wrote:
> On Tue, Feb 04, 2020 at 11:12:17PM +0530, Sai Prakash Ranjan wrote:
> > Hello Robin, Will
> > 
> > On 2020-01-22 17:18, Sai Prakash Ranjan wrote:
> > > This series allows drm devices to set a default identity
> > > mapping using iommu_request_dm_for_dev(). First patch is
> > > a cleanup to support other SoCs to call into QCOM specific
> > > implementation and preparation for second patch.
> > > Second patch sets the default identity domain for drm devices.
> > > 
> > > Jordan Crouse (1):
> > >   iommu/arm-smmu: Allow client devices to select direct mapping
> > > 
> > > Sai Prakash Ranjan (1):
> > >   iommu: arm-smmu-impl: Convert to a generic reset implementation
> > > 
> > >  drivers/iommu/arm-smmu-impl.c |  8 +++--
> > >  drivers/iommu/arm-smmu-qcom.c | 55 +--
> > >  drivers/iommu/arm-smmu.c  |  3 ++
> > >  drivers/iommu/arm-smmu.h  |  5 
> > >  4 files changed, 65 insertions(+), 6 deletions(-)
> > 
> > Any review comments?
> 
> Ping
> 
> What is the status of this series, is it ready to land or are any changes
> needed?
> 
> Thanks
> 
> Matthias

I think this is up in the air following the changes that Joerg suggested:
https://lists.linuxfoundation.org/pipermail/iommu/2020-April/043017.html

Jordan
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH v6 3/5] drm/msm: Attach the IOMMU device during initialization

2020-04-09 Thread Jordan Crouse
Everywhere an IOMMU object is created by msm_gpu_create_address_space
the IOMMU device is attached immediately after. Instead of carrying around
the infrastructure to do the attach from the device specific code do it
directly in the msm_iommu_init() function. This gets it out of the way for
more aggressive cleanups that follow.

Reviewed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c |  4 
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  7 ---
 drivers/gpu/drm/msm/msm_gem_vma.c| 23 +++
 drivers/gpu/drm/msm/msm_gpu.c| 11 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 --
 drivers/gpu/drm/msm/msm_iommu.c  | 15 +++
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 8 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index ce19f1d39367..6629a142574e 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -772,7 +772,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 {
struct iommu_domain *domain;
struct msm_gem_address_space *aspace;
-   int ret;
 
domain = iommu_domain_alloc(&platform_bus_type);
if (!domain)
@@ -788,13 +787,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
return PTR_ERR(aspace);
}
 
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DPU_ERROR("failed to attach iommu %d\n", ret);
-   msm_gem_address_space_put(aspace);
-   return ret;
-   }
-
dpu_kms->base.aspace = aspace;
return 0;
 }
diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
index dda05436f716..9dba37c6344f 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
@@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret)
-   goto fail;
} else {
DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys "
"contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index 47b989834af1..1e9ba99fd9eb 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -644,13 +644,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DRM_DEV_ERROR(&pdev->dev, "failed to attach iommu: %d\n",
-   ret);
-   goto fail;
-   }
} else {
DRM_DEV_INFO(&pdev->dev,
 "no iommu, fallback to phys contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 1af5354bcd46..91d993a16850 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
const char *name)
 {
struct msm_gem_address_space *aspace;
-   u64 size = domain->geometry.aperture_end -
-   domain->geometry.aperture_start;
+   u64 start = domain->geometry.aperture_start;
+   u64 size = domain->geometry.aperture_end - start;
 
aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
if (!aspace)
@@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_iommu_new(dev, domain);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
 
-   drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT),
-   size >> PAGE_SHIFT);
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
+
+   /*
+* Attaching the IOMMU device changes the aperture values so use the
+* cached values instead
+*/
+   drm_mm_init(&aspace->mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 
kref_init(&aspace->kref);
 
@@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, struct msm_gpu *gpu,
spin_lock_init(&aspace->lock);
aspac

[PATCH v6 2/5] iommu/arm-smmu: Add support for TTBR1

2020-04-09 Thread Jordan Crouse
Add support to enable TTBR1 if the domain requests it via the
DOMAIN_ATTR_SPLIT_TABLES attribute. If enabled by the hardware
and pagetable configuration the driver will configure the TTBR1 region
and program the domain pagetable on TTBR1. TTBR0 will be disabled.

After attaching the device the value of the domain attribute can
be queried to see if the split pagetables were successfully programmed.
The domain geometry will be updated as well so that the caller can
determine the active region for the pagetable that was programmed.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 48 ++--
 drivers/iommu/arm-smmu.h | 24 +++-
 2 files changed, 59 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a6a5796e9c41..db6d503c1673 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -555,11 +555,16 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
-   cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-cfg->asid);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+   cfg->asid);
+
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   } else {
+   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+cfg->asid);
+   }
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -673,6 +678,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
enum io_pgtable_fmt fmt;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   unsigned long quirks = 0;
 
mutex_lock(&smmu_domain->init_mutex);
if (smmu_domain->smmu)
@@ -741,6 +747,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
oas = smmu->ipa_size;
if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
fmt = ARM_64_LPAE_S1;
+
+   /*
+* We are assuming that split pagetables will always use
+* SEP_UPSTREAM so we don't need to reduce the size of
+* the ias to account for the sign extension bit
+*/
+   if (smmu_domain->split_pagetables)
+   quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
} else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
fmt = ARM_32_LPAE_S1;
ias = min(ias, 32UL);
@@ -810,6 +824,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
.tlb= smmu_domain->flush_ops,
.iommu_dev  = smmu->dev,
+   .quirks = quirks,
};
 
if (smmu_domain->non_strict)
@@ -823,8 +838,16 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
-   domain->geometry.force_aperture = true;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   domain->geometry.force_aperture = true;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   domain->geometry.force_aperture = true;
+   smmu_domain->split_pagetables = false;
+   }
 
/* Initialise the context bank with our page table cfg */
arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg);
@@ -1526,6 +1549,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);

[PATCH v6 0/5] iommu/arm-smmu: Split pagetable support for arm-smmu-v2

2020-04-09 Thread Jordan Crouse
This is another iteration for the split pagetable support based on the
suggestions from Robin and Will [1].

Background: In order to support per-context pagetables the GPU needs to enable
split tables so that we can store global buffers in the TTBR1 space leaving the
GPU free to program the TTBR0 register with the address of a context specific
pagetable.

If the DOMAIN_ATTR_SPLIT_TABLES attribute is set on the domain before attaching,
the context bank assigned to the domain will be programmed to allow translations
in the TTBR1 space. Translations in the TTBR0 region will be disallowed because,
as Robin pointed out, having an un-programmed TTBR0 register is dangerous.

The driver can determine if TTBR1 was successfully programmed by querying
DOMAIN_ATTR_SPLIT_TABLES after attaching. The domain geometry will also be
updated to reflect the virtual address space for the TTBR1 range.
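
As a rough illustration of what "updated to reflect the virtual address space
for the TTBR1 range" means, here is a minimal user-space sketch of the
aperture arithmetic used in the arm-smmu patch of this series. The struct
aperture type and the aperture_for() helper are hypothetical names invented
for this example; only the shift expressions are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the aperture fields of the domain geometry. */
struct aperture {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute the range covered by a pagetable with 'ias' input address bits.
 * With split tables the range is the top of the address space (TTBR1);
 * otherwise it starts at zero (TTBR0).
 */
static struct aperture aperture_for(unsigned int ias, int split_tables)
{
	struct aperture ap;

	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(ias > 0 && ias < 64);

	if (split_tables) {
		ap.start = ~UINT64_C(0) << ias;
		ap.end = ~UINT64_C(0);
	} else {
		ap.start = 0;
		ap.end = (UINT64_C(1) << ias) - 1;
	}

	return ap;
}
```

Assuming a 48-bit input address size (an example value, not something this
series mandates), the split case starts at 0xffff000000000000 and the
non-split case ends at 0x0000ffffffffffff.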

Upcoming changes will allow auxiliary domains to be attached to the device which
will enable and program TTBR0.

This patchset is based on top of linux-next-20200409

Change log:

v6: Cleanups for the arm-smmu TTBR1 patch from Will Deacon
v4: Only program TTBR1 when split pagetables are requested. TTBR0 will be
enabled later when an auxiliary domain is attached
v3: Remove the implementation specific and make split pagetable support
part of the generic configuration

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041373.html


Jordan Crouse (5):
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add support for TTBR1
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 51 
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 18 +++--
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 18 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 --
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 18 -
 drivers/gpu/drm/msm/msm_drv.h|  8 +---
 drivers/gpu/drm/msm/msm_gem_vma.c| 36 +++--
 drivers/gpu/drm/msm/msm_gpu.c| 49 +--
 drivers/gpu/drm/msm/msm_gpu.h|  4 +-
 drivers/gpu/drm/msm/msm_gpummu.c |  6 ---
 drivers/gpu/drm/msm/msm_iommu.c  | 18 +
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 drivers/iommu/arm-smmu.c | 48 ++
 drivers/iommu/arm-smmu.h | 24 ---
 include/linux/iommu.h|  2 +
 21 files changed, 200 insertions(+), 155 deletions(-)

-- 
2.17.1


[PATCH v6 5/5] drm/msm/a6xx: Support split pagetables

2020-04-09 Thread Jordan Crouse
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to
the address range assigned to TTBR1. The behavior should be transparent
to the driver for now but it gets the default buffers out of the way
when we want to start swapping TTBR0 for context-specific pagetables.
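
The request-then-verify flow described above can be modelled in user space
roughly as follows. This is only a sketch under stated assumptions: struct
fake_domain, fake_attach() and pick_gpu_va_start() are hypothetical stand-ins
and not kernel APIs. The only value borrowed from the patch is the 16 MiB
(SZ_16M) start of the legacy region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an IOMMU domain; not the kernel's struct iommu_domain. */
struct fake_domain {
	bool split_requested;    /* set by the client before attach */
	bool hw_supports_split;  /* what the SMMU driver is able to do */
	bool split_active;       /* outcome, only meaningful after attach */
	uint64_t ttbr1_start;    /* where the TTBR1 region would begin */
	uint64_t aperture_start; /* updated by attach */
};

/* Stand-in for attaching the device: the request is honoured only if possible. */
static void fake_attach(struct fake_domain *d)
{
	assert(d != NULL);

	d->split_active = d->split_requested && d->hw_supports_split;
	d->aperture_start = d->split_active ? d->ttbr1_start : 0;
}

/* Pick the base of the GPU address space: TTBR1 if granted, else legacy. */
static uint64_t pick_gpu_va_start(struct fake_domain *d)
{
	d->split_requested = true;     /* the request has to precede the attach */
	fake_attach(d);

	if (d->split_active)           /* query the result after attaching */
		return d->aperture_start;

	return UINT64_C(16) << 20;     /* legacy region starts at 16 MiB */
}
```

The point of the model is the ordering: the attribute is a request, and the
caller only trusts it after reading it back post-attach.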

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 ++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 02ade43d6335..b27daa77723c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -825,6 +825,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+   struct msm_gem_address_space *aspace;
+   struct msm_mmu *mmu;
+   u64 start, size;
+   u32 val = 1;
+   int ret;
+
+   if (!iommu)
+   return ERR_PTR(-ENOMEM);
+
+   /*
+* Try to request split pagetables - the request has to be made before
+* the domain is attached
+*/
+   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+   mmu = msm_iommu_new(&pdev->dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return ERR_CAST(mmu);
+   }
+
+   /*
+* After the domain is attached, see if the split tables were actually
+* successful.
+*/
+   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+   if (!ret && val) {
+   /*
+* The aperture start will be at the beginning of the TTBR1
+* space so use that as a base
+*/
+   start = iommu->geometry.aperture_start;
+   size = 0x;
+   } else {
+   /* Otherwise use the legacy 32 bit region */
+   start = SZ_16M;
+   size = 0x - SZ_16M;
+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+   if (IS_ERR(aspace))
+   iommu_domain_free(iommu);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -847,7 +897,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
 #endif
-   .create_address_space = adreno_iommu_create_address_space,
+   .create_address_space = a6xx_create_address_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
-- 
2.17.1


[PATCH v6 1/5] iommu: Add DOMAIN_ATTR_SPLIT_TABLES

2020-04-09 Thread Jordan Crouse
Add a new attribute to enable and query the state of split pagetables
for the domain.

Acked-by: Will Deacon 
Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ef8b0bda695..d0f96f748a00 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,8 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   /* Enable split pagetables (for example, TTBR1 on arm-smmu) */
+   DOMAIN_ATTR_SPLIT_TABLES,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.17.1


[PATCH v6 4/5] drm/msm: Refactor address space initialization

2020-04-09 Thread Jordan Crouse
Refactor how address space initialization works. Instead of having the
address space function create the MMU object (and thus require separate but
equal functions for gpummu and iommu) use a single function and pass the
MMU struct in. Make the generic code cleaner by using target specific
functions to create the address space so a2xx can do its own thing in its
own space.  For all the other targets use a generic helper to initialize
IOMMU but leave the door open for newer targets to use customization
if they need it.
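
To make the shape of the refactor concrete, here is a small user-space sketch
of the pattern: one creation function that takes the MMU object as an
argument, plus a per-target hook that decides which MMU to hand in. All
fake_* names and the *_like_* targets are hypothetical and only mirror the
structure described above, not the actual msm code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature versions of the msm MMU and address space objects. */
struct fake_mmu {
	const char *kind;
};

struct fake_aspace {
	const struct fake_mmu *mmu;
	const char *name;
};

/* Single creation function: the caller passes the MMU object in. */
static int fake_aspace_create(struct fake_aspace *as,
			      const struct fake_mmu *mmu, const char *name)
{
	assert(as != NULL);

	if (!mmu)
		return -1;

	as->mmu = mmu;
	as->name = name;
	return 0;
}

/* Per-target hook, analogous to a .create_address_space callback. */
struct fake_gpu_funcs {
	int (*create_address_space)(struct fake_aspace *as);
};

static const struct fake_mmu fake_gpummu = { "gpummu" };
static const struct fake_mmu fake_iommu = { "iommu" };

/* A target with its own MMU type does its own thing. */
static int a2xx_like_create(struct fake_aspace *as)
{
	return fake_aspace_create(as, &fake_gpummu, "gpu");
}

/* Every other target shares one generic IOMMU helper. */
static int generic_iommu_create(struct fake_aspace *as)
{
	return fake_aspace_create(as, &fake_iommu, "gpu");
}

static const struct fake_gpu_funcs a2xx_like_funcs = {
	.create_address_space = a2xx_like_create,
};

static const struct fake_gpu_funcs a3xx_like_funcs = {
	.create_address_space = generic_iommu_create,
};
```

A newer target that needs customization would supply its own hook in the same
way instead of the generic helper.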

Reviewed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 10 ++---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 --
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 -
 drivers/gpu/drm/msm/msm_drv.h|  8 +---
 drivers/gpu/drm/msm/msm_gem_vma.c| 51 +++-
 drivers/gpu/drm/msm/msm_gpu.c| 40 +--
 drivers/gpu/drm/msm/msm_gpu.h|  4 +-
 drivers/gpu/drm/msm/msm_iommu.c  |  3 ++
 16 files changed, 82 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc18d500..60f6472a3e58 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct msm_gpu *gpu)
return state;
 }
 
+static struct msm_gem_address_space *
+a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu);
+   struct msm_gem_address_space *aspace;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
+   SZ_16M + 0xfff * SZ_64K);
+
+   if (IS_ERR(aspace) && !IS_ERR(mmu))
+   mmu->funcs->destroy(mmu);
+
+   return aspace;
+}
+
 /* Register offset defines for A2XX - copy of A3XX */
 static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
@@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a2xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = a2xx_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index b67f88872726..0a5ea9f56cb8 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a3xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 253d8d85daad..b626afb0627d 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a4xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a4xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 724024a2243a..e81b1deaf535 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1439,6 +1439,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_busy = a5xx_gpu_busy,
.gpu_state_get = a5xx_gpu_state_get,
.gpu_state_put = a5xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a5xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 68af24150de5..02ade43d6335 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -847,6 +847,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
 #endif
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.

Re: [PATCH v1 2/6] arm/smmu: Add auxiliary domain support for arm-smmuv2

2020-03-19 Thread Jordan Crouse
On Wed, Mar 18, 2020 at 04:43:07PM -0700, Rob Clark wrote:
> On Wed, Mar 18, 2020 at 3:48 PM Will Deacon  wrote:
> >
> > On Tue, Jan 28, 2020 at 03:16:06PM -0700, Jordan Crouse wrote:
> > > Support auxiliary domains for arm-smmu-v2 to initialize and support
> > > multiple pagetables for a single SMMU context bank. Since the smmu-v2
> > > hardware doesn't have any built in support for switching the pagetable
> > > base it is left as an exercise to the caller to actually use the 
> > > pagetable.
> > >
> > > > Aux domains are supported if split pagetable (TTBR1) support has been
> > > > enabled on the master domain.  Each auxiliary domain will reuse the
> > > > configuration of the master domain. By default a domain with TTBR1
> > > > support will have the TTBR0 region disabled, so the first attached aux
> > > > domain will enable the TTBR0 region in the hardware and conversely the
> > > > last domain to be detached will disable TTBR0 translations.  All
> > > > subsequent auxiliary domains create a pagetable but do not touch the
> > > > hardware.
> > >
> > > The leaf driver will be able to query the physical address of the
> > > pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
> > > address with whatever means it has to switch the pagetable base.
> > >
> > > Following is a pseudo code example of how a domain can be created
> > >
> > >  /* Check to see if aux domains are supported */
> > >  if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
> > > >domain = iommu_domain_alloc(...);
> > > >
> > > >if (iommu_aux_attach_device(domain, dev))
> > > >return FAIL;
> > > >
> > > >   /* Save the base address of the pagetable for use by the driver */
> > > >   iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
> > >  }
> >
> > I'm not really understanding what the pagetable base gets used for here and,
> > to be honest with you, the whole thing feels like a huge layering violation
> > with the way things are structured today. Why doesn't the caller just
> > interface with io-pgtable directly?
> >
> > Finally, if we need to support context-switching TTBR0 for a live domain
> > then that code really needs to live inside the SMMU driver because the
> > ASID and TLB management necessary to do that safely doesn't belong anywhere
> > else.
> 
> Hi Will,
> 
> We do in fact need live domain switching, that is really the whole
> point.  The GPU CP (command processor/parser) is directly updating
> TTBR0 and triggering TLB flush, asynchronously from the CPU.

Right. This is entirely done in hardware with a GPU that has complete access to
the context bank registers. All the driver does is send the PTBASE to the
command stream; see [1] and especially [2] (look for CP_SMMU_TABLE_UPDATE).

As for interacting with the io-pgtable directly I would love to do that but it
would need some new infrastructure to either pull the io-pgtable from the aux
domain or to create an io-pgtable ourselves and pass it for use by the aux
domain. I'm not sure if that is better for the layering violation.

> And I think the answer about ASID is easy (on current hw).. it must be zero[*].

Right now the GPU microcode still uses TLBIALL. I want to assign each new aux
domain its own ASID in the hopes that we could some day change that but for now
having a unique ASID doesn't help.

Jordan

[1] https://patchwork.freedesktop.org/patch/351089/
[2] https://patchwork.freedesktop.org/patch/351090/

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [Freedreno] [PATCH v5 0/5] iommu/arm-smmu: Split pagetable support for arm-smmu-v2

2020-02-27 Thread Jordan Crouse
On Tue, Jan 28, 2020 at 03:00:14PM -0700, Jordan Crouse wrote:
> This is another iteration for the split pagetable support based on the
> suggestions from Robin and Will [1].
> 
> Background: In order to support per-context pagetables the GPU needs to enable
> split tables so that we can store global buffers in the TTBR1 space leaving 
> the
> GPU free to program the TTBR0 register with the address of a context specific
> pagetable.
> 
> If the DOMAIN_ATTR_SPLIT_TABLES attribute is set on the domain before
> attaching, the context bank assigned to the domain will be programmed to
> allow translations in the TTBR1 space. Translations in the TTBR0 region
> will be disallowed because, as Robin pointed out, having an un-programmed
> TTBR0 register is dangerous.
> 
> The driver can determine if TTBR1 was successfully programmed by querying
> DOMAIN_ATTR_SPLIT_TABLES after attaching. The domain geometry will also be
> updated to reflect the virtual address space for the TTBR1 range.
> 
> Upcoming changes will allow auxiliary domains to be attached to the device 
> which
> will enable and program TTBR0.
> 
> This patchset is based on top of linux-next-20200127.

Quick ping for feedback so I can respin for (maybe?) 5.6.

Thanks,
Jordan

> Change log:
> 
> v4: Only program TTBR1 when split pagetables are requested. TTBR0 will be
> enabled later when an auxiliary domain is attached
> v3: Remove the implementation specific and make split pagetable support
> part of the generic configuration
> 
> [1] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041373.html
> 
> Jordan Crouse (5):
>   iommu: Add DOMAIN_ATTR_SPLIT_TABLES
>   iommu/arm-smmu: Add support for TTBR1
>   drm/msm: Attach the IOMMU device during initialization
>   drm/msm: Refactor address space initialization
>   drm/msm/a6xx: Support split pagetables
> 
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 51 
> 
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 18 ---
>  drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 18 +--
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 18 +--
>  drivers/gpu/drm/msm/msm_drv.h|  8 ++---
>  drivers/gpu/drm/msm/msm_gem_vma.c| 36 --
>  drivers/gpu/drm/msm/msm_gpu.c| 49 ++
>  drivers/gpu/drm/msm/msm_gpu.h|  4 +--
>  drivers/gpu/drm/msm/msm_gpummu.c |  6 
>  drivers/gpu/drm/msm/msm_iommu.c  | 18 ++-
>  drivers/gpu/drm/msm/msm_mmu.h|  1 -
>  drivers/iommu/arm-smmu.c | 48 +-
>  drivers/iommu/arm-smmu.h | 22 ++
>  include/linux/iommu.h|  2 ++
>  21 files changed, 198 insertions(+), 155 deletions(-)
> 
> -- 
> 2.7.4
> ___
> Freedreno mailing list
> freedr...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [RFC PATCH v1] iommu/arm-smmu: Allow domains to choose a context bank

2020-02-18 Thread Jordan Crouse
On Tue, Feb 18, 2020 at 10:19:53AM -0800, Rob Clark wrote:
> On Tue, Jan 28, 2020 at 2:34 PM Jordan Crouse  wrote:
> >
> > Domains which are being set up for split pagetables usually want to be
> > on a specific context bank for hardware reasons. Force the context
> > bank for domains with the split-pagetable quirk to context bank 0.
> > If context bank 0 is taken, move that context bank to another unused
> > bank and rewrite the stream matching registers accordingly.
> 
> Is the only reason for dealing with the case that bank 0 is already in
> use, due to the DMA domain that gets setup before driver probes?

Right. On Adreno GPUs only one context bank at a time is accessible from the
GPU through an aperture which defaults to context bank 0 and as you might
expect, the aperture controls are protected by the secure world on AC enabled
targets.

Some of the newer targets have a SCM call to switch the aperture but for all the
currently merged platforms we are forced to use context bank 0.

> I'm kinda thinking that we need to invent a way to unwind/detach the
> DMA domain, and unhook the iommu-dmaops, since this seems to already
> be causing problems with dma-bufs imported from other drivers
> (who expect that dma_map_*(), with the importing device's dev ptr,
> will do something sane).

That could work, assuming that we could guarantee that our new replacement
domain got the context bank we wanted.

Jordan

> >
> > This is used by [1] and [2] to leave context bank 0 open so that
> > the Adreno GPU can program it.
> >
> > [1] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041438.html
> > [2] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041444.html
> >
> > Signed-off-by: Jordan Crouse 
> > ---
> >
> >  drivers/iommu/arm-smmu.c | 63 
> > +---
> >  1 file changed, 59 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 85a6773..799a254 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -254,6 +254,43 @@ static int __arm_smmu_alloc_bitmap(unsigned long *map, 
> > int start, int end)
> > return idx;
> >  }
> >
> > +static void arm_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx);
> > +
> > +static int __arm_smmu_alloc_cb(struct arm_smmu_device *smmu, int start,
> > +   int target)
> > +{
> > +   int new, i;
> > +
> > +   /* Allocate a new context bank id */
> > +   new = __arm_smmu_alloc_bitmap(smmu->context_map, start,
> > +   smmu->num_context_banks);
> > +
> > +   if (new < 0)
> > +   return new;
> > +
> > +   /* If no target is set or we actually got the bank index we wanted 
> > */
> > +   if (target == -1 || new == target)
> > +   return new;
> > +
> > +   /* Copy the context configuration to the new index */
> > +   memcpy(&smmu->cbs[new], &smmu->cbs[target], sizeof(*smmu->cbs));
> > +   smmu->cbs[new].cfg->cbndx = new;
> > +
> > +   /* FIXME: Do we need locking here? */
> > +   for (i = 0; i < smmu->num_mapping_groups; i++) {
> > +   if (smmu->s2crs[i].cbndx == target) {
> > +   smmu->s2crs[i].cbndx = new;
> > +   arm_smmu_write_s2cr(smmu, i);
> > +   }
> > +   }
> > +
> > +   /*
> > +* FIXME: Does getting here imply that 'target' is already set in 
> > the
> > +* context_map?
> > +*/
> > +   return target;
> > +}
> > +
> >  static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
> >  {
> > clear_bit(idx, map);
> > @@ -770,6 +807,7 @@ static int arm_smmu_init_domain_context(struct 
> > iommu_domain *domain,
> > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> > unsigned long quirks = 0;
> > +   int forcecb = -1;
> >
> > mutex_lock(&smmu_domain->init_mutex);
> > if (smmu_domain->smmu)
> > @@ -844,8 +882,25 @@ static int arm_smmu_init_domain_context(struct 
> > iommu_domain *domain,
> >  * SEP_UPSTREAM so we don't need to reduce the size 
> > of
> >  * the ias to account for the sign extension bit
> >  

[RFC PATCH v1] iommu/arm-smmu: Allow domains to choose a context bank

2020-01-28 Thread Jordan Crouse
Domains which are being set up for split pagetables usually want to be
on a specific context bank for hardware reasons. Force the context
bank for domains with the split-pagetable quirk to context bank 0.
If context bank 0 is taken, move that context bank to another unused
bank and rewrite the stream matching registers accordingly.
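
The relocation described above can be sketched as a small user-space model.
This is a simplification under stated assumptions: struct fake_smmu, its
fixed-size arrays and both helpers are hypothetical, a bank that is in use is
assumed to already be marked in the map, and the locking and context_map
questions raised in the patch's FIXME comments are deliberately not answered
here. No hardware registers are written.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CB  4  /* hypothetical number of context banks */
#define NUM_SMR 4  /* hypothetical number of stream mapping groups */

struct fake_smmu {
	bool cb_used[NUM_CB];     /* stands in for the context bank bitmap */
	int cb_owner[NUM_CB];     /* stands in for the per-bank configuration */
	int s2cr_cbndx[NUM_SMR];  /* which bank each stream currently maps to */
};

/* Claim the first free bank at or after 'start', or return -1 if none. */
static int alloc_free_cb(struct fake_smmu *s, int start)
{
	for (int i = start; i < NUM_CB; i++) {
		if (!s->cb_used[i]) {
			s->cb_used[i] = true;
			return i;
		}
	}
	return -1;
}

/*
 * Allocate a bank. If a 'target' bank is requested (>= 0) and is already
 * occupied, move its occupant to the newly claimed bank, repoint the
 * streams that used it, and hand the target bank to the caller.
 */
static int alloc_cb(struct fake_smmu *s, int start, int target)
{
	int new_idx;

	assert(target < NUM_CB);

	new_idx = alloc_free_cb(s, start);
	if (new_idx < 0 || target < 0 || new_idx == target)
		return new_idx;

	/* Copy the occupant's configuration to its new home. */
	s->cb_owner[new_idx] = s->cb_owner[target];

	/* Redirect every stream that pointed at the target bank. */
	for (int i = 0; i < NUM_SMR; i++) {
		if (s->s2cr_cbndx[i] == target)
			s->s2cr_cbndx[i] = new_idx;
	}

	return target;
}
```

In the real driver the relocated context bank and the stream-to-context
registers would also have to be rewritten in hardware, which this model
leaves out.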

This is used by [1] and [2] to leave context bank 0 open so that
the Adreno GPU can program it.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041438.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041444.html

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 63 +---
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 85a6773..799a254 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -254,6 +254,43 @@ static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
return idx;
 }
 
+static void arm_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx);
+
+static int __arm_smmu_alloc_cb(struct arm_smmu_device *smmu, int start,
+   int target)
+{
+   int new, i;
+
+   /* Allocate a new context bank id */
+   new = __arm_smmu_alloc_bitmap(smmu->context_map, start,
+   smmu->num_context_banks);
+
+   if (new < 0)
+   return new;
+
+   /* If no target is set or we actually got the bank index we wanted */
+   if (target == -1 || new == target)
+   return new;
+
+   /* Copy the context configuration to the new index */
+   memcpy(&smmu->cbs[new], &smmu->cbs[target], sizeof(*smmu->cbs));
+   smmu->cbs[new].cfg->cbndx = new;
+
+   /* FIXME: Do we need locking here? */
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   if (smmu->s2crs[i].cbndx == target) {
+   smmu->s2crs[i].cbndx = new;
+   arm_smmu_write_s2cr(smmu, i);
+   }
+   }
+
+   /*
+* FIXME: Does getting here imply that 'target' is already set in the
+* context_map?
+*/
+   return target;
+}
+
 static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 {
clear_bit(idx, map);
@@ -770,6 +807,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
unsigned long quirks = 0;
+   int forcecb = -1;
 
mutex_lock(&smmu_domain->init_mutex);
if (smmu_domain->smmu)
@@ -844,8 +882,25 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 * SEP_UPSTREAM so we don't need to reduce the size of
 * the ias to account for the sign extension bit
 */
-   if (smmu_domain->split_pagetables)
-   quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+   if (smmu_domain->split_pagetables) {
+   /*
+* If split pagetables are enabled, assume that
+* the user's intent is to use per-instance
+* pagetables which, at least on a QCOM target,
+* means that this domain should be on context
+* bank 0.
+*/
+
+   /*
+* If we can't force to context bank 0 then
+* don't bother enabling split pagetables which
+* then would not allow aux domains
+*/
+   if (start == 0) {
+   forcecb = 0;
+   quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+   }
+   }
} else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
fmt = ARM_32_LPAE_S1;
ias = min(ias, 32UL);
@@ -883,8 +938,8 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
ret = -EINVAL;
goto out_unlock;
}
-   ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
- smmu->num_context_banks);
+
+   ret = __arm_smmu_alloc_cb(smmu, start, forcecb);
if (ret < 0)
goto out_unlock;
 
-- 
2.7.4
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v1 4/6] drm/msm: Add support to create target specific address spaces

2020-01-28 Thread Jordan Crouse
Add support for creating a GPU target-specific address space for
a context. Targets that support per-instance pagetables return a
new address space set up for the instance if possible; otherwise
the context falls back to the global device pagetable.
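
The fallback described above can be sketched outside the kernel as
follows. This is only an illustration: the struct names are simplified,
hypothetical stand-ins for the msm structures, and only the selection
logic mirrors the patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the msm structures */
struct address_space {
	const char *name;
};

struct gpu;

struct gpu_funcs {
	/* Optional hook: NULL on targets without per-instance pagetables */
	struct address_space *(*create_instance_space)(struct gpu *gpu);
};

struct gpu {
	const struct gpu_funcs *funcs;
	struct address_space *aspace;	/* global device address space */
};

/* Pick the address space for a new context */
static struct address_space *context_address_space(struct gpu *gpu)
{
	if (!gpu)
		return NULL;

	/* Let the target create an instance space (or fall back itself) */
	if (gpu->funcs->create_instance_space)
		return gpu->funcs->create_instance_space(gpu);

	/* No hook: every context shares the global space */
	return gpu->aspace;
}
```

Keeping the hook optional means targets that never set it keep using
the global address space without any change.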

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_drv.c | 22 +++---
 drivers/gpu/drm/msm/msm_gpu.h |  2 ++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index e4b750b..e485dc1 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -585,6 +585,18 @@ static void load_gpu(struct drm_device *dev)
mutex_unlock(&init_lock);
 }
 
+static struct msm_gem_address_space *context_address_space(struct msm_gpu *gpu)
+{
+   if (!gpu)
+   return NULL;
+
+   if (gpu->funcs->create_instance_space)
+   return gpu->funcs->create_instance_space(gpu);
+
+   /* If all else fails use the default global space */
+   return gpu->aspace;
+}
+
 static int context_init(struct drm_device *dev, struct drm_file *file)
 {
struct msm_drm_private *priv = dev->dev_private;
@@ -596,7 +608,7 @@ static int context_init(struct drm_device *dev, struct 
drm_file *file)
 
msm_submitqueue_init(dev, ctx);
 
-   ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
+   ctx->aspace = context_address_space(priv->gpu);
file->driver_priv = ctx;
 
return 0;
@@ -612,8 +624,12 @@ static int msm_open(struct drm_device *dev, struct 
drm_file *file)
return context_init(dev, file);
 }
 
-static void context_close(struct msm_file_private *ctx)
+static void context_close(struct msm_drm_private *priv,
+   struct msm_file_private *ctx)
 {
+   if (priv->gpu && ctx->aspace != priv->gpu->aspace)
+   msm_gem_address_space_put(ctx->aspace);
+
msm_submitqueue_close(ctx);
kfree(ctx);
 }
@@ -628,7 +644,7 @@ static void msm_postclose(struct drm_device *dev, struct 
drm_file *file)
priv->lastctx = NULL;
mutex_unlock(&dev->struct_mutex);
 
-   context_close(ctx);
+   context_close(priv, ctx);
 }
 
 static irqreturn_t msm_irq(int irq, void *arg)
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index d496b68..76636da 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -64,6 +64,8 @@ struct msm_gpu_funcs {
void (*gpu_set_freq)(struct msm_gpu *gpu, unsigned long freq);
struct msm_gem_address_space *(*create_address_space)
(struct msm_gpu *gpu, struct platform_device *pdev);
+   struct msm_gem_address_space *(*create_instance_space)
+   (struct msm_gpu *gpu);
 };
 
 struct msm_gpu {
-- 
2.7.4


[PATCH v1 1/6] iommu: Add DOMAIN_ATTR_PTBASE

2020-01-28 Thread Jordan Crouse
Add an attribute to return the base address of the pagetable. This is used
by auxiliary domains from arm-smmu to return the address of the pagetable
to the domain so that it can set the appropriate pagetable through its
own means.

Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b14398b..0e9bcd9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -128,6 +128,8 @@ enum iommu_attr {
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
/* Enable split pagetables (for example, TTBR1 on arm-smmu) */
DOMAIN_ATTR_SPLIT_TABLES,
+   /* Return the pagetable base address */
+   DOMAIN_ATTR_PTBASE,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4


[PATCH v1 5/6] drm/msm/gpu: Add ttbr0 to the memptrs

2020-01-28 Thread Jordan Crouse
Targets that support per-instance pagetable switching have to keep
track of which pagetable belongs to each instance so that it can be
restored after preemption.
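
As a rough illustration of how the new field would be addressed from
the GPU side, the sketch below computes the GPU address of a memptrs
member from a base address plus offsetof(). The struct is a trimmed,
hypothetical copy (the stats array is omitted) and the RBMEMPTR macro
and memptrs_iova field are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-in for struct msm_rbmemptrs (stats array omitted) */
struct rbmemptrs {
	volatile uint32_t rptr;
	volatile uint32_t fence;
	volatile uint64_t ttbr0;
};

/* Hypothetical ring: only the GPU address of its memptrs block */
struct ring {
	uint64_t memptrs_iova;
};

/* GPU address of one member of the ring's memptrs block */
#define RBMEMPTR(ring, member) \
	((ring)->memptrs_iova + offsetof(struct rbmemptrs, member))
```

With two 32-bit members in front of it, ttbr0 sits at offset 8 and is
naturally aligned for a 64-bit write.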

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_ringbuffer.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h 
b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 7764373..c5822bd 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -29,6 +29,7 @@ struct msm_gpu_submit_stats {
 struct msm_rbmemptrs {
volatile uint32_t rptr;
volatile uint32_t fence;
+   volatile uint64_t ttbr0;
 
volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
 };
-- 
2.7.4


[PATCH v1 6/6] drm/msm/a6xx: Support per-instance pagetables

2020-01-28 Thread Jordan Crouse
Add support for per-instance pagetables for a6xx targets. Handle split
pagetables, create a new instance if the needed IOMMU support exists,
and insert the necessary PM4 commands to trigger a pagetable switch at
the beginning of a user command.
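
The value handed to the table update is the pagetable base with the
ASID placed in the upper bits, split into two 32-bit dwords. A minimal
user-space sketch of that packing is below; the helper names are made
up for the example and only the shift by 48 comes from the patch.

```c
#include <stdint.h>

/* Combine a pagetable base and ASID: ASID goes in bits 63:48 */
static uint64_t pack_ttbr0(uint64_t ptbase, uint32_t asid)
{
	return ptbase | ((uint64_t)asid << 48);
}

/* Split a 64-bit value into the two dwords written to the ring */
static uint32_t lower_32(uint64_t v)
{
	return (uint32_t)v;
}

static uint32_t upper_32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}
```

This assumes the pagetable base never uses bits 63:48, otherwise the
OR would corrupt the ASID field.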

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 89 +++
 1 file changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 9bec603c..e1a257e 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -12,6 +12,62 @@
 
 #define GPU_PAS_ID 13
 
+static void a6xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer 
*ring,
+   struct msm_file_private *ctx)
+{
+   u64 ttbr;
+   u32 asid;
+
+   if (!msm_iommu_get_ptinfo(ctx->aspace->mmu, &ttbr, &asid))
+   return;
+
+   ttbr = ttbr | ((u64) asid) << 48;
+
+   /* Turn off protected mode */
+   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn on APRIV mode to access critical regions */
+   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
+   OUT_RING(ring, 1);
+
+   /* Make sure the ME is synchronized before starting the update */
+   OUT_PKT7(ring, CP_WAIT_FOR_ME, 0);
+
+   /* Execute the table update */
+   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, upper_32_bits(ttbr));
+   /* CONTEXTIDR is currently unused */
+   OUT_RING(ring, 0);
+   /* CONTEXTBANK is currently unused */
+   OUT_RING(ring, 0);
+
+   /*
+* Write the new TTBR0 to the preemption records - this will be used to
+* reload the pagetable if the current ring gets preempted out.
+*/
+   OUT_PKT7(ring, CP_MEM_WRITE, 4);
+   OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
+   OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, upper_32_bits(ttbr));
+
+   /* Invalidate the draw state so we start off fresh */
+   OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
+   OUT_RING(ring, 0x4);
+   OUT_RING(ring, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn off APRIV */
+   OUT_PKT4(ring, REG_A6XX_CP_MISC_CNTL, 1);
+   OUT_RING(ring, 0);
+
+   /* Turn protected mode back on */
+   OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+   OUT_RING(ring, 1);
+}
+
 static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -89,6 +145,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit,
struct msm_ringbuffer *ring = submit->ring;
unsigned int i;
 
+   a6xx_set_pagetable(gpu, ring, ctx);
+
get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
rbmemptr_stats(ring, index, cpcycles_start));
 
@@ -878,6 +936,36 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space*
+a6xx_create_instance_space(struct msm_gpu *gpu)
+{
+   struct msm_gem_address_space *aspace;
+   struct iommu_domain *iommu;
+   struct msm_mmu *mmu;
+
+   if (!iommu_dev_has_feature(&gpu->pdev->dev, IOMMU_DEV_FEAT_AUX))
+   return gpu->aspace;
+
+   iommu = iommu_domain_alloc(&platform_bus_type);
+   if (!iommu)
+   return gpu->aspace;
+
+   mmu = msm_iommu_new_instance(&gpu->pdev->dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return gpu->aspace;
+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu",
+   0x1ULL, 0x1ULL);
+   if (IS_ERR(aspace)) {
+   mmu->funcs->destroy(mmu);
+   return gpu->aspace;
+   }
+
+   return aspace;
+}
+
 static struct msm_gem_address_space *
 a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
 {
@@ -951,6 +1039,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_put = a6xx_gpu_state_put,
 #endif
.create_address_space = a6xx_create_address_space,
+   .create_instance_space = a6xx_create_instance_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
-- 
2.7.4


[PATCH v1 3/6] drm/msm/adreno: Add support for IOMMU auxiliary domains

2020-01-28 Thread Jordan Crouse
Add support for creating an auxiliary domain from the IOMMU device to
implement per-instance pagetables. Also add a helper function to
return the pagetable base address (ttbr) and asid to the caller so
that the GPU target code can set up the pagetable switch.
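
For reference, the ASID assignment used for new instances behaves like
the small wrap-around counter below. This is a stand-alone sketch; the
macro and function names are invented for the example, while the range
(32 through 0xff) follows the patch.

```c
#define ASID_FIRST 32	/* leave a cushion for real context banks */
#define ASID_LAST  0xff

static int next_asid = ASID_FIRST;

/* Hand out the next ASID, wrapping back to ASID_FIRST after ASID_LAST */
static int alloc_asid(void)
{
	int asid = next_asid++;

	if (next_asid > ASID_LAST)
		next_asid = ASID_FIRST;

	return asid;
}
```

Note that a counter like this does not track which ASIDs are still in
use, so after 224 allocations a value can be handed out again while an
older instance still holds it.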

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/msm_iommu.c | 72 +
 drivers/gpu/drm/msm/msm_mmu.h   |  3 ++
 2 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index e773ef8..df0d70a 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -7,9 +7,17 @@
 #include "msm_drv.h"
 #include "msm_mmu.h"
 
+/*
+ * It is up to us to assign ASIDS for our instances. Start at 32 to give a
+ * cushion to account for ASIDS assigned to real context banks
+ */
+static int msm_iommu_asid = 32;
+
 struct msm_iommu {
struct msm_mmu base;
struct iommu_domain *domain;
+   u64 ttbr;
+   int asid;
 };
 #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
 
@@ -58,6 +66,20 @@ static void msm_iommu_destroy(struct msm_mmu *mmu)
kfree(iommu);
 }
 
+static void msm_iommu_aux_detach(struct msm_mmu *mmu)
+{
+   struct msm_iommu *iommu = to_msm_iommu(mmu);
+
+   iommu_aux_detach_device(iommu->domain, mmu->dev);
+}
+
+static const struct msm_mmu_funcs aux_funcs = {
+   .detach = msm_iommu_aux_detach,
+   .map = msm_iommu_map,
+   .unmap = msm_iommu_unmap,
+   .destroy = msm_iommu_destroy,
+};
+
 static const struct msm_mmu_funcs funcs = {
.detach = msm_iommu_detach,
.map = msm_iommu_map,
@@ -65,6 +87,56 @@ static const struct msm_mmu_funcs funcs = {
.destroy = msm_iommu_destroy,
 };
 
+bool msm_iommu_get_ptinfo(struct msm_mmu *mmu, u64 *ttbr, u32 *asid)
+{
+   struct msm_iommu *iommu = to_msm_iommu(mmu);
+
+   if (!iommu->ttbr)
+   return false;
+
+   if (ttbr)
+   *ttbr = iommu->ttbr;
+   if (asid)
+   *asid = iommu->asid;
+
+   return true;
+}
+
+struct msm_mmu *msm_iommu_new_instance(struct device *dev,
+   struct iommu_domain *domain)
+{
+   struct msm_iommu *iommu;
+   u64 ptbase;
+   int ret;
+
+   ret = iommu_aux_attach_device(domain, dev);
+   if (ret)
+   return ERR_PTR(ret);
+
+   ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
+   if (ret) {
+   iommu_aux_detach_device(domain, dev);
+   return ERR_PTR(ret);
+   }
+
+   iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
+   if (!iommu) {
+   iommu_aux_detach_device(domain, dev);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   iommu->domain = domain;
+   iommu->ttbr = ptbase;
+   iommu->asid = msm_iommu_asid++;
+
+   if (msm_iommu_asid > 0xff)
+   msm_iommu_asid = 32;
+
+   msm_mmu_init(&iommu->base, dev, &aux_funcs);
+
+   return &iommu->base;
+}
+
 struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain)
 {
struct msm_iommu *iommu;
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index bae9e8e..65a5cb2 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -32,6 +32,9 @@ static inline void msm_mmu_init(struct msm_mmu *mmu, struct 
device *dev,
 }
 
 struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain);
+struct msm_mmu *msm_iommu_new_instance(struct device *dev,
+   struct iommu_domain *domain);
+bool msm_iommu_get_ptinfo(struct msm_mmu *mmu, u64 *ttbr, u32 *asid);
 struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu);
 
 static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
-- 
2.7.4


[PATCH v1 2/6] arm/smmu: Add auxiliary domain support for arm-smmuv2

2020-01-28 Thread Jordan Crouse
Support auxiliary domains for arm-smmu-v2 to initialize and support
multiple pagetables for a single SMMU context bank. Since the smmu-v2
hardware doesn't have any built in support for switching the pagetable
base it is left as an exercise to the caller to actually use the pagetable.

Aux domains are supported if split pagetable (TTBR1) support has been
enabled on the master domain.  Each auxiliary domain will reuse the
configuration of the master domain. By default a domain with TTBR1
support has the TTBR0 region disabled, so the first attached aux
domain enables the TTBR0 region in the hardware and, conversely, the
last domain to be detached disables TTBR0 translations.  All subsequent
auxiliary domains create a pagetable but do not touch the hardware.
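
The first-attach / last-detach rule above amounts to a reference count
on the context bank. A simplified model, with a hypothetical struct
and plain integers instead of the atomic_t counters used in the
driver, might look like this:

```c
#include <stdbool.h>

/* Hypothetical model of the per-context-bank aux state */
struct context_bank {
	int aux_count;		/* number of attached aux domains */
	bool ttbr0_enabled;	/* whether TTBR0 translations are on */
};

/* Only the first aux domain enables the TTBR0 region */
static void aux_attach(struct context_bank *cb)
{
	if (cb->aux_count++ == 0)
		cb->ttbr0_enabled = true;
}

/* Only the last aux domain disables the TTBR0 region again */
static void aux_detach(struct context_bank *cb)
{
	if (cb->aux_count > 0 && --cb->aux_count == 0)
		cb->ttbr0_enabled = false;
}
```

Domains attached in between only change the count, which matches the
statement that they do not touch the hardware.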

The leaf driver will be able to query the physical address of the
pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
address with whatever means it has to switch the pagetable base.

Following is a pseudo code example of how a domain can be created:

 /* Check to see if aux domains are supported */
 if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
         domain = iommu_domain_alloc(...);

         if (iommu_aux_attach_device(domain, dev))
                 return FAIL;

         /* Save the base address of the pagetable for use by the driver */
         iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
 }

Then 'domain' can be used like any other iommu domain to map and
unmap iova addresses in the pagetable.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 230 +++
 drivers/iommu/arm-smmu.h |   3 +
 2 files changed, 217 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 23b22fa..85a6773 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -91,6 +91,8 @@ struct arm_smmu_cb {
u32 tcr[2];
u32 mair[2];
struct arm_smmu_cfg *cfg;
+   atomic_t aux;
+   atomic_t refcount;
 };
 
 struct arm_smmu_master_cfg {
@@ -533,6 +535,7 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
 
+   atomic_inc(&cb->refcount);
cb->cfg = cfg;
 
/* TCR */
@@ -671,6 +674,91 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
+/*
+ * Update the context bank to enable TTBR0. Assumes AARCH64 S1
+ * configuration.
+ */
+static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   u32 tcr = cb->tcr[0];
+
+   /* Add the TCR configuration from the new pagetable config */
+   tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+
+   /* Make sure that both TTBR0 and TTBR1 are enabled */
+   tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+   /* Update the TCR register */
+   cb->tcr[0] = tcr;
+
+   /* Program the new TTBR0 */
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+}
+
+/*
+ * This function assumes that the current model only allows aux domains for
+ * AARCH64 S1 configurations
+ */
+static int arm_smmu_aux_init_domain_context(struct iommu_domain *domain,
+   struct arm_smmu_device *smmu, struct arm_smmu_cfg *master)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *pgtbl_ops;
+   struct io_pgtable_cfg pgtbl_cfg;
+
+   mutex_lock(&smmu_domain->init_mutex);
+
+   /* Copy the configuration from the master */
+   memcpy(&smmu_domain->cfg, master, sizeof(smmu_domain->cfg));
+
+   smmu_domain->flush_ops = &arm_smmu_s1_tlb_ops;
+   smmu_domain->smmu = smmu;
+
+   pgtbl_cfg = (struct io_pgtable_cfg) {
+   .pgsize_bitmap = smmu->pgsize_bitmap,
+   .ias = smmu->va_size,
+   .oas = smmu->ipa_size,
+   .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
+   .tlb = smmu_domain->flush_ops,
+   .iommu_dev = smmu->dev,
+   .quirks = 0,
+   };
+
+   if (smmu_domain->non_strict)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
+   pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &pgtbl_cfg,
+   smmu_domain);
+   if (!pgtbl_ops) {
+   mutex_unlock(&smmu_domain->init_mutex);
+   return -ENOMEM;
+   }
+
+   domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
+
+   d

[PATCH v1 0/6] iommu/arm-smmu: Auxiliary domain and per instance pagetables

2020-01-28 Thread Jordan Crouse
Some clients have a requirement to sandbox memory mappings for security and
advanced features like SVM. This series adds support to enable per-instance
pagetables as auxiliary domains in the arm-smmu driver and adds per-instance
support for the Adreno GPU.

This patchset builds on the split pagetable support from [1]. In that series the
TTBR1 address space is programmed for the default ("master") domain and enables
support for auxiliary domains. Each new auxiliary domain will allocate a
pagetable which the leaf driver can program through the usual IOMMU APIs. It can
also query the physical address of the pagetable.

In the SMMU driver the first auxiliary domain will enable and program the TTBR0
space. Subsequent auxiliary domains won't touch the hardware. Similarly when
the last auxiliary domain is detached the TTBR0 region will be disabled again.

In the Adreno driver each new file descriptor instance will create a new
auxiliary domain / pagetable and use it for all the memory allocations of that
instance. The driver will query the base address of each pagetable and switch
them dynamically using the built-in table switch capability of the GPU. If any
of these features fail the driver will automatically fall back to using the
default (global) pagetable.

This patchset had previously been submitted as [2] but has been significantly
modified since then.

Jordan

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041438.html
[2] https://patchwork.freedesktop.org/series/57441/


Jordan Crouse (6):
  iommu: Add DOMAIN_ATTR_PTBASE
  arm/smmu: Add auxiliary domain support for arm-smmuv2
  drm/msm/adreno: Add support for IOMMU auxiliary domains
  drm/msm: Add support to create target specific address spaces
  drm/msm/gpu: Add ttbr0 to the memptrs
  drm/msm/a6xx: Support per-instance pagetables

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  89 +
 drivers/gpu/drm/msm/msm_drv.c |  22 +++-
 drivers/gpu/drm/msm/msm_gpu.h |   2 +
 drivers/gpu/drm/msm/msm_iommu.c   |  72 +++
 drivers/gpu/drm/msm/msm_mmu.h |   3 +
 drivers/gpu/drm/msm/msm_ringbuffer.h  |   1 +
 drivers/iommu/arm-smmu.c  | 230 +++---
 drivers/iommu/arm-smmu.h  |   3 +
 include/linux/iommu.h |   2 +
 9 files changed, 405 insertions(+), 19 deletions(-)

-- 
2.7.4


[PATCH v5 4/5] drm/msm: Refactor address space initialization

2020-01-28 Thread Jordan Crouse
Refactor how address space initialization works. Instead of having the
address space function create the MMU object (and thus require separate but
equal functions for gpummu and iommu) use a single function and pass the
MMU struct in. Make the generic code cleaner by using target specific
functions to create the address space so a2xx can do its own thing in its
own space.  For all the other targets use a generic helper to initialize
IOMMU but leave the door open for newer targets to use customization
if they need it.

Reviewed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 10 +++
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 +--
 drivers/gpu/drm/msm/msm_drv.h|  8 ++---
 drivers/gpu/drm/msm/msm_gem_vma.c| 51 
 drivers/gpu/drm/msm/msm_gpu.c| 40 ++---
 drivers/gpu/drm/msm/msm_gpu.h|  4 +--
 drivers/gpu/drm/msm/msm_iommu.c  |  3 ++
 16 files changed, 82 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc1..60f6472 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct 
msm_gpu *gpu)
return state;
 }
 
+static struct msm_gem_address_space *
+a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu);
+   struct msm_gem_address_space *aspace;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
+   SZ_16M + 0xfff * SZ_64K);
+
+   if (IS_ERR(aspace) && !IS_ERR(mmu))
+   mmu->funcs->destroy(mmu);
+
+   return aspace;
+}
+
 /* Register offset defines for A2XX - copy of A3XX */
 static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
@@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a2xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = a2xx_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index b67f888..0a5ea9f 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a3xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 253d8d8..b626afb 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a4xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a4xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 7d9e63e..47672dc 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1439,6 +1439,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_busy = a5xx_gpu_busy,
.gpu_state_get = a5xx_gpu_state_get,
.gpu_state_put = a5xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a5xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index daf0780..a2c5412 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -900,6 +900,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
 #endif
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/dr

[PATCH v5 3/5] drm/msm: Attach the IOMMU device during initialization

2020-01-28 Thread Jordan Crouse
Everywhere an IOMMU object is created by msm_gpu_create_address_space
the IOMMU device is attached immediately after. Instead of carrying around
the infrastructure to do the attach from the device specific code do it
directly in the msm_iommu_init() function. This gets it out of the way for
more aggressive cleanups that follow.

Reviewed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c |  4 
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  7 ---
 drivers/gpu/drm/msm/msm_gem_vma.c| 23 +++
 drivers/gpu/drm/msm/msm_gpu.c| 11 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 --
 drivers/gpu/drm/msm/msm_iommu.c  | 15 +++
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 8 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index cb08faf..4fd4ded 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -704,7 +704,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 {
struct iommu_domain *domain;
struct msm_gem_address_space *aspace;
-   int ret;
 
domain = iommu_domain_alloc(&platform_bus_type);
if (!domain)
@@ -720,13 +719,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
return PTR_ERR(aspace);
}
 
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DPU_ERROR("failed to attach iommu %d\n", ret);
-   msm_gem_address_space_put(aspace);
-   return ret;
-   }
-
dpu_kms->base.aspace = aspace;
return 0;
 }
diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c 
b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
index dda0543..9dba37c 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
@@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret)
-   goto fail;
} else {
DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys "
"contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index e43ecd4..653dab2 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -736,13 +736,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DRM_DEV_ERROR(&pdev->dev, "failed to attach iommu: 
%d\n",
-   ret);
-   goto fail;
-   }
} else {
DRM_DEV_INFO(&pdev->dev,
 "no iommu, fallback to phys contig buffers for 
scanout\n");
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
b/drivers/gpu/drm/msm/msm_gem_vma.c
index 1af5354..91d993a 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct 
iommu_domain *domain,
const char *name)
 {
struct msm_gem_address_space *aspace;
-   u64 size = domain->geometry.aperture_end -
-   domain->geometry.aperture_start;
+   u64 start = domain->geometry.aperture_start;
+   u64 size = domain->geometry.aperture_end - start;
 
aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
if (!aspace)
@@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct 
iommu_domain *domain,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_iommu_new(dev, domain);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
 
-   drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> 
PAGE_SHIFT),
-   size >> PAGE_SHIFT);
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
+
+   /*
+* Attaching the IOMMU device changes the aperture values so use the
+* cached values instead
+*/
+   drm_mm_init(&aspace->mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 
kref_init(&aspace->kref);
 
@@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, 
struct msm_gpu *gpu,
spin_lock_init(&aspace->lock);
aspace->name =

[PATCH v5 5/5] drm/msm/a6xx: Support split pagetables

2020-01-28 Thread Jordan Crouse
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to
the address range assigned to TTBR1. The behavior should be transparent
to the driver for now but it gets the default buffers out of the way
when we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 ++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index a2c5412..9bec603c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -878,6 +878,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+   struct msm_gem_address_space *aspace;
+   struct msm_mmu *mmu;
+   u64 start, size;
+   u32 val = 1;
+   int ret;
+
+   if (!iommu)
+   return ERR_PTR(-ENOMEM);
+
+   /*
+* Try to request split pagetables - the request has to be made before
+* the domain is attached
+*/
+   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+   mmu = msm_iommu_new(&pdev->dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return ERR_CAST(mmu);
+   }
+
+   /*
+* After the domain is attached, see if the split tables were actually
+* successful.
+*/
+   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+   if (!ret && val) {
+   /*
+* The aperture start will be at the beginning of the TTBR1
+* space so use that as a base
+*/
+   start = iommu->geometry.aperture_start;
+   size = 0x;
+   } else {
+   /* Otherwise use the legacy 32 bit region */
+   start = SZ_16M;
+   size = 0x - SZ_16M;
+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+   if (IS_ERR(aspace))
+   iommu_domain_free(iommu);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -900,7 +950,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
 #endif
-   .create_address_space = adreno_iommu_create_address_space,
+   .create_address_space = a6xx_create_address_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
-- 
2.7.4


[PATCH v5 0/5] iommu/arm-smmu: Split pagetable support for arm-smmu-v2

2020-01-28 Thread Jordan Crouse
This is another iteration for the split pagetable support based on the
suggestions from Robin and Will [1].

Background: In order to support per-context pagetables the GPU needs to enable
split tables so that we can store global buffers in the TTBR1 space leaving the
GPU free to program the TTBR0 register with the address of a context specific
pagetable.

If the DOMAIN_ATTR_SPLIT_TABLES attribute is set on the domain before attaching,
the context bank assigned to the domain will be programmed to allow translations
in the TTBR1 space. Translations in the TTBR0 region will be disallowed because,
as Robin pointed out, having an unprogrammed TTBR0 register is dangerous.

The driver can determine if TTBR1 was successfully programmed by querying
DOMAIN_ATTR_SPLIT_TABLES after attaching. The domain geometry will also be
updated to reflect the virtual address space for the TTBR1 range.
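
As a rough illustration of the geometry update described above, here is a
minimal userspace sketch. The struct and function names are made up for the
example (they are not kernel API); only the arithmetic mirrors what the
arm-smmu patch does with the input address size (ias).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the aperture fields of the domain geometry. */
struct aperture {
	uint64_t start;
	uint64_t end;
};

/*
 * ias is the input address size in bits. With split tables the domain
 * covers the top of the address space (TTBR1), otherwise the bottom (TTBR0).
 */
static struct aperture domain_aperture(unsigned int ias, bool split)
{
	struct aperture ap;

	assert(ias > 0 && ias < 64);	/* a shift by 64 or more is undefined */

	if (split) {
		ap.start = ~UINT64_C(0) << ias;
		ap.end = ~UINT64_C(0);
	} else {
		ap.start = 0;
		ap.end = (UINT64_C(1) << ias) - 1;
	}
	return ap;
}
```

For a 48 bit ias the split case gives a region starting at
0xffff000000000000, which is the value a caller would pick up from
geometry.aperture_start after attaching.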

Upcoming changes will allow auxiliary domains to be attached to the device which
will enable and program TTBR0.

This patchset is based on top of linux-next-20200127.

Change log:

v4: Only program TTBR1 when split pagetables are requested. TTBR0 will be
enabled later when an auxiliary domain is attached
v3: Remove the implementation specific and make split pagetable support
part of the generic configuration

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041373.html

Jordan Crouse (5):
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add support for TTBR1
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 51 
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 18 ---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 18 +--
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 18 +--
 drivers/gpu/drm/msm/msm_drv.h|  8 ++---
 drivers/gpu/drm/msm/msm_gem_vma.c| 36 --
 drivers/gpu/drm/msm/msm_gpu.c| 49 ++
 drivers/gpu/drm/msm/msm_gpu.h|  4 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 
 drivers/gpu/drm/msm/msm_iommu.c  | 18 ++-
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 drivers/iommu/arm-smmu.c | 48 +-
 drivers/iommu/arm-smmu.h | 22 ++
 include/linux/iommu.h|  2 ++
 21 files changed, 198 insertions(+), 155 deletions(-)

-- 
2.7.4


[PATCH v5 2/5] iommu/arm-smmu: Add support for TTBR1

2020-01-28 Thread Jordan Crouse
Add support to enable TTBR1 if the domain requests it via the
DOMAIN_ATTR_SPLIT_TABLES attribute. If enabled by the hardware
and pagetable configuration the driver will configure the TTBR1 region
and program the domain pagetable on TTBR1. TTBR0 will be disabled.

After attaching the device, the value of the domain attribute can
be queried to see if the split pagetables were successfully programmed.
The domain geometry will be updated as well so that the caller can
determine the active region for the pagetable that was programmed.
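
The request / attach / query sequence can be modelled with a toy state
machine. Everything below is hypothetical (mock_domain and the mock_*
helpers are not kernel interfaces), and rejecting a request made after
attach is an assumption of the sketch rather than something this patch
states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a domain; not the kernel's struct iommu_domain. */
struct mock_domain {
	bool attached;
	bool split_requested;
	bool split_active;
};

/* Models setting DOMAIN_ATTR_SPLIT_TABLES: only honoured before attach. */
static int mock_request_split(struct mock_domain *d)
{
	assert(d != NULL);
	if (d->attached)
		return -1;	/* assumed behaviour: too late to ask */
	d->split_requested = true;
	return 0;
}

/* Models attach; hw_ok stands in for the hardware and pagetable format checks. */
static void mock_attach(struct mock_domain *d, bool hw_ok)
{
	assert(d != NULL);
	d->split_active = d->split_requested && hw_ok;
	d->attached = true;
}

/* Models querying the attribute after attach. */
static bool mock_split_enabled(const struct mock_domain *d)
{
	assert(d != NULL);
	return d->attached && d->split_active;
}
```

The point of the second query is that a request is only a request: the
caller has to read the attribute back to learn whether TTBR1 was really
programmed.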

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 48 +---
 drivers/iommu/arm-smmu.h | 22 --
 2 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87..23b22fa 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -557,11 +557,17 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
-   cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-cfg->asid);
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+cfg->asid);
+   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   } else {
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+ cfg->asid);
+   cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+cfg->asid);
+   }
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -675,6 +681,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
enum io_pgtable_fmt fmt;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   unsigned long quirks = 0;
 
mutex_lock(&smmu_domain->init_mutex);
if (smmu_domain->smmu)
@@ -743,6 +750,14 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
oas = smmu->ipa_size;
if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
fmt = ARM_64_LPAE_S1;
+
+   /*
+* We are assuming that split pagetables will always use
+* SEP_UPSTREAM so we don't need to reduce the size of
+* the ias to account for the sign extension bit
+*/
+   if (smmu_domain->split_pagetables)
+   quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
} else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
fmt = ARM_32_LPAE_S1;
ias = min(ias, 32UL);
@@ -812,6 +827,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
.tlb= smmu_domain->flush_ops,
.iommu_dev  = smmu->dev,
+   .quirks = quirks,
};
 
if (smmu_domain->non_strict)
@@ -825,8 +841,15 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
-   domain->geometry.force_aperture = true;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   domain->geometry.force_aperture = true;
+   smmu_domain->split_pagetables = false;
+   }
 
/* Initialise the context bank with our page table cfg */
arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg);
@@ -1523,6 +1546,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
 

[PATCH v5 1/5] iommu: Add DOMAIN_ATTR_SPLIT_TABLES

2020-01-28 Thread Jordan Crouse
Add a new attribute to enable and query the state of split pagetables
for the domain.

Acked-by: Will Deacon 
Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d1b5f4d..b14398b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,8 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   /* Enable split pagetables (for example, TTBR1 on arm-smmu) */
+   DOMAIN_ATTR_SPLIT_TABLES,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4


Re: [PATCH v3 2/5] iommu/arm-smmu: Add support for split pagetables

2020-01-21 Thread Jordan Crouse
On Tue, Jan 21, 2020 at 02:36:19PM +, Robin Murphy wrote:
> On 16/12/2019 4:37 pm, Jordan Crouse wrote:
> >Add support to enable split pagetables (TTBR1) if the supporting driver
> >requests it via the DOMAIN_ATTR_SPLIT_TABLES flag. When enabled, the driver
> >will set up the TTBR0 and TTBR1 regions and program the default domain
> >pagetable on TTBR1.
> >
> >After attaching the device, the value of the domain attribute can
> >be queried to see if the split pagetables were successfully programmed.
> >Furthermore the domain geometry will be updated so that the caller can
> >determine the active region for the pagetable that was programmed.
> >
> >Signed-off-by: Jordan Crouse 
> >---
> >
> >  drivers/iommu/arm-smmu.c | 40 +++-
> >  drivers/iommu/arm-smmu.h | 45 +++--
> >  2 files changed, 74 insertions(+), 11 deletions(-)
> >
> >diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> >index c106406..7b59116 100644
> >--- a/drivers/iommu/arm-smmu.c
> >+++ b/drivers/iommu/arm-smmu.c
> >@@ -538,9 +538,17 @@ static void arm_smmu_init_context_bank(struct 
> >arm_smmu_domain *smmu_domain,
> > cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
> > cb->ttbr[1] = 0;
> > } else {
> >-cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> >-cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> >-cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
> >+if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
> >+cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid);
> >+cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> >+cb->ttbr[1] |=
> >+FIELD_PREP(TTBRn_ASID, cfg->asid);
> >+} else {
> >+cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> >+cb->ttbr[0] |=
> >+FIELD_PREP(TTBRn_ASID, cfg->asid);
> >+cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
> >+}
> > }
> > } else {
> > cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
> >@@ -651,6 +659,7 @@ static int arm_smmu_init_domain_context(struct 
> >iommu_domain *domain,
> > enum io_pgtable_fmt fmt;
> > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> >+u32 quirks = 0;
> > mutex_lock(&smmu_domain->init_mutex);
> > if (smmu_domain->smmu)
> >@@ -719,6 +728,8 @@ static int arm_smmu_init_domain_context(struct 
> >iommu_domain *domain,
> > oas = smmu->ipa_size;
> > if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
> > fmt = ARM_64_LPAE_S1;
> >+if (smmu_domain->split_pagetables)
> >+quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
> > } else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
> > fmt = ARM_32_LPAE_S1;
> > ias = min(ias, 32UL);
> >@@ -788,6 +799,7 @@ static int arm_smmu_init_domain_context(struct 
> >iommu_domain *domain,
> > .coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
> > .tlb= smmu_domain->flush_ops,
> > .iommu_dev  = smmu->dev,
> >+.quirks = quirks,
> > };
> > if (smmu_domain->non_strict)
> >@@ -801,8 +813,15 @@ static int arm_smmu_init_domain_context(struct 
> >iommu_domain *domain,
> > /* Update the domain's page sizes to reflect the page table format */
> > domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
> >-domain->geometry.aperture_end = (1UL << ias) - 1;
> >-domain->geometry.force_aperture = true;
> >+
> >+if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
> >+domain->geometry.aperture_start = ~((1ULL << ias) - 1);
> 
> AKA "~0UL << ias", if I'm not mistaken ;)
> 
> >+domain->geometry.aperture_end = ~0UL;
> >+} else {
> >+domain->geometry.aperture_end = (1UL << ias) - 1;
> >+domain->geomet

Re: [PATCH v3 2/5] iommu/arm-smmu: Add support for split pagetables

2020-01-09 Thread Jordan Crouse
On Thu, Jan 09, 2020 at 02:33:34PM +, Will Deacon wrote:
> On Mon, Dec 16, 2019 at 09:37:48AM -0700, Jordan Crouse wrote:
> > Add support to enable split pagetables (TTBR1) if the supporting driver
> > requests it via the DOMAIN_ATTR_SPLIT_TABLES flag. When enabled, the driver
> > will set up the TTBR0 and TTBR1 regions and program the default domain
> > pagetable on TTBR1.
> > 
> > After attaching the device, the value of the domain attribute can
> > be queried to see if the split pagetables were successfully programmed.
> > Furthermore the domain geometry will be updated so that the caller can
> > determine the active region for the pagetable that was programmed.
> > 
> > Signed-off-by: Jordan Crouse 
> > ---
> > 
> >  drivers/iommu/arm-smmu.c | 40 +++-
> >  drivers/iommu/arm-smmu.h | 45 +++--
> >  2 files changed, 74 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index c106406..7b59116 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -538,9 +538,17 @@ static void arm_smmu_init_context_bank(struct 
> > arm_smmu_domain *smmu_domain,
> > cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
> > cb->ttbr[1] = 0;
> > } else {
> > -   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> > -   cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> > -   cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
> > +   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
> > +   cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid);
> > +   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> > +   cb->ttbr[1] |=
> > +   FIELD_PREP(TTBRn_ASID, cfg->asid);
> > +   } else {
> > +   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> > +   cb->ttbr[0] |=
> > +   FIELD_PREP(TTBRn_ASID, cfg->asid);
> > +   cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
> > +   }
> 
> I still don't understand why you have to set the ASID in both of the TTBRs.
> Assuming TCR.A1 is clear, then we should only need to set the field in
> TTBR0.

This is mostly out of a sense of symmetry with the non-split configuration. I'll
clean it up.
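
For reference, a small userspace sketch of the cleaned-up layout: with
split tables the ASID goes only in TTBR0 and the table address only in
TTBR1. It assumes the ASID field occupies TTBR bits [63:48]; the struct
and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TTBR_ASID_SHIFT	48	/* assumed: ASID held in TTBR bits [63:48] */

struct ttbr_pair {
	uint64_t ttbr0;
	uint64_t ttbr1;
};

static struct ttbr_pair make_ttbrs(uint64_t table_pa, uint16_t asid, bool split)
{
	struct ttbr_pair t;
	uint64_t asid_field = (uint64_t)asid << TTBR_ASID_SHIFT;

	/* The table address must not overlap the ASID field. */
	assert((table_pa >> TTBR_ASID_SHIFT) == 0);

	if (split) {
		/* TTBR0 carries only the ASID; the table lives in TTBR1. */
		t.ttbr0 = asid_field;
		t.ttbr1 = table_pa;
	} else {
		/* Non-split layout as the driver programs it today. */
		t.ttbr0 = table_pa | asid_field;
		t.ttbr1 = asid_field;
	}
	return t;
}
```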

> 
> > }
> > } else {
> > cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
> > @@ -651,6 +659,7 @@ static int arm_smmu_init_domain_context(struct 
> > iommu_domain *domain,
> > enum io_pgtable_fmt fmt;
> > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> > +   u32 quirks = 0;
> >  
> > mutex_lock(&smmu_domain->init_mutex);
> > if (smmu_domain->smmu)
> > @@ -719,6 +728,8 @@ static int arm_smmu_init_domain_context(struct 
> > iommu_domain *domain,
> > oas = smmu->ipa_size;
> > if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
> > fmt = ARM_64_LPAE_S1;
> > +   if (smmu_domain->split_pagetables)
> > +   quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
> > } else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
> > fmt = ARM_32_LPAE_S1;
> > ias = min(ias, 32UL);
> > @@ -788,6 +799,7 @@ static int arm_smmu_init_domain_context(struct 
> > iommu_domain *domain,
> > .coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
> > .tlb= smmu_domain->flush_ops,
> > .iommu_dev  = smmu->dev,
> > +   .quirks = quirks,
> > };
> >  
> > if (smmu_domain->non_strict)
> > @@ -801,8 +813,15 @@ static int arm_smmu_init_domain_context(struct 
> > iommu_domain *domain,
> >  
> > /* Update the domain's page sizes to reflect the page table format */
> > domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
> > -   domain->geometry.aperture_end = (1UL << ias) - 1;
> > -   domain->geometry.force_aperture = true;
> > +
> > +   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
> >

Re: [Freedreno] [PATCH v3 4/5] drm/msm: Refactor address space initialization

2020-01-06 Thread Jordan Crouse
On Mon, Dec 16, 2019 at 09:37:50AM -0700, Jordan Crouse wrote:
> Refactor how address space initialization works. Instead of having the
> address space function create the MMU object (and thus require separate but
> equal functions for gpummu and iommu) use a single function and pass the
> MMU struct. Make the generic code cleaner by using target specific
> functions to create the address space so a2xx can do its own thing in its
> own space.  For all the other targets use a generic helper to initialize
> IOMMU but leave the door open for newer targets to use customization
> if they need it.
> 
> Signed-off-by: Jordan Crouse 
> ---
> 
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c|  1 +
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 10 +++---
>  drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 +
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 +--
>  drivers/gpu/drm/msm/msm_drv.h|  8 ++---
>  drivers/gpu/drm/msm/msm_gem_vma.c| 52 
> +---
>  drivers/gpu/drm/msm/msm_gpu.c| 40 ++--
>  drivers/gpu/drm/msm/msm_gpu.h|  4 +--
>  drivers/gpu/drm/msm/msm_iommu.c  |  3 ++
>  16 files changed, 83 insertions(+), 114 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index 1f83bc1..60f6472 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct 
> msm_gpu *gpu)
>   return state;
>  }
>  
> +static struct msm_gem_address_space *
> +a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
> +{
> + struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu);
> + struct msm_gem_address_space *aspace;
> +
> + aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
> + SZ_16M + 0xfff * SZ_64K);
> +
> + if (IS_ERR(aspace) && !IS_ERR(mmu))
> + mmu->funcs->destroy(mmu);
> +
> + return aspace;
> +}
> +
>  /* Register offset defines for A2XX - copy of A3XX */
>  static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
>   REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
> @@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #endif
>   .gpu_state_get = a2xx_gpu_state_get,
>   .gpu_state_put = adreno_gpu_state_put,
> + .create_address_space = a2xx_create_address_space,
>   },
>  };
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index 7ad1493..41e51e0 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #endif
>   .gpu_state_get = a3xx_gpu_state_get,
>   .gpu_state_put = adreno_gpu_state_put,
> + .create_address_space = adreno_iommu_create_address_space,
>   },
>  };
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index b01388a..3655440 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #endif
>   .gpu_state_get = a4xx_gpu_state_get,
>   .gpu_state_put = adreno_gpu_state_put,
> + .create_address_space = adreno_iommu_create_address_space,
>   },
>   .get_timestamp = a4xx_get_timestamp,
>  };
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index b02e204..0f5db72 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1432,6 +1432,7 @@ static const struct adreno_gpu_funcs funcs = {
>   .gpu_busy = a5xx_gpu_busy,
>   .gpu_state_get = a5xx_gpu_state_get,
>   .gpu_state_put = a5xx_gpu_state_put,
> + .create_address_space = adreno_iommu_create_address_space,
>   },
>   .get_timestamp = a5xx_get_timestamp,
>  };
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/a

Re: [PATCH v3 5/5] drm/msm/a6xx: Support split pagetables

2020-01-06 Thread Jordan Crouse
On Tue, Dec 24, 2019 at 08:27:28AM +0530, smase...@codeaurora.org wrote:
> On 2019-12-16 22:07, Jordan Crouse wrote:
> >Attempt to enable split pagetables if the arm-smmu driver supports it.
> >This will move the default address space from the default region to
> >the address range assigned to TTBR1. The behavior should be transparent
> >to the driver for now but it gets the default buffers out of the way
> >when we want to start swapping TTBR0 for context-specific pagetables.
> >
> >Signed-off-by: Jordan Crouse 
> >---
> >
> > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52
> >++-
> > 1 file changed, 51 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >index 5dc0b2c..1c6da93 100644
> >--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >@@ -811,6 +811,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu
> >*gpu)
> > return (unsigned long)busy_time;
> > }
> >
> >+static struct msm_gem_address_space *
> >+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device
> >*pdev)
> >+{
> >+struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
> >+struct msm_gem_address_space *aspace;
> >+struct msm_mmu *mmu;
> >+u64 start, size;
> >+u32 val = 1;
> >+int ret;
> >+
> >+if (!iommu)
> >+return ERR_PTR(-ENOMEM);
> >+
> >+/*
> >+ * Try to request split pagetables - the request has to be made before
> >+ * the domian is attached
> >+ */
> >+iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
> >+
> >+mmu = msm_iommu_new(&pdev->dev, iommu);
> >+if (IS_ERR(mmu)) {
> >+iommu_domain_free(iommu);
> >+return ERR_CAST(mmu);
> >+}
> >+
> >+/*
> >+ * After the domain is attached, see if the split tables were actually
> >+ * successful.
> >+ */
> >+ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
> >+if (!ret && val) {
> >+/*
> >+ * The aperture start will be at the beginning of the TTBR1
> >+ * space so use that as a base
> >+ */
> >+start = iommu->geometry.aperture_start;
> >+size = 0x;
> This should be the va_end and not the size

This is a bug in msm_gem_address_space_create - I intended the parameter to be
the size.
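
To make the size versus end-address mix-up concrete, here is a tiny
sketch. The helper names and the numbers used with them are purely
illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Last address covered when the second argument is a size. */
static uint64_t last_addr_from_size(uint64_t start, uint64_t size)
{
	/* The range must be non-empty and must not wrap. */
	assert(size > 0 && size - 1 <= UINT64_MAX - start);
	return start + size - 1;
}

/* Size implied when the second argument is instead read as an end address. */
static uint64_t size_from_end(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start;
}
```

With start = 0x1000 and a second argument of 0x10000, reading it as a
size covers addresses up to 0x10fff, while reading it as an end address
yields a region of only 0xf000 bytes. The two interpretations diverge
more the further start is from zero, which matters for a TTBR1 base.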

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [Freedreno] [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)

2019-12-20 Thread Jordan Crouse
On Fri, Dec 20, 2019 at 03:40:59PM +0530, smase...@codeaurora.org wrote:
> On 2019-12-20 01:28, Jordan Crouse wrote:
> >On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
> >>The last level system cache can be partitioned to 32 different slices
> >>of which GPU has two slices preallocated. One slice is used for caching
> >>GPU
> >>buffers and the other slice is used for caching the GPU SMMU pagetables.
> >>This patch talks to the core system cache driver to acquire the slice
> >>handles,
> >>configure the SCID's to those slices and activates and deactivates the
> >>slices
> >>upon GPU power collapse and restore.
> >>
> >>Some support from the IOMMU driver is also needed to make use of the
> >>system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which
> >>enables
> >>caching GPU data buffers in the system cache with memory attributes such
> >>as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> >>then has the ability to override a few cacheability parameters which it
> >>does to override write-allocate to write-no-allocate as the GPU hardware
> >>does not benefit much from it.
> >>
> >>Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
> >>used by the IOMMU driver to set the right attributes to cache the
> >>hardware
> >>pagetables into the system cache.
> >>
> >>Signed-off-by: Sharat Masetty 
> >>---
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122
> >>+-
> >> drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
> >> drivers/gpu/drm/msm/msm_iommu.c   |  13 
> >> drivers/gpu/drm/msm/msm_mmu.h |   3 +
> >> 4 files changed, 146 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>index faff6ff..0c7fdee 100644
> >>--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>@@ -9,6 +9,7 @@
> >> #include "a6xx_gmu.xml.h"
> >>
> >> #include 
> >>+#include 
> >>
> >> #define GPU_PAS_ID 13
> >>
> >>@@ -781,6 +782,117 @@ static void
> >>a6xx_bus_clear_pending_transactions(struct adreno_gpu *adreno_gpu)
> >>gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
> >> }
> >>
> >>+#define A6XX_LLC_NUM_GPU_SCIDS 5
> >>+#define A6XX_GPU_LLC_SCID_NUM_BITS 5
> >
> >As I mention below, I'm not sure if we need these
> >
> >>+#define A6XX_GPU_LLC_SCID_MASK \
> >>+   ((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> >>+
> >>+#define A6XX_GPUHTW_LLC_SCID_SHIFT 25
> >>+#define A6XX_GPUHTW_LLC_SCID_MASK \
> >>+   (((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) <<
> >>A6XX_GPUHTW_LLC_SCID_SHIFT)
> >>+
> >
> >Normally these go into the envytools regmap but if we're going to do these
> >guys
> >lets use the power of  for good.
> >
> >#define A6XX_GPU_LLC_SCID GENMASK(24, 0)
> >#define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)
> >
> >>+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
> >
> >Don't mark C functions as inline - let the compiler figure it out for you.
> >
> >>+   u32 reg, u32 mask, u32 or)
> >>+{
> >>+   msm_rmw(llc->mmio + (reg << 2), mask, or);
> >>+}
> >>+
> >>+static void a6xx_llc_deactivate(struct a6xx_llc *llc)
> >>+{
> >>+   llcc_slice_deactivate(llc->gpu_llc_slice);
> >>+   llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> >>+}
> >>+
> >>+static void a6xx_llc_activate(struct a6xx_llc *llc)
> >>+{
> >>+   if (!llc->mmio)
> >>+   return;
> >>+
> >>+   /* Program the sub-cache ID for all GPU blocks */
> >>+   if (!llcc_slice_activate(llc->gpu_llc_slice))
> >>+   a6xx_gpu_cx_rmw(llc,
> >>+   REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> >>+   A6XX_GPU_LLC_SCID_MASK,
> >>+   (llc->cntl1_regval &
> >>+A6XX_GPU_LLC_SCID_MASK));
> >
> >This is out of order with the comments below, but if we store the slice id
> >then
> >you could calculate regval here and not have to store it.
> >
> >>+
> >>

Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)

2019-12-19 Thread Jordan Crouse
On Thu, Dec 19, 2019 at 12:58:15PM -0700, Jordan Crouse wrote:
> On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:




> > +
> > +   /*
> > +* CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
> > +* FLAG cache) GPU blocks. This value will be passed along with
> > +* the address for any memory transaction from GPU to identify
> > +* the sub-cache for that transaction.
> > +*/
> > +   if (!IS_ERR(llc->gpu_llc_slice)) {
> > +   u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
> > +   int i;
> > +
> > +   for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
> > +   llc->cntl1_regval |=
> > +   gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
> 
> As above, I'm not sure a loop is better than just:
> 
> gpu_scid &= 0x1f;
> 
> llc->cntl1_regval = (gpu_scid << 0) | (gpu_scid << 5) | (gpu_scid << 10)
>  | (gpu_scid << 15) | (gpu_scid << 20);
> 
> And I'm not even sure we need do this math here in the first place.

One more question - can you get a valid slice id before activation?
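
For what it's worth, the loop and the open-coded form compute the same
replicated value as long as every term is combined with a bitwise OR.
A quick userspace check, with function names invented for the example:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_GPU_SCIDS	5	/* A6XX_LLC_NUM_GPU_SCIDS in the patch */
#define SCID_NUM_BITS	5	/* A6XX_GPU_LLC_SCID_NUM_BITS in the patch */

/* The loop from the patch: replicate the SCID into five 5-bit fields. */
static uint32_t cntl1_loop(uint32_t scid)
{
	uint32_t val = 0;
	int i;

	assert(scid < (UINT32_C(1) << SCID_NUM_BITS));

	for (i = 0; i < NUM_GPU_SCIDS; i++)
		val |= scid << (SCID_NUM_BITS * i);
	return val;
}

/* The open-coded alternative, using bitwise OR throughout. */
static uint32_t cntl1_unrolled(uint32_t scid)
{
	scid &= 0x1f;
	return (scid << 0) | (scid << 5) | (scid << 10) |
	       (scid << 15) | (scid << 20);
}
```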



Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 5/5] drm/msm/a6xx: Add support for using system cache(LLC)

2019-12-19 Thread Jordan Crouse
On Thu, Dec 19, 2019 at 06:44:46PM +0530, Sharat Masetty wrote:
> The last level system cache can be partitioned to 32 different slices
> of which GPU has two slices preallocated. One slice is used for caching GPU
> buffers and the other slice is used for caching the GPU SMMU pagetables.
> This patch talks to the core system cache driver to acquire the slice handles,
> configure the SCID's to those slices and activates and deactivates the slices
> upon GPU power collapse and restore.
> 
> Some support from the IOMMU driver is also needed to make use of the
> system cache. IOMMU_QCOM_SYS_CACHE is a buffer protection flag which enables
> caching GPU data buffers in the system cache with memory attributes such
> as outer cacheable, read-allocate, write-allocate for buffers. The GPU
> then has the ability to override a few cacheability parameters which it
> does to override write-allocate to write-no-allocate as the GPU hardware
> does not benefit much from it.
> 
> Similarly DOMAIN_ATTR_QCOM_SYS_CACHE is another domain level attribute
> used by the IOMMU driver to set the right attributes to cache the hardware
> pagetables into the system cache.
> 
> Signed-off-by: Sharat Masetty 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 122 
> +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 +++
>  drivers/gpu/drm/msm/msm_iommu.c   |  13 
>  drivers/gpu/drm/msm/msm_mmu.h |   3 +
>  4 files changed, 146 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index faff6ff..0c7fdee 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -9,6 +9,7 @@
>  #include "a6xx_gmu.xml.h"
> 
>  #include 
> +#include 
> 
>  #define GPU_PAS_ID 13
> 
> @@ -781,6 +782,117 @@ static void a6xx_bus_clear_pending_transactions(struct 
> adreno_gpu *adreno_gpu)
>   gpu_write(gpu, REG_A6XX_GBIF_HALT, 0x0);
>  }
> 
> +#define A6XX_LLC_NUM_GPU_SCIDS   5
> +#define A6XX_GPU_LLC_SCID_NUM_BITS   5

As I mention below, I'm not sure if we need these 

> +#define A6XX_GPU_LLC_SCID_MASK \
> + ((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
> +
> +#define A6XX_GPUHTW_LLC_SCID_SHIFT   25
> +#define A6XX_GPUHTW_LLC_SCID_MASK \
> + (((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
> +

Normally these go into the envytools regmap, but if we're going to do these guys
let's use the power of  for good.

#define A6XX_GPU_LLC_SCID GENMASK(24, 0)
#define A6XX_GPUHTW_LLC_SCID GENMASK(29, 25)
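
A quick way to convince yourself that the two GENMASK() definitions match
the open-coded masks in the patch is a userspace stand-in. genmask32()
below is a sketch written for this example, not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for GENMASK(h, l) on 32-bit values: bits h..l set. */
static uint32_t genmask32(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 32);
	return (~UINT32_C(0) << l) & (~UINT32_C(0) >> (31 - h));
}

/* The masks as open-coded in the patch under review. */
static uint32_t open_coded_scid_mask(void)
{
	return (UINT32_C(1) << (5 * 5)) - 1;
}

static uint32_t open_coded_htw_mask(void)
{
	return ((UINT32_C(1) << 5) - 1) << 25;
}
```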

> +static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,

Don't mark C functions as inline - let the compiler figure it out for you.

> + u32 reg, u32 mask, u32 or)
> +{
> + msm_rmw(llc->mmio + (reg << 2), mask, or);
> +}
> +
> +static void a6xx_llc_deactivate(struct a6xx_llc *llc)
> +{
> + llcc_slice_deactivate(llc->gpu_llc_slice);
> + llcc_slice_deactivate(llc->gpuhtw_llc_slice);
> +}
> +
> +static void a6xx_llc_activate(struct a6xx_llc *llc)
> +{
> + if (!llc->mmio)
> + return;
> +
> + /* Program the sub-cache ID for all GPU blocks */
> + if (!llcc_slice_activate(llc->gpu_llc_slice))
> + a6xx_gpu_cx_rmw(llc,
> + REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> + A6XX_GPU_LLC_SCID_MASK,
> + (llc->cntl1_regval &
> +  A6XX_GPU_LLC_SCID_MASK));

This is out of order with the comments below, but if we store the slice id then
you could calculate regval here and not have to store it.

> +
> + /* Program the sub-cache ID for the GPU pagetables */
> + if (!llcc_slice_activate(llc->gpuhtw_llc_slice))

val |= FIELD_SET(A6XX_GPUHTW_LLC_SCID, htw_llc_sliceid);

> + a6xx_gpu_cx_rmw(llc,
> + REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1,
> + A6XX_GPUHTW_LLC_SCID_MASK,
> + (llc->cntl1_regval &
> +  A6XX_GPUHTW_LLC_SCID_MASK));

And this could be FIELD_SET(A6XX_GPUHTW_LLC_SCID, sliceid);

In theory you could just calculate the u32 and write it directly without a rmw.
In fact, that might be preferable - if the slice activate failed, you don't want
to run the risk that the scid for htw is still populated.
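
Roughly what I have in mind, as a userspace sketch. In the kernel the
field insertion would normally go through FIELD_PREP() from
linux/bitfield.h; here the shift is open-coded so the example builds on
its own, and build_cntl1() is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GPU_SCID_MASK	UINT32_C(0x01ffffff)	/* bits [24:0] */
#define HTW_SCID_MASK	UINT32_C(0x3e000000)	/* bits [29:25] */
#define HTW_SCID_SHIFT	25

/*
 * Build the whole CNTL_1 value from scratch. A slice that failed to
 * activate contributes nothing, so its field ends up as zero instead of
 * keeping whatever a read-modify-write would have left behind.
 */
static uint32_t build_cntl1(bool gpu_ok, uint32_t gpu_scid,
			    bool htw_ok, uint32_t htw_scid)
{
	uint32_t val = 0;
	int i;

	assert(gpu_scid < 32 && htw_scid < 32);

	if (gpu_ok)
		for (i = 0; i < 5; i++)
			val |= (gpu_scid << (5 * i)) & GPU_SCID_MASK;
	if (htw_ok)
		val |= (htw_scid << HTW_SCID_SHIFT) & HTW_SCID_MASK;

	return val;
}
```

The result can then be written with a plain register write, with no
dependency on the previous register contents.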

> +
> + /* Program cacheability overrides */
> + a6xx_gpu_cx_rmw(llc, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
> + llc->cntl0_regval);

As below, this could easily be a constant.

> +}
> +
> +static void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
> +{
> + if (llc->mmio)
> + iounmap(llc->mmio);

msm_ioremap returns a devm_ managed resource, so do not use iounmap() to free
it. Best to just leave it and let the gpu device handle it when it goes boom.

> +
> + llcc_slice_putd(llc->gpu_llc_slice);
> + llcc_slice_putd(llc->gpuhtw_llc_slice

Re: [PATCH 4/5] drm/msm: Pass mmu features to generic layers

2019-12-19 Thread Jordan Crouse
On Thu, Dec 19, 2019 at 06:44:45PM +0530, Sharat Masetty wrote:
> Allow different Adreno targets the ability to pass
> specific mmu features to the generic layers. This will
> help conditionally configure certain iommu features for
> certain Adreno targets.
> 
> Also Add a few simple support functions to support a bitmask of
> features that a specific MMU implementation supports.

This whole change could benefit from [1] which makes the address space
creation target specific.

That would get rid of most of the blobs. Furthermore, if you took part of [2]
that set up the mmu inside of the target specific code (skipping over the
SPLIT_PAGETABLE stuff for now) you could set mmu->features directly and not need
a helper function to do it.

[1] https://patchwork.freedesktop.org/patch/342170/
[2] https://patchwork.freedesktop.org/patch/342173/

Jordan

> Signed-off-by: Sharat Masetty 
> ---
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
>  drivers/gpu/drm/msm/msm_gpu.c   |  6 --
>  drivers/gpu/drm/msm/msm_gpu.h   |  1 +
>  drivers/gpu/drm/msm/msm_mmu.h   | 11 +++
>  10 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index 1f83bc1..bbac43c 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -472,7 +472,7 @@ struct msm_gpu *a2xx_gpu_init(struct drm_device *dev)
>  
>   adreno_gpu->reg_offsets = a2xx_register_offsets;
>  
> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>   if (ret)
>   goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index 5f7e980..63448fb 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -488,7 +488,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
>   adreno_gpu->registers = a3xx_registers;
>   adreno_gpu->reg_offsets = a3xx_register_offsets;
>  
> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>   if (ret)
>   goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index ab2b752..90ae26d 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -572,7 +572,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
>   adreno_gpu->registers = a4xx_registers;
>   adreno_gpu->reg_offsets = a4xx_register_offsets;
>  
> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>   if (ret)
>   goto fail;
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 99cd6e6..a51ed2e 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1445,7 +1445,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
>  
>   check_speed_bin(&pdev->dev);
>  
> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
>   if (ret) {
>   a5xx_destroy(&(a5xx_gpu->base.base));
>   return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index daf0780..faff6ff 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -924,7 +924,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>   adreno_gpu->registers = NULL;
>   adreno_gpu->reg_offsets = a6xx_register_offsets;
>  
> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
>   if (ret) {
>   a6xx_destroy(&(a6xx_gpu->base.base));
>   return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 048c8be..7dade16 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -895,7 +895,8 @@ static int adreno_get_pwrlevels(struct device *dev,
>  
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>   struct adreno_gpu *adreno_gpu,
> - const struct adreno_gpu_funcs *funcs, int nr_rings)
> + const struct adreno_gpu_funcs *funcs, int nr_rings,
> + u

[PATCH v3 5/5] drm/msm/a6xx: Support split pagetables

2019-12-16 Thread Jordan Crouse
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to
the address range assigned to TTBR1. The behavior should be transparent
to the driver for now but it gets the default buffers out of the way
when we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 ++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5dc0b2c..1c6da93 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -811,6 +811,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+   struct msm_gem_address_space *aspace;
+   struct msm_mmu *mmu;
+   u64 start, size;
+   u32 val = 1;
+   int ret;
+
+   if (!iommu)
+   return ERR_PTR(-ENOMEM);
+
+   /*
+* Try to request split pagetables - the request has to be made before
+* the domain is attached
+*/
+   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+   mmu = msm_iommu_new(&pdev->dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return ERR_CAST(mmu);
+   }
+
+   /*
+* After the domain is attached, see if the split tables were actually
+* successful.
+*/
+   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+   if (!ret && val) {
+   /*
+* The aperture start will be at the beginning of the TTBR1
+* space so use that as a base
+*/
+   start = iommu->geometry.aperture_start;
+   size = 0x;
+   } else {
+   /* Otherwise use the legacy 32 bit region */
+   start = SZ_16M;
+   size = 0x - SZ_16M;
+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+   if (IS_ERR(aspace))
+   iommu_domain_free(iommu);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -832,7 +882,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
-   .create_address_space = adreno_iommu_create_address_space,
+   .create_address_space = a6xx_create_address_space,
 #endif
},
.get_timestamp = a6xx_get_timestamp,
-- 
2.7.4
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/5] drm/msm: Refactor address space initialization

2019-12-16 Thread Jordan Crouse
Refactor how address space initialization works. Instead of having the
address space function create the MMU object (and thus require separate but
equal functions for gpummu and iommu) use a single function and pass the
MMU struct. Make the generic code cleaner by using target specific
functions to create the address space so a2xx can do its own thing in its
own space.  For all the other targets use a generic helper to initialize
IOMMU but leave the door open for newer targets to use customization
if they need it.
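
The shape of this refactor can be sketched in standalone C. This is only an illustration of the hook pattern, not the driver code: struct gpu, struct aspace, gpu_init() and the address ranges are all hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gpu;

struct aspace {
	const char *name;
	uint64_t start;
	uint64_t size;
};

struct gpu_funcs {
	/* Per-target hook; returns 0 on success, negative on failure. */
	int (*create_address_space)(struct gpu *gpu, struct aspace *out);
};

struct gpu {
	const struct gpu_funcs *funcs;
	struct aspace aspace;
};

/* Generic helper shared by targets with no special needs. */
static int generic_create_address_space(struct gpu *gpu, struct aspace *out)
{
	(void)gpu;
	assert(out != NULL);
	out->name = "gpu";
	out->start = 16ull << 20;	/* placeholder range */
	out->size = 1024ull << 20;
	return 0;
}

/* A target that wants to do its own thing supplies its own hook. */
static int custom_create_address_space(struct gpu *gpu, struct aspace *out)
{
	(void)gpu;
	assert(out != NULL);
	out->name = "gpu";
	out->start = 16ull << 20;	/* placeholder range */
	out->size = 256ull << 20;
	return 0;
}

static const struct gpu_funcs generic_funcs = {
	.create_address_space = generic_create_address_space,
};

static const struct gpu_funcs custom_funcs = {
	.create_address_space = custom_create_address_space,
};

/* The generic code only calls the hook and never looks at the target. */
static int gpu_init(struct gpu *gpu, const struct gpu_funcs *funcs)
{
	if (!funcs || !funcs->create_address_space)
		return -1;

	gpu->funcs = funcs;
	return funcs->create_address_space(gpu, &gpu->aspace);
}
```

With this layout a new target can either point at the generic helper or provide its own function without touching the common init path.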

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 10 +++---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 +--
 drivers/gpu/drm/msm/msm_drv.h|  8 ++---
 drivers/gpu/drm/msm/msm_gem_vma.c| 52 +---
 drivers/gpu/drm/msm/msm_gpu.c| 40 ++--
 drivers/gpu/drm/msm/msm_gpu.h|  4 +--
 drivers/gpu/drm/msm/msm_iommu.c  |  3 ++
 16 files changed, 83 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc1..60f6472 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct 
msm_gpu *gpu)
return state;
 }
 
+static struct msm_gem_address_space *
+a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu);
+   struct msm_gem_address_space *aspace;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
+   SZ_16M + 0xfff * SZ_64K);
+
+   if (IS_ERR(aspace) && !IS_ERR(mmu))
+   mmu->funcs->destroy(mmu);
+
+   return aspace;
+}
+
 /* Register offset defines for A2XX - copy of A3XX */
 static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
@@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a2xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = a2xx_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 7ad1493..41e51e0 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a3xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index b01388a..3655440 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a4xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a4xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index b02e204..0f5db72 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1432,6 +1432,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_busy = a5xx_gpu_busy,
.gpu_state_get = a5xx_gpu_state_get,
.gpu_state_put = a5xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a5xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index dc8ec2c..5dc0b2c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -832,6 +832,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
 #endif
},
.get_timestamp = a6xx_get_timestamp,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/dr

[PATCH v3 2/5] iommu/arm-smmu: Add support for split pagetables

2019-12-16 Thread Jordan Crouse
Add support to enable split pagetables (TTBR1) if the supporting driver
requests it via the DOMAIN_ATTR_SPLIT_TABLES flag. When enabled, the driver
will set up the TTBR0 and TTBR1 regions and program the default domain
pagetable on TTBR1.

After attaching the device, the value of the domain attribute can
be queried to see if the split pagetables were successfully programmed.
Furthermore the domain geometry will be updated so that the caller can
determine the active region for the pagetable that was programmed.
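
The register selection described above can be sketched in standalone C as follows. This is an illustration rather than the driver code: struct ctx_bank, program_ttbrs() and the ASID shift of 48 are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TTBR_ASID_SHIFT 48	/* assumed position of the ASID field */

struct ctx_bank {
	uint64_t ttbr[2];
};

/*
 * With split pagetables the domain pagetable goes into TTBR1 and TTBR0
 * carries only the ASID. Without them the roles are swapped.
 */
static void program_ttbrs(struct ctx_bank *cb, uint64_t table,
			  uint16_t asid, bool split)
{
	uint64_t asid_bits = (uint64_t)asid << TTBR_ASID_SHIFT;

	/* The table address must not collide with the ASID field. */
	assert((table >> TTBR_ASID_SHIFT) == 0);

	cb->ttbr[split ? 1 : 0] = table | asid_bits;
	cb->ttbr[split ? 0 : 1] = asid_bits;
}
```

In both cases the two registers carry the same ASID, and only the table address moves between them.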

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 40 +++-
 drivers/iommu/arm-smmu.h | 45 +++--
 2 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c106406..7b59116 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -538,9 +538,17 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
-   cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[1] |=
+   FIELD_PREP(TTBRn_ASID, cfg->asid);
+   } else {
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |=
+   FIELD_PREP(TTBRn_ASID, cfg->asid);
+   cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+   }
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -651,6 +659,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
enum io_pgtable_fmt fmt;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   u32 quirks = 0;
 
mutex_lock(&smmu_domain->init_mutex);
if (smmu_domain->smmu)
@@ -719,6 +728,8 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
oas = smmu->ipa_size;
if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
fmt = ARM_64_LPAE_S1;
+   if (smmu_domain->split_pagetables)
+   quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
} else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
fmt = ARM_32_LPAE_S1;
ias = min(ias, 32UL);
@@ -788,6 +799,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
.tlb= smmu_domain->flush_ops,
.iommu_dev  = smmu->dev,
+   .quirks = quirks,
};
 
if (smmu_domain->non_strict)
@@ -801,8 +813,15 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
-   domain->geometry.force_aperture = true;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~((1ULL << ias) - 1);
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   domain->geometry.force_aperture = true;
+   smmu_domain->split_pagetables = false;
+   }
 
/* Initialise the context bank with our page table cfg */
arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg);
@@ -1484,6 +1503,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_SPLIT_TABLES:
+   *(int *)data = smmu_domain->split_pagetables;
+   return 0;
default:
return -ENODEV;
}
@@ -1524,6 +1546,14 @@ static int arm_smmu_domain_set_attr(s

[PATCH v3 1/5] iommu: Add DOMAIN_ATTR_SPLIT_TABLES

2019-12-16 Thread Jordan Crouse
Add a new attribute to enable and query the state of split pagetables
for the domain.

Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f2223cb..18c861e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_SPLIT_TABLES,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4


[PATCH v3 0/5] iommu/arm-smmu: Split pagetable support for arm-smmu-v2

2019-12-16 Thread Jordan Crouse
Another refresh to support split pagetables for Adreno GPUs as part of an
incremental process to enable per-context pagetables.

In order to support per-context pagetables the GPU needs to enable split tables
so that we can store global buffers in the TTBR1 space leaving the GPU free to
program the TTBR0 register with the address of a context specific pagetable.

This patchset adds split pagetable support if requested by the domain owner
via the DOMAIN_ATTR_SPLIT_TABLES attribute. If the attribute is non zero at
attach time, the implementation will set up the TTBR0 and TTBR1 spaces with
identical configurations and program the domain pagetable into the TTBR1
register. The TTBR0 register will be unused.

The driver can determine if split pagetables were programmed by querying
DOMAIN_ATTR_SPLIT_TABLES after attaching. The domain geometry will also be
updated to reflect the virtual address space for the TTBR1 range.
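
To make the geometry update concrete, here is a small standalone C sketch that follows the aperture arithmetic used in patch 2/5. The struct geometry type and the two helper names are hypothetical, and ias stands for the input address size in bits.

```c
#include <assert.h>
#include <stdint.h>

struct geometry {
	uint64_t start;
	uint64_t end;
};

/* TTBR1 covers the top of the address space: the upper bits are all ones. */
static struct geometry ttbr1_geometry(unsigned int ias)
{
	struct geometry g;

	assert(ias > 0 && ias < 64);
	g.start = ~((1ull << ias) - 1);
	g.end = ~0ull;
	return g;
}

/* Without split pagetables the aperture starts at zero. */
static struct geometry ttbr0_geometry(unsigned int ias)
{
	struct geometry g;

	assert(ias > 0 && ias < 64);
	g.start = 0;
	g.end = (1ull << ias) - 1;
	return g;
}
```

For example, with ias = 48 the TTBR1 aperture would run from 0xffff000000000000 to 0xffffffffffffffff, which is why the caller has to read aperture_start back after attaching instead of assuming a base of zero.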

These patches are based on top of linux-next-20191216 with [1], [2], and [3]
from Robin on the iommu list.

Change log:

v3: Remove the implementation specific code and make split pagetable support
part of the generic configuration

[1] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039718.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039719.html
[3] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039720.html


Jordan Crouse (5):
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add support for split pagetables
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 51 
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 18 ---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 18 +--
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 18 +--
 drivers/gpu/drm/msm/msm_drv.h|  8 ++---
 drivers/gpu/drm/msm/msm_gem_vma.c| 37 +--
 drivers/gpu/drm/msm/msm_gpu.c| 49 ++
 drivers/gpu/drm/msm/msm_gpu.h|  4 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 
 drivers/gpu/drm/msm/msm_iommu.c  | 18 ++-
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 drivers/iommu/arm-smmu.c | 40 +
 drivers/iommu/arm-smmu.h | 45 
 include/linux/iommu.h|  1 +
 21 files changed, 215 insertions(+), 153 deletions(-)

-- 
2.7.4


[PATCH v3 3/5] drm/msm: Attach the IOMMU device during initialization

2019-12-16 Thread Jordan Crouse
Everywhere an IOMMU object is created by msm_gpu_create_address_space
the IOMMU device is attached immediately after. Instead of carrying around
the infrastructure to do the attach from the device specific code do it
directly in the msm_iommu_init() function. This gets it out of the way for
more aggressive cleanups that follow.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c |  4 
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  7 ---
 drivers/gpu/drm/msm/msm_gem_vma.c| 23 +++
 drivers/gpu/drm/msm/msm_gpu.c| 11 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 --
 drivers/gpu/drm/msm/msm_iommu.c  | 15 +++
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 8 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index 6c92f0f..b082b23 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -704,7 +704,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 {
struct iommu_domain *domain;
struct msm_gem_address_space *aspace;
-   int ret;
 
domain = iommu_domain_alloc(&platform_bus_type);
if (!domain)
@@ -720,13 +719,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
return PTR_ERR(aspace);
}
 
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DPU_ERROR("failed to attach iommu %d\n", ret);
-   msm_gem_address_space_put(aspace);
-   return ret;
-   }
-
dpu_kms->base.aspace = aspace;
return 0;
 }
diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c 
b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
index dda0543..9dba37c 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
@@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret)
-   goto fail;
} else {
DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys "
"contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index e43ecd4..653dab2 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -736,13 +736,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DRM_DEV_ERROR(&pdev->dev, "failed to attach iommu: 
%d\n",
-   ret);
-   goto fail;
-   }
} else {
DRM_DEV_INFO(&pdev->dev,
 "no iommu, fallback to phys contig buffers for 
scanout\n");
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
b/drivers/gpu/drm/msm/msm_gem_vma.c
index 1af5354..91d993a 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct 
iommu_domain *domain,
const char *name)
 {
struct msm_gem_address_space *aspace;
-   u64 size = domain->geometry.aperture_end -
-   domain->geometry.aperture_start;
+   u64 start = domain->geometry.aperture_start;
+   u64 size = domain->geometry.aperture_end - start;
 
aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
if (!aspace)
@@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct 
iommu_domain *domain,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_iommu_new(dev, domain);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
 
-   drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> 
PAGE_SHIFT),
-   size >> PAGE_SHIFT);
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
+
+   /*
+* Attaching the IOMMU device changes the aperture values so use the
+* cached values instead
+*/
+   drm_mm_init(&aspace->mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 
kref_init(&aspace->kref);
 
@@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, 
struct msm_gpu *gpu,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_gpummu_new(dev, gpu);

Re: [PATCH v2 4/8] iommu/arm-smmu: Add split pagetables for Adreno IOMMU implementations

2019-12-05 Thread Jordan Crouse
On Wed, Dec 04, 2019 at 04:44:59PM +, Robin Murphy wrote:
> On 22/11/2019 11:31 pm, Jordan Crouse wrote:
> >Add implementation specific support to enable split pagetables for
> >SMMU implementations attached to Adreno GPUs on Qualcomm targets.
> >
> >To enable split pagetables the driver will set an attribute on the domain.
> >if conditions are correct, set up the hardware to support equally sized
> >TTBR0 and TTBR1 regions and programs the domain pagetable to TTBR1 to make
> >it available for global buffers while allowing the GPU the chance to
> >switch the TTBR0 at runtime for per-context pagetables.
> >
> >After programming the context, the value of the domain attribute can be
> >queried to see if split pagetables were successfully programmed. The
> >domain geometry will be updated so that the caller can determine the
> >start of the region to generate correct virtual addresses.
> 
> Why is any of this in impl? It all looks like perfectly generic
> architectural TTBR1 setup to me. As long as DOMAIN_ATTR_SPLIT_TABLES is
> explicitly an opt-in for callers, I'm OK with them having to trust that
> SEP_UPSTREAM is good enough. Or, even better, make the value of
> DOMAIN_ATTR_SPLIT_TABLES not a boolean but the actual split point, where the
> default of 0 would logically mean "no split".

(apologies if you get multiple copies of this email, I have tickets in with the
CAF IT folks).

I made it impl specific because my impression from the previous conversations
was that setting up the T0 space but leaving TTBR0 un-programmed was a silly
thing that was unique to the Adreno GPU. I don't mind moving it to the generic
code since that saves us from some silly compatible string games.

I like the idea of DOMAIN_ATTR_SPLIT_TABLES returning the split point but would
we want to allow the user to try to specify a desired split point ahead of
time? It is my impression that we only have a handful of valid SEP values and
I'm not sure what the right response would be if the user specified an incorrect
one.
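
One possible response to an incorrect split point is simply to reject anything that is not in a fixed table, with 0 keeping the "no split" meaning suggested above. The following standalone C sketch shows that idea. The function name is hypothetical, and the table contents are an assumption for illustration that should be checked against the SMMU architecture specification.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed set of selectable split bits; verify against the spec. */
static const unsigned int valid_split_bits[] = { 31, 35, 39, 41, 43, 47 };

/*
 * Returns 0 if the requested split point is acceptable (0 means
 * "no split"), or -1 if the hardware could not express it.
 */
static int check_split_point(unsigned int bit)
{
	size_t i;

	if (bit == 0)
		return 0;

	for (i = 0; i < sizeof(valid_split_bits) / sizeof(valid_split_bits[0]); i++) {
		if (valid_split_bits[i] == bit)
			return 0;
	}

	return -1;
}
```

Rejecting the request at set time would keep the failure visible to the caller instead of silently rounding to a different split.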

So far I've not found a use for anything except SEP_UPSTREAM but I have the
extreme luxury of a SMMU with an actual 49 bit IAS.

New patchset coming soon.

Thanks,
Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH v2 7/8] drm/msm/a6xx: Support split pagetables

2019-11-22 Thread Jordan Crouse
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to
the address range assigned to TTBR1. The behavior should be transparent
to the driver for now but it gets the default buffers out of the way
when we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 ++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5dc0b2c..96b3b28 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -811,6 +811,50 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+   struct msm_gem_address_space *aspace;
+   struct msm_mmu *mmu;
+   u64 start, size;
+   u32 val = 1;
+   int ret;
+
+   if (!iommu)
+   return ERR_PTR(-ENOMEM);
+
+   /* Try to request split pagetables */
+   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+   mmu = msm_iommu_new(&pdev->dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return ERR_CAST(mmu);
+   }
+
+   /* Check to see if split pagetables were successful */
+   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+   if (!ret && val) {
+   /*
+* The aperture start will be at the beginning of the TTBR1
+* space so use that as a base
+*/
+   start = iommu->geometry.aperture_start;
+   size = 0x;
+   } else {
+   /* Otherwise use the legacy 32 bit region */
+   start = SZ_16M;
+   size = 0x - SZ_16M;
+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+   if (IS_ERR(aspace))
+   iommu_domain_free(iommu);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -832,7 +876,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
-   .create_address_space = adreno_iommu_create_address_space,
+   .create_address_space = a6xx_create_address_space,
 #endif
},
.get_timestamp = a6xx_get_timestamp,
-- 
2.7.4



[PATCH v2 6/8] drm/msm: Refactor address space initialization

2019-11-22 Thread Jordan Crouse
Refactor how address space initialization works. Instead of having the
address space function create the MMU object (and thus require separate but
equal functions for gpummu and iommu) use a single function and pass the
MMU struct in. Make the generic code cleaner by using target specific
functions to create the address space so a2xx can do its own thing in its
own space.  For all the other targets use a generic helper to initialize
IOMMU but leave the door open for newer targets to use customization
if they need it.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 10 +++---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 +--
 drivers/gpu/drm/msm/msm_drv.h|  8 ++---
 drivers/gpu/drm/msm/msm_gem_vma.c| 52 +---
 drivers/gpu/drm/msm/msm_gpu.c| 40 ++--
 drivers/gpu/drm/msm/msm_gpu.h|  4 +--
 drivers/gpu/drm/msm/msm_iommu.c  |  3 ++
 16 files changed, 83 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc1..60f6472 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct 
msm_gpu *gpu)
return state;
 }
 
+static struct msm_gem_address_space *
+a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu);
+   struct msm_gem_address_space *aspace;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
+   SZ_16M + 0xfff * SZ_64K);
+
+   if (IS_ERR(aspace) && !IS_ERR(mmu))
+   mmu->funcs->destroy(mmu);
+
+   return aspace;
+}
+
 /* Register offset defines for A2XX - copy of A3XX */
 static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
@@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a2xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = a2xx_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 7ad1493..41e51e0 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a3xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index b01388a..3655440 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a4xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a4xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index b02e204..0f5db72 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1432,6 +1432,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_busy = a5xx_gpu_busy,
.gpu_state_get = a5xx_gpu_state_get,
.gpu_state_put = a5xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a5xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index dc8ec2c..5dc0b2c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -832,6 +832,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
 #endif
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a6xx_get_timestamp,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/

[PATCH v2 8/8] arm64: dts: qcom: sdm845: Update Adreno GPU SMMU compatible string

2019-11-22 Thread Jordan Crouse
Add "qcom,adreno-smmu-v2" compatible string for the Adreno GPU SMMU node
to enable split pagetable support.

Signed-off-by: Jordan Crouse 
---

 arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index ddb1f23..d90ba6eda 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -2869,7 +2869,7 @@
};
 
adreno_smmu: iommu@504 {
-   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,adreno-smmu-v2", "qcom,smmu-v2";
reg = <0 0x504 0 0x1>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/8] iommu/arm-smmu: Split pagetable support for Adreno GPUs

2019-11-22 Thread Jordan Crouse


Another refresh to support split pagetables for Adreno GPUs as part of an
incremental process to enable per-context pagetables.

In order to support per-context pagetables the GPU needs to enable split tables
so that we can store global buffers in the TTBR1 space leaving the GPU free to
program the TTBR0 register with the address of a context specific pagetable.

This patchset adds split pagetable support for devices identified with the
compatible string qcom,adreno-smmu-v2. If the compatible string is enabled and
DOMAIN_ATTR_SPLIT_TABLES is non zero at attach time, the implementation will
set up the TTBR0 and TTBR1 spaces with identical configurations and program
the domain pagetable into the TTBR1 register. The TTBR0 register will be
unused.

The driver can determine if split pagetables were programmed by querying
DOMAIN_ATTR_SPLIT_TABLES after attaching. The domain geometry will also be
updated to reflect the virtual address space for the TTBR1 range.
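
As a rough illustration of the geometry update described above, the sketch
below computes where a TTBR1 region would start for a given virtual address
width. It assumes the upper region occupies the sign-extended top of a 64-bit
address space, as in the ARMv8 translation regime. The helper names and the
48-bit width mentioned afterwards are hypothetical and are not taken from this
series.

```c
#include <stdint.h>

/*
 * Hypothetical helper: start of the upper (TTBR1) region for a VA width of
 * va_bits, assuming the top bits are sign-extended. va_bits must be 1..63.
 */
static uint64_t ttbr1_region_start(unsigned int va_bits)
{
	return ~UINT64_C(0) << va_bits;
}

/* Hypothetical helper: nonzero if iova would be translated through TTBR1 */
static int iova_in_ttbr1(uint64_t iova, unsigned int va_bits)
{
	return iova >= ttbr1_region_start(va_bits);
}
```

With an assumed 48-bit address width, ttbr1_region_start(48) evaluates to
0xffff000000000000, so a caller reading the updated geometry would allocate
global buffers at or above that address.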

These patches are based on top of linux-next-20191120 with [1], [2], and [3]
from Robin on the iommu list.

The first four patches add the device tree bindings and implementation
specific support for arm-smmu and the rest of the patches add the drm/msm
implementation followed by the device tree update for sdm845.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039718.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039719.html
[3] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039720.html


Jordan Crouse (8):
  dt-bindings: arm-smmu: Add Adreno GPU variant
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context
  iommu/arm-smmu: Add split pagetables for Adreno IOMMU implementations
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables
  arm64: dts: qcom: sdm845:  Update Adreno GPU SMMU compatible string

 .../devicetree/bindings/iommu/arm,smmu.yaml|  6 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi   |  2 +-
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c  | 16 
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c  |  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c  |  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c  |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 45 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c| 23 --
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|  8 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c| 18 ++--
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c   | 18 ++--
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c   |  4 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c   | 18 ++--
 drivers/gpu/drm/msm/msm_drv.h  |  8 +-
 drivers/gpu/drm/msm/msm_gem_vma.c  | 37 ++---
 drivers/gpu/drm/msm/msm_gpu.c  | 49 +--
 drivers/gpu/drm/msm/msm_gpu.h  |  4 +-
 drivers/gpu/drm/msm/msm_gpummu.c   |  6 --
 drivers/gpu/drm/msm/msm_iommu.c| 18 ++--
 drivers/gpu/drm/msm/msm_mmu.h  |  1 -
 drivers/iommu/arm-smmu-impl.c  |  6 +-
 drivers/iommu/arm-smmu-qcom.c  | 96 ++
 drivers/iommu/arm-smmu.c   | 52 +---
 drivers/iommu/arm-smmu.h   | 14 +++-
 include/linux/iommu.h  |  1 +
 25 files changed, 295 insertions(+), 158 deletions(-)

-- 
2.7.4



[PATCH v2 5/8] drm/msm: Attach the IOMMU device during initialization

2019-11-22 Thread Jordan Crouse
Everywhere an IOMMU object is created by msm_gpu_create_address_space,
the IOMMU device is attached immediately afterwards. Instead of carrying
around the infrastructure to do the attach from the device specific code, do
it directly in the msm_iommu_init() function. This gets it out of the way for
more aggressive cleanups that follow.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c |  4 
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  7 ---
 drivers/gpu/drm/msm/msm_gem_vma.c| 23 +++
 drivers/gpu/drm/msm/msm_gpu.c| 11 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 --
 drivers/gpu/drm/msm/msm_iommu.c  | 15 +++
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 8 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index 6c92f0f..b082b23 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -704,7 +704,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 {
struct iommu_domain *domain;
struct msm_gem_address_space *aspace;
-   int ret;
 
domain = iommu_domain_alloc(&platform_bus_type);
if (!domain)
@@ -720,13 +719,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
return PTR_ERR(aspace);
}
 
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DPU_ERROR("failed to attach iommu %d\n", ret);
-   msm_gem_address_space_put(aspace);
-   return ret;
-   }
-
dpu_kms->base.aspace = aspace;
return 0;
 }
diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
index dda0543..9dba37c 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
@@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret)
-   goto fail;
} else {
DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys "
"contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index e43ecd4..653dab2 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -736,13 +736,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DRM_DEV_ERROR(&pdev->dev, "failed to attach iommu: %d\n",
-   ret);
-   goto fail;
-   }
} else {
DRM_DEV_INFO(&pdev->dev,
 "no iommu, fallback to phys contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 1af5354..91d993a 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
const char *name)
 {
struct msm_gem_address_space *aspace;
-   u64 size = domain->geometry.aperture_end -
-   domain->geometry.aperture_start;
+   u64 start = domain->geometry.aperture_start;
+   u64 size = domain->geometry.aperture_end - start;
 
aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
if (!aspace)
@@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_iommu_new(dev, domain);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
 
-   drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT),
-   size >> PAGE_SHIFT);
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
+
+   /*
+* Attaching the IOMMU device changes the aperture values so use the
+* cached values instead
+*/
+   drm_mm_init(&aspace->mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 
kref_init(&aspace->kref);
 
@@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, struct msm_gpu *gpu,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_gpummu_new(dev, gpu);

[PATCH v2 4/8] iommu/arm-smmu: Add split pagetables for Adreno IOMMU implementations

2019-11-22 Thread Jordan Crouse
Add implementation specific support to enable split pagetables for
SMMU implementations attached to Adreno GPUs on Qualcomm targets.

To enable split pagetables the driver will set an attribute on the domain.
If conditions are correct, the implementation will set up the hardware to
support equally sized TTBR0 and TTBR1 regions and program the domain
pagetable into TTBR1 to make it available for global buffers while allowing
the GPU the chance to switch the TTBR0 at runtime for per-context pagetables.

After programming the context, the value of the domain attribute can be
queried to see if split pagetables were successfully programmed. The
domain geometry will be updated so that the caller can determine the
start of the region to generate correct virtual addresses.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 ++
 drivers/iommu/arm-smmu-qcom.c | 96 +++
 drivers/iommu/arm-smmu.c  | 41 ++
 drivers/iommu/arm-smmu.h  | 11 +
 4 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 33ed682..1e91231 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -174,5 +174,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu-v2"))
+   return adreno_smmu_impl_init(smmu);
+
return smmu;
 }
diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 24c071c..6591e49 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -11,6 +11,102 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
+#define TG0_4K  0
+#define TG0_64K 1
+#define TG0_16K 2
+
+#define TG1_16K 1
+#define TG1_4K  2
+#define TG1_64K 3
+
+/*
+ * Set up split pagetables for Adreno SMMUs that will keep a static TTBR1 for
+ * global buffers and dynamically switch TTBR0 from the GPU for context specific
+ * pagetables.
+ */
+static int adreno_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
+   u32 tcr, tg0;
+
+   /*
+* Return error if split pagetables are not enabled so that arm-smmu
+* does the default configuration
+*/
+   if (!(pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1))
+   return -EINVAL;
+
+   /* Get the bank configuration from the pagetable config */
+   tcr = arm_smmu_lpae_tcr(pgtbl_cfg) & 0x;
+
+   /*
+* The TCR configuration for TTBR0 and TTBR1 is (almost) identical so
+* just duplicate the T0 configuration and shift it
+*/
+   cb->tcr[0] = (tcr << 16) | tcr;
+
+   /*
+* The (almost) above refers to the granule size field which is
+* different for TTBR0 and TTBR1. With the TTBR1 quirk enabled,
+* io-pgtable-arm will write the T1 appropriate granule size for tg.
+* Translate the configuration from the T1 field to get the right value
+* for T0
+*/
+   if (pgtbl_cfg->arm_lpae_s1_cfg.tcr.tg == TG1_4K)
+   tg0 = TG0_4K;
+   else if (pgtbl_cfg->arm_lpae_s1_cfg.tcr.tg == TG1_16K)
+   tg0 = TG0_16K;
+   else
+   tg0 = TG0_64K;
+
+   /* clear and set the correct value for TG0  */
+   cb->tcr[0] &= ~TCR_TG0;
+   cb->tcr[0] |= FIELD_PREP(TCR_TG0, tg0);
+
+   /*
+* arm_smmu_lpae_tcr2 sets SEP_UPSTREAM which is always the appropriate
+* SEP for Adreno IOMMU
+*/
+   cb->tcr[1] = arm_smmu_lpae_tcr2(pgtbl_cfg);
+   cb->tcr[1] |= TCR2_AS;
+
+   /* TTBRs */
+   cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
+
+   /* MAIRs */
+   cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair;
+   cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
+
+   return 0;
+}
+
+static int adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   /* Enable split pagetables if the flag is set and the format matches */
+   if (smmu_domain->split_pagetables)
+   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
+   smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64)
+   pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+   return 0;
+}
+
+static const struct arm_smmu_impl adreno_smm
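
The TCR handling in adreno_smmu_init_context_bank() above can be tried in
isolation with the sketch below. It reuses the TG0/TG1 encodings from the
patch and assumes that TG0 occupies bits [15:14] of the register. The mask
applied to the io-pgtable value is truncated in the archived patch; the sketch
assumes the low 16 bits because the result is shifted by 16. The function
names are hypothetical and this is not the driver code itself.

```c
#include <stdint.h>

/* Granule encodings, as defined in the patch */
#define TG0_4K  0u
#define TG0_64K 1u
#define TG0_16K 2u

#define TG1_16K 1u
#define TG1_4K  2u
#define TG1_64K 3u

/* Assumption: TG0 lives in bits [15:14] of the TCR */
#define TG0_SHIFT 14
#define TG0_MASK  (0x3u << TG0_SHIFT)

/* Translate a TG1 granule encoding into the matching TG0 encoding */
static uint32_t tg1_to_tg0(uint32_t tg1)
{
	if (tg1 == TG1_4K)
		return TG0_4K;
	if (tg1 == TG1_16K)
		return TG0_16K;
	return TG0_64K;
}

/*
 * Build a split TCR from a 16-bit half whose granule field already holds
 * the TG1 encoding (what the TTBR1 quirk is described as producing).
 */
static uint32_t split_tcr(uint32_t half)
{
	uint32_t tg1, tcr;

	half &= 0xffffu;                        /* assumed 16-bit mask */
	tg1 = (half & TG0_MASK) >> TG0_SHIFT;

	/* Duplicate the half: upper 16 bits for T1, lower 16 bits for T0 */
	tcr = (half << 16) | half;

	/* The lower copy still carries the TG1 encoding, so rewrite TG0 */
	tcr &= ~TG0_MASK;
	tcr |= tg1_to_tg0(tg1) << TG0_SHIFT;

	return tcr;
}
```

For example, a half of 0x8010 (TG1_4K in the granule field, size field 16)
gives 0x80100010: the upper copy keeps the TG1 encoding and the lower copy
carries TG0_4K.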

[PATCH v2 3/8] iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context

2019-11-22 Thread Jordan Crouse
Pass the proposed io_pgtable_cfg to the implementation specific
init_context() function to give the implementation an opportunity to
modify it before it gets passed to io-pgtable.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c  | 11 ++-
 drivers/iommu/arm-smmu.h  |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index b2fe72a..33ed682 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c106406..5c7c32b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -775,11 +775,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
-   if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain);
-   if (ret)
-   goto out_unlock;
-   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -790,6 +785,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.iommu_dev  = smmu->dev,
};
 
+   if (smmu->impl && smmu->impl->init_context) {
+   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+   if (ret)
+   goto out_unlock;
+   }
+
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index afab9de..0eb498f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -357,7 +357,8 @@ struct arm_smmu_impl {
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
-   int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   int (*init_context)(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
 };
-- 
2.7.4
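
The ordering change in this patch (build the pagetable config first, then let
the implementation hook adjust it, then apply the remaining quirks) can be
modelled outside the kernel as below. This is only a sketch: every type,
field and macro name here is a hypothetical stand-in for the arm-smmu and
io-pgtable structures, chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the io-pgtable quirk bits */
#define DEMO_QUIRK_TTBR1      (1ul << 0)
#define DEMO_QUIRK_NON_STRICT (1ul << 1)

struct demo_pgtable_cfg {
	unsigned long quirks;
	unsigned int ias;
};

struct demo_domain;

struct demo_impl {
	/* The hook now receives the proposed config and may modify it */
	int (*init_context)(struct demo_domain *domain,
			    struct demo_pgtable_cfg *cfg);
};

struct demo_domain {
	const struct demo_impl *impl;
	int split_pagetables;
	struct demo_pgtable_cfg final_cfg;
};

/* Example hook: request the TTBR1 layout when the domain asked for it */
static int demo_init_context(struct demo_domain *domain,
			     struct demo_pgtable_cfg *cfg)
{
	if (domain->split_pagetables)
		cfg->quirks |= DEMO_QUIRK_TTBR1;
	return 0;
}

static int demo_init_domain_context(struct demo_domain *domain, int non_strict)
{
	/* 1. Populate the proposed config (48 is an arbitrary demo value) */
	struct demo_pgtable_cfg cfg = { .quirks = 0, .ias = 48 };
	int ret;

	/* 2. Give the implementation a chance to change it */
	if (domain->impl && domain->impl->init_context) {
		ret = domain->impl->init_context(domain, &cfg);
		if (ret)
			return ret;
	}

	/* 3. Apply the generic quirks on top of whatever the hook set */
	if (non_strict)
		cfg.quirks |= DEMO_QUIRK_NON_STRICT;

	domain->final_cfg = cfg;
	return 0;
}
```

Because the hook runs after the config is populated, a quirk it sets survives
into the config that would be handed to the pagetable allocator, alongside
the generic quirks applied afterwards.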



[PATCH v2 1/8] dt-bindings: arm-smmu: Add Adreno GPU variant

2019-11-22 Thread Jordan Crouse
Add a compatible string to identify SMMUs that are attached
to Adreno GPU devices that wish to support split pagetables.

Signed-off-by: Jordan Crouse 
---

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 6515dbe..db9f826 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -31,6 +31,12 @@ properties:
   - qcom,sdm845-smmu-v2
   - const: qcom,smmu-v2
 
+  - description: Qcom Adreno GPU SMMU implementing split pagetables
+items:
+  - enum:
+  - qcom,adreno-smmu-v2
+  - const: qcom,smmu-v2
+
   - description: Qcom SoCs implementing "arm,mmu-500"
 items:
   - enum:
-- 
2.7.4


