Hi Robin, On Fri, Dec 7, 2018 at 2:54 PM Vivek Gautam <vivek.gau...@codeaurora.org> wrote: > > Hi Robin, > > On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy <robin.mur...@arm.com> wrote: > > > > On 04/12/2018 11:01, Vivek Gautam wrote: > > > Qualcomm SoCs have an additional level of cache called as > > > System cache, aka. Last level cache (LLC). This cache sits right > > > before the DDR, and is tightly coupled with the memory controller. > > > The cache is available to all the clients present in the SoC system. > > > The clients request their slices from this system cache, make it > > > active, and can then start using it. > > > For these clients with smmu, to start using the system cache for > > > buffers and, related page tables [1], memory attributes need to be > > > set accordingly. > > > This change updates the MAIR and TCR configurations with correct > > > attributes to use this system cache. > > > > > > To explain a little about memory attribute requirements here: > > > > > > Non-coherent I/O devices can't look-up into inner caches. However, > > > coherent I/O devices can. But both can allocate in the system cache > > > based on system policy and configured memory attributes in page > > > tables. > > > CPUs can access both inner and outer caches (including system cache, > > > aka. Last level cache), and can allocate into system cache too > > > based on memory attributes, and system policy. > > > > > > Further looking at memory types, we have following - > > > a) Normal uncached :- MAIR 0x44, inner non-cacheable, > > > outer non-cacheable; > > > b) Normal cached :- MAIR 0xff, inner read write-back non-transient, > > > outer read write-back non-transient; > > > attribute setting for coherenet I/O devices. > > > > > > and, for non-coherent i/o devices that can allocate in system cache > > > another type gets added - > > > c) Normal sys-cached/non-inner-cached :- > > > MAIR 0xf4, inner non-cacheable, > > > outer read write-back non-transient > > > > > > So, CPU will automatically use the system cache for memory marked as > > > normal cached. The normal sys-cached is downgraded to normal non-cached > > > memory for CPUs. > > > Coherent I/O devices can use system cache by marking the memory as > > > normal cached. > > > Non-coherent I/O devices, to use system cache, should mark the memory as > > > normal sys-cached in page tables. > > > > > > This change is a realisation of following changes > > > from downstream msm-4.9: > > > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2] > > > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3] > > > > > > [1] https://patchwork.kernel.org/patch/10302791/ > > > [2] > > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51 > > > [3] > > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602 > > > > > > Signed-off-by: Vivek Gautam <vivek.gau...@codeaurora.org> > > > --- > > > > > > Changes since v1: > > > - Addressed Tomasz's comments for basing the change on > > > "NO_INNER_CACHE" concept for non-coherent I/O devices > > > rather than capturing "SYS_CACHE". This is to indicate > > > clearly the intent of non-coherent I/O devices that > > > can't access inner caches. > > > > That seems backwards to me - there is already a fundamental assumption > > that non-coherent devices can't access caches. What we're adding here is > > a weird exception where they *can* use some level of cache despite still > > being non-coherent overall. > > > > In other words, it's not a case of downgrading coherent devices' > > accesses to bypass inner caches, it's upgrading non-coherent devices' > > accesses to hit the outer cache. That's certainly the understanding I > > got from talking with Pratik at Plumbers, and it does appear to fit with > > your explanation above despite the final conclusion you draw being > > different. > > Thanks for the thorough review of the change. > Right, I guess it's rather an upgrade for non-coherent devices to use > an outer cache than a downgrade for coherent devices. > > > > > I do see what Tomasz meant in terms of the TCR attributes, but what we > > currently do there is a little unintuitive and not at all representative > > of actual mapping attributes - I'll come back to that inline. > > > > > drivers/iommu/arm-smmu.c | 15 +++++++++++++++ > > > drivers/iommu/dma-iommu.c | 3 +++ > > > drivers/iommu/io-pgtable-arm.c | 22 +++++++++++++++++----- > > > drivers/iommu/io-pgtable.h | 5 +++++ > > > include/linux/iommu.h | 3 +++ > > > 5 files changed, 43 insertions(+), 5 deletions(-) > > > > As a minor nit, I'd prefer this as at least two patches to separate the > > io-pgtable changes and arm-smmu changes - basically I'd expect it to > > look much the same as the non-strict mode support did. > > Sure, will split the patch. > > > > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > > > index ba18d89d4732..047f7ff95b0d 100644 > > > --- a/drivers/iommu/arm-smmu.c > > > +++ b/drivers/iommu/arm-smmu.c > > > @@ -255,6 +255,7 @@ struct arm_smmu_domain { > > > struct mutex init_mutex; /* Protects smmu > > > pointer */ > > > spinlock_t cb_lock; /* Serialises ATS1* ops > > > and TLB syncs */ > > > struct iommu_domain domain; > > > + bool no_inner_cache; > > > > Can we keep all the domain flags together please? In fact, I'd be > > inclined to implement an options bitmap as we do elsewhere rather than > > proliferate multiple bools. > > Yea, changing this to bitmap makes sense. Will update this. > > > > > > }; > > > > > > struct arm_smmu_option_prop { > > > @@ -897,6 +898,9 @@ static int arm_smmu_init_domain_context(struct > > > iommu_domain *domain, > > > if (smmu_domain->non_strict) > > > pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; > > > > > > + if (smmu_domain->no_inner_cache) > > > + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NO_IC; > > > > Maybe we need to be a bit cleverer about setting the quirk (and/or > > allowing the domain attribute to be set), since depending on > > configuration and hardware support the domain may end up picking a stage > > 2 or short-descriptor format and thus being rendered unusable. > > I don't think I completely get you here. > But, do you mean that to set such quirks we should first check configurations > such as the domain's stage, and the format before deciding whether > we want to set this or not? > > > > > > + > > > smmu_domain->smmu = smmu; > > > pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); > > > if (!pgtbl_ops) { > > > @@ -1579,6 +1583,9 @@ static int arm_smmu_domain_get_attr(struct > > > iommu_domain *domain, > > > case DOMAIN_ATTR_NESTING: > > > *(int *)data = (smmu_domain->stage == > > > ARM_SMMU_DOMAIN_NESTED); > > > return 0; > > > + case DOMAIN_ATTR_NO_IC: > > > + *((int *)data) = smmu_domain->no_inner_cache; > > > + return 0; > > > default: > > > return -ENODEV; > > > } > > > @@ -1619,6 +1626,14 @@ static int arm_smmu_domain_set_attr(struct > > > iommu_domain *domain, > > > else > > > smmu_domain->stage = ARM_SMMU_DOMAIN_S1; > > > break; > > > + case DOMAIN_ATTR_NO_IC: > > > + if (smmu_domain->smmu) { > > > + ret = -EPERM; > > > + goto out_unlock; > > > + } > > > + if (*((int *)data)) > > > + smmu_domain->no_inner_cache = true; > > > > This makes the attribute impossible to disable again, even before the > > domain is initialized - is that intentional? (and if so, why?) > > Right. I should add for data = 0 as well. That should help to disable this > attribute again. > > > > > > + break; > > > default: > > > ret = -ENODEV; > > > } > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > > > index d1b04753b204..87c3d59c4a6c 100644 > > > --- a/drivers/iommu/dma-iommu.c > > > +++ b/drivers/iommu/dma-iommu.c > > > @@ -354,6 +354,9 @@ int dma_info_to_prot(enum dma_data_direction dir, > > > bool coherent, > > > { > > > int prot = coherent ? IOMMU_CACHE : 0; > > > > > > + if (!coherent && (attrs & DOMAIN_ATTR_NO_IC)) > > > + prot |= IOMMU_NO_IC; > > > + > > > > Erm, that's going to be a hilariously unexpected interpretation of > > DMA_ATTR_FORCE_CONTIGUOUS... > > Right. :) > I guess i will take your suggestion to have something like > DOMAIN_ATTR_QCOM_SYSTEM_CACHE. > > > > > I'm not sure it would really makes sense to expose fine-grained controls > > at the DMA API level anyway, given the main point is to largely abstract > > away the notion of caches altogether. > > But there are DMA devices (such as video) which use DMA mapping APIs only, > and which are non-coherent-upgraded-to-use-sytem-cache. Such devices > can't force set IOMMU quirks unless they do iommu_get_domain_for_dev() > and then set the domain attributes. > Will that be better way?
Any suggestions here? > > > > > > if (attrs & DMA_ATTR_PRIVILEGED) > > > prot |= IOMMU_PRIV; > > > > > > diff --git a/drivers/iommu/io-pgtable-arm.c > > > b/drivers/iommu/io-pgtable-arm.c > > > index 237cacd4a62b..815b86067bcc 100644 > > > --- a/drivers/iommu/io-pgtable-arm.c > > > +++ b/drivers/iommu/io-pgtable-arm.c > > > @@ -168,10 +168,12 @@ > > > #define ARM_LPAE_MAIR_ATTR_MASK 0xff > > > #define ARM_LPAE_MAIR_ATTR_DEVICE 0x04 > > > #define ARM_LPAE_MAIR_ATTR_NC 0x44 > > > +#define ARM_LPAE_MAIR_ATTR_NO_IC 0xf4 > > > #define ARM_LPAE_MAIR_ATTR_WBRWA 0xff > > > #define ARM_LPAE_MAIR_ATTR_IDX_NC 0 > > > #define ARM_LPAE_MAIR_ATTR_IDX_CACHE 1 > > > #define ARM_LPAE_MAIR_ATTR_IDX_DEV 2 > > > +#define ARM_LPAE_MAIR_ATTR_IDX_NO_IC 3 > > > > > > /* IOPTE accessors */ > > > #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d)) > > > @@ -443,6 +445,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct > > > arm_lpae_io_pgtable *data, > > > else if (prot & IOMMU_CACHE) > > > pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE > > > << ARM_LPAE_PTE_ATTRINDX_SHIFT); > > > + else if (prot & IOMMU_NO_IC) > > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_NO_IC > > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT); > > > } else { > > > pte = ARM_LPAE_PTE_HAP_FAULT; > > > if (prot & IOMMU_READ) > > > @@ -780,7 +785,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg > > > *cfg, void *cookie) > > > struct arm_lpae_io_pgtable *data; > > > > > > if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | > > > IO_PGTABLE_QUIRK_NO_DMA | > > > - IO_PGTABLE_QUIRK_NON_STRICT)) > > > + IO_PGTABLE_QUIRK_NON_STRICT | > > > + IO_PGTABLE_QUIRK_NO_IC)) > > > return NULL; > > > > > > data = arm_lpae_alloc_pgtable(cfg); > > > @@ -788,9 +794,13 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg > > > *cfg, void *cookie) > > > return NULL; > > > > > > /* TCR */ > > > - reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) | > > > - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) | > > > - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT); > > > > The subtle assumption here is that if the SMMU is coherent then these > > are the attributes we actually want to use, but if it's non-coherent > > then the interconnect should ignore them anyway so it doesn't really > > matter. Does either of those aspects hold for qcom SoCs? > > From the downstream [1] it's clear that default for smmu is set to > Non-cached access. > So, I don't think the interconnect helps us. OR, possibly we are just > forcing these > mappings be uncached. > > > > > TBH if we're going to touch the TCR attributes at all then we should > > probably correct that sloppiness first - there's an occasional argument > > for using non-cacheable pagetables even on a coherent SMMU if reducing > > snoop traffic/latency on walks outweighs the cost of cache maintenance > > on PTE updates, but anyone thinking they can get that by overriding > > dma-coherent silently gets the worst of both worlds thanks to this > > current TCR value. > > So, what do you suggest? > This is something that's smmu's implementation specific detail, not something > that's going to vary from one domain to another? Isn't that right? > So, in that case additional dt property can help setting a quirk? I have a change that adds "arm,smmu-pgtable-non-coherent" option and based on that adds a quick IO_PGTABLE_QUIRK_NON_COHERENT. But before that I would like to check if we can make use of IO_PGTABLE_QUIRK_NO_DMA? In present design though we don't force page table mappings to be non-coherent based on this quirk. Do we just rely on the interconnect, as you said earlier, for non-coherent SMMU? Anyone who wants to just force smmu's page table to be non-coherent can use IO_PGTABLE_QUIRK_NON_COHERENT when not declaring the SMMU as dma-coherent. [snip] -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation