Re: [PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1
Hi Eric,

On Fri, Feb 12, 2021 at 11:44 PM Auger Eric wrote:
>
> Hi Vivek,
>
> On 2/12/21 11:58 AM, Vivek Gautam wrote:
> > Update nested domain information required for stage1 page table.
>
> s/reuqired/required in the commit title

Oh! my bad.

> >
> > Signed-off-by: Vivek Gautam
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index c11dd3940583..728018921fae 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct
> > arm_smmu_domain *smmu_domain,
> >  					void *data)
> >  {
> >  	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> > +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> >  	unsigned int size;
> >
> >  	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > @@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct
> > arm_smmu_domain *smmu_domain,
> >  		return 0;
> >  	}
> >
> > -	/* report an empty iommu_nesting_info for now */
> > -	memset(info, 0x0, size);
> > +	/* Update the nesting info as required for stage1 page tables */
> > +	info->addr_width = smmu->ias;
> > +	info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
> > +	info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
> I understood IOMMU_NESTING_FEAT_BIND_PGTBL advertises the requirement to
> bind tables per PASID, ie. passing iommu_gpasid_bind_data.
> In ARM case I guess you plan to use attach/detach_pasid_table API with
> iommu_pasid_table_config struct. So I understood we should add a new
> feature here.

Right, the idea is to let vfio know that we support pasid table binding,
and I thought we could use the same flag. But clearly that's not the case.
Will add a new feature.

> > +			 IOMMU_NESTING_FEAT_PAGE_RESP |
> > +			 IOMMU_NESTING_FEAT_CACHE_INVLD;
> > +	info->pasid_bits = smmu->ssid_bits;
> > +	info->vendor.smmuv3.asid_bits = smmu->asid_bits;
> > +	info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
> > +	memset(&info->padding, 0x0, 12);
> > +	memset(&info->vendor.smmuv3.padding, 0x0, 9);
> > +
> >  	info->argsz = size;
> > +
> spurious new line

Sure, will remove it.

Best regards
Vivek

> >  	return 0;
> >  }
> >
> >
[PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1
Update nested domain information required for stage1 page table.

Signed-off-by: Vivek Gautam
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c11dd3940583..728018921fae 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
 					void *data)
 {
 	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	unsigned int size;
 
 	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
@@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
 		return 0;
 	}
 
-	/* report an empty iommu_nesting_info for now */
-	memset(info, 0x0, size);
+	/* Update the nesting info as required for stage1 page tables */
+	info->addr_width = smmu->ias;
+	info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
+	info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
+			 IOMMU_NESTING_FEAT_PAGE_RESP |
+			 IOMMU_NESTING_FEAT_CACHE_INVLD;
+	info->pasid_bits = smmu->ssid_bits;
+	info->vendor.smmuv3.asid_bits = smmu->asid_bits;
+	info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
+	memset(&info->padding, 0x0, 12);
+	memset(&info->vendor.smmuv3.padding, 0x0, 9);
+
 	info->argsz = size;
+
 	return 0;
 }
-- 
2.17.1
[PATCH 0/2] Domain nesting info for arm-smmu
These two patches add nesting information for Arm and are based on the
domain nesting info patches by Yi [1,2,3].

Following the discussion in thread [4], I am sending these out as I have
been using them in my tree [5] for nested translation based on
virtio-iommu on Arm reference platforms.

Thanks & regards
Vivek

[1] https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/
[2] https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/
[3] https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/
[4] https://lore.kernel.org/kvm/306e7dd2-9eb2-0ca3-6a93-7c9aa0821...@arm.com/
[5] https://github.com/vivek-arm/linux/tree/5.11-rc3-nested-pgtbl-arm-smmuv3-virtio-iommu

Vivek Gautam (2):
  iommu: Report domain nesting info for arm-smmu-v3
  iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 +--
 include/uapi/linux/iommu.h                  | 31 +
 2 files changed, 39 insertions(+), 8 deletions(-)

-- 
2.17.1
[PATCH 1/2] iommu: Report domain nesting info for arm-smmu-v3
Add a vendor specific structure for domain nesting info for arm smmu-v3,
and necessary info fields required to populate stage1 page tables.

Signed-off-by: Vivek Gautam
---
 include/uapi/linux/iommu.h | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 4d3d988fa353..5f059bcf7720 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -323,7 +323,8 @@ struct iommu_gpasid_bind_data {
 #define IOMMU_GPASID_BIND_VERSION_1	1
 	__u32 version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD	1
-#define IOMMU_PASID_FORMAT_LAST		2
+#define IOMMU_PASID_FORMAT_ARM_SMMU_V3	2
+#define IOMMU_PASID_FORMAT_LAST		3
 	__u32 format;
 	__u32 addr_width;
 #define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid */
@@ -409,6 +410,21 @@ struct iommu_nesting_info_vtd {
 	__u64	ecap_reg;
 };
 
+/*
+ * struct iommu_nesting_info_arm_smmuv3 - Arm SMMU-v3 nesting info.
+ */
+struct iommu_nesting_info_arm_smmuv3 {
+	__u32	flags;
+	__u16	asid_bits;
+
+	/* Arm LPAE page table format as per kernel */
+#define ARM_PGTBL_32_LPAE_S1	(0x0)
+#define ARM_PGTBL_64_LPAE_S1	(0x2)
+	__u8	pgtbl_fmt;
+
+	__u8	padding[9];
+};
+
 /*
  * struct iommu_nesting_info - Information for nesting-capable IOMMU.
  *			       userspace should check it before using
@@ -445,11 +461,13 @@ struct iommu_nesting_info_vtd {
  * +---------------+------------------------------------------------------+
  *
  * data struct types defined for @format:
- * +================================+=====================================+
- * | @format                        | data struct                         |
- * +================================+=====================================+
- * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
- * +--------------------------------+-------------------------------------+
+ * +================================+======================================+
+ * | @format                        | data struct                          |
+ * +================================+======================================+
+ * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd        |
+ * +--------------------------------+--------------------------------------+
+ * | IOMMU_PASID_FORMAT_ARM_SMMU_V3 | struct iommu_nesting_info_arm_smmuv3 |
+ * +--------------------------------+--------------------------------------+
  *
  */
 struct iommu_nesting_info {
@@ -466,6 +484,7 @@ struct iommu_nesting_info {
 	/* Vendor specific data */
 	union {
 		struct iommu_nesting_info_vtd vtd;
+		struct iommu_nesting_info_arm_smmuv3 smmuv3;
 	} vendor;
 };
 
-- 
2.17.1
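A note on the layout above: the new vendor struct is meant to be free of implicit padding, which is why it ends in an explicit padding[9] (4 + 2 + 1 + 9 = 16 bytes). A minimal userspace sketch of that check follows. The struct name nesting_info_arm_smmuv3 and the helper fill_smmuv3_info() are hypothetical stand-ins written for illustration; they are not part of the posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of struct iommu_nesting_info_arm_smmuv3. */
struct nesting_info_arm_smmuv3 {
	uint32_t flags;
	uint16_t asid_bits;
	uint8_t  pgtbl_fmt;
	uint8_t  padding[9];
};

/* The explicit padding should bring the struct to 16 bytes with no holes. */
_Static_assert(offsetof(struct nesting_info_arm_smmuv3, padding) == 7,
	       "padding is expected to start right after pgtbl_fmt");
_Static_assert(sizeof(struct nesting_info_arm_smmuv3) == 16,
	       "struct is expected to be 16 bytes");

/*
 * Hypothetical helper: zero the whole struct first, then set the fields.
 * Zeroing by sizeof avoids hard-coding the padding length.
 */
static void fill_smmuv3_info(struct nesting_info_arm_smmuv3 *v,
			     uint16_t asid_bits, uint8_t pgtbl_fmt)
{
	memset(v, 0, sizeof(*v));
	v->asid_bits = asid_bits;
	v->pgtbl_fmt = pgtbl_fmt;
}
```

Clearing the struct with sizeof() rather than a literal byte count is one way to keep the zeroing correct if the padding size changes in a later revision.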
[PATCH RFC v1 08/15] iommu: Add asid_bits to arm smmu-v3 stage1 table info
asid_bits data is required to prepare stage-1 tables for arm-smmu-v3.

Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 include/uapi/linux/iommu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 082d758dd016..96abbfc7c643 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -357,7 +357,7 @@ struct iommu_pasid_smmuv3 {
 	__u32	version;
 	__u8	s1fmt;
 	__u8	s1dss;
-	__u8	padding[2];
+	__u16	asid_bits;
 };
 
 /**
-- 
2.17.1
[PATCH RFC v1 14/15] iommu/virtio: Add support for Arm LPAE page table format
From: Jean-Philippe Brucker

When PASID isn't supported, we can still register one set of tables.
Add support to register Arm LPAE based page table.

Signed-off-by: Jean-Philippe Brucker
[Vivek: Clean-ups to add right tcr definitions and accommodate
        with parent patches]
Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Michael S. Tsirkin
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 drivers/iommu/virtio-iommu.c      | 131 +-
 include/uapi/linux/virtio_iommu.h |  30 +++
 2 files changed, 139 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index b5222da1dc74..9cc3d35125e9 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -135,6 +135,13 @@ struct viommu_event {
 #define to_viommu_domain(domain)	\
 	container_of(domain, struct viommu_domain, domain)
 
+#define VIRTIO_FIELD_PREP(_mask, _shift, _val)			\
+	({								\
+		(((_val) << VIRTIO_IOMMU_PGTF_ARM_ ## _shift) &	\
+		 (VIRTIO_IOMMU_PGTF_ARM_ ## _mask <<			\
+		  VIRTIO_IOMMU_PGTF_ARM_ ## _shift));			\
+	})
+
 static int viommu_get_req_errno(void *buf, size_t len)
 {
 	struct virtio_iommu_req_tail *tail = buf + len - sizeof(*tail);
@@ -897,6 +904,76 @@ static int viommu_simple_attach(struct viommu_domain *vdomain,
 	return ret;
 }
 
+static int viommu_config_arm_pgt(struct viommu_endpoint *vdev,
+				 struct io_pgtable_cfg *cfg,
+				 struct virtio_iommu_req_attach_pgt_arm *req,
+				 u64 *asid)
+{
+	int id;
+	struct virtio_iommu_probe_table_format *pgtf = (void *)vdev->pgtf;
+	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
+	u64 __tcr;
+
+	if (pgtf->asid_bits != 8 && pgtf->asid_bits != 16)
+		return -EINVAL;
+
+	id = ida_simple_get(&asid_ida, 1, 1 << pgtf->asid_bits, GFP_KERNEL);
+	if (id < 0)
+		return -ENOMEM;
+
+	__tcr = VIRTIO_FIELD_PREP(T0SZ_MASK, T0SZ_SHIFT, tcr->tsz) |
+		VIRTIO_FIELD_PREP(IRGN0_MASK, IRGN0_SHIFT, tcr->irgn) |
+		VIRTIO_FIELD_PREP(ORGN0_MASK, ORGN0_SHIFT, tcr->orgn) |
+		VIRTIO_FIELD_PREP(SH0_MASK, SH0_SHIFT, tcr->sh) |
+		VIRTIO_FIELD_PREP(TG0_MASK, TG0_SHIFT, tcr->tg) |
+		VIRTIO_IOMMU_PGTF_ARM_EPD1 | VIRTIO_IOMMU_PGTF_ARM_HPD0 |
+		VIRTIO_IOMMU_PGTF_ARM_HPD1;
+
+	req->format	= cpu_to_le16(VIRTIO_IOMMU_FOMRAT_PGTF_ARM_LPAE);
+	req->ttbr	= cpu_to_le64(cfg->arm_lpae_s1_cfg.ttbr);
+	req->tcr	= cpu_to_le64(__tcr);
+	req->mair	= cpu_to_le64(cfg->arm_lpae_s1_cfg.mair);
+	req->asid	= cpu_to_le16(id);
+
+	*asid = id;
+	return 0;
+}
+
+static int viommu_attach_pgtable(struct viommu_endpoint *vdev,
+				 struct viommu_domain *vdomain,
+				 enum io_pgtable_fmt fmt,
+				 struct io_pgtable_cfg *cfg,
+				 u64 *asid)
+{
+	int ret;
+	int i, eid;
+
+	struct virtio_iommu_req_attach_table req = {
+		.head.type	= VIRTIO_IOMMU_T_ATTACH_TABLE,
+		.domain		= cpu_to_le32(vdomain->id),
+	};
+
+	switch (fmt) {
+	case ARM_64_LPAE_S1:
+		ret = viommu_config_arm_pgt(vdev, cfg, (void *)&req, asid);
+		if (ret)
+			return ret;
+		break;
+	default:
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
+	vdev_for_each_id(i, eid, vdev) {
+		req.endpoint = cpu_to_le32(eid);
+		ret = viommu_send_req_sync(vdomain->viommu, &req, sizeof(req));
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int viommu_teardown_pgtable(struct viommu_domain *vdomain)
 {
 	struct iommu_vendor_psdtable_cfg *pst_cfg;
@@ -972,32 +1049,42 @@ static int viommu_setup_pgtable(struct viommu_endpoint *vdev,
 	if (!ops)
 		return -ENOMEM;
 
-	pst_cfg = &tbl->cfg;
-	cfgi = &pst_cfg->vendor.cfg;
-	id = ida_simple_get(&asid_ida, 1, 1 << desc->asid_bits, GFP_KERNEL);
-	if (id < 0) {
-		ret = id;
-		goto err_free_pgtable;
-	}
+	if (!tbl) {
+		/* No PASID support, send attach_table */
+		ret = viommu_attach_pgtable(vdev, vdomain, fmt, &cfg,
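For readers following the TCR packing in viommu_config_arm_pgt(), here is a minimal standalone sketch of the same shift-and-mask idea. The VIRTIO_IOMMU_PGTF_ARM_* constants live in virtio_iommu.h, which is not shown in this patch excerpt, so the PGTF_ARM_* shifts and masks below are assumptions modelled on the architectural Arm TCR field positions. The names struct lpae_tcr, PGTF_FIELD_PREP() and pack_tcr() are likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed field positions (T0SZ[5:0], IRGN0[9:8], ORGN0[11:10],
 * SH0[13:12], TG0[15:14]). The real VIRTIO_IOMMU_PGTF_ARM_* values
 * are not visible in this thread and may differ.
 */
#define PGTF_ARM_T0SZ_SHIFT	0
#define PGTF_ARM_T0SZ_MASK	0x3fULL
#define PGTF_ARM_IRGN0_SHIFT	8
#define PGTF_ARM_IRGN0_MASK	0x3ULL
#define PGTF_ARM_ORGN0_SHIFT	10
#define PGTF_ARM_ORGN0_MASK	0x3ULL
#define PGTF_ARM_SH0_SHIFT	12
#define PGTF_ARM_SH0_MASK	0x3ULL
#define PGTF_ARM_TG0_SHIFT	14
#define PGTF_ARM_TG0_MASK	0x3ULL

/* Same shape as VIRTIO_FIELD_PREP: shift the value, then mask in place. */
#define PGTF_FIELD_PREP(field, val)				\
	((((uint64_t)(val)) << PGTF_ARM_##field##_SHIFT) &	\
	 (PGTF_ARM_##field##_MASK << PGTF_ARM_##field##_SHIFT))

/* Hypothetical stand-in for cfg->arm_lpae_s1_cfg.tcr. */
struct lpae_tcr {
	uint32_t tsz, irgn, orgn, sh, tg;
};

static uint64_t pack_tcr(const struct lpae_tcr *t)
{
	return PGTF_FIELD_PREP(T0SZ, t->tsz) |
	       PGTF_FIELD_PREP(IRGN0, t->irgn) |
	       PGTF_FIELD_PREP(ORGN0, t->orgn) |
	       PGTF_FIELD_PREP(SH0, t->sh) |
	       PGTF_FIELD_PREP(TG0, t->tg);
}
```

Because each value is masked after shifting, an out-of-range field cannot spill into its neighbour; it is silently truncated instead, which may or may not be the behaviour wanted in the driver.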
[PATCH RFC v1 13/15] iommu/virtio: Attach Arm PASID tables when available
From: Jean-Philippe Brucker

When the ARM PASID table format is reported in a probe, send an attach
request and install the page tables for iommu_map/iommu_unmap use.
Architecture-specific components are already abstracted to libraries. We
just need to pass config bits around and setup an alternative mechanism to
the mapping tree.

We reuse the convention already adopted by other IOMMU architectures (ARM
SMMU and AMD IOMMU), that entry 0 in the PASID table is reserved for
non-PASID traffic. Bind the PASID table, and setup entry 0 to be modified
with iommu_map/unmap.

Signed-off-by: Jean-Philippe Brucker
[Vivek: Bunch of refactoring and clean-ups to use iommu-pasid-table APIs,
        creating iommu_pasid_table, and configuring based on reported
        pasid format. Couple of additional methods have also been created
        to configure vendor specific pasid configuration]
Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Michael S. Tsirkin
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 drivers/iommu/virtio-iommu.c | 314 +++
 1 file changed, 314 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 004ea94e3731..b5222da1dc74 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -25,6 +25,7 @@
 #include
 
 #include
+#include "iommu-pasid-table.h"
 
 #define MSI_IOVA_BASE			0x800
 #define MSI_IOVA_LENGTH			0x10
@@ -33,6 +34,9 @@
 #define VIOMMU_EVENT_VQ			1
 #define VIOMMU_NR_VQS			2
 
+/* Some architectures need an Address Space ID for each page table */
+static DEFINE_IDA(asid_ida);
+
 struct viommu_dev {
 	struct iommu_device		iommu;
 	struct device			*dev;
@@ -55,6 +59,7 @@ struct viommu_dev {
 	u32				probe_size;
 
 	bool				has_map:1;
+	bool				has_table:1;
 };
 
 struct viommu_mapping {
@@ -76,6 +81,7 @@ struct viommu_domain {
 	struct mutex			mutex; /* protects viommu pointer */
 	unsigned int			id;
 	u32				map_flags;
+	struct iommu_pasid_table	*pasid_tbl;
 
 	/* Default address space when a table is bound */
 	struct viommu_mm		mm;
@@ -891,6 +897,285 @@ static int viommu_simple_attach(struct viommu_domain *vdomain,
 	return ret;
 }
 
+static int viommu_teardown_pgtable(struct viommu_domain *vdomain)
+{
+	struct iommu_vendor_psdtable_cfg *pst_cfg;
+	struct arm_smmu_cfg_info *cfgi;
+	u32 asid;
+
+	if (!vdomain->mm.ops)
+		return 0;
+
+	free_io_pgtable_ops(vdomain->mm.ops);
+	vdomain->mm.ops = NULL;
+
+	if (vdomain->pasid_tbl) {
+		pst_cfg = &vdomain->pasid_tbl->cfg;
+		cfgi = &pst_cfg->vendor.cfg;
+		asid = cfgi->s1_cfg->cd.asid;
+
+		iommu_psdtable_write(vdomain->pasid_tbl, pst_cfg, 0, NULL);
+		ida_simple_remove(&asid_ida, asid);
+	}
+
+	return 0;
+}
+
+static int viommu_setup_pgtable(struct viommu_endpoint *vdev,
+				struct viommu_domain *vdomain)
+{
+	int ret, id;
+	u32 asid;
+	enum io_pgtable_fmt fmt;
+	struct io_pgtable_ops *ops = NULL;
+	struct viommu_dev *viommu = vdev->viommu;
+	struct virtio_iommu_probe_table_format *desc = vdev->pgtf;
+	struct iommu_pasid_table *tbl = vdomain->pasid_tbl;
+	struct iommu_vendor_psdtable_cfg *pst_cfg;
+	struct arm_smmu_cfg_info *cfgi;
+	struct io_pgtable_cfg cfg = {
+		.iommu_dev	= viommu->dev->parent,
+		.tlb		= &viommu_flush_ops,
+		.pgsize_bitmap	= vdev->pgsize_mask ? vdev->pgsize_mask :
+				  vdomain->domain.pgsize_bitmap,
+		.ias		= (vdev->input_end ? ilog2(vdev->input_end) :
+				   ilog2(vdomain->domain.geometry.aperture_end)) + 1,
+		.oas		= vdev->output_bits,
+	};
+
+	if (!desc)
+		return -EINVAL;
+
+	if (!vdev->output_bits)
+		return -ENODEV;
+
+	switch (le16_to_cpu(desc->format)) {
+	case VIRTIO_IOMMU_FOMRAT_PGTF_ARM_LPAE:
+		fmt = ARM_64_LPAE_S1;
+		break;
+	default:
+		dev_err(vdev->dev, "unsupported page table format 0x%x\n",
+			le16_to_cpu(desc->format));
+		return -EINVAL;
+	}
+
+	if (vdomain->mm.ops) {
+		/
[PATCH RFC v1 15/15] iommu/virtio: Update fault type and reason info for viommu fault
The fault type tells whether a fault is a page request fault or an
unrecoverable fault. Additional fault reasons and the related PASID
information help in handling faults efficiently.

Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Michael S. Tsirkin
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 drivers/iommu/virtio-iommu.c      | 27 +--
 include/uapi/linux/virtio_iommu.h | 13 -
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 9cc3d35125e9..10ef9e98214a 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -652,9 +652,16 @@ static int viommu_fault_handler(struct viommu_dev *viommu,
 	char *reason_str;
 
 	u8 reason	= fault->reason;
+	u16 type	= fault->flt_type;
 	u32 flags	= le32_to_cpu(fault->flags);
 	u32 endpoint	= le32_to_cpu(fault->endpoint);
 	u64 address	= le64_to_cpu(fault->address);
+	u32 pasid	= le32_to_cpu(fault->pasid);
+
+	if (type == VIRTIO_IOMMU_FAULT_F_PAGE_REQ) {
+		dev_info(viommu->dev, "Page request fault - unhandled\n");
+		return 0;
+	}
 
 	switch (reason) {
 	case VIRTIO_IOMMU_FAULT_R_DOMAIN:
@@ -663,6 +670,21 @@ static int viommu_fault_handler(struct viommu_dev *viommu,
 	case VIRTIO_IOMMU_FAULT_R_MAPPING:
 		reason_str = "page";
 		break;
+	case VIRTIO_IOMMU_FAULT_R_WALK_EABT:
+		reason_str = "page walk external abort";
+		break;
+	case VIRTIO_IOMMU_FAULT_R_PTE_FETCH:
+		reason_str = "pte fetch";
+		break;
+	case VIRTIO_IOMMU_FAULT_R_PERMISSION:
+		reason_str = "permission";
+		break;
+	case VIRTIO_IOMMU_FAULT_R_ACCESS:
+		reason_str = "access";
+		break;
+	case VIRTIO_IOMMU_FAULT_R_OOR_ADDRESS:
+		reason_str = "output address";
+		break;
 	case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
 	default:
 		reason_str = "unknown";
@@ -671,8 +693,9 @@ static int viommu_fault_handler(struct viommu_dev *viommu,
 
 	/* TODO: find EP by ID and report_iommu_fault */
 	if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
-		dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n",
-				    reason_str, endpoint, address,
+		dev_err_ratelimited(viommu->dev,
+				    "%s fault from EP %u PASID %u at %#llx [%s%s%s]\n",
+				    reason_str, endpoint, pasid, address,
 				    flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "",
 				    flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "",
 				    flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : "");
diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
index 608c8d642e1f..a537d82777f7 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -290,19 +290,30 @@ struct virtio_iommu_req_invalidate {
 #define VIRTIO_IOMMU_FAULT_R_UNKNOWN		0
 #define VIRTIO_IOMMU_FAULT_R_DOMAIN		1
 #define VIRTIO_IOMMU_FAULT_R_MAPPING		2
+#define VIRTIO_IOMMU_FAULT_R_WALK_EABT		3
+#define VIRTIO_IOMMU_FAULT_R_PTE_FETCH		4
+#define VIRTIO_IOMMU_FAULT_R_PERMISSION		5
+#define VIRTIO_IOMMU_FAULT_R_ACCESS		6
+#define VIRTIO_IOMMU_FAULT_R_OOR_ADDRESS	7
 
 #define VIRTIO_IOMMU_FAULT_F_READ		(1 << 0)
 #define VIRTIO_IOMMU_FAULT_F_WRITE		(1 << 1)
 #define VIRTIO_IOMMU_FAULT_F_EXEC		(1 << 2)
 #define VIRTIO_IOMMU_FAULT_F_ADDRESS		(1 << 8)
 
+#define VIRTIO_IOMMU_FAULT_F_DMA_UNRECOV	1
+#define VIRTIO_IOMMU_FAULT_F_PAGE_REQ		2
+
 struct virtio_iommu_fault {
 	__u8					reason;
-	__u8					reserved[3];
+	__le16					flt_type;
+	__u8					reserved;
 	__le32					flags;
 	__le32					endpoint;
 	__u8					reserved2[4];
 	__le64					address;
+	__le32					pasid;
+	__u8					reserved3[4];
 };
 
 #endif
-- 
2.17.1
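One detail that may be worth checking in the updated struct virtio_iommu_fault: a 16-bit field placed directly after a single __u8 is not naturally aligned, so on common ABIs the compiler is likely to insert hidden padding before flt_type and again before flags. The sketch below is a userspace approximation only. The struct names fault_as_posted and fault_reordered are hypothetical, and the reordered variant is just one possible arrangement, not something proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the field order in the posted struct. */
struct fault_as_posted {
	uint8_t  reason;
	uint16_t flt_type;	/* not 2-byte aligned without padding */
	uint8_t  reserved;
	uint32_t flags;
	uint32_t endpoint;
	uint8_t  reserved2[4];
	uint64_t address;
	uint32_t pasid;
	uint8_t  reserved3[4];
};

/* Hypothetical reordering that needs no implicit padding. */
struct fault_reordered {
	uint8_t  reason;
	uint8_t  reserved;
	uint16_t flt_type;
	uint32_t flags;
	uint32_t endpoint;
	uint8_t  reserved2[4];
	uint64_t address;
	uint32_t pasid;
	uint8_t  reserved3[4];
};

/* Returns how many bytes the posted order loses to hidden padding. */
static size_t fault_hidden_padding(void)
{
	return sizeof(struct fault_as_posted) - sizeof(struct fault_reordered);
}
```

With naturally aligned types, flt_type lands at offset 2 rather than 1 and flags at offset 8 rather than 4, so the wire layout seen by the device may not match what the field order suggests.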
[PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm
This patch series aims to enable nested stage translation in guests that
use virtio-iommu as the paravirtualized IOMMU. The backend is Arm SMMU-v3,
which provides nested stage-1 and stage-2 translation.

This series derives its purpose from the various ongoing efforts to add
support for Shared Virtual Addressing (SVA) in host and guest. On Arm, most
of the support for SVA has already landed. The support for nested stage
translation and fault reporting to guest has been proposed [1]. The related
changes required in the VFIO [2] framework have also been put forward.

This series proposes changes in virtio-iommu to program PASID tables and
the related stage-1 page tables. A simple iommu-pasid-table library is
added for this purpose; it interacts with vendor drivers to allocate and
populate PASID tables.

In Arm SMMUv3 we propose to pull the Context Descriptor (CD) management
code out of the arm-smmu-v3 driver and add it as a glue vendor layer that
supports allocating CD tables and populating them with the right values.
These CD tables are essentially the PASID tables and contain the stage-1
page table configurations too. A request to set up these CD tables comes
from the virtio-iommu driver, using the iommu-pasid-table library, when
running on Arm. The virtio-iommu then passes these PASID tables to the host
using the right virtio backend and support in the VMM.

For testing we have added the necessary support in kvmtool. The changes in
kvmtool are based on the virtio-iommu development branch by Jean-Philippe
Brucker [3].

The tested kernel branch contains the following, in order from bottom to
top of the git history:
a) v5.11-rc3
b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page
   table support for Arm.
c) Smmu test engine patches from Jean-Philippe's branch [4]
d) This series
e) Domain nesting info patches [5][6][7].
f) Changes to add arm-smmu-v3 specific nesting info (to be sent to
   the list).

This kernel is tested on the Neoverse reference software stack with the
Fixed Virtual Platform. A public version of the software stack and FVP is
available here [8][9].

A big thanks to Jean-Philippe for his contributions towards this work
and for his valuable guidance.

[1] https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/
[2] https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.au...@redhat.com/T/
[3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel
[4] https://jpbrucker.net/git/linux/log/?h=sva/smmute
[5] https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/
[6] https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/
[7] https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/
[8] https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
[9] https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst

Jean-Philippe Brucker (6):
  iommu/virtio: Add headers for table format probing
  iommu/virtio: Add table format probing
  iommu/virtio: Add headers for binding pasid table in iommu
  iommu/virtio: Add support for INVALIDATE request
  iommu/virtio: Attach Arm PASID tables when available
  iommu/virtio: Add support for Arm LPAE page table format

Vivek Gautam (9):
  iommu/arm-smmu-v3: Create a Context Descriptor library
  iommu: Add a simple PASID table library
  iommu/arm-smmu-v3: Update drivers to work with iommu-pasid-table
  iommu/arm-smmu-v3: Update CD base address info for user-space
  iommu/arm-smmu-v3: Set sync op from consumer driver of cd-lib
  iommu: Add asid_bits to arm smmu-v3 stage1 table info
  iommu/virtio: Update table format probing header
  iommu/virtio: Prepare to add attach pasid table infrastructure
  iommu/virtio: Update fault type and reason info for viommu fault

 drivers/iommu/arm/arm-smmu-v3/Makefile        |   2 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c      | 283 +++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  16 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 268 +--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   4 +-
 drivers/iommu/iommu-pasid-table.h             | 140
 drivers/iommu/virtio-iommu.c                  | 692 +-
 include/uapi/linux/iommu.h                    |   2 +-
 include/uapi/linux/virtio_iommu.h             | 158 +++-
 9 files changed, 1303 insertions(+), 262 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
 create mode 100644 drivers/iommu/iommu-pasid-table.h

-- 
2.17.1
[PATCH RFC v1 11/15] iommu/virtio: Add headers for binding pasid table in iommu
From: Jean-Philippe Brucker

Add the required UAPI defines for binding pasid tables in virtio-iommu.
This mode allows handing stage-1 page tables over to the guest.

Signed-off-by: Jean-Philippe Brucker
[Vivek: Refactor to cleanup headers for invalidation]
Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Michael S. Tsirkin
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 include/uapi/linux/virtio_iommu.h | 68 +++
 1 file changed, 68 insertions(+)

diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
index 8a0624bab4b2..3481e4a3dd24 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -16,6 +16,7 @@
 #define VIRTIO_IOMMU_F_BYPASS			3
 #define VIRTIO_IOMMU_F_PROBE			4
 #define VIRTIO_IOMMU_F_MMIO			5
+#define VIRTIO_IOMMU_F_ATTACH_TABLE		6
 
 struct virtio_iommu_range_64 {
 	__le64					start;
@@ -44,6 +45,8 @@ struct virtio_iommu_config {
 #define VIRTIO_IOMMU_T_MAP			0x03
 #define VIRTIO_IOMMU_T_UNMAP			0x04
 #define VIRTIO_IOMMU_T_PROBE			0x05
+#define VIRTIO_IOMMU_T_ATTACH_TABLE		0x06
+#define VIRTIO_IOMMU_T_INVALIDATE		0x07
 
 /* Status types */
 #define VIRTIO_IOMMU_S_OK			0x00
@@ -82,6 +85,37 @@ struct virtio_iommu_req_detach {
 	struct virtio_iommu_req_tail		tail;
 };
 
+struct virtio_iommu_req_attach_table {
+	struct virtio_iommu_req_head		head;
+	__le32					domain;
+	__le32					endpoint;
+	__le16					format;
+	__u8					reserved[62];
+	struct virtio_iommu_req_tail		tail;
+};
+
+#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_LINEAR	0x0
+#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_4KL2	0x1
+#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_64KL2	0x2
+
+#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_DSS_TERM	0x0
+#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_DSS_BYPASS 0x1
+#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_DSS_0	0x2
+
+/* Arm SMMUv3 PASID Table Descriptor */
+struct virtio_iommu_req_attach_pst_arm {
+	struct virtio_iommu_req_head		head;
+	__le32					domain;
+	__le32					endpoint;
+	__le16					format;
+	__u8					s1fmt;
+	__u8					s1dss;
+	__le64					s1contextptr;
+	__le32					s1cdmax;
+	__u8					reserved[48];
+	struct virtio_iommu_req_tail		tail;
+};
+
 #define VIRTIO_IOMMU_MAP_F_READ			(1 << 0)
 #define VIRTIO_IOMMU_MAP_F_WRITE		(1 << 1)
 #define VIRTIO_IOMMU_MAP_F_MMIO			(1 << 2)
@@ -188,6 +222,40 @@ struct virtio_iommu_req_probe {
 	 */
 };
 
+#define VIRTIO_IOMMU_INVAL_G_DOMAIN		(1 << 0)
+#define VIRTIO_IOMMU_INVAL_G_PASID		(1 << 1)
+#define VIRTIO_IOMMU_INVAL_G_VA			(1 << 2)
+
+#define VIRTIO_IOMMU_INV_T_IOTLB		(1 << 0)
+#define VIRTIO_IOMMU_INV_T_DEV_IOTLB		(1 << 1)
+#define VIRTIO_IOMMU_INV_T_PASID		(1 << 2)
+
+#define VIRTIO_IOMMU_INVAL_F_PASID		(1 << 0)
+#define VIRTIO_IOMMU_INVAL_F_ARCHID		(1 << 1)
+#define VIRTIO_IOMMU_INVAL_F_LEAF		(1 << 2)
+
+struct virtio_iommu_req_invalidate {
+	struct virtio_iommu_req_head		head;
+	__le16					inv_gran;
+	__le16					inv_type;
+
+	__le16					flags;
+	__u8					reserved1[2];
+	__le32					domain;
+
+	__le32					pasid;
+	__u8					reserved2[4];
+
+	__le64					archid;
+	__le64					virt_start;
+	__le64					nr_pages;
+
+	/* Page size, in nr of bits, typically 12 for 4k, 30 for 2MB, etc.) */
+	__u8					granule;
+	__u8					reserved3[11];
+	struct virtio_iommu_req_tail		tail;
+};
+
 /* Fault types */
 #define VIRTIO_IOMMU_FAULT_R_UNKNOWN		0
 #define VIRTIO_IOMMU_FAULT_R_DOMAIN		1
-- 
2.17.1
[PATCH RFC v1 12/15] iommu/virtio: Add support for INVALIDATE request
From: Jean-Philippe Brucker

Add support for tlb invalidation ops that can send invalidation
requests to back-end virtio-iommu when stage-1 page tables are
supported.

Signed-off-by: Jean-Philippe Brucker
[Vivek: Refactoring the iommu_flush_ops, and adding only one pasid sync
        op that's needed with current iommu-pasid-table infrastructure.
        Also updating uapi defines as required by latest changes]
Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Michael S. Tsirkin
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 drivers/iommu/virtio-iommu.c | 95
 1 file changed, 95 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index ae5dfd3f8269..004ea94e3731 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -63,6 +64,8 @@ struct viommu_mapping {
 };
 
 struct viommu_mm {
+	int				pasid;
+	u64				archid;
 	struct io_pgtable_ops		*ops;
 	struct viommu_domain		*domain;
 };
@@ -692,6 +695,98 @@ static void viommu_event_handler(struct virtqueue *vq)
 	virtqueue_kick(vq);
 }
 
+/* PASID and pgtable APIs */
+
+static void __viommu_flush_pasid_tlb_all(struct viommu_domain *vdomain,
+					 int pasid, u64 arch_id, int type)
+{
+	struct virtio_iommu_req_invalidate req = {
+		.head.type	= VIRTIO_IOMMU_T_INVALIDATE,
+		.inv_gran	= cpu_to_le32(VIRTIO_IOMMU_INVAL_G_PASID),
+		.flags		= cpu_to_le32(VIRTIO_IOMMU_INVAL_F_PASID),
+		.inv_type	= cpu_to_le32(type),
+
+		.domain		= cpu_to_le32(vdomain->id),
+		.pasid		= cpu_to_le32(pasid),
+		.archid		= cpu_to_le64(arch_id),
+	};
+
+	if (viommu_send_req_sync(vdomain->viommu, &req, sizeof(req)))
+		pr_debug("could not send invalidate request\n");
+}
+
+static void viommu_flush_tlb_add(struct iommu_iotlb_gather *gather,
+				 unsigned long iova, size_t granule,
+				 void *cookie)
+{
+	struct viommu_mm *viommu_mm = cookie;
+	struct viommu_domain *vdomain = viommu_mm->domain;
+	struct iommu_domain *domain = &vdomain->domain;
+
+	iommu_iotlb_gather_add_page(domain, gather, iova, granule);
+}
+
+static void viommu_flush_tlb_walk(unsigned long iova, size_t size,
+				  size_t granule, void *cookie)
+{
+	struct viommu_mm *viommu_mm = cookie;
+	struct viommu_domain *vdomain = viommu_mm->domain;
+	struct virtio_iommu_req_invalidate req = {
+		.head.type	= VIRTIO_IOMMU_T_INVALIDATE,
+		.inv_gran	= cpu_to_le32(VIRTIO_IOMMU_INVAL_G_VA),
+		.inv_type	= cpu_to_le32(VIRTIO_IOMMU_INV_T_IOTLB),
+		.flags		= cpu_to_le32(VIRTIO_IOMMU_INVAL_F_ARCHID),
+
+		.domain		= cpu_to_le32(vdomain->id),
+		.pasid		= cpu_to_le32(viommu_mm->pasid),
+		.archid		= cpu_to_le64(viommu_mm->archid),
+		.virt_start	= cpu_to_le64(iova),
+		.nr_pages	= cpu_to_le64(size / granule),
+		.granule	= ilog2(granule),
+	};
+
+	if (viommu_add_req(vdomain->viommu, &req, sizeof(req)))
+		pr_debug("could not add invalidate request\n");
+}
+
+static void viommu_flush_tlb_all(void *cookie)
+{
+	struct viommu_mm *viommu_mm = cookie;
+
+	if (!viommu_mm->archid)
+		return;
+
+	__viommu_flush_pasid_tlb_all(viommu_mm->domain, viommu_mm->pasid,
+				     viommu_mm->archid,
+				     VIRTIO_IOMMU_INV_T_IOTLB);
+}
+
+static struct iommu_flush_ops viommu_flush_ops = {
+	.tlb_flush_all		= viommu_flush_tlb_all,
+	.tlb_flush_walk		= viommu_flush_tlb_walk,
+	.tlb_add_page		= viommu_flush_tlb_add,
+};
+
+static void viommu_flush_pasid(void *cookie, int pasid, bool leaf)
+{
+	struct viommu_domain *vdomain = cookie;
+	struct virtio_iommu_req_invalidate req = {
+		.head.type	= VIRTIO_IOMMU_T_INVALIDATE,
+		.inv_gran	= cpu_to_le32(VIRTIO_IOMMU_INVAL_G_PASID),
+		.inv_type	= cpu_to_le32(VIRTIO_IOMMU_INV_T_PASID),
+		.flags		= cpu_to_le32(VIRTIO_IOMMU_INVAL_F_PASID),
+
+		.domain		= cpu_to_le32(vdomain->id),
+		.pasid
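The range arithmetic in viommu_flush_tlb_walk() can be sketched in isolation as follows: the request carries the start address, the number of pages (size / granule) and the granule as a power-of-two exponent. This is a host-endian userspace sketch only; struct inval_range, ilog2_u64() and make_inval_range() are hypothetical names, and the endianness conversion done by the driver is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-endian subset of struct virtio_iommu_req_invalidate. */
struct inval_range {
	uint64_t virt_start;
	uint64_t nr_pages;
	uint8_t  granule;	/* page size as a number of bits */
};

/* Integer log2 for a non-zero value (floor). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Assumes granule is a non-zero power of two and size is a multiple of it. */
static struct inval_range make_inval_range(uint64_t iova, uint64_t size,
					   uint64_t granule)
{
	struct inval_range r = {
		.virt_start	= iova,
		.nr_pages	= size / granule,
		.granule	= (uint8_t)ilog2_u64(granule),
	};

	return r;
}
```

For example, invalidating 16 KiB at a 4 KiB granule yields nr_pages = 4 and granule = 12, while a 2 MiB granule corresponds to an exponent of 21.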
[PATCH RFC v1 10/15] iommu/virtio: Prepare to add attach pasid table infrastructure
In preparation to add attach pasid table op, separate out the existing attach request code to a separate method. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/virtio-iommu.c | 73 +--- 1 file changed, 51 insertions(+), 22 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 12d73321dbf4..ae5dfd3f8269 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -52,6 +52,8 @@ struct viommu_dev { /* Supported MAP flags */ u32 map_flags; u32 probe_size; + + boolhas_map:1; }; struct viommu_mapping { @@ -60,6 +62,11 @@ struct viommu_mapping { u32 flags; }; +struct viommu_mm { + struct io_pgtable_ops *ops; + struct viommu_domain*domain; +}; + struct viommu_domain { struct iommu_domain domain; struct viommu_dev *viommu; @@ -67,12 +74,20 @@ struct viommu_domain { unsigned intid; u32 map_flags; + /* Default address space when a table is bound */ + struct viommu_mmmm; + + /* When no table is bound, use generic mappings */ spinlock_t mappings_lock; struct rb_root_cached mappings; unsigned long nr_endpoints; }; +#define vdev_for_each_id(i, eid, vdev) \ + for (i = 0; i < vdev->dev->iommu->fwspec->num_ids &&\ + ({ eid = vdev->dev->iommu->fwspec->ids[i]; 1; }); i++) + struct viommu_endpoint { struct device *dev; struct viommu_dev *viommu; @@ -750,12 +765,40 @@ static void viommu_domain_free(struct iommu_domain *domain) kfree(vdomain); } +static int viommu_simple_attach(struct viommu_domain *vdomain, + struct viommu_endpoint *vdev) +{ + int i, eid, ret; + struct virtio_iommu_req_attach req = { + .head.type = VIRTIO_IOMMU_T_ATTACH, + .domain = cpu_to_le32(vdomain->id), + }; + + if (!vdomain->viommu->has_map) + return -ENODEV; + + vdev_for_each_id(i, eid, vdev) { + req.endpoint = cpu_to_le32(eid); + + 
ret = viommu_send_req_sync(vdomain->viommu, &req, sizeof(req)); + if (ret) + return ret; + } + + if (!vdomain->nr_endpoints) { + /* +* This endpoint is the first to be attached to the domain. +* Replay existing mappings if any (e.g. SW MSI). +*/ + ret = viommu_replay_mappings(vdomain); + } + + return ret; +} + static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev) { - int i; int ret = 0; - struct virtio_iommu_req_attach req; - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct viommu_endpoint *vdev = dev_iommu_priv_get(dev); struct viommu_domain *vdomain = to_viommu_domain(domain); @@ -790,25 +833,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev) if (vdev->vdomain) vdev->vdomain->nr_endpoints--; - req = (struct virtio_iommu_req_attach) { - .head.type = VIRTIO_IOMMU_T_ATTACH, - .domain = cpu_to_le32(vdomain->id), - }; - - for (i = 0; i < fwspec->num_ids; i++) { - req.endpoint = cpu_to_le32(fwspec->ids[i]); - - ret = viommu_send_req_sync(vdomain->viommu, &req, sizeof(req)); - if (ret) - return ret; - } - - if (!vdomain->nr_endpoints) { - /* -* This endpoint is the first to be attached to the domain. -* Replay existing mappings (e.g. SW MSI). -*/ - ret = viommu_replay_mappings(vdomain); + if (!vdomain->mm.ops) { + /* If we couldn't bind any table, use the mapping tree */ + ret = viommu_simple_attach(vdomain, vdev); if (ret) return ret; } @@ -1142,6 +1169,8 @@ static int viommu_probe(struct virtio_device *vdev) struct virtio_iommu_config, probe_size, &viommu->probe_size); + viommu->has_map = virtio_
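For readers unfamiliar with the idiom behind vdev_for_each_id(): a minimal userspace sketch follows. It assumes a GNU C compiler (statement expressions are an extension), and struct fake_fwspec, fwspec_for_each_id() and sum_ids() are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fwspec ID list; not the kernel structure. */
struct fake_fwspec {
	unsigned int num_ids;
	unsigned int ids[4];
};

/*
 * Same shape as vdev_for_each_id(): the ({ ...; 1; }) statement expression
 * (a GNU C extension) loads the current ID and always evaluates to 1, so
 * only the bounds check can end the loop. Because && short-circuits, the
 * array is never read once i reaches num_ids.
 */
#define fwspec_for_each_id(i, eid, fw)				\
	for (i = 0; i < (fw)->num_ids &&			\
	     ({ eid = (fw)->ids[i]; 1; }); i++)

/* Sum every endpoint ID, standing in for "send one ATTACH per ID". */
static unsigned int sum_ids(const struct fake_fwspec *fw)
{
	unsigned int i, eid, sum = 0;

	fwspec_for_each_id(i, eid, fw)
		sum += eid;

	return sum;
}
```

Calling sum_ids() on a list with zero IDs never executes the loop body, which is the case to keep in mind when a variable is only assigned inside such a loop.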
[PATCH RFC v1 09/15] iommu/virtio: Update table format probing header
Add info about asid_bits and additional flags to table format
probing header.

Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Michael S. Tsirkin
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 include/uapi/linux/virtio_iommu.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
index 43821e33e7af..8a0624bab4b2 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -169,7 +169,10 @@ struct virtio_iommu_probe_pasid_size {
 struct virtio_iommu_probe_table_format {
 	struct virtio_iommu_probe_property	head;
 	__le16					format;
-	__u8					reserved[2];
+	__le16					asid_bits;
+
+	__le32					flags;
+	__u8					reserved[4];
 };

 struct virtio_iommu_req_probe {
--
2.17.1
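As a quick check of the resulting layout, here is a sketch using fixed-width stand-ins for __le16/__le32. It assumes the probe property header is a 16-bit type followed by a 16-bit length and that the compiler uses natural alignment; the struct names are hypothetical mirrors of the UAPI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the property header: 16-bit type, 16-bit length. */
struct probe_property {
	uint16_t type;
	uint16_t length;
};

/* Layout after this patch, with stand-ins for the little-endian types. */
struct probe_table_format {
	struct probe_property head;	/* offset 0 */
	uint16_t format;		/* offset 4 */
	uint16_t asid_bits;		/* offset 6 */
	uint32_t flags;			/* offset 8 */
	uint8_t reserved[4];		/* offset 12 */
};

/* The new fields pack without implicit padding under natural alignment. */
static_assert(offsetof(struct probe_table_format, asid_bits) == 6,
	      "asid_bits follows format directly");
static_assert(offsetof(struct probe_table_format, flags) == 8,
	      "flags is 4-byte aligned");
static_assert(sizeof(struct probe_table_format) == 16,
	      "property grows from 8 to 16 bytes");
```

Under these assumptions the property doubles from 8 to 16 bytes, so a device and driver built against different versions of the header would disagree on the property length.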
[PATCH RFC v1 06/15] iommu/virtio: Add headers for table format probing
From: Jean-Philippe Brucker

Add required UAPI defines for probing table format for underlying
iommu hardware. The device may provide information about hardware
tables and additional capabilities for each device.
This allows guest to correctly fabricate stage-1 page tables.

Signed-off-by: Jean-Philippe Brucker
[Vivek: Use a single "struct virtio_iommu_probe_table_format" rather than
        separate structures for page table and pasid table format.
        Also update commit message.]
Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Michael S. Tsirkin
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 include/uapi/linux/virtio_iommu.h | 44 ++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
index 237e36a280cb..43821e33e7af 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -2,7 +2,7 @@
 /*
  * Virtio-iommu definition v0.12
  *
- * Copyright (C) 2019 Arm Ltd.
+ * Copyright (C) 2019-2021 Arm Ltd.
  */
 #ifndef _UAPI_LINUX_VIRTIO_IOMMU_H
 #define _UAPI_LINUX_VIRTIO_IOMMU_H
@@ -111,6 +111,12 @@ struct virtio_iommu_req_unmap {

 #define VIRTIO_IOMMU_PROBE_T_NONE		0
 #define VIRTIO_IOMMU_PROBE_T_RESV_MEM		1
+#define VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK	2
+#define VIRTIO_IOMMU_PROBE_T_INPUT_RANGE	3
+#define VIRTIO_IOMMU_PROBE_T_OUTPUT_SIZE	4
+#define VIRTIO_IOMMU_PROBE_T_PASID_SIZE		5
+#define VIRTIO_IOMMU_PROBE_T_PAGE_TABLE_FMT	6
+#define VIRTIO_IOMMU_PROBE_T_PASID_TABLE_FMT	7

 #define VIRTIO_IOMMU_PROBE_T_MASK		0xfff

@@ -130,6 +136,42 @@ struct virtio_iommu_probe_resv_mem {
 	__le64					end;
 };

+struct virtio_iommu_probe_page_size_mask {
+	struct virtio_iommu_probe_property	head;
+	__u8					reserved[4];
+	__le64					mask;
+};
+
+struct virtio_iommu_probe_input_range {
+	struct virtio_iommu_probe_property	head;
+	__u8					reserved[4];
+	__le64					start;
+	__le64					end;
+};
+
+struct virtio_iommu_probe_output_size {
+	struct virtio_iommu_probe_property	head;
+	__u8					bits;
+	__u8					reserved[3];
+};
+
+struct virtio_iommu_probe_pasid_size {
+	struct virtio_iommu_probe_property	head;
+	__u8					bits;
+	__u8					reserved[3];
+};
+
+/* Arm LPAE page table format */
+#define VIRTIO_IOMMU_FOMRAT_PGTF_ARM_LPAE	1
+/* Arm smmu-v3 type PASID table format */
+#define VIRTIO_IOMMU_FORMAT_PSTF_ARM_SMMU_V3	2
+
+struct virtio_iommu_probe_table_format {
+	struct virtio_iommu_probe_property	head;
+	__le16					format;
+	__u8					reserved[2];
+};
+
 struct virtio_iommu_req_probe {
 	struct virtio_iommu_req_head		head;
 	__le32					endpoint;
--
2.17.1
[PATCH RFC v1 05/15] iommu/arm-smmu-v3: Set sync op from consumer driver of cd-lib
The change allows different consumers of arm-smmu-v3-cd-lib to set
their respective sync op for pasid entries.

Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c        | 7 +++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
index ec37476c8d09..acaa09acecdd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
@@ -265,7 +265,6 @@ struct iommu_vendor_psdtable_ops arm_cd_table_ops = {
 	.free	 = arm_smmu_free_cd_tables,
 	.prepare = arm_smmu_prepare_cd,
 	.write	 = arm_smmu_write_ctx_desc,
-	.sync	 = arm_smmu_sync_cd,
 };

 struct iommu_pasid_table *arm_smmu_register_cd_table(struct device *dev,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2f86c6ac42b6..0c644be22b4b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1869,6 +1869,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	if (ret)
 		goto out_free_cd_tables;

+	/*
+	 * Strange to setup an op here?
+	 * cd-lib is the actual user of sync op, and therefore the platform
+	 * drivers should assign this sync/maintenance ops as per need.
+	 */
+	tbl->ops->sync = arm_smmu_sync_cd;
+
 	/*
 	 * Note that this will end up calling arm_smmu_sync_cd() before
 	 * the master has been added to the devices list for this domain.
--
2.17.1
[PATCH RFC v1 07/15] iommu/virtio: Add table format probing
From: Jean-Philippe Brucker The device may provide information about hardware tables and additional capabilities for each device. Parse the new probe fields. Signed-off-by: Jean-Philippe Brucker [Vivek: Refactor to use "struct virtio_iommu_probe_table_format" rather than separate structures for page table and pasid table format.] Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/virtio-iommu.c | 102 ++- 1 file changed, 101 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 2bfdd5734844..12d73321dbf4 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -78,6 +78,17 @@ struct viommu_endpoint { struct viommu_dev *viommu; struct viommu_domain*vdomain; struct list_headresv_regions; + + /* properties of the physical IOMMU */ + u64 pgsize_mask; + u64 input_start; + u64 input_end; + u8 output_bits; + u8 pasid_bits; + /* Preferred PASID table format */ + void*pstf; + /* Preferred page table format */ + void*pgtf; }; struct viommu_request { @@ -457,6 +468,72 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev, return 0; } +static int viommu_add_pgsize_mask(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_page_size_mask *prop, + size_t len) +{ + if (len < sizeof(*prop)) + return -EINVAL; + vdev->pgsize_mask = le64_to_cpu(prop->mask); + return 0; +} + +static int viommu_add_input_range(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_input_range *prop, + size_t len) +{ + if (len < sizeof(*prop)) + return -EINVAL; + vdev->input_start = le64_to_cpu(prop->start); + vdev->input_end = le64_to_cpu(prop->end); + return 0; +} + +static int viommu_add_output_size(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_output_size *prop, + size_t len) +{ 
+ if (len < sizeof(*prop)) + return -EINVAL; + vdev->output_bits = prop->bits; + return 0; +} + +static int viommu_add_pasid_size(struct viommu_endpoint *vdev, +struct virtio_iommu_probe_pasid_size *prop, +size_t len) +{ + if (len < sizeof(*prop)) + return -EINVAL; + vdev->pasid_bits = prop->bits; + return 0; +} + +static int viommu_add_pgtf(struct viommu_endpoint *vdev, void *pgtf, size_t len) +{ + /* Select the first page table format available */ + if (len < sizeof(struct virtio_iommu_probe_table_format) || vdev->pgtf) + return -EINVAL; + + vdev->pgtf = kmemdup(pgtf, len, GFP_KERNEL); + if (!vdev->pgtf) + return -ENOMEM; + + return 0; +} + +static int viommu_add_pstf(struct viommu_endpoint *vdev, void *pstf, size_t len) +{ + if (len < sizeof(struct virtio_iommu_probe_table_format) || vdev->pstf) + return -EINVAL; + + vdev->pstf = kmemdup(pstf, len, GFP_KERNEL); + if (!vdev->pstf) + return -ENOMEM; + + return 0; +} + static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) { int ret; @@ -493,11 +570,30 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) while (type != VIRTIO_IOMMU_PROBE_T_NONE && cur < viommu->probe_size) { + void *value = prop; len = le16_to_cpu(prop->length) + sizeof(*prop); switch (type) { case VIRTIO_IOMMU_PROBE_T_RESV_MEM: - ret = viommu_add_resv_mem(vdev, (void *)prop, len); + ret = viommu_add_resv_mem(vdev, value, len); + break; + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK: + ret = viommu_add_pgsize_mask(vdev, value, len); + break; + case VIRTIO_IOMMU_PROBE_T_INPUT_RANGE: + ret = viommu_add_input_range(vdev, value, len); + break; + case VIRTIO_IOMMU_PROBE_T_OUTPUT_SIZE: +
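The probe walk above is a type/length/value parse. A minimal userspace sketch of the same idea follows, assuming a 4-byte property header (16-bit type, 16-bit length, both little-endian) and the OUTPUT_SIZE/PASID_SIZE layouts from the header patch. The names parse_probe(), get_le16() and struct endpoint_props are hypothetical, and this sketch silently skips unknown types, which may differ from what the driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_T_NONE		0
#define PROBE_T_OUTPUT_SIZE	4
#define PROBE_T_PASID_SIZE	5
#define PROBE_T_MASK		0xfff
#define PROP_HDR_LEN		4	/* 16-bit type + 16-bit length */

struct endpoint_props {
	uint8_t output_bits;
	uint8_t pasid_bits;
};

/* Read a little-endian 16-bit value regardless of host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk a probe buffer of type/length/value properties. Returns 0 on
 * success or -1 if a property is truncated or too short for its type.
 */
static int parse_probe(const uint8_t *buf, size_t size,
		       struct endpoint_props *ep)
{
	size_t cur = 0;

	while (cur + PROP_HDR_LEN <= size) {
		uint16_t type = get_le16(buf + cur) & PROBE_T_MASK;
		/* The length field counts the value only, not the header. */
		size_t len = (size_t)get_le16(buf + cur + 2) + PROP_HDR_LEN;

		if (type == PROBE_T_NONE)
			break;
		if (len > size - cur)
			return -1;

		switch (type) {
		case PROBE_T_OUTPUT_SIZE:
		case PROBE_T_PASID_SIZE:
			/* Header, one "bits" byte, three reserved bytes. */
			if (len < PROP_HDR_LEN + 4)
				return -1;
			if (type == PROBE_T_OUTPUT_SIZE)
				ep->output_bits = buf[cur + PROP_HDR_LEN];
			else
				ep->pasid_bits = buf[cur + PROP_HDR_LEN];
			break;
		default:
			/* This sketch skips properties it does not know. */
			break;
		}
		cur += len;
	}
	return 0;
}
```

Checking the declared length against the remaining buffer before touching the value is the step that keeps a malformed device response from causing an out-of-bounds read.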
[PATCH RFC v1 03/15] iommu/arm-smmu-v3: Update drivers to work with iommu-pasid-table
Update arm-smmu-v3 context descriptor (CD) library driver to work with iommu-pasid-table APIs. These APIs are then used in arm-smmu-v3 drivers to manage CD tables. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 127 +- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 16 ++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +- drivers/iommu/iommu-pasid-table.h | 10 +- 5 files changed, 144 insertions(+), 63 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c index 97d1786a8a70..8a7187534706 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c @@ -8,17 +8,17 @@ #include #include "arm-smmu-v3.h" +#include "../../iommu-pasid-table.h" -static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, +static int arm_smmu_alloc_cd_leaf_table(struct device *dev, struct arm_smmu_l1_ctx_desc *l1_desc) { size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); - l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, + l1_desc->l2ptr = dmam_alloc_coherent(dev, size, &l1_desc->l2ptr_dma, GFP_KERNEL); if (!l1_desc->l2ptr) { - dev_warn(smmu->dev, -"failed to allocate context descriptor table\n"); + dev_warn(dev, "failed to allocate context descriptor table\n"); return -ENOMEM; } return 0; @@ -34,35 +34,39 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, WRITE_ONCE(*dst, cpu_to_le64(val)); } -static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain, +static __le64 *arm_smmu_get_cd_ptr(struct iommu_vendor_psdtable_cfg *pst_cfg, u32 ssid) { __le64 *l1ptr; unsigned int idx; + struct device *dev = pst_cfg->iommu_dev; + struct 
arm_smmu_cfg_info *cfgi = &pst_cfg->vendor.cfg; + struct arm_smmu_s1_cfg *s1cfg = cfgi->s1_cfg; + struct arm_smmu_ctx_desc_cfg *cdcfg = &s1cfg->cdcfg; struct arm_smmu_l1_ctx_desc *l1_desc; - struct arm_smmu_device *smmu = smmu_domain->smmu; - struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg; + struct iommu_pasid_table *tbl = pasid_table_cfg_to_table(pst_cfg); - if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR) + if (s1cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS; idx = ssid >> CTXDESC_SPLIT; l1_desc = &cdcfg->l1_desc[idx]; if (!l1_desc->l2ptr) { - if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) + if (arm_smmu_alloc_cd_leaf_table(dev, l1_desc)) return NULL; l1ptr = cdcfg->cdtab + idx * CTXDESC_L1_DESC_DWORDS; arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); /* An invalid L1CD can be cached */ - arm_smmu_sync_cd(smmu_domain, ssid, false); + if (iommu_psdtable_sync(tbl, tbl->cookie, ssid, false)) + return NULL; } idx = ssid & (CTXDESC_L2_ENTRIES - 1); return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS; } -int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, - struct arm_smmu_ctx_desc *cd) +static int arm_smmu_write_ctx_desc(struct iommu_vendor_psdtable_cfg *pst_cfg, + int ssid, void *cookie) { /* * This function handles the following cases: @@ -78,12 +82,15 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, u64 val; bool cd_live; __le64 *cdptr; - struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_cfg_info *cfgi = &pst_cfg->vendor.cfg; + struct arm_smmu_s1_cfg *s1cfg = cfgi->s1_cfg; + struct iommu_pasid_table *tbl = pasid_table_cfg_to_table(pst_cfg); + struct arm_smmu_ctx_desc *cd = cookie; - if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax))) + if (WARN_ON(ssid >= (1 << s1cfg->s1cdmax))) return -E2BIG; - cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid); + cdptr = arm_smmu_get_cd_ptr(pst_cfg, ssid); if (!cdptr) return -ENOMEM; @@ -111,7 
+118,8 @@ int
[PATCH RFC v1 04/15] iommu/arm-smmu-v3: Update CD base address info for user-space
Update base address information in vendor pasid table info to pass that
to user-space for stage1 table management.

Signed-off-by: Vivek Gautam
Cc: Joerg Roedel
Cc: Will Deacon
Cc: Robin Murphy
Cc: Jean-Philippe Brucker
Cc: Eric Auger
Cc: Alex Williamson
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Liu Yi L
Cc: Lorenzo Pieralisi
Cc: Shameerali Kolothum Thodi
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
index 8a7187534706..ec37476c8d09 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
@@ -55,6 +55,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct iommu_vendor_psdtable_cfg *pst_cfg,
 		if (arm_smmu_alloc_cd_leaf_table(dev, l1_desc))
 			return NULL;

+		if (s1cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
+			pst_cfg->base = l1_desc->l2ptr_dma;
+
 		l1ptr = cdcfg->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
 		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
 		/* An invalid L1CD can be cached */
@@ -211,6 +214,9 @@ static int arm_smmu_alloc_cd_tables(struct iommu_vendor_psdtable_cfg *pst_cfg)
 		goto err_free_l1;
 	}

+	if (s1cfg->s1fmt == STRTAB_STE_0_S1FMT_64K_L2)
+		pst_cfg->base = cdcfg->cdtab_dma;
+
 	return 0;

 err_free_l1:
--
2.17.1
[PATCH RFC v1 02/15] iommu: Add a simple PASID table library
Add a small API in iommu subsystem to handle PASID table allocation requests from different consumer drivers, such as a paravirtualized iommu driver. The API provides ops for allocating and freeing PASID table, writing to it and managing the table caches. This library also provides for registering a vendor API that attaches to these ops. The vendor APIs would eventually perform arch level implementations for these PASID tables. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/iommu-pasid-table.h | 134 ++ 1 file changed, 134 insertions(+) create mode 100644 drivers/iommu/iommu-pasid-table.h diff --git a/drivers/iommu/iommu-pasid-table.h b/drivers/iommu/iommu-pasid-table.h new file mode 100644 index ..bd4f57656f67 --- /dev/null +++ b/drivers/iommu/iommu-pasid-table.h @@ -0,0 +1,134 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * PASID table management for the IOMMU + * + * Copyright (C) 2021 Arm Ltd. 
+ */ +#ifndef __IOMMU_PASID_TABLE_H +#define __IOMMU_PASID_TABLE_H + +#include + +#include "arm/arm-smmu-v3/arm-smmu-v3.h" + +enum pasid_table_fmt { + PASID_TABLE_ARM_SMMU_V3, + PASID_TABLE_NUM_FMTS, +}; + +/** + * struct arm_smmu_cfg_info - arm-smmu-v3 specific configuration data + * + * @s1_cfg: arm-smmu-v3 stage1 config data + * @feat_flag: features supported by arm-smmu-v3 implementation + */ +struct arm_smmu_cfg_info { + struct arm_smmu_s1_cfg *s1_cfg; + u32 feat_flag; +}; + +/** + * struct iommu_vendor_psdtable_cfg - Configuration data for PASID tables + * + * @iommu_dev: device performing the DMA table walks + * @fmt: The PASID table format + * @base: DMA address of the allocated table, set by the vendor driver + * @cfg: arm-smmu-v3 specific config data + */ +struct iommu_vendor_psdtable_cfg { + struct device *iommu_dev; + enum pasid_table_fmtfmt; + dma_addr_t base; + union { + struct arm_smmu_cfg_infocfg; + } vendor; +}; + +struct iommu_vendor_psdtable_ops; + +/** + * struct iommu_pasid_table - describes a set of PASID tables + * + * @cookie: An opaque token provided by the IOMMU driver and passed back to any + * callback routine. 
+ * @cfg: A copy of the PASID table configuration + * @ops: The PASID table operations in use for this set of page tables + */ +struct iommu_pasid_table { + void*cookie; + struct iommu_vendor_psdtable_cfgcfg; + struct iommu_vendor_psdtable_ops*ops; +}; + +#define pasid_table_cfg_to_table(pst_cfg) \ + container_of((pst_cfg), struct iommu_pasid_table, cfg) + +struct iommu_vendor_psdtable_ops { + int (*alloc)(struct iommu_vendor_psdtable_cfg *cfg); + void (*free)(struct iommu_vendor_psdtable_cfg *cfg); + void (*prepare)(struct iommu_vendor_psdtable_cfg *cfg, + struct io_pgtable_cfg *pgtbl_cfg, u32 asid); + int (*write)(struct iommu_vendor_psdtable_cfg *cfg, int ssid, +void *cookie); + void (*sync)(void *cookie, int ssid, bool leaf); +}; + +static inline int iommu_psdtable_alloc(struct iommu_pasid_table *tbl, + struct iommu_vendor_psdtable_cfg *cfg) +{ + if (!tbl->ops->alloc) + return -ENOSYS; + + return tbl->ops->alloc(cfg); +} + +static inline void iommu_psdtable_free(struct iommu_pasid_table *tbl, + struct iommu_vendor_psdtable_cfg *cfg) +{ + if (!tbl->ops->free) + return; + + tbl->ops->free(cfg); +} + +static inline int iommu_psdtable_prepare(struct iommu_pasid_table *tbl, +struct iommu_vendor_psdtable_cfg *cfg, +struct io_pgtable_cfg *pgtbl_cfg, +u32 asid) +{ + if (!tbl->ops->prepare) + return -ENOSYS; + + tbl->ops->prepare(cfg, pgtbl_cfg, asid); + return 0; +} + +static inline int iommu_psdtable_write(struct iommu_pasid_table *tbl, + struct iommu_vendor_psdtable_cfg *cfg, + int ssid, void *cookie) +{ + if (!tbl->ops->write) + return -ENOSYS; + + return tbl->ops->write(cfg, ssid, cookie); +} + +static inline int iommu_psdtable_sync(struct iommu_pasid_table *tbl, + void *cookie, int ssid, bool leaf) +{ + if (!tbl->ops->sync) + return -ENOSYS; + + tbl->ops->sync(cookie, ssid, leaf); + return 0; +} + +/* A placeholder to register
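The wrapper pattern used by the iommu_psdtable_*() helpers (check the op, return -ENOSYS if it is missing, otherwise forward the call) can be sketched in userspace as below. struct table, struct table_ops, table_sync() and count_sync() are hypothetical simplifications for illustration, not the library's types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced ops table: only the optional sync callback. */
struct table_ops {
	void (*sync)(void *cookie, int ssid);
};

struct table {
	void *cookie;			/* opaque token handed back to callbacks */
	struct table_ops *ops;
};

/* Forward to the consumer's sync op, or report that none is installed. */
static int table_sync(struct table *tbl, int ssid)
{
	if (!tbl->ops->sync)
		return -ENOSYS;

	tbl->ops->sync(tbl->cookie, ssid);
	return 0;
}

/* Example consumer callback: count how many syncs were requested. */
static void count_sync(void *cookie, int ssid)
{
	(void)ssid;
	++*(int *)cookie;
}
```

A consumer installs its callback with something like ops.sync = count_sync before the first table write. Until then table_sync() fails with -ENOSYS, so callers that treat a failed sync as fatal depend on the op being assigned early.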
[PATCH RFC v1 01/15] iommu/arm-smmu-v3: Create a Context Descriptor library
Para-virtualized iommu drivers in guest may require to create and manage context descriptor (CD) tables as part of PASID table allocations. The PASID tables are passed to host to configure stage-1 tables in hardware. Make way for a library driver for CD management to allow para- virtualized iommu driver call such code. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/arm/arm-smmu-v3/Makefile| 2 +- .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 223 ++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 216 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 + 4 files changed, 228 insertions(+), 216 deletions(-) create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile index 54feb1ecccad..ca1a05b8b8ad 100644 --- a/drivers/iommu/arm/arm-smmu-v3/Makefile +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o -arm_smmu_v3-objs-y += arm-smmu-v3.o +arm_smmu_v3-objs-y += arm-smmu-v3.o arm-smmu-v3-cd-lib.o arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c new file mode 100644 index ..97d1786a8a70 --- /dev/null +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c @@ -0,0 +1,223 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * arm-smmu-v3 context descriptor handling library driver + * + * Copyright (C) 2021 Arm Ltd. 
+ */ + +#include + +#include "arm-smmu-v3.h" + +static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, + struct arm_smmu_l1_ctx_desc *l1_desc) +{ + size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); + + l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, +&l1_desc->l2ptr_dma, GFP_KERNEL); + if (!l1_desc->l2ptr) { + dev_warn(smmu->dev, +"failed to allocate context descriptor table\n"); + return -ENOMEM; + } + return 0; +} + +static void arm_smmu_write_cd_l1_desc(__le64 *dst, + struct arm_smmu_l1_ctx_desc *l1_desc) +{ + u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | + CTXDESC_L1_DESC_V; + + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(*dst, cpu_to_le64(val)); +} + +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain, + u32 ssid) +{ + __le64 *l1ptr; + unsigned int idx; + struct arm_smmu_l1_ctx_desc *l1_desc; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg; + + if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR) + return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS; + + idx = ssid >> CTXDESC_SPLIT; + l1_desc = &cdcfg->l1_desc[idx]; + if (!l1_desc->l2ptr) { + if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) + return NULL; + + l1ptr = cdcfg->cdtab + idx * CTXDESC_L1_DESC_DWORDS; + arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); + /* An invalid L1CD can be cached */ + arm_smmu_sync_cd(smmu_domain, ssid, false); + } + idx = ssid & (CTXDESC_L2_ENTRIES - 1); + return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS; +} + +int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, + struct arm_smmu_ctx_desc *cd) +{ + /* +* This function handles the following cases: +* +* (1) Install primary CD, for normal DMA traffic (SSID = 0). +* (2) Install a secondary CD, for SID+SSID traffic. +* (3) Update ASID of a CD. Atomically write the first 64 bits of the +* CD, then invalidate the old entry and mappings. 
+* (4) Quiesce the context without clearing the valid bit. Disable +* translation, and ignore any translation fault. +* (5) Remove a secondary CD. +*/ + u64 val; + bool cd_live; + __le64 *cdptr; + struct arm_smmu_device *smmu = smmu_domain->smmu; + + if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax))) + return -E2BIG; + + cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid); + if (!cdptr) + return -ENOMEM;
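The two-level lookup in arm_smmu_get_cd_ptr() splits the SSID into an L1 index and an L2 index. A small sketch of that arithmetic follows. The constant values (a 10-bit split, 8 dwords per CD) are assumptions chosen for illustration and should be checked against arm-smmu-v3.h; cd_locate() and leaf_table_bytes() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; see arm-smmu-v3.h for the real ones. */
#define SPLIT		10
#define L2_ENTRIES	(1u << SPLIT)
#define CD_DWORDS	8

struct cd_index {
	uint32_t l1_idx;	/* which L1 descriptor / leaf table */
	uint32_t l2_idx;	/* which CD inside that leaf table */
	uint32_t dword_off;	/* offset into the leaf, in 64-bit words */
};

/* Mirror of the index math: high bits pick the leaf, low bits the CD. */
static struct cd_index cd_locate(uint32_t ssid)
{
	struct cd_index idx;

	idx.l1_idx = ssid >> SPLIT;
	idx.l2_idx = ssid & (L2_ENTRIES - 1);
	idx.dword_off = idx.l2_idx * CD_DWORDS;
	return idx;
}

/* Size of one leaf table in bytes: entries * dwords * 8 bytes per dword. */
static uint32_t leaf_table_bytes(void)
{
	return L2_ENTRIES * (CD_DWORDS << 3);
}
```

With these assumed constants, SSID 5123 lands in leaf table 5 at CD 3, and each leaf table is 64 KiB, which would match the "64K_L2" name of the two-level format.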
Re: [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook
On Fri, Aug 23, 2019 at 12:03 PM Vivek Gautam wrote: > > Add reset hook for sdm845 based platforms to turn off > the wait-for-safe sequence. > > Understanding how wait-for-safe logic affects USB and UFS performance > on MTP845 and DB845 boards: > > Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic > to address under-performance issues in real-time clients, such as > Display, and Camera. > On receiving an invalidation requests, the SMMU forwards SAFE request > to these clients and waits for SAFE ack signal from real-time clients. > The SAFE signal from such clients is used to qualify the start of > invalidation. > This logic is controlled by chicken bits, one for each - MDP (display), > IFE0, and IFE1 (camera), that can be accessed only from secure software > on sdm845. > > This configuration, however, degrades the performance of non-real time > clients, such as USB, and UFS etc. This happens because, with wait-for-safe > logic enabled the hardware tries to throttle non-real time clients while > waiting for SAFE ack signals from real-time clients. > > On mtp845 and db845 devices, with wait-for-safe logic enabled by the > bootloaders we see degraded performance of USB and UFS when kernel > enables the smmu stage-1 translations for these clients. > Turn off this wait-for-safe logic from the kernel gets us back the perf > of USB and UFS devices until we re-visit this when we start seeing perf > issues on display/camera on upstream supported SDM845 platforms. > The bootloaders on these boards implement secure monitor callbacks to > handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM with which the > logic can be toggled. > > There are other boards such as cheza whose bootloaders don't enable this > logic. Such boards don't implement callbacks to handle the specific SCM > call so disabling this logic for such boards will be a no-op. 
> > This change is inspired by the downstream change from Patrick Daly > to address performance issues with display and camera by handling > this wait-for-safe within separte io-pagetable ops to do TLB > maintenance. So a big thanks to him for the change and for all the > offline discussions. > > Without this change the UFS reads are pretty slow: > $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync > 10+0 records in > 10+0 records out > 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s > real0m 22.39s > user0m 0.00s > sys 0m 0.01s > > With this change they are back to rock! > $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync > 300+0 records in > 300+0 records out > 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s > real0m 1.03s > user0m 0.00s > sys 0m 0.54s > > Signed-off-by: Vivek Gautam > --- > drivers/iommu/arm-smmu-impl.c | 27 ++- > 1 file changed, 26 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c > index 3f88cd078dd5..0aef87c41f9c 100644 > --- a/drivers/iommu/arm-smmu-impl.c > +++ b/drivers/iommu/arm-smmu-impl.c > @@ -6,6 +6,7 @@ > > #include > #include > +#include > > #include "arm-smmu.h" > > @@ -102,7 +103,6 @@ static struct arm_smmu_device > *cavium_smmu_impl_init(struct arm_smmu_device *smm > return &cs->smmu; > } > > - > #define ARM_MMU500_ACTLR_CPRE (1 << 1) > > #define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) > @@ -147,6 +147,28 @@ static const struct arm_smmu_impl arm_mmu500_impl = { > .reset = arm_mmu500_reset, > }; > > +static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu) > +{ > + int ret; > + > + arm_mmu500_reset(smmu); > + > + /* > +* To address performance degradation in non-real time clients, > +* such as USB and UFS, turn off wait-for-safe on sdm845 based boards, > +* such as MTP and db845, whose firmwares implement secure monitor > +* call handlers to turn on/off the wait-for-safe logic. 
> +*/ > + ret = qcom_scm_qsmmu500_wait_safe_toggle(0); > + if (ret) > + dev_warn(smmu->dev, "Failed to turn off SAFE logic\n"); > + > + return 0; > +} > + > +const struct arm_smmu_impl qcom_sdm845_smmu500_impl = { > + .reset = qcom_sdm845_smmu500_reset, > +}; > > struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) > { > @@ -170,5 +192,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct > arm_smmu_device *smmu) > "calxeda,smmu-secure-config-access")) > smmu-&g
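To put the two dd runs quoted in the commit message side by side, the throughput figures can be recomputed from the byte counts and elapsed times. This is only a sketch of the arithmetic; throughput() and speedup() are hypothetical helpers, and the numbers are copied from the dd output above.

```c
#include <assert.h>

/* Figures copied from the two dd runs quoted in the commit message. */
#define BEFORE_BYTES	10485760.0
#define BEFORE_SECS	22.394903
#define AFTER_BYTES	314572800.0
#define AFTER_SECS	1.030541

/* Throughput in bytes per second. */
static double throughput(double bytes, double seconds)
{
	return bytes / seconds;
}

/* Ratio of the "after" throughput to the "before" throughput. */
static double speedup(void)
{
	return throughput(AFTER_BYTES, AFTER_SECS) /
	       throughput(BEFORE_BYTES, BEFORE_SECS);
}
```

Dividing by 1024 and by 1024 * 1024 reproduces dd's 457.2KB/s and 291.1MB/s, a ratio of roughly 650x. The two runs moved different amounts of data (10 MB versus 300 MB), so the ratio is indicative rather than a controlled measurement.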
Re: [PATCH v2 0/3] soc: qcom: llcc cleanups
On Wed, Sep 4, 2019 at 10:13 AM Bjorn Andersson wrote: > > On Tue 27 Aug 04:01 PDT 2019, Vivek Gautam wrote: > > > On Fri, Aug 2, 2019 at 11:43 AM Vivek Gautam > > wrote: > > > > > > On Thu, Jul 18, 2019 at 6:33 PM Vivek Gautam > > > wrote: > > > > > > > > To better support future versions of llcc, consolidating the > > > > driver to llcc-qcom driver file, and taking care of the dependencies. > > > > v1 series is availale at: > > > > https://lore.kernel.org/patchwork/patch/1099573/ > > > > > > > > Changes since v1: > > > > Addressing Bjorn's comments - > > > > * Not using llcc-plat as the platform driver rather using a single > > > >driver file now - llcc-qcom. > > > > * Removed SCT_ENTRY macro. > > > > * Moved few structure definitions from include/linux path to llcc-qcom > > > >driver as they are not exposed to other subsystems. > > > > > > Hi Bjorn, > > > > > > How does this cleanup look now? Let me know if there are any > > > improvements to make here. > > > > > > > Hi Bjorn, > > > > Are you planning to pull this series in the next merge window? > > There's a dt patch as well for llcc on sdm845 [1] that has been lying > > around. > > > > Let me know if you have concerns with this series. I will be happy to > > incorporate the suggestions. > > > > No concerns, this is exactly what we discussed before. Sorry for missing > it. I've picked the patches now. > > > [1] https://lore.kernel.org/patchwork/patch/1099318/ > > > > This is part of the v5.4 pull request. Thanks a lot Bjorn. 
Best regards Vivek > > Thanks, > Bjorn > > > Thanks & Regards > > Vivek > > > > > Best Regards > > > Vivek > > > > > > > > Vivek Gautam (3): > > > > soc: qcom: llcc cleanup to get rid of sdm845 specific driver file > > > > soc: qcom: Rename llcc-slice to llcc-qcom > > > > soc: qcom: Make llcc-qcom a generic driver > > > > > > > > drivers/soc/qcom/Kconfig | 14 +-- > > > > drivers/soc/qcom/Makefile | 3 +- > > > > drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} | 155 > > > > +++-- > > > > drivers/soc/qcom/llcc-sdm845.c | 100 > > > > include/linux/soc/qcom/llcc-qcom.h | 104 - > > > > 5 files changed, 152 insertions(+), 224 deletions(-) > > > > rename drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} (64%) > > > > delete mode 100644 drivers/soc/qcom/llcc-sdm845.c > > > > > > > > > > > > -- > > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member > > of Code Aurora Forum, hosted by The Linux Foundation -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v2 0/3] soc: qcom: llcc cleanups
On Fri, Aug 2, 2019 at 11:43 AM Vivek Gautam wrote: > > On Thu, Jul 18, 2019 at 6:33 PM Vivek Gautam > wrote: > > > > To better support future versions of llcc, consolidating the > > driver to llcc-qcom driver file, and taking care of the dependencies. > > v1 series is availale at: > > https://lore.kernel.org/patchwork/patch/1099573/ > > > > Changes since v1: > > Addressing Bjorn's comments - > > * Not using llcc-plat as the platform driver rather using a single > >driver file now - llcc-qcom. > > * Removed SCT_ENTRY macro. > > * Moved few structure definitions from include/linux path to llcc-qcom > >driver as they are not exposed to other subsystems. > > Hi Bjorn, > > How does this cleanup look now? Let me know if there are any > improvements to make here. > Hi Bjorn, Are you planning to pull this series in the next merge window? There's a dt patch as well for llcc on sdm845 [1] that has been lying around. Let me know if you have concerns with this series. I will be happy to incorporate the suggestions. [1] https://lore.kernel.org/patchwork/patch/1099318/ Thanks & Regards Vivek > Best Regards > Vivek > > > > Vivek Gautam (3): > > soc: qcom: llcc cleanup to get rid of sdm845 specific driver file > > soc: qcom: Rename llcc-slice to llcc-qcom > > soc: qcom: Make llcc-qcom a generic driver > > > > drivers/soc/qcom/Kconfig | 14 +-- > > drivers/soc/qcom/Makefile | 3 +- > > drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} | 155 > > +++-- > > drivers/soc/qcom/llcc-sdm845.c | 100 > > include/linux/soc/qcom/llcc-qcom.h | 104 - > > 5 files changed, 152 insertions(+), 224 deletions(-) > > rename drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} (64%) > > delete mode 100644 drivers/soc/qcom/llcc-sdm845.c > > -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook
Add reset hook for sdm845 based platforms to turn off the wait-for-safe sequence. Understanding how wait-for-safe logic affects USB and UFS performance on MTP845 and DB845 boards: Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic to address under-performance issues in real-time clients, such as Display and Camera. On receiving an invalidation request, the SMMU forwards a SAFE request to these clients and waits for the SAFE ack signal from the real-time clients. The SAFE signal from such clients is used to qualify the start of invalidation. This logic is controlled by chicken bits, one each for MDP (display), IFE0, and IFE1 (camera), that can be accessed only from secure software on sdm845. This configuration, however, degrades the performance of non-real-time clients, such as USB and UFS. This happens because, with wait-for-safe logic enabled, the hardware tries to throttle non-real-time clients while waiting for SAFE ack signals from real-time clients. On mtp845 and db845 devices, with wait-for-safe logic enabled by the bootloaders, we see degraded performance of USB and UFS when the kernel enables the smmu stage-1 translations for these clients. Turning off this wait-for-safe logic from the kernel gets us back the performance of USB and UFS devices, until we revisit this when we start seeing performance issues on display/camera on upstream supported SDM845 platforms. The bootloaders on these boards implement secure monitor callbacks to handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM with which the logic can be toggled. There are other boards such as cheza whose bootloaders don't enable this logic. Such boards don't implement callbacks to handle the specific SCM call, so disabling this logic for such boards will be a no-op. This change is inspired by the downstream change from Patrick Daly to address performance issues with display and camera by handling this wait-for-safe within separate io-pagetable ops to do TLB maintenance. 
So a big thanks to him for the change and for all the offline discussions. Without this change the UFS reads are pretty slow: $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync 10+0 records in 10+0 records out 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s real0m 22.39s user0m 0.00s sys 0m 0.01s With this change they are back to rock! $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync 300+0 records in 300+0 records out 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s real0m 1.03s user0m 0.00s sys 0m 0.54s Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu-impl.c | 27 ++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 3f88cd078dd5..0aef87c41f9c 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -6,6 +6,7 @@ #include #include +#include #include "arm-smmu.h" @@ -102,7 +103,6 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm return &cs->smmu; } - #define ARM_MMU500_ACTLR_CPRE (1 << 1) #define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) @@ -147,6 +147,28 @@ static const struct arm_smmu_impl arm_mmu500_impl = { .reset = arm_mmu500_reset, }; +static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu) +{ + int ret; + + arm_mmu500_reset(smmu); + + /* +* To address performance degradation in non-real time clients, +* such as USB and UFS, turn off wait-for-safe on sdm845 based boards, +* such as MTP and db845, whose firmwares implement secure monitor +* call handlers to turn on/off the wait-for-safe logic. 
+*/ + ret = qcom_scm_qsmmu500_wait_safe_toggle(0); + if (ret) + dev_warn(smmu->dev, "Failed to turn off SAFE logic\n"); + + return 0; +} + +const struct arm_smmu_impl qcom_sdm845_smmu500_impl = { + .reset = qcom_sdm845_smmu500_reset, +}; struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { @@ -170,5 +192,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) "calxeda,smmu-secure-config-access")) smmu->impl = &calxeda_impl; + if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500")) + smmu->impl = &qcom_sdm845_smmu500_impl; + return smmu; } -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
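For readers who want to see the shape of the hook outside the kernel tree, here is a minimal user-space sketch of the idea in the patch above: an implementation-specific reset that first runs the generic MMU-500 reset and then tries to switch the wait-for-safe logic off, treating a missing firmware handler as harmless. All types and names below (struct smmu_dev, fake_wait_safe_toggle, the fw_has_handler flag) are hypothetical stand-ins for illustration and are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the driver types; not the kernel definitions. */
struct smmu_dev;

struct smmu_impl {
	int (*reset)(struct smmu_dev *smmu);
};

struct smmu_dev {
	const char *compatible;
	const struct smmu_impl *impl;
	int safe_logic_enabled;	/* models the bootloader-enabled wait-for-safe state */
	int fw_has_handler;	/* models whether firmware implements the SCM call */
	int base_reset_done;
};

/* Models the secure monitor call: fails when firmware has no handler. */
static int fake_wait_safe_toggle(struct smmu_dev *smmu, int enable)
{
	if (!smmu->fw_has_handler)
		return -1;
	smmu->safe_logic_enabled = enable;
	return 0;
}

static int generic_mmu500_reset(struct smmu_dev *smmu)
{
	smmu->base_reset_done = 1;
	return 0;
}

static int sdm845_reset(struct smmu_dev *smmu)
{
	generic_mmu500_reset(smmu);
	/* As in the patch, a failed toggle is not fatal: reset still returns 0. */
	(void)fake_wait_safe_toggle(smmu, 0);
	return 0;
}

static const struct smmu_impl generic_impl = { .reset = generic_mmu500_reset };
static const struct smmu_impl sdm845_impl = { .reset = sdm845_reset };

/* Pick the implementation hooks from the compatible string. */
static void smmu_impl_init(struct smmu_dev *smmu)
{
	assert(smmu && smmu->compatible);
	smmu->impl = &generic_impl;
	if (strcmp(smmu->compatible, "qcom,sdm845-smmu-500") == 0)
		smmu->impl = &sdm845_impl;
}
```

In this model an MTP-like device (handler present) ends up with the logic disabled after reset, while a cheza-like device (no handler, logic never enabled) is left untouched and the reset still succeeds.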
[PATCH v4 1/3] firmware: qcom_scm-64: Add atomic version of qcom_scm_call
There are scenarios where drivers are required to make an SCM call in atomic context, such as in one of Qcom's arm-smmu-500 errata [1]. [1] ("https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/iommu/arm-smmu.c?h=msm-4.9#n4842") Signed-off-by: Vivek Gautam Reviewed-by: Bjorn Andersson --- drivers/firmware/qcom_scm-64.c | 136 - 1 file changed, 92 insertions(+), 44 deletions(-) diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c index 91d5ad7cf58b..b6dca32c5ac4 100644 --- a/drivers/firmware/qcom_scm-64.c +++ b/drivers/firmware/qcom_scm-64.c @@ -62,32 +62,71 @@ static DEFINE_MUTEX(qcom_scm_lock); #define FIRST_EXT_ARG_IDX 3 #define N_REGISTER_ARGS (MAX_QCOM_SCM_ARGS - N_EXT_QCOM_SCM_ARGS + 1) -/** - * qcom_scm_call() - Invoke a syscall in the secure world - * @dev: device - * @svc_id:service identifier - * @cmd_id:command identifier - * @desc: Descriptor structure containing arguments and return values - * - * Sends a command to the SCM and waits for the command to finish processing. - * This should *only* be called in pre-emptible context. 
-*/ -static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id, -const struct qcom_scm_desc *desc, -struct arm_smccc_res *res) +static void __qcom_scm_call_do(const struct qcom_scm_desc *desc, + struct arm_smccc_res *res, u32 fn_id, + u64 x5, u32 type) +{ + u64 cmd; + struct arm_smccc_quirk quirk = {.id = ARM_SMCCC_QUIRK_QCOM_A6}; + + cmd = ARM_SMCCC_CALL_VAL(type, qcom_smccc_convention, +ARM_SMCCC_OWNER_SIP, fn_id); + + quirk.state.a6 = 0; + + do { + arm_smccc_smc_quirk(cmd, desc->arginfo, desc->args[0], + desc->args[1], desc->args[2], x5, + quirk.state.a6, 0, res, &quirk); + + if (res->a0 == QCOM_SCM_INTERRUPTED) + cmd = res->a0; + + } while (res->a0 == QCOM_SCM_INTERRUPTED); +} + +static void qcom_scm_call_do(const struct qcom_scm_desc *desc, +struct arm_smccc_res *res, u32 fn_id, +u64 x5, bool atomic) +{ + int retry_count = 0; + + if (!atomic) { + do { + mutex_lock(&qcom_scm_lock); + + __qcom_scm_call_do(desc, res, fn_id, x5, + ARM_SMCCC_STD_CALL); + + mutex_unlock(&qcom_scm_lock); + + if (res->a0 == QCOM_SCM_V2_EBUSY) { + if (retry_count++ > QCOM_SCM_EBUSY_MAX_RETRY) + break; + msleep(QCOM_SCM_EBUSY_WAIT_MS); + } + } while (res->a0 == QCOM_SCM_V2_EBUSY); + } else { + __qcom_scm_call_do(desc, res, fn_id, x5, ARM_SMCCC_FAST_CALL); + } +} + +static int ___qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id, + const struct qcom_scm_desc *desc, + struct arm_smccc_res *res, bool atomic) { int arglen = desc->arginfo & 0xf; - int retry_count = 0, i; + int i; u32 fn_id = QCOM_SCM_FNID(svc_id, cmd_id); - u64 cmd, x5 = desc->args[FIRST_EXT_ARG_IDX]; + u64 x5 = desc->args[FIRST_EXT_ARG_IDX]; dma_addr_t args_phys = 0; void *args_virt = NULL; size_t alloc_len; - struct arm_smccc_quirk quirk = {.id = ARM_SMCCC_QUIRK_QCOM_A6}; + gfp_t flag = atomic ? 
GFP_ATOMIC : GFP_KERNEL; if (unlikely(arglen > N_REGISTER_ARGS)) { alloc_len = N_EXT_QCOM_SCM_ARGS * sizeof(u64); - args_virt = kzalloc(PAGE_ALIGN(alloc_len), GFP_KERNEL); + args_virt = kzalloc(PAGE_ALIGN(alloc_len), flag); if (!args_virt) return -ENOMEM; @@ -117,33 +156,7 @@ static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id, x5 = args_phys; } - do { - mutex_lock(&qcom_scm_lock); - - cmd = ARM_SMCCC_CALL_VAL(ARM_SMCCC_STD_CALL, -qcom_smccc_convention, -ARM_SMCCC_OWNER_SIP, fn_id); - - quirk.state.a6 = 0; - - do { - arm_smccc_smc_quirk(cmd, desc->arginfo, desc->args[0], - desc->args[1], desc->args[2], x5, - quirk.state.a6, 0, res, &quirk); - - if (res->a0 == QCOM_SCM_INTERRUPTED) - cmd = res->a0; - -
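The split between the sleeping and the atomic path in this patch can be sketched in user space as follows. This is only a model under stated assumptions: fake_smc() stands in for the real secure monitor call, FAKE_EBUSY and FAKE_MAX_RETRY are hypothetical constants, and the places where the patch takes qcom_scm_lock and calls msleep() are marked with comments rather than implemented.

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_EBUSY	(-12)	/* hypothetical "firmware busy" return code */
#define FAKE_MAX_RETRY	20	/* hypothetical retry budget */

struct fake_fw {
	int busy_replies;	/* how many times firmware answers "busy" first */
	int calls;		/* number of calls issued so far */
};

/* Stand-in for the secure monitor call. */
static int fake_smc(struct fake_fw *fw)
{
	fw->calls++;
	if (fw->busy_replies > 0) {
		fw->busy_replies--;
		return FAKE_EBUSY;
	}
	return 0;
}

static int scm_call_do(struct fake_fw *fw, bool atomic)
{
	int retry_count = 0;
	int ret;

	assert(fw);

	/* Atomic path: one call, no lock, no sleep, no retry. */
	if (atomic)
		return fake_smc(fw);

	/* Sleeping path: retry while firmware reports busy. */
	do {
		/* the patch holds qcom_scm_lock around the call here */
		ret = fake_smc(fw);
		if (ret == FAKE_EBUSY) {
			if (retry_count++ > FAKE_MAX_RETRY)
				break;
			/* the patch sleeps with msleep() here */
		}
	} while (ret == FAKE_EBUSY);

	return ret;
}
```

The point of the sketch is the asymmetry: a caller in atomic context gets whatever the single call returns, including a busy status, so it has to be prepared to handle that itself.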
[PATCH v4 0/3] Qcom smmu-500 wait-for-safe handling for sdm845
The previous version of the patches is at [1]. Qcom's implementation of smmu-500 on sdm845 adds a hardware logic called wait-for-safe. This logic helps in meeting the invalidation requirements from 'real-time clients', such as display and camera. This wait-for-safe logic ensures that the invalidations happen after getting an ack from these devices. In this patch series we are disabling this wait-for-safe logic from the arm-smmu driver's probe, because with it enabled the hardware tries to throttle invalidations from 'non-real-time clients', such as USB and UFS. For detailed information please refer to patch [3/3] in this series. I have included the device tree patch too in this series for someone who would like to test this out. Here's a branch [2] that gets display on MTP SDM845 device. This patch series is inspired by downstream work to handle under-performance issues on real-time clients on sdm845. In downstream we add separate page table ops to handle TLB maintenance and toggle wait-for-safe in the tlb_sync call so as to achieve the required performance for display and camera [3, 4]. Changes since v3: * Based on arm-smmu implementation cleanup series [5] by Robin Murphy which is already merged in Will's tree [6]. * Implemented the sdm845 specific reset hook which does arm_smmu_device_reset() followed by making SCM call to disable the wait-for-safe logic. * Removed dependency for SCM call on any dt flag. We invariably try to disable the wait-for-safe logic on sdm845. The platforms such as mtp845 and db845 that implement handlers for this particular SCM call should be able to disable wait-for-safe logic. Other platforms such as cheza don't enable the wait-for-safe logic at all from their bootloaders, so there's no need to disable it. * No change in SCM call patches 1 & 2. Changes since v2: * Dropped the patch to add atomic io_read/write scm API. * Removed support for any separate page table ops to handle wait-for-safe. 
Currently just disabling this wait-for-safe logic from arm_smmu_device_probe() to achieve performance on USB/UFS on sdm845. * Added a device tree patch to add smmu option for fw-implemented support for SCM call to take care of SAFE toggling. Changes since v1: * Addressed Will and Robin's comments: - Dropped the patch[4] that forked out __arm_smmu_tlb_inv_range_nosync(), and __arm_smmu_tlb_sync(). - Cleaned up the errata patch further to use downstream polling mechanism for tlb sync. * No change in SCM call patches - patches 1 to 3. [1] https://lore.kernel.org/patchwork/cover/1087453/ [2] https://github.com/vivekgautam1/linux/tree/v5.2-rc4/sdm845-display-working [3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/msm-4.9&id=da765c6c75266b38191b38ef086274943f353ea7 [4] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/msm-4.9&id=8696005aaaf745de68f57793c1a534a34345c30a [5] https://patchwork.kernel.org/patch/11096265/ [6] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/ Vivek Gautam (3): firmware: qcom_scm-64: Add atomic version of qcom_scm_call firmware/qcom_scm: Add scm call to handle smmu errata iommu: arm-smmu-impl: Add sdm845 implementation hook drivers/firmware/qcom_scm-32.c | 5 ++ drivers/firmware/qcom_scm-64.c | 149 + drivers/firmware/qcom_scm.c| 6 ++ drivers/firmware/qcom_scm.h| 5 ++ drivers/iommu/arm-smmu-impl.c | 27 +++- include/linux/qcom_scm.h | 2 + 6 files changed, 149 insertions(+), 45 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v4 2/3] firmware/qcom_scm: Add scm call to handle smmu errata
Qcom's smmu-500 needs to toggle wait-for-safe sequence to handle TLB invalidation sync's. Few firmwares allow doing that through SCM interface. Add API to toggle wait for safe from firmware through a SCM call. Signed-off-by: Vivek Gautam Reviewed-by: Bjorn Andersson --- drivers/firmware/qcom_scm-32.c | 5 + drivers/firmware/qcom_scm-64.c | 13 + drivers/firmware/qcom_scm.c| 6 ++ drivers/firmware/qcom_scm.h| 5 + include/linux/qcom_scm.h | 2 ++ 5 files changed, 31 insertions(+) diff --git a/drivers/firmware/qcom_scm-32.c b/drivers/firmware/qcom_scm-32.c index 215061c581e1..bee8729525ec 100644 --- a/drivers/firmware/qcom_scm-32.c +++ b/drivers/firmware/qcom_scm-32.c @@ -614,3 +614,8 @@ int __qcom_scm_io_writel(struct device *dev, phys_addr_t addr, unsigned int val) return qcom_scm_call_atomic2(QCOM_SCM_SVC_IO, QCOM_SCM_IO_WRITE, addr, val); } + +int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, bool enable) +{ + return -ENODEV; +} diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c index b6dca32c5ac4..41c06dcfa9e1 100644 --- a/drivers/firmware/qcom_scm-64.c +++ b/drivers/firmware/qcom_scm-64.c @@ -550,3 +550,16 @@ int __qcom_scm_io_writel(struct device *dev, phys_addr_t addr, unsigned int val) return qcom_scm_call(dev, QCOM_SCM_SVC_IO, QCOM_SCM_IO_WRITE, &desc, &res); } + +int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, bool en) +{ + struct qcom_scm_desc desc = {0}; + struct arm_smccc_res res; + + desc.args[0] = QCOM_SCM_CONFIG_ERRATA1_CLIENT_ALL; + desc.args[1] = en; + desc.arginfo = QCOM_SCM_ARGS(2); + + return qcom_scm_call_atomic(dev, QCOM_SCM_SVC_SMMU_PROGRAM, + QCOM_SCM_CONFIG_ERRATA1, &desc, &res); +} diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c index 2ddc118dba1b..2b3b7a8c4270 100644 --- a/drivers/firmware/qcom_scm.c +++ b/drivers/firmware/qcom_scm.c @@ -344,6 +344,12 @@ int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare) } EXPORT_SYMBOL(qcom_scm_iommu_secure_ptbl_init); 
+int qcom_scm_qsmmu500_wait_safe_toggle(bool en) +{ + return __qcom_scm_qsmmu500_wait_safe_toggle(__scm->dev, en); +} +EXPORT_SYMBOL(qcom_scm_qsmmu500_wait_safe_toggle); + int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val) { return __qcom_scm_io_readl(__scm->dev, addr, val); diff --git a/drivers/firmware/qcom_scm.h b/drivers/firmware/qcom_scm.h index 99506bd873c0..baee744dbcfe 100644 --- a/drivers/firmware/qcom_scm.h +++ b/drivers/firmware/qcom_scm.h @@ -91,10 +91,15 @@ extern int __qcom_scm_restore_sec_cfg(struct device *dev, u32 device_id, u32 spare); #define QCOM_SCM_IOMMU_SECURE_PTBL_SIZE3 #define QCOM_SCM_IOMMU_SECURE_PTBL_INIT4 +#define QCOM_SCM_SVC_SMMU_PROGRAM 0x15 +#define QCOM_SCM_CONFIG_ERRATA10x3 +#define QCOM_SCM_CONFIG_ERRATA1_CLIENT_ALL 0x2 extern int __qcom_scm_iommu_secure_ptbl_size(struct device *dev, u32 spare, size_t *size); extern int __qcom_scm_iommu_secure_ptbl_init(struct device *dev, u64 addr, u32 size, u32 spare); +extern int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, + bool enable); #define QCOM_MEM_PROT_ASSIGN_ID0x16 extern int __qcom_scm_assign_mem(struct device *dev, phys_addr_t mem_region, size_t mem_sz, diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h index 3f12cc77fb58..aee3d8580d89 100644 --- a/include/linux/qcom_scm.h +++ b/include/linux/qcom_scm.h @@ -57,6 +57,7 @@ extern int qcom_scm_set_remote_state(u32 state, u32 id); extern int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare); extern int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size); extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare); +extern int qcom_scm_qsmmu500_wait_safe_toggle(bool en); extern int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val); extern int qcom_scm_io_writel(phys_addr_t addr, unsigned int val); #else @@ -96,6 +97,7 @@ qcom_scm_set_remote_state(u32 state,u32 id) { return -ENODEV; } static inline int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare) { return -ENODEV; } static 
inline int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size) { return -ENODEV; } static inline int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare) { return -ENODEV; } +static inline int qcom_scm_qsmmu500_wait_safe_toggle(bool en) { return -ENODEV; } static inline int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val) { return -ENODEV; } static inline int qcom_scm_io_writel(phys_addr_t addr, unsigned int val) { return -ENOD
Re: [PATCH v3 4/4] arm64: dts/sdm845: Enable FW implemented safe sequence handler on MTP
On Tue, Aug 6, 2019 at 3:56 AM Bjorn Andersson wrote: > > On Wed 12 Jun 00:15 PDT 2019, Vivek Gautam wrote: > > > Indicate on MTP SDM845 that firmware implements handler to > > TLB invalidate erratum SCM call where SAFE sequence is toggled > > to achieve optimum performance on real-time clients, such as > > display and camera. > > > > Signed-off-by: Vivek Gautam > > --- > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi > > b/arch/arm64/boot/dts/qcom/sdm845.dtsi > > index 78ec373a2b18..6a73d9744a71 100644 > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi > > @@ -2368,6 +2368,7 @@ > > compatible = "qcom,sdm845-smmu-500", "arm,mmu-500"; > > reg = <0 0x1500 0 0x8>; > > #iommu-cells = <2>; > > + qcom,smmu-500-fw-impl-safe-errata; > > Looked back at this series and started to wonder if there is a > case where this should not be set? I mean we're after all adding this to > the top 845 dtsi... My bad. This is not valid in the case of cheza. Cheza firmware doesn't implement the safe errata handling hook. On cheza we just have the liberty of accessing the secure registers through SCM calls; this is what we were doing in the earlier patch series handling this erratum. So a property like this should go into the MTP board's dts file. Thanks Vivek > > How about making it the default in the driver and opt out of the errata > once there is a need? > > Regards, > Bjorn > > > #global-interrupts = <1>; > > interrupts = , > >, > > -- > > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member > > of Code Aurora Forum, hosted by The Linux Foundation -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 1/1] arm64: dts: sdm845: Add device node for Last level cache controller
Hi Bjorn, On Wed, Jul 10, 2019 at 5:09 PM Vivek Gautam wrote: > > From: Sai Prakash Ranjan > > Last level cache (aka. system cache) controller provides control > over the last level cache present on SDM845. This cache lies after > the memory noc, right before the DDR. > > Signed-off-by: Sai Prakash Ranjan > Signed-off-by: Vivek Gautam > --- > arch/arm64/boot/dts/qcom/sdm845.dtsi | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi > b/arch/arm64/boot/dts/qcom/sdm845.dtsi > index 4babff5f19b5..314241a99290 100644 > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi > @@ -1275,6 +1275,13 @@ > }; > }; > > + cache-controller@110 { > + compatible = "qcom,sdm845-llcc"; > + reg = <0 0x110 0 0x20>, <0 0x130 0 > 0x5>; > + reg-names = "llcc_base", "llcc_broadcast_base"; > + interrupts = ; > + }; Gentle ping. Are you planning to pick this? Thanks Vivek [snip] -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH] phy: qualcomm: phy-qcom-qmp: Add of_node_put() before return
On Sun, Aug 4, 2019 at 9:54 PM Nishka Dasgupta wrote: > > Each iteration of for_each_available_child_of_node puts the previous > node, but in the case of a return from the middle of the loop, there is > no put, thus causing a memory leak. Hence add an of_node_put before the > return in two places. > Issue found with Coccinelle. > > Signed-off-by: Nishka Dasgupta > --- > drivers/phy/qualcomm/phy-qcom-qmp.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c > b/drivers/phy/qualcomm/phy-qcom-qmp.c > index 34ff6434da8f..2f0652efebf0 100644 > --- a/drivers/phy/qualcomm/phy-qcom-qmp.c > +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c > @@ -2094,6 +2094,7 @@ static int qcom_qmp_phy_probe(struct platform_device > *pdev) > dev_err(dev, "failed to create lane%d phy, %d\n", > id, ret); > pm_runtime_disable(dev); > + of_node_put(child); > return ret; > } > > @@ -2106,6 +2107,7 @@ static int qcom_qmp_phy_probe(struct platform_device > *pdev) > dev_err(qmp->dev, > "failed to register pipe clock source\n"); > pm_runtime_disable(dev); > + of_node_put(child); Nice find. Thanks for the patch. Reviewed-by: Vivek Gautam Best regards Vivek [snip] -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
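The leak described in the patch above can be reproduced with a small user-space model. This is a sketch under assumptions, not the OF API: struct node, node_get(), node_put() and for_each_child() are hypothetical stand-ins that mimic how the iterator drops the reference on the previous child and takes one on the next, so that an early return leaves the current child's reference held unless it is put explicitly.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int refcount;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Drop the reference on prev and return the next child with a reference held. */
static struct node *next_child(struct node *children, size_t count,
			       struct node *prev)
{
	size_t next = prev ? (size_t)(prev - children) + 1 : 0;

	node_put(prev);
	return next < count ? node_get(&children[next]) : NULL;
}

#define for_each_child(children, count, child) \
	for (child = next_child(children, count, NULL); child; \
	     child = next_child(children, count, child))

/*
 * Walk the children and fail at index fail_at. With put_on_error set, the
 * reference on the current child is dropped before returning, which is the
 * fix the patch applies.
 */
static int probe(struct node *children, size_t count, size_t fail_at,
		 int put_on_error)
{
	struct node *child;
	size_t i = 0;

	assert(children);

	for_each_child(children, count, child) {
		if (i++ == fail_at) {
			if (put_on_error)
				node_put(child);
			return -1;
		}
	}
	return 0;
}
```

Calling probe() with put_on_error set to 0 leaves a refcount of 1 on the child where the failure happened; with it set to 1 all counts return to 0, as they do when the loop runs to completion.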
Re: [PATCH v2 0/3] soc: qcom: llcc cleanups
On Thu, Jul 18, 2019 at 6:33 PM Vivek Gautam wrote: > > To better support future versions of llcc, consolidating the > driver to llcc-qcom driver file, and taking care of the dependencies. > v1 series is available at: > https://lore.kernel.org/patchwork/patch/1099573/ > > Changes since v1: > Addressing Bjorn's comments - > * Not using llcc-plat as the platform driver rather using a single >driver file now - llcc-qcom. > * Removed SCT_ENTRY macro. > * Moved few structure definitions from include/linux path to llcc-qcom >driver as they are not exposed to other subsystems. Hi Bjorn, How does this cleanup look now? Let me know if there are any improvements to make here. Best Regards Vivek > > Vivek Gautam (3): > soc: qcom: llcc cleanup to get rid of sdm845 specific driver file > soc: qcom: Rename llcc-slice to llcc-qcom > soc: qcom: Make llcc-qcom a generic driver > > drivers/soc/qcom/Kconfig | 14 +-- > drivers/soc/qcom/Makefile | 3 +- > drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} | 155 > +++-- > drivers/soc/qcom/llcc-sdm845.c | 100 > include/linux/soc/qcom/llcc-qcom.h | 104 - > 5 files changed, 152 insertions(+), 224 deletions(-) > rename drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} (64%) > delete mode 100644 drivers/soc/qcom/llcc-sdm845.c > > -- > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member > of Code Aurora Forum, hosted by The Linux Foundation > -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 1/1] tty: serial: qcom_geni_serial: Update the oversampling rate
For QUP IP versions 2.5 and above the oversampling rate is halved from 32 to 16. Update this rate after reading hardware version register, so that the clock divider value is correctly set to achieve required baud rate. Signed-off-by: Vivek Gautam --- drivers/tty/serial/qcom_geni_serial.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c index 35e5f9c5d5be..318f811585cc 100644 --- a/drivers/tty/serial/qcom_geni_serial.c +++ b/drivers/tty/serial/qcom_geni_serial.c @@ -920,12 +920,13 @@ static unsigned long get_clk_cfg(unsigned long clk_freq) return 0; } -static unsigned long get_clk_div_rate(unsigned int baud, unsigned int *clk_div) +static unsigned long get_clk_div_rate(unsigned int baud, + unsigned int sampling_rate, unsigned int *clk_div) { unsigned long ser_clk; unsigned long desired_clk; - desired_clk = baud * UART_OVERSAMPLING; + desired_clk = baud * sampling_rate; ser_clk = get_clk_cfg(desired_clk); if (!ser_clk) { pr_err("%s: Can't find matching DFS entry for baud %d\n", @@ -951,12 +952,20 @@ static void qcom_geni_serial_set_termios(struct uart_port *uport, u32 ser_clk_cfg; struct qcom_geni_serial_port *port = to_dev_port(uport, uport); unsigned long clk_rate; + u32 ver, sampling_rate; qcom_geni_serial_stop_rx(uport); /* baud rate */ baud = uart_get_baud_rate(uport, termios, old, 300, 400); port->baud = baud; - clk_rate = get_clk_div_rate(baud, &clk_div); + + sampling_rate = UART_OVERSAMPLING; + /* Sampling rate is halved for IP versions >= 2.5 */ + ver = geni_se_get_qup_hw_version(&port->se); + if (GENI_SE_VERSION_MAJOR(ver) >= 2 && GENI_SE_VERSION_MINOR(ver) >= 5) + sampling_rate /= 2; + + clk_rate = get_clk_div_rate(baud, sampling_rate, &clk_div); if (!clk_rate) goto out_restart_rx; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
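As a worked example of the arithmetic in this patch, the sketch below picks the oversampling rate from a (major, minor) version pair and computes the clock the divider has to be derived from. It is a standalone model, not the driver code: the helper names are hypothetical and the version is passed as two plain integers rather than decoded from the hardware register. Note one deliberate difference from the posted hunk: the patch tests major >= 2 && minor >= 5, which would not match a hypothetical version such as 3.0, so the sketch compares major and minor as an ordered pair instead. Whether such versions exist is not stated in this thread.

```c
#include <assert.h>

#define UART_OVERSAMPLING 32u

/* Oversampling is halved for versions >= 2.5, compared as (major, minor). */
static unsigned int sampling_rate_for(unsigned int major, unsigned int minor)
{
	unsigned int rate = UART_OVERSAMPLING;

	if (major > 2 || (major == 2 && minor >= 5))
		rate /= 2;
	return rate;
}

/* Clock the serial engine needs for a given baud rate and oversampling. */
static unsigned long desired_clk(unsigned int baud, unsigned int sampling_rate)
{
	assert(sampling_rate != 0);
	return (unsigned long)baud * sampling_rate;
}
```

For 115200 baud this gives 115200 * 32 = 3686400 Hz on older versions and 115200 * 16 = 1843200 Hz on version 2.5 and later, which is why using the old constant on newer hardware would program the wrong divider.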
[PATCH 2/3] soc: qcom: Rename llcc-slice to llcc-qcom
The cleaning up was done without changing the driver file name to ensure a cleaner bisect. Change the file name now to facilitate making the driver generic in subsequent patch. Signed-off-by: Vivek Gautam --- drivers/soc/qcom/Makefile | 2 +- drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} (100%) diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile index 386bf197e0e5..caf8e0beaa57 100644 --- a/drivers/soc/qcom/Makefile +++ b/drivers/soc/qcom/Makefile @@ -20,6 +20,6 @@ obj-$(CONFIG_QCOM_SMP2P) += smp2p.o obj-$(CONFIG_QCOM_SMSM)+= smsm.o obj-$(CONFIG_QCOM_WCNSS_CTRL) += wcnss_ctrl.o obj-$(CONFIG_QCOM_APR) += apr.o -obj-$(CONFIG_QCOM_LLCC) += llcc-slice.o +obj-$(CONFIG_QCOM_LLCC) += llcc-qcom.o obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o diff --git a/drivers/soc/qcom/llcc-slice.c b/drivers/soc/qcom/llcc-qcom.c similarity index 100% rename from drivers/soc/qcom/llcc-slice.c rename to drivers/soc/qcom/llcc-qcom.c -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 1/3] soc: qcom: llcc cleanup to get rid of sdm845 specific driver file
A single file should suffice the need to program the llcc for various platforms. Get rid of sdm845 specific driver file to make way for a more generic driver. Signed-off-by: Vivek Gautam --- drivers/soc/qcom/Kconfig | 14 ++ drivers/soc/qcom/Makefile | 1 - drivers/soc/qcom/llcc-sdm845.c | 100 - drivers/soc/qcom/llcc-slice.c | 60 +++--- include/linux/soc/qcom/llcc-qcom.h | 57 - 5 files changed, 77 insertions(+), 155 deletions(-) delete mode 100644 drivers/soc/qcom/llcc-sdm845.c diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig index a6d1bfb17279..b6cc5816a94b 100644 --- a/drivers/soc/qcom/Kconfig +++ b/drivers/soc/qcom/Kconfig @@ -58,17 +58,9 @@ config QCOM_LLCC depends on ARCH_QCOM || COMPILE_TEST help Qualcomm Technologies, Inc. platform specific - Last Level Cache Controller(LLCC) driver. This provides interfaces - to clients that use the LLCC. Say yes here to enable LLCC slice - driver. - -config QCOM_SDM845_LLCC - tristate "Qualcomm Technologies, Inc. SDM845 LLCC driver" - depends on QCOM_LLCC - help - Say yes here to enable the LLCC driver for SDM845. This provides - data required to configure LLCC so that clients can start using the - LLCC slices. + Last Level Cache Controller(LLCC) driver for platforms such as, + SDM845. This provides interfaces to clients that use the LLCC. + Say yes here to enable LLCC slice driver. config QCOM_MDT_LOADER tristate diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile index eeb088beb15f..386bf197e0e5 100644 --- a/drivers/soc/qcom/Makefile +++ b/drivers/soc/qcom/Makefile @@ -21,6 +21,5 @@ obj-$(CONFIG_QCOM_SMSM) += smsm.o obj-$(CONFIG_QCOM_WCNSS_CTRL) += wcnss_ctrl.o obj-$(CONFIG_QCOM_APR) += apr.o obj-$(CONFIG_QCOM_LLCC) += llcc-slice.o -obj-$(CONFIG_QCOM_SDM845_LLCC) += llcc-sdm845.o obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o diff --git a/drivers/soc/qcom/llcc-sdm845.c b/drivers/soc/qcom/llcc-sdm845.c deleted file mode 100644 index 86600d97c36d.. 
--- a/drivers/soc/qcom/llcc-sdm845.c +++ /dev/null @@ -1,100 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Copyright (c) 2017-2018, The Linux Foundation. All rights reserved. - * - */ - -#include -#include -#include -#include -#include - -/* - * SCT(System Cache Table) entry contains of the following members: - * usecase_id: Unique id for the client's use case - * slice_id: llcc slice id for each client - * max_cap: The maximum capacity of the cache slice provided in KB - * priority: Priority of the client used to select victim line for replacement - * fixed_size: Boolean indicating if the slice has a fixed capacity - * bonus_ways: Bonus ways are additional ways to be used for any slice, - * if client ends up using more than reserved cache ways. Bonus - * ways are allocated only if they are not reserved for some - * other client. - * res_ways: Reserved ways for the cache slice, the reserved ways cannot - * be used by any other client than the one its assigned to. - * cache_mode: Each slice operates as a cache, this controls the mode of the - * slice: normal or TCM(Tightly Coupled Memory) - * probe_target_ways: Determines what ways to probe for access hit. When - *configured to 1 only bonus and reserved ways are probed. - *When configured to 0 all ways in llcc are probed. - * dis_cap_alloc: Disable capacity based allocation for a client - * retain_on_pc: If this bit is set and client has maintained active vote - * then the ways assigned to this client are not flushed on power - * collapse. 
- * activate_on_init: Activate the slice immediately after the SCT is programmed - */ -#define SCT_ENTRY(uid, sid, mc, p, fs, bway, rway, cmod, ptw, dca, rp, a) \ - { \ - .usecase_id = uid, \ - .slice_id = sid,\ - .max_cap = mc, \ - .priority = p, \ - .fixed_size = fs, \ - .bonus_ways = bway, \ - .res_ways = rway, \ - .cache_mode = cmod, \ - .probe_target_ways = ptw, \ - .dis_cap_alloc = dca, \ - .retain_on_pc = rp, \ - .activate_on_init = a, \ - } - -static struct llcc_slice_config sdm845_data[] = { - SCT_ENTRY(LLCC_CPUSS,1, 2816, 1, 0, 0xffc, 0x2, 0, 0, 1, 1, 1), - SCT_ENTRY(LLCC_VIDSC0, 2, 512, 2, 1, 0x0, 0x0f0, 0, 0, 1, 1, 0), - SCT_ENTRY(LLCC_VIDSC1, 3,
[PATCH 3/3] soc: qcom: Make llcc-qcom a generic driver
This makes way for adding future llcc versions. Also pull the llcc-qcom specific definitions out of the include path, which now contains only the definitions that are to be exposed to other subsystems. Signed-off-by: Vivek Gautam --- drivers/soc/qcom/llcc-qcom.c | 137 +++-- include/linux/soc/qcom/llcc-qcom.h | 89 2 files changed, 116 insertions(+), 110 deletions(-) diff --git a/drivers/soc/qcom/llcc-qcom.c b/drivers/soc/qcom/llcc-qcom.c index 574bb5bf20bc..98563ef0ac6b 100644 --- a/drivers/soc/qcom/llcc-qcom.c +++ b/drivers/soc/qcom/llcc-qcom.c @@ -47,6 +47,100 @@ #define BANK_OFFSET_STRIDE 0x8 +/** + * llcc_slice_config - Data associated with the llcc slice + * @usecase_id: Unique id for the client's use case + * @slice_id: llcc slice id for each client + * @max_cap: The maximum capacity of the cache slice provided in KB + * @priority: Priority of the client used to select victim line for replacement + * @fixed_size: Boolean indicating if the slice has a fixed capacity + * @bonus_ways: Bonus ways are additional ways to be used for any slice, + * if client ends up using more than reserved cache ways. Bonus + * ways are allocated only if they are not reserved for some + * other client. + * @res_ways: Reserved ways for the cache slice, the reserved ways cannot + * be used by any other client than the one its assigned to. + * @cache_mode: Each slice operates as a cache, this controls the mode of the + * slice: normal or TCM(Tightly Coupled Memory) + * @probe_target_ways: Determines what ways to probe for access hit. When + *configured to 1 only bonus and reserved ways are probed. + *When configured to 0 all ways in llcc are probed. + * @dis_cap_alloc: Disable capacity based allocation for a client + * @retain_on_pc: If this bit is set and client has maintained active vote + * then the ways assigned to this client are not flushed on power + * collapse. 
+ * @activate_on_init: Activate the slice immediately after it is programmed + */ +struct llcc_slice_config { + u32 usecase_id; + u32 slice_id; + u32 max_cap; + u32 priority; + bool fixed_size; + u32 bonus_ways; + u32 res_ways; + u32 cache_mode; + u32 probe_target_ways; + bool dis_cap_alloc; + bool retain_on_pc; + bool activate_on_init; +}; + +/** + * llcc_drv_data - Data associated with the llcc driver + * @regmap: regmap associated with the llcc device + * @bcast_regmap: regmap associated with llcc broadcast offset + * @cfg: pointer to the data structure for slice configuration + * @lock: mutex associated with each slice + * @cfg_size: size of the config data table + * @max_slices: max slices as read from device tree + * @num_banks: Number of llcc banks + * @bitmap: Bit map to track the active slice ids + * @offsets: Pointer to the bank offsets array + * @ecc_irq: interrupt for llcc cache error detection and reporting + */ +struct llcc_drv_data { + struct regmap *regmap; + struct regmap *bcast_regmap; + const struct llcc_slice_config *cfg; + struct mutex lock; + u32 cfg_size; + u32 max_slices; + u32 num_banks; + unsigned long *bitmap; + u32 *offsets; + int ecc_irq; +}; + +/** + * llcc_edac_reg_data - llcc edac registers data for each error type + * @name: Name of the error + * @synd_reg: Syndrome register address + * @count_status_reg: Status register address to read the error count + * @ways_status_reg: Status register address to read the error ways + * @reg_cnt: Number of registers + * @count_mask: Mask value to get the error count + * @ways_mask: Mask value to get the error ways + * @count_shift: Shift value to get the error count + * @ways_shift: Shift value to get the error ways + */ +struct llcc_edac_reg_data { + char *name; + u64 synd_reg; + u64 count_status_reg; + u64 ways_status_reg; + u32 reg_cnt; + u32 count_mask; + u32 ways_mask; + u8 count_shift; + u8 ways_shift; +}; + +struct qcom_llcc_config { + const struct llcc_slice_config *sct_data; + int size; 
+}; + static struct llcc_slice_config sdm845_data[] = { { LLCC_CPUSS,1, 2816, 1, 0, 0xffc, 0x2, 0, 0, 1, 1, 1 }, { LLCC_VIDSC0, 2, 512, 2, 1, 0x0, 0x0f0, 0, 0, 1, 1, 0 }, @@ -68,6 +162,11 @@ static struct llcc_slice_config sdm845_data[] = { { LLCC_AUDHW,22, 1024, 1, 1, 0xffc, 0x2, 0, 0, 1, 1, 0 }, }; +static const struct qcom_llcc_config sdm845_cfg = { + .sct_data = sdm845_data, + .size = ARRAY_SIZE(sdm845_data), +}; + static struct llcc_drv_data *drv_data = (void *) -EPROBE_DEFER; static const struct regmap_config llcc_regmap_config = { @@ -347,13 +446,15 @@ static struct regmap *qcom_llcc_init_mmio(s
[PATCH v2 0/3] soc: qcom: llcc cleanups
To better support future versions of llcc, consolidate the driver into the
llcc-qcom driver file and take care of the dependencies.

v1 series is available at:
https://lore.kernel.org/patchwork/patch/1099573/

Changes since v1:
Addressed Bjorn's comments -
 * Not using llcc-plat as the platform driver; using a single driver file
   now - llcc-qcom.
 * Removed SCT_ENTRY macro.
 * Moved a few structure definitions from the include/linux path to the
   llcc-qcom driver as they are not exposed to other subsystems.

Vivek Gautam (3):
  soc: qcom: llcc cleanup to get rid of sdm845 specific driver file
  soc: qcom: Rename llcc-slice to llcc-qcom
  soc: qcom: Make llcc-qcom a generic driver

 drivers/soc/qcom/Kconfig | 14 +--
 drivers/soc/qcom/Makefile | 3 +-
 drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} | 155 +++--
 drivers/soc/qcom/llcc-sdm845.c | 100
 include/linux/soc/qcom/llcc-qcom.h | 104 -
 5 files changed, 152 insertions(+), 224 deletions(-)
 rename drivers/soc/qcom/{llcc-slice.c => llcc-qcom.c} (64%)
 delete mode 100644 drivers/soc/qcom/llcc-sdm845.c

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
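
Editorial sketch, not part of the series: dropping SCT_ENTRY leaves the slice
table with plain brace initializers, which depend on the field order of struct
llcc_slice_config. The userspace snippet below is a minimal illustration of
that point, assuming a stand-in struct (demo_slice_config) and a made-up
use-case id (DEMO_LLCC_CPUSS); it only shows that a positional entry and a
designated-initializer entry describe the same slice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical use-case id; the real LLCC_CPUSS value comes from the llcc header. */
#define DEMO_LLCC_CPUSS 1

/* Stand-in for struct llcc_slice_config, same field order as in the patch. */
struct demo_slice_config {
	uint32_t usecase_id;
	uint32_t slice_id;
	uint32_t max_cap;
	uint32_t priority;
	bool fixed_size;
	uint32_t bonus_ways;
	uint32_t res_ways;
	uint32_t cache_mode;
	uint32_t probe_target_ways;
	bool dis_cap_alloc;
	bool retain_on_pc;
	bool activate_on_init;
};

/* Positional form: compact, but silently wrong if the struct is reordered. */
static const struct demo_slice_config demo_positional = {
	DEMO_LLCC_CPUSS, 1, 2816, 1, 0, 0xffc, 0x2, 0, 0, 1, 1, 1
};

/* Designated form: what the removed macro expanded to, one name per value. */
static const struct demo_slice_config demo_designated = {
	.usecase_id = DEMO_LLCC_CPUSS,
	.slice_id = 1,
	.max_cap = 2816,
	.priority = 1,
	.fixed_size = false,
	.bonus_ways = 0xffc,
	.res_ways = 0x2,
	.cache_mode = 0,
	.probe_target_ways = 0,
	.dis_cap_alloc = true,
	.retain_on_pc = true,
	.activate_on_init = true,
};

/* Field-by-field comparison; memcmp() would also compare padding bytes. */
static bool demo_slice_equal(const struct demo_slice_config *a,
			     const struct demo_slice_config *b)
{
	assert(a != NULL && b != NULL);

	return a->usecase_id == b->usecase_id &&
	       a->slice_id == b->slice_id &&
	       a->max_cap == b->max_cap &&
	       a->priority == b->priority &&
	       a->fixed_size == b->fixed_size &&
	       a->bonus_ways == b->bonus_ways &&
	       a->res_ways == b->res_ways &&
	       a->cache_mode == b->cache_mode &&
	       a->probe_target_ways == b->probe_target_ways &&
	       a->dis_cap_alloc == b->dis_cap_alloc &&
	       a->retain_on_pc == b->retain_on_pc &&
	       a->activate_on_init == b->activate_on_init;
}
```

Both forms produce the same entry; the trade-off is table width versus
robustness against a later reordering of the struct members.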
Re: [PATCH 2/2] soc: qcom: llcc-plat: Make the driver more generic
Hi Bjorn, Thanks for the review. On Thu, Jul 11, 2019 at 9:29 PM Bjorn Andersson wrote: > > On Thu 11 Jul 04:03 PDT 2019, Vivek Gautam wrote: > > > - Remove 'sdm845' from names, and use 'plat' instead. > > - Move SCT_ENTRY macro to header file. > > - Create a new config structure to asssign to of-match-data. > > > > I interpret the intention of these two patches as that you want to add > some new platform without having to create one llcc-xyz.c per platform. That's right. The intention is to avoid creating a new platform specific file. > > If that's the case then the only user of this macro would be in plat.c, > so I don't see a reason for moving it to the header file. Alright. Better to keep it in the driver file itself. > > > Signed-off-by: Vivek Gautam > > --- > > drivers/soc/qcom/llcc-plat.c | 77 > > -- > > include/linux/soc/qcom/llcc-qcom.h | 45 ++ > > 2 files changed, 68 insertions(+), 54 deletions(-) > > > > diff --git a/drivers/soc/qcom/llcc-plat.c b/drivers/soc/qcom/llcc-plat.c > > index 86600d97c36d..31cff0f75b53 100644 > > --- a/drivers/soc/qcom/llcc-plat.c > > +++ b/drivers/soc/qcom/llcc-plat.c > > @@ -1,6 +1,6 @@ > > // SPDX-License-Identifier: GPL-2.0 > > /* > > - * Copyright (c) 2017-2018, The Linux Foundation. All rights reserved. > > + * Copyright (c) 2017-2019, The Linux Foundation. All rights reserved. > > * > > */ > > > > @@ -10,47 +10,7 @@ > > #include > > #include > > > > -/* > > - * SCT(System Cache Table) entry contains of the following members: > > Should have caught this during previous review, but this comment simply > duplicates the kerneldoc for struct llcc_slice_config. Ok, i noticed it now. Will clean it up. I can remove this comment, and update the one for struct llcc_slice_config. 
> > > - * usecase_id: Unique id for the client's use case > > - * slice_id: llcc slice id for each client > > - * max_cap: The maximum capacity of the cache slice provided in KB > > - * priority: Priority of the client used to select victim line for > > replacement > > - * fixed_size: Boolean indicating if the slice has a fixed capacity > > - * bonus_ways: Bonus ways are additional ways to be used for any slice, > > - * if client ends up using more than reserved cache ways. Bonus > > - * ways are allocated only if they are not reserved for some > > - * other client. > > - * res_ways: Reserved ways for the cache slice, the reserved ways cannot > > - * be used by any other client than the one its assigned to. > > - * cache_mode: Each slice operates as a cache, this controls the mode of > > the > > - * slice: normal or TCM(Tightly Coupled Memory) > > - * probe_target_ways: Determines what ways to probe for access hit. When > > - *configured to 1 only bonus and reserved ways are > > probed. > > - *When configured to 0 all ways in llcc are probed. > > - * dis_cap_alloc: Disable capacity based allocation for a client > > - * retain_on_pc: If this bit is set and client has maintained active vote > > - * then the ways assigned to this client are not flushed on > > power > > - * collapse. > > - * activate_on_init: Activate the slice immediately after the SCT is > > programmed > > - */ > > -#define SCT_ENTRY(uid, sid, mc, p, fs, bway, rway, cmod, ptw, dca, rp, a) \ > > This simply maps macro arguments 1:1 to struct members, there's no need > for a macro for this. Sure, will remove the macro. 
> > > - { \ > > - .usecase_id = uid, \ > > - .slice_id = sid,\ > > - .max_cap = mc, \ > > - .priority = p, \ > > - .fixed_size = fs, \ > > - .bonus_ways = bway, \ > > - .res_ways = rway, \ > > - .cache_mode = cmod, \ > > - .probe_target_ways = ptw, \ > > - .dis_cap_alloc = dca, \ > > - .retain_on_pc = rp, \ > > - .activate_on_init = a, \ > > - } > > - > > -static struct llcc_slice_config sdm845_data[] = { > > +static const struct llcc_slice_config sdm845_data[] = { > > SCT_ENTRY(LLCC_CPUSS,1, 2816, 1, 0, 0xffc, 0x2, 0,
Re: [PATCH 1/2] soc: qcom: llcc: Rename llcc-sdm845 to llcc-plat
On Thu, Jul 11, 2019 at 9:19 PM Bjorn Andersson wrote:
>
> On Thu 11 Jul 04:03 PDT 2019, Vivek Gautam wrote:
>
> > To avoid adding files for each future supported SoCs rename
> > the file to a generic name - llcc-plat, so that llcc configuration
> > tables for other SoCs can be added in the same driver.
> >
>
> We've had a generic LLCC Kconfig option and then a specific SDM845 one,
> with this change we have two different generic options and both would
> either always be enabled or disabled.
>
> So I think you should drop QCOM_SDM845_LLCC and build both llcc-slice
> and llcc-plat into the same qcom_llcc.ko instead.

Yeah, I can get rid of the separate llcc-slice module. But for readability,
would it still be better to maintain separate files?
I will drop the SDM845 config and keep only QCOM_LLCC.

Best regards
Vivek

>
> Regards,
> Bjorn
>
> > Signed-off-by: Vivek Gautam
> > ---
> > drivers/soc/qcom/Kconfig| 10 +-
> > drivers/soc/qcom/Makefile | 2 +-
> > drivers/soc/qcom/{llcc-sdm845.c => llcc-plat.c} | 0
> > 3 files changed, 6 insertions(+), 6 deletions(-)
> > rename drivers/soc/qcom/{llcc-sdm845.c => llcc-plat.c} (100%)
> >
> > diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
> > index a6d1bfb17279..8110d415b18e 100644
> > --- a/drivers/soc/qcom/Kconfig
> > +++ b/drivers/soc/qcom/Kconfig
> > @@ -62,13 +62,13 @@ config QCOM_LLCC
> > to clients that use the LLCC. Say yes here to enable LLCC slice
> > driver.
> >
> > -config QCOM_SDM845_LLCC
> > - tristate "Qualcomm Technologies, Inc. SDM845 LLCC driver"
> > +config QCOM_PLAT_LLCC
> > + tristate "Qualcomm Technologies, Inc. platform LLCC driver"
> > depends on QCOM_LLCC
> > help
> > - Say yes here to enable the LLCC driver for SDM845. This provides
> > - data required to configure LLCC so that clients can start using the
> > - LLCC slices.
> > + Say yes here to enable the LLCC driver for Qcom platforms, such as
> > + SDM845. This provides data required to configure LLCC so that
> > + clients can start using the LLCC slices.
> >
> > config QCOM_MDT_LOADER
> > tristate
> > diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
> > index eeb088beb15f..3bf26667d7ee 100644
> > --- a/drivers/soc/qcom/Makefile
> > +++ b/drivers/soc/qcom/Makefile
> > @@ -21,6 +21,6 @@ obj-$(CONFIG_QCOM_SMSM) += smsm.o
> > obj-$(CONFIG_QCOM_WCNSS_CTRL) += wcnss_ctrl.o
> > obj-$(CONFIG_QCOM_APR) += apr.o
> > obj-$(CONFIG_QCOM_LLCC) += llcc-slice.o
> > -obj-$(CONFIG_QCOM_SDM845_LLCC) += llcc-sdm845.o
> > +obj-$(CONFIG_QCOM_PLAT_LLCC) += llcc-plat.o
> > obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o
> > obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o
> > diff --git a/drivers/soc/qcom/llcc-sdm845.c b/drivers/soc/qcom/llcc-plat.c
> > similarity index 100%
> > rename from drivers/soc/qcom/llcc-sdm845.c
> > rename to drivers/soc/qcom/llcc-plat.c
> > --
> > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> > of Code Aurora Forum, hosted by The Linux Foundation
>

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 2/2] soc: qcom: llcc-plat: Make the driver more generic
- Remove 'sdm845' from names, and use 'plat' instead. - Move SCT_ENTRY macro to header file. - Create a new config structure to asssign to of-match-data. Signed-off-by: Vivek Gautam --- drivers/soc/qcom/llcc-plat.c | 77 -- include/linux/soc/qcom/llcc-qcom.h | 45 ++ 2 files changed, 68 insertions(+), 54 deletions(-) diff --git a/drivers/soc/qcom/llcc-plat.c b/drivers/soc/qcom/llcc-plat.c index 86600d97c36d..31cff0f75b53 100644 --- a/drivers/soc/qcom/llcc-plat.c +++ b/drivers/soc/qcom/llcc-plat.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Copyright (c) 2017-2018, The Linux Foundation. All rights reserved. + * Copyright (c) 2017-2019, The Linux Foundation. All rights reserved. * */ @@ -10,47 +10,7 @@ #include #include -/* - * SCT(System Cache Table) entry contains of the following members: - * usecase_id: Unique id for the client's use case - * slice_id: llcc slice id for each client - * max_cap: The maximum capacity of the cache slice provided in KB - * priority: Priority of the client used to select victim line for replacement - * fixed_size: Boolean indicating if the slice has a fixed capacity - * bonus_ways: Bonus ways are additional ways to be used for any slice, - * if client ends up using more than reserved cache ways. Bonus - * ways are allocated only if they are not reserved for some - * other client. - * res_ways: Reserved ways for the cache slice, the reserved ways cannot - * be used by any other client than the one its assigned to. - * cache_mode: Each slice operates as a cache, this controls the mode of the - * slice: normal or TCM(Tightly Coupled Memory) - * probe_target_ways: Determines what ways to probe for access hit. When - *configured to 1 only bonus and reserved ways are probed. - *When configured to 0 all ways in llcc are probed. 
- * dis_cap_alloc: Disable capacity based allocation for a client - * retain_on_pc: If this bit is set and client has maintained active vote - * then the ways assigned to this client are not flushed on power - * collapse. - * activate_on_init: Activate the slice immediately after the SCT is programmed - */ -#define SCT_ENTRY(uid, sid, mc, p, fs, bway, rway, cmod, ptw, dca, rp, a) \ - { \ - .usecase_id = uid, \ - .slice_id = sid,\ - .max_cap = mc, \ - .priority = p, \ - .fixed_size = fs, \ - .bonus_ways = bway, \ - .res_ways = rway, \ - .cache_mode = cmod, \ - .probe_target_ways = ptw, \ - .dis_cap_alloc = dca, \ - .retain_on_pc = rp, \ - .activate_on_init = a, \ - } - -static struct llcc_slice_config sdm845_data[] = { +static const struct llcc_slice_config sdm845_data[] = { SCT_ENTRY(LLCC_CPUSS,1, 2816, 1, 0, 0xffc, 0x2, 0, 0, 1, 1, 1), SCT_ENTRY(LLCC_VIDSC0, 2, 512, 2, 1, 0x0, 0x0f0, 0, 0, 1, 1, 0), SCT_ENTRY(LLCC_VIDSC1, 3, 512, 2, 1, 0x0, 0x0f0, 0, 0, 1, 1, 0), @@ -71,30 +31,39 @@ static struct llcc_slice_config sdm845_data[] = { SCT_ENTRY(LLCC_AUDHW,22, 1024, 1, 1, 0xffc, 0x2, 0, 0, 1, 1, 0), }; -static int sdm845_qcom_llcc_remove(struct platform_device *pdev) +static const struct qcom_llcc_config sdm845_cfg = { + .sct_data = sdm845_data, + .size = ARRAY_SIZE(sdm845_data), +}; + +static int qcom_plat_llcc_remove(struct platform_device *pdev) { return qcom_llcc_remove(pdev); } -static int sdm845_qcom_llcc_probe(struct platform_device *pdev) +static int qcom_plat_llcc_probe(struct platform_device *pdev) { - return qcom_llcc_probe(pdev, sdm845_data, ARRAY_SIZE(sdm845_data)); + const struct qcom_llcc_config *cfg; + + cfg = of_device_get_match_data(&pdev->dev); + + return qcom_llcc_probe(pdev, cfg->sct_data, cfg->size); } -static const struct of_device_id sdm845_qcom_llcc_of_match[] = { - { .compatible = "qcom,sdm845-llcc", }, +static const struct of_device_id qcom_plat_llcc_of_match[] = { + { .compatible = "qcom,sdm845-llcc", .data = &sdm845_cfg }, { } }; -static 
struct platform_driver sdm845_qcom_llcc_driver = { +static struct platform_driver qcom_plat_llcc_driver = { .driver = { - .name = "sdm845-llcc", - .of_match_table = sdm845_qcom_llcc_of_match, + .name = "qcom-plat-llcc", + .of_match_table = qcom_plat_llcc_of_match, }, - .probe = sdm845_qcom_llcc_probe
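
Editorial sketch, not part of the patch: the change above hangs a per-SoC
config structure off the of_device_id entry and fetches it in probe with
of_device_get_match_data(). The userspace snippet below models that lookup
under stated assumptions: every demo_* name is hypothetical, the table is
matched with strcmp() instead of the OF core, and the NULL check in
demo_probe() is this sketch's own addition (the probe in the patch
dereferences cfg directly).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Two-field stand-in for a slice table entry. */
struct demo_slice {
	unsigned int usecase_id;
	unsigned int max_cap;
};

/* Mirrors struct qcom_llcc_config: a table pointer plus its length. */
struct demo_llcc_config {
	const struct demo_slice *sct_data;
	size_t size;
};

/* Mirrors the two of_device_id fields the patch uses. */
struct demo_of_id {
	const char *compatible;
	const void *data;
};

static const struct demo_slice demo_sdm845_data[] = {
	{ 1, 2816 },
	{ 2, 512 },
};

static const struct demo_llcc_config demo_sdm845_cfg = {
	.sct_data = demo_sdm845_data,
	.size = DEMO_ARRAY_SIZE(demo_sdm845_data),
};

/* Sentinel-terminated match table; a new SoC only adds a row here. */
static const struct demo_of_id demo_of_match[] = {
	{ .compatible = "qcom,sdm845-llcc", .data = &demo_sdm845_cfg },
	{ NULL, NULL },
};

/* Return the data pointer of the first matching entry, or NULL. */
static const void *demo_get_match_data(const char *compatible)
{
	const struct demo_of_id *id;

	assert(compatible != NULL);

	for (id = demo_of_match; id->compatible; id++)
		if (strcmp(id->compatible, compatible) == 0)
			return id->data;

	return NULL;
}

/* Generic probe: no SoC name appears here, only the matched config. */
static int demo_probe(const char *compatible, size_t *nr_slices)
{
	const struct demo_llcc_config *cfg = demo_get_match_data(compatible);

	if (!cfg)
		return -ENODEV;

	*nr_slices = cfg->size;
	return 0;
}
```

With this shape, supporting another SoC means adding one slice table, one
config structure and one match-table row, while the probe path stays
unchanged.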
[PATCH 1/2] soc: qcom: llcc: Rename llcc-sdm845 to llcc-plat
To avoid adding files for each future supported SoCs rename the file to a generic name - llcc-plat, so that llcc configuration tables for other SoCs can be added in the same driver. Signed-off-by: Vivek Gautam --- drivers/soc/qcom/Kconfig| 10 +- drivers/soc/qcom/Makefile | 2 +- drivers/soc/qcom/{llcc-sdm845.c => llcc-plat.c} | 0 3 files changed, 6 insertions(+), 6 deletions(-) rename drivers/soc/qcom/{llcc-sdm845.c => llcc-plat.c} (100%) diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig index a6d1bfb17279..8110d415b18e 100644 --- a/drivers/soc/qcom/Kconfig +++ b/drivers/soc/qcom/Kconfig @@ -62,13 +62,13 @@ config QCOM_LLCC to clients that use the LLCC. Say yes here to enable LLCC slice driver. -config QCOM_SDM845_LLCC - tristate "Qualcomm Technologies, Inc. SDM845 LLCC driver" +config QCOM_PLAT_LLCC + tristate "Qualcomm Technologies, Inc. platform LLCC driver" depends on QCOM_LLCC help - Say yes here to enable the LLCC driver for SDM845. This provides - data required to configure LLCC so that clients can start using the - LLCC slices. + Say yes here to enable the LLCC driver for Qcom platforms, such as + SDM845. This provides data required to configure LLCC so that + clients can start using the LLCC slices. 
config QCOM_MDT_LOADER tristate diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile index eeb088beb15f..3bf26667d7ee 100644 --- a/drivers/soc/qcom/Makefile +++ b/drivers/soc/qcom/Makefile @@ -21,6 +21,6 @@ obj-$(CONFIG_QCOM_SMSM) += smsm.o obj-$(CONFIG_QCOM_WCNSS_CTRL) += wcnss_ctrl.o obj-$(CONFIG_QCOM_APR) += apr.o obj-$(CONFIG_QCOM_LLCC) += llcc-slice.o -obj-$(CONFIG_QCOM_SDM845_LLCC) += llcc-sdm845.o +obj-$(CONFIG_QCOM_PLAT_LLCC) += llcc-plat.o obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o diff --git a/drivers/soc/qcom/llcc-sdm845.c b/drivers/soc/qcom/llcc-plat.c similarity index 100% rename from drivers/soc/qcom/llcc-sdm845.c rename to drivers/soc/qcom/llcc-plat.c -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 1/1] arm64: dts: sdm845: Add device node for Last level cache controller
From: Sai Prakash Ranjan

Last level cache (aka. system cache) controller provides control
over the last level cache present on SDM845. This cache lies after
the memory noc, right before the DDR.

Signed-off-by: Sai Prakash Ranjan
Signed-off-by: Vivek Gautam
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 4babff5f19b5..314241a99290 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -1275,6 +1275,13 @@
 };
 };

+ cache-controller@110 {
+ compatible = "qcom,sdm845-llcc";
+ reg = <0 0x110 0 0x20>, <0 0x130 0 0x5>;
+ reg-names = "llcc_base", "llcc_broadcast_base";
+ interrupts = ;
+ };
+
 ufs_mem_hc: ufshc@1d84000 {
 compatible = "qcom,sdm845-ufshc", "qcom,ufshc",
 "jedec,ufs-2.0";
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v3 3/4] iommu/arm-smmu: Add support to handle Qcom's wait-for-safe logic
On Wed, Jun 26, 2019 at 8:18 PM Will Deacon wrote: > > On Wed, Jun 26, 2019 at 12:03:02PM +0530, Vivek Gautam wrote: > > On Tue, Jun 25, 2019 at 7:09 PM Will Deacon wrote: > > > > > > On Tue, Jun 25, 2019 at 12:34:56PM +0530, Vivek Gautam wrote: > > > > On Mon, Jun 24, 2019 at 10:33 PM Will Deacon wrote: > > > > > Instead, I think this needs to be part of a separate file that is > > > > > maintained > > > > > by you, which follows on from the work that Krishna is doing for > > > > > nvidia > > > > > built on top of Robin's prototype patches: > > > > > > > > > > http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/smmu-impl > > > > > > > > Looking at this branch quickly, it seem there can be separate > > > > implementation > > > > level configuration file that can be added. > > > > But will this also handle separate page table ops when required in > > > > future. > > > > > > Nothing's set in stone, but having the implementation-specific code > > > constrain the page-table format (especially wrt quirks) sounds reasonable > > > to > > > me. I'm currently waiting for Krishna to respin the nvidia changes [1] on > > > top of this so that we can see how well the abstractions are holding up. > > > > Sure. Would you want me to try Robin's branch and take out the qualcomm > > related stuff to its own implementation? Or, would you like me to respin > > this > > series so that you can take it in to enable SDM845 boards such as, MTP > > and dragonboard to have a sane build - debian, etc. so people benefit > > out of it. > > I can't take this series without Acks on the firmware calling changes, and I > plan to send my 5.3 patches to Joerg at the end of the week so they get some > time in -next. In which case, I think it may be worth you having a play with > the branch above so we can get a better idea of any additional smmu_impl hooks > you may need. Cool. I will play around with it and get something tangible and meaningful. 
> > > Qualcomm stuff is lying in qcom-smmu and arm-smmu and may take some > > time to stub out the implementation related details. > > Not sure I follow you here. Are you talking about qcom_iommu.c? That's right. The qcom_iommu.c solved a different issue of secure context bank allocations, when Rob forked out this driver and reused some of the arm-smmu.c stuff. We will take a look at that once we start adding the qcom implementation. Thanks Vivek > > Will > ___ > iommu mailing list > io...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v2] arm64: dts: qcom: msm8996: Rename smmu nodes
On Wed, Jun 19, 2019 at 5:46 AM Bjorn Andersson wrote:
>
> Node names shouldn't include a vendor prefix and should whenever
> possible use a generic identifier. Resolve this by renaming the smmu
> nodes "iommu".

The bindings too say so :)

Reviewed-by: Vivek Gautam

>
> Signed-off-by: Bjorn Andersson
> ---
>
> Changes since v1:
> - Updated commit message to talk about vendor prefix rather than qcom,
>
> arch/arm64/boot/dts/qcom/msm8996.dtsi | 8
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
> index 2ecd9d775d61..c934e00434c7 100644
> --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
> +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
> @@ -1163,7 +1163,7 @@
> };
> };
>
> - vfe_smmu: arm,smmu@da {
> + vfe_smmu: iommu@da {
> compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
> reg = <0xda 0x1>;
>
> @@ -1314,7 +1314,7 @@
> };
> };
>
> - adreno_smmu: arm,smmu@b4 {
> + adreno_smmu: iommu@b4 {
> compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
> reg = <0xb4 0x1>;
>
> @@ -1331,7 +1331,7 @@
> power-domains = <&mmcc GPU_GDSC>;
> };
>
> - mdp_smmu: arm,smmu@d0 {
> + mdp_smmu: iommu@d0 {
> compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
> reg = <0xd0 0x1>;
>
> @@ -1347,7 +1347,7 @@
> power-domains = <&mmcc MDSS_GDSC>;
> };
>
> - lpass_q6_smmu: arm,smmu-lpass_q6@160 {
> + lpass_q6_smmu: iommu@160 {
> compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
> reg = <0x160 0x2>;
> #iommu-cells = <1>;
> --
> 2.18.0
>

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v1] phy: qcom-qmp: Raise qcom_qmp_phy_enable() polling delay
Hi Marc,

On 6/13/2019 5:02 PM, Marc Gonzalez wrote:

readl_poll_timeout() calls usleep_range() to sleep between reads.
usleep_range() doesn't work efficiently for tiny values.

Raise the polling delay in qcom_qmp_phy_enable() to bring it in line
with the delay in qcom_qmp_phy_com_init().

Signed-off-by: Marc Gonzalez
---
Vivek, do you remember why you didn't use the same delay value in
qcom_qmp_phy_enable() and qcom_qmp_phy_com_init()?

The phy_qcom_init() thingy came from the PCIe phy driver in downstream
msm-3.18. PCIe did something like this:

-
do {
    if (pcie_phy_is_ready(dev))
        break;
    retries++;
    usleep_range(REFCLK_STABILIZATION_DELAY_US_MIN,
                 REFCLK_STABILIZATION_DELAY_US_MAX);
} while (retries < PHY_READY_TIMEOUT_COUNT);

REFCLK_STABILIZATION_DELAY_US_MIN/MAX ==> 1000/1005
PHY_READY_TIMEOUT_COUNT ==> 10
-

phy_enable() came from the USB phy driver in downstream:

/* Wait for PHY initialization to be done */
do {
    if (readl_relaxed(phy->base + phy->phy_reg[USB3_PHY_PCS_STATUS]) & PHYSTATUS)
        usleep_range(1, 2);
    else
        break;
} while (--init_timeout_usec);

init_timeout_usec ==> 1000
-

USB never had a COM_PHY status bit.
So clearly the resolutions were different.

Does this change solve an issue at hand?

---
 drivers/phy/qualcomm/phy-qcom-qmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
index bb522b915fa9..34ff6434da8f 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
@@ -1548,7 +1548,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
 status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
 mask = cfg->mask_pcs_ready;

- ret = readl_poll_timeout(status, val, val & mask, 1,
+ ret = readl_poll_timeout(status, val, val & mask, 10,
 PHY_INIT_COMPLETE_TIMEOUT);
 if (ret) {
 dev_err(qmp->dev, "phy initialization timed-out\n");
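
Editorial sketch, not part of the thread: the effect of the polling delay
can be modelled in userspace with simulated time. Everything named sim_* or
SIM_* below is hypothetical; the loop only imitates the general shape of a
poll-with-timeout helper (read, test the mask, check the deadline, sleep)
and is not the kernel's readl_poll_timeout() macro. The simulated clock
advances by exactly sleep_us per iteration, which a real usleep_range() does
not guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SIM_PCS_READY 0x1u

/* Simulated PHY: the ready bit appears once now_us reaches ready_at_us. */
struct sim_phy {
	unsigned int now_us;
	unsigned int ready_at_us;
	unsigned int reads;
};

/* Stand-in for a register read; also counts how often we polled. */
static uint32_t sim_read_status(struct sim_phy *phy)
{
	phy->reads++;
	return phy->now_us >= phy->ready_at_us ? SIM_PCS_READY : 0;
}

/* Poll until the ready bit is set or timeout_us of simulated time passes. */
static int sim_poll_ready(struct sim_phy *phy, unsigned int sleep_us,
			  unsigned int timeout_us)
{
	unsigned int deadline = phy->now_us + timeout_us;

	/* A zero delay would never advance the simulated clock. */
	assert(sleep_us > 0);

	for (;;) {
		if (sim_read_status(phy) & SIM_PCS_READY)
			return 0;
		if (phy->now_us >= deadline)
			return -ETIMEDOUT;
		/* Stands in for the sleep between reads. */
		phy->now_us += sleep_us;
	}
}
```

In this model a PHY that becomes ready after 100 us costs 101 reads with a
1 us delay and 11 reads with a 10 us delay; the price of the longer delay is
that readiness may be noticed up to one delay period late.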
Re: [PATCH] arm64: dts: sdm845: Add iommus property to qup1
On 6/11/2019 4:51 AM, Stephen Boyd wrote:
Quoting Vivek Gautam (2019-06-06 04:17:16)
Hi Stephen,
On Thu, Jun 6, 2019 at 2:27 AM Stephen Boyd wrote:
Quoting Vivek Gautam (2019-06-04 21:55:26)

Cheza will throw faults for anything that is programmed with TZ on mtp
as all of that should be handled in HLOS. The firmwares of all these
peripherals assume that the SID reservation is done (whether in TZ or HLOS).
I am inclined to moving the iommus property for all 'TZ' to board dts files.
MTP wouldn't need those SIDs. So, the SOC level dtsi will have just the
HLOS SIDs.

So you're saying you'd like to have the SID be <&apps_smmu 0x6c3 0x0> in
the sdm845.dtsi file and then override this on Cheza because our SID is
different (possibly because we don't use GSI)? Why can't we program the
SID in Cheza firmware to match the "HLOS" SID of 0x6c3?

Sorry my bad, I missed the overriding part.
Maybe we add the lists of SIDs in board dts only. So, cheza dts will
have all these SIDs -
<&apps_smmu 0x6c0 0x3> // for both 0x6c0 (TZ) and 0x6c3 (HLOS)
<&apps_smmu 0x6d6 0x0> // if we want to use the GSI dma.
and MTP will have
<&apps_smmu 0x6c3 0x0>
<&apps_smmu 0x6d6 0x0>
WDYT?

I'd prefer to fix the firmware so that the HLOS SID is used even on this
board. Making Cheza use something different from MTP doesn't sound so
good. Do you know how that works? Is there some configuration register
or something that I should be looking for to see why the SID is not the
HLOS one? It's definitely generating SIDs for the TZ SID (0x6c0), but
I'd like to make sure that we can't change it because it's tied to some
hardware signal like the NS bit and/or the Execution Level. Hopefully
it's a config and then our difference from MTP can be minimized.

I don't think SMMU limits any such programming of SIDs. It's a design
decision to program few SIDs in TZ/Hyp and allocate the corresponding
context banks and create respective mappings.
You should be able to see these SMR configurations before kernel boots up.
I use a simple T32 command -
smmu.add ""
smmu.streammaptable
e.g. for sdm845 apps_smmu
smmu.add "apps" MMU500 a:0x1500
smmu.StreamMapTable apps
This dumps all the information regarding the smmu.

As far as I can tell, HLOS on SDM845 has always used GPI (yet another
DMA engine) to do the DMA transfers. And the GPI is the hardware block
that uses the SID of 0x6d6 above, so putting that into iommus for the
geniqup doesn't make any sense given that GPI is another node. Can you
confirm this is the case? Furthermore, the SID of 0x6c3 sounds untested?
Has it ever been generated on SDM845 MTP?

I will get back with this information.

BRs
Vivek

If we ever support GPI, I'd expect to see something like this in DT:

gpi_dma: gpi@a0 {
 reg = <0x00a0 0x6>;
 iommus = <&apps_smmu 0x6d6 0x0>;
 ...
};

geniqup@ac {
 reg = <0x00ac 0x6000>;
 iommus = <&apps_smmu 0x6c3 0x0>;

 i2c@{
 dmas = <&gpi_dma >;
};

But now I'm worried that the geniqup needs the proper geniqup wrapper
clks to talk to it. Most likely the GPI is embedded inside the geniqup
wrapper and sits right next to the bus to do bus DMA mastering. From the
DT side, it means we should either put it inside the geniqup node, or we
should add the wrapper clks to the GPI node and hope things work out
with regards to clks and shared resources being used at the right time.

If we're left with trying to figure out how to express the different
SIDs depending on the CPU execution state then it may be easier to push
for GPI upstreaming and use that dma engine to "fold" the SID
numberspace into one SID for the GPI. This would avoid having to deal
with the HLOS vs. TZ SID problem by adding a whole other driver. Or we
could just rip out the non-GPI DMA support in this driver because the
SID is all confused.
[PATCH v3 0/4] Qcom smmu-500 wait-for-safe handling for sdm845
Subject changed; the older subject was "Qcom smmu-500 TLB invalidation
errata for sdm845". The previous version of the patches is at [1].

Qcom's implementation of smmu-500 on sdm845 adds hardware logic called
wait-for-safe. This logic helps in meeting the invalidation requirements
of 'real-time clients', such as display and camera. It ensures that
invalidations happen only after getting an ack from these devices.

In this patch series we disable the wait-for-safe logic from the arm-smmu
driver's probe, because with it enabled the hardware tries to throttle
invalidations from 'non-real-time clients', such as USB and UFS.

For detailed information please refer to patch [3/4] in this series.
I have included the device tree patch too in this series for anyone who
would like to test this. Here's a branch [2] that gets display on MTP
SDM845 device.

This patch series is inspired by downstream work to handle
under-performance issues on real-time clients on sdm845. In downstream we
add separate page table ops to handle TLB maintenance and toggle
wait-for-safe in the tlb_sync call so as to achieve the required
performance for display and camera [3, 4].

Changes since v2:
 * Dropped the patch to add atomic io_read/write scm API.
 * Removed support for any separate page table ops to handle wait-for-safe.
   Currently just disabling this wait-for-safe logic from
   arm_smmu_device_probe() to achieve performance on USB/UFS on sdm845.
 * Added a device tree patch to add smmu option for fw-implemented support
   for SCM call to take care of SAFE toggling.

Changes since v1:
 * Addressed Will and Robin's comments:
   - Dropped the patch[4] that forked out __arm_smmu_tlb_inv_range_nosync(),
     and __arm_smmu_tlb_sync().
   - Cleaned up the errata patch further to use downstream polling mechanism
     for tlb sync.
 * No change in SCM call patches - patches 1 to 3.
[1] https://lore.kernel.org/patchwork/cover/983913/ [2] https://github.com/vivekgautam1/linux/tree/v5.2-rc4/sdm845-display-working [3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/msm-4.9&id=da765c6c75266b38191b38ef086274943f353ea7 [4] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/msm-4.9&id=8696005aaaf745de68f57793c1a534a34345c30a Vivek Gautam (4): firmware: qcom_scm-64: Add atomic version of qcom_scm_call firmware/qcom_scm: Add scm call to handle smmu errata iommu/arm-smmu: Add support to handle Qcom's wait-for-safe logic arm64: dts/sdm845: Enable FW implemented safe sequence handler on MTP arch/arm64/boot/dts/qcom/sdm845.dtsi | 1 + drivers/firmware/qcom_scm-32.c | 5 ++ drivers/firmware/qcom_scm-64.c | 149 --- drivers/firmware/qcom_scm.c | 6 ++ drivers/firmware/qcom_scm.h | 5 ++ drivers/iommu/arm-smmu.c | 16 include/linux/qcom_scm.h | 2 + 7 files changed, 140 insertions(+), 44 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v3 2/4] firmware/qcom_scm: Add scm call to handle smmu errata
Qcom's smmu-500 needs to toggle wait-for-safe logic to
handle TLB invalidations.
Few firmwares allow doing that through SCM interface.
Add API to toggle wait for safe from firmware through a
SCM call.

Signed-off-by: Vivek Gautam
Reviewed-by: Bjorn Andersson
---
 drivers/firmware/qcom_scm-32.c |  5 +
 drivers/firmware/qcom_scm-64.c | 13 +
 drivers/firmware/qcom_scm.c    |  6 ++
 drivers/firmware/qcom_scm.h    |  5 +
 include/linux/qcom_scm.h       |  2 ++
 5 files changed, 31 insertions(+)

diff --git a/drivers/firmware/qcom_scm-32.c b/drivers/firmware/qcom_scm-32.c
index 215061c581e1..bee8729525ec 100644
--- a/drivers/firmware/qcom_scm-32.c
+++ b/drivers/firmware/qcom_scm-32.c
@@ -614,3 +614,8 @@ int __qcom_scm_io_writel(struct device *dev, phys_addr_t addr, unsigned int val)
 	return qcom_scm_call_atomic2(QCOM_SCM_SVC_IO, QCOM_SCM_IO_WRITE,
 				     addr, val);
 }
+
+int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, bool enable)
+{
+	return -ENODEV;
+}
diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
index b6dca32c5ac4..23de54b75cd7 100644
--- a/drivers/firmware/qcom_scm-64.c
+++ b/drivers/firmware/qcom_scm-64.c
@@ -550,3 +550,16 @@ int __qcom_scm_io_writel(struct device *dev, phys_addr_t addr, unsigned int val)
 	return qcom_scm_call(dev, QCOM_SCM_SVC_IO, QCOM_SCM_IO_WRITE,
 			     &desc, &res);
 }
+
+int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, bool en)
+{
+	struct qcom_scm_desc desc = {0};
+	struct arm_smccc_res res;
+
+	desc.args[0] = QCOM_SCM_CONFIG_SAFE_EN_CLIENT_ALL;
+	desc.args[1] = en;
+	desc.arginfo = QCOM_SCM_ARGS(2);
+
+	return qcom_scm_call_atomic(dev, QCOM_SCM_SVC_SMMU_PROGRAM,
+				    QCOM_SCM_CONFIG_SAFE_EN, &desc, &res);
+}
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index 2ddc118dba1b..2b3b7a8c4270 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -344,6 +344,12 @@ int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare)
 }
 EXPORT_SYMBOL(qcom_scm_iommu_secure_ptbl_init);
 
+int qcom_scm_qsmmu500_wait_safe_toggle(bool en)
+{
+	return __qcom_scm_qsmmu500_wait_safe_toggle(__scm->dev, en);
+}
+EXPORT_SYMBOL(qcom_scm_qsmmu500_wait_safe_toggle);
+
 int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val)
 {
 	return __qcom_scm_io_readl(__scm->dev, addr, val);
diff --git a/drivers/firmware/qcom_scm.h b/drivers/firmware/qcom_scm.h
index 99506bd873c0..0b63ded89b41 100644
--- a/drivers/firmware/qcom_scm.h
+++ b/drivers/firmware/qcom_scm.h
@@ -91,10 +91,15 @@ extern int __qcom_scm_restore_sec_cfg(struct device *dev, u32 device_id,
 				      u32 spare);
 #define QCOM_SCM_IOMMU_SECURE_PTBL_SIZE	3
 #define QCOM_SCM_IOMMU_SECURE_PTBL_INIT	4
+#define QCOM_SCM_SVC_SMMU_PROGRAM	0x15
+#define QCOM_SCM_CONFIG_SAFE_EN		0x3
+#define QCOM_SCM_CONFIG_SAFE_EN_CLIENT_ALL	0x2
 extern int __qcom_scm_iommu_secure_ptbl_size(struct device *dev, u32 spare,
 					     size_t *size);
 extern int __qcom_scm_iommu_secure_ptbl_init(struct device *dev, u64 addr,
 					     u32 size, u32 spare);
+extern int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev,
+						bool enable);
 #define QCOM_MEM_PROT_ASSIGN_ID	0x16
 extern int __qcom_scm_assign_mem(struct device *dev, phys_addr_t mem_region,
 				      size_t mem_sz,
diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h
index 3f12cc77fb58..aee3d8580d89 100644
--- a/include/linux/qcom_scm.h
+++ b/include/linux/qcom_scm.h
@@ -57,6 +57,7 @@ extern int qcom_scm_set_remote_state(u32 state, u32 id);
 extern int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare);
 extern int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size);
 extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare);
+extern int qcom_scm_qsmmu500_wait_safe_toggle(bool en);
 extern int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val);
 extern int qcom_scm_io_writel(phys_addr_t addr, unsigned int val);
 #else
@@ -96,6 +97,7 @@ qcom_scm_set_remote_state(u32 state,u32 id) { return -ENODEV; }
 static inline int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare) { return -ENODEV; }
 static inline int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size) { return -ENODEV; }
 static inline int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare) { return -ENODEV; }
+static inline int qcom_scm_qsmmu500_wait_safe_toggle(bool en) { return -ENODEV; }
 static inline int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val) { return -ENODEV; }
 static inline int qcom_scm_io_writel(phys_addr_t addr, unsigned int val) { return -ENODEV; }
 #e
Re: [PATCH] arm64: dts: sdm845: Add iommus property to qup1
Hi Stephen,

On Thu, Jun 6, 2019 at 2:27 AM Stephen Boyd wrote:
>
> Quoting Vivek Gautam (2019-06-04 21:55:26)
> > On Wed, Jun 5, 2019 at 4:16 AM Stephen Boyd wrote:
> > >
> > > Quoting Bjorn Andersson (2019-06-04 15:37:00)
> > > > On Tue 04 Jun 15:29 PDT 2019, Stephen Boyd wrote:
> > > >
> > > > > The SMMU that sits in front of the QUP needs to be programmed properly
> > > > > so that the i2c geni driver can allocate DMA descriptors. Failure to do
> > > > > this leads to faults when using devices such as an i2c touchscreen where
> > > > > the transaction is larger than 32 bytes and we use a DMA buffer.
> > > > >
> > > >
> > > > I'm pretty sure I've run into this problem, but before we marked the
> > > > smmu bypass_disable and as such didn't get the fault, thanks.
> > > >
> > > > > arm-smmu 1500.iommu: Unexpected global fault, this could be serious
> > > > > arm-smmu 1500.iommu: GFSR 0x0002, GFSYNR0 0x0002, GFSYNR1 0x06c0, GFSYNR2 0x
> > > > >
> > > > > Add the right SID and mask so this works.
> > > > >
> > > > > Cc: Sibi Sankar
> > > > > Signed-off-by: Stephen Boyd
> > > > > ---
> > > > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 1 +
> > > > > 1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > > index fcb93300ca62..2e57e861e17c 100644
> > > > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > > @@ -900,6 +900,7 @@
> > > > > #address-cells = <2>;
> > > > > #size-cells = <2>;
> > > > > ranges;
> > > > > + iommus = <&apps_smmu 0x6c0 0x3>;
> > > >
> > > > According to the docs this stream belongs to TZ, the HLOS stream should
> > > > be 0x6c3.
> > >
> > > Aye, I saw this line in the downstream kernel but it doesn't work for
> > > me. If I specify <&apps_smmu 0x6c3 0x0> it still blows up. I wonder if
> > > my firmware perhaps is missing some initialization here to make the QUP
> > > operate in HLOS mode? Otherwise, I thought that the 0x3 at the end was
> > > the mask and so it should be split off to the second cell in the DT
> > > specifier but that seemed a little weird.
> >
> > Two things here -
> > 0x6c0 - TZ SID. Do you see above fault on MTP sdm845 devices?
>
> No. I see the above fault on Cheza.

Right, expected.

>
> > 0x6c3/0x6c6 - HLOS SIDs.

My bad, the other SID is 0x6D6.

>
> Why are there two? I see some mentions of GSI mode near these SIDs so
> maybe GSI has to be used for DMA here to get the above two SIDs at the
> IOMMU? Otherwise if we do the non-GSI mode of DMA we're going to use the
> "TZ" SID?

Yea, one for GSI, and the other one for non-GSI DMA.
I am unsure at this point about the use of the TZ SID, but I would assume
this is the SID that's used by the qup firmware, and therefore on MTP, TZ
programs this SID.

>
> >
> > Cheza will throw faults for anything that is programmed with TZ on mtp
> > as all of that should be handled in HLOS. The firmwares of all these
> > peripherals assume that the SID reservation is done (whether in TZ or HLOS).
> >
> > I am inclined to moving the iommus property for all 'TZ' to board dts files.
> > MTP wouldn't need those SIDs. So, the SOC level dtsi will have just the
> > HLOS SIDs.
>
> So you're saying you'd like to have the SID be <&apps_smmu 0x6c3 0x0> in
> the sdm845.dtsi file and then override this on Cheza because our SID is
> different (possibly because we don't use GSI)? Why can't we program the
> SID in Cheza firmware to match the "HLOS" SID of 0x6c3?

Sorry, my bad, I missed the overriding part. Maybe we add the lists of SIDs
in the board dts only. So, the cheza dts will have all these SIDs -
<&apps_smmu 0x6c0 0x3> // for both 0x6c0 (TZ) and 0x6c3 (HLOS)
<&apps_smmu 0x6d6 0x0> // if we want to use the GSI dma.

and MTP will have
<&apps_smmu 0x6c3 0x0>
<&apps_smmu 0x6d6 0x0>

WDYT?

>
> >
> > P.S.
> > As you rightly said, the second cell in iommus property is the mask so that
> > the iommu is able to reserve all that SIDs that are covered with the
> > starting SID
> > and the mask.
> >
>
> Well if 0x6c6 is another possibility maybe it should be <&apps_smmu
> 0x6c0 0x7> to cover the 0x6c3 and 0x6c6 SIDs?

The other SID was 0x6D6.

Best regards
Vivek

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
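To make the ID/mask discussion in this thread concrete, here is a minimal sketch in plain userspace C (not kernel code). It assumes the usual arm-smmu stream-matching rule, in which bits set in the mask are ignored when an incoming stream ID is compared against the programmed ID. The helper name `smr_matches` is hypothetical and does not exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when stream ID 'sid' would match an
 * "iommus = <&smmu id mask>" entry, assuming that every bit set in
 * 'mask' is a don't-care bit during the comparison.
 */
bool smr_matches(uint16_t id, uint16_t mask, uint16_t sid)
{
	/* Bits that differ between sid and id must all be masked out. */
	return ((sid ^ id) & ~mask) == 0;
}
```

Under that assumption, `<&apps_smmu 0x6c0 0x3>` covers 0x6c0 through 0x6c3 (so both the TZ and the HLOS SID), while widening the mask to 0x7 would cover 0x6c0 through 0x6c7 and still not reach 0x6d6, which is why that SID gets its own entry in the lists proposed above.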
Re: [PATCH v2] arm64: dts: qcom: Add Dragonboard 845c
Hi Bjorn,

On Thu, Jun 6, 2019 at 10:10 AM Bjorn Andersson wrote:
>
> This adds an initial dts for the Dragonboard 845. Supported
> functionality includes Debug UART, UFS, USB-C (peripheral), USB-A
> (host), microSD-card and Bluetooth.
>
> Initializing the SMMU is clearing the mapping used for the splash screen
> framebuffer, which causes the board to reboot. This can be worked around
> using:
>
> fastboot oem select-display-panel none

Does this work well with your SMR handoff RFC series too?

>
> Signed-off-by: Bjorn Andersson
> ---

Patch looks good, so
Reviewed-by: Vivek Gautam

Best Regards
Vivek

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH] arm64: dts: qcom: sdm845-mtp: Add Truly display
On Tue, May 14, 2019 at 2:39 AM Bjorn Andersson wrote: > > Bring in the Truly display and enable the DSI channels to make the > mdss/gpu probe, even though we're lacking LABIB, preventing us from > seeing anything on the screen. > > Signed-off-by: Bjorn Andersson > --- Looks good to me and work well too with a wip lab-ibb driver change. Reviewed-by: Vivek Gautam Tested-by: Vivek Gautam > arch/arm64/boot/dts/qcom/sdm845-mtp.dts | 79 + > 1 file changed, 79 insertions(+) > > diff --git a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts > b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts > index 02b8357c8ce8..83198a19ff57 100644 > --- a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts > +++ b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts > @@ -352,6 +352,77 @@ > status = "okay"; > }; > > +&dsi0 { > + status = "okay"; > + vdda-supply = <&vdda_mipi_dsi0_1p2>; > + > + qcom,dual-dsi-mode; > + qcom,master-dsi; > + > + ports { > + port@1 { > + endpoint { > + remote-endpoint = <&truly_in_0>; > + data-lanes = <0 1 2 3>; > + }; > + }; > + }; > + > + panel@0 { > + compatible = "truly,nt35597-2K-display"; > + reg = <0>; > + vdda-supply = <&vreg_l14a_1p88>; > + > + reset-gpios = <&tlmm 6 GPIO_ACTIVE_LOW>; > + mode-gpios = <&tlmm 52 GPIO_ACTIVE_HIGH>; > + > + ports { > + #address-cells = <1>; > + #size-cells = <0>; > + > + port@0 { > + reg = <0>; > + truly_in_0: endpoint { > + remote-endpoint = <&dsi0_out>; > + }; > + }; > + > + port@1 { > + reg = <1>; > + truly_in_1: endpoint { > + remote-endpoint = <&dsi1_out>; > + }; > + }; > + }; > + }; > +}; > + > +&dsi0_phy { > + status = "okay"; > + vdds-supply = <&vdda_mipi_dsi0_pll>; > +}; > + > +&dsi1 { > + status = "okay"; > + vdda-supply = <&vdda_mipi_dsi1_1p2>; > + > + qcom,dual-dsi-mode; > + > + ports { > + port@1 { > + endpoint { > + remote-endpoint = <&truly_in_1>; > + data-lanes = <0 1 2 3>; > + }; > + }; > + }; > +}; > + > +&dsi1_phy { > + status = "okay"; > + vdds-supply = <&vdda_mipi_dsi1_pll>; > +}; > + > &gcc { > protected-clocks = , >, > @@ -365,6 +436,14 
@@ > clock-frequency = <40>; > }; > > +&mdss { > + status = "okay"; > +}; > + > +&mdss_mdp { > + status = "okay"; > +}; > + > &qupv3_id_1 { > status = "okay"; > }; > -- > 2.18.0 > -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH] arm64: dts: sdm845: Add iommus property to qup1
On Wed, Jun 5, 2019 at 4:16 AM Stephen Boyd wrote:
>
> Quoting Bjorn Andersson (2019-06-04 15:37:00)
> > On Tue 04 Jun 15:29 PDT 2019, Stephen Boyd wrote:
> >
> > > The SMMU that sits in front of the QUP needs to be programmed properly
> > > so that the i2c geni driver can allocate DMA descriptors. Failure to do
> > > this leads to faults when using devices such as an i2c touchscreen where
> > > the transaction is larger than 32 bytes and we use a DMA buffer.
> > >
> >
> > I'm pretty sure I've run into this problem, but before we marked the
> > smmu bypass_disable and as such didn't get the fault, thanks.
> >
> > > arm-smmu 1500.iommu: Unexpected global fault, this could be serious
> > > arm-smmu 1500.iommu: GFSR 0x0002, GFSYNR0 0x0002, GFSYNR1 0x06c0, GFSYNR2 0x
> > >
> > > Add the right SID and mask so this works.
> > >
> > > Cc: Sibi Sankar
> > > Signed-off-by: Stephen Boyd
> > > ---
> > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index fcb93300ca62..2e57e861e17c 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -900,6 +900,7 @@
> > > #address-cells = <2>;
> > > #size-cells = <2>;
> > > ranges;
> > > + iommus = <&apps_smmu 0x6c0 0x3>;
> >
> > According to the docs this stream belongs to TZ, the HLOS stream should
> > be 0x6c3.
>
> Aye, I saw this line in the downstream kernel but it doesn't work for
> me. If I specify <&apps_smmu 0x6c3 0x0> it still blows up. I wonder if
> my firmware perhaps is missing some initialization here to make the QUP
> operate in HLOS mode? Otherwise, I thought that the 0x3 at the end was
> the mask and so it should be split off to the second cell in the DT
> specifier but that seemed a little weird.

Two things here -
0x6c0 - TZ SID. Do you see the above fault on MTP sdm845 devices?
0x6c3/0x6c6 - HLOS SIDs.

Cheza will throw faults for anything that is programmed by TZ on MTP,
as all of that should be handled in HLOS. The firmwares of all these
peripherals assume that the SID reservation is done (whether in TZ or HLOS).

I am inclined to move the iommus property for all 'TZ' SIDs to the board dts
files. MTP wouldn't need those SIDs. So, the SoC-level dtsi will have just the
HLOS SIDs.

P.S.
As you rightly said, the second cell in the iommus property is the mask, so
that the iommu is able to reserve all the SIDs that are covered by the
starting SID and the mask.

Best regards
Vivek

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH] of/device: add blacklist for iommu dma_ops
On Mon, Jun 3, 2019 at 4:14 PM Rob Clark wrote: > > On Mon, Jun 3, 2019 at 12:57 AM Vivek Gautam > wrote: > > > > > > > > On 6/3/2019 11:50 AM, Tomasz Figa wrote: > > > On Mon, Jun 3, 2019 at 4:40 AM Rob Clark wrote: > > >> On Fri, May 10, 2019 at 7:35 AM Rob Clark wrote: > > >>> On Tue, Dec 4, 2018 at 2:29 PM Rob Herring wrote: > > >>>> On Sat, Dec 1, 2018 at 10:54 AM Rob Clark wrote: > > >>>>> This solves a problem we see with drm/msm, caused by getting > > >>>>> iommu_dma_ops while we attach our own domain and manage it directly at > > >>>>> the iommu API level: > > >>>>> > > >>>>>[0038] user address but active_mm is swapper > > >>>>>Internal error: Oops: 9605 [#1] PREEMPT SMP > > >>>>>Modules linked in: > > >>>>>CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW > > >>>>> 4.19.3 #90 > > >>>>>Hardware name: xxx (DT) > > >>>>>Workqueue: events deferred_probe_work_func > > >>>>>pstate: 80c9 (Nzcv daif +PAN +UAO) > > >>>>>pc : iommu_dma_map_sg+0x7c/0x2c8 > > >>>>>lr : iommu_dma_map_sg+0x40/0x2c8 > > >>>>>sp : ff80095eb4f0 > > >>>>>x29: ff80095eb4f0 x28: > > >>>>>x27: ffc0f9431578 x26: > > >>>>>x25: x24: 0003 > > >>>>>x23: 0001 x22: ffc0fa9ac010 > > >>>>>x21: x20: ffc0fab40980 > > >>>>>x19: ffc0fab40980 x18: 0003 > > >>>>>x17: 01c4 x16: 0007 > > >>>>>x15: 000e x14: > > >>>>>x13: x12: 0028 > > >>>>>x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f > > >>>>>x9 : x8 : ffc0fab409a0 > > >>>>>x7 : x6 : 0002 > > >>>>>x5 : 0001 x4 : > > >>>>>x3 : 0001 x2 : 0002 > > >>>>>x1 : ffc0f9431578 x0 : > > >>>>>Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb) > > >>>>>Call trace: > > >>>>> iommu_dma_map_sg+0x7c/0x2c8 > > >>>>> __iommu_map_sg_attrs+0x70/0x84 > > >>>>> get_pages+0x170/0x1e8 > > >>>>> msm_gem_get_iova+0x8c/0x128 > > >>>>> _msm_gem_kernel_new+0x6c/0xc8 > > >>>>> msm_gem_kernel_new+0x4c/0x58 > > >>>>> dsi_tx_buf_alloc_6g+0x4c/0x8c > > >>>>> msm_dsi_host_modeset_init+0xc8/0x108 > > >>>>> msm_dsi_modeset_init+0x54/0x18c > > >>>>> _dpu_kms_drm_obj_init+0x430/0x474 > > >>>>> 
dpu_kms_hw_init+0x5f8/0x6b4 > > >>>>> msm_drm_bind+0x360/0x6c8 > > >>>>> try_to_bring_up_master.part.7+0x28/0x70 > > >>>>> component_master_add_with_match+0xe8/0x124 > > >>>>> msm_pdev_probe+0x294/0x2b4 > > >>>>> platform_drv_probe+0x58/0xa4 > > >>>>> really_probe+0x150/0x294 > > >>>>> driver_probe_device+0xac/0xe8 > > >>>>> __device_attach_driver+0xa4/0xb4 > > >>>>> bus_for_each_drv+0x98/0xc8 > > >>>>> __device_attach+0xac/0x12c > > >>>>> device_initial_probe+0x24/0x30 > > >>>>> bus_probe_device+0x38/0x98 > > >>>>> deferred_probe_work_func+0x78/0xa4 > > >>>>> process_one_work+0x24c/0x3dc > > >>>>> worker_thread+0x280/0x360 > > >>>>> kthread+0x134/0x13c > > >>>>> ret_from_fork+0x10/0x18 > > >>>>>Code: d284 91000725 6b17039f 5400048a (f9401f40) > > >>>>>---[ end trace f22dda57f3648e2c ]--- > > >>>>>Kernel panic - not syncing: Fatal exception > > >>>>>SMP: stopping secondary CPUs > > >>>>>Kernel Offset: disable
Re: [PATCH 1/1] drm/panel: truly: Add additional delay after pulling down reset gpio
On 5/28/2019 2:13 PM, Marc Gonzalez wrote:
> On 27/05/2019 12:26, Vivek Gautam wrote:
>
> > MTP SDM845 panel seems to need additional delay to bring panel
> > to a workable state. Running modetest without this change displays
> > blurry artifacts.
> >
> > Signed-off-by: Vivek Gautam
> > ---
> > drivers/gpu/drm/panel/panel-truly-nt35597.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/panel/panel-truly-nt35597.c b/drivers/gpu/drm/panel/panel-truly-nt35597.c
> > index fc2a66c53db4..aa7153fd3be4 100644
> > --- a/drivers/gpu/drm/panel/panel-truly-nt35597.c
> > +++ b/drivers/gpu/drm/panel/panel-truly-nt35597.c
> > @@ -280,6 +280,7 @@ static int truly_35597_power_on(struct truly_nt35597 *ctx)
> > gpiod_set_value(ctx->reset_gpio, 1);
> > usleep_range(1, 2);
> > gpiod_set_value(ctx->reset_gpio, 0);
> > + usleep_range(1, 2);
>
> I'm not sure usleep_range() makes sense with these values.
>
> AFAIU, usleep_range() is typically used for sub-jiffy sleeps, and is based
> on HRT to generate an interrupt.
>
> Once we get into jiffy granularity, it seems to me msleep() is good enough.
> IIUC, it would piggy-back on the jiffy timer interrupt.
>
> In short, why not just use msleep(10); ?

I am just maintaining the symmetry across older code.

Thanks
Vivek

> Regards.
Re: [PATCH v1] phy: qcom-qmp: Add msm8998 PCIe QMP PHY support
Hi Marc, On Tue, Mar 26, 2019 at 1:18 PM Kishon Vijay Abraham I wrote: > > Hi, > > On 22/03/19 9:42 PM, Marc Gonzalez wrote: > > Copy init sequence from downstream: > > https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/arch/arm/boot/dts/qcom/msm8998-v2.dtsi?h=LE.UM.1.3.r3.25#n372 > > Can't we instead have reference to HW manual or datasheet? > > > > Signed-off-by: Marc Gonzalez > > --- > > .../devicetree/bindings/phy/qcom-qmp-phy.txt | 5 + > > drivers/phy/qualcomm/phy-qcom-qmp.c | 110 ++ > > drivers/phy/qualcomm/phy-qcom-qmp.h | 12 ++ > > 3 files changed, 127 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt > > b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt > > index 5d181fc3cc18..6000ae34b12b 100644 > > --- a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt > > +++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt > > @@ -11,6 +11,7 @@ Required properties: > > "qcom,msm8996-qmp-usb3-phy" for 14nm USB3 phy on msm8996, > > "qcom,msm8998-qmp-usb3-phy" for USB3 QMP V3 phy on msm8998, > > "qcom,msm8998-qmp-ufs-phy" for UFS QMP phy on msm8998, > > +"qcom,msm8998-qmp-pcie-phy" for PCIe QMP phy on msm8998, > > "qcom,sdm845-qmp-usb3-phy" for USB3 QMP V3 phy on sdm845, > > "qcom,sdm845-qmp-usb3-uni-phy" for USB3 QMP V3 UNI phy on > > sdm845, > > "qcom,sdm845-qmp-ufs-phy" for UFS QMP phy on sdm845. > > @@ -48,6 +49,8 @@ Required properties: > > "aux", "cfg_ahb", "ref". > > For "qcom,msm8998-qmp-ufs-phy" must contain: > > "ref", "ref_aux". > > + For "qcom,msm8998-qmp-pcie-phy" must contain: > > + "aux", "cfg_ahb", "ref". > > For "qcom,sdm845-qmp-usb3-phy" must contain: > > "aux", "cfg_ahb", "ref", "com_aux". > > For "qcom,sdm845-qmp-usb3-uni-phy" must contain: > > @@ -70,6 +73,8 @@ Required properties: > > For "qcom,msm8998-qmp-usb3-phy" must contain > > "phy", "common". > > For "qcom,msm8998-qmp-ufs-phy": no resets are listed. 
> > + For "qcom,msm8998-qmp-pcie-phy" must contain: > > + "phy", "common", "cfg". > > For "qcom,sdm845-qmp-usb3-phy" must contain: > > "phy", "common". > > For "qcom,sdm845-qmp-usb3-uni-phy" must contain: > > Please send the dt binding in a separate patch. > > Thanks > Kishon Thanks for the patch. Besides above comments from Kishon it looks good. Reviewed-by: Vivek Gautam Best regards Vivek -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v6 8/8] arm64: dts: qcom: sdm845: Add Q6V5 MSS node
Hi Doug,

On Thu, Feb 28, 2019 at 2:34 AM Doug Anderson wrote:
>
> Hi,
>
> On Tue, Feb 26, 2019 at 3:54 PM Doug Anderson wrote:
> >
> > Hi,
> >
> > On Tue, Feb 5, 2019 at 9:13 PM Bjorn Andersson wrote:
> > >
> > > From: Sibi Sankar
> > >
> > > This patch adds Q6V5 MSS remoteproc node for SDM845 SoCs.
> > >
> > > Signed-off-by: Sibi Sankar
> > > Reviewed-by: Douglas Anderson
> > > Signed-off-by: Bjorn Andersson
> > > ---
> > >
> > > Changes since v5:
> > > - None
> > >
> > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 58
> > > 1 file changed, 58 insertions(+)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index 560c16616ee6..5c41f6fe3e1b 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -1612,6 +1612,64 @@
> > > };
> > > };
> > >
> > > + mss_pil: remoteproc@408 {
> > > + compatible = "qcom,sdm845-mss-pil";
> > > + reg = <0 0x0408 0 0x408>, <0 0x0418 0 0x48>;
> > > + reg-names = "qdsp6", "rmb";
> >
> > I found that when I disabled IOMMU bypass by booting with
> > "arm-smmu.disable_bypass=y" that I'd get this failure:
> >
> > ---
> >
> > [ 13.633776] qcom-q6v5-mss 408.remoteproc: MBA booted, loading mpss
> > [ 13.647694] arm-smmu 1500.iommu: Unexpected global fault, this could be serious
> > [ 13.660278] arm-smmu 1500.iommu: GFSR 0x8002, GFSYNR0 0x, GFSYNR1 0x0781, GFSYNR2 0x
> > ...
> > [ 14.648830] qcom-q6v5-mss 408.remoteproc: MPSS header authentication timed out
> > [ 14.657141] qcom-q6v5-mss 408.remoteproc: port failed halt
> > [ 14.664983] remoteproc remoteproc0: can't start rproc 408.remoteproc: -110
> >
> > ---
> >
> > Adding "iommus = <&apps_smmu 0x781 0>;" here fixed my problem. NOTE
> > that I'm no expert on IOMMUs so you should confirm that this is right,
> > but if it is then maybe you could include it in the next spin of the
> > series? I got the "0x781" just by looking at the value of the GFSYNR1
> > in the above splat. I wasn't sure what to put for the mask so I put
> > 0x0.
>
> Upon more testing the "iommus" line that I came up with avoids the
> global fault but doesn't actually work. I just get:
>
> qcom-q6v5-mss 408.remoteproc: failed to allocate mdt buffer
>
> I'm hoping someone from Qualcomm can help out here and say how this
> should be solved. Thanks!

Sibi and I had a chance to look at this, and we could compare things
with an MTP sdm845 device as well.

From the 845 block diagram it's clear that one of the MPSS paths goes
through the SMMU, and therefore we have the SIDs 0x780 - 0x783 reserved
for these streams. However, it is recommended to use them in bypass mode
(S2CR_TYPE_BYPASS).

On MTP devices, the secure code programs these SIDs in the SMMU and, as
these SMRs are marked secure, they are not visible to the kernel. Thus the
kernel wouldn't overwrite anything.
However, in your case there's no such reservation by the secure code. In
such a case, we may need to make the SMMU aware of these SIDs in the kernel.

And please note that adding "iommus = <&apps_smmu 0x781 0>" to the PIL
device may not be the correct thing to do, since the actual MPSS data
streams don't use the SMMU. So, configuring the DMA path via the SMMU
isn't right.

Thanks & regards
Vivek

>
>
> -Doug

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache
On Tue, Jan 29, 2019 at 8:34 PM Ard Biesheuvel wrote: > > (+ Bjorn) > > On Mon, 28 Jan 2019 at 12:27, Vivek Gautam > wrote: > > > > Hi Ard, > > > > On Thu, Jan 24, 2019 at 1:25 PM Ard Biesheuvel > > wrote: > > > > > > On Thu, 24 Jan 2019 at 07:58, Vivek Gautam > > > wrote: > > > > > > > > On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel > > > > wrote: > > > > > > > > > > On Mon, 21 Jan 2019 at 14:56, Robin Murphy > > > > > wrote: > > > > > > > > > > > > On 21/01/2019 13:36, Ard Biesheuvel wrote: > > > > > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy > > > > > > > wrote: > > > > > > >> > > > > > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote: > > > > > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam > > > > > > >>> wrote: > > > > > > >>>> > > > > > > >>>> Hi, > > > > > > >>>> > > > > > > >>>> > > > > > > >>>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel > > > > > > >>>> wrote: > > > > > > >>>>> > > > > > > >>>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam > > > > > > >>>>> wrote: > > > > > > >>>>>> > > > > > > >>>>>> Qualcomm SoCs have an additional level of cache called as > > > > > > >>>>>> System cache, aka. Last level cache (LLC). This cache sits > > > > > > >>>>>> right > > > > > > >>>>>> before the DDR, and is tightly coupled with the memory > > > > > > >>>>>> controller. > > > > > > >>>>>> The clients using this cache request their slices from this > > > > > > >>>>>> system cache, make it active, and can then start using it. > > > > > > >>>>>> For these clients with smmu, to start using the system cache > > > > > > >>>>>> for > > > > > > >>>>>> buffers and, related page tables [1], memory attributes need > > > > > > >>>>>> to be > > > > > > >>>>>> set accordingly. This series add the required support. > > > > > > >>>>>> > > > > > > >>>>> > > > > > > >>>>> Does this actually improve performance on reads from a > > > > > > >>>>> device? 
The > > > > > > >>>>> non-cache coherent DMA routines perform an unconditional > > > > > > >>>>> D-cache > > > > > > >>>>> invalidate by VA to the PoC before reading from the buffers > > > > > > >>>>> filled by > > > > > > >>>>> the device, and I would expect the PoC to be defined as lying > > > > > > >>>>> beyond > > > > > > >>>>> the LLC to still guarantee the architected behavior. > > > > > > >>>> > > > > > > >>>> We have seen performance improvements when running Manhattan > > > > > > >>>> GFXBench benchmarks. > > > > > > >>>> > > > > > > >>> > > > > > > >>> Ah ok, that makes sense, since in that case, the data flow is > > > > > > >>> mostly > > > > > > >>> to the device, not from the device. > > > > > > >>> > > > > > > >>>> As for the PoC, from my knowledge on sdm845 the system cache, > > > > > > >>>> aka > > > > > > >>>> Last level cache (LLC) lies beyond the point of coherency. > > > > > > >>>> Non-cache coherent buffers will not be cached to system cache > > > > > > >>>> also, and > > > > > > >>>> no additional software cache maintenance ops are required for > > > > > > >>>> system cache. > > > > > > >>>> Pratik can add mor
Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
Hi Will,

On Tue, Jan 22, 2019 at 11:14 AM Will Deacon wrote:
>
> On Mon, Jan 21, 2019 at 11:35:30AM +0530, Vivek Gautam wrote:
> > On Sun, Jan 20, 2019 at 5:31 AM Will Deacon wrote:
> > > On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > > > Adding a device tree option for arm smmu to enable non-cacheable
> > > > memory for page tables.
> > > > We already enable a smmu feature for coherent walk based on
> > > > whether the smmu device is dma-coherent or not. Have an option
> > > > to enable non-cacheable page table memory to force set it for
> > > > particular smmu devices.
> > >
> > > Hmm, I must be missing something here. What is the difference between this
> > > new property, and simply omitting dma-coherent on the SMMU?
> >
> > So, this is what I understood from the email thread for Last level
> > cache support -
> > Robin pointed to the fact that we may need to add support for setting
> > non-cacheable
> > mappings in the TCR.
> > Currently, we don't do that for SMMUs that omit dma-coherent.
> > We rely on the interconnect to handle the configuration set in TCR,
> > and let interconnect
> > ignore the cacheability if it can't support.
>
> I think that's a bug. With that fixed, can you get what you want by omitting
> "dma-coherent"?

Based on the discussion on the first patch in this series [1], I can update
the series.
The first thing can be - if QUIRK_NO_DMA is set (i.e. the IOMMU _is_ coherent)
then we use a cacheable TCR. So, we may need an additional check for this
when setting the TCR.

For the second case - IOMMUs that are *not* coherent, i.e. ones that omit
the 'dma-coherent' property, have to access the page table directly from
memory anyway. We take care of the CPU side of this by allocating
non-coherent memory, and making sure that we sync the PTEs from the map call.
Shouldn't we mark the TCR for these IOMMUs as non-cacheable for both the
inner and outer cacheability attributes?

[1] https://lore.kernel.org/patchwork/patch/1032939/

Regards
Vivek

>
> Will

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
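The TCR choice being discussed here can be sketched in standalone C. This is only an illustration, not the driver code: the function name `tcr_walk_attrs` and the `TCR_*` macro names are made up for the example, and the shift and encoding values are assumed to mirror the `ARM_LPAE_TCR_*` definitions in drivers/iommu/io-pgtable-arm.c (SH0 at bit 12, ORGN0 at bit 10, IRGN0 at bit 8, inner-shareable = 3, non-cacheable = 0, write-back write-allocate = 1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the ARM_LPAE_TCR_* values in io-pgtable-arm.c. */
#define TCR_SH0_SHIFT   12
#define TCR_ORGN0_SHIFT 10
#define TCR_IRGN0_SHIFT 8
#define TCR_SH_IS       3u	/* inner shareable */
#define TCR_RGN_NC      0u	/* non-cacheable walks */
#define TCR_RGN_WBWA    1u	/* write-back, write-allocate walks */

/*
 * Hypothetical helper: build the walk-attribute part of the TCR.
 * A coherent walker gets cacheable (WBWA) inner/outer attributes,
 * a non-coherent one gets non-cacheable attributes for both.
 */
uint64_t tcr_walk_attrs(bool coherent_walk)
{
	uint64_t rgn = coherent_walk ? TCR_RGN_WBWA : TCR_RGN_NC;

	return ((uint64_t)TCR_SH_IS << TCR_SH0_SHIFT) |
	       (rgn << TCR_IRGN0_SHIFT) |
	       (rgn << TCR_ORGN0_SHIFT);
}
```

With these assumed encodings, a coherent walker ends up with 0x3500 in the low TCR bits and a non-coherent one with 0x3000, i.e. only the IRGN0/ORGN0 fields differ between the two cases.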
Re: [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
On Mon, Jan 21, 2019 at 6:43 PM Robin Murphy wrote: > > On 17/01/2019 09:27, Vivek Gautam wrote: > > From Robin's comment [1] about touching TCR configurations - > > > > "TBH if we're going to touch the TCR attributes at all then we should > > probably correct that sloppiness first - there's an occasional argument > > for using non-cacheable pagetables even on a coherent SMMU if reducing > > snoop traffic/latency on walks outweighs the cost of cache maintenance > > on PTE updates, but anyone thinking they can get that by overriding > > dma-coherent silently gets the worst of both worlds thanks to this > > current TCR value." > > > > We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force > > anybody _not_ using dma-coherent smmu to have non-cacheable page table > > mappings. > > Having another quirk flag can help in having non-cacheable memory for > > page tables once and for all. > > > > [1] https://lore.kernel.org/patchwork/patch/1020906/ > > > > Signed-off-by: Vivek Gautam > > --- > > drivers/iommu/io-pgtable-arm.c | 17 - > > drivers/iommu/io-pgtable.h | 6 ++ > > 2 files changed, 18 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > > index 237cacd4a62b..c76919c30f1a 100644 > > --- a/drivers/iommu/io-pgtable-arm.c > > +++ b/drivers/iommu/io-pgtable-arm.c > > @@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg > > *cfg, void *cookie) > > struct arm_lpae_io_pgtable *data; > > > > if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA > > | > > - IO_PGTABLE_QUIRK_NON_STRICT)) > > + IO_PGTABLE_QUIRK_NON_STRICT | > > + IO_PGTABLE_QUIRK_NON_COHERENT)) > > return NULL; > > > > data = arm_lpae_alloc_pgtable(cfg); > > @@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg > > *cfg, void *cookie) > > return NULL; > > > > /* TCR */ > > - reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) | > > - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) 
| > > - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT); > > + reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT; > > + > > + if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT) > > + reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT | > > +ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT; > > + else > > + reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT | > > +ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT; > > > > switch (ARM_LPAE_GRANULE(data)) { > > case SZ_4K: > > @@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg > > *cfg, void *cookie) > > > > /* The NS quirk doesn't apply at stage 2 */ > > if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA | > > - IO_PGTABLE_QUIRK_NON_STRICT)) > > + IO_PGTABLE_QUIRK_NON_STRICT | > > + IO_PGTABLE_QUIRK_NON_COHERENT)) > > return NULL; > > > > data = arm_lpae_alloc_pgtable(cfg); > > diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h > > index 47d5ae559329..46604cf7b017 100644 > > --- a/drivers/iommu/io-pgtable.h > > +++ b/drivers/iommu/io-pgtable.h > > @@ -75,6 +75,11 @@ struct io_pgtable_cfg { > >* IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs > >* on unmap, for DMA domains using the flush queue mechanism for > >* delayed invalidation. > > + * > > + * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for > > + * pagetables even on a coherent SMMU for cases where reducing > > + * snoop traffic/latency on walks outweighs the cost of cache > > + * maintenance on PTE updates. > > Hmm, we can't actually "enforce" anything with this as-is - all we're > doing is setting the attributes that the IOMMU will use for pagetable > walks, and that has no impact on how the CPU actually writes PTEs to > memory. In particular, in the case of a hardware-coherent IOMMU which is > described as such, even if we make the dma_map/sync calls they still > won't do
Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache
Hi Ard, On Thu, Jan 24, 2019 at 1:25 PM Ard Biesheuvel wrote: > > On Thu, 24 Jan 2019 at 07:58, Vivek Gautam > wrote: > > > > On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel > > wrote: > > > > > > On Mon, 21 Jan 2019 at 14:56, Robin Murphy wrote: > > > > > > > > On 21/01/2019 13:36, Ard Biesheuvel wrote: > > > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy > > > > > wrote: > > > > >> > > > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote: > > > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam > > > > >>> wrote: > > > > >>>> > > > > >>>> Hi, > > > > >>>> > > > > >>>> > > > > >>>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel > > > > >>>> wrote: > > > > >>>>> > > > > >>>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam > > > > >>>>> wrote: > > > > >>>>>> > > > > >>>>>> Qualcomm SoCs have an additional level of cache called as > > > > >>>>>> System cache, aka. Last level cache (LLC). This cache sits right > > > > >>>>>> before the DDR, and is tightly coupled with the memory > > > > >>>>>> controller. > > > > >>>>>> The clients using this cache request their slices from this > > > > >>>>>> system cache, make it active, and can then start using it. > > > > >>>>>> For these clients with smmu, to start using the system cache for > > > > >>>>>> buffers and, related page tables [1], memory attributes need to > > > > >>>>>> be > > > > >>>>>> set accordingly. This series add the required support. > > > > >>>>>> > > > > >>>>> > > > > >>>>> Does this actually improve performance on reads from a device? The > > > > >>>>> non-cache coherent DMA routines perform an unconditional D-cache > > > > >>>>> invalidate by VA to the PoC before reading from the buffers > > > > >>>>> filled by > > > > >>>>> the device, and I would expect the PoC to be defined as lying > > > > >>>>> beyond > > > > >>>>> the LLC to still guarantee the architected behavior. > > > > >>>> > > > > >>>> We have seen performance improvements when running Manhattan > > > > >>>> GFXBench benchmarks. 
> > > > >>>> > > > > >>> > > > > >>> Ah ok, that makes sense, since in that case, the data flow is mostly > > > > >>> to the device, not from the device. > > > > >>> > > > > >>>> As for the PoC, from my knowledge on sdm845 the system cache, aka > > > > >>>> Last level cache (LLC) lies beyond the point of coherency. > > > > >>>> Non-cache coherent buffers will not be cached to system cache > > > > >>>> also, and > > > > >>>> no additional software cache maintenance ops are required for > > > > >>>> system cache. > > > > >>>> Pratik can add more if I am missing something. > > > > >>>> > > > > >>>> To take care of the memory attributes from DMA APIs side, we can > > > > >>>> add a > > > > >>>> DMA_ATTR definition to take care of any dma non-coherent APIs > > > > >>>> calls. > > > > >>>> > > > > >>> > > > > >>> So does the device use the correct inner non-cacheable, outer > > > > >>> writeback cacheable attributes if the SMMU is in pass-through? > > > > >>> > > > > >>> We have been looking into another use case where the fact that the > > > > >>> SMMU overrides memory attributes is causing issues (WC mappings used > > > > >>> by the radeon and amdgpu driver). So if the SMMU would honour the > > > > >>> existing attributes, would you still need the SMMU changes? > > > > >> > > > > >> Even if we could force a
Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache
On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel wrote: > > On Mon, 21 Jan 2019 at 14:56, Robin Murphy wrote: > > > > On 21/01/2019 13:36, Ard Biesheuvel wrote: > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy wrote: > > >> > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote: > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam > > >>> wrote: > > >>>> > > >>>> Hi, > > >>>> > > >>>> > > >>>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel > > >>>> wrote: > > >>>>> > > >>>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam > > >>>>> wrote: > > >>>>>> > > >>>>>> Qualcomm SoCs have an additional level of cache called as > > >>>>>> System cache, aka. Last level cache (LLC). This cache sits right > > >>>>>> before the DDR, and is tightly coupled with the memory controller. > > >>>>>> The clients using this cache request their slices from this > > >>>>>> system cache, make it active, and can then start using it. > > >>>>>> For these clients with smmu, to start using the system cache for > > >>>>>> buffers and, related page tables [1], memory attributes need to be > > >>>>>> set accordingly. This series add the required support. > > >>>>>> > > >>>>> > > >>>>> Does this actually improve performance on reads from a device? The > > >>>>> non-cache coherent DMA routines perform an unconditional D-cache > > >>>>> invalidate by VA to the PoC before reading from the buffers filled by > > >>>>> the device, and I would expect the PoC to be defined as lying beyond > > >>>>> the LLC to still guarantee the architected behavior. > > >>>> > > >>>> We have seen performance improvements when running Manhattan > > >>>> GFXBench benchmarks. > > >>>> > > >>> > > >>> Ah ok, that makes sense, since in that case, the data flow is mostly > > >>> to the device, not from the device. > > >>> > > >>>> As for the PoC, from my knowledge on sdm845 the system cache, aka > > >>>> Last level cache (LLC) lies beyond the point of coherency. 
> > >>>> Non-cache coherent buffers will not be cached to system cache also, and > > >>>> no additional software cache maintenance ops are required for system > > >>>> cache. > > >>>> Pratik can add more if I am missing something. > > >>>> > > >>>> To take care of the memory attributes from DMA APIs side, we can add a > > >>>> DMA_ATTR definition to take care of any dma non-coherent APIs calls. > > >>>> > > >>> > > >>> So does the device use the correct inner non-cacheable, outer > > >>> writeback cacheable attributes if the SMMU is in pass-through? > > >>> > > >>> We have been looking into another use case where the fact that the > > >>> SMMU overrides memory attributes is causing issues (WC mappings used > > >>> by the radeon and amdgpu driver). So if the SMMU would honour the > > >>> existing attributes, would you still need the SMMU changes? > > >> > > >> Even if we could force a stage 2 mapping with the weakest pagetable > > >> attributes (such that combining would work), there would still need to > > >> be a way to set the TCR attributes appropriately if this behaviour is > > >> wanted for the SMMU's own table walks as well. > > >> > > > > > > Isn't that just a matter of implementing support for SMMUs that lack > > > the 'dma-coherent' attribute? > > > > Not quite - in general they need INC-ONC attributes in case there > > actually is something in the architectural outer-cacheable domain. > > But is it a problem to use INC-ONC attributes for the SMMU PTW on this > chip? AIUI, the reason for the SMMU changes is to avoid the > performance hit of snooping, which is more expensive than cache > maintenance of SMMU page tables. So are you saying the by-VA cache > maintenance is not relayed to this system cache, resulting in page > table updates to be invisible to masters using INC-ONC attributes? The reason for this SMMU chan
Re: [PATCH 1/3] iommu/arm-smmu: Move to bitmap for arm_smmu_domain attributes
On Mon, Jan 21, 2019 at 7:23 PM Robin Murphy wrote: > > On 21/01/2019 05:53, Vivek Gautam wrote: > > A number of arm_smmu_domain's attributes can be assigned based > > on the iommu domains's attributes. These local attributes better > > be managed by a bitmap. > > So remove boolean flags and move to a 32-bit bitmap, and enable > > each bits separtely. > > > > Signed-off-by: Vivek Gautam > > --- > > drivers/iommu/arm-smmu.c | 10 ++ > > 1 file changed, 6 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > > index 7ebbcf1b2eb3..52b300dfc096 100644 > > --- a/drivers/iommu/arm-smmu.c > > +++ b/drivers/iommu/arm-smmu.c > > @@ -257,10 +257,11 @@ struct arm_smmu_domain { > > const struct iommu_gather_ops *tlb_ops; > > struct arm_smmu_cfg cfg; > > enum arm_smmu_domain_stage stage; > > - boolnon_strict; > > struct mutexinit_mutex; /* Protects smmu pointer > > */ > > spinlock_t cb_lock; /* Serialises ATS1* ops and > > TLB syncs */ > > struct iommu_domain domain; > > +#define ARM_SMMU_DOMAIN_ATTR_NON_STRICT BIT(0) > > + unsigned intattr; > > }; > > > > struct arm_smmu_option_prop { > > @@ -901,7 +902,7 @@ static int arm_smmu_init_domain_context(struct > > iommu_domain *domain, > > if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) > > pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA; > > > > - if (smmu_domain->non_strict) > > + if (smmu_domain->attr & ARM_SMMU_DOMAIN_ATTR_NON_STRICT) > > pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; > > > > /* Non coherent page table mappings only for Stage-1 */ > > @@ -1598,7 +1599,8 @@ static int arm_smmu_domain_get_attr(struct > > iommu_domain *domain, > > case IOMMU_DOMAIN_DMA: > > switch (attr) { > > case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: > > - *(int *)data = smmu_domain->non_strict; > > + *(int *)data = !!(smmu_domain->attr & > > + ARM_SMMU_DOMAIN_ATTR_NON_STRICT); > > return 0; > > default: > > return -ENODEV; > > @@ -1638,7 +1640,7 @@ static int arm_smmu_domain_set_attr(struct > > 
iommu_domain *domain, > > case IOMMU_DOMAIN_DMA: > > switch (attr) { > > case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: > > - smmu_domain->non_strict = *(int *)data; > > + smmu_domain->attr |= ARM_SMMU_DOMAIN_ATTR_NON_STRICT; > > But what if *data == 0? Right, a check for data here also similar to what we are doing for QCOM_SYS_CACHE [1]. [1] https://lore.kernel.org/patchwork/patch/1033796/ Regards Vivek > > Robin. > > > break; > > default: > > ret = -ENODEV; > > > ___ > iommu mailing list > io...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
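Robin's question about *data == 0 can be illustrated outside the kernel. The following is a minimal, standalone sketch and not the driver code: domain_attr_assign() and domain_attr_test() are hypothetical helpers, and BIT() is redefined locally so the snippet builds in user space. It sets or clears the flag depending on the value passed in, which is what the plain boolean assignment did before the conversion to a bitmap.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's BIT() macro (assumption: user-space build). */
#define BIT(n) (1U << (n))

#define ARM_SMMU_DOMAIN_ATTR_NON_STRICT BIT(0)

/*
 * Hypothetical helper: set 'flag' in *attr when 'enable' is non-zero,
 * clear it otherwise. Other bits in *attr are left untouched.
 */
static void domain_attr_assign(unsigned int *attr, unsigned int flag, int enable)
{
	assert(attr != NULL);

	if (enable)
		*attr |= flag;
	else
		*attr &= ~flag;
}

/* Read a flag back as 0 or 1, mirroring the !!(attr & FLAG) idiom in the patch. */
static int domain_attr_test(unsigned int attr, unsigned int flag)
{
	return !!(attr & flag);
}
```

With a helper of this shape, the DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE case could pass *(int *)data as the 'enable' argument, so a caller writing 0 would clear the bit instead of leaving it set.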
Re: [PATCHv3 3/4] coresight: etm4x: Add support to enable ETMv4.2
On 1/18/2019 5:52 PM, Sai Prakash Ranjan wrote: SDM845 has ETMv4.2 and can use the existing etm4x driver. But the current etm driver checks only for ETMv4.0 and errors out for other etm4x versions. This patch adds this missing support to enable SoC's with ETMv4x to use same driver by checking only the ETM architecture major version number. Without this change, we get below error during etm probe: / # dmesg | grep etm [6.660093] coresight-etm4x: probe of 704.etm failed with error -22 [6.666902] coresight-etm4x: probe of 714.etm failed with error -22 [6.673708] coresight-etm4x: probe of 724.etm failed with error -22 [6.680511] coresight-etm4x: probe of 734.etm failed with error -22 [6.687313] coresight-etm4x: probe of 744.etm failed with error -22 [6.694113] coresight-etm4x: probe of 754.etm failed with error -22 [6.700914] coresight-etm4x: probe of 764.etm failed with error -22 [6.707717] coresight-etm4x: probe of 774.etm failed with error -22 With this change, etm probe is successful: / # dmesg | grep coresight [6.659198] coresight-etm4x 704.etm: CPU0: ETM v4.2 initialized [6.665848] coresight-etm4x 714.etm: CPU1: ETM v4.2 initialized [6.672493] coresight-etm4x 724.etm: CPU2: ETM v4.2 initialized [6.679129] coresight-etm4x 734.etm: CPU3: ETM v4.2 initialized [6.685770] coresight-etm4x 744.etm: CPU4: ETM v4.2 initialized [6.692403] coresight-etm4x 754.etm: CPU5: ETM v4.2 initialized [6.699024] coresight-etm4x 764.etm: CPU6: ETM v4.2 initialized [6.705646] coresight-etm4x 774.etm: CPU7: ETM v4.2 initialized Signed-off-by: Sai Prakash Ranjan --- drivers/hwtracing/coresight/coresight-etm4x.c | 2 +- drivers/hwtracing/coresight/coresight-etm4x.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 53e2fb6e86f6..93d5f1f3145e 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -55,7 +55,7 @@ static 
void etm4_os_unlock(struct etmv4_drvdata *drvdata) static bool etm4_arch_supported(u8 arch) { - switch (arch) { + switch (arch >> 4) { While this looks good, it appears that arch is a combination of the major and minor version numbers. So, would it be better to use mask and shift macros instead of a magic-number shift? But, frankly, it's up to Mathieu to decide on the readability of this, so I leave it to him. Thanks Vivek case ETM_ARCH_V4: break; default: diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h index 52786e9d8926..05d4bd330881 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.h +++ b/drivers/hwtracing/coresight/coresight-etm4x.h @@ -136,7 +136,7 @@ #define ETM_MAX_RES_SEL 16 #define ETM_MAX_SS_CMP 8 -#define ETM_ARCH_V4 0x40 +#define ETM_ARCH_V4 0x4 #define ETMv4_SYNC_MASK 0x1F #define ETM_CYC_THRESHOLD_MASK 0xFFF #define ETM_CYC_THRESHOLD_DEFAULT 0x100
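The mask-and-shift suggestion in the review comment could look roughly like the sketch below. It is only an illustration: the ETM_ARCH_MAJOR/ETM_ARCH_MINOR macro names are hypothetical and not taken from the driver, and the layout (major version in the upper nibble, minor version in the lower nibble of the 8-bit arch value) is an assumption inferred from the "arch >> 4" in the patch and the "ETM v4.2" probe message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the patch itself uses a bare "arch >> 4". */
#define ETM_ARCH_MINOR_MASK   0x0fU
#define ETM_ARCH_MAJOR_SHIFT  4
#define ETM_ARCH_MAJOR_MASK   0x0fU

/* Assumed layout: bits [7:4] = major version, bits [3:0] = minor version. */
#define ETM_ARCH_MAJOR(arch)  (((arch) >> ETM_ARCH_MAJOR_SHIFT) & ETM_ARCH_MAJOR_MASK)
#define ETM_ARCH_MINOR(arch)  ((arch) & ETM_ARCH_MINOR_MASK)

/* Value after the patch: only the major version number is compared. */
#define ETM_ARCH_V4           0x4

static bool etm4_arch_supported(uint8_t arch)
{
	switch (ETM_ARCH_MAJOR(arch)) {
	case ETM_ARCH_V4:
		return true;
	default:
		return false;
	}
}
```

Under these assumptions, an arch value of 0x42 (ETMv4.2) and 0x40 (ETMv4.0) are both accepted, while the minor number stays available through ETM_ARCH_MINOR() for log messages.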
Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache
Hi, On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel wrote: > > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam > wrote: > > > > Qualcomm SoCs have an additional level of cache called as > > System cache, aka. Last level cache (LLC). This cache sits right > > before the DDR, and is tightly coupled with the memory controller. > > The clients using this cache request their slices from this > > system cache, make it active, and can then start using it. > > For these clients with smmu, to start using the system cache for > > buffers and, related page tables [1], memory attributes need to be > > set accordingly. This series add the required support. > > > > Does this actually improve performance on reads from a device? The > non-cache coherent DMA routines perform an unconditional D-cache > invalidate by VA to the PoC before reading from the buffers filled by > the device, and I would expect the PoC to be defined as lying beyond > the LLC to still guarantee the architected behavior. We have seen performance improvements when running Manhattan GFXBench benchmarks. As for the PoC, from my knowledge on sdm845 the system cache, aka Last level cache (LLC) lies beyond the point of coherency. Non-cache coherent buffers will not be cached to system cache also, and no additional software cache maintenance ops are required for system cache. Pratik can add more if I am missing something. To take care of the memory attributes from DMA APIs side, we can add a DMA_ATTR definition to take care of any dma non-coherent APIs calls. Regards Vivek > > > > > This change is a realisation of following changes from downstream msm-4.9: > > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2] > > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3] > > > > Changes since v2: > > - Split the patches into io-pgtable-arm driver and arm-smmu driver. > > - Converted smmu domain attributes to a bitmap, so multiple attributes > >can be managed easily. 
> > - With addition of non-coherent page table mapping support [4], this > >patch series now aligns with the understanding of upgrading the > >non-coherent devices to use some level of outer cache. > > - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE. > > - QCOM_SYS_CACHE can still be used at stage 2, so that doens't depend on > >stage-1 mapping. > > - Added change to disable the attribute from arm_smmu_domain_set_attr() > >when needed. > > - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API > >level. > > > > Goes on top of the non-coherent page tables support patch series [4] > > > > [1] https://patchwork.kernel.org/patch/10302791/ > > [2] > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51 > > [3] > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602 > > [4] https://lore.kernel.org/patchwork/cover/1032938/ > > > > Vivek Gautam (3): > > iommu/arm-smmu: Move to bitmap for arm_smmu_domain atrributes > > iommu/io-pgtable-arm: Add support to use system cache > > iommu/arm-smmu: Add support to use system cache > > > > drivers/iommu/arm-smmu.c | 28 > > drivers/iommu/io-pgtable-arm.c | 15 +-- > > drivers/iommu/io-pgtable.h | 4 > > include/linux/iommu.h | 2 ++ > > 4 files changed, 43 insertions(+), 6 deletions(-) > > > > -- > > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member > > of Code Aurora Forum, hosted by The Linux Foundation > > > > > > ___ > > linux-arm-kernel mailing list > > linux-arm-ker...@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
Hi Will, On Sun, Jan 20, 2019 at 5:31 AM Will Deacon wrote: > > On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote: > > Adding a device tree option for arm smmu to enable non-cacheable > > memory for page tables. > > We already enable a smmu feature for coherent walk based on > > whether the smmu device is dma-coherent or not. Have an option > > to enable non-cacheable page table memory to force set it for > > particular smmu devices. > > Hmm, I must be missing something here. What is the difference between this > new property, and simply omitting dma-coherent on the SMMU? So, this is what I understood from the email thread for Last level cache support - Robin pointed to the fact that we may need to add support for setting non-cacheable mappings in the TCR. Currently, we don't do that for SMMUs that omit dma-coherent. We rely on the interconnect to handle the configuration set in TCR, and let interconnect ignore the cacheability if it can't support. Moreover, Robin suggested that we should take care of SMMUs, for which removing snoop latency on walks by making mappings as non-cacheable outweighs the cost of cache maintenance on PTE updates. So, this change adds another property to do this non-cacheable mappings explicitly. As I pointed, omitting 'dma-coherent', and corresponding IO_PGTABLE_QUIRK_NO_DMA' does takes care of few things. Should we handle the TCR settings too with this quirk? Regards Vivek > > Will > ___ > iommu mailing list > io...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 2/3] iommu/io-pgtable-arm: Add support to use system cache
A few Qualcomm platforms, such as sdm845, have an additional outer cache called the system cache, aka. Last level cache (LLC), that allows non-coherent devices to upgrade to using caching. There is a fundamental assumption that non-coherent devices can't access caches. This change adds an exception where they *can* use some level of cache despite still being non-coherent overall. Coherent devices that use cacheable memory, and the CPU, make use of this system cache by default. Looking at memory types, we have the following - a) Normal uncached :- MAIR 0x44, inner non-cacheable, outer non-cacheable; b) Normal cached :- MAIR 0xff, inner read write-back non-transient, outer read write-back non-transient; attribute setting for coherent I/O devices. And, for non-coherent I/O devices that can allocate in the system cache, another type gets added - c) Normal sys-cached :- MAIR 0xf4, inner non-cacheable, outer read write-back non-transient. Coherent I/O devices use the system cache by marking the memory as normal cached. Non-coherent I/O devices should mark the memory as normal sys-cached in page tables to use the system cache.
Signed-off-by: Vivek Gautam --- drivers/iommu/io-pgtable-arm.c | 15 +-- drivers/iommu/io-pgtable.h | 4 include/linux/iommu.h | 2 ++ 3 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index c76919c30f1a..0e55772702da 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -168,10 +168,12 @@ #define ARM_LPAE_MAIR_ATTR_MASK0xff #define ARM_LPAE_MAIR_ATTR_DEVICE 0x04 #define ARM_LPAE_MAIR_ATTR_NC 0x44 +#define ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE 0xf4 #define ARM_LPAE_MAIR_ATTR_WBRWA 0xff #define ARM_LPAE_MAIR_ATTR_IDX_NC 0 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE 1 #define ARM_LPAE_MAIR_ATTR_IDX_DEV 2 +#define ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE 3 /* IOPTE accessors */ #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d)) @@ -443,6 +445,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, else if (prot & IOMMU_CACHE) pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE << ARM_LPAE_PTE_ATTRINDX_SHIFT); + else if (prot & IOMMU_QCOM_SYS_CACHE) + pte |= (ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE + << ARM_LPAE_PTE_ATTRINDX_SHIFT); } else { pte = ARM_LPAE_PTE_HAP_FAULT; if (prot & IOMMU_READ) @@ -781,7 +786,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA | IO_PGTABLE_QUIRK_NON_STRICT | - IO_PGTABLE_QUIRK_NON_COHERENT)) + IO_PGTABLE_QUIRK_NON_COHERENT | + IO_PGTABLE_QUIRK_QCOM_SYS_CACHE)) return NULL; data = arm_lpae_alloc_pgtable(cfg); @@ -794,6 +800,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT) reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT | ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT; + else if (cfg->quirks & IO_PGTABLE_QUIRK_QCOM_SYS_CACHE) + reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT | + ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT; else reg |= 
ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT | ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT; @@ -848,7 +857,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) (ARM_LPAE_MAIR_ATTR_WBRWA << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) | (ARM_LPAE_MAIR_ATTR_DEVICE - << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)); + << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) | + (ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE + << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE)); cfg->arm_lpae_s1_cfg.mair[0] = reg; cfg->arm_lpae_s1_cfg.mair[1] = 0; diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h index 46604cf7b017..fb237e8aa9f1 100644 --- a/drivers/iommu/io-pgtable.h +++ b/drivers/iommu/io-pgtable.h @@ -80,6 +80,9 @@ struct io_pgtable_cfg { * pagetables even on a coherent SMMU for cases where reducing * snoop traffic/latency on walks outweighs the cost of cache * maintenance on PTE up
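To make the MAIR arithmetic in this patch concrete, here is a small standalone sketch, not the driver code. The attribute bytes (0x44, 0xff, 0x04, 0xf4) and the index assignments (0 to 3) are taken from the commit message and the diff; the helper names mair_field() and build_mair0(), and the assumption that each attribute occupies one byte starting at bit 8 * index, are mine. In each byte the upper nibble describes the outer attribute and the lower nibble the inner one, which is why 0xf4 reads as "outer write-back, inner non-cacheable".

```c
#include <assert.h>
#include <stdint.h>

/* Attribute encodings quoted in the commit message and diff. */
#define MAIR_ATTR_NC             0x44U
#define MAIR_ATTR_WBRWA          0xffU
#define MAIR_ATTR_DEVICE         0x04U
#define MAIR_ATTR_QCOM_SYS_CACHE 0xf4U

/* Index assignments from the diff. */
#define MAIR_IDX_NC              0
#define MAIR_IDX_CACHE           1
#define MAIR_IDX_DEV             2
#define MAIR_IDX_QCOM_SYS_CACHE  3

/* Assumption: each attribute occupies one byte, so slot n starts at bit 8*n. */
static uint32_t mair_field(uint32_t attr, unsigned int idx)
{
	assert(idx < 4);        /* four 8-bit slots fit in a 32-bit MAIR0 word */
	assert(attr <= 0xffU);
	return attr << (idx * 8);
}

/* Combine the four attribute slots the way the patched mair[0] setup does. */
static uint32_t build_mair0(void)
{
	return mair_field(MAIR_ATTR_NC, MAIR_IDX_NC) |
	       mair_field(MAIR_ATTR_WBRWA, MAIR_IDX_CACHE) |
	       mair_field(MAIR_ATTR_DEVICE, MAIR_IDX_DEV) |
	       mair_field(MAIR_ATTR_QCOM_SYS_CACHE, MAIR_IDX_QCOM_SYS_CACHE);
}
```

Under these assumptions build_mair0() evaluates to 0xf404ff44, with the new sys-cache attribute occupying the top byte and the three pre-existing slots unchanged.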
[PATCH 1/3] iommu/arm-smmu: Move to bitmap for arm_smmu_domain attributes
A number of arm_smmu_domain's attributes can be assigned based on the iommu domain's attributes. These local attributes are better managed by a bitmap. So remove the boolean flags and move to a 32-bit bitmap, and enable each bit separately. Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 7ebbcf1b2eb3..52b300dfc096 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -257,10 +257,11 @@ struct arm_smmu_domain { const struct iommu_gather_ops *tlb_ops; struct arm_smmu_cfg cfg; enum arm_smmu_domain_stage stage; - bool non_strict; struct mutex init_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ struct iommu_domain domain; +#define ARM_SMMU_DOMAIN_ATTR_NON_STRICT BIT(0) + unsigned int attr; }; struct arm_smmu_option_prop { @@ -901,7 +902,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA; - if (smmu_domain->non_strict) + if (smmu_domain->attr & ARM_SMMU_DOMAIN_ATTR_NON_STRICT) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; /* Non coherent page table mappings only for Stage-1 */ @@ -1598,7 +1599,8 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case IOMMU_DOMAIN_DMA: switch (attr) { case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; + *(int *)data = !!(smmu_domain->attr & + ARM_SMMU_DOMAIN_ATTR_NON_STRICT); return 0; default: return -ENODEV; @@ -1638,7 +1640,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, case IOMMU_DOMAIN_DMA: switch (attr) { case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - smmu_domain->non_strict = *(int *)data; + smmu_domain->attr |= ARM_SMMU_DOMAIN_ATTR_NON_STRICT; break; default: ret = -ENODEV; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache
Qualcomm SoCs have an additional level of cache called the system cache, aka. Last level cache (LLC). This cache sits right before the DDR, and is tightly coupled with the memory controller. The clients using this cache request their slices from this system cache, make it active, and can then start using it. For these clients with smmu, to start using the system cache for buffers and related page tables [1], memory attributes need to be set accordingly. This series adds the required support. This change is a realisation of the following changes from downstream msm-4.9: iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2] iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3] Changes since v2: - Split the patches into io-pgtable-arm driver and arm-smmu driver. - Converted smmu domain attributes to a bitmap, so multiple attributes can be managed easily. - With the addition of non-coherent page table mapping support [4], this patch series now aligns with the understanding of upgrading the non-coherent devices to use some level of outer cache. - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE. - QCOM_SYS_CACHE can still be used at stage 2, so it doesn't depend on stage-1 mapping. - Added a change to disable the attribute from arm_smmu_domain_set_attr() when needed. - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API level.
Goes on top of the non-coherent page tables support patch series [4] [1] https://patchwork.kernel.org/patch/10302791/ [2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51 [3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602 [4] https://lore.kernel.org/patchwork/cover/1032938/ Vivek Gautam (3): iommu/arm-smmu: Move to bitmap for arm_smmu_domain attributes iommu/io-pgtable-arm: Add support to use system cache iommu/arm-smmu: Add support to use system cache drivers/iommu/arm-smmu.c | 28 drivers/iommu/io-pgtable-arm.c | 15 +-- drivers/iommu/io-pgtable.h | 4 include/linux/iommu.h | 2 ++ 4 files changed, 43 insertions(+), 6 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 3/3] iommu/arm-smmu: Add support to use system cache
Few Qualcomm platforms, such as sdm845 have an additional outer cache called as System cache, aka. Last level cache (LLC) that allows non-coherent devices to upgrade to using caching. This last level cache sits right before the DDR, and is tightly coupled with the memory controller. The cache is available to a number of devices - coherent and non-coherent, present in the SoC system, and to CPUs. The devices request their slices from this system cache, make it active, and can then start using it. Devices can set iommu domain attributes and page protection while mapping the buffers to set the required memory attributes to use system cache for buffers and page tables. This change adds the support for iommu domain attributes and the interaction with io page table driver. Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 52b300dfc096..324f3bb54c78 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -260,7 +260,8 @@ struct arm_smmu_domain { struct mutexinit_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ struct iommu_domain domain; -#define ARM_SMMU_DOMAIN_ATTR_NON_STRICTBIT(0) +#define ARM_SMMU_DOMAIN_ATTR_QCOM_SYS_CACHEBIT(1) +#define ARM_SMMU_DOMAIN_ATTR_NON_STRICTBIT(0) unsigned intattr; }; @@ -910,6 +911,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, smmu_domain->stage == ARM_SMMU_DOMAIN_S1) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_COHERENT; + if (smmu_domain->attr & ARM_SMMU_DOMAIN_ATTR_QCOM_SYS_CACHE) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_QCOM_SYS_CACHE; + smmu_domain->smmu = smmu; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); if (!pgtbl_ops) { @@ -1592,6 +1596,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); 
return 0; + case DOMAIN_ATTR_QCOM_SYS_CACHE: + *(int *)data = !!(smmu_domain->attr & + ARM_SMMU_DOMAIN_ATTR_QCOM_SYS_CACHE); + return 0; default: return -ENODEV; } @@ -1633,6 +1641,16 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_SMMU_DOMAIN_S1; break; + case DOMAIN_ATTR_QCOM_SYS_CACHE: + if (smmu_domain->smmu) { + ret = -EPERM; + goto out_unlock; + } + if (*(int *)data) + smmu_domain->attr |= ARM_SMMU_DOMAIN_ATTR_QCOM_SYS_CACHE; + else + smmu_domain->attr &= ~ARM_SMMU_DOMAIN_ATTR_QCOM_SYS_CACHE; + break; default: ret = -ENODEV; } -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
Adding a device tree option for arm smmu to enable non-cacheable memory for page tables. We already enable a smmu feature for coherent walk based on whether the smmu device is dma-coherent or not. Have an option to enable non-cacheable page table memory to force set it for particular smmu devices. Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index af18a7e7f917..7ebbcf1b2eb3 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -188,6 +188,7 @@ struct arm_smmu_device { u32 features; #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0) +#define ARM_SMMU_OPT_PGTBL_NON_COHERENT (1 << 1) u32 options; enum arm_smmu_arch_version version; enum arm_smmu_implementationmodel; @@ -273,6 +274,7 @@ static bool using_legacy_binding, using_generic_binding; static struct arm_smmu_option_prop arm_smmu_options[] = { { ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" }, + { ARM_SMMU_OPT_PGTBL_NON_COHERENT, "arm,smmu-pgtable-non-coherent" }, { 0, NULL}, }; @@ -902,6 +904,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, if (smmu_domain->non_strict) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + /* Non coherent page table mappings only for Stage-1 */ + if (smmu->options & ARM_SMMU_OPT_PGTBL_NON_COHERENT && + smmu_domain->stage == ARM_SMMU_DOMAIN_S1) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_COHERENT; + smmu_domain->smmu = smmu; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); if (!pgtbl_ops) { -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
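The option table extended by this patch maps a device tree property name to an option bit. The sketch below shows that table-driven lookup in user space. The table entries and option bits are copied from the diff; everything else is a stand-in of mine: a "node" is modelled as a NULL-terminated list of property names, and node_has_prop() and parse_options() are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARM_SMMU_OPT_SECURE_CFG_ACCESS  (1U << 0)
#define ARM_SMMU_OPT_PGTBL_NON_COHERENT (1U << 1)

struct arm_smmu_option_prop {
	unsigned int opt;
	const char *prop;
};

/* Table contents as they appear in the diff; { 0, NULL } terminates it. */
static const struct arm_smmu_option_prop arm_smmu_options[] = {
	{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
	{ ARM_SMMU_OPT_PGTBL_NON_COHERENT, "arm,smmu-pgtable-non-coherent" },
	{ 0, NULL },
};

/*
 * Hypothetical stand-in for a device tree node: a NULL-terminated list of
 * the boolean property names that are present on the node.
 */
static bool node_has_prop(const char *const *props, const char *name)
{
	for (; *props; props++)
		if (strcmp(*props, name) == 0)
			return true;
	return false;
}

/* Walk the table and OR together the option bit of every property found. */
static unsigned int parse_options(const char *const *props)
{
	unsigned int options = 0;
	const struct arm_smmu_option_prop *p;

	for (p = arm_smmu_options; p->prop; p++)
		if (node_has_prop(props, p->prop))
			options |= p->opt;
	return options;
}
```

In this model, a node that lists only "arm,smmu-pgtable-non-coherent" yields ARM_SMMU_OPT_PGTBL_NON_COHERENT, which the patch then turns into IO_PGTABLE_QUIRK_NON_COHERENT for stage-1 domains.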
[PATCH 0/2] iommu/arm: Add support for non-coherent page tables
As discussed in the Qcom system cache support thread [1], it is imperative that we enable the support for non-cacheable page tables for SMMU implementations for which removing snoop latency on walks, by making mappings non-cacheable, outweighs the cost of cache maintenance on PTE updates. This series adds a new SMMU device tree option to let a particular SMMU configuration set up cacheable or non-cacheable mappings for page tables out of the box. We set a new quirk for i/o page tables - IO_PGTABLE_QUIRK_NON_COHERENT, that lets us set different TCR configurations. This quirk enables the non-cacheable page tables for all masters sitting on the SMMU. Should this control be available per smmu_domain as each master may have a different perf requirement? Enabling this for the entire SMMU may not be desirable for all masters. [1] https://lore.kernel.org/patchwork/patch/1020906/ Vivek Gautam (2): iommu/io-pgtable-arm: Add support for non-coherent page tables iommu/arm-smmu: Add support for non-coherent page table mappings drivers/iommu/arm-smmu.c | 7 +++ drivers/iommu/io-pgtable-arm.c | 17 - drivers/iommu/io-pgtable.h | 6 ++ 3 files changed, 25 insertions(+), 5 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
From Robin's comment [1] about touching TCR configurations - "TBH if we're going to touch the TCR attributes at all then we should probably correct that sloppiness first - there's an occasional argument for using non-cacheable pagetables even on a coherent SMMU if reducing snoop traffic/latency on walks outweighs the cost of cache maintenance on PTE updates, but anyone thinking they can get that by overriding dma-coherent silently gets the worst of both worlds thanks to this current TCR value." We have the IO_PGTABLE_QUIRK_NO_DMA quirk, but we don't force anybody _not_ using a dma-coherent SMMU to have non-cacheable page table mappings. Having another quirk flag can help in having non-cacheable memory for page tables once and for all. [1] https://lore.kernel.org/patchwork/patch/1020906/ Signed-off-by: Vivek Gautam --- drivers/iommu/io-pgtable-arm.c | 17 - drivers/iommu/io-pgtable.h | 6 ++ 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 237cacd4a62b..c76919c30f1a 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) struct arm_lpae_io_pgtable *data; if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA | - IO_PGTABLE_QUIRK_NON_STRICT)) + IO_PGTABLE_QUIRK_NON_STRICT | + IO_PGTABLE_QUIRK_NON_COHERENT)) return NULL; data = arm_lpae_alloc_pgtable(cfg); @@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) return NULL; /* TCR */ - reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) | - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) | - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT); + reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT; + + if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT) + reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT | + ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT; + else +
reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT | + ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT; switch (ARM_LPAE_GRANULE(data)) { case SZ_4K: @@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie) /* The NS quirk doesn't apply at stage 2 */ if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA | - IO_PGTABLE_QUIRK_NON_STRICT)) + IO_PGTABLE_QUIRK_NON_STRICT | + IO_PGTABLE_QUIRK_NON_COHERENT)) return NULL; data = arm_lpae_alloc_pgtable(cfg); diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h index 47d5ae559329..46604cf7b017 100644 --- a/drivers/iommu/io-pgtable.h +++ b/drivers/iommu/io-pgtable.h @@ -75,6 +75,11 @@ struct io_pgtable_cfg { * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs * on unmap, for DMA domains using the flush queue mechanism for * delayed invalidation. +* +* IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for +* pagetables even on a coherent SMMU for cases where reducing +* snoop traffic/latency on walks outweighs the cost of cache +* maintenance on PTE updates. */ #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) @@ -82,6 +87,7 @@ struct io_pgtable_cfg { #define IO_PGTABLE_QUIRK_ARM_MTK_4GBBIT(3) #define IO_PGTABLE_QUIRK_NO_DMA BIT(4) #define IO_PGTABLE_QUIRK_NON_STRICT BIT(5) + #define IO_PGTABLE_QUIRK_NON_COHERENT BIT(6) unsigned long quirks; unsigned long pgsize_bitmap; unsigned intias; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
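For readers who want to see what the TCR hunk above changes in terms of bits, here is a minimal userspace sketch. It is not the driver code: the helper name s1_tcr_walk_attrs() is made up for illustration, and the shift/encoding values are written from memory of io-pgtable-arm.c, so treat them as assumptions. Only the BIT(6) position comes from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Field encodings as recalled from io-pgtable-arm.c (assumption). */
#define ARM_LPAE_TCR_SH0_SHIFT		12
#define ARM_LPAE_TCR_SH_IS		3
#define ARM_LPAE_TCR_ORGN0_SHIFT	10
#define ARM_LPAE_TCR_IRGN0_SHIFT	8
#define ARM_LPAE_TCR_RGN_NC		0
#define ARM_LPAE_TCR_RGN_WBWA		1

/* Bit position taken from the io-pgtable.h hunk in the patch. */
#define IO_PGTABLE_QUIRK_NON_COHERENT	(1UL << 6)

/*
 * Hypothetical helper: returns only the SH0/IRGN0/ORGN0 part of the
 * stage-1 TCR, choosing the walk cacheability from the quirk flag.
 */
static uint64_t s1_tcr_walk_attrs(unsigned long quirks)
{
	uint64_t rgn = (quirks & IO_PGTABLE_QUIRK_NON_COHERENT) ?
			ARM_LPAE_TCR_RGN_NC : ARM_LPAE_TCR_RGN_WBWA;
	uint64_t reg = (uint64_t)ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;

	reg |= rgn << ARM_LPAE_TCR_IRGN0_SHIFT;
	reg |= rgn << ARM_LPAE_TCR_ORGN0_SHIFT;
	return reg;
}

/* Self-check: the two cases differ only in the IRGN0/ORGN0 fields. */
static void s1_tcr_walk_attrs_check(void)
{
	assert(s1_tcr_walk_attrs(0) == 0x3500);
	assert(s1_tcr_walk_attrs(IO_PGTABLE_QUIRK_NON_COHERENT) == 0x3000);
}
```

With these assumed encodings, shareability stays inner-shareable in both cases and only the inner/outer walk cacheability changes.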
Re: [PATCH v1] arm64: dts: qcom: msm8996: Disable USB2 PHY suspend by core
On Thu, Jan 3, 2019 at 6:18 PM Manu Gautam wrote: > > QUSB2 PHY on msm8996 doesn't work well when autosuspend by > dwc3 core using USB2PHYCFG register is enabled. One of the > issue seen is that PHY driver reports PLL lock failure and > fails phy_init() if dwc3 core has USB2 PHY suspend enabled. > Fix this by using quirks to disable USB2 PHY LPM/suspend and > dwc3 core already takes care of explicitly suspending PHY > during suspend if quirks are specified. > > Signed-off-by: Manu Gautam > --- This works well for db820c [1]. Tested-by: Vivek Gautam [1] https://github.com/vivekgautam1/linux/commits/origin/v4.20-rc5/db820c Best regards Vivek > arch/arm64/boot/dts/qcom/msm8996.dtsi | 4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi > b/arch/arm64/boot/dts/qcom/msm8996.dtsi > index b29fe80d7288..1f14ca35afc2 100644 > --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi > +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi > @@ -911,6 +911,8 @@ > interrupts = <0 138 IRQ_TYPE_LEVEL_HIGH>; > phys = <&hsusb_phy2>; > phy-names = "usb2-phy"; > + snps,dis_u2_susphy_quirk; > + snps,dis_enblslpm_quirk; > }; > }; > > @@ -940,6 +942,8 @@ > interrupts = <0 131 IRQ_TYPE_LEVEL_HIGH>; > phys = <&hsusb_phy1>, <&ssusb_phy_0>; > phy-names = "usb2-phy", "usb3-phy"; > + snps,dis_u2_susphy_quirk; > + snps,dis_enblslpm_quirk; > }; > }; > > -- > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project > -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v2 1/1] drm/prime: Use sg_dma_len() macro to get sg's length
After mapping an sg list we should use the sg_dma_address() and
sg_dma_len() macros to get each entry's DMA address and length,
instead of reading the fields directly. Fix this for sg->length in
drm_prime_sg_to_page_addr_arrays().

Signed-off-by: Vivek Gautam
---

Changes since v1:
 - Fixed compilation error: replaced sg_dma_length() with sg_dma_len().

This came up while debugging a dmabuf import issue that we are seeing
on the sdm845 target. The dmabuf, which is prepared by video (venus in
this case), is imported by the drm device.
The import call flow looks as follows:
drm_gem_prime_import()
  - drm_gem_prime_import_dev()
  - dma_buf_attach() & dma_buf_map_attachment()
  - From dma_buf_map_attachment()
    - vb2_dma_sg_dmabuf_ops_map()
    - dma_map_sg(): this updates the sg->nents.

From debugging, mapping the sg table results in the table's 'nents'
being less than the original nents. The drm device then prepares the
page information based on this sg table and messes up the mappings,
and we start seeing random crashes, as below, from drm's memory space.

Although this change doesn't fix that issue currently, it seems the
right thing to do.
One thing to note is that if we restore sg->nents to sg->orig_nents in
vb2_dma_sg_dmabuf_ops_map(), we don't see the corruption below.
Any pointers on this will be highly appreciated. Thanks.
-- [ 338.070558] Unable to handle kernel paging request at virtual address 4038 [ 338.078751] Mem abort info: [ 338.081671] ESR = 0x9604 [ 338.084860] Exception class = DABT (current EL), IL = 32 bits [ 338.090972] SET = 0, FnV = 0 [ 338.094139] EA = 0, S1PTW = 0 [ 338.097393] Data abort info: [ 338.100375] ISV = 0, ISS = 0x0004 [ 338.104362] CM = 0, WnR = 0 [ 338.107446] [4038] address between user and kernel address ranges [ 338.114801] Internal error: Oops: 9604 [#1] PREEMPT SMP [ 338.120527] Modules linked in: rfcomm uinput cdc_ether venus_dec venus_enc usbnet videobuf2_dma_sg videobuf2_memops hci_uart btqca bluetooth r8152 mii ath10k_snoc venus_core ath10k_core v4l2_mem2mem videobuf2_v4l2 videobuf2_common ath mac80211 ecdh_generic qcom_q6v5_mss lzo lzo_compress qcom_q6v5_adsp qcom_common qcom_q6v5 zram bridge stp llc ipt_MASQUERADE fuse snd_seq_dummy snd_seq snd_seq_device cfg80211 joydev [ 338.158192] CPU: 4 PID: 3235 Comm: chrome Tainted: GW 4.19.0 #2 [ 338.165700] Hardware name: Google Cheza (rev1) (DT) [ 338.170720] pstate: 8049 (Nzcv daif +PAN -UAO) [ 338.175660] pc : drm_mm_insert_node_in_range+0xfc/0x348 [ 338.181035] lr : drm_mm_insert_node_in_range+0x24/0x348 [ 338.186407] sp : ff8013033b30 [ 338.189816] x29: ff8013033bd0 x28: ff8008591894 [ 338.195275] x27: 0010 x26: [ 338.200734] x25: x24: [ 338.206194] x23: x22: ffc0f48b7e08 [ 338.211656] x21: x20: 005d [ 338.217118] x19: x18: [ 338.222581] x17: x16: [ 338.228046] x15: x14: [ 338.233511] x13: 0001 x12: ffc0b1da7200 [ 338.238978] x11: 0010 x10: 0010 [ 338.244437] x9 : 0008 x8 : 4000 [ 338.249898] x7 : x6 : [ 338.255361] x5 : x4 : [ 338.260823] x3 : x2 : 005d [ 338.266285] x1 : ffc0b1da7100 x0 : ffc0b0215800 [ 338.271748] Process chrome (pid: 3235, stack limit = 0x0900f416) [ 338.278628] Call trace: [ 338.281151] drm_mm_insert_node_in_range+0xfc/0x348 [ 338.286168] msm_gem_map_vma+0x60/0xdc [ 338.290022] msm_gem_get_iova+0xb4/0xf4 [ 338.293967] msm_ioctl_gem_info+0x90/0xdc [ 338.298089] 
drm_ioctl_kernel+0xa8/0xe8 [ 338.302043] drm_ioctl+0x218/0x384 [ 338.305547] drm_compat_ioctl+0xd8/0xe8 [ 338.309503] __arm64_compat_sys_ioctl+0x134/0x20c [ 338.314339] el0_svc_common+0xa0/0xf0 [ 338.318108] el0_svc_compat_handler+0x2c/0x38 [ 338.322588] el0_svc_compat+0x8/0x18 [ 338.326274] Code: f94066c8 aa1f03e0 321d03e9 321c03ea (f9401d0b) [ 338.332538] ---[ end trace 5c09e60869887d87 ]--- [ 338.354633] Kernel panic - not syncing: Fatal exception [ 338.360018] SMP: stopping secondary CPUs [ 338.364179] Kernel Offset: disabled [ 338.367779] CPU features: 0x0,22802a18 [ 338.371643] Memory Limit: none -- drivers/gpu/drm/drm_prime.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 231e3f6d5f41..aa87ba9c0d7b 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -945,7 +945,7 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages, index = 0; for_each_sg(sgt->sgl, sg, sgt->nents, count) { -
Re: [PATCH 1/1] drm/prime: Use sg_dma_len() macro to get sg's length
On Mon, Jan 7, 2019 at 4:14 PM kbuild test robot wrote: > > Hi Vivek, > > Thank you for the patch! Yet something to improve: > > [auto build test ERROR on linus/master] > [also build test ERROR on v5.0-rc1 next-20190107] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Vivek-Gautam/drm-prime-Use-sg_dma_len-macro-to-get-sg-s-length/20190107-181350 > config: x86_64-randconfig-x013-201901 (attached as .config) > compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 > reproduce: > # save the attached .config to linux build tree > make ARCH=x86_64 > > All errors (new ones prefixed by >>): > >drivers/gpu/drm/drm_prime.c: In function > 'drm_prime_sg_to_page_addr_arrays': > >> drivers/gpu/drm/drm_prime.c:948:9: error: implicit declaration of function > >> 'sg_dma_length'; did you mean 'sg_dma_len'? > >> [-Werror=implicit-function-declaration] > len = sg_dma_length(sg); > ^ > sg_dma_len Sorry, my fat finger :( This should be as suggested - sg_dma_len(). Thanks Vivek >cc1: some warnings being treated as errors > > vim +948 drivers/gpu/drm/drm_prime.c > >926 >927 /** >928 * drm_prime_sg_to_page_addr_arrays - convert an sg table into a page > array >929 * @sgt: scatter-gather table to convert >930 * @pages: optional array of page pointers to store the page array in >931 * @addrs: optional array to store the dma bus address of each page >932 * @max_entries: size of both the passed-in arrays >933 * >934 * Exports an sg table into an array of pages and addresses. This is > currently >935 * required by the TTM driver in order to do correct fault handling. 
>936 */ >937 int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct > page **pages, >938 dma_addr_t *addrs, int > max_entries) >939 { >940 unsigned count; >941 struct scatterlist *sg; >942 struct page *page; >943 u32 len, index; >944 dma_addr_t addr; >945 >946 index = 0; >947 for_each_sg(sgt->sgl, sg, sgt->nents, count) { > > 948 len = sg_dma_length(sg); >949 page = sg_page(sg); >950 addr = sg_dma_address(sg); >951 >952 while (len > 0) { >953 if (WARN_ON(index >= max_entries)) >954 return -1; >955 if (pages) >956 pages[index] = page; >957 if (addrs) >958 addrs[index] = addr; >959 >960 page++; >961 addr += PAGE_SIZE; >962 len -= PAGE_SIZE; >963 index++; >964 } >965 } >966 return 0; >967 } >968 EXPORT_SYMBOL(drm_prime_sg_to_page_addr_arrays); >969 > > --- > 0-DAY kernel test infrastructureOpen Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
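The loop quoted above walks the mapped entries and emits one bus address per page. A toy userspace model may help show why the length has to come from the DMA side once entries have been merged by the mapping step. Everything here (struct toy_sg, toy_fill_addrs(), toy_demo(), the 4 KiB page size) is hypothetical and is not the kernel's scatterlist API; it also does not claim to explain the crash reported in this thread, which the author says this change did not fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Toy stand-in for struct scatterlist; not the kernel's layout. */
struct toy_sg {
	uint32_t length;	/* CPU-side length of this entry */
	uint64_t dma_address;	/* bus address, valid after "mapping" */
	uint32_t dma_length;	/* length of the mapped DMA segment */
};

/*
 * Walk the first nents mapped entries and record one bus address per
 * page. use_dma_len selects the old (length) or new (dma_length) field.
 * Returns the number of pages recorded, or -1 if addrs is too small.
 */
static int toy_fill_addrs(const struct toy_sg *sgl, size_t nents,
			  uint64_t *addrs, size_t max_entries,
			  int use_dma_len)
{
	size_t index = 0;

	assert(sgl && addrs);
	for (size_t i = 0; i < nents; i++) {
		uint32_t len = use_dma_len ? sgl[i].dma_length : sgl[i].length;
		uint64_t addr = sgl[i].dma_address;

		while (len > 0) {
			if (index >= max_entries)
				return -1;
			addrs[index++] = addr;
			addr += TOY_PAGE_SIZE;
			/* Clamp so a partial last page cannot wrap around. */
			len = len > TOY_PAGE_SIZE ? len - TOY_PAGE_SIZE : 0;
		}
	}
	return (int)index;
}

/*
 * Three 4 KiB CPU entries that the pretend mapping merged into one
 * 12 KiB DMA segment, so only one mapped entry is walked.
 */
static int toy_demo(int use_dma_len)
{
	struct toy_sg sgl[3] = {
		{ 4096, 0x10000, 12288 },
		{ 4096, 0, 0 },
		{ 4096, 0, 0 },
	};
	uint64_t addrs[3] = { 0 };
	size_t mapped_nents = 1;

	return toy_fill_addrs(sgl, mapped_nents, addrs, 3, use_dma_len);
}
```

In this model, walking the single mapped entry with the CPU-side length covers one page, while walking it with the DMA length covers all three.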
[PATCH 1/1] drm/prime: Use sg_dma_len() macro to get sg's length
After mapping an sg list we should use the sg_dma_address() and
sg_dma_len() macros to get each entry's DMA address and length,
instead of reading the fields directly. Fix this for sg->length in
drm_prime_sg_to_page_addr_arrays().

Signed-off-by: Vivek Gautam
---

This came up while debugging a dmabuf import issue that we are seeing
on the sdm845 target. The dmabuf, which is prepared by video (venus in
this case), is imported by the drm device.
The import call flow looks as follows:
drm_gem_prime_import()
  - drm_gem_prime_import_dev()
  - dma_buf_attach() & dma_buf_map_attachment()
  - From dma_buf_map_attachment()
    - vb2_dma_sg_dmabuf_ops_map()
    - dma_map_sg(): this updates the sg->nents.

From debugging, mapping the sg table results in the table's 'nents'
being less than the original nents. The drm device then prepares the
page information based on this sg table and messes up the mappings,
and we start seeing random crashes, as below, from drm's memory space.

Although this change doesn't fix that issue currently, it seems the
right thing to do.
One thing to note is that if we restore sg->nents to sg->orig_nents in
vb2_dma_sg_dmabuf_ops_map(), we don't see the corruption below.
Any pointers on this will be highly appreciated. Thanks.
-- [ 338.070558] Unable to handle kernel paging request at virtual address 4038 [ 338.078751] Mem abort info: [ 338.081671] ESR = 0x9604 [ 338.084860] Exception class = DABT (current EL), IL = 32 bits [ 338.090972] SET = 0, FnV = 0 [ 338.094139] EA = 0, S1PTW = 0 [ 338.097393] Data abort info: [ 338.100375] ISV = 0, ISS = 0x0004 [ 338.104362] CM = 0, WnR = 0 [ 338.107446] [4038] address between user and kernel address ranges [ 338.114801] Internal error: Oops: 9604 [#1] PREEMPT SMP [ 338.120527] Modules linked in: rfcomm uinput cdc_ether venus_dec venus_enc usbnet videobuf2_dma_sg videobuf2_memops hci_uart btqca bluetooth r8152 mii ath10k_snoc venus_core ath10k_core v4l2_mem2mem videobuf2_v4l2 videobuf2_common ath mac80211 ecdh_generic qcom_q6v5_mss lzo lzo_compress qcom_q6v5_adsp qcom_common qcom_q6v5 zram bridge stp llc ipt_MASQUERADE fuse snd_seq_dummy snd_seq snd_seq_device cfg80211 joydev [ 338.158192] CPU: 4 PID: 3235 Comm: chrome Tainted: GW 4.19.0 #2 [ 338.165700] Hardware name: Google Cheza (rev1) (DT) [ 338.170720] pstate: 8049 (Nzcv daif +PAN -UAO) [ 338.175660] pc : drm_mm_insert_node_in_range+0xfc/0x348 [ 338.181035] lr : drm_mm_insert_node_in_range+0x24/0x348 [ 338.186407] sp : ff8013033b30 [ 338.189816] x29: ff8013033bd0 x28: ff8008591894 [ 338.195275] x27: 0010 x26: [ 338.200734] x25: x24: [ 338.206194] x23: x22: ffc0f48b7e08 [ 338.211656] x21: x20: 005d [ 338.217118] x19: x18: [ 338.222581] x17: x16: [ 338.228046] x15: x14: [ 338.233511] x13: 0001 x12: ffc0b1da7200 [ 338.238978] x11: 0010 x10: 0010 [ 338.244437] x9 : 0008 x8 : 4000 [ 338.249898] x7 : x6 : [ 338.255361] x5 : x4 : [ 338.260823] x3 : x2 : 005d [ 338.266285] x1 : ffc0b1da7100 x0 : ffc0b0215800 [ 338.271748] Process chrome (pid: 3235, stack limit = 0x0900f416) [ 338.278628] Call trace: [ 338.281151] drm_mm_insert_node_in_range+0xfc/0x348 [ 338.286168] msm_gem_map_vma+0x60/0xdc [ 338.290022] msm_gem_get_iova+0xb4/0xf4 [ 338.293967] msm_ioctl_gem_info+0x90/0xdc [ 338.298089] 
drm_ioctl_kernel+0xa8/0xe8 [ 338.302043] drm_ioctl+0x218/0x384 [ 338.305547] drm_compat_ioctl+0xd8/0xe8 [ 338.309503] __arm64_compat_sys_ioctl+0x134/0x20c [ 338.314339] el0_svc_common+0xa0/0xf0 [ 338.318108] el0_svc_compat_handler+0x2c/0x38 [ 338.322588] el0_svc_compat+0x8/0x18 [ 338.326274] Code: f94066c8 aa1f03e0 321d03e9 321c03ea (f9401d0b) [ 338.332538] ---[ end trace 5c09e60869887d87 ]--- [ 338.354633] Kernel panic - not syncing: Fatal exception [ 338.360018] SMP: stopping secondary CPUs [ 338.364179] Kernel Offset: disabled [ 338.367779] CPU features: 0x0,22802a18 [ 338.371643] Memory Limit: none -- drivers/gpu/drm/drm_prime.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 231e3f6d5f41..0d9b1c43523a 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -945,7 +945,7 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages, index = 0; for_each_sg(sgt->sgl, sg, sgt->nents, count) { - len = sg->length; + len = sg_dma_length(sg);
Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache
Hi Robin, On Fri, Dec 7, 2018 at 2:54 PM Vivek Gautam wrote: > > Hi Robin, > > On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy wrote: > > > > On 04/12/2018 11:01, Vivek Gautam wrote: > > > Qualcomm SoCs have an additional level of cache called as > > > System cache, aka. Last level cache (LLC). This cache sits right > > > before the DDR, and is tightly coupled with the memory controller. > > > The cache is available to all the clients present in the SoC system. > > > The clients request their slices from this system cache, make it > > > active, and can then start using it. > > > For these clients with smmu, to start using the system cache for > > > buffers and, related page tables [1], memory attributes need to be > > > set accordingly. > > > This change updates the MAIR and TCR configurations with correct > > > attributes to use this system cache. > > > > > > To explain a little about memory attribute requirements here: > > > > > > Non-coherent I/O devices can't look-up into inner caches. However, > > > coherent I/O devices can. But both can allocate in the system cache > > > based on system policy and configured memory attributes in page > > > tables. > > > CPUs can access both inner and outer caches (including system cache, > > > aka. Last level cache), and can allocate into system cache too > > > based on memory attributes, and system policy. > > > > > > Further looking at memory types, we have following - > > > a) Normal uncached :- MAIR 0x44, inner non-cacheable, > > >outer non-cacheable; > > > b) Normal cached :- MAIR 0xff, inner read write-back non-transient, > > >outer read write-back non-transient; > > >attribute setting for coherenet I/O devices. 
> > > > > > and, for non-coherent i/o devices that can allocate in system cache > > > another type gets added - > > > c) Normal sys-cached/non-inner-cached :- > > >MAIR 0xf4, inner non-cacheable, > > >outer read write-back non-transient > > > > > > So, CPU will automatically use the system cache for memory marked as > > > normal cached. The normal sys-cached is downgraded to normal non-cached > > > memory for CPUs. > > > Coherent I/O devices can use system cache by marking the memory as > > > normal cached. > > > Non-coherent I/O devices, to use system cache, should mark the memory as > > > normal sys-cached in page tables. > > > > > > This change is a realisation of following changes > > > from downstream msm-4.9: > > > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2] > > > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3] > > > > > > [1] https://patchwork.kernel.org/patch/10302791/ > > > [2] > > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51 > > > [3] > > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602 > > > > > > Signed-off-by: Vivek Gautam > > > --- > > > > > > Changes since v1: > > > - Addressed Tomasz's comments for basing the change on > > > "NO_INNER_CACHE" concept for non-coherent I/O devices > > > rather than capturing "SYS_CACHE". This is to indicate > > > clearly the intent of non-coherent I/O devices that > > > can't access inner caches. > > > > That seems backwards to me - there is already a fundamental assumption > > that non-coherent devices can't access caches. What we're adding here is > > a weird exception where they *can* use some level of cache despite still > > being non-coherent overall. > > > > In other words, it's not a case of downgrading coherent devices' > > accesses to bypass inner caches, it's upgrading non-coherent devices' > > accesses to hit the outer cache. 
That's certainly the understanding I > > got from talking with Pratik at Plumbers, and it does appear to fit with > > your explanation above despite the final conclusion you draw being > > different. > > Thanks for the thorough review of the change. > Right, I guess it's rather an upgrade for non-coherent devices to use > an outer cache than a downgrade for coherent devices. > > > > > I do see what Toma
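As a side note on the three MAIR values discussed in this thread (0x44, 0xff, 0xf4): for Normal memory the attribute byte carries the outer cacheability in the upper nibble and the inner cacheability in the lower nibble. The sketch below only restates that arithmetic; the macro and function names are made up for illustration and are not from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* 4-bit Normal-memory encodings matching the values quoted in the thread. */
#define TOY_ATTR_NC	0x4u	/* non-cacheable */
#define TOY_ATTR_WB_RWA	0xfu	/* write-back non-transient, read/write-allocate */

/* Compose one MAIR attribute byte: outer in bits [7:4], inner in [3:0]. */
static uint8_t toy_mair_attr(uint8_t outer, uint8_t inner)
{
	assert(outer <= 0xf && inner <= 0xf);
	return (uint8_t)((outer << 4) | inner);
}
```

Read this way, 0xf4 is "outer write-back, inner non-cacheable", which matches the description of the sys-cached type in the commit message: the device allocates in the outer (system) cache without looking up the inner caches.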
Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache
On Thu, Dec 13, 2018 at 9:20 AM Tomasz Figa wrote: > > On Fri, Dec 7, 2018 at 6:25 PM Vivek Gautam > wrote: > > > > Hi Robin, > > > > On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy wrote: > > > > > > On 04/12/2018 11:01, Vivek Gautam wrote: > > > > Qualcomm SoCs have an additional level of cache called as > > > > System cache, aka. Last level cache (LLC). This cache sits right > > > > before the DDR, and is tightly coupled with the memory controller. > > > > The cache is available to all the clients present in the SoC system. > > > > The clients request their slices from this system cache, make it > > > > active, and can then start using it. > > > > For these clients with smmu, to start using the system cache for > > > > buffers and, related page tables [1], memory attributes need to be > > > > set accordingly. > > > > This change updates the MAIR and TCR configurations with correct > > > > attributes to use this system cache. > > > > > > > > To explain a little about memory attribute requirements here: > > > > > > > > Non-coherent I/O devices can't look-up into inner caches. However, > > > > coherent I/O devices can. But both can allocate in the system cache > > > > based on system policy and configured memory attributes in page > > > > tables. > > > > CPUs can access both inner and outer caches (including system cache, > > > > aka. Last level cache), and can allocate into system cache too > > > > based on memory attributes, and system policy. > > > > > > > > Further looking at memory types, we have following - > > > > a) Normal uncached :- MAIR 0x44, inner non-cacheable, > > > >outer non-cacheable; > > > > b) Normal cached :- MAIR 0xff, inner read write-back non-transient, > > > >outer read write-back non-transient; > > > >attribute setting for coherenet I/O devices. 
> > > > > > > > and, for non-coherent i/o devices that can allocate in system cache > > > > another type gets added - > > > > c) Normal sys-cached/non-inner-cached :- > > > >MAIR 0xf4, inner non-cacheable, > > > >outer read write-back non-transient > > > > > > > > So, CPU will automatically use the system cache for memory marked as > > > > normal cached. The normal sys-cached is downgraded to normal non-cached > > > > memory for CPUs. > > > > Coherent I/O devices can use system cache by marking the memory as > > > > normal cached. > > > > Non-coherent I/O devices, to use system cache, should mark the memory as > > > > normal sys-cached in page tables. > > > > > > > > This change is a realisation of following changes > > > > from downstream msm-4.9: > > > > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2] > > > > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3] > > > > > > > > [1] https://patchwork.kernel.org/patch/10302791/ > > > > [2] > > > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51 > > > > [3] > > > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602 > > > > > > > > Signed-off-by: Vivek Gautam > > > > --- > > > > > > > > Changes since v1: > > > > - Addressed Tomasz's comments for basing the change on > > > > "NO_INNER_CACHE" concept for non-coherent I/O devices > > > > rather than capturing "SYS_CACHE". This is to indicate > > > > clearly the intent of non-coherent I/O devices that > > > > can't access inner caches. > > > > > > That seems backwards to me - there is already a fundamental assumption > > > that non-coherent devices can't access caches. What we're adding here is > > > a weird exception where they *can* use some level of cache despite still > > > being non-coherent overall. 
> > > > > > In other words, it's not a case of downgrading coherent devices' > > > accesses to bypass inner caches, it's upgrading non-coherent devices' > > > accesses to hit the outer cache. That
Re: [PATCH v1] phy: qcom-ufs: Use iopoll.h readl_poll_timeout macro
On Fri, Dec 21, 2018 at 9:43 PM Marc Gonzalez wrote: > > The private copy of readl_poll_timeout is no longer needed. > Use the implementation in iopoll.h instead. > > Signed-off-by: Marc Gonzalez > --- > drivers/phy/qualcomm/phy-qcom-ufs-i.h | 19 +-- > 1 file changed, 1 insertion(+), 18 deletions(-) > > diff --git a/drivers/phy/qualcomm/phy-qcom-ufs-i.h > b/drivers/phy/qualcomm/phy-qcom-ufs-i.h > index 681644e43248..f798fb64de94 100644 > --- a/drivers/phy/qualcomm/phy-qcom-ufs-i.h > +++ b/drivers/phy/qualcomm/phy-qcom-ufs-i.h > @@ -23,24 +23,7 @@ > #include > #include > #include > - > -#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \ > -({ \ > - ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \ > - might_sleep_if(timeout_us); \ > - for (;;) { \ > - (val) = readl(addr); \ > - if (cond) \ > - break; \ > - if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \ > - (val) = readl(addr); \ > - break; \ > - } \ > - if (sleep_us) \ > - usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \ > - } \ > - (cond) ? 0 : -ETIMEDOUT; \ > -}) > +#include > > #define UFS_QCOM_PHY_CAL_ENTRY(reg, val) \ > { \ > -- > 2.17.1 Thanks for the patch. LGTM. Reviewed-by: Vivek Gautam Best regards Vivek -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
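For anyone unfamiliar with the polling pattern the removed private macro implemented, here is a rough userspace model of its control flow, assuming a simulated clock and a simulated register. All names (struct fake_dev, fake_readl(), fake_poll_ready(), FAKE_READY_BIT) are hypothetical; this is neither the iopoll.h implementation nor kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FAKE_READY_BIT 0x1u

/* Simulated device: the ready bit appears once the fake clock reaches ready_at_us. */
struct fake_dev {
	uint64_t now_us;
	uint64_t ready_at_us;
};

static uint32_t fake_readl(const struct fake_dev *dev)
{
	return dev->now_us >= dev->ready_at_us ? FAKE_READY_BIT : 0;
}

/*
 * Same shape as the removed macro: read, test the condition, check the
 * deadline (with one final re-read), then sleep. A timeout_us of 0
 * means "poll forever", as in the macro.
 */
static int fake_poll_ready(struct fake_dev *dev, uint64_t sleep_us,
			   uint64_t timeout_us)
{
	uint64_t deadline;
	uint32_t val;

	assert(dev);
	deadline = dev->now_us + timeout_us;
	for (;;) {
		val = fake_readl(dev);
		if (val & FAKE_READY_BIT)
			break;
		if (timeout_us && dev->now_us > deadline) {
			val = fake_readl(dev);
			break;
		}
		/* Advance the fake clock; a busy loop still costs 1 us here. */
		dev->now_us += sleep_us ? sleep_us : 1;
	}
	return (val & FAKE_READY_BIT) ? 0 : -ETIMEDOUT;
}

/* Convenience wrapper: start the clock at 0 and poll once. */
static int fake_poll_demo(uint64_t ready_at_us, uint64_t sleep_us,
			  uint64_t timeout_us)
{
	struct fake_dev dev = { 0, ready_at_us };

	return fake_poll_ready(&dev, sleep_us, timeout_us);
}
```

The final re-read after the deadline is the detail worth noticing: it avoids reporting a timeout when the condition became true while the poller was asleep.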
Re: [RESEND PATCH v4 1/1] dt-bindings: arm-smmu: Add binding doc for Qcom smmu-500
On Thu, Dec 13, 2018 at 4:16 PM Will Deacon wrote: > > On Thu, Dec 13, 2018 at 02:35:07PM +0530, Vivek Gautam wrote: > > Qcom's implementation of arm,mmu-500 works well with current > > arm-smmu driver implementation. Adding a soc specific compatible > > along with arm,mmu-500 makes the bindings future safe. > > > > Signed-off-by: Vivek Gautam > > Reviewed-by: Rob Herring > > Cc: Will Deacon > > --- > > > > Hi Joerg, > > I am picking this out separately from the sdm845 smmu support > > series [1], so that this can go through iommu tree. > > The dt patch from the series [1] can be taken through arm-soc tree. > > > > Hi Will, > > As asked [2], here's the resend version of dt binding patch for sdm845. > > Kindly ack this so that Joerg can pull this in. > > Acked-by: Will Deacon Thanks a lot Will for the Ack. Regards Vivek > > Joerg -- please can you take this on top of the pull request I sent already? > Vivek included it as part of a separate series which I thought was going > via arm-soc, but actually it needs to go with the other arm-smmu patches > in order to avoid conflicts. > > Cheers, > > Will > > > Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt > > b/Documentation/devicetree/bindings/iommu/arm,smmu.txt > > index a6504b37cc21..3133f3ba7567 100644 > > --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt > > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt > > @@ -27,6 +27,10 @@ conditions. > >"qcom,msm8996-smmu-v2", "qcom,smmu-v2", > >"qcom,sdm845-smmu-v2", "qcom,smmu-v2". > > > > + Qcom SoCs implementing "arm,mmu-500" must also include, > > + as below, SoC-specific compatibles: > > + "qcom,sdm845-smmu-500", "arm,mmu-500" > > + > > - reg : Base address and size of the SMMU. > > > > - #global-interrupts : The number of global interrupts exposed by the > > -- > > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. 
is a member > > of Code Aurora Forum, hosted by The Linux Foundation > > > ___ > iommu mailing list > io...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[RESEND PATCH v4 1/1] dt-bindings: arm-smmu: Add binding doc for Qcom smmu-500
Qcom's implementation of arm,mmu-500 works well with current arm-smmu driver implementation. Adding a soc specific compatible along with arm,mmu-500 makes the bindings future safe. Signed-off-by: Vivek Gautam Reviewed-by: Rob Herring Cc: Will Deacon --- Hi Joerg, I am picking this out separately from the sdm845 smmu support series [1], so that this can go through iommu tree. The dt patch from the series [1] can be taken through arm-soc tree. Hi Will, As asked [2], here's the resend version of dt binding patch for sdm845. Kindly ack this so that Joerg can pull this in. Thanks Vivek [1] https://patchwork.kernel.org/cover/10636359/ [2] https://patchwork.kernel.org/patch/10636363/ Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index a6504b37cc21..3133f3ba7567 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -27,6 +27,10 @@ conditions. "qcom,msm8996-smmu-v2", "qcom,smmu-v2", "qcom,sdm845-smmu-v2", "qcom,smmu-v2". + Qcom SoCs implementing "arm,mmu-500" must also include, + as below, SoC-specific compatibles: + "qcom,sdm845-smmu-500", "arm,mmu-500" + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 2/2] arm64: dts: msm8996: Add display smmu node
From: Archit Taneja Add device node for display smmu, aka. mdp_smmu. Signed-off-by: Archit Taneja Signed-off-by: Vivek Gautam --- arch/arm64/boot/dts/qcom/msm8996.dtsi | 17 + 1 file changed, 17 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi index 197e186eac10..949e3b99fda4 100644 --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi @@ -1121,6 +1121,23 @@ power-domains = <&mmcc GPU_GDSC>; }; + mdp_smmu: arm,smmu@d0 { + compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; + reg = <0xd0 0x1>; + + #global-interrupts = <1>; + interrupts = , +, +; + #iommu-cells = <1>; + + clocks = <&mmcc SMMU_MDP_AHB_CLK>, +<&mmcc SMMU_MDP_AXI_CLK>; + clock-names = "iface", "bus"; + + power-domains = <&mmcc MDSS_GDSC>; + }; + agnoc@0 { power-domains = <&gcc AGGRE0_NOC_GDSC>; compatible = "simple-pm-bus"; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 1/2] arm64: dts: msm8996: Add graphics smmu node
From: Jordan Crouse Add device node for graphics smmu, aka. adreno_smmu. Signed-off-by: Jordan Crouse Signed-off-by: Vivek Gautam --- arch/arm64/boot/dts/qcom/msm8996.dtsi | 17 + 1 file changed, 17 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi index 99b7495455a6..197e186eac10 100644 --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi @@ -1104,6 +1104,23 @@ }; }; + adreno_smmu: arm,smmu@b4 { + compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; + reg = <0xb4 0x1>; + + #global-interrupts = <1>; + interrupts = , +, +; + #iommu-cells = <1>; + + clocks = <&mmcc GPU_AHB_CLK>, +<&gcc GCC_MMSS_BIMC_GFX_CLK>; + clock-names = "iface", "bus"; + + power-domains = <&mmcc GPU_GDSC>; + }; + agnoc@0 { power-domains = <&gcc AGGRE0_NOC_GDSC>; compatible = "simple-pm-bus"; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 0/2] arm64: dts: msm8996: Add display and graphics smmu
The driver side patches are now pulled in [1]. So, we can now enable
the SMMUs used by display and graphics.
These patches have been sitting in my test trees [2] for a while, and
work well with display and gpu enabled on msm8996.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates
[2] https://github.com/vivekgautam1/linux/tree/origin/v4.20-rc5/db820c

Archit Taneja (1):
  arm64: dts: msm8996: Add display smmu node

Jordan Crouse (1):
  arm64: dts: msm8996: Add graphics smmu node

 arch/arm64/boot/dts/qcom/msm8996.dtsi | 34 ++
 1 file changed, 34 insertions(+)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v4 1/2] dt-bindings: arm-smmu: Add binding doc for Qcom smmu-500
Hi Will,

On Fri, Oct 12, 2018 at 11:37 AM Vivek Gautam wrote:
>
>
>
> On 10/12/2018 3:46 AM, Rob Herring wrote:
> > On Thu, 11 Oct 2018 15:19:29 +0530, Vivek Gautam wrote:
> >> Qcom's implementation of arm,mmu-500 works well with current
> >> arm-smmu driver implementation. Adding a soc specific compatible
> >> along with arm,mmu-500 makes the bindings future safe.
> >>
> >> Signed-off-by: Vivek Gautam
> >> ---
> >>
> >> Changes since v3:
> >>  - Refined language more to state things directly for the bindings
> >>    description.
> >>
> >>  Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4
> >>  1 file changed, 4 insertions(+)
> >>
> > Reviewed-by: Rob Herring
>
> Thank you Rob.
>

Can you please pick this one up in your tree as well?
This goes on top of the bindings patch for "qcom,smmu-v2", so it
can't go through Andy's tree.
I will ask Andy to pick the second patch of the series, which adds
the dt node.

I guess that as I sent this one along with the dt patch, I must have
mistakenly added you to the 'cc' list rather than the 'to' list.
Let me know if you would like me to resend it.

Thanks
Vivek

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 1/1] media: venus: core: Set dma maximum segment size
On 12/7/2018 3:38 PM, Stanimir Varbanov wrote: Hi Vivek, Thanks for the patch! On 12/5/18 10:31 AM, Vivek Gautam wrote: Turning on CONFIG_DMA_API_DEBUG_SG results in the following error: [ 460.308650] [ cut here ] [ 460.313490] qcom-venus aa0.video-codec: DMA-API: mapping sg segment longer than device claims to support [len=4194304] [max=65536] [ 460.326017] WARNING: CPU: 3 PID: 3555 at src/kernel/dma/debug.c:1301 debug_dma_map_sg+0x174/0x254 [ 460.33] Modules linked in: venus_dec venus_enc videobuf2_dma_sg videobuf2_memops hci_uart btqca bluetooth venus_core v4l2_mem2mem videobuf2_v4l2 videobuf2_common ath10k_snoc ath10k_core ath lzo lzo_compress zramjoydev [ 460.375811] CPU: 3 PID: 3555 Comm: V4L2DecoderThre Tainted: GW 4.19.1 #82 [ 460.384223] Hardware name: Google Cheza (rev1) (DT) [ 460.389251] pstate: 6049 (nZCv daif +PAN -UAO) [ 460.394191] pc : debug_dma_map_sg+0x174/0x254 [ 460.398680] lr : debug_dma_map_sg+0x174/0x254 [ 460.403162] sp : ff80200c37d0 [ 460.406583] x29: ff80200c3830 x28: 0001 [ 460.412056] x27: x26: ffc0f785ea80 [ 460.417532] x25: x24: ffc0f4ea1290 [ 460.423001] x23: ffc09e700300 x22: ffc0f4ea1290 [ 460.428470] x21: ff8009037000 x20: 0001 [ 460.433936] x19: ff80091b x18: [ 460.439411] x17: x16: f251 [ 460.444885] x15: 0006 x14: 0720072007200720 [ 460.450354] x13: ff800af536e0 x12: [ 460.455822] x11: x10: [ 460.461288] x9 : 537944d9c6c48d00 x8 : 537944d9c6c48d00 [ 460.466758] x7 : x6 : ffc0f8d98f80 [ 460.472230] x5 : x4 : [ 460.477703] x3 : 008a x2 : ffc0fdb13948 [ 460.483170] x1 : ffc0fdb0b0b0 x0 : 007a [ 460.488640] Call trace: [ 460.491165] debug_dma_map_sg+0x174/0x254 [ 460.495307] vb2_dma_sg_alloc+0x260/0x2dc [videobuf2_dma_sg] [ 460.501150] __vb2_queue_alloc+0x164/0x374 [videobuf2_common] [ 460.507076] vb2_core_reqbufs+0xfc/0x23c [videobuf2_common] [ 460.512815] vb2_reqbufs+0x44/0x5c [videobuf2_v4l2] [ 460.517853] v4l2_m2m_reqbufs+0x44/0x78 [v4l2_mem2mem] [ 460.523144] v4l2_m2m_ioctl_reqbufs+0x1c/0x28 [v4l2_mem2mem] [ 460.528976] 
v4l_reqbufs+0x30/0x40 [ 460.532480] __video_do_ioctl+0x36c/0x454 [ 460.536610] video_usercopy+0x25c/0x51c [ 460.540572] video_ioctl2+0x38/0x48 [ 460.544176] v4l2_ioctl+0x60/0x74 [ 460.547602] do_video_ioctl+0x948/0x3520 [ 460.551648] v4l2_compat_ioctl32+0x60/0x98 [ 460.555872] __arm64_compat_sys_ioctl+0x134/0x20c [ 460.560718] el0_svc_common+0x9c/0xe4 [ 460.564498] el0_svc_compat_handler+0x2c/0x38 [ 460.568982] el0_svc_compat+0x8/0x18 [ 460.572672] ---[ end trace ce209b87b2f3af88 ]--- From above warning one would deduce that the sg segment will overflow the device's capacity. In reality, the hardware can accommodate larger sg segments. So, initialize the max segment size properly to weed out this warning. Based on a similar patch sent by Sean Paul for mdss: https://patchwork.kernel.org/patch/10671457/ Signed-off-by: Vivek Gautam --- drivers/media/platform/qcom/venus/core.c | 8 1 file changed, 8 insertions(+) Acked-by: Stanimir Varbanov Thanks Stan. Best regards Vivek
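For readers following the warning above, here is a minimal userspace sketch of the check that CONFIG_DMA_API_DEBUG_SG is reporting. It is a model, not kernel code: `struct model_dma_parms` and both functions are hypothetical names, and the only values taken from the log are the 64 KiB fallback limit (max=65536) and the 4 MiB segment (len=4194304). In the kernel, the limit is raised with dma_set_max_seg_size(); the value the venus patch actually uses is not visible in this thread, so the one in the usage note is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Limit assumed when the driver never set one (64 KiB, the max= in the log). */
#define MODEL_DEFAULT_MAX_SEG 65536u

/* Hypothetical stand-in for a device's DMA parameters; not a kernel type. */
struct model_dma_parms {
	unsigned int max_segment_size; /* 0 means "never set by the driver" */
};

/* Effective per-segment limit for this device. */
unsigned int model_max_seg_size(const struct model_dma_parms *p)
{
	if (p && p->max_segment_size)
		return p->max_segment_size;
	return MODEL_DEFAULT_MAX_SEG;
}

/* Count the segments that are longer than the device claims to support. */
size_t model_count_oversized(const struct model_dma_parms *p,
			     const unsigned int *seg_len, size_t nsegs)
{
	unsigned int max = model_max_seg_size(p);
	size_t i, bad = 0;

	assert(seg_len != NULL || nsegs == 0);

	for (i = 0; i < nsegs; i++)
		if (seg_len[i] > max)
			bad++;
	return bad;
}
```

With no limit set, a list containing a 4194304-byte segment reports one oversized entry; once the model's `max_segment_size` is raised (say to 0xffffffff, an illustrative value), the same list passes.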
Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache
Hi Robin, On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy wrote: > > On 04/12/2018 11:01, Vivek Gautam wrote: > > Qualcomm SoCs have an additional level of cache called as > > System cache, aka. Last level cache (LLC). This cache sits right > > before the DDR, and is tightly coupled with the memory controller. > > The cache is available to all the clients present in the SoC system. > > The clients request their slices from this system cache, make it > > active, and can then start using it. > > For these clients with smmu, to start using the system cache for > > buffers and, related page tables [1], memory attributes need to be > > set accordingly. > > This change updates the MAIR and TCR configurations with correct > > attributes to use this system cache. > > > > To explain a little about memory attribute requirements here: > > > > Non-coherent I/O devices can't look-up into inner caches. However, > > coherent I/O devices can. But both can allocate in the system cache > > based on system policy and configured memory attributes in page > > tables. > > CPUs can access both inner and outer caches (including system cache, > > aka. Last level cache), and can allocate into system cache too > > based on memory attributes, and system policy. > > > > Further looking at memory types, we have following - > > a) Normal uncached :- MAIR 0x44, inner non-cacheable, > >outer non-cacheable; > > b) Normal cached :- MAIR 0xff, inner read write-back non-transient, > >outer read write-back non-transient; > >attribute setting for coherenet I/O devices. > > > > and, for non-coherent i/o devices that can allocate in system cache > > another type gets added - > > c) Normal sys-cached/non-inner-cached :- > >MAIR 0xf4, inner non-cacheable, > >outer read write-back non-transient > > > > So, CPU will automatically use the system cache for memory marked as > > normal cached. The normal sys-cached is downgraded to normal non-cached > > memory for CPUs. 
> > Coherent I/O devices can use system cache by marking the memory as > > normal cached. > > Non-coherent I/O devices, to use system cache, should mark the memory as > > normal sys-cached in page tables. > > > > This change is a realisation of following changes > > from downstream msm-4.9: > > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2] > > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3] > > > > [1] https://patchwork.kernel.org/patch/10302791/ > > [2] > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51 > > [3] > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602 > > > > Signed-off-by: Vivek Gautam > > --- > > > > Changes since v1: > > - Addressed Tomasz's comments for basing the change on > > "NO_INNER_CACHE" concept for non-coherent I/O devices > > rather than capturing "SYS_CACHE". This is to indicate > > clearly the intent of non-coherent I/O devices that > > can't access inner caches. > > That seems backwards to me - there is already a fundamental assumption > that non-coherent devices can't access caches. What we're adding here is > a weird exception where they *can* use some level of cache despite still > being non-coherent overall. > > In other words, it's not a case of downgrading coherent devices' > accesses to bypass inner caches, it's upgrading non-coherent devices' > accesses to hit the outer cache. That's certainly the understanding I > got from talking with Pratik at Plumbers, and it does appear to fit with > your explanation above despite the final conclusion you draw being > different. Thanks for the thorough review of the change. Right, I guess it's rather an upgrade for non-coherent devices to use an outer cache than a downgrade for coherent devices. 
> > I do see what Tomasz meant in terms of the TCR attributes, but what we > currently do there is a little unintuitive and not at all representative > of actual mapping attributes - I'll come back to that inline. > > > drivers/iommu/arm-smmu.c | 15 +++ > > drivers/iommu/dma-iommu.c | 3 +++ > > drivers/iommu/io-pgtable-arm.c | 22 +- > > drivers/iommu/io-pgtable.h | 5 + > > include/linux/iommu.h |
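As an aside on the three memory types listed in the commit message, the MAIR attribute bytes can be read as two nibbles: the upper one describes outer cacheability and the lower one inner cacheability. The sketch below is a small userspace model of that composition. The macro and function names are made up for illustration; only the values 0x44, 0xff and 0xf4 come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* 4-bit cacheability encodings that appear in the thread's examples. */
#define MODEL_ATTR_NC     0x4u /* Normal, non-cacheable */
#define MODEL_ATTR_WB_RWA 0xfu /* write-back non-transient, read/write-allocate */

/* Build one MAIR attribute byte: outer policy in [7:4], inner in [3:0]. */
uint8_t model_mair_attr(uint8_t outer, uint8_t inner)
{
	assert(outer <= 0xfu && inner <= 0xfu);
	return (uint8_t)((outer << 4) | inner);
}

/* Split an attribute byte back into its two halves. */
uint8_t model_mair_outer(uint8_t attr) { return (uint8_t)(attr >> 4); }
uint8_t model_mair_inner(uint8_t attr) { return (uint8_t)(attr & 0xfu); }
```

Read this way, type (c) is simply "outer write-back, inner non-cacheable", which matches Robin's framing of it as an upgrade for non-coherent devices to reach the outer cache rather than a downgrade of coherent ones.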
Re: [PATCH v6 2/5] phy: qcom-qmp: Utilize fully-specified DT registers
* > * Get memory resources for each phy lane: > -* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2; and > -* pcs_misc (optional) -> 3. > +* Resources are indexed as: tx -> 0; rx -> 1; pcs -> 2. > +* For dual lane PHYs: tx2 -> 3, rx2 -> 4, pcs_misc (optional) -> 5 > +* For single lane PHYs: pcs_misc (optional) -> 3. > */ > qphy->tx = of_iomap(np, 0); > if (!qphy->tx) > @@ -1630,7 +1630,32 @@ int qcom_qmp_phy_create(struct device *dev, struct > device_node *np, int id) > if (!qphy->pcs) > return -ENOMEM; > > - qphy->pcs_misc = of_iomap(np, 3); > + /* > +* If this is a dual-lane PHY, then there should be registers for the > +* second lane. Some old device trees did not specify this, so fall > +* back to old legacy behavior of assuming they can be reached at an > +* offset from the first lane. > +*/ > + if (qmp->cfg->is_dual_lane_phy) { > + qphy->tx2 = of_iomap(np, 3); > + qphy->rx2 = of_iomap(np, 4); > + if (!qphy->tx2 || !qphy->rx2) { > + dev_warn(dev, > +"Underspecified device tree, falling back to > legacy register regions\n"); > + > + /* In the old version, pcs_misc is at index 3. */ > + qphy->pcs_misc = qphy->tx2; > + qphy->tx2 = qphy->tx + QMP_PHY_LEGACY_LANE_STRIDE; > + qphy->rx2 = qphy->rx + QMP_PHY_LEGACY_LANE_STRIDE; > + > + } else { > + qphy->pcs_misc = of_iomap(np, 5); > + } > + > + } else { > + qphy->pcs_misc = of_iomap(np, 3); > + } > + > if (!qphy->pcs_misc) > dev_vdbg(dev, "PHY pcs_misc-reg not used\n"); > > -- > 2.18.1 > Tested on db820c [1]. USB, PCIe come up. Tested-by: Vivek Gautam [1] https://github.com/vivekgautam1/linux/tree/origin/v4.20-rc5/db820c BRs -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
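The fallback in the hunk above is easier to follow as a table of "which reg index ends up where". The following is a userspace model of that decision only, assuming a stand-in for of_iomap() that succeeds when the node lists enough regions. All names (`model_iomap`, `model_resolve`, `struct model_layout`) are hypothetical; the index assignments are the ones from the patch comment.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NOT_MAPPED (-1)

/* Stand-in for of_iomap(): an index maps only if the node lists it. */
static int model_iomap(int nregs, int index)
{
	return index < nregs ? index : MODEL_NOT_MAPPED;
}

struct model_layout {
	int tx2, rx2, pcs_misc; /* reg index, or MODEL_NOT_MAPPED */
	bool legacy_stride;     /* lane 2 derived from lane 1 plus a fixed stride */
};

/* Mirror the patch's choice of regions for a lane node with nregs entries. */
struct model_layout model_resolve(bool dual_lane, int nregs)
{
	struct model_layout l = { MODEL_NOT_MAPPED, MODEL_NOT_MAPPED,
				  MODEL_NOT_MAPPED, false };

	assert(nregs >= 0);

	if (!dual_lane) {
		l.pcs_misc = model_iomap(nregs, 3);
		return l;
	}

	l.tx2 = model_iomap(nregs, 3);
	l.rx2 = model_iomap(nregs, 4);
	if (l.tx2 == MODEL_NOT_MAPPED || l.rx2 == MODEL_NOT_MAPPED) {
		/* Old device trees: index 3, if present, is pcs_misc. */
		l.pcs_misc = l.tx2;
		l.tx2 = MODEL_NOT_MAPPED;
		l.rx2 = MODEL_NOT_MAPPED;
		l.legacy_stride = true;
	} else {
		l.pcs_misc = model_iomap(nregs, 5);
	}
	return l;
}
```

A dual-lane node with six regions takes the new path (tx2=3, rx2=4, pcs_misc=5), while one with only four regions falls back to the legacy stride and treats index 3 as pcs_misc.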
Re: [PATCH v1 4/4] phy: qcom-qmp: Expose provided clocks to DT
On Fri, Nov 30, 2018 at 3:46 AM Evan Green wrote: > > Register a simple clock provider for the PHY pipe clock sources so that > device tree users can point at these clocks via phandles to the lane > nodes. > > Signed-off-by: Evan Green > --- > > drivers/phy/qualcomm/phy-qcom-qmp.c | 23 ++- > 1 file changed, 22 insertions(+), 1 deletion(-) > > diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c > b/drivers/phy/qualcomm/phy-qcom-qmp.c > index 8204d55e2d650..b4006818e1b65 100644 > --- a/drivers/phy/qualcomm/phy-qcom-qmp.c > +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c > @@ -1542,6 +1542,11 @@ static int qcom_qmp_phy_clk_init(struct device *dev) > return devm_clk_bulk_get(dev, num, qmp->clks); > } > > +static void phy_pipe_clk_release_provider(void *res) > +{ > + of_clk_del_provider(res); > +} > + > /* > * Register a fixed rate pipe clock. > * > @@ -1588,7 +1593,23 @@ static int phy_pipe_clk_register(struct qcom_qmp *qmp, > struct device_node *np) > fixed->fixed_rate = 12500; > fixed->hw.init = &init; > > - return devm_clk_hw_register(qmp->dev, &fixed->hw); > + ret = devm_clk_hw_register(qmp->dev, &fixed->hw); > + if (ret) > + return ret; > + > + ret = of_clk_add_hw_provider(np, of_clk_hw_simple_get, &fixed->hw); > + if (ret) > + return ret; > + > + /* > +* Roll a devm action because the clock provider is the child node, > but > +* the child node is not actually a device. > +*/ > + ret = devm_add_action(qmp->dev, phy_pipe_clk_release_provider, np); > + if (ret) > + phy_pipe_clk_release_provider(np); > + > + return ret; > } > > static const struct phy_ops qcom_qmp_phy_gen_ops = { > -- > 2.18.1 > Tested on db820c [1] Tested-by: Vivek Gautam [1] https://github.com/vivekgautam1/linux/tree/origin/v4.20-rc5/db820c BRs Vivek -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
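The tail of phy_pipe_clk_register() in the patch above follows a common pattern: queue a cleanup action on the device, and if queueing fails, run the cleanup immediately so nothing leaks. Below is a userspace sketch of just that pattern, with a single-slot "device" standing in for devres. Every name here is hypothetical. As far as I know the kernel's devm_add_action_or_reset() helper packages the same two steps, but the patch as posted open-codes them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*model_action_fn)(void *);

/* Toy device with room for one deferred cleanup action. */
struct model_dev {
	model_action_fn action;
	void *data;
	int can_alloc; /* 0 simulates an allocation failure */
};

/* Queue fn(data) to run at release time; may fail with -ENOMEM. */
int model_add_action(struct model_dev *dev, model_action_fn fn, void *data)
{
	assert(fn != NULL);
	if (!dev->can_alloc)
		return -ENOMEM;
	dev->action = fn;
	dev->data = data;
	return 0;
}

/* Same, but run the cleanup right away if it could not be queued. */
int model_add_action_or_reset(struct model_dev *dev, model_action_fn fn,
			      void *data)
{
	int ret = model_add_action(dev, fn, data);

	if (ret)
		fn(data);
	return ret;
}

/* Device teardown: run the queued action, if any, exactly once. */
void model_dev_release(struct model_dev *dev)
{
	if (dev->action)
		dev->action(dev->data);
	dev->action = NULL;
}

/* Example cleanup: counts how often the "provider" was released. */
void model_release_provider(void *res)
{
	(*(int *)res)++;
}
```

Either way the provider is released exactly once: at device teardown on the success path, or immediately when the action could not be queued.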
Re: [PATCH v1 2/4] arm64: dts: qcom: msm8996: Fix QMP PHY #clock-cells
On Fri, Nov 30, 2018 at 3:45 AM Evan Green wrote: > > Move #clock-cells into the child node and set it to 0 to conform to the > proper binding specification. > > Signed-off-by: Evan Green > --- > > arch/arm64/boot/dts/qcom/msm8996.dtsi | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi > b/arch/arm64/boot/dts/qcom/msm8996.dtsi > index 13bb96444df00..4af740ca0880f 100644 > --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi > +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi > @@ -767,7 +767,6 @@ > phy@34000 { > compatible = "qcom,msm8996-qmp-pcie-phy"; > reg = <0x34000 0x488>; > - #clock-cells = <1>; > #address-cells = <1>; > #size-cells = <1>; > ranges; > @@ -790,6 +789,7 @@ > reg = <0x035000 0x130>, > <0x035200 0x200>, > <0x035400 0x1dc>; > + #clock-cells = <0>; > #phy-cells = <0>; > > clock-output-names = "pcie_0_pipe_clk_src"; > @@ -803,6 +803,7 @@ > reg = <0x036000 0x130>, > <0x036200 0x200>, > <0x036400 0x1dc>; > + #clock-cells = <0>; > #phy-cells = <0>; > > clock-output-names = "pcie_1_pipe_clk_src"; > @@ -816,6 +817,7 @@ > reg = <0x037000 0x130>, > <0x037200 0x200>, > <0x037400 0x1dc>; > + #clock-cells = <0>; > #phy-cells = <0>; > > clock-output-names = "pcie_2_pipe_clk_src"; > @@ -829,7 +831,6 @@ > phy@741 { > compatible = "qcom,msm8996-qmp-usb3-phy"; > reg = <0x741 0x1c4>; > - #clock-cells = <1>; > #address-cells = <1>; > #size-cells = <1>; > ranges; > @@ -851,6 +852,7 @@ > reg = <0x7410200 0x200>, > <0x7410400 0x130>, > <0x7410600 0x1a8>; > + #clock-cells = <0>; > #phy-cells = <0>; > > clock-output-names = "usb3_phy_pipe_clk_src"; > -- > 2.18.1 > Tested on db820c [1] Tested-by: Vivek Gautam [1] https://github.com/vivekgautam1/linux/tree/origin/v4.20-rc5/db820c BRs Vivek -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v1 0/4] phy: qcom-qmp: Fix clock-cells binding and provider
Hi,

On Fri, Dec 7, 2018 at 10:15 AM Kishon Vijay Abraham I wrote:
>
> Vivek,
>
> On 04/12/18 6:07 PM, Vivek Gautam wrote:
> > Hi Kishon,
> >
> > On Tue, Dec 4, 2018 at 1:44 PM Kishon Vijay Abraham I wrote:
> >>
> >> Hi Andy Gross, David Brown, Vivek,
> >>
> >> On 30/11/18 3:43 AM, Evan Green wrote:
> >>> This series fixes the QMP PHY bindings, which had specified #clock-cells
> >>> in the parent node, and had set it to 1. Putting it in the parent node is
> >>> wrong because the clock providers are the child nodes, so this change
> >>> moves it there. Having it set to 1 is also wrong, since nothing is ever
> >>> specified as to what should go in that cell. So this changes it to zero.
> >>> Finally, this change completes a little bit of code to actually allow
> >>> these exposed clocks to be pointed at in DT.
> >>>
> >>> I had no idea how to fix up ipq8074.dtsi. It seems to be completely wrong
> >>> in that it doesn't specify #clock-cells at all, has no child nodes, and
> >>> specifies clock-output-names in the parent node. As far as I can tell this
> >>> doesn't work at all. But I can't add the child nodes myself because I
> >>> don't know
> >>> 1) how many there are, and 2) the registers in them. I also have no way
> >>> to test it.
> >>>
> >>> Speaking of testing, I was able to test this on sdm845, but haven't
> >>> tested msm8996.
> >>
> >> Can someone help test this series in msm8996?
> >
> > Sure, will give it a try tomorrow.
>
> I'm planning to close the merge by today. Can you test this series please?

Sorry, I got held up with an issue yesterday. I will update you in a couple of hours.

Thanks
Vivek

[snip]

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v10 0/6] Support for Qualcomm UFS QMP PHY on SDM845
On Tue, Oct 23, 2018 at 10:07 AM Can Guo wrote: > > This patch series adds support for UFS QMP PHY on SDM845 and the > compatible string for it. This patch series depends on the current > proposed QMP V3 USB3 UNI PHY support for sdm845 driver [1], on > the DT bindings for the QMP V3 USB3 PHYs based dirver [2], and also > rebased on updated pipe_clk initialization sequence [3]. This series > can only be merged once the dependent patches do. > [1] > http://lists-archives.com/linux-kernel/29071659-dt-bindings-phy-qcom-qmp-update-bindings-for-sdm845.html > [2] > http://lists-archives.com/linux-kernel/29071660-phy-qcom-qmp-add-qmp-v3-usb3-uni-phy-support-for-sdm845.html > [3] https://patchwork.kernel.org/patch/10376551/ Besides my comment for PATCH 4/6, I have already reviewed the entire series, and it looks good. If adding new bindings for sdm845 needs a further review, can you separate out just the phy patches from this series (patch 1, 2, 3 & 6), and re-send them. We can ask Kishon if he can pull them in for this merge window. Thanks. best regards Vivek > > Changes since v9: > - Incorporated review comments from Rob. > > Changes since v8: > - Add one new change to support ufs core reset. > - Incorporated review comments from Evan, Vivek. > > Changes since v7: > - Add one new change to update UFS PHY power on sequence > - Incorporated review comments from Evan, Vivek and Manu. > > Changes since v6: > - Add one new change to clean up some structs and field > - Updates the PHY power control sequence. > - Incorporated review comments from Vivek and Manu. > > Changes since v5: > - Updates the PHY power control sequence. > - Updates UFS PHY power on condition check. > > Changes since v4: > - Adds 'ref_aux' clock back to SDM845 UFS PHY clock list. > - Power on PHY before serdes configuration starts. > - Updates the UFS PHY initialization sequence. > - Updates a few UFS PHY registers. > - Incorporated review comments from Vivek and Manu. 
> > Changes since v3: > - Incorporated review comments from Vivek and Rob. > > Changes since v2: > - Incorporated review comments from Vivek and Rob. > - Remove "ref_aux" from sdm845 ufs phy clock list structure. > > Changes since v1: > - Incorporated review comments from Vivek and Manu. > - Update the commit title of patch 2. > > Can Guo (5): > phy: Update PHY power control sequence > phy: General struct and field cleanup > phy: Add QMP phy based UFS phy support for sdm845 > scsi: ufs: Power on phy after it is initialized > dt-bindings: phy-qcom-qmp: Add UFS phy compatible string for sdm845 > > Dov Levenglick (1): > scsi: ufs: Add core reset support > > .../devicetree/bindings/phy/qcom-qmp-phy.txt | 4 +- > drivers/phy/qualcomm/phy-qcom-qmp.c| 216 > +++-- > drivers/phy/qualcomm/phy-qcom-qmp.h| 15 ++ > drivers/scsi/ufs/ufs-qcom.c| 34 +++- > drivers/scsi/ufs/ufs-qcom.h| 1 + > drivers/scsi/ufs/ufshcd-pltfrm.c | 22 +++ > drivers/scsi/ufs/ufshcd.c | 13 ++ > drivers/scsi/ufs/ufshcd.h | 12 ++ > 8 files changed, 296 insertions(+), 21 deletions(-) > > -- > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project > -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v10 4/6] scsi: ufs: Add core reset support
On Tue, Oct 23, 2018 at 10:06 AM Can Guo wrote: > > From: Dov Levenglick > > Enables core reset support. Add full initialization of the PHY and the > controller before initializing UFS PHY and during link recovery. > > Signed-off-by: Dov Levenglick > Signed-off-by: Amit Nischal > Signed-off-by: Subhash Jadavani > Signed-off-by: Can Guo > --- > drivers/scsi/ufs/ufs-qcom.c | 30 ++ > drivers/scsi/ufs/ufshcd-pltfrm.c | 22 ++ > drivers/scsi/ufs/ufshcd.c| 13 + > drivers/scsi/ufs/ufshcd.h| 12 > 4 files changed, 77 insertions(+) > > diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c > index 2b38db2..698b92d 100644 > --- a/drivers/scsi/ufs/ufs-qcom.c > +++ b/drivers/scsi/ufs/ufs-qcom.c > @@ -616,6 +616,35 @@ static int ufs_qcom_resume(struct ufs_hba *hba, enum > ufs_pm_op pm_op) > return err; > } > > +static int ufs_qcom_core_reset(struct ufs_hba *hba) > +{ > + int ret = -ENOTSUPP; > + > + if (!hba->core_reset) { This check doesn't make much sense. You call this ".core_reset" callback only when "hba->core_reset" is available. Why do we need to check this again here? 
> + dev_err(hba->dev, "%s: failed, err = %d\n", __func__, > + ret); > + goto out; > + } > + > + ret = reset_control_assert(hba->core_reset); > + if (ret) { > + dev_err(hba->dev, "core_reset assert failed, err = %d\n", > + ret); > + goto out; > + } > + > + /* As per spec, delay is required to let reset assert go through */ > + usleep_range(1, 2); > + > + ret = reset_control_deassert(hba->core_reset); > + if (ret) > + dev_err(hba->dev, "core_reset deassert failed, err = %d\n", > + ret); > + > +out: > + return ret; > +} > + > struct ufs_qcom_dev_params { > u32 pwm_rx_gear;/* pwm rx gear to work in */ > u32 pwm_tx_gear;/* pwm tx gear to work in */ > @@ -1670,6 +1699,7 @@ static void ufs_qcom_dump_dbg_regs(struct ufs_hba *hba) > .apply_dev_quirks = ufs_qcom_apply_dev_quirks, > .suspend= ufs_qcom_suspend, > .resume = ufs_qcom_resume, > + .core_reset = ufs_qcom_core_reset, > .dbg_register_dump = ufs_qcom_dump_dbg_regs, > }; > > diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c > b/drivers/scsi/ufs/ufshcd-pltfrm.c > index e82bde0..dab11a7 100644 > --- a/drivers/scsi/ufs/ufshcd-pltfrm.c > +++ b/drivers/scsi/ufs/ufshcd-pltfrm.c > @@ -42,6 +42,22 @@ > > #define UFSHCD_DEFAULT_LANES_PER_DIRECTION 2 > > +static int ufshcd_parse_reset_info(struct ufs_hba *hba) > +{ > + int ret = 0; > + > + hba->core_reset = devm_reset_control_get_optional_exclusive(hba->dev, > + "rst"); > + if (IS_ERR(hba->core_reset)) { > + ret = PTR_ERR(hba->core_reset); First thing, you need to check here for EPROBE_DEFER, and return that as reset framework may not be probed when this is probing. Secondly, this whole parse thing can as well be moved to vops (variant ops) as that's the device having knowledge of resets. Moreover, not all qcom ufs controllers have the reset, so I am tilting towards adding a of_match_data field and corresponding compatible binding for sdm845 (and may be for future SoCs too) so that we can make this reset mandatory for SoCs where things won't work without it. 
Simply acknowledging the absence of the reset and marking it as NULL won't help 845 and brothers that need the reset. Or, do we have any other solution to make this reset mandatory for 845? > + dev_err(hba->dev, "core_reset unavailable,err = %d\n", > + ret); > + hba->core_reset = NULL; > + } > + > + return ret; > +} > + > static int ufshcd_parse_clock_info(struct ufs_hba *hba) > { > int ret = 0; > @@ -340,6 +356,12 @@ int ufshcd_pltfrm_init(struct platform_device *pdev, > goto dealloc_host; > } > > + err = ufshcd_parse_reset_info(hba); > + if (err) { > + dev_err(&pdev->dev, "%s: reset parse failed %d\n", > + __func__, err); > + } > + > pm_runtime_set_active(&pdev->dev); > pm_runtime_enable(&pdev->dev); > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index a355d98..d18c3af 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -3657,6 +3657,15 @@ static int ufshcd_link_recovery(struct ufs_hba *hba) > ufshcd_set_eh_in_progress(hba); > spin_unlock_irqrestore(hba->host->host_lock, flags); > > + if (hba->core_reset) { > + ret = ufshcd_vops_core_reset(hba); > + if (ret) > +
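To make the two review points above concrete (propagate probe deferral, and let only some SoCs treat the reset as mandatory), here is a userspace sketch of the decision being asked for. It is a model under assumptions, not a proposed patch: the lookup outcome is reduced to a three-state enum, `reset_mandatory` stands in for a per-SoC match-data flag that does not exist in the posted series, and -ENODEV for a missing mandatory reset is an arbitrary choice. 517 is the value of the kernel-internal EPROBE_DEFER.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_EPROBE_DEFER 517 /* kernel-internal errno, absent from userspace */

/* Simplified outcome of looking up the "rst" reset line. */
enum model_reset_state {
	MODEL_RESET_FOUND,    /* reset controller returned a usable handle */
	MODEL_RESET_ABSENT,   /* no reset described for this controller */
	MODEL_RESET_DEFERRED  /* reset provider has not probed yet */
};

struct model_reset_result {
	int ret;         /* what probe should return: 0 or a negative errno */
	bool have_reset; /* whether core_reset is usable afterwards */
};

/* Decide how probe should react to the reset lookup. */
struct model_reset_result model_parse_reset(enum model_reset_state state,
					    bool reset_mandatory)
{
	struct model_reset_result r = { 0, false };

	assert(state == MODEL_RESET_FOUND || state == MODEL_RESET_ABSENT ||
	       state == MODEL_RESET_DEFERRED);

	switch (state) {
	case MODEL_RESET_FOUND:
		r.have_reset = true;
		break;
	case MODEL_RESET_DEFERRED:
		/* Must reach the driver core so that probe is retried later. */
		r.ret = -MODEL_EPROBE_DEFER;
		break;
	case MODEL_RESET_ABSENT:
		/* Optional on most SoCs; fatal where the link cannot work without it. */
		if (reset_mandatory)
			r.ret = -ENODEV;
		break;
	}
	return r;
}
```

The key difference from the posted hunk is that a deferred lookup is returned to the caller instead of being logged and turned into a NULL core_reset.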
Re: [PATCH v1 0/4] phy: qcom-qmp: Fix clock-cells binding and provider
Hi Kishon, On Tue, Dec 4, 2018 at 1:44 PM Kishon Vijay Abraham I wrote: > > Hi Andy Gross, David Brown, Vivek, > > On 30/11/18 3:43 AM, Evan Green wrote: > > This series fixes the QMP PHY bindings, which had specified #clock-cells > > in the parent node, and had set it to 1. Putting it in the parent node is > > wrong because the clock providers are the child nodes, so this change > > moves it there. Having it set to 1 is also wrong, since nothing is ever > > specified as to what should go in that cell. So this changes it to zero. > > Finally, this change completes a little bit of code to actually allow these > > exposed clocks to be pointed at in DT. > > > > I had no idea how to fix up ipq8074.dtsi. It seems to be completely wrong in > > that it doesn't specify #clock-cells at all, has no child nodes, and > > specifies clock-output-names in the parent node. As far as I can tell this > > doesn't work at all. But I can't add the child nodes myself because I don't > > know > > 1) how many there are, and 2) the registers in them. I also have no way to > > test it. > > > > Speaking of testing, I was able to test this on sdm845, but haven't tested > > msm8996. > > Can someone help test this series in msm8996? Sure, will give it a try tomorrow. Thanks Vivek > > Thanks > Kishon > > > > > This patch sits atop the UFS device nodes series [1]. 
> > > > [1] > > https://lore.kernel.org/lkml/20181026173544.136037-1-evgr...@chromium.org/ > > > > > > > > Evan Green (4): > > dt-bindings: phy-qcom-qmp: Move #clock-cells to child > > arm64: dts: qcom: msm8996: Fix QMP PHY #clock-cells > > arm64: dts: qcom: sdm845: Fix QMP PHY #clock-cells > > phy: qcom-qmp: Expose provided clocks to DT > > > > .../devicetree/bindings/phy/qcom-qmp-phy.txt | 11 - > > arch/arm64/boot/dts/qcom/msm8996.dtsi | 6 +++-- > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 4 ++-- > > drivers/phy/qualcomm/phy-qcom-qmp.c | 23 ++- > > 4 files changed, 33 insertions(+), 11 deletions(-) > > -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache
Qualcomm SoCs have an additional level of cache called as System cache, aka. Last level cache (LLC). This cache sits right before the DDR, and is tightly coupled with the memory controller. The cache is available to all the clients present in the SoC system. The clients request their slices from this system cache, make it active, and can then start using it. For these clients with smmu, to start using the system cache for buffers and, related page tables [1], memory attributes need to be set accordingly. This change updates the MAIR and TCR configurations with correct attributes to use this system cache. To explain a little about memory attribute requirements here: Non-coherent I/O devices can't look-up into inner caches. However, coherent I/O devices can. But both can allocate in the system cache based on system policy and configured memory attributes in page tables. CPUs can access both inner and outer caches (including system cache, aka. Last level cache), and can allocate into system cache too based on memory attributes, and system policy. Further looking at memory types, we have following - a) Normal uncached :- MAIR 0x44, inner non-cacheable, outer non-cacheable; b) Normal cached :- MAIR 0xff, inner read write-back non-transient, outer read write-back non-transient; attribute setting for coherenet I/O devices. and, for non-coherent i/o devices that can allocate in system cache another type gets added - c) Normal sys-cached/non-inner-cached :- MAIR 0xf4, inner non-cacheable, outer read write-back non-transient So, CPU will automatically use the system cache for memory marked as normal cached. The normal sys-cached is downgraded to normal non-cached memory for CPUs. Coherent I/O devices can use system cache by marking the memory as normal cached. Non-coherent I/O devices, to use system cache, should mark the memory as normal sys-cached in page tables. 
This change is a realisation of following changes from downstream msm-4.9: iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2] iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3] [1] https://patchwork.kernel.org/patch/10302791/ [2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51 [3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602 Signed-off-by: Vivek Gautam --- Changes since v1: - Addressed Tomasz's comments for basing the change on "NO_INNER_CACHE" concept for non-coherent I/O devices rather than capturing "SYS_CACHE". This is to indicate clearly the intent of non-coherent I/O devices that can't access inner caches. drivers/iommu/arm-smmu.c | 15 +++ drivers/iommu/dma-iommu.c | 3 +++ drivers/iommu/io-pgtable-arm.c | 22 +- drivers/iommu/io-pgtable.h | 5 + include/linux/iommu.h | 3 +++ 5 files changed, 43 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index ba18d89d4732..047f7ff95b0d 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -255,6 +255,7 @@ struct arm_smmu_domain { struct mutexinit_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ struct iommu_domain domain; + boolno_inner_cache; }; struct arm_smmu_option_prop { @@ -897,6 +898,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, if (smmu_domain->non_strict) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + if (smmu_domain->no_inner_cache) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NO_IC; + smmu_domain->smmu = smmu; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); if (!pgtbl_ops) { @@ -1579,6 +1583,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_NO_IC: + *((int *)data) 
= smmu_domain->no_inner_cache; + return 0; default: return -ENODEV; } @@ -1619,6 +1626,14 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_SMMU_DOMAIN_S1; break; + case DOMAIN_ATTR_NO_IC: + if (smmu_domain->smmu) { + ret = -EPERM; + goto out_
[PATCH v19 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant
qcom,smmu-v2 is an arm,smmu-v2 implementation with specific clock and power requirements. On msm8996, multiple cores, viz. mdss, video, etc. use this smmu. On sdm845, this smmu is used with gpu. Add bindings for the same.

Signed-off-by: Vivek Gautam
Reviewed-by: Rob Herring
Reviewed-by: Tomasz Figa
Tested-by: Srinivas Kandagatla
Reviewed-by: Robin Murphy
---

Changes since v18:
  None.

 drivers/iommu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index b6b11642b3a9..ba18d89d4732 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -120,6 +120,7 @@ enum arm_smmu_implementation {
 	GENERIC_SMMU,
 	ARM_MMU500,
 	CAVIUM_SMMUV2,
+	QCOM_SMMUV2,
 };

 struct arm_smmu_s2cr {
@@ -2030,6 +2031,7 @@ ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
 ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
+ARM_SMMU_MATCH_DATA(qcom_smmuv2, ARM_SMMU_V2, QCOM_SMMUV2);

 static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
@@ -2038,6 +2040,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,mmu-401", .data = &arm_mmu401 },
 	{ .compatible = "arm,mmu-500", .data = &arm_mmu500 },
 	{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
+	{ .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 },
 	{ },
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v19 3/5] iommu/arm-smmu: Add the device_link between masters and smmu
From: Sricharan R

Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu.

Signed-off-by: Sricharan R
Signed-off-by: Vivek Gautam
Reviewed-by: Tomasz Figa
Tested-by: Srinivas Kandagatla
Reviewed-by: Robin Murphy
---

Changes since v18:
  None.

 drivers/iommu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 1917d214c4d9..b6b11642b3a9 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1500,6 +1500,9 @@ static int arm_smmu_add_device(struct device *dev)

 	iommu_device_link(&smmu->iommu, dev);

+	device_link_add(dev, smmu->dev,
+			DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
+
 	return 0;

 out_cfg_free:
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*
Hi Tomasz, Jordan, On 11/21/2018 9:18 AM, Tomasz Figa wrote: Hi Jordan, Vivek, On Wed, Nov 21, 2018 at 12:41 AM Jordan Crouse wrote: On Tue, Nov 20, 2018 at 03:24:37PM +0530, Vivek Gautam wrote: dma_map_sg() expects a DMA domain. However, the drm devices have been traditionally using unmanaged iommu domain which is non-dma type. Using dma mapping APIs with that domain is bad. Replace dma_map_sg() calls with dma_sync_sg_for_device{|cpu}() to do the cache maintenance. Signed-off-by: Vivek Gautam Suggested-by: Tomasz Figa --- Tested on an MTP sdm845: https://github.com/vivekgautam1/linux/tree/v4.19/sdm845-mtp-display-working drivers/gpu/drm/msm/msm_gem.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c index 00c795ced02c..d7a7af610803 100644 --- a/drivers/gpu/drm/msm/msm_gem.c +++ b/drivers/gpu/drm/msm/msm_gem.c @@ -81,6 +81,8 @@ static struct page **get_pages(struct drm_gem_object *obj) struct drm_device *dev = obj->dev; struct page **p; int npages = obj->size >> PAGE_SHIFT; + struct scatterlist *s; + int i; if (use_pages(obj)) p = drm_gem_get_pages(obj); @@ -107,9 +109,19 @@ static struct page **get_pages(struct drm_gem_object *obj) /* For non-cached buffers, ensure the new pages are clean * because display controller, GPU, etc. are not coherent: */ - if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED)) - dma_map_sg(dev->dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + if (msm_obj->flags & (MSM_BO_WC | MSM_BO_UNCACHED)) { + /* + * Fake up the SG table so that dma_sync_sg_*() + * can be used to flush the pages associated with it. + */ We aren't really faking. The table is real, we are just slightly abusing the sg_dma_address() which makes this comment a bit misleading. Instead I would probably say something like: /* dma_sync_sg_* flushes pages using sg_dma_address() so point it at the * physical page for the right behavior */ Or something like that. 
It's actually quite complicated, but I agree that the comment isn't very precise. The cases are as follows: - arm64 iommu_dma_ops use sg_phys() https://elixir.bootlin.com/linux/v4.20-rc3/source/arch/arm64/mm/dma-mapping.c#L599 - swiotlb_dma_ops used on arm64 if no IOMMU is available use sg->dma_address directly: https://elixir.bootlin.com/linux/v4.20-rc3/source/kernel/dma/swiotlb.c#L832 - arm_dma_ops use sg_dma_address(): https://elixir.bootlin.com/linux/v4.20-rc3/source/arch/arm/mm/dma-mapping.c#L1130 - arm iommu_ops use sg_page(): https://elixir.bootlin.com/linux/v4.20-rc3/source/arch/arm/mm/dma-mapping.c#L1869 Sounds like a mess... Thanks for the review. Technically with the below assignment we address all of the above. How about an even simpler version of the suggested comment: /* dma_sync_sg_* flushes physical pages, so point sg->dma_address to * the physical one for the right behavior. */ + for_each_sg(msm_obj->sgt->sgl, s, + msm_obj->sgt->nents, i) + sg_dma_address(s) = sg_phys(s); + I'm wondering - wouldn't we want to do this association for cached buffers too, so we could sync them correctly in cpu_prep and cpu_fini? Maybe it wouldn't hurt to put this association in the main path (obviously the sync should stay inside the conditional for uncached buffers). Sure, I will move this out of the conditional check. I guess it wouldn't hurt indeed. Note that cpu_prep/fini seem to be missing the sync call currently. I can't say I understand the usage of cpu_prep and cpu_fini(). But I can add the necessary support if you can point me in the right direction. Thanks Best regards Vivek P.S. Jordan, not sure if it's my Gmail or your email client, but your message had all the recipients in a Reply-to header, except you, so pressing Reply to all in my case led to a message that didn't have you in recipients anymore... Best regards, Tomasz
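For illustration only, the arrangement agreed in this thread (associate sg->dma_address with the physical address for every buffer, but keep the sync inside the conditional for uncached buffers) can be modelled in plain userspace C. This is a rough sketch under stated assumptions, not kernel code: struct sg_model, prepare_pages() and the synced flag are hypothetical stand-ins for struct scatterlist, the body of get_pages() and the dma_sync_sg_for_device() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one scatterlist entry. */
struct sg_model {
	uint64_t phys;        /* what sg_phys() would return */
	uint64_t dma_address; /* what sg_dma_address() would return */
	bool synced;          /* set where real code would do cache maintenance */
};

/*
 * Hypothetical model of the reworked get_pages() logic: the address
 * association is done in the main path for all buffers, the sync only
 * for uncached (WC/UNCACHED) buffers.
 */
void prepare_pages(struct sg_model *sgl, size_t nents, bool uncached)
{
	size_t i;

	assert(sgl != NULL || nents == 0);

	/* Main path: point dma_address at the physical address. */
	for (i = 0; i < nents; i++)
		sgl[i].dma_address = sgl[i].phys;

	/* Conditional path: stands in for dma_sync_sg_for_device(). */
	if (uncached)
		for (i = 0; i < nents; i++)
			sgl[i].synced = true;
}
```

With this split, a later sync from cpu_prep/cpu_fini would find a usable address on cached buffers as well, which is the point Jordan raised.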
Re: [PATCH 1/2] arm64: dts: qcom: msm8996: Add VFE SMMU node
Hi Todor, On Mon, Nov 19, 2018 at 2:57 PM Todor Tomov wrote: > > Add VFE SMMU node. > > Signed-off-by: Todor Tomov > --- > > This patch depends on patchset: > https://lore.kernel.org/patchwork/cover/1013166/ > > arch/arm64/boot/dts/qcom/msm8996.dtsi | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi > b/arch/arm64/boot/dts/qcom/msm8996.dtsi > index 13bb964..a4d087e5 100644 > --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi > +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi > @@ -950,6 +950,23 @@ > }; > }; > > + vfe_smmu: arm,smmu@da { > + compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; > + reg = <0xda 0x1>; > + > + #global-interrupts = <1>; > + interrupts = , > +, > +; > + power-domains = <&mmcc MMAGIC_CAMSS_GDSC>; > + clocks = <&mmcc SMMU_VFE_AHB_CLK>, > +<&mmcc SMMU_VFE_AXI_CLK>; > + clock-names = "iface", > + "bus"; > + #iommu-cells = <1>; > + status = "ok"; No point in adding this status here. Rest looks good to me. Reviewed-by: Vivek Gautam Best regards Vivek > + }; > + > agnoc@0 { > power-domains = <&gcc AGGRE0_NOC_GDSC>; > compatible = "simple-pm-bus"; > -- > 2.7.4 > -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 1/2] dt-bindings: phy: Add Qualcomm Synopsys High-Speed USB PHY binding
On Mon, Nov 19, 2018 at 12:29 PM Shawn Guo wrote: > > On Sat, Nov 17, 2018 at 09:13:38AM -0600, Rob Herring wrote: > > > > > > +- qcom,init-seq: > > > > > +Value type: > > > > > +Definition: Should contain a sequence of > > > > > tuples to > > > > > +program 'value' into phy register at 'offset' with > > > > > 'delay' > > > > > + in us afterwards. > > > > > > > > If we wanted this type of thing in DT, we'd have a generic binding (or > > > > forth). > > > > > > Right now, this is a qualcomm usb phy specific binding - first used in > > > qcom,usb-hs-phy.txt and I extended it a bit for my phy. As this is not > > > such a good hardware description, I'm a little hesitant to make it > > > generic for other platforms to use in general. What if we put it off > > > a little bit until we see more platforms need the same thing? > > > > I'm not saying I want it generic. Quite the opposite. I don't think we > > should have it either generically or vendor specific. The main thing I > > have a problem with is the timing information because then we're more > > than just data. Without that we're talking about a bunch of properties > > for register fields or just raw register values in DT. That becomes > > more of a judgement call. There's not too much value in making a > > driver translate a bunch of properties just to stuff them into > > registers on init. But then just allowing any raw register value to be > > in DT could be easily abused. > > Rob, > > I agree with your comments. Honestly, I'm not comfortable with this > 'qcom,init-seq' thing on first impression. The similar existence in > mainline qcom,usb-hs-phy.txt makes me think it might be acceptable with > the timing data added. Okay, I know your position on this now. > > @Sriharsha, > > Seeing that 'qcom,init-seq' is being configured with exactly the same > values for both HS phys in SoC level dts file (qcs404.dtsi), I think > such settings can be moved into driver code as SoC specific data. 
> Unless you have a different view on this, I will do it with v4. phy-qcom-qmp and phy-qcom-qusb2 already maintain such SoC-specific init sequences in the driver, if you would like pointers from them. Thanks Vivek > > Shawn
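As a rough illustration of what "SoC specific data in the driver" could look like, here is a minimal userspace sketch of an init-sequence table made of (offset, value, delay) steps, as described for 'qcom,init-seq'. All names here (struct phy_init_step, struct phy_soc_cfg, example_soc_cfg, phy_apply_init_seq()) and all register offsets and values are hypothetical and are not taken from phy-qcom-qmp or phy-qcom-qusb2; the register file is a plain array standing in for MMIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one step of a PHY init sequence. */
struct phy_init_step {
	uint32_t offset;   /* register offset in bytes */
	uint32_t value;    /* value to program at that offset */
	uint32_t delay_us; /* delay to apply after the write */
};

/* Hypothetical: per-SoC configuration kept in the driver, not in DT. */
struct phy_soc_cfg {
	const struct phy_init_step *seq;
	size_t num_steps;
};

/* Made-up offsets and values, for illustration only. */
static const struct phy_init_step example_soc_seq[] = {
	{ 0x00, 0x01, 0 },
	{ 0x04, 0x80, 10 },
	{ 0x08, 0x2a, 0 },
};

const struct phy_soc_cfg example_soc_cfg = {
	.seq = example_soc_seq,
	.num_steps = sizeof(example_soc_seq) / sizeof(example_soc_seq[0]),
};

/*
 * Apply the sequence to a fake register file and return the total
 * delay requested. A real driver would use writel() for the store and
 * udelay()/usleep_range() instead of summing the delays.
 */
uint32_t phy_apply_init_seq(uint32_t *regs, size_t num_regs,
			    const struct phy_soc_cfg *cfg)
{
	uint32_t total_delay_us = 0;
	size_t i;

	for (i = 0; i < cfg->num_steps; i++) {
		const struct phy_init_step *step = &cfg->seq[i];
		size_t idx = step->offset / sizeof(uint32_t);

		assert(idx < num_regs);
		regs[idx] = step->value;
		total_delay_us += step->delay_us;
	}

	return total_delay_us;
}
```

In a real driver the matching table would typically be picked through the compatible string's match data, so the dts only needs to identify the SoC.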
Re: [PATCH v2 1/2] phy: qcom-qusb2: Use HSTX_TRIM fused value as is
On 2018-10-25 11:46, Vivek Gautam wrote: Hi Manu, On 10/16/2018 12:52 PM, Manu Gautam wrote: Fix HSTX_TRIM tuning logic which instead of using fused value as HSTX_TRIM, incorrectly performs bitwise OR operation with existing default value. Fixes: ca04d9d3e1b1 ("phy: qcom-qusb2: New driver for QUSB2 PHY on Qcom chips") Signed-off-by: Manu Gautam Reviewed-by: Douglas Anderson --- drivers/phy/qualcomm/phy-qcom-qusb2.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/phy/qualcomm/phy-qcom-qusb2.c b/drivers/phy/qualcomm/phy-qcom-qusb2.c index e70e425f26f5..9d6c88064158 100644 --- a/drivers/phy/qualcomm/phy-qcom-qusb2.c +++ b/drivers/phy/qualcomm/phy-qcom-qusb2.c @@ -402,10 +402,10 @@ static void qusb2_phy_set_tune2_param(struct qusb2_phy *qphy) /* * Read efuse register having TUNE2/1 parameter's high nibble. - * If efuse register shows value as 0x0, or if we fail to find - * a valid efuse register settings, then use default value - * as 0xB for high nibble that we have already set while - * configuring phy. + * If efuse register shows value as 0x0 (indicating value is not + * fused), or if we fail to find a valid efuse register setting, + * then use default value for high nibble that we have already + * set while configuring the phy. */ val = nvmem_cell_read(qphy->cell, NULL); if (IS_ERR(val) || !val[0]) { @@ -415,12 +415,13 @@ static void qusb2_phy_set_tune2_param(struct qusb2_phy *qphy) /* Fused TUNE1/2 value is the higher nibble only */ if (cfg->update_tune1_with_efuse) -qusb2_setbits(qphy->base, cfg->regs[QUSB2PHY_PORT_TUNE1], - val[0] << 0x4); +qusb2_write_mask(qphy->base, cfg->regs[QUSB2PHY_PORT_TUNE1], + val[0] << HSTX_TRIM_SHIFT, + HSTX_TRIM_MASK); else -qusb2_setbits(qphy->base, cfg->regs[QUSB2PHY_PORT_TUNE2], - val[0] << 0x4); - +qusb2_write_mask(qphy->base, cfg->regs[QUSB2PHY_PORT_TUNE2], + val[0] << HSTX_TRIM_SHIFT, + HSTX_TRIM_MASK); } static int qusb2_phy_set_mode(struct phy *phy, enum phy_mode mode) Thanks for the patch. 
Acked-by: Vivek Gautam My bad. Didn't notice the HTML mode. Resending so that it reaches the lists as well. Thanks Vivek
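To make the bug concrete, here is a small userspace sketch contrasting the old OR-based update with a masked write. It is only a model: tune_setbits(), tune_write_mask() and hstx_trim_field() are hypothetical helpers operating on a plain byte rather than on MMIO, HSTX_TRIM_MASK is assumed to cover the high nibble (the patch text only says the fused value is the higher nibble), and the low nibble 0x3 in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

#define HSTX_TRIM_SHIFT 4     /* matches the "<< 0x4" in the old code */
#define HSTX_TRIM_MASK  0xf0u /* assumed: high nibble of the TUNE register */

/* Shift a 4-bit fused value into the HSTX_TRIM field position. */
uint8_t hstx_trim_field(uint8_t fuse)
{
	assert(fuse <= 0xf);
	return (uint8_t)(fuse << HSTX_TRIM_SHIFT);
}

/* Old behaviour: OR the new bits into whatever is already there. */
uint8_t tune_setbits(uint8_t reg, uint8_t val)
{
	return (uint8_t)(reg | val);
}

/* New behaviour: clear the field first, then write the new value. */
uint8_t tune_write_mask(uint8_t reg, uint8_t val, uint8_t mask)
{
	return (uint8_t)((reg & ~mask) | (val & mask));
}
```

With a default high nibble of 0xB and a fused value of 0x4, the OR-based update turns a register value of 0xB3 into 0xF3 (0xB | 0x4 = 0xF), whereas the masked write gives 0x43, i.e. the fused value used as is.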