from:"Shameerali Kolothum Thodi"

RE: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-14 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: wangxingang
> Sent: 14 April 2021 03:36
> To: Eric Auger ; eric.auger@gmail.com;
> jean-phili...@linaro.org; io...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> kvm...@lists.cs.columbia.edu; w...@kernel.org; m...@kernel.org;
> robin.mur...@arm.com; j...@8bytes.org; alex.william...@redhat.com;
> t...@semihalf.com; zhukeqian 
> Cc: jacob.jun@linux.intel.com; yi.l@intel.com; 
> zhangfei@linaro.org;
> zhangfei....@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> Thodi ; yuzenghui
> ; nicoleots...@gmail.com; lushenming
> ; vse...@nvidia.com; chenxiang (M)
> ; vdu...@nvidia.com; jiangkunkun
> 
> Subject: Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Eric, Jean-Philippe
> 
> On 2021/4/11 19:12, Eric Auger wrote:
> > SMMUv3 Nested Stage Setup (IOMMU part)
> >
> > This series brings the IOMMU part of HW nested paging support
> > in the SMMUv3. The VFIO part is submitted separately.
> >
> > This is based on Jean-Philippe's
> > [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> > https://www.spinics.net/lists/arm-kernel/msg886518.html
> > (including the patches that were not pulled for 5.13)
> >
> > The IOMMU API is extended to support 2 new API functionalities:
> > 1) pass the guest stage 1 configuration
> > 2) pass stage 1 MSI bindings
> >
> > Then those capabilities get implemented in the SMMUv3 driver.
> >
> > The virtualizer passes information through the VFIO user API
> > which cascades them to the iommu subsystem. This allows the guest
> > to own stage 1 tables and context descriptors (so-called PASID
> > table) while the host owns stage 2 tables and main configuration
> > structures (STE).
> >
> > Best Regards
> >
> > Eric
> >
> > This series can be found at:
> > v5.12-rc6-jean-iopf-14-2stage-v15
> > (including the VFIO part in its last version: v13)
> >
> 
> I am testing the performance of an accelerator with and without SVA/vSVA,
> and found a potential performance loss with SVA/vSVA.
> 
> I use a network and computing encryption device (SEC) and send a single
> 1MB request.
> 
> I trigger the mm faults before sending the request, so there should be no
> I/O page faults (iopf).
> 
> Here's what I got:
> 
> physical scenario:
>                            SVA         NOSVA
> performance:               9MB/s       9MB/s
> tlb_miss:                  302,651     1,223
> trans_table_walk_access:   302,276     1,237
> 
> VM scenario:
>                            vSVA        NOvSVA
> performance:               9MB/s       6MB/s    (about 30~40% loss)
> tlb_miss:                  4,423,897   1,907
> trans_table_walk_access:   61,928,430  21,948
> 
> In the physical scenario there is almost no performance loss, but the
> stage 1 tlb_miss and trans_table_walk_access counts for SVA are much
> higher than for NOSVA.
> 
> In the VM scenario there is about 30~40% performance loss, because the
> two-stage tlb_miss and trans_table_walk_access counts are even higher
> and hurt performance.
> 
> I compared how the page tables are built for SVA and NOSVA, and found
> that NOSVA uses 2MB mappings wherever possible, while SVA uses only 4KB
> mappings.
> 
> I retested with huge pages, which solves the problem: the performance
> of SVA/vSVA is then almost the same as NOSVA.
> 
> Do you have any other solution for the vSVA performance loss, or any
> other way to reduce the tlb_miss/trans_table_walk counts?

Hi Xingang,

Just curious, do you have DVM enabled on this board or does it use explicit
SMMU TLB invalidations?

Thanks,
Shameer
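
The 4KB-versus-2MB observation quoted above can be put in rough numbers. The following is only a back-of-the-envelope sketch, not driver code: `leaf_entries` is a hypothetical helper that counts how many leaf mappings a buffer touches for a given page size. For an aligned 1MB request it gives 256 entries with 4KB pages and a single entry with a 2MB block, which is consistent in direction with the higher tlb_miss counts reported for SVA (the absolute counts also depend on TLB size, walk caches and access pattern, none of which is modelled here).

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the kernel): number of leaf mappings
 * needed to cover `len` bytes starting at `addr` when every mapping
 * has size `page_size`.
 */
size_t leaf_entries(unsigned long long addr, unsigned long long len,
                    unsigned long long page_size)
{
    unsigned long long first, last;

    if (len == 0 || page_size == 0)
        return 0;

    first = addr / page_size;             /* index of first page touched */
    last = (addr + len - 1) / page_size;  /* index of last page touched */
    return (size_t)(last - first + 1);
}
```

Each distinct leaf entry is a candidate TLB miss on first touch, so fewer, larger mappings directly reduce the number of walks; in the nested case every stage 1 walk step additionally needs stage 2 translation, which is why the VM numbers grow faster.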

RE: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor

2021-04-01 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 01 April 2021 12:49
> To: yuzenghui 
> Cc: eric.auger@gmail.com; io...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> kvm...@lists.cs.columbia.edu; w...@kernel.org; m...@kernel.org;
> robin.mur...@arm.com; j...@8bytes.org; alex.william...@redhat.com;
> t...@semihalf.com; zhukeqian ;
> jacob.jun@linux.intel.com; yi.l@intel.com; wangxingang
> ; jiangkunkun ;
> jean-phili...@linaro.org; zhangfei@linaro.org; zhangfei@gmail.com;
> vivek.gau...@arm.com; Shameerali Kolothum Thodi
> ; nicoleots...@gmail.com;
> lushenming ; vse...@nvidia.com; Wanghaibin (D)
> 
> Subject: Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than
> one context descriptor
> 
> Hi Zenghui,
> 
> On 3/30/21 11:23 AM, Zenghui Yu wrote:
> > Hi Eric,
> >
> > On 2021/2/24 4:56, Eric Auger wrote:
> >> In preparation for vSVA, let's accept userspace provided configs
> >> with more than one CD. We check the max CD against the host iommu
> >> capability and also the format (linear versus 2 level).
> >>
> >> Signed-off-by: Eric Auger 
> >> Signed-off-by: Shameer Kolothum
> 
> >> ---
> >>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++++-----
> >>   1 file changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> index 332d31c0680f..ab74a0289893 100644
> >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> @@ -3038,14 +3038,17 @@ static int
> arm_smmu_attach_pasid_table(struct
> >> iommu_domain *domain,
> >>   if (smmu_domain->s1_cfg.set)
> >>   goto out;
> >>   -    /*
> >> - * we currently support a single CD so s1fmt and s1dss
> >> - * fields are also ignored
> >> - */
> >> -    if (cfg->pasid_bits)
> >> +    list_for_each_entry(master, &smmu_domain->devices,
> >> domain_head) {
> >> +    if (cfg->pasid_bits > master->ssid_bits)
> >> +    goto out;
> >> +    }
> >> +    if (cfg->vendor_data.smmuv3.s1fmt ==
> >> STRTAB_STE_0_S1FMT_64K_L2 &&
> >> +    !(smmu->features &
> ARM_SMMU_FEAT_2_LVL_CDTAB))
> >>   goto out;
> >>     smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> >> +    smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
> >> +    smmu_domain->s1_cfg.s1fmt =
> cfg->vendor_data.smmuv3.s1fmt;
> >
> > And what about the S1DSS field?
> >
> I added this patch upon Shameer's request, to be more vSVA friendly.
> However, this series does not really target multiple CD support. At the
> moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2), I think.
> For now maybe I can simply check that the s1dss field is 0x2. Or simply
> remove this patch?
> 
> Thoughts?

Right. This was useful for vSVA tests. But yes, to properly support multiple CDs
we need to pass the S1DSS from QEMU, and that requires further changes.
So I think it's better to remove this patch and reject S1CDMAX != 0 cases.

Thanks,
Shameer
   
> 
> Eric
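
The outcome suggested in this reply (drop the patch and keep rejecting S1CDMAX != 0) amounts to a small validation step. The sketch below is a user-space model only: `struct guest_s1_cfg` and `check_guest_s1_cfg` are hypothetical names, not the kernel's `iommu_pasid_table_config` or the driver's attach path, and `pasid_bits` is assumed to be the field that ends up in S1CDMAX, as in the quoted diff.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the guest stage 1 config. */
struct guest_s1_cfg {
    uint64_t base_ptr;   /* guest CD table base address */
    uint8_t  pasid_bits; /* assumed to map to S1CDMAX */
};

/*
 * Accept only single-CD configurations: anything that would need more
 * than one context descriptor is refused until S1DSS can be passed
 * from the virtualizer.
 */
int check_guest_s1_cfg(const struct guest_s1_cfg *cfg)
{
    if (!cfg)
        return -EINVAL;
    if (cfg->pasid_bits != 0)
        return -EINVAL;
    return 0;
}
```

Refusing the configuration up front keeps the host from programming an STE whose S1DSS behaviour the guest never had a chance to specify.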

RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-02-22 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 21 February 2021 18:21
> To: Shameerali Kolothum Thodi ;
> eric.auger@gmail.com; io...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> kvm...@lists.cs.columbia.edu; w...@kernel.org; j...@8bytes.org;
> m...@kernel.org; robin.mur...@arm.com; alex.william...@redhat.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com;
> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui ; Zengtao (B)
> ; linux...@openeuler.org
> Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Shameer,
> On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger [mailto:eric.au...@redhat.com]
> >> Sent: 18 November 2020 11:22
> >> To: eric.auger@gmail.com; eric.au...@redhat.com;
> >> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> >> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> >> alex.william...@redhat.com
> >> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> >> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> >> Thodi ;
> >> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> >> nicoleots...@gmail.com; yuzenghui 
> >> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> >>
> >> This series brings the IOMMU part of HW nested paging support
> >> in the SMMUv3. The VFIO part is submitted separately.
> >>
> >> The IOMMU API is extended to support 2 new API functionalities:
> >> 1) pass the guest stage 1 configuration
> >> 2) pass stage 1 MSI bindings
> >>
> >> Then those capabilities get implemented in the SMMUv3 driver.
> >>
> >> The virtualizer passes information through the VFIO user API
> >> which cascades them to the iommu subsystem. This allows the guest
> >> to own stage 1 tables and context descriptors (so-called PASID
> >> table) while the host owns stage 2 tables and main configuration
> >> structures (STE).
> >
> > I am seeing an issue with Guest testpmd run with this series.
> > I have two different setups and testpmd works fine with the
> > first one but not with the second.
> >
> > 1). Guest doesn't have kernel driver built-in for pass-through dev.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev
> 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 
> > Flags: fast devsel
> > Memory at 800010 (64-bit, prefetchable) [disabled] [size=64K]
> > Memory at 80 (64-bit, prefetchable) [disabled] [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services
> > Capabilities: [300] Transaction Processing Hints
> >
> > root@ubuntu:/# echo vfio-pci >
> /sys/bus/pci/devices/:00:02.0/driver_override
> > root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> >
> > root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix
> socket0  -l 0-1 -n 2 -- -i
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: No available hugepages reported in hugepages-32768kB
> > EAL: No available hugepages reported in hugepages-64kB
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: VFIO support initialized
> > EAL:   Invalid NUMA socket, default to 0
> > EAL:   using IOMMU type 1 (Type 1)
> > EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: :00:02.0 (socket
> 0)
> > EAL: No legacy callbacks, legacy socket not created
> > Interactive-mode selected
> > testpmd: create a new mbuf pool : n=155456,
> size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> >
> > Warning! port-topology=paired and odd forward ports number, the last port
> will pair with itself.
> >
> > Configuring Port 0 (socket 0)
> > Port 0: 8E:A6:8C:43:43:45
> > Checking link statuses...
> > Done
> > te

RE: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region

2021-02-18 Thread Shameerali Kolothum Thodi

Hi Eric,

> > -----Original Message-----
> > From: Eric Auger [mailto:eric.au...@redhat.com]
> > Sent: 16 November 2020 11:00
> > To: eric.auger@gmail.com; eric.au...@redhat.com;
> > io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> > k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> > j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> > alex.william...@redhat.com
> > Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> > zhangfei....@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> > Thodi ;
> > jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> > nicoleots...@gmail.com; yuzenghui 
> > Subject: [PATCH v11 12/13] vfio/pci: Register a DMA fault response
> > region
> >
> > In preparation for vSVA, let's register a DMA fault response region,
> > where the userspace will push the page responses and increment the
> > head of the buffer. The kernel will pop those responses and inject
> > them on iommu side.
> >
> > Signed-off-by: Eric Auger 
> > ---
> >  drivers/vfio/pci/vfio_pci.c | 114 +---
> >  drivers/vfio/pci/vfio_pci_private.h |   5 ++
> >  drivers/vfio/pci/vfio_pci_rdwr.c|  39 ++
> >  include/uapi/linux/vfio.h   |  32 
> >  4 files changed, 181 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 65a83fd0e8c0..e9a904ce3f0d 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -318,9 +318,20 @@ static void vfio_pci_dma_fault_release(struct
> > vfio_pci_device *vdev,
> > kfree(vdev->fault_pages);
> >  }
> >
> > -static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > -  struct vfio_pci_region *region,
> > -  struct vm_area_struct *vma)
> > +static void
> > +vfio_pci_dma_fault_response_release(struct vfio_pci_device *vdev,
> > +   struct vfio_pci_region *region) {
> > +   if (vdev->dma_fault_response_wq)
> > +   destroy_workqueue(vdev->dma_fault_response_wq);
> > +   kfree(vdev->fault_response_pages);
> > +   vdev->fault_response_pages = NULL;
> > +}
> > +
> > +static int __vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > +struct vfio_pci_region *region,
> > +struct vm_area_struct *vma,
> > +u8 *pages)
> >  {
> > u64 phys_len, req_len, pgoff, req_start;
> > unsigned long long addr;
> > @@ -333,14 +344,14 @@ static int vfio_pci_dma_fault_mmap(struct
> > vfio_pci_device *vdev,
> > ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
> > req_start = pgoff << PAGE_SHIFT;
> >
> > -   /* only the second page of the producer fault region is mmappable */
> > +   /* only the second page of the fault region is mmappable */
> > if (req_start < PAGE_SIZE)
> > return -EINVAL;
> >
> > if (req_start + req_len > phys_len)
> > return -EINVAL;
> >
> > -   addr = virt_to_phys(vdev->fault_pages);
> > +   addr = virt_to_phys(pages);
> > vma->vm_private_data = vdev;
> > vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
> >
> > @@ -349,13 +360,29 @@ static int vfio_pci_dma_fault_mmap(struct
> > vfio_pci_device *vdev,
> > return ret;
> >  }
> >
> > -static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
> > -struct vfio_pci_region *region,
> > -struct vfio_info_cap *caps)
> > +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > +  struct vfio_pci_region *region,
> > +  struct vm_area_struct *vma)
> > +{
> > +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
> > vdev->fault_pages);
> > +}
> > +
> > +static int
> > +vfio_pci_dma_fault_response_mmap(struct vfio_pci_device *vdev,
> > +   struct vfio_pci_region *region,
> > +   struct vm_area_struct *vma)
> > +{
> > +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
> > vdev->fault_response_pages);
> > +}
> > +
> > +static int __vfio_pci_dma_fault_add_capability(struct vfio_pci_device 
> > *vdev
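
The commit message quoted above describes a producer/consumer buffer: userspace pushes page responses and advances the head, and the kernel pops them. A minimal single-threaded model of that scheme is sketched below. All names (`resp_ring`, `ring_push`, `ring_pop`, the `resp` fields, the ring size) are illustrative assumptions and do not reflect the actual VFIO region layout; a real shared-memory ring would also need memory barriers and validation of the indices written by userspace.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* illustrative size; must be a power of two */

/* Hypothetical page response record. */
struct resp {
    uint32_t pasid;
    uint32_t grpid;
    uint32_t code;
};

struct resp_ring {
    uint32_t head; /* advanced by the producer (userspace in the patch) */
    uint32_t tail; /* advanced by the consumer (kernel in the patch) */
    struct resp slots[RING_ENTRIES];
};

/* Producer side: store a response, then publish it by moving head. */
bool ring_push(struct resp_ring *r, struct resp v)
{
    if (r->head - r->tail == RING_ENTRIES)
        return false; /* ring is full */
    r->slots[r->head % RING_ENTRIES] = v;
    r->head++;
    return true;
}

/* Consumer side: take the oldest response, then release its slot. */
bool ring_pop(struct resp_ring *r, struct resp *out)
{
    if (r->head == r->tail)
        return false; /* ring is empty */
    *out = r->slots[r->tail % RING_ENTRIES];
    r->tail++;
    return true;
}
```

The indices are free-running and only reduced modulo the ring size on access, so "full" and "empty" can be told apart without sacrificing a slot.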

RE: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx functions

2021-02-12 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 12 February 2021 16:45
> To: 'Robin Murphy' ; linux-kernel@vger.kernel.org;
> io...@lists.linux-foundation.org
> Cc: j...@8bytes.org; jean-phili...@linaro.org; w...@kernel.org; Zengtao (B)
> ; linux...@openeuler.org
> Subject: RE: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx functions
> 
> 
> 
> > -----Original Message-----
> > From: Robin Murphy [mailto:robin.mur...@arm.com]
> > Sent: 12 February 2021 16:39
> > To: Shameerali Kolothum Thodi ;
> > linux-kernel@vger.kernel.org; io...@lists.linux-foundation.org
> > Cc: j...@8bytes.org; jean-phili...@linaro.org; w...@kernel.org; Zengtao (B)
> > ; linux...@openeuler.org
> > Subject: Re: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx
> functions
> >
> > On 2021-02-12 14:54, Shameerali Kolothum Thodi wrote:
> > > Hi Robin/Joerg,
> > >
> > >> -----Original Message-----
> > >> From: Shameer Kolothum
> [mailto:shameerali.kolothum.th...@huawei.com]
> > >> Sent: 01 February 2021 12:41
> > >> To: linux-kernel@vger.kernel.org; io...@lists.linux-foundation.org
> > >> Cc: j...@8bytes.org; robin.mur...@arm.com; jean-phili...@linaro.org;
> > >> w...@kernel.org; Zengtao (B) ;
> > >> linux...@openeuler.org
> > >> Subject: [Linuxarm] [PATCH v2] iommu: Check dev->iommu in
> > iommu_dev_xxx
> > >> functions
> > >>
> > >> The device iommu probe/attach might have failed leaving dev->iommu
> > >> to NULL and device drivers may still invoke these functions resulting
> > >> in a crash in iommu vendor driver code. Hence make sure we check that.
> > >>
> > >> Also added iommu_ops to the "struct dev_iommu" and set it if the dev
> > >> is successfully associated with an iommu.
> > >>
> > >> Fixes: a3a195929d40 ("iommu: Add APIs for multiple domains per
> device")
> > >> Signed-off-by: Shameer Kolothum
> > 
> > >> ---
> > >> v1 --> v2:
> > >>   -Added iommu_ops to struct dev_iommu based on the discussion with
> > Robin.
> > >>   -Rebased against iommu-tree core branch.
> > >
> > > A gentle ping on this...
> >
> > Is there a convincing justification for maintaining yet another copy of
> > the ops pointer rather than simply dereferencing iommu_dev->ops at point
> > of use?
> >
> 
> TBH, nothing I can think of now. That was mainly the way I interpreted your
> suggestion from the v1. Now it looks like you didn’t mean it :). I am Ok to
> rework it to dereference it from iommu_dev. Please let me know.

So we can do something like this,

index fd76e2f579fe..5fd31a3cec18 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2865,10 +2865,12 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
  */
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
 {
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   if (dev->iommu && dev->iommu->iommu_dev && dev->iommu->iommu_dev->ops) {
+       const struct iommu_ops *ops = dev->iommu->iommu_dev->ops;
 
-   if (ops && ops->dev_enable_feat)
-       return ops->dev_enable_feat(dev, feat);
+       if (ops->dev_enable_feat)
+           return ops->dev_enable_feat(dev, feat);
+   }
 
    return -ENODEV;
 }

Again, I am not sure we need to check iommu->iommu_dev and ops here. If
dev->iommu is set, is it safe to assume that we always have a valid
iommu->iommu_dev and ops? (Maybe it is safer to keep the checks in case
something else breaks this assumption in the future.) Please let me know
your thoughts.

Thanks,
Shameer
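
The "dereference at point of use" approach discussed in this message can be modelled outside the kernel as a small lookup helper. The structures below only mimic the relevant pointer chain (`dev->iommu->iommu_dev->ops`); they are cut-down stand-ins, and `dev_iommu_ops_or_null`, `model_dev_enable_feature` and `stub_enable_feat` are hypothetical names introduced for this sketch, not kernel functions.

```c
#include <errno.h>
#include <stddef.h>

struct device;

/* Cut-down models of the kernel structures named in the thread. */
struct iommu_ops {
    int (*dev_enable_feat)(struct device *dev, int feat);
};
struct iommu_device {
    const struct iommu_ops *ops;
};
struct dev_iommu {
    struct iommu_device *iommu_dev;
};
struct device {
    struct dev_iommu *iommu;
};

/* Walk the pointer chain once; return NULL if any link is missing. */
const struct iommu_ops *dev_iommu_ops_or_null(const struct device *dev)
{
    if (!dev || !dev->iommu || !dev->iommu->iommu_dev)
        return NULL;
    return dev->iommu->iommu_dev->ops;
}

/* Model of a feature call that cannot crash on a failed probe. */
int model_dev_enable_feature(struct device *dev, int feat)
{
    const struct iommu_ops *ops = dev_iommu_ops_or_null(dev);

    if (ops && ops->dev_enable_feat)
        return ops->dev_enable_feat(dev, feat);
    return -ENODEV;
}

/* Stand-in driver callback used only to exercise the model. */
int stub_enable_feat(struct device *dev, int feat)
{
    (void)dev;
    return feat;
}
```

Centralising the checks in one helper avoids keeping a second copy of the ops pointer in sync, which was the concern raised in the review, while still covering the case where probe or attach failed and `dev->iommu` is NULL.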

RE: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx functions

2021-02-12 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 12 February 2021 16:39
> To: Shameerali Kolothum Thodi ;
> linux-kernel@vger.kernel.org; io...@lists.linux-foundation.org
> Cc: j...@8bytes.org; jean-phili...@linaro.org; w...@kernel.org; Zengtao (B)
> ; linux...@openeuler.org
> Subject: Re: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx functions
> 
> On 2021-02-12 14:54, Shameerali Kolothum Thodi wrote:
> > Hi Robin/Joerg,
> >
> >> -----Original Message-----
> >> From: Shameer Kolothum [mailto:shameerali.kolothum.th...@huawei.com]
> >> Sent: 01 February 2021 12:41
> >> To: linux-kernel@vger.kernel.org; io...@lists.linux-foundation.org
> >> Cc: j...@8bytes.org; robin.mur...@arm.com; jean-phili...@linaro.org;
> >> w...@kernel.org; Zengtao (B) ;
> >> linux...@openeuler.org
> >> Subject: [Linuxarm] [PATCH v2] iommu: Check dev->iommu in
> iommu_dev_xxx
> >> functions
> >>
> >> The device iommu probe/attach might have failed leaving dev->iommu
> >> to NULL and device drivers may still invoke these functions resulting
> >> in a crash in iommu vendor driver code. Hence make sure we check that.
> >>
> >> Also added iommu_ops to the "struct dev_iommu" and set it if the dev
> >> is successfully associated with an iommu.
> >>
> >> Fixes: a3a195929d40 ("iommu: Add APIs for multiple domains per device")
> >> Signed-off-by: Shameer Kolothum
> 
> >> ---
> >> v1 --> v2:
> >>   -Added iommu_ops to struct dev_iommu based on the discussion with
> Robin.
> >>   -Rebased against iommu-tree core branch.
> >
> > A gentle ping on this...
> 
> Is there a convincing justification for maintaining yet another copy of
> the ops pointer rather than simply dereferencing iommu_dev->ops at point
> of use?
> 

TBH, nothing I can think of now. That was mainly the way I interpreted your
suggestion from the v1. Now it looks like you didn’t mean it :). I am Ok to
rework it to dereference it from iommu_dev. Please let me know.

Thanks,
Shameer

> Robin.
> 
> > Thanks,
> > Shameer
> >
> >> ---
> >>   drivers/iommu/iommu.c | 19 +++++++------------
> >>   include/linux/iommu.h |  2 ++
> >>   2 files changed, 9 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index fd76e2f579fe..6023d0b7c542 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -217,6 +217,7 @@ static int __iommu_probe_device(struct device
> *dev,
> >> struct list_head *group_list
> >>}
> >>
> >>dev->iommu->iommu_dev = iommu_dev;
> >> +  dev->iommu->ops = iommu_dev->ops;
> >>
> >>group = iommu_group_get_for_dev(dev);
> >>if (IS_ERR(group)) {
> >> @@ -2865,10 +2866,8 @@
> EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
> >>*/
> >>   int iommu_dev_enable_feature(struct device *dev, enum
> >> iommu_dev_features feat)
> >>   {
> >> -  const struct iommu_ops *ops = dev->bus->iommu_ops;
> >> -
> >> -  if (ops && ops->dev_enable_feat)
> >> -  return ops->dev_enable_feat(dev, feat);
> >> +  if (dev->iommu && dev->iommu->ops->dev_enable_feat)
> >> +  return dev->iommu->ops->dev_enable_feat(dev, feat);
> >>
> >>return -ENODEV;
> >>   }
> >> @@ -2881,10 +2880,8 @@
> >> EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
> >>*/
> >>   int iommu_dev_disable_feature(struct device *dev, enum
> >> iommu_dev_features feat)
> >>   {
> >> -  const struct iommu_ops *ops = dev->bus->iommu_ops;
> >> -
> >> -  if (ops && ops->dev_disable_feat)
> >> -  return ops->dev_disable_feat(dev, feat);
> >> +  if (dev->iommu && dev->iommu->ops->dev_disable_feat)
> >> +  return dev->iommu->ops->dev_disable_feat(dev, feat);
> >>
> >>return -EBUSY;
> >>   }
> >> @@ -2892,10 +2889,8 @@
> >> EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
> >>
> >>   bool iommu_dev_feature_enabled(struct device *dev, enum
> >> iommu_dev_features feat)
> >>   {
> >> -  const struct iommu_ops *ops = dev->bus->iommu_ops;
> >> -
> >> -  if (ops && ops->dev_feat_enabled)
> >> -

RE: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx functions

2021-02-12 Thread Shameerali Kolothum Thodi

Hi Robin/Joerg,

> -----Original Message-----
> From: Shameer Kolothum [mailto:shameerali.kolothum.th...@huawei.com]
> Sent: 01 February 2021 12:41
> To: linux-kernel@vger.kernel.org; io...@lists.linux-foundation.org
> Cc: j...@8bytes.org; robin.mur...@arm.com; jean-phili...@linaro.org;
> w...@kernel.org; Zengtao (B) ;
> linux...@openeuler.org
> Subject: [Linuxarm] [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx
> functions
> 
> The device iommu probe/attach might have failed leaving dev->iommu
> to NULL and device drivers may still invoke these functions resulting
> in a crash in iommu vendor driver code. Hence make sure we check that.
> 
> Also added iommu_ops to the "struct dev_iommu" and set it if the dev
> is successfully associated with an iommu.
> 
> Fixes: a3a195929d40 ("iommu: Add APIs for multiple domains per device")
> Signed-off-by: Shameer Kolothum 
> ---
> v1 --> v2:
>  -Added iommu_ops to struct dev_iommu based on the discussion with Robin.
>  -Rebased against iommu-tree core branch.

A gentle ping on this...

Thanks,
Shameer

> ---
>  drivers/iommu/iommu.c | 19 +++++++------------
>  include/linux/iommu.h |  2 ++
>  2 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index fd76e2f579fe..6023d0b7c542 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -217,6 +217,7 @@ static int __iommu_probe_device(struct device *dev,
> struct list_head *group_list
>   }
> 
>   dev->iommu->iommu_dev = iommu_dev;
> + dev->iommu->ops = iommu_dev->ops;
> 
>   group = iommu_group_get_for_dev(dev);
>   if (IS_ERR(group)) {
> @@ -2865,10 +2866,8 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
>   */
>  int iommu_dev_enable_feature(struct device *dev, enum
> iommu_dev_features feat)
>  {
> - const struct iommu_ops *ops = dev->bus->iommu_ops;
> -
> - if (ops && ops->dev_enable_feat)
> - return ops->dev_enable_feat(dev, feat);
> + if (dev->iommu && dev->iommu->ops->dev_enable_feat)
> + return dev->iommu->ops->dev_enable_feat(dev, feat);
> 
>   return -ENODEV;
>  }
> @@ -2881,10 +2880,8 @@
> EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
>   */
>  int iommu_dev_disable_feature(struct device *dev, enum
> iommu_dev_features feat)
>  {
> - const struct iommu_ops *ops = dev->bus->iommu_ops;
> -
> - if (ops && ops->dev_disable_feat)
> - return ops->dev_disable_feat(dev, feat);
> + if (dev->iommu && dev->iommu->ops->dev_disable_feat)
> + return dev->iommu->ops->dev_disable_feat(dev, feat);
> 
>   return -EBUSY;
>  }
> @@ -2892,10 +2889,8 @@
> EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
> 
>  bool iommu_dev_feature_enabled(struct device *dev, enum
> iommu_dev_features feat)
>  {
> - const struct iommu_ops *ops = dev->bus->iommu_ops;
> -
> - if (ops && ops->dev_feat_enabled)
> - return ops->dev_feat_enabled(dev, feat);
> + if (dev->iommu && dev->iommu->ops->dev_feat_enabled)
> + return dev->iommu->ops->dev_feat_enabled(dev, feat);
> 
>   return false;
>  }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 524ffc2ff64f..ff0c76bdfb67 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -354,6 +354,7 @@ struct iommu_fault_param {
>   * @fault_param: IOMMU detected device fault reporting data
>   * @fwspec:   IOMMU fwspec data
>   * @iommu_dev:IOMMU device this device is linked to
> + * @ops:  iommu-ops for talking to the iommu_dev
>   * @priv: IOMMU Driver private data
>   *
>   * TODO: migrate other per device data pointers under iommu_dev_data,
> e.g.
> @@ -364,6 +365,7 @@ struct dev_iommu {
>   struct iommu_fault_param*fault_param;
>   struct iommu_fwspec *fwspec;
>   struct iommu_device *iommu_dev;
> + const struct iommu_ops  *ops;
>   void*priv;
>  };
> 
> --
> 2.17.1
> ___
> Linuxarm mailing list -- linux...@openeuler.org
> To unsubscribe send an email to linuxarm-le...@openeuler.org

RE: [PATCH] iommu: Check dev->iommu in iommu_dev_xxx functions

2021-01-26 Thread Shameerali Kolothum Thodi

Hi Robin,

> -----Original Message-----
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 26 January 2021 13:51
> To: Shameerali Kolothum Thodi 
> Cc: linux-kernel@vger.kernel.org; io...@lists.linux-foundation.org;
> jean-phili...@linaro.org; w...@kernel.org; linux...@openeuler.org; Zengtao
> (B) 
> Subject: Re: [PATCH] iommu: Check dev->iommu in iommu_dev_xxx functions
> 
> On Tue, 26 Jan 2021 13:06:29 +
> Shameer Kolothum  wrote:
> 
> > The device iommu probe/attach might have failed leaving dev->iommu to
> > NULL and device drivers may still invoke these functions resulting a
> > crash in iommu vendor driver code. Hence make sure we check that.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  drivers/iommu/iommu.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> > ffeebda8d6de..cb68153c5cc0 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2867,7 +2867,7 @@ bool iommu_dev_has_feature(struct device *dev,
> > enum iommu_dev_features feat) {
> > const struct iommu_ops *ops = dev->bus->iommu_ops;
> >
> > -   if (ops && ops->dev_has_feat)
> > +   if (dev->iommu && ops && ops->dev_has_feat)
> > return ops->dev_has_feat(dev, feat);
> 
> Might make sense to make these more self-contained, e.g.:
> 
>   if (dev->iommu && dev->iommu->ops->foo)
>   dev->iommu->ops->foo()

Right. Does that mean adding ops to "struct dev_iommu", or retrieving ops
through iommu_dev as below?

if (dev->iommu && dev->iommu->iommu_dev->ops->foo)
        dev->iommu->iommu_dev->ops->foo()

Sorry, that is not clear to me.

Thanks,
Shameer

RE: [PATCH] genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set

2021-01-26 Thread Shameerali Kolothum Thodi




> -----Original Message-----
> From: Marc Zyngier [mailto:m...@kernel.org]
> Sent: 25 January 2021 14:49
> To: Shameerali Kolothum Thodi 
> Cc: linux-kernel@vger.kernel.org; Thomas Gleixner ; Bjorn
> Helgaas ; sta...@vger.kernel.org
> Subject: Re: [PATCH] genirq/msi: Activate Multi-MSI early when
> MSI_FLAG_ACTIVATE_EARLY is set
> 
> On 2021-01-25 14:39, Shameerali Kolothum Thodi wrote:
> >> -----Original Message-----
> >> From: Marc Zyngier [mailto:m...@kernel.org]
> >> Sent: 23 January 2021 12:28
> >> To: linux-kernel@vger.kernel.org
> >> Cc: Thomas Gleixner ; Bjorn Helgaas
> >> ; Shameerali Kolothum Thodi
> >> ; sta...@vger.kernel.org
> >> Subject: [PATCH] genirq/msi: Activate Multi-MSI early when
> >> MSI_FLAG_ACTIVATE_EARLY is set
> >>
> >> When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI),
> >> we perform the activation of the interrupt (which in the case of
> >> PCI results in the endpoint being programmed) as soon as the
> >> interrupt is allocated.
> >>
> >> But it appears that this is only done for the first vector,
> >> introducing an inconsistent behaviour for PCI Multi-MSI.
> >>
> >> Fix it by iterating over the number of vectors allocated to
> >> each MSI descriptor. This is easily achieved by introducing
> >> a new "for_each_msi_vector" iterator, together with a tiny
> >> bit of refactoring.
> >>
> >> Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated
> >> early")
> >> Reported-by: Shameer Kolothum
> 
> >> Signed-off-by: Marc Zyngier 
> >> Cc: sta...@vger.kernel.org
> >> ---
> >>  include/linux/msi.h |  6 ++
> >>  kernel/irq/msi.c| 44 
> >>  2 files changed, 26 insertions(+), 24 deletions(-)
> >>
> >> diff --git a/include/linux/msi.h b/include/linux/msi.h
> >> index 360a0a7e7341..aef35fd1cf11 100644
> >> --- a/include/linux/msi.h
> >> +++ b/include/linux/msi.h
> >> @@ -178,6 +178,12 @@ struct msi_desc {
> >>list_for_each_entry((desc), dev_to_msi_list((dev)), list)
> >>  #define for_each_msi_entry_safe(desc, tmp, dev)   \
> >>list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)),
> >> list)
> >> +#define for_each_msi_vector(desc, __irq, dev)             \
> >> +  for_each_msi_entry((desc), (dev))   \
> >> +  if ((desc)->irq)\
> >> +  for (__irq = (desc)->irq;   \
> >> +   __irq < ((desc)->irq + (desc)->nvec_used); \
> >> +   __irq++)
> >>
> >>  #ifdef CONFIG_IRQ_MSI_IOMMU
> >>  static inline const void *msi_desc_get_iommu_cookie(struct msi_desc
> >> *desc)
> >> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> >> index 2c0c4d6d0f83..d924676c8781 100644
> >> --- a/kernel/irq/msi.c
> >> +++ b/kernel/irq/msi.c
> >> @@ -436,22 +436,22 @@ int __msi_domain_alloc_irqs(struct irq_domain
> >> *domain, struct device *dev,
> >>
> >>can_reserve = msi_check_reservation_mode(domain, info, dev);
> >>
> >> -  for_each_msi_entry(desc, dev) {
> >> -  virq = desc->irq;
> >> -  if (desc->nvec_used == 1)
> >> -  dev_dbg(dev, "irq %d for MSI\n", virq);
> >> -  else
> >> +  /*
> >> +   * This flag is set by the PCI layer as we need to activate
> >> +   * the MSI entries before the PCI layer enables MSI in the
> >> +   * card. Otherwise the card latches a random msi message.
> >> +   */
> >> +  if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
> >> +  goto skip_activate;
> >
> > This will change the dbg print behavior. From the commit f3b0946d629c,
> > it looks like the below dev_dbg() code was there for
> > !MSI_FLAG_ACTIVATE_EARLY
> > case as well. Not sure how much this matters though.
> 
> I'm not sure this matters either. We may have relied on these statements
> some 6/7 years ago, as the whole hierarchy stuff was brand new, but we
> now have a much better debug infrastructure thanks to Thomas. I'd be
> totally in favour of dropping it.
> 
Ok.

Tested on D06 with GICv4 enabled, and the Guest MSI dev works fine.

FWIW,
   Tested-by: Shameer Kolothum 

Thanks,
Shameer

RE: [PATCH] genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set

2021-01-25 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: Marc Zyngier [mailto:m...@kernel.org]
> Sent: 23 January 2021 12:28
> To: linux-kernel@vger.kernel.org
> Cc: Thomas Gleixner ; Bjorn Helgaas
> ; Shameerali Kolothum Thodi
> ; sta...@vger.kernel.org
> Subject: [PATCH] genirq/msi: Activate Multi-MSI early when
> MSI_FLAG_ACTIVATE_EARLY is set
> 
> When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI),
> we perform the activation of the interrupt (which in the case of
> PCI results in the endpoint being programmed) as soon as the
> interrupt is allocated.
> 
> But it appears that this is only done for the first vector,
> introducing an inconsistent behaviour for PCI Multi-MSI.
> 
> Fix it by iterating over the number of vectors allocated to
> each MSI descriptor. This is easily achieved by introducing
> a new "for_each_msi_vector" iterator, together with a tiny
> bit of refactoring.
> 
> Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
> Reported-by: Shameer Kolothum 
> Signed-off-by: Marc Zyngier 
> Cc: sta...@vger.kernel.org
> ---
>  include/linux/msi.h |  6 ++
>  kernel/irq/msi.c| 44 
>  2 files changed, 26 insertions(+), 24 deletions(-)
> 
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 360a0a7e7341..aef35fd1cf11 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -178,6 +178,12 @@ struct msi_desc {
>   list_for_each_entry((desc), dev_to_msi_list((dev)), list)
>  #define for_each_msi_entry_safe(desc, tmp, dev)  \
>   list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list)
> +#define for_each_msi_vector(desc, __irq, dev) \
> + for_each_msi_entry((desc), (dev))   \
> + if ((desc)->irq)\
> + for (__irq = (desc)->irq;   \
> +  __irq < ((desc)->irq + (desc)->nvec_used); \
> +  __irq++)
> 
>  #ifdef CONFIG_IRQ_MSI_IOMMU
>  static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 2c0c4d6d0f83..d924676c8781 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -436,22 +436,22 @@ int __msi_domain_alloc_irqs(struct irq_domain
> *domain, struct device *dev,
> 
>   can_reserve = msi_check_reservation_mode(domain, info, dev);
> 
> - for_each_msi_entry(desc, dev) {
> - virq = desc->irq;
> - if (desc->nvec_used == 1)
> - dev_dbg(dev, "irq %d for MSI\n", virq);
> - else
> + /*
> +  * This flag is set by the PCI layer as we need to activate
> +  * the MSI entries before the PCI layer enables MSI in the
> +  * card. Otherwise the card latches a random msi message.
> +  */
> + if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
> + goto skip_activate;

This changes the dev_dbg() print behaviour. Going by commit f3b0946d629c,
it looks like the dev_dbg() code below also ran in the
!MSI_FLAG_ACTIVATE_EARLY case. Not sure how much this matters though.

Thanks,
Shameer

> +
> + for_each_msi_vector(desc, i, dev) {
> + if (desc->irq == i) {
> + virq = desc->irq;
>   dev_dbg(dev, "irq [%d-%d] for MSI\n",
>   virq, virq + desc->nvec_used - 1);
> - /*
> -  * This flag is set by the PCI layer as we need to activate
> -  * the MSI entries before the PCI layer enables MSI in the
> -  * card. Otherwise the card latches a random msi message.
> -  */
> - if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
> - continue;
> + }
> 
> - irq_data = irq_domain_get_irq_data(domain, desc->irq);
> + irq_data = irq_domain_get_irq_data(domain, i);
>   if (!can_reserve) {
>   irqd_clr_can_reserve(irq_data);
>   if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
> @@ -462,28 +462,24 @@ int __msi_domain_alloc_irqs(struct irq_domain
> *domain, struct device *dev,
>   goto cleanup;
>   }
> 
> +skip_activate:
>   /*
>* If these interrupts use reservation mode, clear the activated bit
>* so request_irq() will assign the final vector.
>*/
>   if (can_reserve) {
> - for_each_msi_entry(desc, dev) {
> - irq_dat
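
For readers following the Multi-MSI point in the commit message above, here is a minimal userspace sketch of the difference between walking descriptors and walking vectors. The struct and function names (demo_msi_desc, collect_base_irqs, collect_all_irqs) are hypothetical stand-ins, not kernel API; only the irq/nvec_used fields mirror what the quoted macro uses.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct msi_desc. */
struct demo_msi_desc {
	unsigned int irq;        /* base Linux irq, 0 if not allocated */
	unsigned int nvec_used;  /* number of vectors behind this descriptor */
};

/* Visit only the base irq of each descriptor (per-descriptor walk). */
static size_t collect_base_irqs(const struct demo_msi_desc *descs, size_t n,
				unsigned int *out)
{
	size_t count = 0;

	for (size_t d = 0; d < n; d++) {
		if (descs[d].irq)
			out[count++] = descs[d].irq;
	}
	return count;
}

/* Visit every vector of each descriptor, in the spirit of for_each_msi_vector(). */
static size_t collect_all_irqs(const struct demo_msi_desc *descs, size_t n,
			       unsigned int *out)
{
	size_t count = 0;

	for (size_t d = 0; d < n; d++) {
		if (!descs[d].irq)
			continue;	/* skip descriptors without an irq */
		for (unsigned int irq = descs[d].irq;
		     irq < descs[d].irq + descs[d].nvec_used; irq++)
			out[count++] = irq;
	}
	return count;
}
```

With one descriptor covering four vectors, the per-descriptor walk yields a single irq while the per-vector walk yields all four, which is the inconsistency the patch describes for PCI Multi-MSI.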

RE: [PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm

2021-01-22 Thread Shameerali Kolothum Thodi

Hi Vivek,

> -Original Message-
> From: Vivek Kumar Gautam [mailto:vivek.gau...@arm.com]
> Sent: 21 January 2021 17:34
> To: Auger Eric ; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; io...@lists.linux-foundation.org;
> virtualizat...@lists.linux-foundation.org
> Cc: j...@8bytes.org; will.dea...@arm.com; m...@redhat.com;
> robin.mur...@arm.com; jean-phili...@linaro.org;
> alex.william...@redhat.com; kevin.t...@intel.com;
> jacob.jun@linux.intel.com; yi.l@intel.com; lorenzo.pieral...@arm.com;
> Shameerali Kolothum Thodi 
> Subject: Re: [PATCH RFC v1 00/15] iommu/virtio: Nested stage support with
> Arm
> 
> Hi Eric,
> 
> 
> On 1/19/21 2:33 PM, Auger Eric wrote:
> > Hi Vivek,
> >
> > On 1/15/21 1:13 PM, Vivek Gautam wrote:
> >> This patch-series aims at enabling Nested stage translation in guests
> >> using virtio-iommu as the paravirtualized iommu. The backend is
> >> supported with Arm SMMU-v3 that provides nested stage-1 and stage-2
> >> translation.
> >>
> >> This series derives its purpose from various efforts happening to add
> >> support for Shared Virtual Addressing (SVA) in host and guest. On
> >> Arm, most of the support for SVA has already landed. The support for
> >> nested stage translation and fault reporting to guest has been
> >> proposed [1].
> >> The related changes required in VFIO [2] framework have also been put
> >> forward.
> >>
> >> This series proposes changes in virtio-iommu to program PASID tables
> >> and related stage-1 page tables. A simple iommu-pasid-table library
> >> is added for this purpose that interacts with vendor drivers to
> >> allocate and populate PASID tables.
> >> In Arm SMMUv3 we propose to pull the Context Descriptor (CD)
> >> management code out of the arm-smmu-v3 driver and add that as a glue
> >> vendor layer to support allocating CD tables, and populating them
> >> with the right values.
> >> These CD tables are essentially the PASID tables and contain stage-1
> >> page table configurations too.
> >> A request to set up these CD tables comes from the virtio-iommu driver
> >> using the iommu-pasid-table library when running on Arm. The
> >> virtio-iommu then passes these PASID tables to the host using the
> >> right virtio backend and support in the VMM.
> >>
> >> For testing we have added necessary support in kvmtool. The changes
> >> in kvmtool are based on virtio-iommu development branch by
> >> Jean-Philippe Brucker [3].
> >>
> >> The tested kernel branch contains following in the order bottom to
> >> top on the git hash -
> >> a) v5.11-rc3
> >> b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page
> >> table support for Arm.
> >> c) Smmu test engine patches from Jean-Philippe's branch [4]
> >> d) This series
> >> e) Domain nesting info patches [5][6][7].
> >> f) Changes to add arm-smmu-v3 specific nesting info (to be sent to
> >> the list).
> >>
> >> This kernel is tested on Neoverse reference software stack with Fixed
> >> virtual platform. Public version of the software stack and FVP is
> >> available here[8][9].
> >>
> >> A big thanks to Jean-Philippe for his contributions towards this work
> >> and for his valuable guidance.
> >>
> >> [1] https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.auger@redhat.com/T/
> >> [2] https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.auger@redhat.com/T/
> >> [3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel
> >> [4] https://jpbrucker.net/git/linux/log/?h=sva/smmute
> >> [5] https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l.liu@intel.com/
> >> [6] https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l.liu@intel.com/
> >> [7] https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l.liu@intel.com/
> >> [8] https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
> >> [9] https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst
> >
> > Could you share a public branch where we could find all the kernel pieces.
> >
> > Thank you in advance
> 
> Apologies for the delay. It took a bit of time to sort things out for a public
> branch.
> The branch is available in my github now. Please have a look.
> 
> https://github.com/vivek-arm/linux/tree/5.11-rc3-nested-pgtbl-arm-smmuv3-virtio-iommu

Thanks for this. Is the corresponding kvmtool branch mentioned above also
publicly available?

Thanks,
Shameer

RE: [PATCH] genirq/msi: Make sure early activation of all PCI MSIs

2021-01-22 Thread Shameerali Kolothum Thodi

Hi Marc,

> -Original Message-
> From: Marc Zyngier [mailto:m...@kernel.org]
> Sent: 21 January 2021 21:25
> To: Shameerali Kolothum Thodi 
> Cc: linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> eric.au...@redhat.com; t...@linutronix.de; linux...@openeuler.org;
> Zengtao (B) ; Wangzhou (B)
> 
> Subject: Re: [PATCH] genirq/msi: Make sure early activation of all PCI MSIs
> 
> Hi Shameer,
> 
> On Thu, 21 Jan 2021 11:02:47 +,
> Shameer Kolothum  wrote:
> >
> > We currently do early activation of MSI irqs for PCI/MSI based on the
> > MSI_FLAG_ACTIVATE_EARLY flag. Though this activates all the allocated
> > MSIs in the case of MSI-X, it only does so for the base irq in the
> > case of MSI. This is because, for MSI, there is only one msi_desc
> > entry for all the 32 irqs it can support and the current
> > implementation iterates over the msi entries and ends up activating
> > the base irq only.
> >
> > The above creates an issue on platforms where the msi controller
> > supports direct injection of vLPIs(eg: ARM GICv4 ITS). On these
> > platforms, upon irq activation, ITS driver maps the event to an ITT
> > entry. And for Guest pass-through to work, early mapping of all the
> > dev MSI vectors is required. Otherwise, the vfio irq bypass manager
> > registration will fail. eg, On a HiSilicon D06 platform with GICv4
> > enabled, Guest boot with zip dev pass-through reports,
> >
> > "vfio-pci :75:00.1: irq bypass producer (token 06e5176a)
> > registration fails: 66311"
> >
> > and Guest boot fails.
> >
> > This is traced to,
> >    kvm_arch_irq_bypass_add_producer
> >      kvm_vgic_v4_set_forwarding
> >        vgic_its_resolve_lpi --> returns E_ITS_INT_UNMAPPED_INTERRUPT
> >
> > Hence make sure we activate all the irqs for both MSI and MSI-x cases.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> > It is unclear to me whether not performing the early activation of all
> > MSI irqs was deliberate and has consequences on any other platforms.
> > Please let me know.
> 
> Probably just an oversight.

Ok. That's good news :)

> >
> > Thanks,
> > Shameer
> > ---
> >  kernel/irq/msi.c | 114
> > +--
> >  1 file changed, 90 insertions(+), 24 deletions(-)
> >
> > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > index 2c0c4d6d0f83..eec187fc32a9 100644
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -395,6 +395,78 @@ static bool msi_check_reservation_mode(struct irq_domain *domain,
> >     return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
> >  }
> >
> > +static void msi_domain_deactivate_irq(struct irq_domain *domain, int irq)
> > +{
> > +   struct irq_data *irqd;
> > +
> > +   irqd = irq_domain_get_irq_data(domain, irq);
> > +   if (irqd_is_activated(irqd))
> > +   irq_domain_deactivate_irq(irqd);
> > +}
> > +
> > +static int msi_domain_activate_irq(struct irq_domain *domain,
> > +  int irq, bool can_reserve)
> > +{
> > +   struct irq_data *irqd;
> > +
> > +   irqd = irq_domain_get_irq_data(domain, irq);
> > +   if (!can_reserve) {
> > +   irqd_clr_can_reserve(irqd);
> > +   if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
> > +   irqd_set_msi_nomask_quirk(irqd);
> > +   }
> > +   return irq_domain_activate_irq(irqd, can_reserve);
> > +}
> > +
> > +static int msi_domain_activate_msix_irqs(struct irq_domain *domain,
> > +struct device *dev, bool can_reserve)
> > +{
> > +   struct msi_desc *desc;
> > +   int ret, irq;
> > +
> > +   for_each_msi_entry(desc, dev) {
> > +   irq = desc->irq;
> > +   ret = msi_domain_activate_irq(domain, irq, can_reserve);
> > +   if (ret)
> > +   goto out;
> > +   }
> > +   return 0;
> > +
> > +out:
> > +   for_each_msi_entry(desc, dev) {
> > +   if (irq == desc->irq)
> > +   break;
> > +   msi_domain_deactivate_irq(domain, desc->irq);
> > +   }
> > +   return ret;
> > +}
> > +
> > +static int msi_domain_activate_msi_irqs(struct irq_domain *domain,
> > +   struct device *dev, bool can_reserve)
> > +{
> > +   struct msi_desc *desc;
> > +   int i, ret, base, irq;
> > +
> > +   desc = first_msi_entry(dev);
> > +
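
The quoted helpers above follow an activate-then-roll-back-on-failure pattern. As a rough illustration only, the sketch below shows that pattern in plain userspace C. All names (demo_irq, demo_activate, demo_deactivate, demo_activate_all) and the will_fail hook are hypothetical and are not taken from the patch or from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-irq state; will_fail simulates an activation error. */
struct demo_irq {
	bool activated;
	bool will_fail;
};

static int demo_activate(struct demo_irq *irq)
{
	if (irq->will_fail)
		return -1;
	irq->activated = true;
	return 0;
}

static void demo_deactivate(struct demo_irq *irq)
{
	/* Only undo what was actually activated. */
	if (irq->activated)
		irq->activated = false;
}

/* Activate n irqs; on the first failure, undo the ones already activated. */
static int demo_activate_all(struct demo_irq *irqs, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = demo_activate(&irqs[i]);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* i is the index that failed; walk back over [0, i). */
	while (i-- > 0)
		demo_deactivate(&irqs[i]);
	return ret;
}
```

The point of the sketch is that a partial failure leaves nothing activated, so a caller sees either every irq active or none.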

RE: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 14 January 2021 16:58
> To: Shameerali Kolothum Thodi ;
> Jean-Philippe Brucker 
> Cc: Xieyingtai ; alex.william...@redhat.com;
> wangxingang ; k...@vger.kernel.org;
> m...@kernel.org; linux-kernel@vger.kernel.org; vivek.gau...@arm.com;
> io...@lists.linux-foundation.org; qubingbing ;
> Zengtao (B) ; zhangfei@linaro.org;
> eric.auger@gmail.com; w...@kernel.org; kvm...@lists.cs.columbia.edu;
> robin.mur...@arm.com
> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
> unmanaged ASIDs
> 
> Hi Shameer, Jean-Philippe,
> 
> On 12/4/20 11:23 AM, Auger Eric wrote:
> > Hi Shameer, Jean-Philippe,
> >
> > On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
> >> Hi Jean,
> >>
> >>> -Original Message-
> >>> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> >>> Sent: 04 December 2020 09:54
> >>> To: Shameerali Kolothum Thodi
> 
> >>> Cc: Auger Eric ; wangxingang
> >>> ; Xieyingtai ;
> >>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org;
> w...@kernel.org;
> >>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >>> vivek.gau...@arm.com; alex.william...@redhat.com;
> >>> zhangfei@linaro.org; robin.mur...@arm.com;
> >>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
> >>> ; qubingbing 
> >>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation
> with
> >>> unmanaged ASIDs
> >>>
> >>> Hi Shameer,
> >>>
> >>> On Thu, Dec 03, 2020 at 06:42:57PM +, Shameerali Kolothum Thodi
> wrote:
> >>>> Hi Jean/zhangfei,
> >>>> Is it possible to have a branch with minimum required SVA/UACCE related
> >>> patches
> >>>> that are already public and can be a "stable" candidate for future respin
> of
> >>> Eric's series?
> >>>> Please share your thoughts.
> >>>
> >>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
> >>> based on mainline?
> >>
> >> Yes.
> >>
> >>  The uacce-devel branches from
> >>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> >>> (they track the latest sva/zip-devel branch
> >>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> As I plan to respin shortly, please could you confirm the best branch to
> rebase on still is that one (uacce-devel from the linux-kernel-uadk git
> repo). Is it up to date? Commits seem to be quite old there.

I think it is the uacce-devel-5.11 branch, but will wait for Jean or Zhangfei
to confirm.

Thanks,
Shameer

> Thanks
> 
> Eric
> >>
> >> Thanks.
> >>
> >> Hi Eric,
> >>
> >> Could you please take a look at the above branches and see whether it
> >> makes sense to rebase on top of either of those?
> >>
> >> From vSVA point of view, it will be less rebase hassle if we can do that.
> >
> > Sure. I will rebase on top of this ;-)
> >
> > Thanks
> >
> > Eric
> >>
> >> Thanks,
> >> Shameer
> >>
> >>> Thanks,
> >>> Jean
> >>
> >
> >

RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-01-08 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 18 November 2020 11:22
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> alex.william...@redhat.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> Thodi ;
> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui 
> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
> 
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 
> Then those capabilities gets implemented in the SMMUv3 driver.
> 
> The virtualizer passes information through the VFIO user API
> which cascades them to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).

I am seeing an issue when running testpmd in the Guest with this series.
I have two different setups; testpmd works fine with the first one
but not with the second.

1). Guest doesn't have a built-in kernel driver for the pass-through dev.

root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
Subsystem: Huawei Technologies Co., Ltd. Device 
Flags: fast devsel
Memory at 800010 (64-bit, prefetchable) [disabled] [size=64K]
Memory at 80 (64-bit, prefetchable) [disabled] [size=1M]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
Capabilities: [b0] Power Management version 3
Capabilities: [100] Access Control Services
Capabilities: [300] Transaction Processing Hints

root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/:00:02.0/driver_override
root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe

root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix socket0  -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   Invalid NUMA socket, default to 0
EAL:   using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: :00:02.0 (socket 0)
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
testpmd: create a new mbuf pool : n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc

Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

Configuring Port 0 (socket 0)
Port 0: 8E:A6:8C:43:43:45
Checking link statuses...
Done
testpmd>

2). Guest has a built-in kernel driver for the pass-through dev.

root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
Subsystem: Huawei Technologies Co., Ltd. Device 
Flags: bus master, fast devsel, latency 0
Memory at 800010 (64-bit, prefetchable) [size=64K]
Memory at 80 (64-bit, prefetchable) [size=1M]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
Capabilities: [b0] Power Management version 3
Capabilities: [100] Access Control Services
Capabilities: [300] Transaction Processing Hints
Kernel driver in use: hns3

root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/:00:02.0/driver_override
root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers/hns3/unbind
root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe

root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   Invalid NUMA socket, default to 0
EAL:   using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: :00:02.0 (socket 0)
:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from P

RE: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region

2021-01-08 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 16 November 2020 11:00
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> alex.william...@redhat.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> Thodi ;
> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui 
> Subject: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region
> 
> In preparation for vSVA, let's register a DMA fault response region,
> where the userspace will push the page responses and increment the
> head of the buffer. The kernel will pop those responses and inject them
> on iommu side.
> 
> Signed-off-by: Eric Auger 
> ---
>  drivers/vfio/pci/vfio_pci.c | 114 +---
>  drivers/vfio/pci/vfio_pci_private.h |   5 ++
>  drivers/vfio/pci/vfio_pci_rdwr.c|  39 ++
>  include/uapi/linux/vfio.h   |  32 
>  4 files changed, 181 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 65a83fd0e8c0..e9a904ce3f0d 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -318,9 +318,20 @@ static void vfio_pci_dma_fault_release(struct
> vfio_pci_device *vdev,
>   kfree(vdev->fault_pages);
>  }
> 
> -static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> -struct vfio_pci_region *region,
> -struct vm_area_struct *vma)
> +static void
> +vfio_pci_dma_fault_response_release(struct vfio_pci_device *vdev,
> + struct vfio_pci_region *region)
> +{
> + if (vdev->dma_fault_response_wq)
> + destroy_workqueue(vdev->dma_fault_response_wq);
> + kfree(vdev->fault_response_pages);
> + vdev->fault_response_pages = NULL;
> +}
> +
> +static int __vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> +  struct vfio_pci_region *region,
> +  struct vm_area_struct *vma,
> +  u8 *pages)
>  {
>   u64 phys_len, req_len, pgoff, req_start;
>   unsigned long long addr;
> @@ -333,14 +344,14 @@ static int vfio_pci_dma_fault_mmap(struct
> vfio_pci_device *vdev,
>   ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>   req_start = pgoff << PAGE_SHIFT;
> 
> - /* only the second page of the producer fault region is mmappable */
> + /* only the second page of the fault region is mmappable */
>   if (req_start < PAGE_SIZE)
>   return -EINVAL;
> 
>   if (req_start + req_len > phys_len)
>   return -EINVAL;
> 
> - addr = virt_to_phys(vdev->fault_pages);
> + addr = virt_to_phys(pages);
>   vma->vm_private_data = vdev;
>   vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
> 
> @@ -349,13 +360,29 @@ static int vfio_pci_dma_fault_mmap(struct
> vfio_pci_device *vdev,
>   return ret;
>  }
> 
> -static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
> -  struct vfio_pci_region *region,
> -  struct vfio_info_cap *caps)
> +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> +struct vfio_pci_region *region,
> +struct vm_area_struct *vma)
> +{
> + return __vfio_pci_dma_fault_mmap(vdev, region, vma, vdev->fault_pages);
> +}
> +
> +static int
> +vfio_pci_dma_fault_response_mmap(struct vfio_pci_device *vdev,
> + struct vfio_pci_region *region,
> + struct vm_area_struct *vma)
> +{
> + return __vfio_pci_dma_fault_mmap(vdev, region, vma, vdev->fault_response_pages);
> +}
> +
> +static int __vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
> +struct vfio_pci_region *region,
> +struct vfio_info_cap *caps,
> +u32 cap_id)
>  {
>   struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
>   struct vfio_region_info_cap_fault cap = {
> - .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
> +
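
The commit message above describes userspace pushing page responses and bumping the head of a buffer, with the kernel popping them. As a rough single-threaded sketch of that producer/consumer idea, assuming a hypothetical entry layout and ring size (none of the names below come from the patch or the VFIO uapi, and a real shared ring would also need memory barriers):

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_ENTRIES 8u	/* hypothetical size; must be a power of two */

/* Hypothetical response record; the real layout is defined by the uapi. */
struct demo_page_response {
	uint32_t pasid;
	uint32_t grpid;
	uint32_t code;
};

struct demo_response_ring {
	uint32_t head;	/* advanced by the producer (userspace in the patch) */
	uint32_t tail;	/* advanced by the consumer (kernel in the patch) */
	struct demo_page_response entries[DEMO_RING_ENTRIES];
};

/* Producer side: store a response, then advance head. */
static bool demo_ring_push(struct demo_response_ring *r,
			   struct demo_page_response resp)
{
	if (r->head - r->tail == DEMO_RING_ENTRIES)
		return false;	/* ring is full */
	r->entries[r->head % DEMO_RING_ENTRIES] = resp;
	r->head++;
	return true;
}

/* Consumer side: fetch the oldest response, then advance tail. */
static bool demo_ring_pop(struct demo_response_ring *r,
			  struct demo_page_response *out)
{
	if (r->head == r->tail)
		return false;	/* ring is empty */
	*out = r->entries[r->tail % DEMO_RING_ENTRIES];
	r->tail++;
	return true;
}
```

Free-running 32-bit indices are used here so that full and empty can be told apart without sacrificing a slot; that is a design choice of this sketch, not a statement about how the series implements it.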

RE: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support

2021-01-07 Thread Shameerali Kolothum Thodi

> -Original Message-
> From: wanghuiqiang
> Sent: 06 January 2021 09:22
> To: Shameerali Kolothum Thodi ;
> 'Ard Biesheuvel' 
> Cc: 'Marc Zyngier' ; 'eric.au...@redhat.com'
> ; 'linux-kernel@vger.kernel.org'
> ; 'linux-arm-ker...@lists.infradead.org'
> ; Linuxarm ;
> xuwei (O) 
> Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
> Hi Ard and all,
> 
> The issue has been root-caused: it was introduced by a newly implemented
> BIOS feature. With the old BIOS, we used a static MADT table with GICV/GICH
> set to 0 and reported this table to the OS. But we added a new feature that
> dynamically updates the MADT table based on some external input, and the
> developer set GICV/GICH the way the previous-generation chipset code did.
> But in fact, this differs from the old-generation chipset code.
> I'll let my internal team know about this and fix the issue in a later BIOS
> release.

Thanks Wanghuiqiang for your efforts and confirming the issue.

Hi Marc,

Considering that there are systems out there with the faulty BIOS, and that
not everyone will necessarily be keen to update the BIOS, I think it is better
to address this in the kernel as well.

As discussed earlier, please consider the SRE bit based solution to make the
logic more robust irrespective of what the BIOS provides.

(I don't have an erratum id for this, as I am told we keep those for hardware
issues only, but we are using DTS202101070OAGUIP1L00 to track the issue and it
can be used as a reference).

Thanks,
Shameer 

> Thanks!
> 
> -Original Message-
> From: wanghuiqiang
> Sent: 15 December 2020 15:49
> To: Shameerali Kolothum Thodi
> ; Ard Biesheuvel
> 
> Cc: Marc Zyngier ; eric.au...@redhat.com;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> ; xuwei (O) 
> Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
> Sorry for the late response.
> Hi Shameer & Ard,
> 
> Could you let me know which firmware you are using? If the difference is in
> the MADT table vGIC entries you pointed out, they are the same. We changed
> the vGIC memory base address at a very early design stage.
> 
> Thanks!
> 
> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 2 December 2020 16:23
> To: Ard Biesheuvel 
> Cc: Marc Zyngier ; eric.au...@redhat.com;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> ; wanghuiqiang ; xuwei
> (O) 
> Subject: RE: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
> [+]
> 
> > -Original Message-
> > From: Ard Biesheuvel [mailto:a...@kernel.org]
> > Sent: 30 November 2020 18:32
> > To: Shameerali Kolothum Thodi 
> > Cc: Marc Zyngier ; eric.au...@redhat.com;
> > linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > Linuxarm 
> > Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy
> > support
> >
> ...
> 
> >
> > Any clue why production D06 firmware deviates from the D06 port that
> > exists in Tianocore's edk2-platforms repository? Because that version
> > does not have this bug, and I wonder why that code was upstreamed at
> > all if a substantially different version gets shipped with production
> > hardware.
> 
> Ok. Thanks for pointing this out. I have informed our UEFI team about this.
> They will check Internally and clarify.
> 
> Regards,
> Shameer

RE: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support

2020-12-15 Thread Shameerali Kolothum Thodi

Hi Wanghuiqiang,


> -Original Message-
> From: wanghuiqiang
> Sent: 15 December 2020 07:49
> To: Shameerali Kolothum Thodi ;
> Ard Biesheuvel 
> Cc: Marc Zyngier ; eric.au...@redhat.com;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> ; xuwei (O) 
> Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
> Sorry for the late response.
> Hi Shameer & Ard,
> 
> Could you let me know which firmware you are using? If the difference is in
> the MADT table vGIC entries you pointed out, they are the same. We changed
> the vGIC memory base address at a very early design stage.

I checked the ones below, and all of these boards have the issue:

Openlab-Board - 69009,

DMI: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020

Openlab-Board-69008,

DMI: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B030.01 07/03/2020

UK-D06CS-board,

Boot firmware (version 2280-V2 CS V3.B220.01 built at 03/19/2020  16:52)

Thanks,
Shameer

> Thanks!
> 
> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 2 December 2020 16:23
> To: Ard Biesheuvel 
> Cc: Marc Zyngier ; eric.au...@redhat.com;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> ; wanghuiqiang ; xuwei
> (O) 
> Subject: RE: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
> [+]
> 
> > -Original Message-
> > From: Ard Biesheuvel [mailto:a...@kernel.org]
> > Sent: 30 November 2020 18:32
> > To: Shameerali Kolothum Thodi 
> > Cc: Marc Zyngier ; eric.au...@redhat.com;
> > linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > Linuxarm 
> > Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy
> > support
> >
> ...
> 
> >
> > Any clue why production D06 firmware deviates from the D06 port that
> > exists in Tianocore's edk2-platforms repository? Because that version
> > does not have this bug, and I wonder why that code was upstreamed at
> > all if a substantially different version gets shipped with production
> > hardware.
> 
> Ok. Thanks for pointing this out. I have informed our UEFI team about this.
> They will check Internally and clarify.
> 
> Regards,
> Shameer

RE: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

2020-12-09 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 18 November 2020 11:22
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> alex.william...@redhat.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> Thodi ;
> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui 
> Subject: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage
> support
> 
> When nested stage translation is setup, both s1_cfg and
> s2_cfg are set.
> 
> We introduce a new smmu domain abort field that will be set
> upon guest stage1 configuration passing.
> 
> arm_smmu_write_strtab_ent() is modified to write both stage
> fields in the STE and deal with the abort field.
> 
> In nested mode, only stage 2 is "finalized" as the host does
> not own/configure the stage 1 context descriptor; guest does.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v10 -> v11:
> - Fix an issue reported by Shameer when switching from with vSMMU
>   to without vSMMU. Although the spec does not seem to mention it,
>   it seems necessary to reset the 2 high 64b when switching from
>   S1+S2 cfg to S1 only. In particular, dst[3] needs to be reset (S2TTB).
>   On some implementations, if the S2TTB is not reset, this causes
>   a C_BAD_STE error
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64
> +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>  2 files changed, 56 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 18ac5af1b284..412ea1bafa50 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>* three cases at the moment:
>*
>* 1. Invalid (all zero) -> bypass/fault (init)
> -  * 2. Bypass/fault -> translation/bypass (attach)
> -  * 3. Translation/bypass -> bypass/fault (detach)
> +  * 2. Bypass/fault -> single stage translation/bypass (attach)
> +  * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
> +  * 4. S2 -> S1 + S2 (attach_pasid_table)
> +  * 5. S1 + S2 -> S2 (detach_pasid_table)
>*
>* Given that we can't update the STE atomically and the SMMU
>* doesn't read the thing in a defined order, that leaves us
> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>* 3. Update Config, sync
>*/
>   u64 val = le64_to_cpu(dst[0]);
> - bool ste_live = false;
> + bool s1_live = false, s2_live = false, ste_live;
> + bool abort, nested = false, translate = false;
>   struct arm_smmu_device *smmu = NULL;
>   struct arm_smmu_s1_cfg *s1_cfg;
>   struct arm_smmu_s2_cfg *s2_cfg;
> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>   default:
>   break;
>   }
> + nested = s1_cfg->set && s2_cfg->set;

This is a problem when the guest is booted with iommu.passthrough=1, as we
set s1_cfg.set = false for IOMMU_PASID_CONFIG_BYPASS.

This results in BUG_ON(ste_live && !nested) being hit.

Can we instead have nested = true set a bit above in the code, where we set
s2_cfg->set = true for the ARM_SMMU_DOMAIN_NESTED case?

Please take a look.

Thanks,
Shameer
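
To illustrate the suggestion above, here is a minimal, self-contained sketch.
It does not use the driver's real types: the sketch_* names and the
passthrough_guest() helper are hypothetical stand-ins, and only the two
boolean expressions mirror what is discussed in this thread. It shows why
deriving "nested" from the two .set flags gives false for a passthrough
guest, while deriving it from the domain stage does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the SMMUv3 driver structures. */
enum sketch_stage {
	SKETCH_DOMAIN_S1,
	SKETCH_DOMAIN_S2,
	SKETCH_DOMAIN_NESTED,
};

struct sketch_cfg {
	bool set;
};

struct sketch_domain {
	enum sketch_stage stage;
	struct sketch_cfg s1_cfg;
	struct sketch_cfg s2_cfg;
};

/* As in the quoted hunk: nested only when both stage configs are set. */
bool nested_from_cfg(const struct sketch_domain *d)
{
	assert(d != NULL);
	return d->s1_cfg.set && d->s2_cfg.set;
}

/* Suggested alternative: nested follows the domain stage itself. */
bool nested_from_stage(const struct sketch_domain *d)
{
	assert(d != NULL);
	return d->stage == SKETCH_DOMAIN_NESTED;
}

/*
 * Models a nested domain whose guest booted with iommu.passthrough=1:
 * the guest stage 1 config is a bypass, so s1_cfg.set stays false.
 */
struct sketch_domain passthrough_guest(void)
{
	struct sketch_domain d = {
		.stage  = SKETCH_DOMAIN_NESTED,
		.s1_cfg = { .set = false },
		.s2_cfg = { .set = true },
	};
	return d;
}
```

With the domain returned by passthrough_guest(), nested_from_cfg() is false
even though the STE is live, which is the condition the BUG_ON checks for;
nested_from_stage() stays true.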

> + translate = s1_cfg->set || s2_cfg->set;
>   }
> 
>   if (val & STRTAB_STE_0_V) {
> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>   case STRTAB_STE_0_CFG_BYPASS:
>   break;
>   case STRTAB_STE_0_CFG_S1_TRANS:
> + s1_live = true;
> + break;
>   case STRTAB_STE_0_CFG_S2_TRANS:
> - ste_live = true;
> + s2_live = true;
> + break;
> + case STRTAB_STE_0_CFG_NESTED:
> + s1_live = true;
> + s2_live = true;
>   break;
>   case STRTAB_STE_0_CFG_ABORT:
> - BUG_ON(!disable_bypass);

RE: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-04 Thread Shameerali Kolothum Thodi

Hi Jean,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> Sent: 04 December 2020 09:54
> To: Shameerali Kolothum Thodi 
> Cc: Auger Eric ; wangxingang
> ; Xieyingtai ;
> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> vivek.gau...@arm.com; alex.william...@redhat.com;
> zhangfei@linaro.org; robin.mur...@arm.com;
> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
> ; qubingbing 
> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
> unmanaged ASIDs
> 
> Hi Shameer,
> 
> On Thu, Dec 03, 2020 at 06:42:57PM +, Shameerali Kolothum Thodi wrote:
> > Hi Jean/zhangfei,
> > Is it possible to have a branch with minimum required SVA/UACCE related
> patches
> > that are already public and can be a "stable" candidate for future respin of
> Eric's series?
> > Please share your thoughts.
> 
> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
> based on mainline? 

Yes. 

 The uacce-devel branches from
> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> (they track the latest sva/zip-devel branch
> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)

Thanks. 

Hi Eric,

Could you please take a look at the above branches and see whether it makes sense
to rebase on top of either of those?

From a vSVA point of view, it will be less rebase hassle if we can do that.

Thanks,
Shameer

> Thanks,
> Jean

RE: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-03 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: kvmarm-boun...@lists.cs.columbia.edu
> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Auger Eric
> Sent: 01 December 2020 13:59
> To: wangxingang 
> Cc: Xieyingtai ; jean-phili...@linaro.org;
> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> vivek.gau...@arm.com; alex.william...@redhat.com;
> zhangfei@linaro.org; robin.mur...@arm.com;
> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com
> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
> unmanaged ASIDs
> 
> Hi Xingang,
> 
> On 12/1/20 2:33 PM, Xingang Wang wrote:
> > Hi Eric
> >
> > On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
> >> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void
> *cookie)
> >> * insertion to guarantee those are observed before the TLBI. Do be
> >> * careful, 007.
> >> */
> >> -  if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> >> +  if (ext_asid >= 0) { /* guest stage 1 invalidation */
> >> +  cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
> >> +  cmd.tlbi.asid   = ext_asid;
> >> +  cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
> >> +  } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> >
> > Found a problem here, the cmd for guest stage 1 invalidation is built,
> > but it is not delivered to smmu.
> >
> 
> Thank you for the report. I will fix that soon. With that fixed, have
> you been able to run vSVA on top of the series. Do you need other stuff
> to be fixed at SMMU level? 

I am seeing another issue with this series. It shows up when the vSMMU is in
non-strict mode (iommu.strict=0). Running iperf on any pass-through network
device is enough to reproduce it: the traffic may randomly stop/hang.

It looks like the .flush_iotlb_all from guest is not propagated down to the host
correctly. I have a temp hack to fix this in Qemu wherein CMDQ_OP_TLBI_NH_ASID
will result in a CACHE_INVALIDATE with IOMMU_INV_GRANU_PASID flag and archid
set.

Please take a look and let me know. 
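
A rough sketch of the translation described above, for illustration only. The
struct, enum and flag below are hypothetical, simplified stand-ins and are not
the actual cache invalidation uapi layout or the Qemu code; only the idea (a
guest TLBI_NH_ASID becomes a PASID-granularity invalidation that carries the
ASID as archid) comes from the description in this mail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the invalidation granularity. */
enum sketch_granu {
	SKETCH_GRANU_DOMAIN,
	SKETCH_GRANU_PASID,
	SKETCH_GRANU_ADDR,
};

/* Hypothetical flag meaning "the archid field is valid". */
#define SKETCH_INV_FLAGS_ARCHID (1u << 1)

/* Hypothetical, flattened stand-in for the invalidation request. */
struct sketch_inv_info {
	enum sketch_granu granularity;
	uint32_t flags;
	uint32_t archid;	/* carries the guest ASID */
};

/*
 * Build the invalidation request for a guest CMDQ_OP_TLBI_NH_ASID:
 * PASID granularity, with the ASID passed down as archid.
 */
struct sketch_inv_info sketch_tlbi_nh_asid_to_inv(uint16_t asid)
{
	struct sketch_inv_info info;

	memset(&info, 0, sizeof(info));
	info.granularity = SKETCH_GRANU_PASID;
	info.flags = SKETCH_INV_FLAGS_ARCHID;
	info.archid = asid;
	return info;
}
```

The point of the sketch is only that an ASID-wide flush from the guest needs
some representation in the request passed to the host; how that is encoded
in the real interface is not shown here.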

As I am going to respin soon, please let me
> know what is the best branch to rebase to alleviate your integration.

Please find the latest kernel and Qemu branch with vSVA support added here,

https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva
https://github.com/hisilicon/qemu/tree/v5.2.0-rc1-2stage-rfcv7-vsva

I have done some basic minimum vSVA tests on a HiSilicon D06 board with
a zip dev that supports STALL. All looks good so far apart from the issues
that have been already reported/discussed.

The kernel branch is actually a rebase of sva/uacce related patches from a
Linaro branch here,

https://github.com/Linaro/linux-kernel-uadk/tree/uacce-devel-5.10

I think going forward it will be good (if possible) to respin your series on top
of an SVA branch with STALL/PRI support added.

Hi Jean/zhangfei,
Is it possible to have a branch with minimum required SVA/UACCE related patches
that are already public and can be a "stable" candidate for future respin of
Eric's series?
Please share your thoughts.

Thanks,
Shameer 

> Best Regards
> 
> Eric
> 
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

RE: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support

2020-12-02 Thread Shameerali Kolothum Thodi

[+]

> -Original Message-
> From: Ard Biesheuvel [mailto:a...@kernel.org]
> Sent: 30 November 2020 18:32
> To: Shameerali Kolothum Thodi 
> Cc: Marc Zyngier ; eric.au...@redhat.com;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> 
> Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
...

> 
> Any clue why production D06 firmware deviates from the D06 port that
> exists in Tianocore's edk2-platforms repository? Because that version
> does not have this bug, and I wonder why that code was upstreamed at
> all if a substantially different version gets shipped with production
> hardware.

Ok. Thanks for pointing this out. I have informed our UEFI team about this.
They will check internally and clarify.

Regards,
Shameer

RE: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support

2020-11-30 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Marc Zyngier [mailto:m...@kernel.org]
> Sent: 30 November 2020 14:57
> To: Shameerali Kolothum Thodi 
> Cc: linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> eric.au...@redhat.com; Linuxarm 
> Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
> Hi Shameer,
> 
> On 2020-11-30 13:55, Shameerali Kolothum Thodi wrote:
> > Hi Marc,
> >
> >> -Original Message-
> >> From: Marc Zyngier [mailto:m...@kernel.org]
> >> Sent: 30 November 2020 12:28
> >> To: Shameerali Kolothum Thodi 
> >> Cc: linux-kernel@vger.kernel.org;
> >> linux-arm-ker...@lists.infradead.org;
> >> eric.au...@redhat.com; Linuxarm 
> >> Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy
> >> support
> >>
> >> Hi Shameer,
> >>
> >> On 2020-11-30 10:26, Shameer Kolothum wrote:
> >> > At present, the support for GICv2 backward compatibility on GICv3/v4
> >> > hardware is determined based on whether DT/ACPI provides a memory
> >> > mapped phys base address for GIC virtual CPU interface register(GICV).
> >> > This creates a problem that a Qemu guest boot with default GIC(GICv2)
> >>
> >> That'd be true of *any* guest using GICv2, not just when using QEMU as
> >> the VMM, right?
> >
> > Yes, I would think so.
> >
> >> > hangs when firmware falsely reports this address on systems that don't
> >> > have support for legacy mode.
> >>
> >> And I guess it isn't just the guest that hangs, but the whole system
> >> can
> >> go south (it would be totally legitimate for the HW to deliver a
> >> SError).
> >
> > So far I haven’t seen that happening. I was able to kill the Guest and
> > recover.
> > But the annoying thing is Guest boot hangs at random places without any
> > error reported and people end up spending lot of time only to be told
> > later
> > that gic-version=3 is missing from their scripts.
> 
> That's pretty lucky. The guest has been reading/writing to random
> places,
> and depending on where this maps in the physical space, anything can
> happen. Out  of (morbid) curiosity, what is at the address pointed to by
> GICC in MADT?

This is what it reports,

[02Ch 0044   1]Subtable Type : 0B [Generic Interrupt Controller]
[02Dh 0045   1]   Length : 50
...
[04Ch 0076   8] Base Address : 9B00
[054h 0084   8] Virtual GIC Base Address : 9B02
[05Ch 0092   8]  Hypervisor GIC Base Address : 9B01
[064h 0100   4]Virtual GIC Interrupt : 0019
[068h 0104   8]   Redistributor Base Address : AE10
[070h 0112   8]ARM MPIDR : 0008
[078h 0120   1] Efficiency Class : 15
[079h 0121   3] Reserved : 001500

> >
> >> > As per GICv3/v4 spec, in an implementation that does not support legacy
> >> > operation, affinity routing and system register access are permanently
> >> > enabled. This means that the associated control bits are RAO/WI. Hence
> >> > use the ICC_SRE_EL1.SRE bit to decide whether hardware supports
> GICv2
> >> > mode in addition to the above firmware based check.
> >> >
> >> > Signed-off-by: Shameer Kolothum
> 
> >> > ---
> >> > On Hisilicon D06, UEFI sets the GIC MADT GICC gicv_base_address but
> the
> >> > GIC implementation on these boards doesn't have the GICv2 legacy
> >> > support.
> >> > This results in, Guest boot hang when Qemu uses the default GIC option.
> >>
> >> What a bore. Is this glorious firmware really out in the wild?
> >
> > :(. I am afraid it is.
> 
> Meh. We'll have to paper over it then. How urgent is that?

It is not that urgent, but 5.10 support would be nice :)

> 
> [...]
> 
> >> How about this instead? Completely untested, of course.
> >
> > Thanks for that. I just tested and it works.
> 
> OK. I'll rework it a bit and post it as a complete patch. Is there an
> erratum number on your side?

Sure. I am not sure whether there is an erratum, but I will check internally and
get back to you if there is one.

Thanks,
Shameer
> 
> Thanks,
> 
>  M.
> --
> Jazz is not dead. It just smells funny...

RE: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support

2020-11-30 Thread Shameerali Kolothum Thodi

Hi Marc,

> -Original Message-
> From: Marc Zyngier [mailto:m...@kernel.org]
> Sent: 30 November 2020 12:28
> To: Shameerali Kolothum Thodi 
> Cc: linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> eric.au...@redhat.com; Linuxarm 
> Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
> Hi Shameer,
> 
> On 2020-11-30 10:26, Shameer Kolothum wrote:
> > At present, the support for GICv2 backward compatibility on GICv3/v4
> > hardware is determined based on whether DT/ACPI provides a memory
> > mapped phys base address for GIC virtual CPU interface register(GICV).
> > This creates a problem that a Qemu guest boot with default GIC(GICv2)
> 
> That'd be true of *any* guest using GICv2, not just when using QEMU as
> the VMM, right?

Yes, I would think so.

> > hangs when firmware falsely reports this address on systems that don't
> > have support for legacy mode.
> 
> And I guess it isn't just the guest that hangs, but the whole system can
> go south (it would be totally legitimate for the HW to deliver a
> SError).

So far I haven't seen that happening. I was able to kill the guest and recover.
But the annoying thing is that the guest boot hangs at random places without any
error reported, and people end up spending a lot of time only to be told later
that gic-version=3 is missing from their scripts.
 
> > As per GICv3/v4 spec, in an implementation that does not support legacy
> > operation, affinity routing and system register access are permanently
> > enabled. This means that the associated control bits are RAO/WI. Hence
> > use the ICC_SRE_EL1.SRE bit to decide whether hardware supports GICv2
> > mode in addition to the above firmware based check.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> > On Hisilicon D06, UEFI sets the GIC MADT GICC gicv_base_address but the
> > GIC implementation on these boards doesn't have the GICv2 legacy
> > support.
> > This results in, Guest boot hang when Qemu uses the default GIC option.
> 
> What a bore. Is this glorious firmware really out in the wild?
 
:(. I am afraid it is. 

> > With this patch, the Qemu Guest with GICv2 now gracefully exits,
> >  "qemu-system-aarch64: host does not support in-kernel GICv2 emulation"
> >
> > Not very sure there is a better way to detect this other than checking
> > the SRE bit as done in this patch(Of course, we will be fixing the UEFI
> > going forward).
> 
> I don't think there is any other reliable way, but see below.
> 
> >
> > Thanks,
> > Shameer
> >
> > ---
> >  drivers/irqchip/irq-gic-v3.c | 33 -
> >  1 file changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3.c
> > b/drivers/irqchip/irq-gic-v3.c
> > index 16fecc0febe8..15fa1eea45e4 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -1835,6 +1835,27 @@ static void __init
> > gic_populate_ppi_partitions(struct device_node *gic_node)
> > of_node_put(parts_node);
> >  }
> >
> > +/* SRE bit being RAO/WI implies no GICv2 legacy mode support */
> > +static bool __init gic_gicv2_compatible(void)
> > +{
> > +   u32 org, val;
> > +
> > +   org = gic_read_sre();
> > +   if (!(org & ICC_SRE_EL1_SRE))
> > +   return true;
> > +
> > +   val = org & ~ICC_SRE_EL1_SRE;
> > +   gic_write_sre(val);
> > +
> > +   val = gic_read_sre();
> > +   gic_write_sre(org);
> > +
> > +   if (val & ICC_SRE_EL1_SRE)
> > +   return false;
> > +
> > +   return true;
> > +}
> > +
> >  static void __init gic_of_setup_kvm_info(struct device_node *node)
> >  {
> > int ret;
> > @@ -1851,10 +1872,12 @@ static void __init
> > gic_of_setup_kvm_info(struct device_node *node)
> >  _idx))
> > gicv_idx = 1;
> >
> > -   gicv_idx += 3;  /* Also skip GICD, GICC, GICH */
> > -   ret = of_address_to_resource(node, gicv_idx, );
> > -   if (!ret)
> > -   gic_v3_kvm_info.vcpu = r;
> > +   if (gic_gicv2_compatible()) {
> > +   gicv_idx += 3;  /* Also skip GICD, GICC, GICH */
> > +   ret = of_address_to_resource(node, gicv_idx, );
> > +   if (!ret)
> > +   gic_v3_kvm_info.vcpu = r;
> > +   }
> >
> > gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
> > gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
> > @@ -2164,7 +2187,7 @@ static void __init gic_acpi_setup_kvm

RE: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support

2020-11-30 Thread Shameerali Kolothum Thodi

Hi Zenghui,

> -Original Message-
> From: yuzenghui
> Sent: 30 November 2020 11:51
> To: Shameerali Kolothum Thodi ;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org
> Cc: m...@kernel.org; Linuxarm ;
> eric.au...@redhat.com
> Subject: Re: [PATCH] irqchip/gic-v3: Check SRE bit for GICv2 legacy support
> 
> Hi Shameer,
> 
> On 2020/11/30 18:26, Shameer Kolothum wrote:
> > At present, the support for GICv2 backward compatibility on GICv3/v4
> > hardware is determined based on whether DT/ACPI provides a memory
> > mapped phys base address for GIC virtual CPU interface register(GICV).
> > This creates a problem that a Qemu guest boot with default GIC(GICv2)
> > hangs when firmware falsely reports this address on systems that don't
> > have support for legacy mode.
> 
> So the problem is that BIOS has provided us a bogus GICC Structure.

Yes. And kernel uses this field to determine the legacy support.

> 
> > As per GICv3/v4 spec, in an implementation that does not support legacy
> > operation, affinity routing and system register access are permanently
> > enabled. This means that the associated control bits are RAO/WI. Hence
> > use the ICC_SRE_EL1.SRE bit to decide whether hardware supports GICv2
> > mode in addition to the above firmware based check.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> > On Hisilicon D06, UEFI sets the GIC MADT GICC gicv_base_address but the
> > GIC implementation on these boards doesn't have the GICv2 legacy support.
> > This results in, Guest boot hang when Qemu uses the default GIC option.
> >
> > With this patch, the Qemu Guest with GICv2 now gracefully exits,
> >   "qemu-system-aarch64: host does not support in-kernel GICv2 emulation"
> >
> > Not very sure there is a better way to detect this other than checking
> > the SRE bit as done in this patch(Of course, we will be fixing the UEFI
> > going forward).
> 
> Yes, I had seen the same problem on the D06. But I *do* think it's the
> firmware that actually needs to be fixed.

Well, I am not sure I agree with that. The ACPI spec 6.3, section 5.2.12.14, says,
"If the platform is not presenting a GICv2 with virtualization extensions this
field *can* be 0". So I don't think it mandates that.

> 
> > Thanks,
> > Shameer
> >
> > ---
> >   drivers/irqchip/irq-gic-v3.c | 33 -
> >   1 file changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index 16fecc0febe8..15fa1eea45e4 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -1835,6 +1835,27 @@ static void __init
> gic_populate_ppi_partitions(struct device_node *gic_node)
> > of_node_put(parts_node);
> >   }
> >
> > +/* SRE bit being RAO/WI implies no GICv2 legacy mode support */
> 
> I'm wondering if this is a mandate of the architecture.

As I mentioned above, I am not sure this is the best way, though. Section 1.3.5
of the GICv3 spec says (for the no legacy support case), "affinity routing and
system register access are permanently enabled. This means that the associated
control bits are RAO/WI".

But again, later in the spec, it uses "might choose to make this bit RAO/WI".
So it is arguable whether it mandates it or not.

I leave that to Marc :)

Thanks,
Shameer 
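
For readers following the RAO/WI argument, here is a small user-space model of
the probe in the patch. The register is simulated: struct sketch_sre_reg and
the sketch_* helpers are hypothetical and only mimic the "read-as-one, writes
ignored" behaviour; the real code uses gic_read_sre()/gic_write_sre() on
ICC_SRE_EL1. Whether the architecture actually mandates RAO/WI here is the
open question above, so treat this as a model of the check, not of the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SRE_BIT 0x1u	/* stand-in for ICC_SRE_EL1_SRE */

/* Simulated register; raowi = true models an SRE bit that is RAO/WI. */
struct sketch_sre_reg {
	uint32_t value;
	bool raowi;
};

uint32_t sketch_read_sre(const struct sketch_sre_reg *r)
{
	/* RAO: the bit always reads as one. */
	return r->raowi ? (r->value | SKETCH_SRE_BIT) : r->value;
}

void sketch_write_sre(struct sketch_sre_reg *r, uint32_t v)
{
	/* WI: writes cannot clear the bit. */
	r->value = r->raowi ? (v | SKETCH_SRE_BIT) : v;
}

/* Same steps as gic_gicv2_compatible() in the patch, on the model. */
bool sketch_gicv2_compatible(struct sketch_sre_reg *r)
{
	uint32_t org, val;

	assert(r != NULL);

	org = sketch_read_sre(r);
	if (!(org & SKETCH_SRE_BIT))
		return true;

	/* Try to clear SRE, read it back, then restore the original. */
	sketch_write_sre(r, org & ~SKETCH_SRE_BIT);
	val = sketch_read_sre(r);
	sketch_write_sre(r, org);

	/* If the bit could not be cleared, assume no legacy support. */
	return !(val & SKETCH_SRE_BIT);
}
```

On the model, a register with raowi = false reports legacy support and is left
with its original value, while one with raowi = true reports no legacy support.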

> > enabled.
> > +static bool __init gic_gicv2_compatible(void)
> > +{
> > +   u32 org, val;
> > +
> > +   org = gic_read_sre();
> > +   if (!(org & ICC_SRE_EL1_SRE))
> > +   return true;
> > +
> > +   val = org & ~ICC_SRE_EL1_SRE;
> > +   gic_write_sre(val);
> > +
> > +   val = gic_read_sre();
> > +   gic_write_sre(org);
> > +
> > +   if (val & ICC_SRE_EL1_SRE)
> > +   return false;
> > +
> > +   return true;
> > +}
> > +
> >   static void __init gic_of_setup_kvm_info(struct device_node *node)
> >   {
> > int ret;
> > @@ -1851,10 +1872,12 @@ static void __init gic_of_setup_kvm_info(struct
> device_node *node)
> >  _idx))
> > gicv_idx = 1;
> >
> > -   gicv_idx += 3;  /* Also skip GICD, GICC, GICH */
> > -   ret = of_address_to_resource(node, gicv_idx, );
> > -   if (!ret)
> > -   gic_v3_kvm_info.vcpu = r;
> > +   if (gic_gicv2_compatible()) {
> > +   gicv_idx += 3;  /* Also skip GICD, GICC, GICH */
> > +   ret = of_address_to_resource(node, gicv_idx, );
> > +   if (!ret)
> > +   gic_v3_kvm_info.vcpu = r;
> > +   }
> >
> > gic_v3_kvm_info.has_v4 = gic_data.rdists.has_vlpis;
> > gic_v3_kvm_info.has_v4_1 = gic_data.rdists.has_rvpeid;
> > @@ -2164,7 +2187,7 @@ static void __init gic_acpi_setup_kvm_info(void)
> >
> > gic_v3_kvm_info.maint_irq = irq;
> >
> > -   if (acpi_data.vcpu_base) {
> > +   if (gic_gicv2_compatible() && acpi_data.vcpu_base) {
> > struct resource *vcpu = _v3_kvm_info.vcpu;
> >
> > vcpu->flags = IORESOURCE_MEM;
> 
> Thanks,
> Zenghui

RE: [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices

2020-11-23 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 16 November 2020 11:00
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> alex.william...@redhat.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> Thodi ;
> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui 
> Subject: [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt
> indices
> 
> Implement IRQ capability chain infrastructure. All interrupt
> indexes beyond VFIO_PCI_NUM_IRQS are handled as extended
> interrupts. They are registered with a specific type/subtype
> and supported flags.
> 
> Signed-off-by: Eric Auger 
> ---
>  drivers/vfio/pci/vfio_pci.c | 99 +++--
>  drivers/vfio/pci/vfio_pci_intrs.c   | 62 ++
>  drivers/vfio/pci/vfio_pci_private.h | 14 
>  3 files changed, 157 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 2a6cc1a87323..93e03a4a5f32 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -608,6 +608,14 @@ static void vfio_pci_disable(struct vfio_pci_device
> *vdev)
> 
>   WARN_ON(iommu_unregister_device_fault_handler(>pdev->dev));
> 
> + for (i = 0; i < vdev->num_ext_irqs; i++)
> + vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
> + VFIO_IRQ_SET_ACTION_TRIGGER,
> + VFIO_PCI_NUM_IRQS + i, 0, 0, NULL);
> + vdev->num_ext_irqs = 0;
> + kfree(vdev->ext_irqs);
> + vdev->ext_irqs = NULL;
> +
>   /* Device closed, don't need mutex here */
>   list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
>>ioeventfds_list, next) {
> @@ -823,6 +831,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device
> *vdev, int irq_type)
>   return 1;
>   } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>   return 1;
> + } else if (irq_type >= VFIO_PCI_NUM_IRQS &&
> +irq_type < VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs) {
> + return 1;
>   }
> 
>   return 0;
> @@ -1008,7 +1019,7 @@ static long vfio_pci_ioctl(void *device_data,
>   info.flags |= VFIO_DEVICE_FLAGS_RESET;
> 
>   info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
> - info.num_irqs = VFIO_PCI_NUM_IRQS;
> + info.num_irqs = VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs;
> 
>   if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV)) {
>   int ret = vfio_pci_info_zdev_add_caps(vdev, );
> @@ -1187,36 +1198,87 @@ static long vfio_pci_ioctl(void *device_data,
> 
>   } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>   struct vfio_irq_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + unsigned long capsz;
> 
>   minsz = offsetofend(struct vfio_irq_info, count);
> 
> + /* For backward compatibility, cannot require this */
> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
> +
>   if (copy_from_user(, (void __user *)arg, minsz))
>   return -EFAULT;
> 
> - if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
> + if (info.argsz < minsz ||
> + info.index >= VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs)
>   return -EINVAL;
> 
> - switch (info.index) {
> - case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
> - case VFIO_PCI_REQ_IRQ_INDEX:
> - break;
> - case VFIO_PCI_ERR_IRQ_INDEX:
> - if (pci_is_pcie(vdev->pdev))
> - break;
> - fallthrough;
> - default:
> - return -EINVAL;
> - }
> + if (info.argsz >= capsz)
> + minsz = capsz;
> 
>   info.flags = VFIO_IRQ_INFO_EVENTFD;
> 
> - info.count = vfio_pci_get_irq_count(vdev, info.index);
> -
> - if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
> + switch (info.index) {
> +

RE: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg

2020-11-17 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 16 November 2020 10:43
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> Thodi ;
> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> yi.l@intel.com; t...@semihalf.com; nicoleots...@gmail.com
> Subject: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and
> s2_cfg
> 
> In preparation for the introduction of nested stages
> let's turn s1_cfg and s2_cfg fields into pointers which are
> dynamically allocated depending on the smmu_domain stage.

This will break the build if CONFIG_ARM_SMMU_V3_SVA is enabled, because of
https://github.com/eauger/linux/blob/5.10-rc4-2stage-v12/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c#L40

Do we really need to make these pointers?

Thanks,
Shameer
 
> In nested mode, both stages will coexist and s1_cfg will
> be allocated when the guest configuration gets passed.
> 
> Signed-off-by: Eric Auger 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 83 -
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 +-
>  2 files changed, 48 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index d828d6cbeb0e..4baf9fafe462 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -953,9 +953,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct
> arm_smmu_domain *smmu_domain,
>   unsigned int idx;
>   struct arm_smmu_l1_ctx_desc *l1_desc;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> - struct arm_smmu_ctx_desc_cfg *cdcfg = _domain->s1_cfg.cdcfg;
> + struct arm_smmu_ctx_desc_cfg *cdcfg =
> _domain->s1_cfg->cdcfg;
> 
> - if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
> + if (smmu_domain->s1_cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>   return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS;
> 
>   idx = ssid >> CTXDESC_SPLIT;
> @@ -990,7 +990,7 @@ int arm_smmu_write_ctx_desc(struct
> arm_smmu_domain *smmu_domain, int ssid,
>   __le64 *cdptr;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> 
> - if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax)))
> + if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg->s1cdmax)))
>   return -E2BIG;
> 
>   cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
> @@ -1056,7 +1056,7 @@ static int arm_smmu_alloc_cd_tables(struct
> arm_smmu_domain *smmu_domain)
>   size_t l1size;
>   size_t max_contexts;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> - struct arm_smmu_s1_cfg *cfg = _domain->s1_cfg;
> + struct arm_smmu_s1_cfg *cfg = smmu_domain->s1_cfg;
>   struct arm_smmu_ctx_desc_cfg *cdcfg = >cdcfg;
> 
>   max_contexts = 1 << cfg->s1cdmax;
> @@ -1104,7 +1104,7 @@ static void arm_smmu_free_cd_tables(struct
> arm_smmu_domain *smmu_domain)
>   int i;
>   size_t size, l1size;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> - struct arm_smmu_ctx_desc_cfg *cdcfg = _domain->s1_cfg.cdcfg;
> + struct arm_smmu_ctx_desc_cfg *cdcfg =
> _domain->s1_cfg->cdcfg;
> 
>   if (cdcfg->l1_desc) {
>   size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
> @@ -1211,17 +1211,8 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>   }
> 
>   if (smmu_domain) {
> - switch (smmu_domain->stage) {
> - case ARM_SMMU_DOMAIN_S1:
> - s1_cfg = _domain->s1_cfg;
> - break;
> - case ARM_SMMU_DOMAIN_S2:
> - case ARM_SMMU_DOMAIN_NESTED:
> - s2_cfg = _domain->s2_cfg;
> - break;
> - default:
> - break;
> - }
> + s1_cfg = smmu_domain->s1_cfg;
> + s2_cfg = smmu_domain->s2_cfg;
>   }
> 
>   if (val & STRTAB_STE_0_V) {
> @@ -1664,10 +1655,10 @@ static void arm_smmu_tlb_inv_context(void
> *cookie)
>* careful, 007.
>*/
>   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> - arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
> +

RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-11-17 Thread Shameerali Kolothum Thodi

Hi Eric,

First, many thanks for the respin. I will go through all of these (iommu/vfio/Qemu)
and will do a thorough verification/tests on our hardware.

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 17 November 2020 08:40
> To: Shameerali Kolothum Thodi ;
> Zhangfei Gao ; eric.auger@gmail.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
> jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
> t...@semihalf.com; bbhush...@marvell.com
> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Shameer,
> 
> On 5/13/20 5:57 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Auger Eric [mailto:eric.au...@redhat.com]
> >> Sent: 13 May 2020 14:29
> >> To: Shameerali Kolothum Thodi ;
> >> Zhangfei Gao ; eric.auger@gmail.com;
> >> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> >> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> >> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
> >> jacob.jun@linux.intel.com; yi.l@intel.com;
> peter.mayd...@linaro.org;
> >> t...@semihalf.com; bbhush...@marvell.com
> >> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> >>
> > [...]
> >
> >>>>> Yes that's normal this series is not meant to support vSVM at this 
> >>>>> stage.
> >>>>>
> >>>>> I intend to add the missing pieces during the next weeks.
> >>>>
> >>>> Thanks for that. I have made an attempt to add the vSVA based on
> >>>> your v10 + JPBs sva patches. The host kernel and Qemu changes can
> >>>> be found here[1][2].
> >>>>
> >>>> This basically adds multiple pasid support on top of your changes.
> >>>> I have done some basic sanity testing and we have some initial success
> >>>> with the zip vf dev on our D06 platform. Please note that the STALL event is
> >>>> not yet supported though, but works fine if we mlock() guest usr mem.
> >>>
> >>> I have added STALL support for our vSVA prototype and it seems to be
> >>> working(on our hardware). I have updated the kernel and qemu branches with
> >>> the same[1][2]. I should warn you though that these are prototype code and I
> >>> am pretty much re-using the VFIO_IOMMU_SET_PASID_TABLE interface for almost
> >>> everything. But thought of sharing, in case if it is useful somehow!.
> >>
> >> Thank you again for sharing the POC. I looked at the kernel and QEMU
> >> branches.
> >>
> >> Here are some preliminary comments:
> >> - "arm-smmu-v3: Reset S2TTB while switching back from nested stage": as
> >> you mentioned S2TTB reset now is featured in v11
> >
> > Yes.
> >
> >> - "arm-smmu-v3: Add support for multiple pasid in nested mode": I could
> >> easily integrate this into my series. Update the iommu api first and
> >> pass multiple CD info in a separate patch
> >
> > Ok.
> in v12, I added
> [PATCH v12 14/15] iommu/smmuv3: Accept configs with more than one
> context descriptor
> 
> I don't think you need to add s1cdmax addition as we already have
> pasid_bits which should do the job.

Ok.
 
> >> - "arm-smmu-v3: Add support to Invalidate CD": CD invalidation should be
> >> cascaded to host through the PASID cache invalidation uapi (no pb you
> >> warned us for the POC you simply used VFIO_IOMMU_SET_PASID_TABLE). I
> >> think I should add this support in my original series although it does
> >> not seem to trigger any issue up to now.
> >
> > Agree. Cache invalidation uapi is a better interface for this. Also I don’t
> > think this matters for non-vsva cases as Guest kernel table/CD(pasid 0) will
> > never get invalidated.
> in v12 I added [PATCH v12 15/15] iommu/smmuv3: Add PASID cache
> invalidation per PASID. I have not tested it though.

Ok. Will verify this.

> >> - "arm-smmu-v3: Remove duplication of fault propagation". I understand
> >> the transcode is done somewhere else with SVA but we

RE: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

2020-10-27 Thread Shameerali Kolothum Thodi

Hi Eric,

> -----Original Message-----
> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> Auger Eric
> Sent: 23 September 2020 12:47
> To: yuzenghui ; eric.auger@gmail.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> yi.l@intel.com; robin.mur...@arm.com
> Subject: Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

...

> > Besides, before going through the whole series [1][2], I'd like to
> > know if this is the latest version of your Nested-Stage-Setup work in
> > case I had missed something.
> >
> > [1]
> > https://lore.kernel.org/r/20200320161911.27494-1-eric.au...@redhat.com
> > [2]
> > https://lore.kernel.org/r/20200414150607.28488-1-eric.au...@redhat.com
> 
> yes those 2 series are the last ones. Thank you for reviewing.
> 
> FYI, I intend to respin within a week or 2 on top of Jacob's  [PATCH v10 0/7]
> IOMMU user API enhancement. 

Thanks for that. Also, is there any plan to respin the related QEMU series as
well? I know the dual stage interface proposals are still under discussion, but
it would be nice to have a testable solution based on the new interfaces for
ARM64 as well. Happy to help with any tests or verification.

Please let me know.

Thanks,
Shameer

RE: [PATCH v2 4/9] iommu/ioasid: Add reference couting functions

2020-09-24 Thread Shameerali Kolothum Thodi

Hi Jacob,

> -----Original Message-----
> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> Jacob Pan
> Sent: 22 August 2020 05:35
> To: io...@lists.linux-foundation.org; LKML ;
> Jean-Philippe Brucker ; Lu Baolu
> ; Joerg Roedel ; David
> Woodhouse 
> Cc: Tian, Kevin ; Raj Ashok ; Wu
> Hao 
> Subject: [PATCH v2 4/9] iommu/ioasid: Add reference couting functions
> 
> There can be multiple users of an IOASID, each user could have hardware
> contexts associated with the IOASID. In order to align lifecycles,
> reference counting is introduced in this patch. It is expected that when
> an IOASID is being freed, each user will drop a reference only after its
> context is cleared.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/ioasid.c | 113
> +
>  include/linux/ioasid.h |   4 ++
>  2 files changed, 117 insertions(+)
> 
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index f73b3dbfc37a..5f31d63c75b1 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -717,6 +717,119 @@ int ioasid_set_for_each_ioasid(struct ioasid_set
> *set,
>  EXPORT_SYMBOL_GPL(ioasid_set_for_each_ioasid);
> 
>  /**
> + * IOASID refcounting rules
> + * - ioasid_alloc() set initial refcount to 1
> + *
> + * - ioasid_free() decrement and test refcount.
> + * If refcount is 0, ioasid will be freed. Deleted from the system-wide
> + * xarray as well as per set xarray. The IOASID will be returned to the
> + * pool and available for new allocations.
> + *
> + * If refcount is non-zero, mark IOASID as IOASID_STATE_FREE_PENDING.
> + * No new reference can be added. The IOASID is not returned to the pool
> + * for reuse.
> + * After free, ioasid_get() will return error but ioasid_find() and other
> + * non refcount adding APIs will continue to work until the last reference
> + * is dropped
> + *
> + * - ioasid_get() get a reference on an active IOASID
> + *
> + * - ioasid_put() decrement and test refcount of the IOASID.
> + * If refcount is 0, ioasid will be freed. Deleted from the system-wide
> + * xarray as well as per set xarray. The IOASID will be returned to the
> + * pool and available for new allocations.
> + * Do nothing if refcount is non-zero.
> + *

Would it be better for this to return a value indicating whether the ioasid
was freed?

I was going through Jean's SMMUv3 SVA patches[1], and there it returns true
if the ioasid was freed. That info is subsequently used to reset the pasid
associated with an mm. Though I am not sure whether that is still relevant.

Thanks,
Shameer
[1] https://lore.kernel.org/linux-iommu/20200918101852.582559-3-jean-phili...@linaro.org/
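To make the suggestion concrete, here is a minimal user-space sketch of a put operation that reports whether the ID was actually freed. All names (demo_ioasid, demo_get, demo_put, demo_free) are hypothetical stand-ins, not the kernel API, and the handling of a repeated free as a no-op is an assumption based on the refcounting rules quoted in the patch comment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-IOASID data; not the kernel type. */
struct demo_ioasid {
    unsigned int users;  /* reference count, 1 right after allocation */
    bool free_pending;   /* free was requested while references remain */
    bool freed;          /* the ID has been returned to the pool */
};

/* Allocation sets the initial refcount to 1. */
static void demo_alloc(struct demo_ioasid *d)
{
    d->users = 1;
    d->free_pending = false;
    d->freed = false;
}

/* Take a reference; refused once a free is pending or done. */
static int demo_get(struct demo_ioasid *d)
{
    if (d->freed || d->free_pending)
        return -1;
    d->users++;
    return 0;
}

/* Drop a reference; returns true only if this call freed the ID. */
static bool demo_put(struct demo_ioasid *d)
{
    if (d->freed || d->users == 0)
        return false;
    if (--d->users > 0)
        return false;
    d->freed = true;
    return true;
}

/*
 * Free request: drops the allocation reference once. If other users
 * remain, the ID is only marked pending. Calling it again is a no-op
 * here (an assumption of this sketch).
 */
static bool demo_free(struct demo_ioasid *d)
{
    if (d->freed || d->free_pending)
        return false;
    if (demo_put(d))
        return true;
    d->free_pending = true;
    return false;
}
```

With a boolean result, a caller that caches the ID (for example alongside an mm) can clear its cached value exactly when demo_put() returns true, instead of guessing whether other users still hold references.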

> + * - ioasid_find() does not take reference, caller must hold reference
> + *
> + * ioasid_free() can be called multiple times without error until all refs are
> + * dropped.
> + */
> +
> +int ioasid_get_locked(struct ioasid_set *set, ioasid_t ioasid)
> +{
> + struct ioasid_data *data;
> +
> + data = xa_load(&active_allocator->xa, ioasid);
> + if (!data) {
> + pr_err("Trying to get unknown IOASID %u\n", ioasid);
> + return -EINVAL;
> + }
> + if (data->state == IOASID_STATE_FREE_PENDING) {
> + pr_err("Trying to get IOASID being freed%u\n", ioasid);
> + return -EBUSY;
> + }
> +
> + if (set && data->set != set) {
> + pr_err("Trying to get IOASID not in set%u\n", ioasid);
> + /* data found but does not belong to the set */
> + return -EACCES;
> + }
> + refcount_inc(&data->users);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_get_locked);
> +
> +/**
> + * ioasid_get - Obtain a reference of an ioasid
> + * @set
> + * @ioasid
> + *
> + * Check set ownership if @set is non-null.
> + */
> +int ioasid_get(struct ioasid_set *set, ioasid_t ioasid)
> +{
> + int ret = 0;
> +
> + spin_lock(&ioasid_allocator_lock);
> + ret = ioasid_get_locked(set, ioasid);
> + spin_unlock(&ioasid_allocator_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_get);
> +
> +void ioasid_put_locked(struct ioasid_set *set, ioasid_t ioasid)
> +{
> + struct ioasid_data *data;
> +
> + data = xa_load(&active_allocator->xa, ioasid);
> + if (!data) {
> + pr_err("Trying to put unknown IOASID %u\n", ioasid);
> + return;
> + }
> +
> + if (set && data->set != set) {
> + pr_err("Trying to drop IOASID not in the set %u\n", ioasid);
> + return;
> + }
> +
> + if (!refcount_dec_and_test(&data->users)) {
> + pr_debug("%s: IOASID %d has %d remainning users\n",
> + __func__, ioasid, refcount_read(&data->users));
> + return;
> + }
> + ioasid_do_free(data);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_put_locked);
> +
> +/**
> + * ioasid_put - Drop a reference of an ioasid
> + * @set
> + * @ioasid
> + *
> + * Check set ownership if @set is non-null.
>

RE: [RFC PATCH] iommu/arm-smmu: Add module parameter to set msi iova address

2020-05-28 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 28 May 2020 12:48
> To: Shameerali Kolothum Thodi ;
> Jean-Philippe Brucker 
> Cc: Robin Murphy ; Joerg Roedel
> ; io...@lists.linux-foundation.org; Linux Kernel Mailing
> List ; Alex Williamson
> ; Srinath Mannam
> ; BCM Kernel Feedback
> ; Will Deacon ;
> Linux ARM 
> Subject: Re: [RFC PATCH] iommu/arm-smmu: Add module parameter to set msi
> iova address
> 
> 
> 
> On 5/28/20 11:15 AM, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.au...@redhat.com]
> >> Sent: 28 May 2020 09:54
> >> To: Jean-Philippe Brucker 
> >> Cc: Will Deacon ; Joerg Roedel ;
> >> io...@lists.linux-foundation.org; Shameerali Kolothum Thodi
> >> ; Linux Kernel Mailing List
> >> ; Alex Williamson
> >> ; Srinath Mannam
> >> ; BCM Kernel Feedback
> >> ; Robin Murphy
> >> ; Linux ARM
> 
> >> Subject: Re: [RFC PATCH] iommu/arm-smmu: Add module parameter to set
> msi
> >> iova address
> >>
> >> Hi,
> >>
> >> On 5/28/20 10:38 AM, Jean-Philippe Brucker wrote:
> >>> [+ Shameer]
> >>>
> >>> On Thu, May 28, 2020 at 09:43:46AM +0200, Auger Eric wrote:
> >>>> Hi,
> >>>>
> >>>> On 5/28/20 9:23 AM, Jean-Philippe Brucker wrote:
> >>>>> On Thu, May 28, 2020 at 10:45:14AM +0530, Srinath Mannam wrote:
> >>>>>> On Wed, May 27, 2020 at 11:00 PM Robin Murphy
> >>  wrote:
> >>>>>>>
> >>>>>> Thanks Robin for your quick response.
> >>>>>>> On 2020-05-27 17:03, Srinath Mannam wrote:
> >>>>>>>> This patch gives the provision to change default value of MSI IOVA
> base
> >>>>>>>> to platform's suitable IOVA using module parameter. The present
> >>>>>>>> hardcoded MSI IOVA base may not be the accessible IOVA ranges of
> >> platform.
> >>>>>>>
> >>>>>>> That in itself doesn't seem entirely unreasonable; IIRC the current
> >>>>>>> address is just an arbitrary choice to fit nicely into Qemu's memory
> >>>>>>> map, and there was always the possibility that it wouldn't suit
> >> everything.
> >>>>>>>
> >>>>>>>> Since commit aadad097cd46 ("iommu/dma: Reserve IOVA for PCIe
> >> inaccessible
> >>>>>>>> DMA address"), inaccessible IOVA address ranges parsed from
> >> dma-ranges
> >>>>>>>> property are reserved.
> >>>>>
> >>>>> I don't understand why we only reserve the PCIe windows for DMA
> >> domains.
> >>>>> Shouldn't VFIO also prevent userspace from mapping them?
> >>>>
> >>>> VFIO prevents userspace from DMA mapping iovas within reserved
> regions:
> >>>> 9b77e5c79840  vfio/type1: check dma map request is within a valid iova
> >> range
> >>>
> >>> Right but I was asking specifically about the IOVA reservation introduced
> >>> by commit aadad097cd46. They are not registered as reserved regions within
> >>> the IOMMU core, they are only taken into account by dma-iommu.c when
> >>> creating a DMA domain. As VFIO uses UNMANAGED domains, it isn't aware of
> >>> those regions and they won't be seen by vfio_iommu_resv_exclude().
> >>>
> >>> It looks like the PCIe regions used to be common until cd2c9fcf5c66
> >>> ("iommu/dma: Move PCI window region reservation back into dma specific
> >>> path.") But I couldn't find the justification for this commit.
> >>
> >> Yes I noticed that as well when debugging the above mentioned case
> >> before and after cd2c9fcf5c66. I do not remember about the rationale of
> >> removing the DMA host bridge windows from the resv regions. Did it break
> >> a legacy case?
> >>>
> >
> > I think yes. And going through the ML discussions, this was done so because
> > with the " vfio/type1: Add support for valid iova list management" series
> > you reported an issue with Seattle platform. See the full discussion here,
> >
> > https://lore.kernel.org/patchwork/patch/889012/
> 
> Hey thank you for reminding me of the Seattle case :-) Now I also recall
> that, i

RE: [RFC PATCH] iommu/arm-smmu: Add module parameter to set msi iova address

2020-05-28 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 28 May 2020 09:54
> To: Jean-Philippe Brucker 
> Cc: Will Deacon ; Joerg Roedel ;
> io...@lists.linux-foundation.org; Shameerali Kolothum Thodi
> ; Linux Kernel Mailing List
> ; Alex Williamson
> ; Srinath Mannam
> ; BCM Kernel Feedback
> ; Robin Murphy
> ; Linux ARM 
> Subject: Re: [RFC PATCH] iommu/arm-smmu: Add module parameter to set msi
> iova address
> 
> Hi,
> 
> On 5/28/20 10:38 AM, Jean-Philippe Brucker wrote:
> > [+ Shameer]
> >
> > On Thu, May 28, 2020 at 09:43:46AM +0200, Auger Eric wrote:
> >> Hi,
> >>
> >> On 5/28/20 9:23 AM, Jean-Philippe Brucker wrote:
> >>> On Thu, May 28, 2020 at 10:45:14AM +0530, Srinath Mannam wrote:
> >>>> On Wed, May 27, 2020 at 11:00 PM Robin Murphy
>  wrote:
> >>>>>
> >>>> Thanks Robin for your quick response.
> >>>>> On 2020-05-27 17:03, Srinath Mannam wrote:
> >>>>>> This patch gives the provision to change default value of MSI IOVA base
> >>>>>> to platform's suitable IOVA using module parameter. The present
> >>>>>> hardcoded MSI IOVA base may not be the accessible IOVA ranges of
> platform.
> >>>>>
> >>>>> That in itself doesn't seem entirely unreasonable; IIRC the current
> >>>>> address is just an arbitrary choice to fit nicely into Qemu's memory
> >>>>> map, and there was always the possibility that it wouldn't suit
> everything.
> >>>>>
> >>>>>> Since commit aadad097cd46 ("iommu/dma: Reserve IOVA for PCIe
> inaccessible
> >>>>>> DMA address"), inaccessible IOVA address ranges parsed from
> dma-ranges
> >>>>>> property are reserved.
> >>>
> >>> I don't understand why we only reserve the PCIe windows for DMA
> domains.
> >>> Shouldn't VFIO also prevent userspace from mapping them?
> >>
> >> VFIO prevents userspace from DMA mapping iovas within reserved regions:
> >> 9b77e5c79840  vfio/type1: check dma map request is within a valid iova
> range
> >
> > Right but I was asking specifically about the IOVA reservation introduced
> > by commit aadad097cd46. They are not registered as reserved regions within
> > the IOMMU core, they are only taken into account by dma-iommu.c when
> > creating a DMA domain. As VFIO uses UNMANAGED domains, it isn't aware of
> > those regions and they won't be seen by vfio_iommu_resv_exclude().
> >
> > It looks like the PCIe regions used to be common until cd2c9fcf5c66
> > ("iommu/dma: Move PCI window region reservation back into dma specific
> > path.") But I couldn't find the justification for this commit.
> 
> Yes I noticed that as well when debugging the above mentioned case
> before and after cd2c9fcf5c66. I do not remember about the rationale of
> removing the DMA host bridge windows from the resv regions. Did it break
> a legacy case?
> >

I think yes. Going through the ML discussions, this was done because, with
the "vfio/type1: Add support for valid iova list management" series, you
reported an issue with the Seattle platform. See the full discussion here:

https://lore.kernel.org/patchwork/patch/889012/

Cheers,
Shameer

> > The thing is, if VFIO isn't aware of the reserved PCIe windows, then
> > allowing VFIO or userspace to choose MSI_IOVA_BASE won't solve the problem
> > reported by Srinath, because they could well choose an IOVA within the
> > PCIe window...
> I agree with you
> 
> Thanks
> 
> Eric
> >
> > Thanks,
> > Jean
> >
> >> but it does not prevent the SW MSI region chosen by the kernel from
> >> colliding with other reserved regions (esp. PCIe host bridge windows).
> >>
> >>   If they were
> >>> part of the common reserved regions then we could have VFIO choose a
> >>> SW_MSI region among the remaining free space.
> >> As Robin said this was the initial chosen approach
> >> [PATCH 10/10] vfio: allow the user to register reserved iova range for
> >> MSI mapping
> >> https://patchwork.kernel.org/patch/8121641/
> >>
> >> Some additional background about why the static SW MSI region chosen by
> >> the kernel was later chosen:
> >> Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM
> >> PCIe/MSI passthrough on ARM/ARM64 (Alt II))
> >>
> https://lists.linuxfoundation.org/pipermail/iommu/2016-November/019060.h

RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-05-13 Thread Shameerali Kolothum Thodi

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 13 May 2020 14:29
> To: Shameerali Kolothum Thodi ;
> Zhangfei Gao ; eric.auger@gmail.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
> jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
> t...@semihalf.com; bbhush...@marvell.com
> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> 
[...]

> >>> Yes that's normal this series is not meant to support vSVM at this stage.
> >>>
> >>> I intend to add the missing pieces during the next weeks.
> >>
> >> Thanks for that. I have made an attempt to add the vSVA based on
> >> your v10 + JPBs sva patches. The host kernel and Qemu changes can
> >> be found here[1][2].
> >>
> >> This basically adds multiple pasid support on top of your changes.
> >> I have done some basic sanity testing and we have some initial success
> >> with the zip vf dev on our D06 platform. Please note that the STALL event is
> >> not yet supported though, but works fine if we mlock() guest usr mem.
> >
> > I have added STALL support for our vSVA prototype and it seems to be
> > working(on our hardware). I have updated the kernel and qemu branches with
> > the same[1][2]. I should warn you though that these are prototype code and I
> > am pretty much re-using the VFIO_IOMMU_SET_PASID_TABLE interface for almost
> > everything. But thought of sharing, in case if it is useful somehow!.
> 
> Thank you again for sharing the POC. I looked at the kernel and QEMU
> branches.
> 
> Here are some preliminary comments:
> - "arm-smmu-v3: Reset S2TTB while switching back from nested stage": as
> you mentioned S2TTB reset now is featured in v11

Yes.

> - "arm-smmu-v3: Add support for multiple pasid in nested mode": I could
> easily integrate this into my series. Update the iommu api first and
> pass multiple CD info in a separate patch

Ok.
> - "arm-smmu-v3: Add support to Invalidate CD": CD invalidation should be
> cascaded to host through the PASID cache invalidation uapi (no pb you
> warned us for the POC you simply used VFIO_IOMMU_SET_PASID_TABLE). I
> think I should add this support in my original series although it does
> not seem to trigger any issue up to now.

Agreed. The cache invalidation uapi is a better interface for this. Also, I
don’t think this matters for non-vSVA cases, as the guest kernel table/CD
(pasid 0) will never get invalidated.

> - "arm-smmu-v3: Remove duplication of fault propagation". I understand
> the transcode is done somewhere else with SVA but we still need to do it
> if a single CD is used, right? I will review the SVA code to better
> understand.

Hmm, not sure. I need to take another look to see whether we need special
handling for a single CD.

> - for the STALL response injection I would tend to use a new VFIO region
> for responses. At the moment there is a single VFIO region for reporting
> the fault.

Sure. That will be much cleaner and will probably improve the context switch
latency. Another thing I noted with STALL is that the pasid_valid flag needs
to be taken care of in the SVA kernel path:

"iommu: Remove pasid validity check for STALL model page response msg"
I am not sure this patch is the proper way to handle it.
 
> On QEMU side:
> - I am currently working on 3.2 range invalidation support which is
> needed for DPDK/VFIO
> - While at it I will look at how to incrementally introduce some of the
> features you need in this series.

Ok. 

Thanks for taking a look at the POC.

Cheers,
Shameer

RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-05-07 Thread Shameerali Kolothum Thodi

Hi Eric,

> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 30 April 2020 10:38
> To: 'Auger Eric' ; Zhangfei Gao
> ; eric.auger@gmail.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
> jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
> t...@semihalf.com; bbhush...@marvell.com
> Subject: RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Eric,
> 
> > -----Original Message-----
> > From: Auger Eric [mailto:eric.au...@redhat.com]
> > Sent: 16 April 2020 08:45
> > To: Zhangfei Gao ; eric.auger@gmail.com;
> > io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> > k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> > j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> > Cc: jean-phili...@linaro.org; Shameerali Kolothum Thodi
> > ; alex.william...@redhat.com;
> > jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
> > t...@semihalf.com; bbhush...@marvell.com
> > Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> >
> > Hi Zhangfei,
> >
> > On 4/16/20 6:25 AM, Zhangfei Gao wrote:
> > >
> > >
> > > On 2020/4/14 11:05 PM, Eric Auger wrote:
> > >> This version fixes an issue observed by Shameer on an SMMU 3.2,
> > >> when moving from dual stage config to stage 1 only config.
> > >> The 2 high 64b of the STE now get reset. Otherwise, leaving the
> > >> S2TTB set may cause a C_BAD_STE error.
> > >>
> > >> This series can be found at:
> > >> https://github.com/eauger/linux/tree/v5.6-2stage-v11_10.1
> > >> (including the VFIO part)
> > >> The QEMU fellow series still can be found at:
> > >> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
> > >>
> > >> Users have expressed interest in that work and tested v9/v10:
> > >> - https://patchwork.kernel.org/cover/11039995/#23012381
> > >> - https://patchwork.kernel.org/cover/11039995/#23197235
> > >>
> > >> Background:
> > >>
> > >> This series brings the IOMMU part of HW nested paging support
> > >> in the SMMUv3. The VFIO part is submitted separately.
> > >>
> > >> The IOMMU API is extended to support 2 new API functionalities:
> > >> 1) pass the guest stage 1 configuration
> > >> 2) pass stage 1 MSI bindings
> > >>
> > >> Then those capabilities gets implemented in the SMMUv3 driver.
> > >>
> > >> The virtualizer passes information through the VFIO user API
> > >> which cascades them to the iommu subsystem. This allows the guest
> > >> to own stage 1 tables and context descriptors (so-called PASID
> > >> table) while the host owns stage 2 tables and main configuration
> > >> structures (STE).
> > >>
> > >>
> > >
> > > Thanks Eric
> > >
> > > Tested v11 on Hisilicon kunpeng920 board via hardware zip accelerator.
> > > 1. no-sva works, where guest app directly use physical address via ioctl.
> > Thank you for the testing. Glad it works for you.
> > > 2. vSVA still not work, same as v10,
> > Yes that's normal this series is not meant to support vSVM at this stage.
> >
> > I intend to add the missing pieces during the next weeks.
> 
> Thanks for that. I have made an attempt to add the vSVA based on
> your v10 + JPBs sva patches. The host kernel and Qemu changes can
> be found here[1][2].
> 
> This basically adds multiple pasid support on top of your changes.
> I have done some basic sanity testing and we have some initial success
> with the zip vf dev on our D06 platform. Please note that the STALL event is
> not yet supported though, but works fine if we mlock() guest usr mem.

I have added STALL support to our vSVA prototype and it seems to be
working (on our hardware). I have updated the kernel and QEMU branches with
the same [1][2]. I should warn you, though, that this is prototype code and I
am pretty much re-using the VFIO_IOMMU_SET_PASID_TABLE interface for almost
everything. But I thought of sharing it in case it is useful somehow!

Thanks,
Shameer

[1]https://github.com/hisilicon/kernel-dev/commits/vsva-prototype-host-v1

[2]https://github.com/hisilicon/qemu/tree/v4.2.0-2stage-rfcv6-vsva-prototype-v1

RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-04-30 Thread Shameerali Kolothum Thodi

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 16 April 2020 08:45
> To: Zhangfei Gao ; eric.auger@gmail.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> Cc: jean-phili...@linaro.org; Shameerali Kolothum Thodi
> ; alex.william...@redhat.com;
> jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
> t...@semihalf.com; bbhush...@marvell.com
> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Zhangfei,
> 
> On 4/16/20 6:25 AM, Zhangfei Gao wrote:
> >
> >
> > On 2020/4/14 11:05 PM, Eric Auger wrote:
> >> This version fixes an issue observed by Shameer on an SMMU 3.2,
> >> when moving from dual stage config to stage 1 only config.
> >> The 2 high 64b of the STE now get reset. Otherwise, leaving the
> >> S2TTB set may cause a C_BAD_STE error.
> >>
> >> This series can be found at:
> >> https://github.com/eauger/linux/tree/v5.6-2stage-v11_10.1
> >> (including the VFIO part)
> >> The QEMU fellow series still can be found at:
> >> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
> >>
> >> Users have expressed interest in that work and tested v9/v10:
> >> - https://patchwork.kernel.org/cover/11039995/#23012381
> >> - https://patchwork.kernel.org/cover/11039995/#23197235
> >>
> >> Background:
> >>
> >> This series brings the IOMMU part of HW nested paging support
> >> in the SMMUv3. The VFIO part is submitted separately.
> >>
> >> The IOMMU API is extended to support 2 new API functionalities:
> >> 1) pass the guest stage 1 configuration
> >> 2) pass stage 1 MSI bindings
> >>
> >> Then those capabilities gets implemented in the SMMUv3 driver.
> >>
> >> The virtualizer passes information through the VFIO user API
> >> which cascades them to the iommu subsystem. This allows the guest
> >> to own stage 1 tables and context descriptors (so-called PASID
> >> table) while the host owns stage 2 tables and main configuration
> >> structures (STE).
> >>
> >>
> >
> > Thanks Eric
> >
> > Tested v11 on Hisilicon kunpeng920 board via hardware zip accelerator.
> > 1. no-sva works, where guest app directly use physical address via ioctl.
> Thank you for the testing. Glad it works for you.
> > 2. vSVA still not work, same as v10,
> Yes that's normal this series is not meant to support vSVM at this stage.
> 
> I intend to add the missing pieces during the next weeks.

Thanks for that. I have made an attempt to add the vSVA based on 
your v10 + JPBs sva patches. The host kernel and Qemu changes can 
be found here[1][2].

This basically adds multiple pasid support on top of your changes.
I have done some basic sanity testing and we have some initial success
with the zip vf dev on our D06 platform. Please note that the STALL event is
not yet supported though, but works fine if we mlock() guest usr mem.

I also noted that the Intel patches for vSVA have a couple of changes in the
vfio interfaces, and I hope there will be a convergence soon. Please let me
know your plans for a respin of this series, and whether incorporating the
changes for multiple pasids makes sense for now.

Thanks,
Shameer

[1]https://github.com/hisilicon/qemu/tree/v4.2.0-2stage-rfcv6-vsva-prototype-v1
[2]https://github.com/hisilicon/kernel-dev/tree/vsva-prototype-host-v1

> Thanks
> 
> Eric
> > 3.  the v10 issue reported by Shameer has been solved,  first start qemu
> > with  iommu=smmuv3, then start qemu without  iommu=smmuv3
> > 4. no-sva also works without  iommu=smmuv3
> >
> > Test details in https://docs.qq.com/doc/DRU5oR1NtUERseFNL
> >
> > Thanks
> >

RE: [PATCH v8 0/6] vfio/type1: Add support for valid iova list management

2019-08-23 Thread Shameerali Kolothum Thodi

Hi Alex,

A gentle ping on this. Please let me know.

Thanks,
Shameer

> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 23 July 2019 17:07
> To: alex.william...@redhat.com; eric.au...@redhat.com
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> io...@lists.linux-foundation.org; Linuxarm ; John
> Garry ; xuwei (O) ;
> kevin.t...@intel.com; Shameerali Kolothum Thodi
> 
> Subject: [PATCH v8 0/6] vfio/type1: Add support for valid iova list management
> 
> This is to revive this series which almost made to 4.18 but got dropped
> as Alex found an issue[1] with IGD and USB devices RMRR region being
> reported as reserved regions.
> 
> Thanks to Eric for his work here[2]. It provides a way to exclude
> these regions while reporting the valid iova regions and this respin
> make use of that.
> 
> Please note that I don't have a platform to verify the reported RMRR
> issue and appreciate testing on those platforms.
> 
> Thanks,
> Shameer
> 
> [1] https://lkml.org/lkml/2018/6/5/760
> [2] https://lore.kernel.org/patchwork/cover/1083072/
> 
> v7-->v8
>   -Rebased to 5.3-rc1
>   -Addressed comments from Alex and Eric. Please see
>individual patch history.
>   -Added Eric's R-by to patches 4/5/6
> 
> v6-->v7
>  -Rebased to 5.2-rc6 + Eric's patches
>  -Added logic to exclude IOMMU_RESV_DIRECT_RELAXABLE reserved memory
>   region type(patch #2).
>  -Dropped patch #4 of v6 as it is already part of mainline.
>  -Addressed "container with only an mdev device will have an empty list"
>   case(patches 4/6 & 5/6 - Suggested by Alex)
> 
> Old
> 
> This series introduces an iova list associated with a vfio
> iommu. The list is kept updated taking care of iommu apertures,
> and reserved regions. Also this series adds checks for any conflict
> with existing dma mappings whenever a new device group is attached to
> the domain.
> 
> User-space can retrieve valid iova ranges using VFIO_IOMMU_GET_INFO
> ioctl capability chains. Any dma map request outside the valid iova
> range will be rejected.
> 
> v5 --> v6
> 
>  -Rebased to 4.17-rc1
>  -Changed the ordering such that previous patch#7 "iommu/dma: Move
>   PCI window region reservation back...")  is now patch #4. This
>   will avoid any bisection issues pointed out by Alex.
>  -Added Robins's Reviewed-by tag for patch#4
> 
> v4 --> v5
> Rebased to next-20180315.
> 
>  -Incorporated the corner case bug fix suggested by Alex to patch #5.
>  -Based on suggestions by Alex and Robin, added patch#7. This
>   moves the PCI window  reservation back in to DMA specific path.
>   This is to fix the issue reported by Eric[1].
> 
> v3 --> v4
>  Addressed comments received for v3.
>  -dma_addr_t instead of phys_addr_t
>  -LIST_HEAD() usage.
>  -Free up iova_copy list in case of error.
>  -updated logic in filling the iova caps info(patch #5)
> 
> RFCv2 --> v3
>  Removed RFC tag.
>  Addressed comments from Alex and Eric:
>  - Added comments to make iova list management logic more clear.
>  - Use of iova list copy so that original is not altered in
>case of failure.
> 
> RFCv1 --> RFCv2
>  Addressed comments from Alex:
> -Introduced IOVA list management and added checks for conflicts with
>  existing dma map entries during attach/detach.
> 
> Shameer Kolothum (6):
>   vfio/type1: Introduce iova list and add iommu aperture validity check
>   vfio/type1: Check reserved region conflict and update iova list
>   vfio/type1: Update iova list on detach
>   vfio/type1: check dma map request is within a valid iova range
>   vfio/type1: Add IOVA range capability support
>   vfio/type1: remove duplicate retrieval of reserved regions
> 
>  drivers/vfio/vfio_iommu_type1.c | 518 +++-
>  include/uapi/linux/vfio.h   |  26 +-
>  2 files changed, 531 insertions(+), 13 deletions(-)
> 
> --
> 2.17.1
>
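
The valid-range check described in the cover letter quoted above (a dma map
request outside the valid iova ranges is rejected) can be sketched in
user-space C. This is only a minimal illustration, not the code in
vfio_iommu_type1.c: struct iova_range and iova_range_is_valid() are
hypothetical names, and the sketch assumes inclusive range bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one node of the iova list (inclusive bounds). */
struct iova_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true only if [iova, iova + size - 1] fits entirely inside one
 * valid range. A request that straddles a hole is rejected.
 */
static bool iova_range_is_valid(const struct iova_range *ranges, size_t n,
				uint64_t iova, uint64_t size)
{
	uint64_t last;
	size_t i;

	/* Reject empty requests and requests that wrap past 2^64. */
	if (size == 0 || iova + size - 1 < iova)
		return false;
	last = iova + size - 1;

	for (i = 0; i < n; i++) {
		assert(ranges[i].start <= ranges[i].end);
		if (iova >= ranges[i].start && last <= ranges[i].end)
			return true;
	}
	return false;
}
```

A caller would build the ranges array from the aperture minus the reserved
regions and run this check before accepting a mapping.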

RE: [PATCH v2] iommu: revisit iommu_insert_resv_region() implementation

2019-08-02 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 01 August 2019 17:00
> To: eric.auger@gmail.com; eric.au...@redhat.com; j...@8bytes.org;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> dw...@infradead.org; Shameerali Kolothum Thodi
> ; alex.william...@redhat.com;
> robin.mur...@arm.com; h...@infradead.org
> Subject: [PATCH v2] iommu: revisit iommu_insert_resv_region()
> implementation
> 
> Current implementation is recursive and in case of allocation
> failure the existing @regions list is altered. A non recursive
> version looks better for maintainability and simplifies the
> error handling. We use a separate stack for overlapping segment
> merging. The elements are sorted by start address and then by
> type, if their start address match.
> 
> Note this new implementation may change the region order of
> appearance in /sys/kernel/iommu_groups/<n>/reserved_regions
> files but this order has never been documented, see
> commit bc7d12b91bd3 ("iommu: Implement reserved_regions
> iommu-group sysfs file").

I reran this on D05 and it seems to retain the order for the msi type as before.

estuary:/$ cat /sys/kernel/iommu_groups/3/reserved_regions
0x0800 0x080f msi
0xc601 0xc601 msi

FWIW,

Tested-by: Shameer Kolothum 

Cheers,
Shameer
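
The approach in the commit message quoted above (sort by start address, then
by type, and merge overlapping segments of the same type without recursion)
can be sketched in user-space C. This is an illustration under stated
assumptions, not the code from the patch: it uses a fixed-size array instead
of the kernel's list_head, the names resv_region, resv_list and resv_insert
are made up, and regions that only touch are left unmerged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 16

struct resv_region {
	uint64_t start;
	uint64_t length;
	int type;
};

struct resv_list {
	struct resv_region r[MAX_REGIONS];
	size_t n;
};

/*
 * Insert @new in (start, type) order, then merge overlapping regions of
 * the same type in one forward pass. Returns 0, or -1 if the list is
 * full, in which case the list is left untouched.
 */
static int resv_insert(struct resv_list *l, struct resv_region new)
{
	struct resv_region out[MAX_REGIONS];
	size_t i, j, pos, m = 0;

	assert(new.length > 0);
	if (l->n == MAX_REGIONS)
		return -1;

	/* Step 1: sorted insert, by start address and then by type. */
	for (pos = 0; pos < l->n; pos++)
		if (new.start < l->r[pos].start ||
		    (new.start == l->r[pos].start && new.type <= l->r[pos].type))
			break;
	for (i = l->n; i > pos; i--)
		l->r[i] = l->r[i - 1];
	l->r[pos] = new;
	l->n++;

	/*
	 * Step 2: @out plays the role of the separate stack. Each element
	 * is either merged into the most recent stacked element of its own
	 * type or pushed unchanged.
	 */
	for (i = 0; i < l->n; i++) {
		const struct resv_region *it = &l->r[i];
		uint64_t it_end = it->start + it->length - 1;
		struct resv_region *top = NULL;

		for (j = m; j > 0 && !top; j--)
			if (out[j - 1].type == it->type)
				top = &out[j - 1];

		if (top && it->start <= top->start + top->length - 1) {
			if (it_end > top->start + top->length - 1)
				top->length = it_end - top->start + 1;
		} else {
			out[m++] = *it;
		}
	}

	for (i = 0; i < m; i++)
		l->r[i] = out[i];
	l->n = m;
	return 0;
}
```

Because the only failure path returns before the list is modified, the
caller never sees a half-updated list, which is the error-handling property
the commit message is after.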

 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v1 -> v2:
> - adapt the algo so that we don't need to move elements of
>   other types to different list and sort by address and then by
>   type
> ---
>  drivers/iommu/iommu.c | 107 +++---
>  1 file changed, 59 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0c674d80c37f..4257b179fa54 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -229,60 +229,71 @@ static ssize_t iommu_group_show_name(struct
> iommu_group *group, char *buf)
>   * @new: new region to insert
>   * @regions: list of regions
>   *
> - * The new element is sorted by address with respect to the other
> - * regions of the same type. In case it overlaps with another
> - * region of the same type, regions are merged. In case it
> - * overlaps with another region of different type, regions are
> - * not merged.
> + * Elements are sorted by start address and overlapping segments
> + * of the same type are merged.
>   */
> -static int iommu_insert_resv_region(struct iommu_resv_region *new,
> - struct list_head *regions)
> +int iommu_insert_resv_region(struct iommu_resv_region *new,
> +  struct list_head *regions)
>  {
> - struct iommu_resv_region *region;
> - phys_addr_t start = new->start;
> - phys_addr_t end = new->start + new->length - 1;
> - struct list_head *pos = regions->next;
> + struct iommu_resv_region *iter, *tmp, *nr, *top;
> + struct list_head stack;
> + bool added = false;
> 
> - while (pos != regions) {
> - struct iommu_resv_region *entry =
> - list_entry(pos, struct iommu_resv_region, list);
> - phys_addr_t a = entry->start;
> - phys_addr_t b = entry->start + entry->length - 1;
> - int type = entry->type;
> + INIT_LIST_HEAD(&stack);
> 
> - if (end < a) {
> - goto insert;
> - } else if (start > b) {
> - pos = pos->next;
> - } else if ((start >= a) && (end <= b)) {
> - if (new->type == type)
> - return 0;
> - else
> - pos = pos->next;
> - } else {
> - if (new->type == type) {
> - phys_addr_t new_start = min(a, start);
> - phys_addr_t new_end = max(b, end);
> - int ret;
> -
> - list_del(&entry->list);
> - entry->start = new_start;
> - entry->length = new_end - new_start + 1;
> - ret = iommu_insert_resv_region(entry, regions);
> - kfree(entry);
> - return ret;
> - } else {
> - pos = pos->next;
> - }
> - }
> - }
> -insert:
> - region = iommu_alloc_resv_region(new->start, new->length,
> -  new->prot, new->

RE: [PATCH v7 4/4] perf/smmuv3: Enable HiSilicon Erratum 162001800 quirk

2019-04-04 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: Will Deacon [mailto:will.dea...@arm.com]
> Sent: 04 April 2019 16:47
> To: Shameerali Kolothum Thodi 
> Cc: lorenzo.pieral...@arm.com; robin.mur...@arm.com;
> andrew.mur...@arm.com; jean-philippe.bruc...@arm.com;
> mark.rutl...@arm.com; Guohanjun (Hanjun Guo) ;
> John Garry ; pa...@codeaurora.org;
> vkil...@codeaurora.org; rruig...@codeaurora.org; linux-a...@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> ; neil.m.lee...@gmail.com
> Subject: Re: [PATCH v7 4/4] perf/smmuv3: Enable HiSilicon Erratum 162001800
> quirk
> 
> On Tue, Mar 26, 2019 at 03:17:53PM +, Shameer Kolothum wrote:
> > HiSilicon erratum 162001800 describes the limitation of
> > SMMUv3 PMCG implementation on HiSilicon Hip08 platforms.
> >
> > On these platforms, the PMCG event counter registers
> > (SMMU_PMCG_EVCNTRn) are read only and as a result it
> > is not possible to set the initial counter period value
> > on event monitor start.
> >
> > To work around this, the current value of the counter
> > is read and used for delta calculations. OEM information
> > from ACPI header is used to identify the affected hardware
> > platforms.
> >
> > Signed-off-by: Shameer Kolothum 
> > Reviewed-by: Hanjun Guo 
> > Reviewed-by: Robin Murphy 
> > ---
> >  drivers/acpi/arm64/iort.c | 16 ++-
> >  drivers/perf/arm_smmuv3_pmu.c | 48
> ---
> >  include/linux/acpi_iort.h |  1 +
> >  3 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> > index e2c9b26..4dc68de 100644
> > --- a/drivers/acpi/arm64/iort.c
> > +++ b/drivers/acpi/arm64/iort.c
> > @@ -1366,9 +1366,23 @@ static void __init
> arm_smmu_v3_pmcg_init_resources(struct resource *res,
> >ACPI_EDGE_SENSITIVE, &res[2]);
> >  }
> >
> > +static struct acpi_platform_list pmcg_plat_info[] __initdata = {
> > +   /* HiSilicon Hip08 Platform */
> > +   {"HISI  ", "HIP08   ", 0, ACPI_SIG_IORT, greater_than_or_equal, 0,
> 
> Passing integer constant 0 for the reason feels wrong to me. I'm going to
> change it to "Erratum #162001800" and also add an entry to
> silicon-errata.txt.
> 
> Please shout if that's not ok.

Thanks Will for taking a look at this series. The proposed changes are fine
with me.

Shameer
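
The workaround described in the quoted commit message (read the current
counter value and use it for delta calculations, since the counter cannot be
written) can be sketched as below. This is a user-space illustration only:
struct evcntr_state and both helpers are hypothetical names, the hardware
value is passed in as a plain argument instead of being read from
SMMU_PMCG_EVCNTRn, and at most one wrap-around between updates is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software view of one event counter. */
struct evcntr_state {
	uint64_t mask;	/* counter width, e.g. 0xffffffff for 32 bits */
	uint64_t prev;	/* hardware value at the last snapshot */
	uint64_t count;	/* events accumulated by software */
};

/* On start, snapshot the hardware value instead of programming a period. */
static void evcntr_start(struct evcntr_state *s, uint64_t hw_now)
{
	assert(s->mask != 0);
	s->prev = hw_now & s->mask;
}

/* On update, add the masked difference so one wrap-around is handled. */
static void evcntr_update(struct evcntr_state *s, uint64_t hw_now)
{
	uint64_t now = hw_now & s->mask;

	s->count += (now - s->prev) & s->mask;
	s->prev = now;
}
```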

RE: [PATCH v7 2/4] perf/smmuv3: Add arm64 smmuv3 pmu driver

2019-03-26 Thread Shameerali Kolothum Thodi

Hi Robin,

> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 26 March 2019 16:58
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: andrew.mur...@arm.com; jean-philippe.bruc...@arm.com;
> will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v7 2/4] perf/smmuv3: Add arm64 smmuv3 pmu driver
> 
> Hi Shameer,
> 
> On 26/03/2019 15:17, Shameer Kolothum wrote:
> [...]
> > +static int smmu_pmu_apply_event_filter(struct smmu_pmu *smmu_pmu,
> > +  struct perf_event *event, int idx)
> > +{
> > +   u32 span, sid;
> > +   unsigned int num_ctrs = smmu_pmu->num_counters;
> > +   bool filter_en = !!get_filter_enable(event);
> > +
> > +   span = filter_en ? get_filter_span(event) :
> > +  SMMU_PMCG_DEFAULT_FILTER_SPAN;
> > +   sid = filter_en ? get_filter_stream_id(event) :
> > +  SMMU_PMCG_DEFAULT_FILTER_SID;
> > +
> > +   /* Support individual filter settings */
> > +   if (!smmu_pmu->global_filter) {
> > +   smmu_pmu_set_event_filter(event, idx, span, sid);
> > +   return 0;
> > +   }
> > +
> > +   /* Requested settings same as current global settings*/
> > +   if (span == smmu_pmu->global_filter_span &&
> > +   sid == smmu_pmu->global_filter_sid)
> > +   return 0;
> > +
> > +   if (!bitmap_empty(smmu_pmu->used_counters, num_ctrs))
> > +   return -EAGAIN;
> > +
> > +   if (idx == 0) {
> > +   smmu_pmu_set_event_filter(event, idx, span, sid);
> > +   smmu_pmu->global_filter_span = span;
> > +   smmu_pmu->global_filter_sid = sid;
> > +   return 0;
> > +   }
> 
> When I suggested dropping the check of idx, I did mean removing it
> entirely, not just moving it further down ;)

Ah, I must confess that I was slightly confused by that suggestion and
thought that you were making a case for the code being clearer to read :)
 
> Nothing to worry about though, I'll just leave this here for Will to
> consider applying on top or squashing.

Thanks for that.

Cheers,
Shameer

> Thanks,
> Robin.
> 
> ->8-
> From: Robin Murphy 
> Subject: [PATCH] perf/smmuv3: Relax global filter constraint a little
> 
> Although the current behaviour of smmu_pmu_get_event_idx() effectively
> ensures that the first-allocated counter will be counter 0, there's no
> need to strictly enforce that in smmu_pmu_apply_event_filter(). All that
> matters is that we only ever touch the global filter settings in
> SMMU_PMCG_SMR0 and SMMU_PMCG_EVTYPER0 while no counters are
> active.
> 
> Signed-off-by: Robin Murphy 
> ---
>   drivers/perf/arm_smmuv3_pmu.c | 11 ---
>   1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/perf/arm_smmuv3_pmu.c
> b/drivers/perf/arm_smmuv3_pmu.c
> index 6b3c0ed7ad71..23045ead6de1 100644
> --- a/drivers/perf/arm_smmuv3_pmu.c
> +++ b/drivers/perf/arm_smmuv3_pmu.c
> @@ -286,14 +286,11 @@ static int smmu_pmu_apply_event_filter(struct
> smmu_pmu *smmu_pmu,
>   if (!bitmap_empty(smmu_pmu->used_counters, num_ctrs))
>   return -EAGAIN;
> 
> - if (idx == 0) {
> - smmu_pmu_set_event_filter(event, idx, span, sid);
> - smmu_pmu->global_filter_span = span;
> - smmu_pmu->global_filter_sid = sid;
> - return 0;
> - }
> + smmu_pmu_set_event_filter(event, 0, span, sid);
> + smmu_pmu->global_filter_span = span;
> + smmu_pmu->global_filter_sid = sid;
> 
> - return -EAGAIN;
> + return 0;
>   }
> 
>   static int smmu_pmu_get_event_idx(struct smmu_pmu *smmu_pmu,
> --
> 2.20.1.dirty

RE: [PATCH v6 2/4] perf: add arm64 smmuv3 pmu driver

2019-03-25 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: linux-arm-kernel [mailto:linux-arm-kernel-boun...@lists.infradead.org]
> On Behalf Of Robin Murphy
> Sent: 21 March 2019 15:04
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> neil.m.lee...@gmail.com; jean-philippe.bruc...@arm.com;
> pa...@codeaurora.org; John Garry ;
> will.dea...@arm.com; rruig...@codeaurora.org; Linuxarm
> ; linux-kernel@vger.kernel.org;
> linux-a...@vger.kernel.org; Guohanjun (Hanjun Guo)
> ; andrew.mur...@arm.com;
> linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH v6 2/4] perf: add arm64 smmuv3 pmu driver

[...]

> Ah, apologies for leading you wrong on this, but it has turned out to be
> bogus - perf_pmu_register() does things for which preemption should not
> be disabled, and it flares up particularly on PREEMPT_RT. For now, I
> think the best thing to do is to bring the put_cpu() call up here (or
> just use raw_smp_processor_id() instead) and accept that those
> vanishingly-unlikely-in-practice race conditions exist until someone can
> make the registration dance more robust in the perf core itself.
> 
> Beyond that, though, I'm trusting that everything I didn't comment on
> last time and doesn't appear at a glance to have changed is still good,
> so with the comments above addressed,
> 
> Reviewed-by: Robin Murphy 
> 
> FYI, both Will and Mark are out for a while, so whilst I expect v7
> should be good to merge, don't expect any maintainer final say for at
> least a couple of weeks yet.
> 

Thanks Robin. I will address the comments and send out v7 soon.

Cheers,
Shameer

RE: [PATCH v4] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables

2019-02-19 Thread Shameerali Kolothum Thodi

Hi Marc,

A gentle reminder on this one...

Thanks,
Shameer

> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of Shameer
> Kolothum
> Sent: 14 January 2019 09:50
> To: marc.zyng...@arm.com; linux-kernel@vger.kernel.org
> Cc: gkulka...@marvell.com; suzuki.poul...@arm.com; Linuxarm
> ; robert.rich...@cavium.com;
> shank...@codeaurora.org; linux-arm-ker...@lists.infradead.org
> Subject: [PATCH v4] irqchip: gicv3-its: Use NUMA aware memory allocation for
> ITS tables
> 
> From: Shanker Donthineni 
> 
> The NUMA node information is visible to the ITS driver but is not used
> other than for handling hardware errata. ITS/GICR hardware accesses to
> the local NUMA node are usually quicker than accesses to a remote NUMA
> node. How slow the remote NUMA accesses are depends on the
> implementation details.
> 
> This patch allocates memory for the ITS management tables and command
> queue from the corresponding NUMA node using the appropriate NUMA
> aware functions. This change improves ITS table read latency on
> systems that have more than one ITS block and slower inter-node
> accesses.
> 
> Apache Web server benchmarking using ab tool on a HiSilicon D06
> board with multiple numa mem nodes shows Time per request and
> Transfer rate improvements of ~3.6% with this patch.
> 
> Signed-off-by: Shanker Donthineni 
> Signed-off-by: Hanjun Guo 
> Signed-off-by: Shameer Kolothum 
> Reviewed-by: Ganapatrao Kulkarni 
> ---
> 
> This is to revive the patch originally sent by Shanker[1] and
> to back it up with a benchmark test. Any further testing of
> this is most welcome.
> 
> v3-->v4
> -Addressed comments on alloc_pages_node() and page_address() usage.
> -Rebased on 5.0-rc1
> -Added Ganapatrao's R-by.
> 
> v2-->v3
>  -Addressed comments to use page_address().
>  -Added Benchmark results to commit log.
>  -Removed T-by from Ganapatrao for now.
> 
> v1-->v2
>  -Edited commit text.
>  -Added Ganapatrao's tested-by.
> 
> Benchmark test details:
> 
> Test Setup:
> -D06 with dimm on node 0(Sock#0) and 3 (Sock#1).
> -ITS belongs to numa node 0.
> -Filesystem mounted on a PCIe NVMe based disk.
> -Apache server installed on D06.
> -Running ab benchmark test in concurrency mode from a remote m/c
>  connected to D06 via  hns3(PCIe) n/w port.
>  "ab -k -c 750 -n 200 http://10.202.225.188/;
> 
> Test results are avg. of 15 runs.
> 
> For 4.20-rc1  Kernel,
> 
> Time per request(mean, concurrent)  = 0.02753[ms]
> Transfer Rate = 416501[Kbytes/sec]
> 
> For 4.20-rc1 +  this patch,
> --
> Time per request(mean, concurrent)  = 0.02653[ms]
> Transfer Rate = 431954[Kbytes/sec]
> 
> % improvement ~3.6%
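
The quoted improvement can be re-derived from the two pairs of figures above.
A minimal sketch follows; pct_change() is a hypothetical helper, not part of
any benchmark tool.

```c
#include <assert.h>

/* Relative change from @before to @after, in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

With the numbers from the mail, pct_change(0.02753, 0.02653) is about -3.63
(time per request went down) and pct_change(416501, 431954) is about +3.71
(transfer rate went up), in line with the ~3.6% summary.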
> 
> vmstat shows around 170K-200K interrupts per second.
> 
> ~# vmstat 1 -w
> procs ------memory----- -system-
>  r  b   swpd       free       in
>  5  0      0   30166724   102794
>  9  0      0   30141828   171148
>  5  0      0   30150160   207185
> 13  0      0   30145924   175691
> 15  0      0   30140792   145250
> 13  0      0   30135556   201879
> 13  0      0   30134864   192391
> 10  0      0   30133632   168880
> 
> 
> [1] https://patchwork.kernel.org/patch/989/
> 
>  drivers/irqchip/irq-gic-v3-its.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index db20e99..5df59ad 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1737,6 +1737,7 @@ static int its_setup_baser(struct its_node *its,
> struct its_baser *baser,
>   u64 type = GITS_BASER_TYPE(val);
>   u64 baser_phys, tmp;
>   u32 alloc_pages;
> + struct page *page;
>   void *base;
> 
>  retry_alloc_baser:
> @@ -1749,10 +1750,11 @@ static int its_setup_baser(struct its_node *its,
> struct its_baser *baser,
>   order = get_order(GITS_BASER_PAGES_MAX * psz);
>   }
> 
> - base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> - if (!base)
> + page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO,
> order);
> + if (!page)
>   return -ENOMEM;
> 
> + base = (void *)page_address(page);
>   baser_phys = virt_to_phys(base);
> 
>   /* Check if the physical address of the memory is above 48bits */
> @@ -2236,7 +2238,8 @@ static struct its_baser *its_get_baser(struct
> its_node *its, u32 type)
>   return NULL;
>  }
> 
> -static bool its_alloc_table_entry(struct its_baser *baser, u32 id)
> +static bool its_alloc_table_entry(struct its_node *its,
> +   struct its_baser *baser, u32 id)
>  {
>   struct page *page;
>   u32 esz, idx;
> @@ -2256,7 +2259,8 @@ static bool

RE: [PATCH v6 1/4] acpi: arm64: add iort support for PMCG

2019-02-18 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: Lorenzo Pieralisi [mailto:lorenzo.pieral...@arm.com]
> Sent: 15 February 2019 11:40
> To: Shameerali Kolothum Thodi 
> Cc: robin.mur...@arm.com; andrew.mur...@arm.com;
> jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> mark.rutl...@arm.com; Guohanjun (Hanjun Guo) ;
> John Garry ; pa...@codeaurora.org;
> vkil...@codeaurora.org; rruig...@codeaurora.org; linux-a...@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> ; neil.m.lee...@gmail.com
> Subject: Re: [PATCH v6 1/4] acpi: arm64: add iort support for PMCG
> 
[...]

> > +/*
> > + * PMCG model identifiers for use in smmu pmu driver. Please note
> > + * that, this is not part of the IORT specification.
> 
> And it is a Linux internal tag that has nothing to do with HW, it is just
> fabricated for matching a driver, I would like to have this clarified in
> the comment please.
> 
> I would have avoided adding another hook to differentiate platform data but
> given that it is self-contained in IORT code that should be fine for the
> sake of making progress:
> 
> Acked-by: Lorenzo Pieralisi 

Thanks. I will wait for review of the main driver patches and will then send
out a revised one incorporating your comments on this.

Thanks,
Shameer

> > + */
> > +#define IORT_SMMU_V3_PMCG_GENERIC0x /* Generic
> SMMUv3 PMCG */
> > +
> >  int iort_register_domain_token(int trans_id, phys_addr_t base,
> >struct fwnode_handle *fw_node);  void
> > iort_deregister_domain_token(int trans_id);
> > --
> > 2.7.4
> >
> >

RE: [RFC PATCH v2 0/4] mm, memory_hotplug: allocate memmap from hotadded memory

2019-02-12 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: Jonathan Cameron
> Sent: 12 February 2019 12:47
> To: Oscar Salvador 
> Cc: linux...@kvack.org; mho...@suse.com; dan.j.willi...@intel.com;
> pavel.tatas...@microsoft.com; da...@redhat.com;
> linux-kernel@vger.kernel.org; dave.han...@intel.com; Shameerali Kolothum
> Thodi ; Linuxarm
> ; Robin Murphy 
> Subject: Re: [RFC PATCH v2 0/4] mm, memory_hotplug: allocate memmap from
> hotadded memory
> 
> On Tue, 22 Jan 2019 11:37:04 +0100
> Oscar Salvador  wrote:
> 
> > Hi,
> >
> > this is the v2 of the first RFC I sent back then in October [1].
> > In this new version I tried to reduce the complexity as much as possible,
> > plus some clean ups.
> >
> > [Testing]
> >
> > I have tested it on "x86_64" (small/big memblocks) and on "powerpc".
> > On both architectures hot-add/hot-remove online/offline operations
> > worked as expected using vmemmap pages, I have not seen any issues so far.
> > I wanted to try it out on Hyper-V/Xen, but I did not manage to.
> > I plan to do so along this week (if time allows).
> > I would also like to test it on arm64, but I am not sure I can grab
> > an arm64 box anytime soon.
> 
> Hi Oscar,
> 
> I ran tests on one of our arm64 machines. Particular machine doesn't
> actually have the mechanics for hotplug, so was all 'faked', but software
> wise it's all the same.
> 
> Upshot, seems to work as expected on arm64 as well.
> Tested-by: Jonathan Cameron 
> 
> Remove currently relies on some out of tree patches (and dirty hacks) due
> to the usual issue with how arm64 does pfn_valid. It's not even vaguely
> ready for upstream. I'll aim to post an informational set for anyone else
> testing in this area (it's more or less just a rebase of the patches from
> a few years ago).
> 
> +CC Shameer who has been testing the virtualization side for more details on
> that, 

Right, I have sent out an RFC series[1] to enable mem hotplug for the Qemu ARM
virt platform. Using this Qemu, I ran a few tests with your patches on a
HiSilicon ARM64 platform. Looks like it is doing the job.

root@ubuntu:~# uname -a
Linux ubuntu 5.0.0-rc1-mm1-00173-g22b0744 #5 SMP PREEMPT Tue Feb 5 10:32:26 GMT 2019 aarch64 aarch64 aarch64 GNU/Linux

root@ubuntu:~# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 981 MB
node 0 free: 854 MB
node 1 cpus:
node 1 size: 0 MB
node 1 free: 0 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
root@ubuntu:~# (qemu) 
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1,node=1
root@ubuntu:~# 
root@ubuntu:~# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 981 MB
node 0 free: 853 MB
node 1 cpus:
node 1 size: 1008 MB
node 1 free: 1008 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
root@ubuntu:~#  

FWIW,
Tested-by: Shameer Kolothum 

Thanks,
Shameer
[1] https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06966.html

> and Robin who is driving forward memory hotplug in general on the arm64
> side.
> 
> Thanks,
> 
> Jonathan
> 
> >
> > [Coverletter]:
> >
> > This is another step to make the memory hotplug more usable. The primary
> > goal of this patchset is to reduce memory overhead of the hot added
> > memory (at least for SPARSE_VMEMMAP memory model). The current way we use
> > to populate memmap (struct page array) has two main drawbacks:
> >
> > a) it consumes additional memory until the hotadded memory itself is
> >    onlined and
> > b) memmap might end up on a different numa node which is especially true
> >    for movable_node configuration.
> >
> > a) is a problem especially for memory hotplug based memory "ballooning"
> >    solutions when the delay between physical memory hotplug and the
> >    onlining can lead to OOM and that led to introduction of hacks like auto
> >    onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
> >    policy for the newly added memory")).
> >
> > b) can have performance drawbacks.
> >
> > I have also seen hot-add operations failing on powerpc due to the fact
> > that we try to use order-8 pages when populating the memmap array.
> > Given 64KB base pagesize, that is 16MB.
> > If we run out of those, we just fail the operation and we cannot add
> > more memory.
> > We could fallback to base pages as x86_64 does, but we can do better.
> >
> > One way to mitigate all these issues is to simply allocate memmap array
> > (which is the largest memory footprint of the physical memory hotplug)
> > from the hotadded memory itself. VMEMMAP memory model allows us to
>
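
The 16MB figure in the quoted cover letter follows directly from the
allocation order and the base page size. A small sketch of the arithmetic;
order_to_bytes() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one order-@order allocation for a given base page size. */
static uint64_t order_to_bytes(unsigned int order, uint64_t base_page_size)
{
	assert(order < 64);
	return base_page_size << order;
}
```

An order-8 allocation is 2^8 = 256 contiguous base pages, so with 64KB pages
order_to_bytes(8, 64 * 1024) gives 16MB of physically contiguous memory,
which is what can become hard to find on a fragmented system.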

RE: [PATCH v5 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk

2019-01-28 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 25 January 2019 18:33
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> mark.rutl...@arm.com; Guohanjun (Hanjun Guo) ;
> John Garry ; pa...@codeaurora.org;
> vkil...@codeaurora.org; rruig...@codeaurora.org; linux-a...@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> ; neil.m.lee...@gmail.com
> Subject: Re: [PATCH v5 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum
> 162001800 quirk
> 
> On 30/11/2018 15:47, Shameer Kolothum wrote:
> > HiSilicon erratum 162001800 describes the limitation of
> > SMMUv3 PMCG implementation on HiSilicon Hip08 platforms.
> >
> > On these platforms, the PMCG event counter registers
> > (SMMU_PMCG_EVCNTRn) are read only and as a result it
> > is not possible to set the initial counter period value
> > on event monitor start.
> >
> > To work around this, the current value of the counter
> > is read and used for delta calculations. OEM information
> > from ACPI header is used to identify the affected hardware
> > platforms.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> >   drivers/acpi/arm64/iort.c | 30 +++---
> >   drivers/perf/arm_smmuv3_pmu.c | 35
> +--
> >   include/linux/acpi_iort.h |  3 +++
> >   3 files changed, 59 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> > index 2da08e1..d174379 100644
> > --- a/drivers/acpi/arm64/iort.c
> > +++ b/drivers/acpi/arm64/iort.c
> > @@ -1364,6 +1364,22 @@ static void __init
> arm_smmu_v3_pmcg_init_resources(struct resource *res,
> >ACPI_EDGE_SENSITIVE, &res[2]);
> >   }
> >
> > +static struct acpi_platform_list pmcg_evcntr_rdonly_list[] __initdata = {
> > +   /* HiSilicon Erratum 162001800 */
> > +   {"HISI  ", "HIP08   ", 0, ACPI_SIG_IORT, greater_than_or_equal},
> > +   { }
> > +};
> > +
> > +static int __init arm_smmu_v3_pmcg_add_platdata(struct platform_device
> *pdev)
> > +{
> > +   u32 options = 0;
> > +
> > +   if (acpi_match_platform_list(pmcg_evcntr_rdonly_list) >= 0)
> > +   options |= IORT_PMCG_EVCNTR_RDONLY;
> 
> Hmm, do we want IORT code to have to understand a (potential) whole load
> of PMCG-specific quirks directly, or do we really only need to pass some
> unambiguous identifier for the PMCG implementation, and let the driver
> handle the details in private - much like the SMMU model field, only
> without an external spec to constrain us :)

Could do that, but I was not sure about coming up with an identifier that is
not really part of the spec and placing it in IORT code. Personally, I prefer
having all of this private to the driver rather than in IORT code. But I see
your point that this will be more like smmu if we can pass identifiers here.

> If we ever want to have named imp-def events, we'd need to do something
> like that anyway, so perhaps we might be better off taking that approach
> to begin with (and if so, I'd be inclined to push the basic platdata
> initialisation for "generic PMCG" into patch #1).

Ok. I will give that a try in the next respin.

> > +
> > +   return platform_device_add_data(pdev, &options, sizeof(options));
> > +}
> > +
> >   struct iort_dev_config {
> > const char *name;
> > int (*dev_init)(struct acpi_iort_node *node);
> > @@ -1374,6 +1390,7 @@ struct iort_dev_config {
> >  struct acpi_iort_node *node);
> > void (*dev_set_proximity)(struct device *dev,
> > struct acpi_iort_node *node);
> > +   int (*dev_add_platdata)(struct platform_device *pdev);
> >   };
> >
> >   static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
> > @@ -1395,6 +1412,7 @@ static const struct iort_dev_config
> iort_arm_smmu_v3_pmcg_cfg __initconst = {
> > .name = "arm-smmu-v3-pmu",
> > .dev_count_resources = arm_smmu_v3_pmcg_count_resources,
> > .dev_init_resources = arm_smmu_v3_pmcg_init_resources,
> > +   .dev_add_platdata   = arm_smmu_v3_pmcg_add_platdata,
> >   };
> >
> >   static __init const struct iort_dev_config *iort_get_dev_cfg(
> > @@ -1455,10 +1473,16 @@ static int __init
> iort_add_platform_device(struct acpi_iort_node *node,
> > goto dev_put;
> >
> > /*
> > -* Add a copy of IORT node pointer

RE: [PATCH v5 2/4] perf: add arm64 smmuv3 pmu driver

2019-01-28 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 25 January 2019 15:14
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> mark.rutl...@arm.com; Guohanjun (Hanjun Guo) ;
> John Garry ; pa...@codeaurora.org;
> vkil...@codeaurora.org; rruig...@codeaurora.org; linux-a...@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Linuxarm
> ; neil.m.lee...@gmail.com
> Subject: Re: [PATCH v5 2/4] perf: add arm64 smmuv3 pmu driver
> 
> On 30/11/2018 15:47, Shameer Kolothum wrote:
> > From: Neil Leeder 
> >
> > Adds a new driver to support the SMMUv3 PMU and add it into the
> > perf events framework.
> >
> > Each SMMU node may have multiple PMUs associated with it, each of
> > which may support different events.
> >
> > SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
> > <phys_addr_page> is the physical page address of the SMMU PMCG
> > wrapped to 4K boundary. For example, the PMCG at 0xff8884 is
> > named smmuv3_pmcg_ff88840
> >
> > Filtering by stream id is done by specifying filtering parameters
> > with the event. options are:
> > filter_enable- 0 = no filtering, 1 = filtering enabled
> > filter_span  - 0 = exact match, 1 = pattern match
> > filter_stream_id - pattern to filter against
> >
> > Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
> > filter_span=1,filter_stream_id=0x42/ -a netperf
> >
> > Applies filter pattern 0x42 to transaction events, which means events
> > matching stream ids 0x42 & 0x43 are counted as only upper StreamID
> > bits are required to match the given filter. Further filtering
> > information is available in the SMMU documentation.
> >
> > SMMU events are not attributable to a CPU, so task mode and sampling
> > are not supported.
> >
> > Signed-off-by: Neil Leeder 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >   drivers/perf/Kconfig  |   9 +
> >   drivers/perf/Makefile |   1 +
> >   drivers/perf/arm_smmuv3_pmu.c | 778
> ++
> >   3 files changed, 788 insertions(+)
> >   create mode 100644 drivers/perf/arm_smmuv3_pmu.c
> >
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 08ebaf7..92be6a3 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -52,6 +52,15 @@ config ARM_PMU_ACPI
> > depends on ARM_PMU && ACPI
> > def_bool y
> >
> > +config ARM_SMMU_V3_PMU
> > +bool "ARM SMMUv3 Performance Monitors Extension"
> > +depends on ARM64 && ACPI && ARM_SMMU_V3
> > +  help
> > +  Provides support for the SMMU version 3 performance monitor unit
> (PMU)
> > +  on ARM-based systems.
> > +  Adds the SMMU PMU into the perf events subsystem for
> > +  monitoring SMMU performance events.
> > +
> >   config ARM_DSU_PMU
> > tristate "ARM DynamIQ Shared Unit (DSU) PMU"
> > depends on ARM64
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index b3902bd..f10a932 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> > @@ -4,6 +4,7 @@ obj-$(CONFIG_ARM_CCN) += arm-ccn.o
> >   obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
> >   obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
> >   obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
> > +obj-$(CONFIG_ARM_SMMU_V3_PMU) += arm_smmuv3_pmu.o
> >   obj-$(CONFIG_HISI_PMU) += hisilicon/
> >   obj-$(CONFIG_QCOM_L2_PMU) += qcom_l2_pmu.o
> >   obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
> > diff --git a/drivers/perf/arm_smmuv3_pmu.c
> b/drivers/perf/arm_smmuv3_pmu.c
> > new file mode 100644
> > index 000..fb9dcd8
> > --- /dev/null
> > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > @@ -0,0 +1,778 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * This driver adds support for perf events to use the Performance
> > + * Monitor Counter Groups (PMCG) associated with an SMMUv3 node
> > + * to monitor that node.
> > + *
> > + * SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
> > + * <phys_addr_page> is the physical page address of the SMMU PMCG wrapped
> > + * to 4K boundary. For example, the PMCG at 0xff8884 is named
> > + * smmuv3_pmcg_ff88840
> > + *
> > + * Filtering by stream id is done by specifying filtering parameters
> > + * with

RE: [PATCH v5 2/4] perf: add arm64 smmuv3 pmu driver

2019-01-25 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 24 January 2019 18:25
> To: Andrew Murray ; Shameerali Kolothum Thodi
> 
> Cc: lorenzo.pieral...@arm.com; mark.rutl...@arm.com;
> vkil...@codeaurora.org; neil.m.lee...@gmail.com;
> jean-philippe.bruc...@arm.com; pa...@codeaurora.org; John Garry
> ; will.dea...@arm.com; rruig...@codeaurora.org;
> Linuxarm ; linux-kernel@vger.kernel.org;
> linux-a...@vger.kernel.org; Guohanjun (Hanjun Guo)
> ; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH v5 2/4] perf: add arm64 smmuv3 pmu driver
> 
> On 23/01/2019 12:14, Andrew Murray wrote:
> [...]
> >>>> +static inline void smmu_pmu_counter_set_value(struct smmu_pmu *smmu_pmu,
> >>>> +					      u32 idx, u64 value)
> >>>> +{
> >>>> +	if (smmu_pmu->counter_mask & BIT(32))
> >>>> +		writeq(value, smmu_pmu->reloc_base + SMMU_PMCG_EVCNTR(idx, 8));
> >>>> +	else
> >>>> +		writel(value, smmu_pmu->reloc_base + SMMU_PMCG_EVCNTR(idx, 4));
> >>>
> >>> The arm64 IO macros use __force u32 and so it's probably OK to provide a
> >>> 64 bit value to writel. But you could use something like lower_32_bits
> >>> for clarity.
> >>
> >> Yes, macro uses __force u32. I will change it to make it more clear though.
> 
> To be pedantic, the first cast which the value actually undergoes is to
> (__u32) ;)
> 
> Ultimately, __raw_writel() takes a u32, so in that sense it's never a
> problem to pass a u64, since unsigned truncation is well-defined in the
> C standard. The casting involved in the I/O accessors is mostly about
> going to an endian-specific type and back again.
> 
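
To illustrate the truncation point in plain C, here is a minimal userspace
sketch (not kernel code). fake_writel() and lower_32_bits_model() are
hypothetical stand-ins for writel() and the kernel's lower_32_bits() helper.
The only thing shown is that passing a 64-bit value to a 32-bit parameter keeps
the low 32 bits, which is the same result the explicit helper gives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's lower_32_bits() helper. */
static inline uint32_t lower_32_bits_model(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffULL);
}

/* Hypothetical register model: remembers the last 32-bit value written. */
static uint32_t fake_reg;

/* Takes a 32-bit value; a 64-bit argument is converted by the compiler. */
static void fake_writel(uint32_t value)
{
	fake_reg = value;
}

/* Write a 64-bit counter value through the 32-bit accessor, implicitly. */
uint32_t write_implicit(uint64_t value)
{
	fake_writel(value);	/* unsigned conversion: value modulo 2^32 */
	return fake_reg;
}

/* Same write, with the truncation spelled out for the reader. */
uint32_t write_explicit(uint64_t value)
{
	fake_writel(lower_32_bits_model(value));
	return fake_reg;
}

/* Small self-check: both forms store the same low 32 bits. */
void truncation_demo(void)
{
	uint64_t v = 0x123456789abcdef0ULL;

	assert(write_implicit(v) == 0x9abcdef0u);
	assert(write_explicit(v) == write_implicit(v));
}
```

Both forms behave the same; the explicit one only documents the intent at the
call site.
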
> [...]
> >>>> +static void smmu_pmu_event_start(struct perf_event *event, int flags)
> >>>> +{
> >>>> +	struct smmu_pmu *smmu_pmu = to_smmu_pmu(event->pmu);
> >>>> +	struct hw_perf_event *hwc = &event->hw;
> >>>> +	int idx = hwc->idx;
> >>>> +	u32 filter_span, filter_sid;
> >>>> +	u32 evtyper;
> >>>> +
> >>>> +	hwc->state = 0;
> >>>> +
> >>>> +	smmu_pmu_set_period(smmu_pmu, hwc);
> >>>> +
> >>>> +	smmu_pmu_get_event_filter(event, &filter_span, &filter_sid);
> >>>> +
> >>>> +	evtyper = get_event(event) |
> >>>> +		  filter_span << SMMU_PMCG_SID_SPAN_SHIFT;
> >>>> +
> >>>> +	smmu_pmu_set_evtyper(smmu_pmu, idx, evtyper);
> >>>> +	smmu_pmu_set_smr(smmu_pmu, idx, filter_sid);
> >>>> +	smmu_pmu_interrupt_enable(smmu_pmu, idx);
> >>>> +	smmu_pmu_counter_enable(smmu_pmu, idx);
> >>>> +}
> >>>> +
> >>>> +static void smmu_pmu_event_stop(struct perf_event *event, int flags)
> >>>> +{
> >>>> +	struct smmu_pmu *smmu_pmu = to_smmu_pmu(event->pmu);
> >>>> +	struct hw_perf_event *hwc = &event->hw;
> >>>> +	int idx = hwc->idx;
> >>>> +
> >>>> +	if (hwc->state & PERF_HES_STOPPED)
> >>>> +		return;
> >>>> +
> >>>> +	smmu_pmu_counter_disable(smmu_pmu, idx);
> >>>
> >>> Is it intentional not to call smmu_pmu_interrupt_disable here?
> >>
> >> Yes, it is. Earlier patch had the interrupt toggling and Robin pointed out
> >> that it is not really needed as counters are anyway stopped and explicitly
> >> disabling may not solve the spurious interrupt case as well.
> >>
> >
> > Ah apologies for not seeing that in earlier reviews.
> 
> Hmm, I didn't exactly mean "keep enabling it every time an event is
> restarted and never disable it anywhere", though. I was thinking more
> along the lines of enabling in event_add() and disabling in event_del()
> (i.e. effectively tying it to allocation and deallocation of the counter).
> 

Right. I missed the point that it was never disabled at all! I will move the
interrupt enable/disable to the _add/_del path.
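
For illustration only, a minimal userspace model of that rearrangement,
assuming the overflow interrupt is tied to counter allocation rather than to
start/stop. This is a sketch, not the driver code: struct pmcg_model and the
model_event_*() functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one PMCG: bitmaps indexed by counter number. */
struct pmcg_model {
	uint32_t used;		/* counters allocated to events */
	uint32_t irq_enabled;	/* overflow interrupt enable bits */
	uint32_t running;	/* counters currently counting */
};

/* add: allocate a free counter and enable its overflow interrupt once. */
int model_event_add(struct pmcg_model *p, unsigned int num_counters)
{
	unsigned int idx;

	for (idx = 0; idx < num_counters && idx < 32; idx++) {
		if (!(p->used & (1u << idx))) {
			p->used |= 1u << idx;
			p->irq_enabled |= 1u << idx;
			return (int)idx;
		}
	}
	return -1;	/* no free counter */
}

/* start/stop only toggle counting; the interrupt enable is left alone. */
void model_event_start(struct pmcg_model *p, int idx)
{
	p->running |= 1u << idx;
}

void model_event_stop(struct pmcg_model *p, int idx)
{
	p->running &= ~(1u << idx);
}

/* del: stop the counter, disable its interrupt, release the counter. */
void model_event_del(struct pmcg_model *p, int idx)
{
	model_event_stop(p, idx);
	p->irq_enabled &= ~(1u << idx);
	p->used &= ~(1u << idx);
}

/* Self-check: the enable bit follows add/del, not start/stop. */
void model_irq_demo(void)
{
	struct pmcg_model p = { 0, 0, 0 };
	int idx = model_event_add(&p, 4);

	assert(idx == 0 && p.irq_enabled == 1u);
	model_event_start(&p, idx);
	model_event_stop(&p, idx);
	assert(p.irq_enabled == 1u);	/* untouched by start/stop */
	model_event_del(&p, idx);
	assert(p.irq_enabled == 0 && p.used == 0);
}
```
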

Thanks for all the comments. I am planning to send out a v6 soon.
By the way, did you get a chance to take a look at patch #4 (the HiSilicon
erratum one)? I would appreciate it if you could take a look and let me know
before v6.

Thanks,
Shameer

RE: [PATCH v5 2/4] perf: add arm64 smmuv3 pmu driver

2019-01-23 Thread Shameerali Kolothum Thodi

Hi Andrew,

Thanks for taking a look at this.

> -Original Message-
> From: Andrew Murray [mailto:andrew.mur...@arm.com]
> Sent: 22 January 2019 16:24
> To: Shameerali Kolothum Thodi 
> Cc: lorenzo.pieral...@arm.com; robin.mur...@arm.com;
> mark.rutl...@arm.com; vkil...@codeaurora.org; neil.m.lee...@gmail.com;
> jean-philippe.bruc...@arm.com; pa...@codeaurora.org; John Garry
> ; will.dea...@arm.com; rruig...@codeaurora.org;
> Linuxarm ; linux-kernel@vger.kernel.org;
> linux-a...@vger.kernel.org; Guohanjun (Hanjun Guo)
> ; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH v5 2/4] perf: add arm64 smmuv3 pmu driver
> 
> On Fri, Nov 30, 2018 at 03:47:49PM +0000, Shameer Kolothum wrote:
> > From: Neil Leeder 
> >
> > Adds a new driver to support the SMMUv3 PMU and add it into the
> > perf events framework.
> >
> > Each SMMU node may have multiple PMUs associated with it, each of
> > which may support different events.
> >
> > SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
> > <phys_addr_page> is the physical page address of the SMMU PMCG
> > wrapped to 4K boundary. For example, the PMCG at 0xff88840000 is
> > named smmuv3_pmcg_ff88840
> >
> > Filtering by stream id is done by specifying filtering parameters
> > with the event. options are:
> >filter_enable- 0 = no filtering, 1 = filtering enabled
> >filter_span  - 0 = exact match, 1 = pattern match
> >filter_stream_id - pattern to filter against
> >
> > Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
> >filter_span=1,filter_stream_id=0x42/ -a netperf
> >
> > Applies filter pattern 0x42 to transaction events, which means events
> > matching stream ids 0x42 & 0x43 are counted as only upper StreamID
> > bits are required to match the given filter. Further filtering
> > information is available in the SMMU documentation.
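
As a rough illustration of why pattern 0x42 with filter_span=1 also counts
stream id 0x43, here is a small userspace sketch. It rests on an assumption
that is not spelled out in this thread: the lowest clear bit of the pattern
marks the span boundary, and only the bits above it have to match. The SMMU
documentation is the authority on the real encoding; span_match() and
exact_match() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* filter_span=0: the stream id must equal the pattern exactly. */
bool exact_match(uint32_t pattern, uint32_t sid)
{
	return pattern == sid;
}

/*
 * filter_span=1, under the ASSUMPTION stated above: find the lowest
 * clear bit of the pattern (position y), ignore bits y..0, and compare
 * only the bits above y.
 */
bool span_match(uint32_t pattern, uint32_t sid)
{
	unsigned int y = 0;
	uint64_t ignore;

	/* Find the lowest clear bit; y == 32 means the pattern is all ones. */
	while (y < 32 && (pattern & (1u << y)))
		y++;
	if (y == 32)
		return true;	/* nothing left to compare: match everything */

	/* Bits y..0 are ignored; 64-bit math avoids a shift by 32. */
	ignore = ((uint64_t)1 << (y + 1)) - 1;
	return ((pattern ^ sid) & ~(uint32_t)ignore) == 0;
}

/* Self-check against the example from the commit message. */
void span_demo(void)
{
	assert(span_match(0x42, 0x42));
	assert(span_match(0x42, 0x43));
	assert(!span_match(0x42, 0x44));
	assert(!exact_match(0x42, 0x43));
}
```

With pattern 0x42 bit 0 is clear, so only bit 0 is ignored and the matching ids
are 0x42 and 0x43, as in the example above.
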
> >
> > SMMU events are not attributable to a CPU, so task mode and sampling
> > are not supported.
> >
> > Signed-off-by: Neil Leeder 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  drivers/perf/Kconfig  |   9 +
> >  drivers/perf/Makefile |   1 +
> >  drivers/perf/arm_smmuv3_pmu.c | 778
> ++
> >  3 files changed, 788 insertions(+)
> >  create mode 100644 drivers/perf/arm_smmuv3_pmu.c
> >
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 08ebaf7..92be6a3 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -52,6 +52,15 @@ config ARM_PMU_ACPI
> > depends on ARM_PMU && ACPI
> > def_bool y
> >
> > +config ARM_SMMU_V3_PMU
> > +bool "ARM SMMUv3 Performance Monitors Extension"
> > +depends on ARM64 && ACPI && ARM_SMMU_V3
> > +  help
> > +  Provides support for the SMMU version 3 performance monitor unit (PMU)
> > +  on ARM-based systems.
> > +  Adds the SMMU PMU into the perf events subsystem for
> > +  monitoring SMMU performance events.
> > +
> >  config ARM_DSU_PMU
> > tristate "ARM DynamIQ Shared Unit (DSU) PMU"
> > depends on ARM64
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index b3902bd..f10a932 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> > @@ -4,6 +4,7 @@ obj-$(CONFIG_ARM_CCN) += arm-ccn.o
> >  obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
> >  obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
> >  obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
> > +obj-$(CONFIG_ARM_SMMU_V3_PMU) += arm_smmuv3_pmu.o
> >  obj-$(CONFIG_HISI_PMU) += hisilicon/
> >  obj-$(CONFIG_QCOM_L2_PMU)  += qcom_l2_pmu.o
> >  obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
> > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > new file mode 100644
> > index 0000000..fb9dcd8
> > --- /dev/null
> > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > @@ -0,0 +1,778 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * This driver adds support for perf events to use the Performance
> > + * Monitor Counter Groups (PMCG) associated with an SMMUv3 node
> > + * to monitor that node.
> > + *
> > + * SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
> > + * <phys_addr_page> is the physical page address of the SMMU PMCG wrapped
> > + * to 4K boundary. For example, the PMCG at 0xff88840000 is named
> > + * smmuv3_pmcg_ff88840
> > + *
> > + * Filtering by stream id i

RE: [PATCH v5 0/4] arm64 SMMUv3 PMU driver with IORT support

2019-01-22 Thread Shameerali Kolothum Thodi

Hi Robin/Lorenzo,

A gentle reminder on this series. Please take a look.

Thanks,
Shameer

> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of Shameer
> Kolothum
> Sent: 30 November 2018 15:48
> To: lorenzo.pieral...@arm.com; robin.mur...@arm.com
> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> neil.m.lee...@gmail.com; jean-philippe.bruc...@arm.com;
> pa...@codeaurora.org; will.dea...@arm.com; rruig...@codeaurora.org;
> Linuxarm ; linux-kernel@vger.kernel.org;
> linux-a...@vger.kernel.org; linux-arm-ker...@lists.infradead.org
> Subject: [PATCH v5 0/4] arm64 SMMUv3 PMU driver with IORT support
> 
> This adds a driver for the SMMUv3 PMU into the perf framework.
> It includes an IORT update to support PM Counter Groups.
> 
> This is based on the initial work done by Neil Leeder[1]
> 
> SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page>
> where <phys_addr_page> is the physical page address of the SMMU PMCG.
> For example, the PMCG at 0xff88840000 is named smmuv3_pmcg_ff88840
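
The naming rule can be sketched in userspace C as below. This is only an
illustration under two assumptions: the page size is 4K (a shift of 12), and
the example base address reads 0xff88840000, which is the value that yields the
name smmuv3_pmcg_ff88840. pmcg_name() and PMCG_PAGE_SHIFT are hypothetical
names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PMCG_PAGE_SHIFT 12	/* hypothetical name: 4K page granule */

/* Format the PMU name from the PMCG physical base address. */
int pmcg_name(char *buf, size_t len, uint64_t phys_base)
{
	return snprintf(buf, len, "smmuv3_pmcg_%llx",
			(unsigned long long)(phys_base >> PMCG_PAGE_SHIFT));
}

/* Self-check: any address inside the same 4K page gives the same name. */
void pmcg_name_demo(void)
{
	char a[32], b[32];

	pmcg_name(a, sizeof(a), 0xff88840000ULL);
	pmcg_name(b, sizeof(b), 0xff88840fffULL);
	assert(strcmp(a, "smmuv3_pmcg_ff88840") == 0);
	assert(strcmp(a, b) == 0);
}
```
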
> 
> Usage example:
> For common arch supported events:
> perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
>  filter_span=1,filter_stream_id=0x42/ -a netperf
> 
> For IMP DEF events:
> perf stat -e smmuv3_pmcg_ff88840/event=id/ -a netperf
> 
> This is sanity tested on a HiSilicon platform that requires
> a quirk to run  it properly. As per HiSilicon erratum  #162001800,
> PMCG event counter registers (SMMU_PMCG_EVCNTRn) on HiSilicon Hip08
> platforms are read only and this prevents the software from setting
> the initial period on event start. Unfortunately we were a bit late
> in the cycle to detect this issue and now require software workaround
> for this. Patch #4 is added to this series to provide a workaround
> for this issue.
> 
> Further testing on supported platforms are very much welcome.
> 
> v4 ---> v5
> -IORT code is modified to pass the option/quirk flags to the driver
>  through platform_data (patch #4), based on Robin's comments.
> -Removed COMPILE_TEST (patch #2).
> 
> v3 --> v4
> 
> -Addressed comments from Jean and Robin.
> -Merged dma config callbacks as per Lorenzo's comments(patch #1).
> -Added handling of Global(Counter0) filter settings mode(patch #2).
> -Added patch #4 to address HiSilicon erratum  #162001800
> -
> v2 --> v3
> 
> -Addressed comments from Robin.
> -Removed iort helper function to retrieve the PMCG reference smmu.
> -PMCG devices are now named using the base address
> 
> v1 --> v2
> 
> - Addressed comments from Robin.
> - Added an helper to retrieve the associated smmu dev and named PMUs
>   to make the association visible to user.
> - Added MSI support  for overflow irq
> 
> [1]https://www.spinics.net/lists/arm-kernel/msg598591.html
> 
> Neil Leeder (2):
>   acpi: arm64: add iort support for PMCG
>   perf: add arm64 smmuv3 pmu driver
> 
> Shameer Kolothum (2):
>   perf/smmuv3: Add MSI irq support
>   perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk
> 
>  drivers/acpi/arm64/iort.c | 127 +--
>  drivers/perf/Kconfig  |   9 +
>  drivers/perf/Makefile |   1 +
>  drivers/perf/arm_smmuv3_pmu.c | 859
> ++
>  include/linux/acpi_iort.h |   3 +
>  5 files changed, 975 insertions(+), 24 deletions(-)
>  create mode 100644 drivers/perf/arm_smmuv3_pmu.c
> 
> --
> 2.7.4
> 
> 
> ___
> Linuxarm mailing list
> linux...@huawei.com
> http://hulk.huawei.com/mailman/listinfo/linuxarm

RE: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions

2019-01-17 Thread Shameerali Kolothum Thodi

Hi Pierre,

> -Original Message-
> From: Pierre Morel [mailto:pmo...@linux.ibm.com]
> Sent: 15 January 2019 17:37
> To: gerald.schae...@de.ibm.com
> Cc: j...@8bytes.org; linux-s...@vger.kernel.org;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> alex.william...@redhat.com; Shameerali Kolothum Thodi
> ; wall...@linux.ibm.com
> Subject: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions
> 
> The s390 iommu can only allow DMA transactions between the zPCI device
> entries start_dma and end_dma.
> 
> Let's declare the regions before start_dma and after end_dma as
> reserved regions using the appropriate callback in iommu_ops.
> 
> The reserved region may later be retrieved from sysfs or from
> the vfio iommu internal interface.

Just in case you are planning to use the sysfs interface to retrieve the valid
regions and intend to use that in the QEMU vfio path, please see the discussion
here [1] (if you haven't seen it already).

Thanks,
Shameer

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03651.html
 
> This seems to me related with the work Shameer has started on
> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
> 
> Pierre Morel (1):
>   iommu/s390: Declare s390 iommu reserved regions
> 
>  drivers/iommu/s390-iommu.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> --
> 2.7.4

RE: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables

2019-01-11 Thread Shameerali Kolothum Thodi

Hi Suzuki,

> -Original Message-
> From: Suzuki K Poulose [mailto:suzuki.poul...@arm.com]
> Sent: 11 January 2019 09:42
> To: Shameerali Kolothum Thodi ;
> marc.zyng...@arm.com; linux-kernel@vger.kernel.org
> Cc: shank...@codeaurora.org; ganapatrao.kulka...@cavium.com;
> robert.rich...@cavium.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> linux-arm-ker...@lists.infradead.org; Linuxarm 
> Subject: Re: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation
> for ITS tables
> 

[...]

> >   drivers/irqchip/irq-gic-v3-its.c | 20 
> >   1 file changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> > b/drivers/irqchip/irq-gic-v3-its.c
> > index db20e99..ab01061 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -1749,7 +1749,8 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser,
> > order = get_order(GITS_BASER_PAGES_MAX * psz);
> > }
> >
> > -   base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> > +   base = (void *)page_address(alloc_pages_node(its->numa_node,
> > +   GFP_KERNEL | __GFP_ZERO, order));
> 
> If alloc_pages_node() fails, the page_address() could crash the system.
> 
> > -   its->cmd_base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> > -   get_order(ITS_CMD_QUEUE_SZ));
> > +   its->cmd_base = (void *)page_address(alloc_pages_node(its->numa_node,
> > +GFP_KERNEL | __GFP_ZERO,
> > +get_order(ITS_CMD_QUEUE_SZ)));
> 
> Similarly here. We may want to handle it properly.

Ah, good catch. I will handle the allocation failure and rebase on top of
5.0-rc1, as suggested by Marc.
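
For illustration, the shape of the fix is to keep the page pointer, check it,
and only then convert it to an address. The sketch below is a userspace model
with hypothetical stand-ins (struct page_model, alloc_pages_node_model(),
page_address_model(), setup_table()); it is not the irqchip code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel functions. */
struct page_model {
	void *vaddr;
};

static unsigned char backing[4096];
static struct page_model one_page = { backing };

/* Returns NULL when asked to simulate an allocation failure. */
static struct page_model *alloc_pages_node_model(int node, int simulate_failure)
{
	(void)node;
	return simulate_failure ? NULL : &one_page;
}

/* Dereferences its argument, so a NULL page must never reach it. */
static void *page_address_model(const struct page_model *page)
{
	return page->vaddr;
}

/* Check the allocation before converting it to an address. */
int setup_table(int node, int simulate_failure, void **base)
{
	struct page_model *page = alloc_pages_node_model(node, simulate_failure);

	if (!page)
		return -ENOMEM;

	*base = page_address_model(page);
	return 0;
}

/* Self-check: failure is reported instead of dereferencing NULL. */
void setup_table_demo(void)
{
	void *base = NULL;

	assert(setup_table(0, 1, &base) == -ENOMEM && base == NULL);
	assert(setup_table(0, 0, &base) == 0 && base != NULL);
}
```
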

Thanks,
Shameer

RE: [PATCH v1 1/2] vfio:iommu: Use capabilities do report IOMMU informations

2019-01-09 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: 09 January 2019 15:37
> To: Pierre Morel 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> wall...@linux.ibm.com; coh...@redhat.com; da...@redhat.com;
> pa...@linux.ibm.com; th...@redhat.com; Shameerali Kolothum Thodi
> 
> Subject: Re: [PATCH v1 1/2] vfio:iommu: Use capabilities do report IOMMU
> informations
> 
> On Wed,  9 Jan 2019 13:41:53 +0100
> Pierre Morel  wrote:
> 
> > We add a new flag, VFIO_IOMMU_INFO_CAPABILITIES, inside the
> > vfio_iommu_type1_info to specify the support for capabilities.
> >
> > We add a new capability, with id VFIO_IOMMU_INFO_CAP_DMA
> > in the capability list of the VFIO_IOMMU_GET_INFO ioctl.
> >
> > Signed-off-by: Pierre Morel 
> > ---
> >  include/uapi/linux/vfio.h | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 8131028..54c4fcb 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -669,6 +669,15 @@ struct vfio_iommu_type1_info {
> > __u32   flags;
> >  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)   /* supported page sizes info */
> > __u64   iova_pgsizes;   /* Bitmap of supported page sizes */
> > +#define VFIO_IOMMU_INFO_CAPABILITIES (1 << 1)  /* support capabilities info */
> > +   __u64   cap_offset; /* Offset within info struct of first cap */
> > +};
> > +
> > +#define VFIO_IOMMU_INFO_CAP_DMA 1
> > +struct vfio_iommu_cap_dma {
> > +   struct vfio_info_cap_header header;
> > +   __u64   dma_start;
> > +   __u64   dma_end;
> >  };
> >
> >  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> 
> Unfortunately for most systems, a simple start and end is not really
> sufficient to describe the available IOVA space, there are often
> reserved regions intermixed, so this is not really a complete
> solution.  Shameer tried to solve this last year[1] but we ran into a
> road block that Intel IGD devices impose a reserved range of IOVA
> spaces reported to the user that conflict with existing assignment of
> this device and we haven't figured out yet how to be more selective of
> the enforcement of those reserved ranges.  Thanks,

Right. I had further discussions at KVM Forum and off-list with the Intel folks
to unblock this, and was promised some help.

IIRC, where we left it was that Kevin/Ashok would take another look at your
proposed approach to exclude the RMRR [1] and see whether that is good
enough.
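
To make the "single start and end is not sufficient" point concrete, here is a
small userspace sketch that subtracts reserved regions from an aperture and
returns the remaining valid IOVA ranges. It is an illustration only: struct
iova_range and iova_valid_ranges() are hypothetical names, the reserved list is
assumed to be sorted and non-overlapping, and the addresses used in the
self-check are made-up example values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct iova_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Subtract reserved regions from [aperture.start, aperture.end].
 * ASSUMPTION: resv[] is sorted by start and non-overlapping.
 * Returns the number of valid ranges written to out[] (at most max_out).
 */
size_t iova_valid_ranges(struct iova_range aperture,
			 const struct iova_range *resv, size_t nr_resv,
			 struct iova_range *out, size_t max_out)
{
	uint64_t cursor = aperture.start;
	size_t n = 0, i;

	for (i = 0; i < nr_resv; i++) {
		if (resv[i].end < cursor)
			continue;	/* entirely below the cursor */
		if (resv[i].start > aperture.end)
			break;		/* entirely above the aperture */
		if (resv[i].start > cursor && n < max_out) {
			out[n].start = cursor;
			out[n].end = resv[i].start - 1;
			n++;
		}
		if (resv[i].end >= aperture.end)
			return n;	/* reserved region reaches the top */
		cursor = resv[i].end + 1;
	}
	if (cursor <= aperture.end && n < max_out) {
		out[n].start = cursor;
		out[n].end = aperture.end;
		n++;
	}
	return n;
}

/* Self-check: two holes in a 4G aperture leave three usable ranges. */
void iova_demo(void)
{
	struct iova_range ap = { 0x0, 0xffffffffULL };
	struct iova_range resv[2] = {
		{ 0x08000000ULL, 0x080fffffULL },
		{ 0xfee00000ULL, 0xfeefffffULL },
	};
	struct iova_range out[4];

	assert(iova_valid_ranges(ap, resv, 2, out, 4) == 3);
	assert(out[1].start == 0x08100000ULL && out[1].end == 0xfedfffffULL);
}
```

A single start/end pair could only describe one of the three ranges here, which
is why a list of ranges is needed.
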

Kevin/Ashok,

Please update if you had a chance to look into it.

Thanks,
Shameer

[1]. https://lkml.org/lkml/2018/6/5/897

> Alex
> 
> [1] https://lkml.org/lkml/2018/4/18/293

RE: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk

2018-11-27 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 26 November 2018 18:45
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com; jean-philippe.bruc...@arm.com
> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> neil.m.lee...@gmail.com; pa...@codeaurora.org; will.dea...@arm.com;
> rruig...@codeaurora.org; Linuxarm ; linux-
> ker...@vger.kernel.org; linux-a...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum
> 162001800 quirk
> 
> Hi Shameer,
> 
> Sorry for the delay...
> 
> On 18/10/2018 16:27, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -Original Message-
> >> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> >> Shameerali Kolothum Thodi
> >> Sent: 18 October 2018 14:34
> >> To: Robin Murphy ; lorenzo.pieral...@arm.com;
> >> jean-philippe.bruc...@arm.com
> >> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> >> neil.m.lee...@gmail.com; pa...@codeaurora.org; will.dea...@arm.com;
> >> rruig...@codeaurora.org; Linuxarm ; linux-
> >> ker...@vger.kernel.org; linux-a...@vger.kernel.org; linux-arm-
> >> ker...@lists.infradead.org
> >> Subject: RE: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum
> >> 162001800 quirk
> >>
> >> Hi Robin,
> >>
> >>> -Original Message-
> >>> From: Robin Murphy [mailto:robin.mur...@arm.com]
> >>> Sent: 18 October 2018 12:44
> >>> To: Shameerali Kolothum Thodi
> ;
> >>> lorenzo.pieral...@arm.com; jean-philippe.bruc...@arm.com
> >>> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun
> Guo)
> >>> ; John Garry ;
> >>> pa...@codeaurora.org; vkil...@codeaurora.org;
> rruig...@codeaurora.org;
> >>> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> >>> ker...@lists.infradead.org; Linuxarm ;
> >>> neil.m.lee...@gmail.com
> >>> Subject: Re: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum
> >>> 162001800 quirk
> >
> > [...]
> >
> >>>> +static const struct smmu_pmu_erratum_wa smmu_pmu_wa[] = {
> >>>> +{
> >>>> +.match_type = se_match_acpi_oem,
> >>>> +.id = hisi_162001800_oem_info,
> >>>> +.desc_str = "HiSilicon erratum 162001800",
> >>>> +.enable = hisi_erratum_evcntr_rdonly,
> >>>> +},
> >>>> +};
> >>>> +
> >>>
> >>> There's an awful lot of raw ACPI internals splashed about here -
> >>> couldn't at least some of it be abstracted behind the IORT code? In
> >>> fact, can't IORT just set all this stuff up in advance like it does for
> >>> SMMUs?
> >>
> >> Hmmm.. Sorry, not clear to me. You mean to say associate the IORT node
> >> with platform device and retrieve it in driver just like smmu does for
> >> "model" checks? Not sure that works here if that’s what the above meant.
> 
> I don't think there's much of interest in the actual IORT node itself,
> but I can't see that there would be any particular problem with passing
> either some implementation identifier or just a ready-made set of quirk
> flags to the PMCG driver via platdata.

Ok.

> >>>>#define to_smmu_pmu(p) (container_of(p, struct smmu_pmu, pmu))
> >>>>
> >>>>#define SMMU_PMU_EVENT_ATTR_EXTRACTOR(_name, _config,
> _start,
> >>> _end)\
> >>>> @@ -224,15 +271,20 @@ static void smmu_pmu_set_period(struct
> >>> smmu_pmu *smmu_pmu,
> >>>>  u32 idx = hwc->idx;
> >>>>  u64 new;
> >>>>
> >>>> -/*
> >>>> - * We limit the max period to half the max counter value of the
> >>> counter
> >>>> - * size, so that even in the case of extreme interrupt latency 
> >>>> the
> >>>> - * counter will (hopefully) not wrap past its initial value.
> >>>> - */
> >>>> -new = smmu_pmu->counter_mask >> 1;
> >>>> +if (smmu_pmu->options & SMMU_PMU_OPT_EVCNTR_RDONLY) {
> >>>> +new = smmu_pmu_counter_get_value(smmu_pmu, idx);
> >>>
> >>> Something's clearly missing, because if this happens to start at 0, the

RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue

2018-11-13 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: mika.westerb...@linux.intel.com
> [mailto:mika.westerb...@linux.intel.com]
> Sent: 13 November 2018 15:08
> To: Shameerali Kolothum Thodi 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B)
> ; Linuxarm ; Lukas
> Wunner 
> Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue

[...]
 
> > Right. As I mentioned in my previous mail, I missed the fact that you are
> > updating the ctrl->slot_ctrl with cmd value while in my test I did my
> > update with the value returned by pcie_capability_read_word().
> 
> OK, I see.
> 
> > > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in
> > > pciehp_isr().
> >
> > Ok.
> >
> > > Here's an updated patch, can you try and see if it makes any difference?
> >
> > I just tried this and it works. Thanks.
> 
> Can you still check that the previous one (without _CCIE check) works?

Yes, it works for me without _CCIE.

> > See few comments below.
> >
> > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> > > b/drivers/pci/hotplug/pciehp_hpc.c
> > > index 7dd443aea5a5..da2cbe892444 100644
> > > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
> > >   slot_ctrl |= (cmd & mask);
> > >   ctrl->cmd_busy = 1;
> > >   smp_mb();
> > > + ctrl->slot_ctrl = slot_ctrl;
> >
> > Does it make more sense if we can move this before smp_mb()?. Also I am
> > not sure updating the  ctrl->slot_ctrl before actually the hardware is
> > programmed with that value will result in any other race conditions? TBH,
> > I am not that familiar with this code and I leave that to you :)
> 
> Both are good questions :)
> 
> For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I
> think we should be fine and this is actually more correct because if we
> are unmasking interrupts they may trigger immediately making
> pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the
> issue you reported).

Ok. I was more concerned about an unsolicited event triggering the _isr
while we are modifying the ctrl->slot_ctrl. But that's ok I think as the _isr
reads the hw status anyway. 

> The smp_mb() thing is not that clear (at least to me) because it is used
> in two places in the driver and both seem to be making write to
> ctrl->cmd_busy visible to other CPUs but I don't see where we deal with
> the read part.
> 
> I may be missing something, though.

I think the read part is in wait_event_timeout(), which evaluates the condition.
The wake_up is called from pciehp_isr(). Since the flag is updated in both
process context and interrupt handler context, smp_mb() is used. I think the
same now applies to ctrl->slot_ctrl, as it is also used in both process
context and interrupt context.
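
To show why the order of the two stores matters, here is a single-threaded
userspace model. It assumes the ISR ignores an interrupt when the cached
slot_ctrl has the hotplug interrupt enable bit clear, and that unmasking can
fire the ISR immediately. All names (struct ctrl_model, isr_model(),
hw_write_model(), write_cmd_cache_*()) are hypothetical, and because the model
has no concurrency it says nothing about where smp_mb() should go.

```c
#include <assert.h>
#include <stdint.h>

#define SLTCTL_HPIE 0x0020	/* Hot-Plug Interrupt Enable bit */

/* Hypothetical controller model, not the pciehp structures. */
struct ctrl_model {
	uint16_t slot_ctrl;	/* software copy of Slot Control */
	uint16_t hw_slot_ctrl;	/* what the "hardware" holds */
	int events_handled;
};

/* ISR model: ignore the interrupt if the cached value says it is masked. */
static void isr_model(struct ctrl_model *c)
{
	if (!(c->slot_ctrl & SLTCTL_HPIE))
		return;
	c->events_handled++;
}

/* Hardware write model: unmasking with an event pending fires the ISR. */
static void hw_write_model(struct ctrl_model *c, uint16_t val)
{
	c->hw_slot_ctrl = val;
	if (val & SLTCTL_HPIE)
		isr_model(c);
}

/* Old ordering: program the hardware, then update the cached copy. */
void write_cmd_cache_after(struct ctrl_model *c, uint16_t val)
{
	hw_write_model(c, val);
	c->slot_ctrl = val;
}

/* Patched ordering: update the cached copy, then program the hardware. */
void write_cmd_cache_before(struct ctrl_model *c, uint16_t val)
{
	c->slot_ctrl = val;
	hw_write_model(c, val);
}

/* Self-check: only the patched ordering lets the ISR see HPIE set. */
void slot_ctrl_demo(void)
{
	struct ctrl_model old_order = { 0, 0, 0 };
	struct ctrl_model new_order = { 0, 0, 0 };

	write_cmd_cache_after(&old_order, SLTCTL_HPIE);
	write_cmd_cache_before(&new_order, SLTCTL_HPIE);
	assert(old_order.events_handled == 0);	/* event dropped */
	assert(new_order.events_handled == 1);	/* event handled */
}
```
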

Thanks,
Shameer

RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue

2018-11-13 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: mika.westerb...@linux.intel.com
> [mailto:mika.westerb...@linux.intel.com]
> Sent: 13 November 2018 12:59
> To: Shameerali Kolothum Thodi 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B)
> ; Linuxarm ; Lukas
> Wunner 
> Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> 
> On Tue, Nov 13, 2018 at 12:36:20PM +0000, Shameerali Kolothum Thodi wrote:
> > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller
> > > *ctrl,
> > > u16 cmd,
> > >   slot_ctrl |= (cmd & mask);
> > >   ctrl->cmd_busy = 1;
> > >   smp_mb();
> > > + ctrl->slot_ctrl = slot_ctrl;
> >
> > > Actually I tried this one, but it doesn't help in this case as the
> > > initial pcie_capability_read_word() returns the slot_ctrl without the
> > > PCI_EXP_SLTCTL_HPIE bit set. It looks to me like the
> > > pcie_enable_notification() function enables this,
> > >
> > > if (!pciehp_poll_mode)
> > > 	cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
> > >
> > > I don't know whether this is as per the spec or not, as the initial cap
> > > read doesn't seem to have the PCI_EXP_SLTCTL_HPIE bit set.
> 
> If I read the code right cmd value should end up in ctrl->slot_ctrl
> properly from pcie_enable_notification().

Right. As I mentioned in my previous mail, I missed the fact that you are
updating ctrl->slot_ctrl with the cmd value, while in my test I did my update
with the value returned by pcie_capability_read_word().
 
> However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in
> pciehp_isr().

Ok.
 
> Here's an updated patch, can you try and see if it makes any difference?

I just tried this and it works. Thanks.

See a few comments below.

> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 7dd443aea5a5..da2cbe892444 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
>   slot_ctrl |= (cmd & mask);
>   ctrl->cmd_busy = 1;
>   smp_mb();
> + ctrl->slot_ctrl = slot_ctrl;

Does it make more sense if we can move this before smp_mb()? Also, I am not
sure whether updating ctrl->slot_ctrl before the hardware is actually programmed
with that value will result in any other race conditions. TBH, I am not that
familiar with this code and I leave that to you :)

Thanks,
Shameer

>   pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
>   ctrl->cmd_started = jiffies;
> - ctrl->slot_ctrl = slot_ctrl;
> 
>   /*
>    * Controllers with the Intel CF118 and similar errata advertise
> @@ -522,7 +522,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>    * in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
>    */
>   if (pdev->current_state == PCI_D3cold ||
> -     (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
> +     (!(ctrl->slot_ctrl & (PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE))
> +      && !pciehp_poll_mode))
>       return IRQ_NONE;
> 
>   /*
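
To make the effect of the widened check concrete, here is a small standalone sketch of the two early-exit conditions from the diff above. The bit values are local copies as I recall them from include/uapi/linux/pci_regs.h (please verify against your tree), and the helper names isr_ignores_old() and isr_ignores_new() are hypothetical; the real driver tests pdev->current_state and pciehp_poll_mode inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values of two Slot Control bits (check pci_regs.h). */
#define SLTCTL_CCIE 0x0010 /* Command Completed Interrupt Enable */
#define SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */

/* Check as added by the bisected commit: only HPIE is considered. */
static bool isr_ignores_old(uint16_t slot_ctrl, bool d3cold, bool poll_mode)
{
	return d3cold || (!(slot_ctrl & SLTCTL_HPIE) && !poll_mode);
}

/* Check from the updated patch: HPIE or CCIE is enough to proceed. */
static bool isr_ignores_new(uint16_t slot_ctrl, bool d3cold, bool poll_mode)
{
	return d3cold ||
	       (!(slot_ctrl & (SLTCTL_HPIE | SLTCTL_CCIE)) && !poll_mode);
}
```

With only CCIE cached in slot_ctrl, the old condition returns early with IRQ_NONE while the new one lets the handler run; with neither bit set and poll mode off, both still return early.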

RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue

2018-11-13 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 13 November 2018 12:36
> To: mika.westerb...@linux.intel.com
> Cc: linux-...@vger.kernel.org; Lukas Wunner ; linux-
> ker...@vger.kernel.org; Linuxarm 
> Subject: RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> 
> 
> 
> > -Original Message-
> > From: mika.westerb...@linux.intel.com
> > [mailto:mika.westerb...@linux.intel.com]
> > Sent: 13 November 2018 12:25
> > To: Shameerali Kolothum Thodi 
> > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B)
> > ; Linuxarm ; Lukas
> > Wunner 
> > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> >
> > +Lukas
> >
> > On Tue, Nov 13, 2018 at 11:45:42AM +, Shameerali Kolothum Thodi wrote:
> > > Hi Mika,
> >
> > Hi,
> >
> > > Since commit 720d6a671a6e ("PCI: pciehp: Do not handle events
> > > if interrupts are masked"), the hotplug support on Qemu Guest (4.20-rc1)
> > > with a vfio passthrough device seems to be broken. This is on an ARM64
> > > platform.
> > >
> > > I am booting a Guest with the below command line options with the
> > > intention of hot-adding an ixgbevf dev later,
> > >
> > > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host \
> > >  -kernel Image_4.20-rc1 \
> > >  -initrd rootfs-iperf.cpio \
> > >  -device ioh3420,id=rp1 \
> > >  -net none \
> > >  -m 4096 \
> > >  -nographic -D -d -enable-kvm \
> > >  -append "console=ttyAMA0 root=/dev/vda -m 4096 rw pciehp.pciehp_debug=1
> > >   pcie_ports=native searlycon=pl011,0x900"
> > >
> > > But receives this on boot,
> > >
> > > [1.327852] pciehp :00:01.0:pcie004: Timeout
> > > on hotplug command 0x03f1 (issued 1016 msec ago)
> > > [1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > > 0x03f1 (issued 1024 msec ago)
> > > [3.847843] pciehp :00:01.0:pcie004: Failed to check link status
> > > [3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > > 0x02f1 (issued 2520 msec ago)
> > > [4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > > 0x06f1 (issued 1024 msec ago)
> > > [5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > > 0x06f1 (issued 2056 msec ago)
> > > [6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > > 0x07f1 (issued 1016 msec ago)
> > > [7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > > 0x0771 (issued 1024 msec ago)
> > >
> > > Trying to hot add using "device_add vfio-pci,host=:01:10.1,id=net0,bus=rp1"
> > > doesn't work either. And if I boot the guest with an assigned device
> > > (-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev
> > > listed in the Guest but then hot remove doesn't work.
> > >
> > > This all works on 4.19 and bisect points to the above-mentioned commit,
> > > where an additional check is added in pciehp_isr(),
> > >
> > > -  * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
> > > +  * Interrupts only occur in D3hot or shallower and only if enabled
> > > +  * in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
> > >*/
> > > - if (pdev->current_state == PCI_D3cold)
> > > + if (pdev->current_state == PCI_D3cold ||
> > > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
> > >   return IRQ_NONE;
> > >
> > > I think this doesn't work the first time, when the cmd with the
> > > PCI_EXP_SLTCTL_HPIE bit set is written,
> > >
> > > pciehp_probe()
> > >   pcie_init_notification()
> > >     pcie_enable_notification()
> > >       pcie_do_write_cmd()
> > >
> > > To begin with, ctrl->slot_ctrl = 0 in pciehp_isr(), as this is only set
> > > once the write has returned.
> > >
> > > Or else I am missing something here. Please take a look and let me know.
> >
> > Thanks for the detailed report and analysis. I think you are right and
> > the ctrl->slot_ctrl is only set after the actual value has been
> > programmed to the hardware, so if there was interrupt "pending" it will
>

RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue

2018-11-13 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: mika.westerb...@linux.intel.com
> [mailto:mika.westerb...@linux.intel.com]
> Sent: 13 November 2018 12:25
> To: Shameerali Kolothum Thodi 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B)
> ; Linuxarm ; Lukas
> Wunner 
> Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> 
> +Lukas
> 
> On Tue, Nov 13, 2018 at 11:45:42AM +, Shameerali Kolothum Thodi wrote:
> > Hi Mika,
> 
> Hi,
> 
> > Since commit 720d6a671a6e ("PCI: pciehp: Do not handle events
> > if interrupts are masked"), the hotplug support on Qemu Guest (4.20-rc1)
> > with a vfio passthrough device seems to be broken. This is on an ARM64
> > platform.
> >
> > I am booting a Guest with the below command line options with the
> > intention of hot-adding an ixgbevf dev later,
> >
> > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host \
> >  -kernel Image_4.20-rc1 \
> >  -initrd rootfs-iperf.cpio \
> >  -device ioh3420,id=rp1 \
> >  -net none \
> >  -m 4096 \
> >  -nographic -D -d -enable-kvm \
> >  -append "console=ttyAMA0 root=/dev/vda -m 4096 rw pciehp.pciehp_debug=1
> >   pcie_ports=native searlycon=pl011,0x900"
> >
> > But receives this on boot,
> >
> > [1.327852] pciehp :00:01.0:pcie004: Timeout
> > on hotplug command 0x03f1 (issued 1016 msec ago)
> > [1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > 0x03f1 (issued 1024 msec ago)
> > [3.847843] pciehp :00:01.0:pcie004: Failed to check link status
> > [3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > 0x02f1 (issued 2520 msec ago)
> > [4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > 0x06f1 (issued 1024 msec ago)
> > [5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > 0x06f1 (issued 2056 msec ago)
> > [6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > 0x07f1 (issued 1016 msec ago)
> > [7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command
> > 0x0771 (issued 1024 msec ago)
> >
> > Trying to hot add using "device_add vfio-pci,host=:01:10.1,id=net0,bus=rp1"
> > doesn't work either. And if I boot the guest with an assigned device
> > (-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev listed
> > in the Guest but then hot remove doesn't work.
> >
> > This all works on 4.19 and bisect points to the above-mentioned commit,
> > where an additional check is added in pciehp_isr(),
> >
> > -* Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
> > +* Interrupts only occur in D3hot or shallower and only if enabled
> > +* in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
> >  */
> > -   if (pdev->current_state == PCI_D3cold)
> > +   if (pdev->current_state == PCI_D3cold ||
> > +   (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
> > return IRQ_NONE;
> >
> > I think this doesn't work the first time, when the cmd with the
> > PCI_EXP_SLTCTL_HPIE bit set is written,
> >
> > pciehp_probe()
> >   pcie_init_notification()
> >     pcie_enable_notification()
> >       pcie_do_write_cmd()
> >
> > To begin with, ctrl->slot_ctrl = 0 in pciehp_isr(), as this is only set
> > once the write has returned.
> >
> > Or else I am missing something here. Please take a look and let me know.
> 
> Thanks for the detailed report and analysis. I think you are right and
> the ctrl->slot_ctrl is only set after the actual value has been
> programmed to the hardware, so if there was interrupt "pending" it will
> trigger immediately (just to find ctrl->slot_ctrl == 0).
> 
> I wonder if the following change helps here?
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 7dd443aea5a5..cd9eae650aa5 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
>   slot_ctrl |= (cmd & mask);
>   ctrl->cmd_busy = 1;
>   smp_mb();
> + ctrl->slot_ctrl = slot_ctrl;

Actually I tried this one, but it doesn't help in this case as the initial
pcie_capability_read_word() returns the slot_ctrl without the PCI_EXP_SLTCTL_HPIE
bit set. It looks to me like the pcie_enable_notification() function enables this,

if (!pciehp_poll_mode)
	cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;

I don't know whether this is as per the spec or not, as the initial cap read
doesn't seem to have the PCI_EXP_SLTCTL_HPIE bit set.

Thanks,
Shameer

>   pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
>   ctrl->cmd_started = jiffies;
> - ctrl->slot_ctrl = slot_ctrl;
> 
>   /*
>* Controllers with the Intel CF118 and similar errata advertise
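
For reference, a standalone sketch of how the value written to Slot Control is assembled from the read-back value plus cmd and mask. The `slot_ctrl |= (cmd & mask)` step is visible in the diff context above; the preceding clear of the masked bits is my assumption about the surrounding code, and merge_slot_ctrl() is a hypothetical helper name. Bit values are local copies as I recall them from pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values of two Slot Control bits (check pci_regs.h). */
#define SLTCTL_CCIE 0x0010 /* Command Completed Interrupt Enable */
#define SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */

/* Merge the requested bits into the value read back from the register. */
static uint16_t merge_slot_ctrl(uint16_t read_back, uint16_t cmd,
				uint16_t mask)
{
	uint16_t slot_ctrl = read_back;

	slot_ctrl &= (uint16_t)~mask; /* assumed: drop the bits being changed */
	slot_ctrl |= (cmd & mask);    /* as in the quoted diff context */
	return slot_ctrl;
}
```

Even when the initial read returns a value without HPIE, the merged value carries whatever enable bits cmd requests, which is why caching the merged slot_ctrl (and not the read-back value) is what matters for the check in pciehp_isr().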

Qemu Guest kernel 4.20-rc1 PCIe hotplug issue

2018-11-13 Thread Shameerali Kolothum Thodi

Hi Mika,

Since commit 720d6a671a6e ("PCI: pciehp: Do not handle events
if interrupts are masked"), the hotplug support on Qemu Guest (4.20-rc1)
with a vfio passthrough device seems to be broken. This is on an ARM64 platform.

I am booting a Guest with the below command line options with the intention of
hot-adding an ixgbevf dev later,

./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host \
 -kernel Image_4.20-rc1 \
 -initrd rootfs-iperf.cpio \
 -device ioh3420,id=rp1 \
 -net none \
 -m 4096 \
 -nographic -D -d -enable-kvm \
 -append "console=ttyAMA0 root=/dev/vda -m 4096 rw pciehp.pciehp_debug=1
  pcie_ports=native searlycon=pl011,0x900"

But I receive this on boot,

[1.327852] pciehp :00:01.0:pcie004: Timeout on hotplug command 0x03f1 (issued 1016 msec ago)
[1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command 0x03f1 (issued 1024 msec ago)
[3.847843] pciehp :00:01.0:pcie004: Failed to check link status
[3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command 0x02f1 (issued 2520 msec ago)
[4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command 0x06f1 (issued 1024 msec ago)
[5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command 0x06f1 (issued 2056 msec ago)
[6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command 0x07f1 (issued 1016 msec ago)
[7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command 0x0771 (issued 1024 msec ago)

Trying to hot add using "device_add vfio-pci,host=:01:10.1,id=net0,bus=rp1"
doesn't work either. And if I boot the guest with an assigned device
(-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev listed in
the Guest but then hot remove doesn't work.

This all works on 4.19 and bisect points to the above-mentioned commit, where an
additional check is added in pciehp_isr(),

-* Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
+* Interrupts only occur in D3hot or shallower and only if enabled
+* in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
 */
-   if (pdev->current_state == PCI_D3cold)
+   if (pdev->current_state == PCI_D3cold ||
+   (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
return IRQ_NONE;

I think this doesn't work the first time, when the cmd with the
PCI_EXP_SLTCTL_HPIE bit set is written,

pciehp_probe()
  pcie_init_notification()
    pcie_enable_notification()
      pcie_do_write_cmd()

To begin with, ctrl->slot_ctrl = 0 in pciehp_isr(), as this is only set once
the write has returned.

Or else I am missing something here. Please take a look and let me know.

Thanks,
Shameer
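
A toy single-threaded model of the suspected sequence, for illustration only. It assumes an event is already latched, so the interrupt fires as soon as the register write enables HPIE; poll mode and the D3cold check are left out. All names here (struct sim_ctrl, sim_isr(), sim_hw_write(), sim_write_cmd()) are hypothetical and the HPIE value is a local copy as I recall it from pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLTCTL_HPIE 0x0020 /* assumed value, check pci_regs.h */

struct sim_ctrl {
	uint16_t hw_slot_ctrl; /* pretend Slot Control register */
	uint16_t slot_ctrl;    /* cached copy checked by the ISR */
	bool event_pending;    /* a hotplug event is already latched */
	int handled;           /* interrupts the ISR acted on */
	int ignored;           /* interrupts dropped with IRQ_NONE */
};

/* ISR: bail out if the cached value says interrupts are masked. */
static void sim_isr(struct sim_ctrl *c)
{
	if (!(c->slot_ctrl & SLTCTL_HPIE)) {
		c->ignored++;
		return;
	}
	c->event_pending = false;
	c->handled++;
}

/* Register write: enabling HPIE with an event latched fires immediately. */
static void sim_hw_write(struct sim_ctrl *c, uint16_t value)
{
	c->hw_slot_ctrl = value;
	if ((value & SLTCTL_HPIE) && c->event_pending)
		sim_isr(c);
}

/* cache_first = false models updating the cached copy only after the write. */
static void sim_write_cmd(struct sim_ctrl *c, uint16_t value, bool cache_first)
{
	if (cache_first)
		c->slot_ctrl = value;
	sim_hw_write(c, value);
	c->slot_ctrl = value;
}
```

In this model, updating the cached copy after the register write leaves the first interrupt ignored (slot_ctrl is still 0 when the ISR runs), whereas updating it before the write lets the ISR handle the event.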

RE: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk

2018-11-09 Thread Shameerali Kolothum Thodi

Hi Robin,

> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 18 October 2018 16:27
> To: Robin Murphy ; lorenzo.pieral...@arm.com;
> jean-philippe.bruc...@arm.com
> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> neil.m.lee...@gmail.com; pa...@codeaurora.org; will.dea...@arm.com;
> rruig...@codeaurora.org; Linuxarm ; linux-
> a...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org
> Subject: RE: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum
> 162001800 quirk
 
[...]

> 
> > > > +static const struct smmu_pmu_erratum_wa smmu_pmu_wa[] = {
> > > > +   {
> > > > +   .match_type = se_match_acpi_oem,
> > > > +   .id = hisi_162001800_oem_info,
> > > > +   .desc_str = "HiSilicon erratum 162001800",
> > > > +   .enable = hisi_erratum_evcntr_rdonly,
> > > > +   },
> > > > +};
> > > > +
> > >
> > > There's an awful lot of raw ACPI internals splashed about here -
> > > couldn't at least some of it be abstracted behind the IORT code? In
> > > fact, can't IORT just set all this stuff up in advance like it does for
> > > SMMUs?
> >
> > Hmmm.. Sorry, not clear to me. You mean to say associate the IORT node
> > with platform device and retrieve it in driver just like smmu does for
> > "model" checks? Not sure that works here if that’s what the above meant.
> >
> > > >   #define to_smmu_pmu(p) (container_of(p, struct smmu_pmu, pmu))
> > > >
> > > >   #define SMMU_PMU_EVENT_ATTR_EXTRACTOR(_name, _config, _start,
> > > _end)\
> > > > @@ -224,15 +271,20 @@ static void smmu_pmu_set_period(struct
> > > smmu_pmu *smmu_pmu,
> > > > u32 idx = hwc->idx;
> > > > u64 new;
> > > >
> > > > -   /*
> > > > -* We limit the max period to half the max counter value of the
> > > counter
> > > > -* size, so that even in the case of extreme interrupt latency 
> > > > the
> > > > -* counter will (hopefully) not wrap past its initial value.
> > > > -*/
> > > > -   new = smmu_pmu->counter_mask >> 1;
> > > > +   if (smmu_pmu->options & SMMU_PMU_OPT_EVCNTR_RDONLY) {
> > > > +   new = smmu_pmu_counter_get_value(smmu_pmu, idx);
> > >
> > > Something's clearly missing, because if this happens to start at 0, the
> > > current overflow handling code cannot possibly give the correct count.
> > > Much as I hate the reset-to-half-period idiom for being impossible to
> > > make sense of, it does make various aspects appear a lot simpler than
> > > they really are. Wait, maybe that's yet another reason to hate it...
> >
> > > Yes, if the counter starts at 0 and an overflow happens, it cannot
> > > give the correct count, unlike the reset-to-half-period logic. Since
> > > this is a 64-bit counter, we just hope that it won't happen that often.
> 
> [...]
> 
> > > > +static void smmu_pmu_enable_errata(struct smmu_pmu *smmu_pmu,
> > > > +   enum smmu_pmu_erratum_match_type type,
> > > > +   se_match_fn_t match_fn,
> > > > +   void *arg)
> > > > +{
> > > > +   const struct smmu_pmu_erratum_wa *wa = smmu_pmu_wa;
> > > > +
> > > > +   for (; wa->desc_str; wa++) {
> > > > +   if (wa->match_type != type)
> > > > +   continue;
> > > > +
> > > > +   if (match_fn(wa, arg)) {
> > > > +   if (wa->enable) {
> > > > +   wa->enable(smmu_pmu);
> > > > +   dev_info(smmu_pmu->dev,
> > > > +   "Enabling workaround for %s\n",
> > > > +wa->desc_str);
> > > > +   }
> > >
> > > Just how many kinds of broken are we expecting here? Is this lifted from
> > > the arm64 cpufeature framework, because it seems like absolute overkill
> > > for a simple PMU driver which in all reality is only ever going to
> > > wiggle a few flags in

RE: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk

2018-10-18 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 18 October 2018 14:34
> To: Robin Murphy ; lorenzo.pieral...@arm.com;
> jean-philippe.bruc...@arm.com
> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> neil.m.lee...@gmail.com; pa...@codeaurora.org; will.dea...@arm.com;
> rruig...@codeaurora.org; Linuxarm ; linux-
> ker...@vger.kernel.org; linux-a...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org
> Subject: RE: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum
> 162001800 quirk
> 
> Hi Robin,
> 
> > -Original Message-
> > From: Robin Murphy [mailto:robin.mur...@arm.com]
> > Sent: 18 October 2018 12:44
> > To: Shameerali Kolothum Thodi ;
> > lorenzo.pieral...@arm.com; jean-philippe.bruc...@arm.com
> > Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> > ; John Garry ;
> > pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> > linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > ker...@lists.infradead.org; Linuxarm ;
> > neil.m.lee...@gmail.com
> > Subject: Re: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum
> > 162001800 quirk

[...]

> > > +static const struct smmu_pmu_erratum_wa smmu_pmu_wa[] = {
> > > + {
> > > + .match_type = se_match_acpi_oem,
> > > + .id = hisi_162001800_oem_info,
> > > + .desc_str = "HiSilicon erratum 162001800",
> > > + .enable = hisi_erratum_evcntr_rdonly,
> > > + },
> > > +};
> > > +
> >
> > There's an awful lot of raw ACPI internals splashed about here -
> > couldn't at least some of it be abstracted behind the IORT code? In
> > fact, can't IORT just set all this stuff up in advance like it does for
> > SMMUs?
> 
> Hmm, sorry, that is not clear to me. Do you mean associating the IORT node
> with the platform device and retrieving it in the driver, just as the SMMU
> driver does for its "model" checks? I am not sure that works here, if that
> is what you meant.
> 
> > >   #define to_smmu_pmu(p) (container_of(p, struct smmu_pmu, pmu))
> > >
> > >   #define SMMU_PMU_EVENT_ATTR_EXTRACTOR(_name, _config, _start, _end)\
> > > @@ -224,15 +271,20 @@ static void smmu_pmu_set_period(struct smmu_pmu *smmu_pmu,
> > >   u32 idx = hwc->idx;
> > >   u64 new;
> > >
> > > - /*
> > > -  * We limit the max period to half the max counter value of the counter
> > > -  * size, so that even in the case of extreme interrupt latency the
> > > -  * counter will (hopefully) not wrap past its initial value.
> > > -  */
> > > - new = smmu_pmu->counter_mask >> 1;
> > > + if (smmu_pmu->options & SMMU_PMU_OPT_EVCNTR_RDONLY) {
> > > + new = smmu_pmu_counter_get_value(smmu_pmu, idx);
> >
> > Something's clearly missing, because if this happens to start at 0, the
> > current overflow handling code cannot possibly give the correct count.
> > Much as I hate the reset-to-half-period idiom for being impossible to
> > make sense of, it does make various aspects appear a lot simpler than
> > they really are. Wait, maybe that's yet another reason to hate it...
> 
> Yes, if the counter starts at 0 and then overflows, it cannot give the
> correct count, unlike the reset-to-half-period logic. Since this is a
> 64-bit counter, the hope is that this will not happen often.
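
The two baseline choices being compared here can be put side by side. This
is a minimal sketch under stated assumptions: the register is simulated in
memory, the else branch is inferred from the removed comment rather than
shown in the quoted hunk, and the names (sim_counter, sim_counter_write,
sim_set_period) are hypothetical stand-ins, not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one event counter and its software state. */
struct sim_counter {
	uint64_t hw_value;   /* simulated SMMU_PMCG_EVCNTRn contents */
	uint64_t mask;       /* all-ones up to the counter width */
	bool read_only;      /* models the erratum: writes are ignored */
	uint64_t prev_count; /* baseline remembered for the next delta */
};

/* A write that the simulated hardware drops when the register is read only. */
static void sim_counter_write(struct sim_counter *c, uint64_t value)
{
	if (!c->read_only)
		c->hw_value = value & c->mask;
}

/* Pick the baseline for the next measurement period. */
static void sim_set_period(struct sim_counter *c)
{
	uint64_t new_base;

	if (c->read_only) {
		/* Cannot program the counter: adopt whatever it holds now. */
		new_base = c->hw_value;
	} else {
		/* Start at half range to leave headroom for interrupt latency. */
		new_base = c->mask >> 1;
		sim_counter_write(c, new_base);
	}
	c->prev_count = new_base;
}
```

In the read-only case the headroom before the next overflow depends entirely
on where the counter happens to be, which is why the half-range guarantee is
lost.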

[...]

> > > +static void smmu_pmu_enable_errata(struct smmu_pmu *smmu_pmu,
> > > + enum smmu_pmu_erratum_match_type type,
> > > + se_match_fn_t match_fn,
> > > + void *arg)
> > > +{
> > > + const struct smmu_pmu_erratum_wa *wa = smmu_pmu_wa;
> > > +
> > > + for (; wa->desc_str; wa++) {
> > > + if (wa->match_type != type)
> > > + continue;
> > > +
> > > + if (match_fn(wa, arg)) {
> > > + if (wa->enable) {
> > > + wa->enable(smmu_pmu);
> > > + dev_info(smmu_pmu->dev,
> > > + "Enabling workaround for %s\n",
> > > +  wa->desc_str);
> > > + }
> >
> > Just how many kinds of broken are we expecting here? Is this lifted from
> > the arm64 cpufeature framework, because it seems like

RE: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk

2018-10-18 Thread Shameerali Kolothum Thodi

Hi Robin,

> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 18 October 2018 12:44
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com; jean-philippe.bruc...@arm.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v4 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum
> 162001800 quirk
> 
> On 16/10/18 13:49, Shameer Kolothum wrote:
> > HiSilicon erratum 162001800 describes the limitation of
> > SMMUv3 PMCG implementation on HiSilicon Hip08 platforms.
> >
> > On these platforms, the PMCG event counter registers
> > (SMMU_PMCG_EVCNTRn) are read only and as a result it is
> > not possible to set the initial counter period value on
> > event monitor start.
> 
> How the... oh well, never mind :(
> 
> > To work around this, the current value of the counter is
> > read and is used for delta calculations. This increases
> > the possibility of reporting incorrect values if counter
> > overflow happens and counter passes the initial value.
> >
> > OEM information from ACPI header is used to identify the
> > affected hardware platform.
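
For illustration, OEM-based identification of this kind could look roughly
like the sketch below. It is a user-space approximation, not the patch's
match function: the struct and function names are hypothetical, and the only
facts taken from the patch are the space-padded "HISI  " and "HIP08   "
strings, the revision 0, and the sentinel-terminated array. The field widths
6 and 8 are those of ACPI_OEM_ID_SIZE and ACPI_OEM_TABLE_ID_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OEM_ID_SIZE       6 /* same width as ACPI_OEM_ID_SIZE */
#define OEM_TABLE_ID_SIZE 8 /* same width as ACPI_OEM_TABLE_ID_SIZE */

/* Fixed-width fields as found in a table header (not NUL terminated). */
struct table_header_ids {
	char oem_id[OEM_ID_SIZE];
	char oem_table_id[OEM_TABLE_ID_SIZE];
	uint32_t oem_revision;
};

/* Match entry; one extra byte so string literals keep their terminator. */
struct oem_info {
	char oem_id[OEM_ID_SIZE + 1];
	char oem_table_id[OEM_TABLE_ID_SIZE + 1];
	uint32_t oem_revision;
};

static const struct oem_info hip08_info[] = {
	/* Trailing spaces pad each field to its full width. */
	{ .oem_id = "HISI  ", .oem_table_id = "HIP08   ", .oem_revision = 0 },
	{ .oem_id = "" }, /* sentinel: empty OEM ID ends the list */
};

/* Return true if the header matches any entry before the sentinel. */
static bool oem_info_matches(const struct oem_info *list,
			     const struct table_header_ids *hdr)
{
	for (; list->oem_id[0] != '\0'; list++) {
		if (!memcmp(list->oem_id, hdr->oem_id, OEM_ID_SIZE) &&
		    !memcmp(list->oem_table_id, hdr->oem_table_id,
			    OEM_TABLE_ID_SIZE) &&
		    list->oem_revision == hdr->oem_revision)
			return true;
	}
	return false;
}
```

Because the comparison covers the full field width, a table ID of "HIP08"
without its three padding spaces would not match, which is the reason for the
trailing-spaces note in the patch.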
> 
> I'm guessing they don't implement anything useful for SMMU_PMCG_ID_REGS?
> (notwithstanding the known chicken-and-egg problem with how to interpret
> those)

Your guess is right :(
 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >   drivers/perf/arm_smmuv3_pmu.c | 137 +++---
> >   1 file changed, 130 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > index d927ef8..519545e 100644
> > --- a/drivers/perf/arm_smmuv3_pmu.c
> > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > @@ -96,6 +96,8 @@
> >
> >   #define SMMU_PA_SHIFT   12
> >
> > +#define SMMU_PMU_OPT_EVCNTR_RDONLY (1 << 0)
> > +
> >   static int cpuhp_state_num;
> >
> >   struct smmu_pmu {
> > @@ -111,10 +113,55 @@ struct smmu_pmu {
> > struct device *dev;
> > void __iomem *reg_base;
> > void __iomem *reloc_base;
> > +   u32 options;
> > u64 counter_present_mask;
> > u64 counter_mask;
> >   };
> >
> > +struct erratum_acpi_oem_info {
> > +   char oem_id[ACPI_OEM_ID_SIZE + 1];
> > +   char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1];
> > +   u32 oem_revision;
> > +};
> > +
> > +static struct erratum_acpi_oem_info hisi_162001800_oem_info[] = {
> > +   /*
> > +* Note that trailing spaces are required to properly match
> > +* the OEM table information.
> > +*/
> > +   {
> > +   .oem_id = "HISI  ",
> > +   .oem_table_id   = "HIP08   ",
> > +   .oem_revision   = 0,
> > +   },
> > +   { /* Sentinel indicating the end of the OEM array */ },
> > +};
> > +
> > +enum smmu_pmu_erratum_match_type {
> > +   se_match_acpi_oem,
> > +};
> > +
> > +void hisi_erratum_evcntr_rdonly(struct smmu_pmu *smmu_pmu)
> > +{
> > +   smmu_pmu->options |= SMMU_PMU_OPT_EVCNTR_RDONLY;
> > +}
> > +
> > +struct smmu_pmu_erratum_wa {
> > +   enum smmu_pmu_erratum_match_type match_type;
> > +   const void *id; /* Indicate the Erratum ID */
> > +   const char *desc_str;
> > +   void (*enable)(struct smmu_pmu *smmu_pmu);
> > +};
> > +
> > +static const struct smmu_pmu_erratum_wa smmu_pmu_wa[] = {
> > +   {
> > +   .match_type = se_match_acpi_oem,
> > +   .id = hisi_162001800_oem_info,
> > +   .desc_str = "HiSilicon erratum 162001800",
> > +   .enable = hisi_erratum_evcntr_rdonly,
> > +   },
> > +};
> > +
> 
> There's an awful lot of raw ACPI internals splashed about here -
> couldn't at least some of it be abstracted behind the IORT code? In
> fact, can't IORT just set all this stuff up in advance like it does for
> SMMUs?

Hmm, sorry, that is not clear to me. Do you mean associating the IORT node
with the platform device and retrieving it in the driver, just as the SMMU
driver does for its "model" checks? I am not sure that works here, if that
is what you meant.
 
> >   #define to_smmu_pmu(p) (container_of(p, struct smmu_pmu, pmu))
> >
> >   #define SMMU_PMU_EVENT_ATTR_EXTRACTOR(_name, _config, _start, _end)

RE: [PATCH v4 2/4] perf: add arm64 smmuv3 pmu driver

2018-10-18 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: kbuild test robot [mailto:l...@intel.com]
> Sent: 17 October 2018 22:53
> To: Shameerali Kolothum Thodi 
> Cc: kbuild-...@01.org; lorenzo.pieral...@arm.com; robin.mur...@arm.com;
> jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> mark.rutl...@arm.com; Guohanjun (Hanjun Guo) ;
> John Garry ; pa...@codeaurora.org;
> vkil...@codeaurora.org; rruig...@codeaurora.org; linux-
> a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v4 2/4] perf: add arm64 smmuv3 pmu driver
> 
> Hi Neil,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linux-sof-driver/master]
> [also build test ERROR on v4.19-rc8 next-20181017]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Shameer-Kolothum/arm64-SMMUv3-PMU-driver-with-IORT-support/20181017-063949
> base:   https://github.com/thesofproject/linux master
> config: xtensa-allyesconfig (attached as .config)
> compiler: xtensa-linux-gcc (GCC) 8.1.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=8.1.0 make.cross ARCH=xtensa
> 
> All errors (new ones prefixed by >>):
> 
>In file included from include/linux/kernel.h:11,
> from include/linux/list.h:9,
> from include/linux/resource_ext.h:17,
> from include/linux/acpi.h:26,
> from drivers//perf/arm_smmuv3_pmu.c:37:
>drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_counter_set_value':
>include/linux/bitops.h:7:24: warning: left shift count >= width of type [-Wshift-count-overflow]
> #define BIT(nr)   (1UL << (nr))
>^~
>drivers//perf/arm_smmuv3_pmu.c:145:31: note: in expansion of macro 'BIT'
>  if (smmu_pmu->counter_mask & BIT(32))
>   ^~~
>drivers//perf/arm_smmuv3_pmu.c:146:3: error: implicit declaration of function 'writeq'; did you mean 'writel'? [-Werror=implicit-function-declaration]
>   writeq(value, smmu_pmu->reloc_base + SMMU_PMCG_EVCNTR(idx, 8));
>   ^~
>   writel
>In file included from include/linux/kernel.h:11,
> from include/linux/list.h:9,
> from include/linux/resource_ext.h:17,
> from include/linux/acpi.h:26,
> from drivers//perf/arm_smmuv3_pmu.c:37:
>drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_counter_get_value':
>include/linux/bitops.h:7:24: warning: left shift count >= width of type [-Wshift-count-overflow]
> #define BIT(nr)   (1UL << (nr))
>^~
>drivers//perf/arm_smmuv3_pmu.c:155:31: note: in expansion of macro 'BIT'
>  if (smmu_pmu->counter_mask & BIT(32))
>   ^~~
>drivers//perf/arm_smmuv3_pmu.c:156:11: error: implicit declaration of function 'readq'; did you mean 'readl'? [-Werror=implicit-function-declaration]
>   value = readq(smmu_pmu->reloc_base + SMMU_PMCG_EVCNTR(idx, 8));
>   ^
>   readl

Right. This is again linked to the COMPILE_TEST option added in this version
of the series. It looks like these functions depend on the architecture
(CONFIG_64BIT). I will take care of this in the next revision.
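
For context, the two problems in the log are independent of each other. The
sketch below shows, in user-space C, the general shape of a split 64-bit
access and of a 64-bit-safe bit mask. It is an illustration only: the names
(BIT64, sim_readq_lo_hi, sim_writeq_lo_hi) are hypothetical, the low word is
assumed to sit at the lower address, and the two halves are not accessed
atomically. The kernel has its own helpers for this situation in
linux/io-64-nonatomic-lo-hi.h, and BIT_ULL() for the mask; whether the next
revision uses them or restricts the driver to 64-bit builds is not decided
here.

```c
#include <stdint.h>

/* 64-bit-safe single-bit mask; (1UL << 32) overflows where long is 32 bits. */
#define BIT64(nr) (UINT64_C(1) << (nr))

/* Read a 64-bit value as two 32-bit accesses, low word first. */
static uint64_t sim_readq_lo_hi(const volatile uint32_t *reg)
{
	uint32_t lo = reg[0];
	uint32_t hi = reg[1];

	return ((uint64_t)hi << 32) | lo;
}

/* Write a 64-bit value as two 32-bit accesses, low word first. */
static void sim_writeq_lo_hi(uint64_t value, volatile uint32_t *reg)
{
	reg[0] = (uint32_t)value;
	reg[1] = (uint32_t)(value >> 32);
}
```

A counter that keeps running between the two halves of a split read can
return a torn value, so this approach needs care on live counters.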

Thanks,
Shameer

>drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_reset':
>drivers//perf/arm_smmuv3_pmu.c:607:2: error: implicit declaration of function 'writeq_relaxed'; did you mean 'writel_relaxed'? [-Werror=implicit-function-declaration]
>  writeq_relaxed(smmu_pmu->counter_present_mask,
>  ^~
>  writel_relaxed
>drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_probe':
> >> drivers//perf/arm_smmuv3_pmu.c:666:15: error: implicit declaration of function 'readq_relaxed'; did you mean 'readl_relaxed'? [-Werror=implicit-function-declaration]
>  ceid_64[0] = readq_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CEID0);
>   ^
>   readl_relaxed
>drivers//perf/arm_smmuv3_pmu.c:687:64: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=]
>  name = devm_kasprintf(>dev, GFP_KERNEL, "sm

RE: [PATCH v4 3/4] perf/smmuv3: Add MSI irq support

2018-10-17 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: kbuild test robot [mailto:l...@intel.com]
> Sent: 17 October 2018 04:36
> To: Shameerali Kolothum Thodi 
> Cc: kbuild-...@01.org; lorenzo.pieral...@arm.com; robin.mur...@arm.com;
> jean-philippe.bruc...@arm.com; will.dea...@arm.com;
> mark.rutl...@arm.com; Guohanjun (Hanjun Guo) ;
> John Garry ; pa...@codeaurora.org;
> vkil...@codeaurora.org; rruig...@codeaurora.org; linux-
> a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v4 3/4] perf/smmuv3: Add MSI irq support
> 
> Hi Shameer,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linux-sof-driver/master]
> [also build test ERROR on v4.19-rc8 next-20181016]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Shameer-Kolothum/arm64-SMMUv3-PMU-driver-with-IORT-support/20181017-063949
> base:   https://github.com/thesofproject/linux master
> config: sh-allmodconfig (attached as .config)
> compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=sh
> 
> All error/warnings (new ones prefixed by >>):
> 
>In file included from include/linux/kernel.h:11:0,
> from include/linux/list.h:9,
> from include/linux/resource_ext.h:17,
> from include/linux/acpi.h:26,
> from drivers//perf/arm_smmuv3_pmu.c:37:
>drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_counter_set_value':
>include/linux/bitops.h:7:24: warning: left shift count >= width of type [-Wshift-count-overflow]
> #define BIT(nr)   (1UL << (nr))
>^
>drivers//perf/arm_smmuv3_pmu.c:152:31: note: in expansion of macro 'BIT'
>  if (smmu_pmu->counter_mask & BIT(32))
>   ^~~
>drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_counter_get_value':
>include/linux/bitops.h:7:24: warning: left shift count >= width of type [-Wshift-count-overflow]
> #define BIT(nr)   (1UL << (nr))
>^
>drivers//perf/arm_smmuv3_pmu.c:162:31: note: in expansion of macro 'BIT'
>  if (smmu_pmu->counter_mask & BIT(32))
>   ^~~
>drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_free_msis':
> >> drivers//perf/arm_smmuv3_pmu.c:601:2: error: implicit declaration of function 'platform_msi_domain_free_irqs'; did you mean 'platform_get_device_id'? [-Werror=implicit-function-declaration]
>  platform_msi_domain_free_irqs(dev);
>  ^

OK. This is probably because of the COMPILE_TEST added in patch #2; this
patch has a dependency on PCI/PCI_MSI. I will remove that in the next
revision.

Thanks,
Shameer

>  platform_get_device_id
>drivers//perf/arm_smmuv3_pmu.c: In function 'smmu_pmu_setup_msi':
> >> drivers//perf/arm_smmuv3_pmu.c:632:8: error: implicit declaration of function 'platform_msi_domain_alloc_irqs'; did you mean 'platform_device_alloc'? [-Werror=implicit-function-declaration]
>  ret = platform_msi_domain_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg);
>^~
>platform_device_alloc
>In file included from include/linux/list.h:9:0,
> from include/linux/resource_ext.h:17,
> from include/linux/acpi.h:26,
> from drivers//perf/arm_smmuv3_pmu.c:37:
>include/linux/msi.h:114:38: error: 'struct device' has no member named 'msi_list'
> #define dev_to_msi_list(dev)  (&(dev)->msi_list)
>  ^
>include/linux/kernel.h:961:26: note: in definition of macro 'container_of'
>  void *__mptr = (void *)(ptr); \
>  ^~~
>include/linux/list.h:377:2: note: in expansion of macro 'list_entry'
>  list_entry((ptr)->next, type, member)
>  ^~
>include/linux/msi.h:116:2: note: in expansion of macro 'list_first_entry'
>  list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
>  ^~~~
>include/linux/msi.h:116:19: note: in expansion of macro 'dev_to_msi_list'
>  list_first_entry(dev_to_msi_list((dev)), struct msi_desc, list)
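
Note on the two -Wshift-count-overflow warnings in the report: on a 32-bit target such as sh4, unsigned long is 32 bits wide, so BIT(32), which expands to (1UL << 32), shifts by the full width of the type. A minimal userspace sketch of one way around this, assuming the counter mask is held in a 64-bit field, is to build the bit from a 64-bit constant (the kernel provides BIT_ULL() for this purpose). The helper names bit64() and counter_mask_has_bit32() are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's BIT_ULL(). */
static inline uint64_t bit64(unsigned int nr)
{
	assert(nr < 64);            /* shifting by >= 64 would be undefined */
	return UINT64_C(1) << nr;   /* 64-bit constant: safe when long is 32 bits */
}

/* Assumption: the counter mask is stored as a 64-bit value. */
static inline bool counter_mask_has_bit32(uint64_t counter_mask)
{
	return (counter_mask & bit64(32)) != 0;
}
```

With a 64-bit constant the test behaves the same on 32-bit and 64-bit builds, which is what a COMPILE_TEST build on another architecture would exercise.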

RE: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver

2018-10-11 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 11 October 2018 12:26
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver
> 
> Hi Shameer,
> 
> One more thing...
> 
> On 21/09/18 16:08, Shameer Kolothum wrote:
> [...]
> > +static int smmu_pmu_probe(struct platform_device *pdev)
> > +{
> > +   struct smmu_pmu *smmu_pmu;
> > +   struct resource *res_0, *res_1;
> > +   u32 cfgr, reg_size;
> > +   u64 ceid_64[2];
> > +   int irq, err;
> > +   char *name;
> > +   struct device *dev = &pdev->dev;
> > +
> > +   smmu_pmu = devm_kzalloc(dev, sizeof(*smmu_pmu), GFP_KERNEL);
> > +   if (!smmu_pmu)
> > +   return -ENOMEM;
> > +
> > +   smmu_pmu->dev = dev;
> > +
> > +   platform_set_drvdata(pdev, smmu_pmu);
> > +   smmu_pmu->pmu = (struct pmu) {
> > +           .task_ctx_nr    = perf_invalid_context,
> > +           .pmu_enable     = smmu_pmu_enable,
> > +           .pmu_disable    = smmu_pmu_disable,
> > +           .event_init     = smmu_pmu_event_init,
> > +           .add            = smmu_pmu_event_add,
> > +           .del            = smmu_pmu_event_del,
> > +           .start          = smmu_pmu_event_start,
> > +           .stop           = smmu_pmu_event_stop,
> > +           .read           = smmu_pmu_event_read,
> > +           .attr_groups    = smmu_pmu_attr_grps,
> > +   };
> > +
> > +   res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +   smmu_pmu->reg_base = devm_ioremap_resource(dev, res_0);
> 
> We still need to solve the resource-claiming issue when one (or both) of
> the PMCG pages belongs to the parent device's register space. I recall
> we chucked a few nascent ideas about before; did anyone manage to come
> up with anything concrete?

Right. We had an early version of an evaluation board that had this issue,
but it has been fixed in an updated revision, so it is not a priority for now.

I agree that this is an issue, as the spec doesn’t forbid using the parent SMMU
register space, and it does not look easy to solve either. The initial idea was
to set up the PMCG as a child dev, but that didn’t help.

I had an off-list discussion with Lorenzo on this, but nothing concrete came of it.

Lorenzo, 

Please update if you have any new ideas/thoughts on this.

Thanks,
Shameer

> Robin.

RE: [PATCH v3 1/3] acpi: arm64: add iort support for PMCG

2018-10-05 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 04 October 2018 18:35
> To: Lorenzo Pieralisi ; Shameerali Kolothum Thodi
> 
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v3 1/3] acpi: arm64: add iort support for PMCG
> 
> On 04/10/18 17:43, Lorenzo Pieralisi wrote:
> > On Fri, Sep 21, 2018 at 04:08:01PM +0100, Shameer Kolothum wrote:
> >> From: Neil Leeder 
> >>
> >> Add support for the SMMU Performance Monitor Counter Group
> >> information from ACPI. This is in preparation for its use
> >> in the SMMUv3 PMU driver.
> >>
> >> Signed-off-by: Neil Leeder 
> >> Signed-off-by: Hanjun Guo 
> >> Signed-off-by: Shameer Kolothum
> 
> >> ---
> >>   drivers/acpi/arm64/iort.c | 78
> +++
> >>   1 file changed, 66 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> >> index 08f26db..b979c86 100644
> >> --- a/drivers/acpi/arm64/iort.c
> >> +++ b/drivers/acpi/arm64/iort.c
> >> @@ -356,7 +356,8 @@ static struct acpi_iort_node *iort_node_get_id(struct acpi_iort_node *node,
> >>if (map->flags & ACPI_IORT_ID_SINGLE_MAPPING) {
> >>if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT ||
> >>node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX ||
> >> -  node->type == ACPI_IORT_NODE_SMMU_V3) {
> >> +  node->type == ACPI_IORT_NODE_SMMU_V3 ||
> >> +  node->type == ACPI_IORT_NODE_PMCG) {
> >>*id_out = map->output_base;
> >>return parent;
> >>}
> >> @@ -394,6 +395,8 @@ static int iort_get_id_mapping_index(struct acpi_iort_node *node)
> >>}
> >>
> >>return smmu->id_mapping_index;
> >> +  case ACPI_IORT_NODE_PMCG:
> >> +  return 0;
> >>default:
> >>return -EINVAL;
> >>}
> >> @@ -1309,6 +1312,50 @@ static bool __init arm_smmu_is_coherent(struct acpi_iort_node *node)
> >>return smmu->flags & ACPI_IORT_SMMU_COHERENT_WALK;
> >>   }
> >>
> >> +static void __init arm_smmu_common_dma_configure(struct device *dev,
> >> +  enum dev_dma_attr attr)
> >> +{
> >> +  /* We expect the dma masks to be equivalent for all SMMUs set-ups */
> >> +  dev->dma_mask = &dev->coherent_dma_mask;
> >> +
> >> +  /* Configure DMA for the page table walker */
> >> +  acpi_dma_configure(dev, attr);
> >> +}
> >
> > It looks like we can't get rid of this acpi_dma_configure() call
> > given that the platform device we create has no ACPI companion
> > (and I am not looking forward to fabricating one to make the
> > code homogeneous :)).
> 
> Yeah, given that this is essentially only for SMMUs, the alternatives
> all end up looking like too much bother to be worthwhile.
> 
> > Still, having two methods per IORT node type (dev_is_coherent() and
> > dev_dma_configure()) does not make much sense, we can merge it into one
> > I think.
> 
> Good point - looks the attr from dev_is_coherent is only ever passed
> through dev_dma_configure, so we may as well just have per-SMMU-type
> dev_dma_configure methods which retrieve their own relevant coherency
> directly. FWIW, on v2 I was tempted to suggest just wrapping the DMA
> setup in "if (node->type != ACPI_IORT_NODE_PMCG)..." rather than messing
> with more callbacks, but that clearly wouldn't fit well with the local
> style here.

Right, attr is only ever passed to dev_dma_configure. I will merge the two
callbacks as suggested in the next revision.

Thanks,
Shameer
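
For readers following the callback discussion, here is a minimal userspace sketch of what merging the two per-node-type methods could look like: one dma_configure callback per node type that looks up its own coherency, instead of a separate dev_is_coherent() whose result is only passed through. All types and names below are simplified, hypothetical stand-ins, not the kernel's IORT structures, and the assumption that a PMCG needs no DMA setup is made only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum node_type { NODE_SMMU_V3, NODE_PMCG };
enum dma_attr { DMA_ATTR_NONE, DMA_ATTR_NON_COHERENT, DMA_ATTR_COHERENT };

struct node { enum node_type type; bool coherent_walk; };
struct device { enum dma_attr dma_attr; };

/* A single callback per node type; it retrieves coherency by itself. */
struct dev_config {
	void (*dma_configure)(struct device *dev, const struct node *node);
};

static void smmu_v3_dma_configure(struct device *dev, const struct node *node)
{
	dev->dma_attr = node->coherent_walk ? DMA_ATTR_COHERENT
					    : DMA_ATTR_NON_COHERENT;
}

static const struct dev_config smmu_v3_cfg = {
	.dma_configure = smmu_v3_dma_configure,
};

/* Assumption for this sketch: a PMCG needs no DMA setup, so no callback. */
static const struct dev_config pmcg_cfg = { .dma_configure = NULL };

static const struct dev_config *get_dev_config(const struct node *node)
{
	switch (node->type) {
	case NODE_SMMU_V3:
		return &smmu_v3_cfg;
	case NODE_PMCG:
		return &pmcg_cfg;
	}
	return NULL;
}

static void configure_device(struct device *dev, const struct node *node)
{
	const struct dev_config *cfg = get_dev_config(node);

	assert(cfg != NULL);
	if (cfg->dma_configure)          /* skip types that have no DMA setup */
		cfg->dma_configure(dev, node);
}
```

Leaving the callback NULL for a node type avoids an explicit "if (node->type != ...)" test in the common path, which is closer to the table-driven style discussed above.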

> Robin.
> 
> >
> > Thanks,
> > Lorenzo
> >
> >> +static int __init arm_smmu_v3_pmcg_count_resources(struct acpi_iort_node *node)
> >> +{
> >> +  struct acpi_iort_pmcg *pmcg;
> >> +
> >> +  /* Retrieve PMCG specific data */
> >> +  pmcg = (struct acpi_iort_pmcg *)node->node_data;
> >> +
> >> +  /*
> >> +   * There are always 2 memory resources.
> >> +   * If the overflo

RE: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver

2018-10-03 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 03 October 2018 11:37
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver
> 
> On 21/09/18 16:08, Shameer Kolothum wrote:
> [...]
> > +
> > +   err = cpuhp_state_add_instance_nocalls(cpuhp_state_num,
> > +                                          &smmu_pmu->node);
> 
> In theory a hotplug event could happen as soon as the instance is
> registered...
> 
> > +   if (err) {
> > +           dev_err(dev, "Error %d registering hotplug, PMU @%pa\n",
> > +                   err, &res_0->start);
> > +           return err;
> > +   }
> > +
> > +   /* Pick one CPU to be the preferred one to use */
> > +   smmu_pmu->on_cpu = get_cpu();
> 
> ...so this looks too late, i.e. a race here can result in a bogus call
> to perf_pmu_migrate_context() with an uninitialised pmu.

Thanks Robin. I will reorder them so that on_cpu is set before the hotplug
instance is registered.

Shameer
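
The ordering concern can be illustrated with a small userspace simulation. It is only a sketch under stated assumptions: register_hotplug(), offline_cpu() and struct pmu_instance are hypothetical stand-ins for the kernel's hotplug instance registration and offline callback, and the only point shown is that on_cpu must hold a valid value before the instance becomes visible to hotplug callbacks.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_UNSET (-1)

struct pmu_instance {
	int on_cpu;       /* CPU chosen to service this PMU */
	int migrations;   /* number of simulated context migrations */
};

/* Simulated registration: after this, offline_cpu() may run at any time. */
static struct pmu_instance *registered;

static int register_hotplug(struct pmu_instance *pmu)
{
	registered = pmu;
	return 0;
}

/* Simulated "CPU going offline" callback. */
static int offline_cpu(struct pmu_instance *pmu, int cpu, int target)
{
	assert(pmu->on_cpu != CPU_UNSET);   /* must be set before registration */
	if (cpu != pmu->on_cpu)
		return 0;                   /* not our CPU, nothing to migrate */
	pmu->on_cpu = target;
	pmu->migrations++;
	return 0;
}

static int probe(struct pmu_instance *pmu, int current_cpu)
{
	pmu->migrations = 0;
	pmu->on_cpu = current_cpu;      /* 1. pick the preferred CPU first */
	return register_hotplug(pmu);   /* 2. only then expose to hotplug */
}
```

If the two steps in probe() were swapped, a callback arriving between them would see on_cpu unset, which is the window described in the review comment.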

> Robin. 
> > +   WARN_ON(irq_set_affinity(smmu_pmu->irq, cpumask_of(smmu_pmu->on_cpu)));
> > +
> > +   err = perf_pmu_register(&smmu_pmu->pmu, name, -1);
> > +   if (err) {
> > +           dev_err(dev, "Error %d registering PMU @%pa\n",
> > +                   err, &res_0->start);
> > +           goto out_unregister;
> > +   }
> > +   }
> > +
> > +   put_cpu();

RE: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver

2018-10-03 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 02 October 2018 17:35
> To: Jean-Philippe Brucker ; Shameerali
> Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> neil.m.lee...@gmail.com; pa...@codeaurora.org; John Garry
> ; will.dea...@arm.com; rruig...@codeaurora.org;
> Linuxarm ; linux-a...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Guohanjun (Hanjun Guo)
> ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver
> 
> On 02/10/18 17:19, Jean-Philippe Brucker wrote:
> > On 02/10/2018 15:11, Jean-Philippe Brucker wrote:
> >>> + cfgr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CFGR);
> >
> > Something I missed previously: when SMMU_PMCG_CFGR.SID_FILTER_TYPE is 1,
> > filtering for all counters is configured by SMMU_PMCG_SMR0 and
> > SMMU_PMCG_EVTYPER0 (instead of having one separate filter per counter).
> 
> Oh, I hadn't even noticed it had that mode as well...

Thanks Jean. I missed that completely.
 
> > In that mode with your patch, if the user applies a filter to the first
> > event in the list passed to perf, it will be applied to all events.
> > Filter applied on any subsequent event will be ignored. Could we make
> > this more explicit? Maybe in the probe print that the PMCG is
> > global-filtering, and when attempting to apply a filter to something
> > else than EVCNTR0, return an error?
> 
> FWIW filtering is always per-counter-group on the SMMUv2 PMU, and it's
> actually pretty straightforward to cope with - pmu->add() just needs to
> reject the event if one with an incompatible configuration is already
> scheduled, so perf core handles it much like having more events than
> counters, by rotating the mutually-exclusive sets.

I will take a look and address this in the next version.

Thanks,
Shameer
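
A minimal userspace sketch of the approach Robin describes, where add() rejects an event whose filter is incompatible with one already scheduled when the PMCG has a single global filter. Everything here is hypothetical and simplified: struct pmcg, struct filter, MAX_COUNTERS and the choice of -EAGAIN as the rejection code are assumptions for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COUNTERS 4   /* hypothetical; real hardware reports its own count */

struct filter {
	bool enable;         /* filter_enable */
	bool span;           /* filter_span */
	uint32_t stream_id;  /* filter_stream_id */
};

struct pmcg {
	bool global_filter;  /* models SID_FILTER_TYPE == 1 */
	int nr_active;
	struct filter active[MAX_COUNTERS];
};

static bool filters_equal(const struct filter *a, const struct filter *b)
{
	if (a->enable != b->enable)
		return false;
	if (!a->enable)
		return true;     /* both disabled: span and id are irrelevant */
	return a->span == b->span && a->stream_id == b->stream_id;
}

/* Returns 0 on success, or -EAGAIN if no counter is free or the filter
 * conflicts with an event that is already scheduled. */
static int pmcg_add_event(struct pmcg *p, const struct filter *f)
{
	assert(p != NULL && f != NULL);

	if (p->nr_active >= MAX_COUNTERS)
		return -EAGAIN;
	/* In global mode every active event shares one filter, so comparing
	 * against the first active event is enough. */
	if (p->global_filter && p->nr_active > 0 &&
	    !filters_equal(&p->active[0], f))
		return -EAGAIN;

	p->active[p->nr_active++] = *f;
	return 0;
}
```

Rejecting the event, rather than silently ignoring its filter, lets the caller treat the conflict like running out of counters and try the event again in a later scheduling round.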

RE: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver

2018-10-03 Thread Shameerali Kolothum Thodi

Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: 02 October 2018 15:11
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com; robin.mur...@arm.com
> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> neil.m.lee...@gmail.com; pa...@codeaurora.org; John Garry
> ; will.dea...@arm.com; rruig...@codeaurora.org;
> Linuxarm ; linux-kernel@vger.kernel.org; linux-
> a...@vger.kernel.org; Guohanjun (Hanjun Guo) ;
> linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH v3 2/3] perf: add arm64 smmuv3 pmu driver
> 
> Hi Shameer,
> 
> I have a few comments below, mostly naive since I don't know anything
> about perf drivers.

Thanks for taking a look at this.

> On 21/09/2018 16:08, Shameer Kolothum wrote:
> > From: Neil Leeder 
> >
> > Adds a new driver to support the SMMUv3 PMU and add it into the
> > perf events framework.
> >
> > Each SMMU node may have multiple PMUs associated with it, each of
> > which may support different events.
> >
> > SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
> > <phys_addr_page> is the physical page address of the SMMU PMCG.
> > For example, the PMCG at 0xff88840000 is named smmuv3_pmcg_ff88840
> >
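
The naming rule above can be sketched in a few lines of userspace C. This assumes 4 KiB pages (a shift of 12) and a full PMCG base address of 0xff88840000 for the example; pmcg_name() and PAGE_SHIFT_4K are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT_4K 12   /* assumption: 4 KiB pages */

/* Writes "smmuv3_pmcg_<page address in hex>" into buf and returns the
 * value returned by snprintf(). */
static int pmcg_name(char *buf, size_t len, uint64_t phys_addr)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "smmuv3_pmcg_%" PRIx64,
			phys_addr >> PAGE_SHIFT_4K);
}
```

For example, calling pmcg_name(buf, sizeof(buf), 0xff88840000ULL) fills buf with smmuv3_pmcg_ff88840, matching the device name used in the perf stat example.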
> > Filtering by stream id is done by specifying filtering parameters
> > with the event. options are:
> >filter_enable- 0 = no filtering, 1 = filtering enabled
> >filter_span  - 0 = exact match, 1 = pattern match
> >filter_stream_id - pattern to filter against
> > Further filtering information is available in the SMMU documentation.
> >
> > Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
> >filter_span=1,filter_stream_id=0x42/ -a pwd
> > Applies filter pattern 0x42 to transaction events.
> >
> > SMMU events are not attributable to a CPU, so task mode and sampling
> > are not supported.
> >
> > Signed-off-by: Neil Leeder 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  drivers/perf/Kconfig  |   9 +
> >  drivers/perf/Makefile |   1 +
> >  drivers/perf/arm_smmuv3_pmu.c | 736
> ++
> >  3 files changed, 746 insertions(+)
> >  create mode 100644 drivers/perf/arm_smmuv3_pmu.c
> >
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 08ebaf7..34969dd 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -52,6 +52,15 @@ config ARM_PMU_ACPI
> > depends on ARM_PMU && ACPI
> > def_bool y
> >
> > +config ARM_SMMU_V3_PMU
> > +bool "ARM SMMUv3 Performance Monitors {Extension}"
> 
> Why the curly braces? I didn't find that notation in other Kconfig files

Hmm. That's probably because I just copied a suggestion from a previous
review. I will double-check and correct it.

> > +depends on ARM64 && ACPI && ARM_SMMU_V3
> > +  help
> > +  Provides support for the SMMU version 3 performance monitor unit (PMU)
> > +  on ARM-based systems.
> > +  Adds the SMMU PMU into the perf events subsystem for
> > +  monitoring SMMU performance events.
> > +
> >  config ARM_DSU_PMU
> > tristate "ARM DynamIQ Shared Unit (DSU) PMU"
> > depends on ARM64
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index b3902bd..f10a932 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> [...]
> > +/*
> > + * This driver adds support for perf events to use the Performance
> > + * Monitor Counter Groups (PMCG) associated with an SMMUv3 node
> > + * to monitor that node.
> > + *
> > + * SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
> > + * <phys_addr_page> is the physical page address of the SMMU PMCG.
> > + * For example, the PMCG at 0xff88840000 is named smmuv3_pmcg_ff88840
> > +
> > + * Filtering by stream id is done by specifying filtering parameters
> > + * with the event. options are:
> > + *   filter_enable- 0 = no filtering, 1 = filtering enabled
> > + *   filter_span  - 0 = exact match, 1 = pattern match
> > + *   filter_stream_id - pattern to filter against
> > + * Further filtering information is available in the SMMU documentation.
> > + *
> > + * Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
> > + *   filter_span=1,filter_stream_id=0x42/ -a pwd
> 
> I'm curious, why is pwd used as example? Wouldn't something like netperf
> be a more realistic workload?

Agree. That’s a more relevant wo

RE: [PATCH v2 3/4] perf: add arm64 smmuv3 pmu driver

2018-09-12 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 11 September 2018 11:25
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v2 3/4] perf: add arm64 smmuv3 pmu driver
> 
> On 10/09/18 17:37, Shameerali Kolothum Thodi wrote:
> [...]
> >>> @@ -0,0 +1,838 @@
> >>> +// SPDX-License-Identifier: GPL-2.0+
> >>> +/* Copyright (c) 2017 The Linux Foundation. All rights reserved.
> >>> + *
> >>> + * This program is free software; you can redistribute it and/or modify
> >>> + * it under the terms of the GNU General Public License version 2 and
> >>> + * only version 2 as published by the Free Software Foundation.
> >>> + *
> >>> + * This program is distributed in the hope that it will be useful,
> >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>> + * GNU General Public License for more details.
> >>
> >> You don't really need to add the license text as well as SPDX. Except for the
> >> fact that in this case they don't match - which is it?
> >
> > Right. I will stick to SPDX-License-Identifier: GPL-2.0+
> 
> My question there is about the "+" - the license of the original patch
> was GPL-2.0, and I'm not sure about the legitimacy of quietly changing
> it to 2.0-or-later, especially without any visible agreement from
> previous contributors.

Ah. To avoid complications, I will change it to SPDX-License-Identifier: GPL-2.0.

> [...]
> >> Also, how relevant is it going to be for future DT support? We don't really
> >> want too many artificial dependencies on the way ACPI support happens to
> >> currently be implemented.
> >
> > Sorry, it's not clear to me what is proposed here as far as naming the PMU
> > is concerned. Please see below as well.
> 
> Here I mean whether pdev->id is meaningful for OF platform devices in
> the same way as for IORT devices in terms of uniqueness - it may well
> be, but if it isn't then we should find a better alternative.

Ok. Thanks for clarifying this.
 
> >>> +out:
> >>> + kfree(temp);
> >>> + return ret;
> >>> +}
> >>> +
> >>> +
> >>> +static char *smmu_pmu_assign_name(struct smmu_pmu *pmu) {
> >>> + unsigned long id;
> >>> + struct device *smmu, *dev = pmu->dev;
> >>> + char *s_name = NULL, *p_name = NULL;
> >>> +
> >>> + smmu = iort_find_pmcg_ref_smmu(dev);
> >>> + if (smmu) {
> >>> + if (!smmu_pmu_get_dev_id(dev_name(smmu), &id))
> >>> + s_name = kasprintf(GFP_KERNEL,
> >>> +   "arm_smmu_v3_%lu", id);
> >>> + }
> >>> +
> >>> + if (!s_name)
> >>> + s_name = kasprintf(GFP_KERNEL, "arm_smmu_v3");
> >>
> >> As I touched on before, I think it's worth generalising this from the
> >> start, and trying to resolve the component reference to a struct device
> >> rather than IORT/SMMU specific internals. However it also occurs to me
> >> that maybe this isn't as important as it first seemed - since the
> >> auto-numbered ID doesn't actually say which PMCG is which, the only way
> >> for the user to actually identify which PMU is the correct one to count
> >> events for a particular endpoint is still to grovel up the base address,
> >> so as long as the PMU name uniquely correlates to the PMCG device, I'm
> >> not sure anything really matters beyond that.
> >
> > So if I understand this correctly,
> >
> > iort_find_pmcg_ref_smmu() should be something like iort_find_pmcg_ref(),
> > which returns the associated struct device for the ref node, and then the
> > PMU is named as:
> >
> > arm_smmu_v3_x_pmcg_y
> > nc_dev_name_x_pmcg_y
> > pci_pmcg_y  (It’s a bit tricky for RC as we will end up with struct pci_bus)
> >
> > (where x and y are auto ids)
> >
> > Please let me know if this is what is proposed here.
> 
> That's more or less what I was angling at, but as mentioned I realise
> it
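
To make the naming scheme under discussion concrete, here is a minimal user-space sketch of how a name of the form arm_smmu_v3_x_pmcg_y could be assembled. It is not the driver code: pmcg_build_name() and its parameters are made up for illustration, and the fallback form without an SMMU id is only an assumption based on the "(if any)" wording in the commit message.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): format a PMU name as
 * "arm_smmu_v3_<x>_pmcg_<y>". When no associated SMMU id is known,
 * fall back to "arm_smmu_v3_pmcg_<y>" (assumed fallback form).
 * Returns the value returned by snprintf().
 */
static int pmcg_build_name(char *buf, size_t len, int have_smmu_id,
			   unsigned long smmu_id, unsigned long pmcg_id)
{
	if (have_smmu_id)
		return snprintf(buf, len, "arm_smmu_v3_%lu_pmcg_%lu",
				smmu_id, pmcg_id);

	return snprintf(buf, len, "arm_smmu_v3_pmcg_%lu", pmcg_id);
}
```

With smmu_id 0 and pmcg_id 6 this yields arm_smmu_v3_0_pmcg_6, the name used in the perf stat example of the v2 commit message. As noted in the review, auto-assigned ids like these do not tell the user which PMCG is which, so the only real requirement is that the name maps uniquely to one PMCG device.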

RE: [PATCH v2 4/4] perf/smmuv3: Add MSI irq support

2018-09-10 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 10 September 2018 12:15
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v2 4/4] perf/smmuv3: Add MSI irq support
> 
> On 2018-07-24 12:45 PM, Shameer Kolothum wrote:
> > This adds support for MSI based counter overflow interrupt.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> >   drivers/perf/arm_smmuv3_pmu.c | 105
> +-
> >   1 file changed, 84 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > index b3dc394..ca69813 100644
> > --- a/drivers/perf/arm_smmuv3_pmu.c
> > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > @@ -94,6 +94,10 @@
> >   #define SMMU_PMCG_IRQ_CFG2  0xE64
> >   #define SMMU_PMCG_IRQ_STATUS 0xE68
> >
> > +/* MSI config fields */
> > +#define MSI_CFG0_ADDR_MASK  GENMASK_ULL(51, 2)
> > +#define MSI_CFG2_MEMATTR_DEVICE_nGnRE   0x1
> > +
> >   #define SMMU_COUNTER_RELOAD BIT(31)
> >   #define SMMU_DEFAULT_FILTER_SEC 0
> >   #define SMMU_DEFAULT_FILTER_SPAN 1
> > @@ -657,14 +661,89 @@ static irqreturn_t smmu_pmu_handle_irq(int irq_num, void *data)
> > return IRQ_HANDLED;
> >   }
> >
> > +static void smmu_pmu_free_msis(void *data)
> > +{
> > +   struct device *dev = data;
> > +
> > +   platform_msi_domain_free_irqs(dev);
> > +}
> > +
> > +static void smmu_pmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
> > +{
> > +   phys_addr_t doorbell;
> > +   struct device *dev = msi_desc_to_dev(desc);
> > +   struct smmu_pmu *pmu = dev_get_drvdata(dev);
> > +
> > +   doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
> > +   doorbell &= MSI_CFG0_ADDR_MASK;
> > +
> > +   writeq_relaxed(doorbell, pmu->reg_base + SMMU_PMCG_IRQ_CFG0);
> > +   writel_relaxed(msg->data, pmu->reg_base + SMMU_PMCG_IRQ_CFG1);
> > +   writel_relaxed(MSI_CFG2_MEMATTR_DEVICE_nGnRE,
> > +   pmu->reg_base + SMMU_PMCG_IRQ_CFG2);
> > +}
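
As a rough illustration of the doorbell computation in the hunk above, here is a user-space sketch rather than the driver code. struct msi_addr and msi_doorbell() are made-up names standing in for the address half of struct msi_msg, and DOORBELL_ADDR_MASK spells out what GENMASK_ULL(51, 2) expands to.

```c
#include <stdint.h>

/* Bits [51:2] set, i.e. the same value as GENMASK_ULL(51, 2). */
#define DOORBELL_ADDR_MASK (((UINT64_C(1) << 52) - 1) & ~UINT64_C(0x3))

/* Hypothetical stand-in for the address fields of struct msi_msg. */
struct msi_addr {
	uint32_t address_lo;
	uint32_t address_hi;
};

/*
 * Join the two 32-bit halves into one 64-bit address, then keep only
 * the bits that the IRQ_CFG0 address field can hold.
 */
static uint64_t msi_doorbell(const struct msi_addr *msg)
{
	uint64_t doorbell = ((uint64_t)msg->address_hi << 32) | msg->address_lo;

	return doorbell & DOORBELL_ADDR_MASK;
}
```

The cast to a 64-bit type before the shift matters: shifting a 32-bit value left by 32 would be undefined behaviour in C.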
> > +
> > +static void smmu_pmu_setup_msi(struct smmu_pmu *pmu)
> > +{
> > +   struct msi_desc *desc;
> > +   struct device *dev = pmu->dev;
> > +   int ret;
> > +
> > +   /* Clear MSI address reg */
> > +   writeq_relaxed(0, pmu->reg_base + SMMU_PMCG_IRQ_CFG0);
> > +
> > +   /* MSI supported or not */
> > +   if (!(readl(pmu->reg_base + SMMU_PMCG_CFGR) & SMMU_PMCG_CFGR_MSI))
> > +   return;
> > +
> > +   ret = platform_msi_domain_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg);
> > +   if (ret) {
> > +   dev_warn(dev, "failed to allocate MSIs\n");
> > +   return;
> > +   }
> > +
> > +   desc = first_msi_entry(dev);
> > +   if (desc)
> > +   pmu->irq = desc->irq;
> > +
> > +   /* Add callback to free MSIs on teardown */
> > +   devm_add_action(dev, smmu_pmu_free_msis, dev);
> > +}
> > +
> > +static int smmu_pmu_setup_irq(struct smmu_pmu *pmu)
> > +{
> > +   int irq, ret = -ENXIO;
> > +
> > +   smmu_pmu_setup_msi(pmu);
> > +
> > +   irq = pmu->irq;
> > +   if (irq)
> > +   ret = devm_request_irq(pmu->dev, irq, smmu_pmu_handle_irq,
> > +  IRQF_NOBALANCING | IRQF_SHARED | IRQF_NO_THREAD,
> > +  "smmu-v3-pmu", pmu);
> > +   return ret;
> > +}
> > +
> >   static int smmu_pmu_reset(struct smmu_pmu *smmu_pmu)
> >   {
> > +   int ret;
> > +
> > /* Disable counter and interrupt */
> > writeq(smmu_pmu->counter_present_mask,
> > smmu_pmu->reg_base + SMMU_PMCG_CNTENCLR0);
> > writeq(smmu_pmu->counter_present_mask,
> > smmu_pmu->reg_base + SMMU_PMCG_INTENCLR0);
> >
> > +   ret = smmu_pmu_setup_irq(smmu_pmu);
> 
> Why are we moving this out of probe? We may perform a reset more than
> once (e.g. if we get round to system PM support), at which point this
> looks logically wrong.

I didn’t consid
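
The concern raised above, that reset may run more than once while IRQ setup should run once, can be modelled with a small user-space sketch. Everything here is hypothetical (struct pmu_model and the model_* functions are not from the patch), and the order of calls inside model_probe() is incidental; the only point is that one-time setup sits in the probe path and the reset path stays repeatable.

```c
#include <stdbool.h>

/* Hypothetical model of driver state; none of these names are from the patch. */
struct pmu_model {
	bool irq_requested;   /* set once, when the IRQ is set up */
	int irq_setup_calls;  /* how many times IRQ setup ran */
	int reset_calls;      /* how many times the hardware was reset */
};

/* One-time work: allocate the MSI and request the interrupt. */
static void model_setup_irq(struct pmu_model *p)
{
	p->irq_setup_calls++;
	p->irq_requested = true;
}

/* Repeatable work: only quiesces counters and interrupt enables. */
static void model_reset(struct pmu_model *p)
{
	p->reset_calls++;
}

/* Probe performs the one-time setup and the initial reset. */
static void model_probe(struct pmu_model *p)
{
	model_setup_irq(p);
	model_reset(p);
}

/* A later resume (for example with system PM) only needs the reset. */
static void model_resume(struct pmu_model *p)
{
	model_reset(p);
}
```

After one probe and two resumes the model has run IRQ setup once and reset three times, which is the behaviour the review comment asks for.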

RE: [PATCH v2 3/4] perf: add arm64 smmuv3 pmu driver

2018-09-10 Thread Shameerali Kolothum Thodi

Hi Robin,

> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 10 September 2018 12:02
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v2 3/4] perf: add arm64 smmuv3 pmu driver
> 
> [ note: for some reason I decided to review this from the bottom up,
>so it probably makes no sense unless you read it backwards ]
> 
> On 2018-07-24 12:45 PM, Shameer Kolothum wrote:
> > From: Neil Leeder 
> >
> > Adds a new driver to support the SMMU v3 PMU and add it into the perf
> > events framework.
> >
> > Each SMMU node may have multiple PMUs associated with it, each of
> > which may support different events.
> >
> > SMMUv3 PMCG devices are named as arm_smmu_v3_x_pmcg_y where x denotes
> > the associated smmuv3 dev id (if any) and y denotes the pmu dev id.
> >
> > Filtering by stream id is done by specifying filtering parameters with
> > the event. options are:
> > filter_enable    - 0 = no filtering, 1 = filtering enabled
> > filter_span      - 0 = exact match, 1 = pattern match
> > filter_stream_id - pattern to filter against
> > Further filtering information is available in the SMMU documentation.
> >
> > Example: perf stat -e arm_smmu_v3_0_pmcg_6/transaction,filter_enable=1,
> > filter_span=1,filter_stream_id=0x42/ -a pwd
> > Applies filter pattern 0x42 to transaction events.
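
The three filter parameters in the example above have to be carried to the driver in a perf config word. The sketch below shows one way that packing could look. The bit layout (stream id in bits [31:0], span in bit 32, enable in bit 33) is an assumption made for illustration and is not taken from this patch, and pack_filter() and the accessor names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed, illustrative layout of a 64-bit config word:
 *   bits [31:0]  filter_stream_id
 *   bit  32      filter_span
 *   bit  33      filter_enable
 */
static uint64_t pack_filter(unsigned int enable, unsigned int span,
			    uint32_t stream_id)
{
	return ((uint64_t)(enable & 1) << 33) |
	       ((uint64_t)(span & 1) << 32) |
	       (uint64_t)stream_id;
}

/* Accessors that undo the packing above. */
static uint32_t get_filter_stream_id(uint64_t cfg)
{
	return (uint32_t)cfg;
}

static unsigned int get_filter_span(uint64_t cfg)
{
	return (unsigned int)((cfg >> 32) & 1);
}

static unsigned int get_filter_enable(uint64_t cfg)
{
	return (unsigned int)((cfg >> 33) & 1);
}
```

Under that assumed layout, filter_enable=1, filter_span=1, filter_stream_id=0x42 packs to 0x300000042.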
> >
> > SMMU events are not attributable to a CPU, so task mode and sampling
> > are not supported.
> >
> > Signed-off-by: Neil Leeder 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >   drivers/perf/Kconfig  |   9 +
> >   drivers/perf/Makefile |   1 +
> >   drivers/perf/arm_smmuv3_pmu.c | 838
> ++
> >   3 files changed, 848 insertions(+)
> >   create mode 100644 drivers/perf/arm_smmuv3_pmu.c
> >
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 08ebaf7..0b9cc1a 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -52,6 +52,15 @@ config ARM_PMU_ACPI
> > depends on ARM_PMU && ACPI
> > def_bool y
> >
> > +config ARM_SMMUV3_PMU
> > +bool "ARM SMMUv3 PMU"
> 
> Nit: I'd be inclined to use "Performance Monitors {Extension}" or "PMCG"
> in user-facing text, since "PMU" is not the architectural terminology in this
> particular case.

Ok.
 
> > +depends on ARM64 && ACPI
> > +  help
> > +  Provides support for the SMMU version 3 performance monitor unit (PMU)
> > +  on ARM-based systems.
> > +  Adds the SMMU PMU into the perf events subsystem for
> > +  monitoring SMMU performance events.
> > +
> >   config ARM_DSU_PMU
> > tristate "ARM DynamIQ Shared Unit (DSU) PMU"
> > depends on ARM64
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index b3902bd..b3ae48d 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> > @@ -4,6 +4,7 @@ obj-$(CONFIG_ARM_CCN) += arm-ccn.o
> >   obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
> >   obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
> >   obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
> > +obj-$(CONFIG_ARM_SMMUV3_PMU) += arm_smmuv3_pmu.o
> >   obj-$(CONFIG_HISI_PMU) += hisilicon/
> >   obj-$(CONFIG_QCOM_L2_PMU) += qcom_l2_pmu.o
> >   obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
> > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > new file mode 100644
> > index 000..b3dc394
> > --- /dev/null
> > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > @@ -0,0 +1,838 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/* Copyright (c) 2017 The Linux Foundation. All rights reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 and
> > + * only version 2 as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A P

RE: [PATCH v2 1/4] acpi: arm64: add iort support for PMCG

2018-09-10 Thread Shameerali Kolothum Thodi

Hi Robin,

Thanks for going through this series.

> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: 07 September 2018 16:36
> To: Shameerali Kolothum Thodi ;
> lorenzo.pieral...@arm.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; Guohanjun (Hanjun Guo)
> ; John Garry ;
> pa...@codeaurora.org; vkil...@codeaurora.org; rruig...@codeaurora.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; Linuxarm ;
> neil.m.lee...@gmail.com
> Subject: Re: [PATCH v2 1/4] acpi: arm64: add iort support for PMCG
> 
> On 24/07/18 12:45, Shameer Kolothum wrote:
> > From: Neil Leeder 
> >
> > Add support for the SMMU Performance Monitor Counter Group
> > information from ACPI. This is in preparation for its use
> > in the SMMU v3 PMU driver.
> >
> > Signed-off-by: Neil Leeder 
> > Signed-off-by: Hanjun Guo 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >   drivers/acpi/arm64/iort.c | 95
> +--
> >   1 file changed, 83 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> > index 7a3a541..ac4d0d6 100644
> > --- a/drivers/acpi/arm64/iort.c
> > +++ b/drivers/acpi/arm64/iort.c
> > @@ -356,7 +356,8 @@ static struct acpi_iort_node *iort_node_get_id(struct acpi_iort_node *node,
> > if (map->flags & ACPI_IORT_ID_SINGLE_MAPPING) {
> > if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT ||
> > node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX ||
> > -   node->type == ACPI_IORT_NODE_SMMU_V3) {
> > +   node->type == ACPI_IORT_NODE_SMMU_V3 ||
> > +   node->type == ACPI_IORT_NODE_PMCG) {
> > *id_out = map->output_base;
> > return parent;
> > }
> > @@ -394,6 +395,8 @@ static int iort_get_id_mapping_index(struct acpi_iort_node *node)
> > }
> >
> > return smmu->id_mapping_index;
> > +   case ACPI_IORT_NODE_PMCG:
> > +   return 0;
> 
> Why do we need a PMCG case here? AIUI this whole get_id_mapping_index
> business is only relevant to SMMUv3 nodes where we have some need to
> disambiguate the difference between the SMMU's own IDs and
> StreamID-to-DeviceID mappings within the same table. PMCGs simply have
> zero or one single ID mappings so should be equivalent to most named
> components (other than their mappings pointing straight to the ITS).

IIRC this is required for the iort_set_device_domain() function as
otherwise, dev_set_msi_domain() won't be called for PMCGs with MSI
support.
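
To spell out the dependency I mean, here is a small user-space model, not the IORT code itself. The enum, struct node_model and both functions are hypothetical names; the sketch assumes that iort_set_device_domain() gives up early when the mapping index lookup returns a negative value.

```c
#include <errno.h>

/* Hypothetical, reduced set of IORT node types. */
enum node_type { NODE_NAMED_COMPONENT, NODE_SMMU_V3, NODE_PMCG };

struct node_model {
	enum node_type type;
	int smmu_id_mapping_index;	/* only meaningful for NODE_SMMU_V3 */
};

/* Mirrors the shape of the switch in the hunk above. */
static int model_get_id_mapping_index(const struct node_model *n)
{
	switch (n->type) {
	case NODE_SMMU_V3:
		return n->smmu_id_mapping_index;
	case NODE_PMCG:
		return 0;	/* PMCGs have at most one ID mapping */
	default:
		return -EINVAL;
	}
}

/*
 * Returns 1 if the MSI domain lookup would go ahead, 0 if the caller
 * would bail out because no mapping index is available (assumed
 * behaviour of the caller).
 */
static int model_sets_msi_domain(const struct node_model *n)
{
	return model_get_id_mapping_index(n) >= 0;
}
```

Without the PMCG case the lookup falls into the default branch, so under the stated assumption the MSI domain would never be set for a PMCG.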

> > default:
> > return -EINVAL;
> > }
> > @@ -1287,6 +1290,63 @@ static bool __init arm_smmu_is_coherent(struct acpi_iort_node *node)
> > return smmu->flags & ACPI_IORT_SMMU_COHERENT_WALK;
> >   }
> >
> > +static void __init arm_smmu_common_dma_configure(struct device *dev,
> > +   enum dev_dma_attr attr)
> > +{
> > +   /* We expect the dma masks to be equivalent for all SMMUs set-ups */
> > +   dev->dma_mask = &dev->coherent_dma_mask;
> > +
> > +   /* Configure DMA for the page table walker */
> > +   acpi_dma_configure(dev, attr);
> 
> Hmm, I don't think we actually need this call any more, since it should
> now happen later anyway via platform_dma_configure() as the relevant
> SMMU/PMCG driver binds.

This is only applicable to SMMU nodes. As you have noted below, these devices
are from the static table, so I am not sure platform_dma_configure() applies
here. I will double check.
 
> > +}
> > +
> > +static int __init arm_smmu_v3_pmu_count_resources(struct acpi_iort_node *node)
> 
> Can we be consistent with "pmcg" rather than "pmu" within IORT please?

Ok.

> 
> > +{
> > +   struct acpi_iort_pmcg *pmcg;
> > +
> > +   /* Retrieve PMCG specific data */
> > +   pmcg = (struct acpi_iort_pmcg *)node->node_data;
> > +
> > +   /*
> > +* There are always 2 memory resources.
> > +* If the overflow_gsiv is present then add that for a total of 3.
> > +*/
> > +   return pmcg->overflow_gsiv > 0 ? 3 : 2;
> > +}
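
The counting rule in the function above can be checked in isolation with a user-space sketch. struct pmcg_model is a made-up stand-in for struct acpi_iort_pmcg that keeps only the overflow_gsiv field, and pmcg_count_resources() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct acpi_iort_pmcg. */
struct pmcg_model {
	uint32_t overflow_gsiv;	/* 0 means no wired overflow interrupt */
};

/*
 * Two memory resources are always present; a non-zero overflow GSIV
 * adds one interrupt resource, for a total of three.
 */
static int pmcg_count_resources(const struct pmcg_model *pmcg)
{
	return pmcg->overflow_gsiv > 0 ? 3 : 2;
}
```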
> > +
> > +static void __init arm_smmu_v3_pmu_init_resources(struct resource *res,
> > +  struct acpi_iort_node *node)
> > +{
> > +   struct acpi_iort_pmcg *pmcg;
> > +
> > +   /* Retrieve PMCG specific data */
> > +   p

RE: [PATCH v2 0/4] arm64 SMMUv3 PMU driver with IORT support

2018-08-01 Thread Shameerali Kolothum Thodi

Hi Lorenzo/Robin,

Just a gentle ping on this series. This is a v2 of the SMMU PMCG support,
based on Neil Leeder's v1[1].

The main changes are:
- A helper function in IORT to retrieve the associated SMMU info.
- MSI support in the PMU driver.

Please take a look and let me know your thoughts.

Thanks,
Shameer

[1]https://www.spinics.net/lists/arm-kernel/msg598591.html

> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> Shameer Kolothum
> Sent: 24 July 2018 12:45
> To: lorenzo.pieral...@arm.com; robin.mur...@arm.com
> Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
> neil.m.lee...@gmail.com; pa...@codeaurora.org; will.dea...@arm.com;
> rruig...@codeaurora.org; Linuxarm ; linux-
> ker...@vger.kernel.org; linux-a...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org
> Subject: [PATCH v2 0/4] arm64 SMMUv3 PMU driver with IORT support
> 
> This adds a driver for the SMMUv3 PMU into the perf framework.
> It includes an IORT update to support PM Counter Groups.
> 
> This is based on the initial work done by Neil Leeder[1]
> 
> SMMUv3 PMCG devices are named as arm_smmu_v3_x_pmcg_y where x
> denotes the associated smmuv3 dev id(if any) and y denotes the
> pmu dev id.
> 
> Usage example:
> For common arch supported events:
> perf stat -e arm_smmu_v3_0_pmcg_6/transaction,filter_enable=1,
>  filter_span=1,filter_stream_id=0x42/ -a pwd
> 
> For IMP DEF events:
> perf stat -e arm_smmu_v3_0_pmcg_6/event=id/ -a pwd
> 
> Sanity tested on a HiSilicon platform. Further testing on supported
> platforms is very much welcome.
> 
> v1 --> v2
> 
> - Addressed comments from Robin.
> - Added a helper to retrieve the associated smmu dev and named PMUs
>   to make the association visible to user.
> - Added MSI support for overflow irq
> 
> [1]https://www.spinics.net/lists/arm-kernel/msg598591.html
> 
> Neil Leeder (2):
>   acpi: arm64: add iort support for PMCG
>   perf: add arm64 smmuv3 pmu driver
> 
> Shameer Kolothum (2):
>   acpi: arm64: iort helper to find the associated smmu of pmcg node
>   perf/smmuv3: Add MSI irq support
> 
>  drivers/acpi/arm64/iort.c | 179 +++--
>  drivers/perf/Kconfig  |   9 +
>  drivers/perf/Makefile |   1 +
>  drivers/perf/arm_smmuv3_pmu.c | 901
> ++
>  include/linux/acpi_iort.h |   4 +
>  5 files changed, 1063 insertions(+), 31 deletions(-)
>  create mode 100644 drivers/perf/arm_smmuv3_pmu.c
> 
> --
> 2.7.4

RE: [PATCH 6/7] irqchip/gic-v3-its: Honor hypervisor enforced LPI range

2018-06-22 Thread Shameerali Kolothum Thodi

Hi Marc,

> -Original Message-
> From: Marc Zyngier [mailto:marc.zyng...@arm.com]
> Sent: 20 June 2018 14:53
> To: linux-kernel@vger.kernel.org
> Cc: Thomas Gleixner ; Ard Biesheuvel
> ; Shanker Donthineni
> ; Shameerali Kolothum Thodi
> ; MaJun ;
> Laurentiu Tudor ; Lei Zhang
> 
> Subject: [PATCH 6/7] irqchip/gic-v3-its: Honor hypervisor enforced LPI range
> 
> A recent extension to the GIC architecture allows a hypervisor to
> arbitrarily reduce the number of LPIs available to a guest, no
> matter what the GIC says about the valid range of IntIDs.
> 
> Let's factor in this information when computing the number of
> available LPIs

On our D05 board, this limits the LPIs to 2, and MSI IRQ allocation then fails:

[0.00] ITS: Using hypervisor restricted LPI range [2]

[   10.543889] ixgbe 000a:11:00.1: Failed to allocate MSI interrupt, falling 
back to legacy. Error: -12
 
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/irqchip/irq-gic-v3-its.c   | 9 +
>  include/linux/irqchip/arm-gic-v3.h | 1 +
>  2 files changed, 10 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 8c7e8c235faf..903ca1c19553 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1525,8 +1525,17 @@ static int free_lpi_range(u32 base, u32 nr_lpis)
>  static int __init its_lpi_init(u32 id_bits)
>  {
>   u32 lpis = (1UL << id_bits) - 8192;
> + u32 numlpis;
>   int err;
> 
> + numlpis = 1UL << GICD_TYPER_NUM_LPIS(gic_rdists->gicd_typer);
> +
> + if (numlpis > 1 && !WARN_ON(numlpis > lpis)) {
> + lpis = numlpis;
> + pr_info("ITS: Using hypervisor restricted LPI range [%u]\n",
> + lpis);
> + }

I don't have the GICv3 extension doc, but did you intend to check for,

 if (numlpis > 2 && !WARN_ON(numlpis > lpis)) {

as it looks like D05 returns 0 for bits 11-15 and that makes numlpis=2.
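
To spell out the arithmetic, here is a small userspace model of the check. It
is only a sketch: effective_lpis() and its min_numlpis parameter are made up
for illustration, WARN_ON() is modelled as a plain comparison, and any
id_bits value you pass in is just an example, not what D05 reports.

```c
#include <stdint.h>

/* Same field extraction as GICD_TYPER_NUM_LPIS(): bits [15:11], plus one. */
#define TYPER_NUM_LPIS(typer)	((((typer) >> 11) & 0x1f) + 1)

/*
 * Userspace model of the check in its_lpi_init().
 * min_numlpis is the threshold under discussion (1 in the patch,
 * 2 as suggested above).
 */
static uint32_t effective_lpis(uint32_t id_bits, uint32_t typer,
			       uint32_t min_numlpis)
{
	uint32_t lpis = (1UL << id_bits) - 8192;
	uint32_t numlpis = 1UL << TYPER_NUM_LPIS(typer);

	/* A zero field gives numlpis == 2, which passes a "> 1" test. */
	if (numlpis > min_numlpis && !(numlpis > lpis))
		lpis = numlpis;

	return lpis;
}
```

With the field reading as 0, effective_lpis(16, 0, 1) ends up at 2, while
effective_lpis(16, 0, 2) keeps the full range derived from id_bits.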

Please let me know.

Thanks,
Shameer

> +
>   /*
>* Initializing the allocator is just the same as freeing the
>* full range of LPIs.
> diff --git a/include/linux/irqchip/arm-gic-v3.h 
> b/include/linux/irqchip/arm-gic-
> v3.h
> index 396cd99af02f..9d2ea3e907d0 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -73,6 +73,7 @@
>  #define GICD_TYPER_MBIS  (1U << 16)
> 
>  #define GICD_TYPER_ID_BITS(typer)   ((((typer) >> 19) & 0x1f) + 1)
> +#define GICD_TYPER_NUM_LPIS(typer)  ((((typer) >> 11) & 0x1f) + 1)
>  #define GICD_TYPER_IRQS(typer)      ((((typer) & 0x1f) + 1) * 32)
> 
>  #define GICD_IROUTER_SPI_MODE_ONE   (0U << 31)
> --
> 2.17.1

RE: [PATCH v6 0/7] vfio/type1: Add support for valid iova list management

2018-05-25 Thread Shameerali Kolothum Thodi

Hi Alex,

> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, May 24, 2018 7:21 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>
> Cc: eric.au...@redhat.com; pmo...@linux.vnet.ibm.com;
> k...@vger.kernel.org; linux-kernel@vger.kernel.org; iommu@lists.linux-
> foundation.org; Linuxarm <linux...@huawei.com>; John Garry
> <john.ga...@huawei.com>; xuwei (O) <xuw...@huawei.com>; Joerg Roedel
> <j...@8bytes.org>
> Subject: Re: [PATCH v6 0/7] vfio/type1: Add support for valid iova list
> management
> 
> [Cc +Joerg: AMD-Vi observation towards the end]
> 
> On Wed, 18 Apr 2018 12:40:38 +0100
> Shameer Kolothum <shameerali.kolothum.th...@huawei.com> wrote:
> 
> > This series introduces an iova list associated with a vfio
> > iommu. The list is kept updated taking care of iommu apertures,
> > and reserved regions. Also this series adds checks for any conflict
> > with existing dma mappings whenever a new device group is attached to
> > the domain.
> >
> > User-space can retrieve valid iova ranges using VFIO_IOMMU_GET_INFO
> > ioctl capability chains. Any dma map request outside the valid iova
> > range will be rejected.
> 
> Hi Shameer,
> 
> I ran into two minor issues in testing this series, both related to
> mdev usage of type1.  First, in patch 5/7 when we try to validate a dma
> map request:

I must admit I haven't looked into the mdev use case at all, and my impression
was that it would be the same as the others. Thanks for doing these tests.

> > +static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
> > +   dma_addr_t start, dma_addr_t end)
> > +{
> > +   struct list_head *iova = &iommu->iova_list;
> > +   struct vfio_iova *node;
> > +
> > +   list_for_each_entry(node, iova, list) {
> > +   if ((start >= node->start) && (end <= node->end))
> > +   return true;
> > +   }
> > +
> > +   return false;
> > +}
> 
> A container with only an mdev device will have an empty list because it
> has no backing iommu to set ranges or reserved regions, so any dma map
> will fail.  I think this is resolved as follows:
> 
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1100,7 +1100,7 @@ static bool vfio_iommu_iova_dma_valid(struct
> vfio_iommu *iommu,
> return true;
> }
> 
> -   return false;
> +   return list_empty(&iommu->iova_list);
>  }

Ok.
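
Just to confirm my understanding of the intended behaviour, here is a minimal
userspace model. The struct, the array-based list and any range values used
with it are made up for illustration; this is not the vfio code itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the valid iova list. */
struct iova_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Returns true if [start, end] fits inside one of the n valid ranges.
 * An empty list (n == 0) means there is nothing to test against, so the
 * request is accepted, mirroring the list_empty() fallback.
 */
static bool iova_dma_valid(const struct iova_range *ranges, size_t n,
			   uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++) {
		if (start >= ranges[i].start && end <= ranges[i].end)
			return true;
	}

	return n == 0;
}
```

So a container with only an mdev device (empty list) accepts any map request,
while a populated list still rejects requests that fall into a hole.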

> ie. return false only if there was anything to test against.
> 
> The second issue is similar, patch 6/7 adds to VFIO_IOMMU_GET_INFO:
> 
> + ret = vfio_iommu_iova_build_caps(iommu, &caps);
> + if (ret)
> + return ret;
> 
> And build_caps has:
> 
> + list_for_each_entry(iova, &iommu->iova_list, list)
> + iovas++;
> +
> + if (!iovas) {
> + ret = -EINVAL;
> 
> Therefore if the iova list is empty, as for mdevs, the user can no
> longer even call VFIO_IOMMU_GET_INFO on the container, which is a
> regression.  Again, I think the fix is simple:
> 
> @@ -2090,7 +2090,7 @@ static int vfio_iommu_iova_build_caps(struct
> vfio_iommu *iommu,
> iovas++;
> 
> if (!iovas) {
> -   ret = -EINVAL;
> +   ret = 0;
> goto out_unlock;
> }
> 
> ie. build_caps needs to handle lack of an iova_list as a non-error.

Ok.

> Also, I wrote a small unit test to validate the iova list for my
> systems[1].  With the above changes, my Intel test system gives expected
> results:
> 
> # ./vfio-type1-iova-list /sys/bus/mdev/devices/c08db5ed-05d3-4b39-b150-
> 438a18bc698f /sys/bus/pci/devices/:00:1b.0
>  Adding device: c08db5ed-05d3-4b39-b150-438a18bc698f 
> Initial info struct size: 0x18
> No caps
>  Adding device: :00:1b.0 
> Initial info struct size: 0x18
> Requested info struct size: 0x48
> New info struct size: 0x48
> argsz: 0x48, flags: 0x3, cap_offset: 0x18
>   00: 4800  0300  00f0   
>   10: 1800    0100 0100  
>   20: 0200       
>   30:  dffe    f0fe  
>   40:    ff01
> [cap id: 1, version: 1, next: 0x0]
> Found type1 iova range version: 1
>   00:  - fedf
>   01: fef0 - 01ff
> 
> Adding an mdev device to the container results in no iova list, adding
> the physical device updates to the expected set with the MSI range
> excluded.
> 
> I was a little surprised by an AMD system:
> 
> # ./vfio-type1-iova-list /sys/bus/pci/devices/:01:00.0
>  Adding device: :01:00.0 
>

RE: [PATCH 2/2] perf: add arm64 smmuv3 pmu driver

2018-05-03 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: linux-arm-kernel [mailto:linux-arm-kernel-boun...@lists.infradead.org]
> On Behalf Of Agustin Vega-Frias
> Sent: Wednesday, May 02, 2018 3:20 PM
> To: xieyisheng (A) 
> Cc: Mark Rutland ; Mark Langsdorf
> ; Neil Leeder ; Jon
> Masters ; Timur Tabi ; Will
> Deacon ; linux-kernel@vger.kernel.org; Mark Brown
> ; Mark Salter ; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH 2/2] perf: add arm64 smmuv3 pmu driver
> 
> On 2018-04-02 02:37, Yisheng Xie wrote:
> > Hi Neil,
> >
> > On 2018/4/1 13:44, Neil Leeder wrote:
> >> Hi Yisheng Xie,
> >>
> >> On 3/29/2018 03:03 AM, Yisheng Xie wrote:
> >>>
> >>> Hi Neil,
> >>>
> >>> On 2017/8/5 3:59, Neil Leeder wrote:
>  +mem_resource_0 = platform_get_resource(pdev,
> IORESOURCE_MEM,
>  0);
>  +mem_map_0 = devm_ioremap_resource(&pdev->dev,
> mem_resource_0);
>  +
> >>> Can we use devm_ioremap instead? for the reg_base of smmu_pmu is
> >>> IMPLEMENTATION DEFINED. If the reg of smmu_pmu is inside smmu,
> >>> devm_ioremap_resource will fail and return -EBUSY, e.g.:
> >>>
> >>>   smmu reg ranges:0x18000 ~ 0x1801f
> >>>   its smmu_pmu reg ranges:0x180001000 ~ 0x180001fff
> >>>
> >> Just to let you know that I no longer work at Qualcomm and I won't be
> >> able to provide updates to this patchset. I expect that others from my
> >> former team at Qualcomm will pick up ownership.
> >
> > Thanks for this information.
> >
> > hi Agustin and Timur,
> >
> > Is there any new status about this patchset?
> >
> 
> Hi,
> 
> Apologies for the slow response.
> We are having some internal discussions about when/if to do this.
> I expect to have more clarity within a few weeks.
> 
> For what it's worth, let me take the opportunity to outline the approach
> we would like to see for a V2 either developed by us or somebody else
> in the community:
> 
> 1. Rework to comply with the IORT spec changes.
> 
> 2. Rework probing to extract extra information from the IORT table
> about SMMU/device associations.

Thanks for coming back on this. It would be good to address cases where
the PMCG base address is at an IMP DEF offset within the associated
SMMUv3 page address space. As things stand with the v1 PMU driver, the
SMMUv3 driver probe will fail. Please find the discussion here[1].
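
Roughly, the conflict can be pictured with the simplified userspace model
below. claim_region() is a made-up helper that only models the exclusive
region request devm_ioremap_resource() makes before mapping; it is not a
kernel API, and any addresses fed to it are examples rather than the real
register layout.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An MMIO window with inclusive bounds. */
struct region {
	uint64_t start;
	uint64_t end;
};

static bool regions_overlap(struct region a, struct region b)
{
	return a.start <= b.end && b.start <= a.end;
}

/*
 * Hypothetical model of an exclusive region request: the claim is refused
 * with -EBUSY if it overlaps anything that has already been claimed.
 */
static int claim_region(const struct region *claimed, size_t n,
			struct region want)
{
	for (size_t i = 0; i < n; i++) {
		if (regions_overlap(claimed[i], want))
			return -EBUSY;
	}

	return 0;
}
```

Whichever driver claims its window second gets -EBUSY when the PMCG page sits
inside the SMMU's register space, so the PMCG window would need to be mapped
without an exclusive claim on the overlapping range.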

Thanks,
Shameer
[1] https://lkml.org/lkml/2018/1/31/235

>    With this information and some perf user space work I think it's possible
>    to have a single dynamic PMU node and use a similar approach to what is
>    used in the Coresight drivers to pass the device we want to monitor and
>    for the driver to find the PMU/PMCG. E.g.:
>
>    $ lspci
>    0001:00:00.0 PCI bridge: Airgo Networks, Inc. Device 0401
>    0002:00:00.0 PCI bridge: Airgo Networks, Inc. Device 0401
>    0002:01:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>    0003:00:00.0 PCI bridge: Airgo Networks, Inc. Device 0401
>    0003:01:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>
>    # Monitor TLB misses on root complex 2 (no stream filter is applied)
>    perf stat -a -e smmu/tlb_miss,@0002:00:00.0/
>
>    # Monitor TLB misses on a device on root complex 2 (derive the stream number from the RID)
>    perf stat -a -e smmu/tlb_miss,@0002:01:00.0/
> Thanks,
> Agustín
> 
> --
> Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
> Linux Foundation Collaborative Project.

RE: [PATCH v6 4/7] iommu/dma: Move PCI window region reservation back into dma specific path.

2018-04-24 Thread Shameerali Kolothum Thodi

Hi Joerg,

Could you please take a look at this patch?

I have rebased it onto 4.17-rc1 and added Robin's R-by.

This series[1] is now pending on this patch, as without it the series will
break a few ARM platforms[2].

Please take a look and let me know your thoughts.

Thanks,
Shameer

[1] https://lkml.org/lkml/2018/4/18/293
[2] https://lkml.org/lkml/2018/3/14/881


> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: Wednesday, April 18, 2018 12:41 PM
> To: alex.william...@redhat.com; eric.au...@redhat.com;
> pmo...@linux.vnet.ibm.com
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org; iommu@lists.linux-
> foundation.org; Linuxarm <linux...@huawei.com>; John Garry
> <john.ga...@huawei.com>; xuwei (O) <xuw...@huawei.com>; Shameerali
> Kolothum Thodi <shameerali.kolothum.th...@huawei.com>; Joerg Roedel
> <j...@8bytes.org>
> Subject: [PATCH v6 4/7] iommu/dma: Move PCI window region reservation
> back into dma specific path.
> 
> This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI
> window reservation generic")  by moving the PCI window region
> reservation back into the dma specific path so that these regions
> don't get exposed via the IOMMU API interface. With this change,
> the vfio interface will report only iommu specific reserved regions
> to the user space.
> 
> Cc: Joerg Roedel <j...@8bytes.org>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.th...@huawei.com>
> Reviewed-by: Robin Murphy <robin.mur...@arm.com>
> ---
>  drivers/iommu/dma-iommu.c | 54 +++++++++++++++++++++++++-----------------------------
>  1 file changed, 25 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f05f3cf..ddcbbdb 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
>   * @list: Reserved region list from iommu_get_resv_regions()
>   *
>   * IOMMU drivers can use this to implement their .get_resv_regions callback
> - * for general non-IOMMU-specific reservations. Currently, this covers host
> - * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI
> - * based ARM platforms that may require HW MSI reservation.
> + * for general non-IOMMU-specific reservations. Currently, this covers GICv3
> + * ITS region reservation on ACPI based ARM platforms that may require HW MSI
> + * reservation.
>   */
>  void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
>  {
> - struct pci_host_bridge *bridge;
> - struct resource_entry *window;
> -
> - if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) &&
> - iort_iommu_msi_get_resv_regions(dev, list) < 0)
> - return;
> -
> - if (!dev_is_pci(dev))
> - return;
> -
> - bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
> - resource_list_for_each_entry(window, &bridge->windows) {
> - struct iommu_resv_region *region;
> - phys_addr_t start;
> - size_t length;
> -
> - if (resource_type(window->res) != IORESOURCE_MEM)
> - continue;
> 
> - start = window->res->start - window->offset;
> - length = window->res->end - window->res->start + 1;
> - region = iommu_alloc_resv_region(start, length, 0,
> - IOMMU_RESV_RESERVED);
> - if (!region)
> - return;
> + if (!is_of_node(dev->iommu_fwspec->iommu_fwnode))
> + iort_iommu_msi_get_resv_regions(dev, list);
> 
> - list_add_tail(&region->list, list);
> - }
>  }
>  EXPORT_SYMBOL(iommu_dma_get_resv_regions);
> 
> @@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
>   return 0;
>  }
> 
> +static void iova_reserve_pci_windows(struct pci_dev *dev,
> + struct iova_domain *iovad)
> +{
> + struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
> + struct resource_entry *window;
> + unsigned long lo, hi;
> +
> + resource_list_for_each_entry(window, &bridge->windows) {
> + if (resource_type(window->res) != IORESOURCE_MEM)
> + continue;
> +
> + lo = iova_pfn(iovad, window->res->start - window->offset);
> + hi = iova_pfn(iovad, window->res->end - window->offset);
> + reserve_iova(iovad, lo, hi);
> + }
> +}
> +
>  static int iova_reserve_iommu_regions(struct device *dev,
>   struct iommu_domain *domain)
>  {
> @@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   LIST_HEAD(resv_regions);
>   int ret = 0;
> 
> + if (dev_is_pci(dev))
> + iova_reserve_pci_windows(to_pci_dev(dev), iovad);
> +
>   iommu_get_resv_regions(dev, &resv_regions);
>   list_for_each_entry(region, &resv_regions, list) {
>   unsigned long lo, hi;
> --
> 2.7.4
>
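
For readers following the patch above, the arithmetic in iova_reserve_pci_windows() can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct pci_window, struct pfn_range, addr_to_pfn() and reserve_windows() are hypothetical stand-ins invented here, not kernel types or APIs, and the granule shift is passed in explicitly rather than taken from an iova_domain. The sketch skips non-memory windows, subtracts the window offset to get from CPU addresses to bus addresses, and shifts by the granule to obtain the first and last page frame numbers to reserve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a host bridge window (not a kernel type). */
struct pci_window {
	uint64_t cpu_start;	/* first CPU physical address of the window */
	uint64_t cpu_end;	/* last CPU physical address, inclusive */
	uint64_t offset;	/* CPU address minus PCI bus address */
	int is_mem;		/* nonzero for memory windows, zero for I/O */
};

/* Hypothetical result type: an inclusive range of page frame numbers. */
struct pfn_range {
	uint64_t lo;
	uint64_t hi;
};

/* Convert an address to a page frame number for the given granule. */
static uint64_t addr_to_pfn(uint64_t addr, unsigned int granule_shift)
{
	assert(granule_shift < 64);
	return addr >> granule_shift;
}

/*
 * Write one PFN range per memory window into out[], skipping I/O
 * windows, and return the number of ranges written (at most out_cap).
 */
static size_t reserve_windows(const struct pci_window *w, size_t n,
			      unsigned int granule_shift,
			      struct pfn_range *out, size_t out_cap)
{
	size_t used = 0;

	for (size_t i = 0; i < n && used < out_cap; i++) {
		if (!w[i].is_mem)
			continue;

		/* Bus address = CPU address - window offset. */
		out[used].lo = addr_to_pfn(w[i].cpu_start - w[i].offset,
					   granule_shift);
		out[used].hi = addr_to_pfn(w[i].cpu_end - w[i].offset,
					   granule_shift);
		used++;
	}
	return used;
}
```

With a 4 KiB granule (shift of 12), a memory window covering 0x10000000 to 0x1fffffff with a zero offset yields the range 0x10000 to 0x1ffff. Because the ranges are carved out of the DMA allocator only, nothing here would show up in a reserved-region list reported to user space, which is the point the commit message makes.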

