Re: [PATCH 10/10] dmapool: improve scalability of dma_pool_free

2022-05-31 Thread Keith Busch
On Tue, May 31, 2022 at 02:23:44PM -0400, Tony Battersby wrote:
> dma_pool_free() scales poorly when the pool contains many pages because
> pool_find_page() does a linear scan of all allocated pages.  Improve its
> scalability by replacing the linear scan with a red-black tree lookup.
> In big O notation, this improves the algorithm from O(n^2) to O(n * log n).

The improvement should say O(n) to O(log n), right?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 08/10] dmapool: cleanup dma_pool_destroy

2022-05-31 Thread Keith Busch
On Tue, May 31, 2022 at 02:22:21PM -0400, Tony Battersby wrote:
> +static void pool_free_page(struct dma_pool *pool,
> +struct dma_page *page,
> +bool destroying_pool)

'destroying_pool' is always true, so I don't think you need it.


Re: nvme: IO_PAGE_FAULT logged with Intel SSDPEKKF512G8

2022-01-18 Thread Keith Busch
On Tue, Jan 18, 2022 at 03:32:45PM +0100, Paul Menzel wrote:
> On a Dell OptiPlex 5055 with an Intel SSDPEKKF512G8, Linux 5.10.82 reported
> an IO_PAGE_FAULT error. This is the first and only time this has happened.
> 
> $ dmesg --level=err
> [4.194306] nvme :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xc080 flags=0x0050]
> [4.206970] nvme :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xc000 flags=0x0050]
> [7.327820] kfd kfd: VERDE  not supported in kfd
> $ lspci -nn -s 01:00.0
> 01:00.0 Non-Volatile memory controller [0108]: Intel Corporation SSD Pro 7600p/760p/E 6100p Series [8086:f1a6] (rev 03)

I think it's a bug with the iommu implementation. If it causes problems,
you can typically work around it with kernel parameter "iommu=soft".


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Keith Busch
On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> @@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   if (!iod->nents)
>   goto out_free_sg;
>  
> + offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to set offset\n");
> + goto out_free_sg;
> + }
> +
>   if (is_pci_p2pdma_page(sg_page(iod->sg)))
>   nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
>   iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
>   else
>   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
>rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +
> + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
> + goto out_free_sg;
> + }
>   if (!nr_mapped)
>   goto out_free_sg;

Why is this setting being done and undone on each IO? Wouldn't it be
more efficient to set it once during device initialization?

And more importantly, this isn't thread safe: one CPU may be setting the
device's dma alignment mask to 0 while another CPU is expecting it to be
NVME_CTRL_PAGE_SIZE - 1.


Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-01-28 Thread Keith Busch
On Thu, Jan 28, 2021 at 12:15:28PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> > For devices that need to preserve address offset on mapping through
> > swiotlb, this patch adds offset preserving based on page_offset_mask
> > and keeps the offset if the mask is non zero. This is needed for
> > device drivers like NVMe.
> 
> 
> 
> Didn't you send this patch like a month ago and someone pointed
> out that the right fix would be in the NVMe driver?
> 
> Is there an issue with fixing the NVMe driver?

You got it backwards. The initial "fix" used a flag specific to the nvme
driver, and it was pointed out that it should just be the generic
behaviour.


Re: [RFC PATCH 15/15] nvme-pci: Allow mmaping the CMB in userspace

2020-11-09 Thread Keith Busch
On Fri, Nov 06, 2020 at 10:00:36AM -0700, Logan Gunthorpe wrote:
> Allow userspace to obtain CMB memory by mmaping the controller's
> char device. The mmap call allocates and returns a hunk of CMB memory,
> (the offset is ignored) so userspace does not have control over the
> address within the CMB.
> 
> A VMA allocated in this way will only be usable by drivers that set
> FOLL_PCI_P2PDMA when calling GUP. And inter-device support will be
> checked the first time the pages are mapped for DMA.
> 
> Currently this is only supported by O_DIRECT to a PCI NVMe device
> or through the NVMe passthrough IOCTL.

Rather than make this be specific to nvme, could pci p2pdma create an
mmap'able file for any resource registered with it?


Re: [trivial PATCH] treewide: Convert switch/case fallthrough; to break;

2020-09-09 Thread Keith Busch
On Wed, Sep 09, 2020 at 01:06:39PM -0700, Joe Perches wrote:
> diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
> index eea0f453cfb6..8aac5bc60f4c 100644
> --- a/crypto/tcrypt.c
> +++ b/crypto/tcrypt.c
> @@ -2464,7 +2464,7 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
>   test_hash_speed("streebog512", sec,
>   generic_hash_speed_template);
>   if (mode > 300 && mode < 400) break;
> - fallthrough;
> + break;
>   case 399:
>   break;

Just imho, this change makes the preceding 'if' look even more
pointless. Maybe the fallthrough was a deliberate choice? Not that my
opinion matters here as I don't know this module, but it looked a bit
odd to me.


Re: [PATCH v5 6/7] PCI: vmd: Stop overriding dma_map_ops

2020-01-22 Thread Keith Busch
On Tue, Jan 21, 2020 at 06:37:50AM -0700, Jon Derrick wrote:
> Devices on the VMD domain use the VMD endpoint's requester ID and have
> been relying on the VMD endpoint's DMA operations. The problem with this
> was that VMD domain devices would use the VMD endpoint's attributes when
> doing DMA and IOMMU mapping. We can be smarter about this by only using
> the VMD endpoint when mapping and providing the correct child device's
> attributes during DMA operations.
> 
> This patch removes the dma_map_ops redirect.
> 
> Signed-off-by: Jon Derrick 

Looks good.

Reviewed-by: Keith Busch 


Re: [PATCH 4/5] PCI/vmd: Stop overriding dma_map_ops

2019-08-28 Thread Keith Busch
On Wed, Aug 28, 2019 at 07:14:42AM -0700, Christoph Hellwig wrote:
> With a little tweak to the intel-iommu code we should be able to work
> around the VMD mess for the requester IDs without having to create giant
> amounts of boilerplate DMA ops wrapping code.  The other advantage of
> this scheme is that we can respect the real DMA masks for the actual
> devices, and I bet it will only be a matter of time until we'll see the
> first DMA challenged NVMe devices.

This tests out fine on VMD hardware, but it's quite different from the
previous patch. In v1, the original dev was used in iommu_need_mapping(),
but this time it's the vmd device. Is this still using the actual device's
DMA mask then?


> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/intel-iommu.c|  25 ++
>  drivers/pci/controller/Kconfig |   1 -
>  drivers/pci/controller/vmd.c   | 150 -
>  3 files changed, 25 insertions(+), 151 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 12d094d08c0a..aaa35ac73956 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -373,6 +373,23 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
>  static DEFINE_SPINLOCK(device_domain_lock);
>  static LIST_HEAD(device_domain_list);
>  
> +/*
> + * For VMD we need to use the VMD devices for mapping requests instead of the
> + * actual device to get the proper PCIe requester ID.
> + */
> +static inline struct device *vmd_real_dev(struct device *dev)
> +{
> +#if IS_ENABLED(CONFIG_VMD)
> + if (dev_is_pci(dev)) {
> + struct pci_sysdata *sd = to_pci_dev(dev)->bus->sysdata;
> +
> + if (sd->vmd_dev)
> + return sd->vmd_dev;
> + }
> +#endif
> + return dev;
> +}
> +
>  /*
>   * Iterate over elements in device_domain_list and call the specified
>   * callback @fn against each element.
> @@ -3520,6 +3537,7 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page,
>enum dma_data_direction dir,
>unsigned long attrs)
>  {
> + dev = vmd_real_dev(dev);
>   if (iommu_need_mapping(dev))
>   return __intel_map_single(dev, page_to_phys(page) + offset,
>   size, dir, *dev->dma_mask);
> @@ -3530,6 +3548,7 @@ static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr,
>size_t size, enum dma_data_direction dir,
>unsigned long attrs)
>  {
> + dev = vmd_real_dev(dev);
>   if (iommu_need_mapping(dev))
>   return __intel_map_single(dev, phys_addr, size, dir,
>   *dev->dma_mask);
> @@ -3585,6 +3604,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
>size_t size, enum dma_data_direction dir,
>unsigned long attrs)
>  {
> + dev = vmd_real_dev(dev);
>   if (iommu_need_mapping(dev))
>   intel_unmap(dev, dev_addr, size);
>   else
> @@ -3594,6 +3614,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
>  static void intel_unmap_resource(struct device *dev, dma_addr_t dev_addr,
>   size_t size, enum dma_data_direction dir, unsigned long attrs)
>  {
> + dev = vmd_real_dev(dev);
>   if (iommu_need_mapping(dev))
>   intel_unmap(dev, dev_addr, size);
>  }
> @@ -3605,6 +3626,7 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
>   struct page *page = NULL;
>   int order;
>  
> + dev = vmd_real_dev(dev);
>   if (!iommu_need_mapping(dev))
>   return dma_direct_alloc(dev, size, dma_handle, flags, attrs);
>  
> @@ -3641,6 +3663,7 @@ static void intel_free_coherent(struct device *dev, size_t size, void *vaddr,
>   int order;
>   struct page *page = virt_to_page(vaddr);
>  
> + dev = vmd_real_dev(dev);
>   if (!iommu_need_mapping(dev))
>   return dma_direct_free(dev, size, vaddr, dma_handle, attrs);
>  
> @@ -3661,6 +3684,7 @@ static void intel_unmap_sg(struct device *dev, struct scatterlist *sglist,
>   struct scatterlist *sg;
>   int i;
>  
> + dev = vmd_real_dev(dev);
>   if (!iommu_need_mapping(dev))
>   return dma_direct_unmap_sg(dev, sglist, nelems, dir, attrs);
>  
> @@ -3685,6 +3709,7 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele
>   struct intel_iommu *iommu;
>  
>   BUG_ON(dir == DMA_NONE);
> + dev = vmd_real_dev(dev);
>   if (!iommu_need_mapping(dev))
>   return dma_direct_map_sg(dev, sglist, nelems, dir, attrs);
>  
> diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> index fe9f9f13ce11..920546cb84e2 100644
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ 

Re: [PATCH] vmd: Stop overriding dma_map_ops

2019-08-27 Thread Keith Busch
On Tue, Aug 27, 2019 at 11:24:05AM -0700, Derrick, Jonathan wrote:
> On Mon, 2019-08-26 at 17:06 +0200, Christoph Hellwig wrote:
> > With a little tweak to the intel-iommu code we should be able to work
> > around the VMD mess for the requester IDs without having to create giant
> > amounts of boilerplate DMA ops wrapping code.  The other advantage of
> > this scheme is that we can respect the real DMA masks for the actual
> > devices, and I bet it will only be a matter of time until we'll see the
> > first DMA challenged NVMe devices.
> > 
> > The only downside is that we can't offer vmd as a module given that
> > intel-iommu calls into it.  But the driver only has about 700 lines
> > of code, so this should not be a major issue.
> If we're going to remove its ability to be a module, and given its
> small size, could we make this default =y?
> 
> Otherwise we risk breaking platforms that have it enabled with OSVs who
> miss enabling it.

Can we keep this as a module if we stick the remapping struct device
in pci_sysdata instead of going through the vmd driver to get it?

Otherwise, very happy to see this dma wrapping go away.


Re: [PATCH 1/3] MAINTAINERS: Add Jonathan Derrick as VMD maintainer

2017-08-11 Thread Keith Busch
On Mon, Aug 07, 2017 at 01:57:11PM -0600, Jon Derrick wrote:
> Add myself as VMD maintainer
> 
> Signed-off-by: Jon Derrick <jonathan.derr...@intel.com>

Thanks for adding. 

Acked-by: Keith Busch <keith.bu...@intel.com>


> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f66488d..3ec39df 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10090,6 +10090,7 @@ F:drivers/pci/dwc/*imx6*
>  
>  PCI DRIVER FOR INTEL VOLUME MANAGEMENT DEVICE (VMD)
>  M:   Keith Busch <keith.bu...@intel.com>
> +M:   Jonathan Derrick <jonathan.derr...@intel.com>
>  L:   linux-...@vger.kernel.org
>  S:   Supported
>  F:   drivers/pci/host/vmd.c
> -- 
> 2.9.4
> 