Re: [PATCH v4 7/9] iommu/vt-d: Add trace events for domain map/unmap

2019-06-04 Thread Steven Rostedt
On Mon,  3 Jun 2019 09:16:18 +0800
Lu Baolu  wrote:


> +TRACE_EVENT(bounce_unmap_single,
> + TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
> +
> + TP_ARGS(dev, dev_addr, size),
> +
> + TP_STRUCT__entry(
> + __string(dev_name, dev_name(dev))
> + __field(dma_addr_t, dev_addr)
> + __field(size_t, size)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(dev_name, dev_name(dev));
> + __entry->dev_addr = dev_addr;
> + __entry->size = size;
> + ),
> +
> + TP_printk("dev=%s dev_addr=0x%llx size=%zu",
> +   __get_str(dev_name),
> +   (unsigned long long)__entry->dev_addr,
> +   __entry->size)
> +);
> +
> +TRACE_EVENT(bounce_map_sg,
> + TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
> +  dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),
> +
> + TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size),
> +
> + TP_STRUCT__entry(
> + __string(dev_name, dev_name(dev))
> + __field(unsigned int, i)
> + __field(unsigned int, last)
> + __field(dma_addr_t, dev_addr)
> + __field(phys_addr_t, phys_addr)
> + __field(size_t, size)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(dev_name, dev_name(dev));
> + __entry->i = i;
> + __entry->last = nelems - 1;
> + __entry->dev_addr = dev_addr;
> + __entry->phys_addr = phys_addr;
> + __entry->size = size;
> + ),
> +
> + TP_printk("dev=%s elem=%u/%u dev_addr=0x%llx phys_addr=0x%llx size=%zu",
> +   __get_str(dev_name), __entry->i, __entry->last,
> +   (unsigned long long)__entry->dev_addr,
> +   (unsigned long long)__entry->phys_addr,
> +   __entry->size)
> +);
> +
> +TRACE_EVENT(bounce_unmap_sg,
> + TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
> +  dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),
> +
> + TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size),
> +
> + TP_STRUCT__entry(
> + __string(dev_name, dev_name(dev))
> + __field(unsigned int, i)
> + __field(unsigned int, last)
> + __field(dma_addr_t, dev_addr)
> + __field(phys_addr_t, phys_addr)
> + __field(size_t, size)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(dev_name, dev_name(dev));
> + __entry->i = i;
> + __entry->last = nelems - 1;
> + __entry->dev_addr = dev_addr;
> + __entry->phys_addr = phys_addr;
> + __entry->size = size;
> + ),
> +
> + TP_printk("dev=%s elem=%u/%u dev_addr=0x%llx phys_addr=0x%llx size=%zu",
> +   __get_str(dev_name), __entry->i, __entry->last,
> +   (unsigned long long)__entry->dev_addr,
> +   (unsigned long long)__entry->phys_addr,
> +   __entry->size)
> +);

These last two events look identical. Please use
DECLARE_EVENT_CLASS() to describe the event and then DEFINE_EVENT() for
the two events.

Each TRACE_EVENT() can add up to 5k of data/text, whereas a
DEFINE_EVENT() just adds around 250 bytes.

(Note, a TRACE_EVENT() is defined as a
DECLARE_EVENT_CLASS()/DEFINE_EVENT() pair)
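
For example, the conversion for the two sg events could look like the
sketch below (the class name is just illustrative; the fields and the
format string are copied from the patch above):

DECLARE_EVENT_CLASS(bounce_sg_class,
	TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
		 dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),

	TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size),

	TP_STRUCT__entry(
		__string(dev_name, dev_name(dev))
		__field(unsigned int, i)
		__field(unsigned int, last)
		__field(dma_addr_t, dev_addr)
		__field(phys_addr_t, phys_addr)
		__field(size_t, size)
	),

	TP_fast_assign(
		__assign_str(dev_name, dev_name(dev));
		__entry->i = i;
		__entry->last = nelems - 1;
		__entry->dev_addr = dev_addr;
		__entry->phys_addr = phys_addr;
		__entry->size = size;
	),

	TP_printk("dev=%s elem=%u/%u dev_addr=0x%llx phys_addr=0x%llx size=%zu",
		  __get_str(dev_name), __entry->i, __entry->last,
		  (unsigned long long)__entry->dev_addr,
		  (unsigned long long)__entry->phys_addr,
		  __entry->size)
);

/* Each DEFINE_EVENT() then only costs ~250 bytes. */
DEFINE_EVENT(bounce_sg_class, bounce_map_sg,
	TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
		 dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),
	TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size)
);

DEFINE_EVENT(bounce_sg_class, bounce_unmap_sg,
	TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
		 dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),
	TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size)
);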

-- Steve


> +#endif /* _TRACE_INTEL_IOMMU_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>



Re: [PATCH v8 05/29] iommu: Add a timeout parameter for PRQ response

2019-06-04 Thread Jean-Philippe Brucker
On 03/06/2019 23:32, Alex Williamson wrote:
> It doesn't seem to make much sense to include this patch without also
> including "iommu: handle page response timeout".  Was that one lost?
> Dropped?  Lives elsewhere?

The first 7 patches come from my sva/api branch, where I had forgotten
to add the "handle page response timeout" patch. I added it back,
probably after Eric sent this version. But I don't think the patch is
ready for upstream, as we still haven't decided how to proceed with
timeouts. Patches 6 and 7 are for debugging; I don't know whether they
should go upstream.

Thanks,
Jean


Re: [PATCH 07/26] iommu/dma: move the arm64 wrappers to common code

2019-06-04 Thread Jon Hunter


On 04/06/2019 07:05, Christoph Hellwig wrote:
> On Mon, Jun 03, 2019 at 08:47:57PM +0100, Jon Hunter wrote:
>> Since next-20190529 one of our tests for MMC has started failing, where
>> the symptom is that the data written to the MMC does not match the
>> source. Bisecting points to this commit. Unfortunately, I am not able
>> to cleanly revert this on top of -next, but wanted to report this in
>> case you have any ideas.
> 
> Does this fix your problem?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=generic-dma-ops&id=1b961423158caaae49d3900b7c9c37477bbfa9b3

Yes, I can confirm that with this patch on today's -next the issue is no
longer seen, and that reverting it on top of today's -next causes the
problem to occur again. So yes, this fixes my problem.

Thanks!
Jon

-- 
nvpublic


Re: [PATCH v8 04/29] iommu: Add recoverable fault reporting

2019-06-04 Thread Jacob Pan
On Mon, 3 Jun 2019 16:31:45 -0600
Alex Williamson  wrote:

> On Sun, 26 May 2019 18:09:39 +0200
> Eric Auger  wrote:
> 
> > From: Jean-Philippe Brucker 
> > 
> > Some IOMMU hardware features, for example PCI's PRI and Arm SMMU's
> > Stall, enable recoverable I/O page faults. Allow IOMMU drivers to
> > report PRI Page Requests and Stall events through the new fault
> > reporting API. The consumer of the fault can be either an I/O page
> > fault handler in the host, or a guest OS.
> > 
> > Once handled, the fault must be completed by sending a page
> > response back to the IOMMU. Add an iommu_page_response() function
> > to complete a page fault.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/iommu.c | 77 ++-
> >  include/linux/iommu.h | 51 
> >  2 files changed, 127 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 795518445a3a..13b301cfb10f 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -869,7 +869,14 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> >   * @data: private data passed as argument to the handler
> >   *
> >   * When an IOMMU fault event is received, this handler gets called with the
> > - * fault event and data as argument.
> > + * fault event and data as argument. The handler should return 0 on success. If
> > + * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the handler should also
> > + * complete the fault by calling iommu_page_response() with one of the following
> > + * response code:
> > + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> > + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> > + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> > + *   page faults if possible.
> >   *
> >   * Return 0 if the fault handler was installed successfully, or an error.
> >   */
> > @@ -904,6 +911,8 @@ int iommu_register_device_fault_handler(struct device *dev,
> > 	}
> > 	param->fault_param->handler = handler;
> > 	param->fault_param->data = data;
> > +	mutex_init(&param->fault_param->lock);
> > +	INIT_LIST_HEAD(&param->fault_param->faults);
> >  
> >  done_unlock:
> > 	mutex_unlock(&param->lock);
> > @@ -934,6 +943,12 @@ int iommu_unregister_device_fault_handler(struct device *dev)
> > 	if (!param->fault_param)
> > 		goto unlock;
> >  
> > +	/* we cannot unregister handler if there are pending faults */
> > +	if (!list_empty(&param->fault_param->faults)) {
> > +		ret = -EBUSY;
> > +		goto unlock;
> > +	}
> 
> Why?  Attempting to unregister a fault handler suggests the handler
> doesn't care about outstanding faults.  Can't we go ahead and dispatch
> them as failed?  Otherwise we need to be careful that we don't
> introduce an environment where the registered fault handler is blocked
> trying to shutdown and release the device due to a flood of errors.
> Thanks,
> 
My original thinking was that outstanding faults such as PRQs can be
cleared if the handler does not send a PRS within the timeout. This
could be the case with a malicious guest.

But now I think your suggestion makes sense; it is better to clear out
the pending faults immediately. Then the registered fault handler will
not be blocked, and a flood of faults will not be reported outside the
IOMMU after the handler is unregistered.
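
Something along these lines (a rough sketch against the structures in
this patch, not the final code) could replace the -EBUSY check in
iommu_unregister_device_fault_handler():

	struct iommu_fault_event *evt, *tmp;

	/*
	 * Sketch only: instead of refusing to unregister, drop whatever
	 * is still queued.  A complete version would presumably also
	 * send an IOMMU_PAGE_RESP_FAILURE response for each pending
	 * page request before freeing it, per Alex's suggestion.
	 */
	mutex_lock(&param->fault_param->lock);
	list_for_each_entry_safe(evt, tmp, &param->fault_param->faults, list) {
		list_del(&evt->list);
		kfree(evt);
	}
	mutex_unlock(&param->fault_param->lock);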

Jean, would you agree? I guess you are taking care of it in your
sva/api tree now :).

> Alex
> 
> > +
> > 	kfree(param->fault_param);
> > 	param->fault_param = NULL;
> > 	put_device(dev);
> > @@ -958,6 +973,7 @@ EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
> >  int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> >  {
> > 	struct iommu_param *param = dev->iommu_param;
> > +	struct iommu_fault_event *evt_pending;
> > 	struct iommu_fault_param *fparam;
> > 	int ret = 0;
> >  
> > @@ -972,6 +988,20 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> > 		ret = -EINVAL;
> > 		goto done_unlock;
> > 	}
> > +
> > +	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
> > +	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> > +		evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
> > +				      GFP_KERNEL);
> > +		if (!evt_pending) {
> > +			ret = -ENOMEM;
> > +			goto done_unlock;
> > +		}
> > +		mutex_lock(&fparam->lock);
> > +		list_add_tail(&evt_pending->list, &fparam->faults);
> > +		mutex_unlock(&fparam->lock);
> > +	}
> > +
> > 	ret = fparam->handler(evt, fparam->data);
> >  done_unlock:
> > 	mutex_unlock(&param->lock);
> > @@ -1513,6 +1543,51 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> > 	}
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +int iommu_page

Re: [PATCH v8 05/29] iommu: Add a timeout parameter for PRQ response

2019-06-04 Thread Jacob Pan
On Tue, 4 Jun 2019 11:52:18 +0100
Jean-Philippe Brucker  wrote:

> On 03/06/2019 23:32, Alex Williamson wrote:
> > It doesn't seem to make much sense to include this patch without
> > also including "iommu: handle page response timeout".  Was that one
> > lost? Dropped?  Lives elsewhere?  
> 
> The first 7 patches come from my sva/api branch, where I had forgotten
> to add the "handle page response timeout" patch. I added it back,
> probably after Eric sent this version. But I don't think the patch is
> ready for upstream, as we still haven't decided how to proceed with
> timeouts. Patches 6 and 7 are for debugging; I don't know whether they
> should go upstream.
Yeah, we can wait until we all agree on timeouts. It was introduced as
a basic safeguard against unresponsive guests.


Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-04 Thread Auger Eric
Hi Alex,

On 6/4/19 12:31 AM, Alex Williamson wrote:
> On Sun, 26 May 2019 18:10:01 +0200
> Eric Auger  wrote:
> 
>> This patch registers a fault handler which records faults in
>> a circular buffer and then signals an eventfd. This buffer is
>> exposed within the fault region.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v3 -> v4:
>> - move iommu_unregister_device_fault_handler to vfio_pci_release
>> ---
>>  drivers/vfio/pci/vfio_pci.c | 49 +
>>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>>  2 files changed, 50 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index f75f61127277..52094ba8 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -30,6 +30,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "vfio_pci_private.h"
>>  
>> @@ -296,6 +297,46 @@ static const struct vfio_pci_regops vfio_pci_fault_prod_regops = {
>>  	.add_capability = vfio_pci_fault_prod_add_capability,
>>  };
>>  
>> +int vfio_pci_iommu_dev_fault_handler(struct iommu_fault_event *evt, void *data)
>> +{
>> +	struct vfio_pci_device *vdev = (struct vfio_pci_device *) data;
>> +	struct vfio_region_fault_prod *prod_region =
>> +		(struct vfio_region_fault_prod *)vdev->fault_pages;
>> +	struct vfio_region_fault_cons *cons_region =
>> +		(struct vfio_region_fault_cons *)(vdev->fault_pages + 2 * PAGE_SIZE);
>> +	struct iommu_fault *new =
>> +		(struct iommu_fault *)(vdev->fault_pages + prod_region->offset +
>> +				       prod_region->prod * prod_region->entry_size);
>> +	int prod, cons, size;
>> +
>> +	mutex_lock(&vdev->fault_queue_lock);
>> +
>> +	if (!vdev->fault_abi)
>> +		goto unlock;
>> +
>> +	prod = prod_region->prod;
>> +	cons = cons_region->cons;
>> +	size = prod_region->nb_entries;
>> +
>> +	if (CIRC_SPACE(prod, cons, size) < 1)
>> +		goto unlock;
>> +
>> +	*new = evt->fault;
>> +	prod = (prod + 1) % size;
>> +	prod_region->prod = prod;
>> +	mutex_unlock(&vdev->fault_queue_lock);
>> +
>> +	mutex_lock(&vdev->igate);
>> +	if (vdev->dma_fault_trigger)
>> +		eventfd_signal(vdev->dma_fault_trigger, 1);
>> +	mutex_unlock(&vdev->igate);
>> +	return 0;
>> +
>> +unlock:
>> +	mutex_unlock(&vdev->fault_queue_lock);
>> +	return -EINVAL;
>> +}
>> +
>>  static int vfio_pci_init_fault_region(struct vfio_pci_device *vdev)
>>  {
>>  struct vfio_region_fault_prod *header;
>> @@ -328,6 +369,13 @@ static int vfio_pci_init_fault_region(struct vfio_pci_device *vdev)
>>  	header = (struct vfio_region_fault_prod *)vdev->fault_pages;
>>  	header->version = -1;
>>  	header->offset = PAGE_SIZE;
>> +
>> +	ret = iommu_register_device_fault_handler(&vdev->pdev->dev,
>> +						  vfio_pci_iommu_dev_fault_handler,
>> +						  vdev);
>> +	if (ret)
>> +		goto out;
>> +
>>  	return 0;
>>  out:
>>  	kfree(vdev->fault_pages);
>> @@ -570,6 +618,7 @@ static void vfio_pci_release(void *device_data)
>>  	if (!(--vdev->refcnt)) {
>>  		vfio_spapr_pci_eeh_release(vdev->pdev);
>>  		vfio_pci_disable(vdev);
>> +		iommu_unregister_device_fault_handler(&vdev->pdev->dev);
> 
> 
> But this can fail if there are pending faults which leaves a device
> reference and then the system is broken :(
This series only features unrecoverable errors, and for those the
unregistration cannot fail. Now that unrecoverable errors have been
added, I admit this is confusing. We need to sort this out or clean up
the dependencies.

Thanks

Eric
> 
>>  }
>>  
>>  mutex_unlock(&vdev->reflck->lock);
>> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
>> b/drivers/vfio/pci/vfio_pci_private.h
>> index 8e0a55682d3f..a9276926f008 100644
>> --- a/drivers/vfio/pci/vfio_pci_private.h
>> +++ b/drivers/vfio/pci/vfio_pci_private.h
>> @@ -122,6 +122,7 @@ struct vfio_pci_device {
>>  int ioeventfds_nr;
>>  struct eventfd_ctx  *err_trigger;
>>  struct eventfd_ctx  *req_trigger;
>> +struct eventfd_ctx  *dma_fault_trigger;
>>  struct mutexfault_queue_lock;
>>  int fault_abi;
>>  struct list_headdummy_resources_list;
> 


Re: [PATCH v8 28/29] vfio-pci: Add VFIO_PCI_DMA_FAULT_IRQ_INDEX

2019-06-04 Thread Auger Eric
Hi Alex,

On 6/4/19 12:31 AM, Alex Williamson wrote:
> On Sun, 26 May 2019 18:10:03 +0200
> Eric Auger  wrote:
> 
>> Add a new VFIO_PCI_DMA_FAULT_IRQ_INDEX index. This allows an eventfd
>> to be set/unset that will be triggered when DMA translation faults
>> are detected at the physical level while nested mode is in use.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/vfio/pci/vfio_pci.c   |  3 +++
>>  drivers/vfio/pci/vfio_pci_intrs.c | 19 +++
>>  include/uapi/linux/vfio.h |  1 +
>>  3 files changed, 23 insertions(+)
> 
> 
> Note that I suggested to the Intel folks trying to add a GVT-g
> page-flipping eventfd that they convert to device-specific interrupts,
> the same way we added device-specific regions:
> 
> https://patchwork.kernel.org/patch/10962337/
> 
> I'd probably suggest the same here so we can optionally expose it when
> supported.  Thanks,

Agreed, I will follow the other thread.

Thanks

Eric
> 
> Alex
> 
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index a9c8af2a774a..65a1e6814f5c 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -746,6 +746,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>>  		return 1;
>>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>>  		return 1;
>> +	} else if (irq_type == VFIO_PCI_DMA_FAULT_IRQ_INDEX) {
>> +		return 1;
>>  	}
>>  
>>  	return 0;
>> @@ -1082,6 +1084,7 @@ static long vfio_pci_ioctl(void *device_data,
>>  	switch (info.index) {
>>  	case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
>>  	case VFIO_PCI_REQ_IRQ_INDEX:
>> +	case VFIO_PCI_DMA_FAULT_IRQ_INDEX:
>>  		break;
>>  	case VFIO_PCI_ERR_IRQ_INDEX:
>>  		if (pci_is_pcie(vdev->pdev))
>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 1c46045b0e7f..28a96117daf3 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -622,6 +622,18 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>>  					       count, flags, data);
>>  }
>>  
>> +static int vfio_pci_set_dma_fault_trigger(struct vfio_pci_device *vdev,
>> +					  unsigned index, unsigned start,
>> +					  unsigned count, uint32_t flags,
>> +					  void *data)
>> +{
>> +	if (index != VFIO_PCI_DMA_FAULT_IRQ_INDEX || start != 0 || count > 1)
>> +		return -EINVAL;
>> +
>> +	return vfio_pci_set_ctx_trigger_single(&vdev->dma_fault_trigger,
>> +					       count, flags, data);
>> +}
>> +
>>  int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>>  			    unsigned index, unsigned start, unsigned count,
>>  			    void *data)
>> @@ -671,6 +683,13 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>>  			break;
>>  		}
>>  		break;
>> +	case VFIO_PCI_DMA_FAULT_IRQ_INDEX:
>> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> +			func = vfio_pci_set_dma_fault_trigger;
>> +			break;
>> +		}
>> +		break;
>>  	}
>>  
>>  	if (!func)
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 13e041b84d48..66b6b08c4a38 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -559,6 +559,7 @@ enum {
>>  	VFIO_PCI_MSIX_IRQ_INDEX,
>>  	VFIO_PCI_ERR_IRQ_INDEX,
>>  	VFIO_PCI_REQ_IRQ_INDEX,
>> +	VFIO_PCI_DMA_FAULT_IRQ_INDEX,
>>  	VFIO_PCI_NUM_IRQS
>>  };
> 


Re: [PATCH v3 1/4] iommu: Add gfp parameter to iommu_ops::map

2019-06-04 Thread Robin Murphy

On 06/05/2019 19:52, Tom Murphy wrote:

Add a gfp_t parameter to the iommu_ops::map function.
Remove the needless locking in the AMD iommu driver.

The iommu_ops::map function (or the iommu_map function which calls it)
was always supposed to be sleepable (according to Joerg's comment in
this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
should probably have had a "might_sleep()" since it was written. However,
currently the dma-iommu api can call iommu_map in an atomic context,
which it shouldn't do. This doesn't cause any problems because any iommu
driver which uses the dma-iommu api uses GFP_ATOMIC in its
iommu_ops::map function. But doing this wastes the memory allocator's
atomic pools.


Hmm, in some cases iommu_ops::unmap may need to allocate as well, 
primarily if it needs to split a hugepage mapping. Should we pass flags 
through there as well, or are we prepared to assume that that case will 
happen rarely enough that it's fair to just assume GFP_ATOMIC? It won't 
happen for DMA ops, but it's conceivable that other users such as GPU 
drivers might make partial unmaps, and I doubt we could totally rule out 
the wackiest ones doing so from non-sleeping contexts.
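
Concretely, that variant might look something like this (just a sketch,
not something this patch proposes):

	struct iommu_ops {
		/* ... */
		int (*map)(struct iommu_domain *domain, unsigned long iova,
			   phys_addr_t paddr, size_t size, int prot,
			   gfp_t gfp);
		/* hypothetical: let a hugepage split allocate with the
		 * caller's context instead of assuming GFP_ATOMIC */
		size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
				size_t size, gfp_t gfp);
		/* ... */
	};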


Robin.


We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap.
iommu_map doesn’t lock while mapping and so no two calls should touch
the same iova range. The AMD driver already handles the page table page
allocations without locks so we can safely remove the locks.

Signed-off-by: Tom Murphy 
---
  drivers/iommu/amd_iommu.c  | 14 ---
  drivers/iommu/arm-smmu-v3.c|  2 +-
  drivers/iommu/arm-smmu.c   |  2 +-
  drivers/iommu/dma-iommu.c  |  6 ++---
  drivers/iommu/exynos-iommu.c   |  2 +-
  drivers/iommu/intel-iommu.c|  2 +-
  drivers/iommu/iommu.c  | 43 +-
  drivers/iommu/ipmmu-vmsa.c |  2 +-
  drivers/iommu/msm_iommu.c  |  2 +-
  drivers/iommu/mtk_iommu.c  |  2 +-
  drivers/iommu/mtk_iommu_v1.c   |  2 +-
  drivers/iommu/omap-iommu.c |  2 +-
  drivers/iommu/qcom_iommu.c |  2 +-
  drivers/iommu/rockchip-iommu.c |  2 +-
  drivers/iommu/s390-iommu.c |  2 +-
  drivers/iommu/tegra-gart.c |  2 +-
  drivers/iommu/tegra-smmu.c |  2 +-
  include/linux/iommu.h  | 21 -
  18 files changed, 78 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ebd062522cf5..ea3a5dc61bb0 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3092,7 +3092,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
  }
  
  static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
-			 phys_addr_t paddr, size_t page_size, int iommu_prot)
+			 phys_addr_t paddr, size_t page_size, int iommu_prot,
+			 gfp_t gfp)
  {
  	struct protection_domain *domain = to_pdomain(dom);
  	int prot = 0;
@@ -3106,9 +3107,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
  	if (iommu_prot & IOMMU_WRITE)
  		prot |= IOMMU_PROT_IW;
  
-	mutex_lock(&domain->api_lock);
-	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
-	mutex_unlock(&domain->api_lock);
+	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
  
  	domain_flush_np_cache(domain, iova, page_size);
  
@@ -3119,16 +3118,11 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
  			      size_t page_size)
  {
  	struct protection_domain *domain = to_pdomain(dom);
-	size_t unmap_size;
  
  	if (domain->mode == PAGE_MODE_NONE)
  		return 0;
  
-	mutex_lock(&domain->api_lock);
-	unmap_size = iommu_unmap_page(domain, iova, page_size);
-	mutex_unlock(&domain->api_lock);
-
-	return unmap_size;
+	return iommu_unmap_page(domain, iova, page_size);
  }
  
  static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cf..e5c48089b49f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1777,7 +1777,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
  }
  
  static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
-			phys_addr_t paddr, size_t size, int prot)
+			phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
  {
  	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
  
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..2d50db55b788 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1286,7 +1286,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
  }
  
  static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,


Re: [PATCH v3 1/4] iommu: Add gfp parameter to iommu_ops::map

2019-06-04 Thread Tom Murphy
On Tue, Jun 4, 2019 at 7:11 PM Robin Murphy  wrote:
>
> On 06/05/2019 19:52, Tom Murphy wrote:
> > Add a gfp_t parameter to the iommu_ops::map function.
> > Remove the needless locking in the AMD iommu driver.
> >
> > The iommu_ops::map function (or the iommu_map function which calls it)
> > was always supposed to be sleepable (according to Joerg's comment in
> > this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
> > should probably have had a "might_sleep()" since it was written. However
> > currently the dma-iommu api can call iommu_map in an atomic context,
> > which it shouldn't do. This doesn't cause any problems because any iommu
> > driver which uses the dma-iommu api uses GFP_ATOMIC in its
> > iommu_ops::map function. But doing this wastes the memory allocator's
> > atomic pools.
>
> Hmm, in some cases iommu_ops::unmap may need to allocate as well,
> primarily if it needs to split a hugepage mapping. Should we pass flags

Are you sure that's the case?

I don't see that in the AMD or Intel iommu_ops::unmap code, and from
looking at the Intel code it seems like we actually unmap the whole
large page:

	/* Cope with horrid API which requires us to unmap more than the
	   size argument if it happens to be a large-page mapping. */
	BUG_ON(!pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level));

	if (size < VTD_PAGE_SIZE << level_to_offset_bits(level))
		size = VTD_PAGE_SIZE << level_to_offset_bits(level);



> through there as well, or are we prepared to assume that that case will
> happen rarely enough that it's fair to just assume GFP_ATOMIC? It won't
> happen for DMA ops, but it's conceivable that other users such as GPU
> drivers might make partial unmaps, and I doubt we could totally rule out
> the wackiest ones doing so from non-sleeping contexts.
>
> Robin.
>
> > We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap.
> > iommu_map doesn’t lock while mapping and so no two calls should touch
> > the same iova range. The AMD driver already handles the page table page
> > allocations without locks so we can safely remove the locks.
> >
> > Signed-off-by: Tom Murphy 
> > ---
> >   drivers/iommu/amd_iommu.c  | 14 ---
> >   drivers/iommu/arm-smmu-v3.c|  2 +-
> >   drivers/iommu/arm-smmu.c   |  2 +-
> >   drivers/iommu/dma-iommu.c  |  6 ++---
> >   drivers/iommu/exynos-iommu.c   |  2 +-
> >   drivers/iommu/intel-iommu.c|  2 +-
> >   drivers/iommu/iommu.c  | 43 +-
> >   drivers/iommu/ipmmu-vmsa.c |  2 +-
> >   drivers/iommu/msm_iommu.c  |  2 +-
> >   drivers/iommu/mtk_iommu.c  |  2 +-
> >   drivers/iommu/mtk_iommu_v1.c   |  2 +-
> >   drivers/iommu/omap-iommu.c |  2 +-
> >   drivers/iommu/qcom_iommu.c |  2 +-
> >   drivers/iommu/rockchip-iommu.c |  2 +-
> >   drivers/iommu/s390-iommu.c |  2 +-
> >   drivers/iommu/tegra-gart.c |  2 +-
> >   drivers/iommu/tegra-smmu.c |  2 +-
> >   include/linux/iommu.h  | 21 -
> >   18 files changed, 78 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index ebd062522cf5..ea3a5dc61bb0 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -3092,7 +3092,8 @@ static int amd_iommu_attach_device(struct 
> > iommu_domain *dom,
> >   }
> >
> >   static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
> > -  phys_addr_t paddr, size_t page_size, int iommu_prot)
> > +  phys_addr_t paddr, size_t page_size, int iommu_prot,
> > +  gfp_t gfp)
> >   {
> >   struct protection_domain *domain = to_pdomain(dom);
> >   int prot = 0;
> > @@ -3106,9 +3107,7 @@ static int amd_iommu_map(struct iommu_domain *dom, 
> > unsigned long iova,
> >   if (iommu_prot & IOMMU_WRITE)
> >   prot |= IOMMU_PROT_IW;
> >
> > - mutex_lock(&domain->api_lock);
> > - ret = iommu_map_page(domain, iova, paddr, page_size, prot, 
> > GFP_KERNEL);
> > - mutex_unlock(&domain->api_lock);
> > + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
> >
> >   domain_flush_np_cache(domain, iova, page_size);
> >
> > @@ -3119,16 +3118,11 @@ static size_t amd_iommu_unmap(struct iommu_domain 
> > *dom, unsigned long iova,
> >  size_t page_size)
> >   {
> >   struct protection_domain *domain = to_pdomain(dom);
> > - size_t unmap_size;
> >
> >   if (domain->mode == PAGE_MODE_NONE)
> >   return 0;
> >
> > - mutex_lock(&domain->api_lock);
> > - unmap_size = iommu_unmap_page(domain, iova, page_size);
> > - mutex_unlock(&domain->api_lock);
> > -
> > - return unmap_size;
> > + return iommu_unmap_page(domain, iova, page_size);
> >   }
> >
> >   static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index d388

Re: [PATCH v3 1/4] iommu: Add gfp parameter to iommu_ops::map

2019-06-04 Thread Rob Clark
On Tue, Jun 4, 2019 at 11:11 AM Robin Murphy  wrote:
>
> On 06/05/2019 19:52, Tom Murphy wrote:
> > Add a gfp_t parameter to the iommu_ops::map function.
> > Remove the needless locking in the AMD iommu driver.
> >
> > The iommu_ops::map function (or the iommu_map function which calls it)
> > was always supposed to be sleepable (according to Joerg's comment in
> > this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
> > should probably have had a "might_sleep()" since it was written. However
> > currently the dma-iommu api can call iommu_map in an atomic context,
> > which it shouldn't do. This doesn't cause any problems because any iommu
> > driver which uses the dma-iommu api uses GFP_ATOMIC in its
> > iommu_ops::map function. But doing this wastes the memory allocator's
> > atomic pools.
>
> Hmm, in some cases iommu_ops::unmap may need to allocate as well,
> primarily if it needs to split a hugepage mapping. Should we pass flags
> through there as well, or are we prepared to assume that that case will
> happen rarely enough that it's fair to just assume GFP_ATOMIC? It won't
> happen for DMA ops, but it's conceivable that other users such as GPU
> drivers might make partial unmaps, and I doubt we could totally rule out
> the wackiest ones doing so from non-sleeping contexts.
>

jfyi, today we (well, drm/msm) only unmap entire buffers, so assuming
there isn't any coalescing of adjacent buffers that happens to form a
hugepage on ::map(), we are probably ok on ::unmap()..

we always only call ::map or ::unmap under sleepable conditions.

btw, would it be simpler to just make gfp_t a domain attribute?
That seems like it would be less churn, but maybe I'm overlooking
something.
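
(For illustration, the attribute approach might look something like the
sketch below at a call site. DOMAIN_ATTR_GFP_FLAGS is made up; no such
attribute exists, and iommu_domain_set_attr() would need to learn about
it. domain/iova/paddr/size/prot are assumed from context.)

	gfp_t gfp = GFP_ATOMIC;
	int ret;

	/* hypothetical: store the allocation context once per domain... */
	ret = iommu_domain_set_attr(domain, DOMAIN_ATTR_GFP_FLAGS, &gfp);
	if (ret)
		return ret;

	/* ...so every later map on this domain uses the stored flags for
	 * its page-table allocations, with no per-call gfp_t parameter. */
	ret = iommu_map(domain, iova, paddr, size, prot);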

BR,
-R


Re: [PATCH] iommu: io-pgtable: Support non-coherent page tables

2019-06-04 Thread Bjorn Andersson
On Wed 15 May 23:47 PDT 2019, Vivek Gautam wrote:

> On Thu, May 16, 2019 at 5:03 AM Bjorn Andersson
>  wrote:
> >
> > Describe the memory related to page table walks as non-cacheable for iommu
> > instances that are not DMA coherent.
> >
> > Signed-off-by: Bjorn Andersson 
> > ---
> >  drivers/iommu/io-pgtable-arm.c | 12 +---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 4e21efbc4459..68ff22ffd2cb 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -803,9 +803,15 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg 
> > *cfg, void *cookie)
> > return NULL;
> >
> > /* TCR */
> > -   reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> > - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> > - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> > +   if (cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA) {
> > +   reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> > + (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> > + (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> > +   } else {
> > +   reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> > + (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT) |
> > + (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT);
> > +   }
> 
> This looks okay to me based on the discussion that we had on a similar
> patch that I
> posted. So,
> Reviewed-by: Vivek Gautam 
> 
> [1] https://lore.kernel.org/patchwork/patch/1032939/
> 

Will, Robin, any input on this patch?

Regards,
Bjorn


Re: [PATCH v3 1/4] iommu: Add gfp parameter to iommu_ops::map

2019-06-04 Thread Christoph Hellwig
On Mon, May 06, 2019 at 07:52:03PM +0100, Tom Murphy via iommu wrote:
> We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap.
> iommu_map doesn’t lock while mapping and so no two calls should touch
> the same iova range. The AMD driver already handles the page table page
> allocations without locks so we can safely remove the locks.

Btw, this really should be a separate patch.

Re: [PATCH v4 7/9] iommu/vt-d: Add trace events for domain map/unmap

2019-06-04 Thread Lu Baolu

Hi Steve,

On 6/4/19 5:01 PM, Steven Rostedt wrote:

On Mon,  3 Jun 2019 09:16:18 +0800
Lu Baolu  wrote:



+TRACE_EVENT(bounce_unmap_single,
+   TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
+
+   TP_ARGS(dev, dev_addr, size),
+
+   TP_STRUCT__entry(
+   __string(dev_name, dev_name(dev))
+   __field(dma_addr_t, dev_addr)
+   __field(size_t, size)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev_name, dev_name(dev));
+   __entry->dev_addr = dev_addr;
+   __entry->size = size;
+   ),
+
+   TP_printk("dev=%s dev_addr=0x%llx size=%zu",
+ __get_str(dev_name),
+ (unsigned long long)__entry->dev_addr,
+ __entry->size)
+);
+
+TRACE_EVENT(bounce_map_sg,
+   TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
+dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),
+
+   TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size),
+
+   TP_STRUCT__entry(
+   __string(dev_name, dev_name(dev))
+   __field(unsigned int, i)
+   __field(unsigned int, last)
+   __field(dma_addr_t, dev_addr)
+   __field(phys_addr_t, phys_addr)
+   __field(size_t, size)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev_name, dev_name(dev));
+   __entry->i = i;
+   __entry->last = nelems - 1;
+   __entry->dev_addr = dev_addr;
+   __entry->phys_addr = phys_addr;
+   __entry->size = size;
+   ),
+
+   TP_printk("dev=%s elem=%u/%u dev_addr=0x%llx phys_addr=0x%llx size=%zu",
+ __get_str(dev_name), __entry->i, __entry->last,
+ (unsigned long long)__entry->dev_addr,
+ (unsigned long long)__entry->phys_addr,
+ __entry->size)
+);
+
+TRACE_EVENT(bounce_unmap_sg,
+   TP_PROTO(struct device *dev, unsigned int i, unsigned int nelems,
+dma_addr_t dev_addr, phys_addr_t phys_addr, size_t size),
+
+   TP_ARGS(dev, i, nelems, dev_addr, phys_addr, size),
+
+   TP_STRUCT__entry(
+   __string(dev_name, dev_name(dev))
+   __field(unsigned int, i)
+   __field(unsigned int, last)
+   __field(dma_addr_t, dev_addr)
+   __field(phys_addr_t, phys_addr)
+   __field(size_t, size)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev_name, dev_name(dev));
+   __entry->i = i;
+   __entry->last = nelems - 1;
+   __entry->dev_addr = dev_addr;
+   __entry->phys_addr = phys_addr;
+   __entry->size = size;
+   ),
+
+   TP_printk("dev=%s elem=%u/%u dev_addr=0x%llx phys_addr=0x%llx size=%zu",
+ __get_str(dev_name), __entry->i, __entry->last,
+ (unsigned long long)__entry->dev_addr,
+ (unsigned long long)__entry->phys_addr,
+ __entry->size)
+);


These last two events look identical. Please use
DECLARE_EVENT_CLASS() to describe the event and then DEFINE_EVENT() for
the two events.

Each TRACE_EVENT() can add up to 5k of data/text, whereas a
DEFINE_EVENT() just adds around 250 bytes.

(Note, a TRACE_EVENT() is defined as a
DECLARE_EVENT_CLASS()/DEFINE_EVENT() pair)


Thanks for reviewing. I will rework this patch according to your
comments here.



-- Steve



Best regards,
Baolu




+#endif /* _TRACE_INTEL_IOMMU_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>



