Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops

2017-05-23 Thread Russell King - ARM Linux
On Wed, May 24, 2017 at 12:46:51AM +0300, Laurent Pinchart wrote:
> Hi Russell,
> 
> On Tuesday 23 May 2017 18:53:19 Russell King - ARM Linux wrote:
> > On Tue, May 23, 2017 at 05:55:57PM +0100, Robin Murphy wrote:
> > > On 23/05/17 17:25, Russell King - ARM Linux wrote:
> > >> So, I've come to apply this patch (since it's finally landed in the
> > >> patch system), and I'm not convinced that the commit message is really
> > >> up to scratch.
> > >> 
> > >> The current commit message looks like this:
> > >> 
> > >> "   ARM: 8674/1: dma-mapping: Reset the device's dma_ops
> > >> arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops(),
> > >> dma_ops should be cleared in the teardown path. Otherwise this
> > >> causes problem when the probe of device is retried after being
> > >> deferred. The device's iommu structures are cleared after
> > >> EPROBEDEFER error, but on the next try dma_ops will still be set to
> > >> old value, which is not right."
> > >> 
> > >> It is obviously a fix, but a fix for which patch?  Looking at the
> > >> history, we have "arm: dma-mapping: Don't override dma_ops in
> > >> arch_setup_dma_ops()" which I would have guessed is the appropriate
> > >> one, but this post-dates your patch (it's very recent, v4.12-rc
> > >> recent.)
> > >> 
> > >> So, I need more description about the problem you were seeing when
> > >> you first proposed this patch.
> > >> 
> > >> How does leaving the dma_ops in place prior to "arm: dma-mapping:
> > >> Don't override dma_ops in arch_setup_dma_ops()" cause problems for
> > >> deferred probing?
> > >> 
> > >> What patch is your change trying to fix?  In other words, how far
> > >> back does this patch need to be backported?
> > > 
> > > In effect, it's fixing a latent inconsistency that's been present since
> > > its introduction in 4bb25789ed28. However, that has clearly not proven
> > > to be an issue in practice since then. With 09515ef5ddad we start
> > > actually calling arch_teardown_dma_ops() in a manner that might leave
> > > things partially initialised if anyone starts removing and reprobing
> > > drivers, but so far that's still from code inspection[1] rather than
> > > anyone hitting it.
> > > 
> > > Given that the changes which tickle it are fresh in -rc1 I'd say there's
> > > no need to backport this, but at the same time it shouldn't do any
> > > damage if you really want to.
> > 
> > Well, looking at this, I'm not convinced that much of it is correct.
> > 
> > 1) set_dma_ops() is in arch_setup_dma_ops() but this patch adds
> >the unsetting of the DMA ops inside arm_teardown_iommu_dma_ops()
> >rather than arch_teardown_dma_ops().
> > 
> >This doesn't strike me as being particularly symmetric.
> >arm_teardown_iommu_dma_ops() is arm_setup_iommu_dma_ops()'s
> >counterpart.
> > 
> > 2) arch_setup_dma_ops(), the recent patch to add the existing dma_ops
> >check, and Xen - Xen wants to override the DMA ops if in the
> >initial domain.  It's not clear (at least to me) whether the recent
> >patch adding the dma_ops check took account of this or not.
> > 
> > 3) random places seem to fiddle with the dma_ops - notice that
> >arm_iommu_detach_device() sets the dma_ops to NULL.
> > 
> >In fact, I think moving __arm_iommu_detach_device() into
> >arm_iommu_detach_device(), calling arm_iommu_detach_device(),
> >and getting rid of the explicit set_dma_ops(, NULL) in this
> >path would be a good first step.
> > 
> > 4) I think arch_setup_dma_ops() is over-complex.
> > 
> > So, in summary, this code is a mess today, and that means it's not
> > obviously correct - which is bad.  This needs sorting.
> 
> We've reached the same conclusion independently, but I'll refrain from 
> commenting on whether that's a good or bad thing :-)
> 
> I don't think this patch should be applied, as it could break Xen (and other 
> platforms/drivers that set the DMA operations manually) by resetting DMA 
> operations at device remove() time even if they have been set independently 
> of 
> arch_setup_dma_ops().

That will only occur if the dma ops have been overriden once the DMA
operations have been setup via arch_setup_dma_ops.  What saves it from
wholesale NULLing of the DMA operations is the check for a valid
dma_iommu_mapping structure in arm_teardown_iommu_dma_ops().  This only
exists when arm_setup_iommu_dma_ops() has attached a mapping to the
device.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops

2017-05-23 Thread Laurent Pinchart
Hi Russell,

On Tuesday 23 May 2017 18:53:19 Russell King - ARM Linux wrote:
> On Tue, May 23, 2017 at 05:55:57PM +0100, Robin Murphy wrote:
> > On 23/05/17 17:25, Russell King - ARM Linux wrote:
> >> So, I've come to apply this patch (since it's finally landed in the
> >> patch system), and I'm not convinced that the commit message is really
> >> up to scratch.
> >> 
> >> The current commit message looks like this:
> >> 
> >> "   ARM: 8674/1: dma-mapping: Reset the device's dma_ops
> >> arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops(),
> >> dma_ops should be cleared in the teardown path. Otherwise this
> >> causes problem when the probe of device is retried after being
> >> deferred. The device's iommu structures are cleared after
> >> EPROBEDEFER error, but on the next try dma_ops will still be set to
> >> old value, which is not right."
> >> 
> >> It is obviously a fix, but a fix for which patch?  Looking at the
> >> history, we have "arm: dma-mapping: Don't override dma_ops in
> >> arch_setup_dma_ops()" which I would have guessed is the appropriate
> >> one, but this post-dates your patch (it's very recent, v4.12-rc
> >> recent.)
> >> 
> >> So, I need more description about the problem you were seeing when
> >> you first proposed this patch.
> >> 
> >> How does leaving the dma_ops in place prior to "arm: dma-mapping:
> >> Don't override dma_ops in arch_setup_dma_ops()" cause problems for
> >> deferred probing?
> >> 
> >> What patch is your change trying to fix?  In other words, how far
> >> back does this patch need to be backported?
> > 
> > In effect, it's fixing a latent inconsistency that's been present since
> > its introduction in 4bb25789ed28. However, that has clearly not proven
> > to be an issue in practice since then. With 09515ef5ddad we start
> > actually calling arch_teardown_dma_ops() in a manner that might leave
> > things partially initialised if anyone starts removing and reprobing
> > drivers, but so far that's still from code inspection[1] rather than
> > anyone hitting it.
> > 
> > Given that the changes which tickle it are fresh in -rc1 I'd say there's
> > no need to backport this, but at the same time it shouldn't do any
> > damage if you really want to.
> 
> Well, looking at this, I'm not convinced that much of it is correct.
> 
> 1) set_dma_ops() is in arch_setup_dma_ops() but this patch adds
>the unsetting of the DMA ops inside arm_teardown_iommu_dma_ops()
>rather than arch_teardown_dma_ops().
> 
>This doesn't strike me as being particularly symmetric.
>arm_teardown_iommu_dma_ops() is arm_setup_iommu_dma_ops()'s
>counterpart.
> 
> 2) arch_setup_dma_ops(), the recent patch to add the existing dma_ops
>check, and Xen - Xen wants to override the DMA ops if in the
>initial domain.  It's not clear (at least to me) whether the recent
>patch adding the dma_ops check took account of this or not.
> 
> 3) random places seem to fiddle with the dma_ops - notice that
>arm_iommu_detach_device() sets the dma_ops to NULL.
> 
>In fact, I think moving __arm_iommu_detach_device() into
>arm_iommu_detach_device(), calling arm_iommu_detach_device(),
>and getting rid of the explicit set_dma_ops(, NULL) in this
>path would be a good first step.
> 
> 4) I think arch_setup_dma_ops() is over-complex.
> 
> So, in summary, this code is a mess today, and that means it's not
> obviously correct - which is bad.  This needs sorting.

We've reached the same conclusion independently, but I'll refrain from 
commenting on whether that's a good or bad thing :-)

I don't think this patch should be applied, as it could break Xen (and other 
platforms/drivers that set the DMA operations manually) by resetting DMA 
operations at device remove() time even if they have been set independently of 
arch_setup_dma_ops().

The IOMMU probe deferral support patch series was merged in v4.12-rc1 and 
breaks IOMMU operations on several platforms. We need a fix for v4.12-rc that 
should be as nonintrusive as possible, as a larger cleanup is likely not -rc 
material. Beside reverting the whole series, the simplest option I came up 
with was [1]. Note that this is not the only fix needed to fix v4.12-rc1 IOMMU 
breakage, there are four more patches in the series that Sricharan plans to 
get merged soon. I don't think there are dependencies between the other four 
drivers/ patches and the arch/arm/ patch, so the latter could be merged 
independently through your tree as soon as it's deemed ready.

[1] https://www.spinics.net/lists/arm-kernel/msg583019.html

-- 
Regards,

Laurent Pinchart

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-23 Thread Bjorn Helgaas
On Wed, May 10, 2017 at 11:39:02AM -0700, Ashok Raj wrote:
> From: CQ Tang 
> 
> Requires: https://patchwork.kernel.org/patch/9593891

I'm not sure what the status of the patch above is.  I acked it, but it's
part of a 30-patch IOMMU series, so I expect it to be merged via an IOMMU
tree.

In any case, it's not in v4.12-rc1, so I can't apply *this* patch yet.

> After a FLR, pci-states need to be restored. This patch saves PASID features
> and PRI reqs cached.
> 
> Cc: Jean-Phillipe Brucker 
> Cc: David Woodhouse 
> Cc: iommu@lists.linux-foundation.org
> 
> Signed-off-by: CQ Tang 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/pci/ats.c   | 65 
> +
>  drivers/pci/pci.c   |  3 +++
>  include/linux/pci-ats.h | 10 
>  include/linux/pci.h |  6 +
>  4 files changed, 69 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 2126497..a769955 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> - if ((control & PCI_PRI_CTRL_ENABLE) ||
> - !(status & PCI_PRI_STATUS_STOPPED))
> + if (!(status & PCI_PRI_STATUS_STOPPED))
>   return -EBUSY;
>  
>   pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
>   reqs = min(max_requests, reqs);
> + pdev->pri_reqs_alloc = reqs;
>   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
>  
> - control |= PCI_PRI_CTRL_ENABLE;
> + control = PCI_PRI_CTRL_ENABLE;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   pdev->pri_enabled = 1;
> @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
>  /**
> + * pci_restore_pri_state - Restore PRI
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pri_state(struct pci_dev *pdev)
> +{
> +   u16 control = PCI_PRI_CTRL_ENABLE;
> +   u32 reqs = pdev->pri_reqs_alloc;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pri_enabled)
> +   return;
> +
> +   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> +   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> +
> +/**
>   * pci_reset_pri - Resets device's PRI state
>   * @pdev: PCI device structure
>   *
> @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> - if (control & PCI_PRI_CTRL_ENABLE)
> - return -EBUSY;
> -
> - control |= PCI_PRI_CTRL_RESET;
> -
> + control = PCI_PRI_CTRL_RESET;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   return 0;
> @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> -
> - if (control & PCI_PASID_CTRL_ENABLE)
> - return -EINVAL;
> -
>   supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
>   /* User wants to enable anything unsupported? */
> @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   return -EINVAL;
>  
>   control = PCI_PASID_CTRL_ENABLE | features;
> + pdev->pasid_features = features;
>  
>   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
>  
> @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
>  /**
> + * pci_restore_pasid_state - Restore PASID capabilities.
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pasid_state(struct pci_dev *pdev)
> +{
> +   u16 control;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pasid_enabled)
> +   return;
> +
> +   control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> +   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> +
> +/**
>   * pci_pasid_features - Check which PASID features are supported
>   * @pdev: PCI device structure
>   *
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7904d02..c9a6510 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1171,6 +1172,8 @@ void pci_restore_state(struct pci_dev *dev)
>  
>   /* PCI Express register must be restored fir

RE: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)

2017-05-23 Thread Deucher, Alexander
> -Original Message-
> From: Arindam Nath [mailto:anath@gmail.com] On Behalf Of
> arindam.n...@amd.com
> Sent: Monday, May 22, 2017 3:48 AM
> To: iommu@lists.linux-foundation.org
> Cc: amd-...@lists.freedesktop.org; Joerg Roedel; Deucher, Alexander;
> Bridgman, John; dr...@endlessm.com; Suthikulpanit, Suravee;
> li...@endlessm.com; Craig Stein; mic...@daenzer.net; Kuehling, Felix;
> sta...@vger.kernel.org; Nath, Arindam
> Subject: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)
> 
> From: Arindam Nath 
> 
> Change History
> --
> 
> v3:
> - add Fixes and CC tags
> - add link to Bugzilla
> 
> v2: changes suggested by Joerg
> - add flush flag to improve efficiency of flush operation
> 
> v1:
> - The idea behind flush queues is to defer the IOTLB flushing
>   for domains for which the mappings are no longer valid. We
>   add such domains in queue_add(), and when the queue size
>   reaches FLUSH_QUEUE_SIZE, we perform __queue_flush().
> 
>   Since we have already taken lock before __queue_flush()
>   is called, we need to make sure the IOTLB flushing is
>   performed as quickly as possible.
> 
>   In the current implementation, we perform IOTLB flushing
>   for all domains irrespective of which ones were actually
>   added in the flush queue initially. This can be quite
>   expensive especially for domains for which unmapping is
>   not required at this point of time.
> 
>   This patch makes use of domain information in
>   'struct flush_queue_entry' to make sure we only flush
>   IOTLBs for domains who need it, skipping others.
> 
> Bugzilla: https://bugs.freedesktop.org/101029
> Fixes: b1516a14657a ("iommu/amd: Implement flush queue")
> Cc: sta...@vger.kernel.org
> Suggested-by: Joerg Roedel 
> Signed-off-by: Arindam Nath 

Acked-by: Alex Deucher 

> ---
>  drivers/iommu/amd_iommu.c   | 27 ---
>  drivers/iommu/amd_iommu_types.h |  2 ++
>  2 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 63cacf5..1edeebec 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2227,15 +2227,26 @@ static struct iommu_group
> *amd_iommu_device_group(struct device *dev)
> 
>  static void __queue_flush(struct flush_queue *queue)
>  {
> - struct protection_domain *domain;
> - unsigned long flags;
>   int idx;
> 
> - /* First flush TLB of all known domains */
> - spin_lock_irqsave(&amd_iommu_pd_lock, flags);
> - list_for_each_entry(domain, &amd_iommu_pd_list, list)
> - domain_flush_tlb(domain);
> - spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
> + /* First flush TLB of all domains which were added to flush queue */
> + for (idx = 0; idx < queue->next; ++idx) {
> + struct flush_queue_entry *entry;
> +
> + entry = queue->entries + idx;
> +
> + /*
> +  * There might be cases where multiple IOVA entries for the
> +  * same domain are queued in the flush queue. To avoid
> +  * flushing the same domain again, we check whether the
> +  * flag is set or not. This improves the efficiency of
> +  * flush operation.
> +  */
> + if (!entry->dma_dom->domain.already_flushed) {
> + entry->dma_dom->domain.already_flushed = true;
> + domain_flush_tlb(&entry->dma_dom->domain);
> + }
> + }
> 
>   /* Wait until flushes have completed */
>   domain_flush_complete(NULL);
> @@ -2289,6 +2300,8 @@ static void queue_add(struct dma_ops_domain
> *dma_dom,
>   pages = __roundup_pow_of_two(pages);
>   address >>= PAGE_SHIFT;
> 
> + dma_dom->domain.already_flushed = false;
> +
>   queue = get_cpu_ptr(&flush_queue);
>   spin_lock_irqsave(&queue->lock, flags);
> 
> diff --git a/drivers/iommu/amd_iommu_types.h
> b/drivers/iommu/amd_iommu_types.h
> index 4de8f41..4f5519d 100644
> --- a/drivers/iommu/amd_iommu_types.h
> +++ b/drivers/iommu/amd_iommu_types.h
> @@ -454,6 +454,8 @@ struct protection_domain {
>   bool updated;   /* complete domain flush required */
>   unsigned dev_cnt;   /* devices assigned to this domain */
>   unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference
> count */
> + bool already_flushed;   /* flag to avoid flushing the same domain
> again
> +in a single invocation of __queue_flush() */
>  };
> 
>  /*
> --
> 2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops

2017-05-23 Thread Russell King - ARM Linux
On Tue, May 23, 2017 at 05:55:57PM +0100, Robin Murphy wrote:
> On 23/05/17 17:25, Russell King - ARM Linux wrote:
> > So, I've come to apply this patch (since it's finally landed in the patch
> > system), and I'm not convinced that the commit message is really up to
> > scratch.
> > 
> > The current commit message looks like this:
> > 
> > "   ARM: 8674/1: dma-mapping: Reset the device's dma_ops
> > 
> > arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops(),
> > dma_ops should be cleared in the teardown path. Otherwise this causes
> > problem when the probe of device is retried after being deferred. The
> > device's iommu structures are cleared after EPROBEDEFER error, but on
> > the next try dma_ops will still be set to old value, which is not 
> > right."
> > 
> > It is obviously a fix, but a fix for which patch?  Looking at the
> > history, we have "arm: dma-mapping: Don't override dma_ops in
> > arch_setup_dma_ops()" which I would have guessed is the appropriate
> > one, but this post-dates your patch (it's very recent, v4.12-rc
> > recent.)
> > 
> > So, I need more description about the problem you were seeing when
> > you first proposed this patch.
> > 
> > How does leaving the dma_ops in place prior to "arm: dma-mapping:
> > Don't override dma_ops in arch_setup_dma_ops()" cause problems for
> > deferred probing?
> > 
> > What patch is your change trying to fix?  In other words, how far
> > back does this patch need to be backported?
> 
> In effect, it's fixing a latent inconsistency that's been present since
> its introduction in 4bb25789ed28. However, that has clearly not proven
> to be an issue in practice since then. With 09515ef5ddad we start
> actually calling arch_teardown_dma_ops() in a manner that might leave
> things partially initialised if anyone starts removing and reprobing
> drivers, but so far that's still from code inspection[1] rather than
> anyone hitting it.
> 
> Given that the changes which tickle it are fresh in -rc1 I'd say there's
> no need to backport this, but at the same time it shouldn't do any
> damage if you really want to.

Well, looking at this, I'm not convinced that much of it is correct.

1) set_dma_ops() is in arch_setup_dma_ops() but this patch adds
   the unsetting of the DMA ops inside arm_teardown_iommu_dma_ops()
   rather than arch_teardown_dma_ops().

   This doesn't strike me as being particularly symmetric.
   arm_teardown_iommu_dma_ops() is arm_setup_iommu_dma_ops()'s
   counterpart.

2) arch_setup_dma_ops(), the recent patch to add the existing dma_ops
   check, and Xen - Xen wants to override the DMA ops if in the
   initial domain.  It's not clear (at least to me) whether the recent
   patch adding the dma_ops check took account of this or not.

3) random places seem to fiddle with the dma_ops - notice that
   arm_iommu_detach_device() sets the dma_ops to NULL.

   In fact, I think moving __arm_iommu_detach_device() into
   arm_iommu_detach_device(), calling arm_iommu_detach_device(),
   and getting rid of the explicit set_dma_ops(, NULL) in this
   path would be a good first step.

4) I think arch_setup_dma_ops() is over-complex.

So, in summary, this code is a mess today, and that means it's not
obviously correct - which is bad.  This needs sorting.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops

2017-05-23 Thread Robin Murphy
On 23/05/17 17:25, Russell King - ARM Linux wrote:
> On Thu, Oct 27, 2016 at 09:07:23AM +0530, Sricharan wrote:
>> Hi Robin,
>>
>>> -Original Message-
>>> From: Robin Murphy [mailto:robin.mur...@arm.com]
>>> Sent: Wednesday, October 26, 2016 8:37 PM
>>> To: Sricharan R ; will.dea...@arm.com; 
>>> j...@8bytes.org; iommu@lists.linux-foundation.org; linux-arm-
>>> ker...@lists.infradead.org; linux-arm-...@vger.kernel.org; 
>>> laurent.pinch...@ideasonboard.com; m.szyprow...@samsung.com;
>>> tf...@chromium.org; srinivas.kandaga...@linaro.org
>>> Subject: Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops
>>>
>>> On 04/10/16 18:03, Sricharan R wrote:
 The dma_ops for the device is not getting set to NULL in
 arch_tear_down_dma_ops and this causes an issue when the
 device's probe gets deferred and retried. So reset the
 dma_ops to NULL.
>>>
>>> Reviewed-by: Robin Murphy 
>>>
>>
>>  Thanks.
>>
>>> This seems like it could stand independently from the rest of the series
>>> - might be worth rewording the commit message to more general terms,
>>> i.e. arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops()
>>> thus clearing the ops set by the latter, and sending it to Russell as a
>>> fix in its own right.
>>
>>   Ok, will reword the commit log and push this separately.
> 
> So, I've come to apply this patch (since it's finally landed in the patch
> system), and I'm not convinced that the commit message is really up to
> scratch.
> 
> The current commit message looks like this:
> 
> "   ARM: 8674/1: dma-mapping: Reset the device's dma_ops
> 
> arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops(),
> dma_ops should be cleared in the teardown path. Otherwise this causes
> problem when the probe of device is retried after being deferred. The
> device's iommu structures are cleared after EPROBEDEFER error, but on
> the next try dma_ops will still be set to old value, which is not right."
> 
> It is obviously a fix, but a fix for which patch?  Looking at the
> history, we have "arm: dma-mapping: Don't override dma_ops in
> arch_setup_dma_ops()" which I would have guessed is the appropriate
> one, but this post-dates your patch (it's very recent, v4.12-rc
> recent.)
> 
> So, I need more description about the problem you were seeing when
> you first proposed this patch.
> 
> How does leaving the dma_ops in place prior to "arm: dma-mapping:
> Don't override dma_ops in arch_setup_dma_ops()" cause problems for
> deferred probing?
> 
> What patch is your change trying to fix?  In other words, how far
> back does this patch need to be backported?

In effect, it's fixing a latent inconsistency that's been present since
its introduction in 4bb25789ed28. However, that has clearly not proven
to be an issue in practice since then. With 09515ef5ddad we start
actually calling arch_teardown_dma_ops() in a manner that might leave
things partially initialised if anyone starts removing and reprobing
drivers, but so far that's still from code inspection[1] rather than
anyone hitting it.

Given that the changes which tickle it are fresh in -rc1 I'd say there's
no need to backport this, but at the same time it shouldn't do any
damage if you really want to.

Robin.

[1]:https://www.mail-archive.com/linux-renesas-soc@vger.kernel.org/msg14301.html
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops

2017-05-23 Thread Russell King - ARM Linux
On Thu, Oct 27, 2016 at 09:07:23AM +0530, Sricharan wrote:
> Hi Robin,
> 
> >-Original Message-
> >From: Robin Murphy [mailto:robin.mur...@arm.com]
> >Sent: Wednesday, October 26, 2016 8:37 PM
> >To: Sricharan R ; will.dea...@arm.com; 
> >j...@8bytes.org; iommu@lists.linux-foundation.org; linux-arm-
> >ker...@lists.infradead.org; linux-arm-...@vger.kernel.org; 
> >laurent.pinch...@ideasonboard.com; m.szyprow...@samsung.com;
> >tf...@chromium.org; srinivas.kandaga...@linaro.org
> >Subject: Re: [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops
> >
> >On 04/10/16 18:03, Sricharan R wrote:
> >> The dma_ops for the device is not getting set to NULL in
> >> arch_tear_down_dma_ops and this causes an issue when the
> >> device's probe gets deferred and retried. So reset the
> >> dma_ops to NULL.
> >
> >Reviewed-by: Robin Murphy 
> >
> 
>  Thanks.
> 
> >This seems like it could stand independently from the rest of the series
> >- might be worth rewording the commit message to more general terms,
> >i.e. arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops()
> >thus clearing the ops set by the latter, and sending it to Russell as a
> >fix in its own right.
> 
>   Ok, will reword the commit log and push this separately.

So, I've come to apply this patch (since it's finally landed in the patch
system), and I'm not convinced that the commit message is really up to
scratch.

The current commit message looks like this:

"   ARM: 8674/1: dma-mapping: Reset the device's dma_ops

arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops(),
dma_ops should be cleared in the teardown path. Otherwise this causes
problem when the probe of device is retried after being deferred. The
device's iommu structures are cleared after EPROBEDEFER error, but on
the next try dma_ops will still be set to old value, which is not right."

It is obviously a fix, but a fix for which patch?  Looking at the
history, we have "arm: dma-mapping: Don't override dma_ops in
arch_setup_dma_ops()" which I would have guessed is the appropriate
one, but this post-dates your patch (it's very recent, v4.12-rc
recent.)

So, I need more description about the problem you were seeing when
you first proposed this patch.

How does leaving the dma_ops in place prior to "arm: dma-mapping:
Don't override dma_ops in arch_setup_dma_ops()" cause problems for
deferred probing?

What patch is your change trying to fix?  In other words, how far
back does this patch need to be backported?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 5/5] ACPI/IORT: Move the check to get iommu_ops from translated fwspec

2017-05-23 Thread Sricharan R
From: Lorenzo Pieralisi 

With IOMMU probe deferral, iort_iommu_configure can be called
multiple times for the same device. Hence we have a check
to see if the device's fwspec is already translated and return
the iommu_ops from that directly. But the check is wrongly
placed in iort_iommu_xlate, which breaks devices with multiple
sids. Move the check to iort_iommu_configure.

Fixes: 5a1bb638d567 ("drivers: acpi: Handle IOMMU lookup failure with deferred 
probing or error")
Tested-by: Nate Watterson 
Signed-off-by: Lorenzo Pieralisi 
---

[v5] new patch

 drivers/acpi/arm64/iort.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 16e101f..797b28d 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -666,14 +666,6 @@ static const struct iommu_ops *iort_iommu_xlate(struct 
device *dev,
int ret = -ENODEV;
struct fwnode_handle *iort_fwnode;
 
-   /*
-* If we already translated the fwspec there
-* is nothing left to do, return the iommu_ops.
-*/
-   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
-   if (ops)
-   return ops;
-
if (node) {
iort_fwnode = iort_get_fwnode(node);
if (!iort_fwnode)
@@ -735,6 +727,14 @@ const struct iommu_ops *iort_iommu_configure(struct device 
*dev)
u32 streamid = 0;
int err;
 
+   /*
+* If we already translated the fwspec there
+* is nothing left to do, return the iommu_ops.
+*/
+   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
+   if (ops)
+   return ops;
+
if (dev_is_pci(dev)) {
struct pci_bus *bus = to_pci_dev(dev)->bus;
u32 rid;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 4/5] ARM: dma-mapping: Don't tear third-party mappings

2017-05-23 Thread Sricharan R
From: Laurent Pinchart 

arch_setup_dma_ops() is used in device probe code paths to create an
IOMMU mapping and attach it to the device. The function assumes that the
device is attached to a device-specific IOMMU instance (or at least a
device-specific TLB in a shared IOMMU instance) and thus creates a
separate mapping for every device.

On several systems (Renesas R-Car Gen2 being one of them), that
assumption is not true, and IOMMU mappings must be shared between
multiple devices. In those cases the IOMMU driver knows better than the
generic ARM dma-mapping layer and attaches mapping to devices manually
with arm_iommu_attach_device(), which sets the DMA ops for the device.

The arch_setup_dma_ops() function takes this into account and bails out
immediately if the device already has DMA ops assigned. However, the
corresponding arch_teardown_dma_ops() function, called from driver
unbind code paths (including probe deferral), will tear the mapping down
regardless of who created it. When the device is reprobed
arch_setup_dma_ops() will be called again but won't perform any
operation as the DMA ops will still be set.

We need to reset the DMA ops in arch_teardown_dma_ops() to fix this.
However, we can't do so unconditionally, as then a new mapping would be
created by arch_setup_dma_ops() when the device is reprobed, regardless
of whether the device needs to share a mapping or not. We must thus keep
track of whether arch_setup_dma_ops() created the mapping, and only in
that case tear it down in arch_teardown_dma_ops().

Keep track of that information in the dev_archdata structure. As the
structure is embedded in all instances of struct device let's not grow
it, but turn the existing dma_coherent bool field into a bitfield that
can be used for other purposes.

Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time for 
platform/amba/pci bus devices")
Reviewed-by: Robin Murphy 
Signed-off-by: Laurent Pinchart 
---
 arch/arm/include/asm/device.h | 3 ++-
 arch/arm/mm/dma-mapping.c | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 36ec9c8..3234fe9 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -19,7 +19,8 @@ struct dev_archdata {
 #ifdef CONFIG_XEN
const struct dma_map_ops *dev_dma_ops;
 #endif
-   bool dma_coherent;
+   unsigned int dma_coherent:1;
+   unsigned int dma_ops_setup:1;
 };
 
 struct omap_device;
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c742dfd..b48998f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2430,9 +2430,13 @@ void arch_setup_dma_ops(struct device *dev, u64 
dma_base, u64 size,
dev->dma_ops = xen_dma_ops;
}
 #endif
+   dev->archdata.dma_ops_setup = true;
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
+   if (!dev->archdata.dma_ops_setup)
+   return;
+
arm_teardown_iommu_dma_ops(dev);
 }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 2/5] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-23 Thread Sricharan R
While deferring the probe of IOMMU masters, xlate and
add_device callbacks called from of_iommu_configure
can pass back error values like -ENODEV, which means
the IOMMU cannot be connected with that master for real
reasons. Before the IOMMU probe deferral, all such errors
were ignored. Now all those errors are propagated back,
killing the master's probe for such errors. Instead ignore
all the errors except EPROBE_DEFER, which is the only one
of concern and let the master work without IOMMU, thus
restoring the old behavior. Also make explicit that
of_dma_configure handles only -EPROBE_DEFER from
of_iommu_configure.

Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
probing or error")
Reported-by: Geert Uytterhoeven 
Tested-by: Magnus Damn 
Signed-off-by: Sricharan R 
---
[v5] Added the check in of_dma_configure

 drivers/iommu/of_iommu.c | 6 ++
 drivers/of/device.c  | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e6e9bec..19779b8 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -237,6 +237,12 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
ops = ERR_PTR(err);
}
 
+   /* Ignore all other errors apart from EPROBE_DEFER */
+   if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
+   dev_dbg(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
+   ops = NULL;
+   }
+
return ops;
 }
 
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 9416d05..28c38c7 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -144,8 +144,8 @@ int of_dma_configure(struct device *dev, struct device_node 
*np)
coherent ? " " : " not ");
 
iommu = of_iommu_configure(dev, np);
-   if (IS_ERR(iommu))
-   return PTR_ERR(iommu);
+   if (IS_ERR(iommu) && PTR_ERR(iommu) == -EPROBE_DEFER)
+   return -EPROBE_DEFER;
 
dev_dbg(dev, "device is%sbehind an iommu\n",
iommu ? " " : " not ");
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 1/5] iommu: of: Fix check for returning EPROBE_DEFER

2017-05-23 Thread Sricharan R
Now with IOMMU probe deferral, we return -EPROBE_DEFER
for masters that are connected to an IOMMU which is not
probed yet, but going to get probed, so that we can attach
the correct dma_ops. So while trying to defer the probe of
the master, check if the of_iommu node that it is connected
to is marked in DT as 'status=disabled', then the IOMMU is never
is going to get probed. So simply return NULL and let the master
work without an IOMMU.

Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
probing or error")
Signed-off-by: Sricharan R 
Reported-by: Geert Uytterhoeven 
Reviewed-by: Laurent Pinchart 
Tested-by: Will Deacon 
Tested-by: Magnus Damn 
Acked-by: Will Deacon 
---
 drivers/iommu/of_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 9f44ee8..e6e9bec 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -118,6 +118,7 @@ static bool of_iommu_driver_present(struct device_node *np)
 
ops = iommu_ops_from_fwnode(fwnode);
if ((ops && !ops->of_xlate) ||
+   !of_device_is_available(iommu_spec->np) ||
(!ops && !of_iommu_driver_present(iommu_spec->np)))
return NULL;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 3/5] ACPI/IORT: Ignore all errors except EPROBE_DEFER

2017-05-23 Thread Sricharan R
While deferring the probe of IOMMU masters, xlate and
add_device callbacks called from iort_iommu_configure
can pass back error values like -ENODEV, which means
the IOMMU cannot be connected with that master for real
reasons. Before the IOMMU probe deferral, all such errors
were ignored. Now all those errors are propagated back,
killing the master's probe for such errors. Instead ignore
all the errors except EPROBE_DEFER, which is the only one
of concern and let the master work without IOMMU, thus
restoring the old behavior. Also make explicit that
acpi_dma_configure handles only -EPROBE_DEFER from
iort_iommu_configure.

Fixes: 5a1bb638d567 ("drivers: acpi: Handle IOMMU lookup failure with deferred 
probing or error")
Signed-off-by: Sricharan R 
---
[v5] Added the check in acpi_dma_configure

 drivers/acpi/arm64/iort.c | 6 ++
 drivers/acpi/scan.c   | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index c5fecf9..16e101f 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -782,6 +782,12 @@ const struct iommu_ops *iort_iommu_configure(struct device 
*dev)
if (err)
ops = ERR_PTR(err);
 
+   /* Ignore all other errors apart from EPROBE_DEFER */
+   if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
+   dev_dbg(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
+   ops = NULL;
+   }
+
return ops;
 }
 
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index e39ec7b..3a10d757 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1371,8 +1371,8 @@ int acpi_dma_configure(struct device *dev, enum 
dev_dma_attr attr)
iort_set_dma_mask(dev);
 
iommu = iort_iommu_configure(dev);
-   if (IS_ERR(iommu))
-   return PTR_ERR(iommu);
+   if (IS_ERR(iommu) && PTR_ERR(iommu) == -EPROBE_DEFER)
+   return -EPROBE_DEFER;
 
size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
/*
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V11 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error

2017-05-23 Thread Nate Watterson

Hi Lorenzo,

On 5/23/2017 5:26 AM, Lorenzo Pieralisi wrote:

On Tue, May 23, 2017 at 02:31:17PM +0530, Sricharan R wrote:

Hi Lorenzo,

On 5/23/2017 2:22 PM, Lorenzo Pieralisi wrote:

On Tue, May 23, 2017 at 02:26:10AM -0400, Nate Watterson wrote:

Hi Sricharan,

On 4/10/2017 7:21 AM, Sricharan R wrote:

This is an equivalent to the DT's handling of the iommu master's probe
with deferred probing when the corrsponding iommu is not probed yet.
The lack of a registered IOMMU can be caused by the lack of a driver for
the IOMMU, the IOMMU device probe not having been performed yet, having
been deferred, or having failed.

The first case occurs when the firmware describes the bus master and
IOMMU topology correctly but no device driver exists for the IOMMU yet
or the device driver has not been compiled in. Return NULL, the caller
will configure the device without an IOMMU.

The second and third cases are handled by deferring the probe of the bus
master device which will eventually get reprobed after the IOMMU.

The last case is currently handled by deferring the probe of the bus
master device as well. A mechanism to either configure the bus master
device without an IOMMU or to fail the bus master device probe depending
on whether the IOMMU is optional or mandatory would be a good
enhancement.

Tested-by: Hanjun Guo 
Reviewed-by: Robin Murphy 
[Lorenzo: Added fixes for dma_coherent_mask overflow, acpi_dma_configure
   called multiple times for same device]
Signed-off-by: Lorenzo Pieralisi 
Signed-off-by: Sricharan R 
---
  drivers/acpi/arm64/iort.c  | 33 -
  drivers/acpi/scan.c| 11 ---
  drivers/base/dma-mapping.c |  2 +-
  include/acpi/acpi_bus.h|  2 +-
  include/linux/acpi.h   |  7 +--
  5 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 3dd9ec3..e323ece 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -543,6 +543,14 @@ static const struct iommu_ops *iort_iommu_xlate(struct 
device *dev,
const struct iommu_ops *ops = NULL;
int ret = -ENODEV;
struct fwnode_handle *iort_fwnode;
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+
+   /*
+* If we already translated the fwspec there
+* is nothing left to do, return the iommu_ops.
+*/
+   if (fwspec && fwspec->ops)
+   return fwspec->ops;


Is this logic strictly required? It breaks masters with multiple SIDs
as only the first SID is actually added to the master's fwspec.


My bad, that's indeed a silly bug I introduced. Please let me know if the
patch below fixes it, we will send it upstream shortly.



oops, i think emails crossed. Please let me know if you are ok to add
this to the other fixes.


No worries, yes I am ok thanks but please give Nate some time to report
back to make sure the diff I sent actually fixes the problem.


The patch you sent fixes the problem. Thanks for the quick turnaround.



Apologies for the breakage.

Lorenzo



Regards,
  Sricharan



--
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 04/30] iommu/arm-smmu-v3: Add support for PCI ATS

2017-05-23 Thread Jean-Philippe Brucker
On 23/05/17 09:41, Leizhen (ThunderTown) wrote:
> On 2017/2/28 3:54, Jean-Philippe Brucker wrote:
>> PCIe devices can implement their own TLB, named Address Translation Cache
>> (ATC). Steps involved in the use and maintenance of such caches are:
>>
>> * Device sends an Address Translation Request for a given IOVA to the
>>   IOMMU. If the translation succeeds, the IOMMU returns the corresponding
>>   physical address, which is stored in the device's ATC.
>>
>> * Device can then use the physical address directly in a transaction.
>>   A PCIe device does so by setting the TLP AT field to 0b10 - translated.
>>   The SMMU might check that the device is allowed to send translated
>>   transactions, and let it pass through.
>>
>> * When an address is unmapped, CPU sends a CMD_ATC_INV command to the
>>   SMMU, that is relayed to the device.
>>
>> In theory, this doesn't require a lot of software intervention. The IOMMU
>> driver needs to enable ATS when adding a PCI device, and send an
>> invalidation request when unmapping. Note that this invalidation is
>> allowed to take up to a minute, according to the PCIe spec. In
>> addition, the invalidation queue on the ATC side is fairly small, 32 by
>> default, so we cannot keep many invalidations in flight (see ATS spec
>> section 3.5, Invalidate Flow Control).
>>
>> Handling these constraints properly would require to postpone
>> invalidations, and keep the stale mappings until we're certain that all
>> devices forgot about them. This requires major work in the page table
>> managers, and is therefore not done by this patch.
>>
>>   Range calculation
>>   -
>>
>> The invalidation packet itself is a bit awkward: range must be naturally
>> aligned, which means that the start address is a multiple of the range
>> size. In addition, the size must be a power of two number of 4k pages. We
>> have a few options to enforce this constraint:
>>
>> (1) Find the smallest naturally aligned region that covers the requested
>> range. This is simple to compute and only takes one ATC_INV, but it
>> will spill on lots of neighbouring ATC entries.
>>
>> (2) Align the start address to the region size (rounded up to a power of
>> two), and send a second invalidation for the next range of the same
>> size. Still not great, but reduces spilling.
>>
>> (3) Cover the range exactly with the smallest number of naturally aligned
>> regions. This would be interesting to implement but as for (2),
>> requires multiple ATC_INV.
>>
>> As I suspect ATC invalidation packets will be a very scarce resource,
>> we'll go with option (1) for now, and only send one big invalidation.
>>
>> Note that with io-pgtable, the unmap function is called for each page, so
>> this doesn't matter. The problem shows up when sharing page tables with
>> the MMU.
> Suppose this is true, I'd like to choose option (2). Because the worst cases 
> of
> both (1) and (2) will not be happened, but the code of (2) will look clearer.
> And (2) is technically more acceptable.

I agree that (2) is a bit clearer, but the question is of performance
rather than readability. I'd like to see some benchmarks or experiment on
my own before switching to a two-invalidation system.

Intuitively one big invalidation will result in more ATC trashing and will
bring overall device performance down. But then according to the PCI spec,
ATC invalidations are grossly expensive, they have an upper bound of a
minute. I agree that this is highly improbable and might depend on the
range size, but purely from an architectural standpoint, reducing the
number of ATC invalidation requests is the priority, because this is much
worse than any performance slow-down incurred by ATC trashing. And for the
moment I can only base my decisions on the architecture.

So I'd like to keep (1) for now, and update it to (2) (or even (3)) once
we have more hardware to experiment with.

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V11 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error

2017-05-23 Thread Lorenzo Pieralisi
On Tue, May 23, 2017 at 02:31:17PM +0530, Sricharan R wrote:
> Hi Lorenzo,
> 
> On 5/23/2017 2:22 PM, Lorenzo Pieralisi wrote:
> > On Tue, May 23, 2017 at 02:26:10AM -0400, Nate Watterson wrote:
> >> Hi Sricharan,
> >>
> >> On 4/10/2017 7:21 AM, Sricharan R wrote:
> >>> This is an equivalent to the DT's handling of the iommu master's probe
> >>> with deferred probing when the corrsponding iommu is not probed yet.
> >>> The lack of a registered IOMMU can be caused by the lack of a driver for
> >>> the IOMMU, the IOMMU device probe not having been performed yet, having
> >>> been deferred, or having failed.
> >>>
> >>> The first case occurs when the firmware describes the bus master and
> >>> IOMMU topology correctly but no device driver exists for the IOMMU yet
> >>> or the device driver has not been compiled in. Return NULL, the caller
> >>> will configure the device without an IOMMU.
> >>>
> >>> The second and third cases are handled by deferring the probe of the bus
> >>> master device which will eventually get reprobed after the IOMMU.
> >>>
> >>> The last case is currently handled by deferring the probe of the bus
> >>> master device as well. A mechanism to either configure the bus master
> >>> device without an IOMMU or to fail the bus master device probe depending
> >>> on whether the IOMMU is optional or mandatory would be a good
> >>> enhancement.
> >>>
> >>> Tested-by: Hanjun Guo 
> >>> Reviewed-by: Robin Murphy 
> >>> [Lorenzo: Added fixes for dma_coherent_mask overflow, acpi_dma_configure
> >>>   called multiple times for same device]
> >>> Signed-off-by: Lorenzo Pieralisi 
> >>> Signed-off-by: Sricharan R 
> >>> ---
> >>>  drivers/acpi/arm64/iort.c  | 33 -
> >>>  drivers/acpi/scan.c| 11 ---
> >>>  drivers/base/dma-mapping.c |  2 +-
> >>>  include/acpi/acpi_bus.h|  2 +-
> >>>  include/linux/acpi.h   |  7 +--
> >>>  5 files changed, 47 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> >>> index 3dd9ec3..e323ece 100644
> >>> --- a/drivers/acpi/arm64/iort.c
> >>> +++ b/drivers/acpi/arm64/iort.c
> >>> @@ -543,6 +543,14 @@ static const struct iommu_ops 
> >>> *iort_iommu_xlate(struct device *dev,
> >>>   const struct iommu_ops *ops = NULL;
> >>>   int ret = -ENODEV;
> >>>   struct fwnode_handle *iort_fwnode;
> >>> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> >>> +
> >>> + /*
> >>> +  * If we already translated the fwspec there
> >>> +  * is nothing left to do, return the iommu_ops.
> >>> +  */
> >>> + if (fwspec && fwspec->ops)
> >>> + return fwspec->ops;
> >>
> >> Is this logic strictly required? It breaks masters with multiple SIDs
> >> as only the first SID is actually added to the master's fwspec.
> > 
> > My bad, that's indeed a silly bug I introduced. Please let me know if the
> > patch below fixes it, we will send it upstream shortly.
> > 
> 
> oops, i think emails crossed. Please let me know if you are ok to add
> this to the other fixes.

No worries, yes I am ok thanks but please give Nate some time to report
back to make sure the diff I sent actually fixes the problem.

Apologies for the breakage.

Lorenzo

> 
> Regards,
>  Sricharan
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V11 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error

2017-05-23 Thread Sricharan R
Hi Lorenzo,

On 5/23/2017 2:22 PM, Lorenzo Pieralisi wrote:
> On Tue, May 23, 2017 at 02:26:10AM -0400, Nate Watterson wrote:
>> Hi Sricharan,
>>
>> On 4/10/2017 7:21 AM, Sricharan R wrote:
>>> This is an equivalent to the DT's handling of the iommu master's probe
>>> with deferred probing when the corrsponding iommu is not probed yet.
>>> The lack of a registered IOMMU can be caused by the lack of a driver for
>>> the IOMMU, the IOMMU device probe not having been performed yet, having
>>> been deferred, or having failed.
>>>
>>> The first case occurs when the firmware describes the bus master and
>>> IOMMU topology correctly but no device driver exists for the IOMMU yet
>>> or the device driver has not been compiled in. Return NULL, the caller
>>> will configure the device without an IOMMU.
>>>
>>> The second and third cases are handled by deferring the probe of the bus
>>> master device which will eventually get reprobed after the IOMMU.
>>>
>>> The last case is currently handled by deferring the probe of the bus
>>> master device as well. A mechanism to either configure the bus master
>>> device without an IOMMU or to fail the bus master device probe depending
>>> on whether the IOMMU is optional or mandatory would be a good
>>> enhancement.
>>>
>>> Tested-by: Hanjun Guo 
>>> Reviewed-by: Robin Murphy 
>>> [Lorenzo: Added fixes for dma_coherent_mask overflow, acpi_dma_configure
>>>   called multiple times for same device]
>>> Signed-off-by: Lorenzo Pieralisi 
>>> Signed-off-by: Sricharan R 
>>> ---
>>>  drivers/acpi/arm64/iort.c  | 33 -
>>>  drivers/acpi/scan.c| 11 ---
>>>  drivers/base/dma-mapping.c |  2 +-
>>>  include/acpi/acpi_bus.h|  2 +-
>>>  include/linux/acpi.h   |  7 +--
>>>  5 files changed, 47 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
>>> index 3dd9ec3..e323ece 100644
>>> --- a/drivers/acpi/arm64/iort.c
>>> +++ b/drivers/acpi/arm64/iort.c
>>> @@ -543,6 +543,14 @@ static const struct iommu_ops *iort_iommu_xlate(struct 
>>> device *dev,
>>> const struct iommu_ops *ops = NULL;
>>> int ret = -ENODEV;
>>> struct fwnode_handle *iort_fwnode;
>>> +   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
>>> +
>>> +   /*
>>> +* If we already translated the fwspec there
>>> +* is nothing left to do, return the iommu_ops.
>>> +*/
>>> +   if (fwspec && fwspec->ops)
>>> +   return fwspec->ops;
>>
>> Is this logic strictly required? It breaks masters with multiple SIDs
>> as only the first SID is actually added to the master's fwspec.
> 
> My bad, that's indeed a silly bug I introduced. Please let me know if the
> patch below fixes it, we will send it upstream shortly.
> 

oops, i think emails crossed. Please let me know if you are ok to add this
to the other fixes.

Regards,
 Sricharan

> Lorenzo
> 
> -- >8 --
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index c5fecf9..e326f2a 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -666,14 +666,6 @@ static const struct iommu_ops *iort_iommu_xlate(struct 
> device *dev,
>   int ret = -ENODEV;
>   struct fwnode_handle *iort_fwnode;
>  
> - /*
> -  * If we already translated the fwspec there
> -  * is nothing left to do, return the iommu_ops.
> -  */
> - ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
> - if (ops)
> - return ops;
> -
>   if (node) {
>   iort_fwnode = iort_get_fwnode(node);
>   if (!iort_fwnode)
> @@ -735,6 +727,14 @@ const struct iommu_ops *iort_iommu_configure(struct 
> device *dev)
>   u32 streamid = 0;
>   int err;
>  
> + /*
> +  * If we already translated the fwspec there
> +  * is nothing left to do, return the iommu_ops.
> +  */
> + ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
> + if (ops)
> + return ops;
> +
>   if (dev_is_pci(dev)) {
>   struct pci_bus *bus = to_pci_dev(dev)->bus;
>   u32 rid;
> 
> 

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V11 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error

2017-05-23 Thread Sricharan R
Hi,

On 5/23/2017 11:56 AM, Nate Watterson wrote:
> Hi Sricharan,
> 
> On 4/10/2017 7:21 AM, Sricharan R wrote:
>> This is an equivalent to the DT's handling of the iommu master's probe
>> with deferred probing when the corrsponding iommu is not probed yet.
>> The lack of a registered IOMMU can be caused by the lack of a driver for
>> the IOMMU, the IOMMU device probe not having been performed yet, having
>> been deferred, or having failed.
>>
>> The first case occurs when the firmware describes the bus master and
>> IOMMU topology correctly but no device driver exists for the IOMMU yet
>> or the device driver has not been compiled in. Return NULL, the caller
>> will configure the device without an IOMMU.
>>
>> The second and third cases are handled by deferring the probe of the bus
>> master device which will eventually get reprobed after the IOMMU.
>>
>> The last case is currently handled by deferring the probe of the bus
>> master device as well. A mechanism to either configure the bus master
>> device without an IOMMU or to fail the bus master device probe depending
>> on whether the IOMMU is optional or mandatory would be a good
>> enhancement.
>>
>> Tested-by: Hanjun Guo 
>> Reviewed-by: Robin Murphy 
>> [Lorenzo: Added fixes for dma_coherent_mask overflow, acpi_dma_configure
>>called multiple times for same device]
>> Signed-off-by: Lorenzo Pieralisi 
>> Signed-off-by: Sricharan R 
>> ---
>>   drivers/acpi/arm64/iort.c  | 33 -
>>   drivers/acpi/scan.c| 11 ---
>>   drivers/base/dma-mapping.c |  2 +-
>>   include/acpi/acpi_bus.h|  2 +-
>>   include/linux/acpi.h   |  7 +--
>>   5 files changed, 47 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
>> index 3dd9ec3..e323ece 100644
>> --- a/drivers/acpi/arm64/iort.c
>> +++ b/drivers/acpi/arm64/iort.c
>> @@ -543,6 +543,14 @@ static const struct iommu_ops *iort_iommu_xlate(struct 
>> device *dev,
>>   const struct iommu_ops *ops = NULL;
>>   int ret = -ENODEV;
>>   struct fwnode_handle *iort_fwnode;
>> +struct iommu_fwspec *fwspec = dev->iommu_fwspec;
>> +
>> +/*
>> + * If we already translated the fwspec there
>> + * is nothing left to do, return the iommu_ops.
>> + */
>> +if (fwspec && fwspec->ops)
>> +return fwspec->ops;
> 
> Is this logic strictly required? It breaks masters with multiple SIDs
> as only the first SID is actually added to the master's fwspec.
> 

The logic here is same as what is done in DT, in of_iommu_configure.
This is to pullback the iommu_ops that was set for the device from the
previous run. (when the device probe is deferred and called again).
Infact this piece of hunk was initially missing in ACPI, later added
since a bug was reported [1] on V9 and then this came in as a fix in V10.
But looks like this check should be in iort_iommu_configure.
Lorenzo, is that correct ?


[1] https://www.spinics.net/lists/arm-kernel/msg572069.html

If yes i will add this in the fixes that i have already posted.

Regards,
 Sricharan


>> if (node) {
>>   iort_fwnode = iort_get_fwnode(node);
>> @@ -550,8 +558,17 @@ static const struct iommu_ops *iort_iommu_xlate(struct 
>> device *dev,
>>   return NULL;
>> ops = iommu_ops_from_fwnode(iort_fwnode);
>> +/*
>> + * If the ops look-up fails, this means that either
>> + * the SMMU drivers have not been probed yet or that
>> + * the SMMU drivers are not built in the kernel;
>> + * Depending on whether the SMMU drivers are built-in
>> + * in the kernel or not, defer the IOMMU configuration
>> + * or just abort it.
>> + */
>>   if (!ops)
>> -return NULL;
>> +return iort_iommu_driver_enabled(node->type) ?
>> +   ERR_PTR(-EPROBE_DEFER) : NULL;
>> ret = arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
>>   }
>> @@ -625,12 +642,26 @@ const struct iommu_ops *iort_iommu_configure(struct 
>> device *dev)
>> while (parent) {
>>   ops = iort_iommu_xlate(dev, parent, streamid);
>> +if (IS_ERR_OR_NULL(ops))
>> +return ops;
>> parent = iort_node_get_id(node, &streamid,
>> IORT_IOMMU_TYPE, i++);
>>   }
>>   }
>>   +/*
>> + * If we have reason to believe the IOMMU driver missed the initial
>> + * add_device callback for dev, replay it to get things in order.
>> + */
>> +if (!IS_ERR_OR_NULL(ops) && ops->add_device &&
>> +dev->bus && !dev->iommu_group) {
>> +int err = ops->add_device(dev);
>> +
>> +if (err)
>> +ops = ERR_PTR(err);
>> +}
>> +
>>   return ops;
>>   }
>>   diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>> index 1926918..2a513cc 100644
>> --- a/drivers/acpi/scan.c
>> +++ b/drivers/acpi/scan.c
>> @@ 

Re: [PATCH V11 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error

2017-05-23 Thread Lorenzo Pieralisi
On Tue, May 23, 2017 at 02:26:10AM -0400, Nate Watterson wrote:
> Hi Sricharan,
> 
> On 4/10/2017 7:21 AM, Sricharan R wrote:
> >This is an equivalent to the DT's handling of the iommu master's probe
> >with deferred probing when the corrsponding iommu is not probed yet.
> >The lack of a registered IOMMU can be caused by the lack of a driver for
> >the IOMMU, the IOMMU device probe not having been performed yet, having
> >been deferred, or having failed.
> >
> >The first case occurs when the firmware describes the bus master and
> >IOMMU topology correctly but no device driver exists for the IOMMU yet
> >or the device driver has not been compiled in. Return NULL, the caller
> >will configure the device without an IOMMU.
> >
> >The second and third cases are handled by deferring the probe of the bus
> >master device which will eventually get reprobed after the IOMMU.
> >
> >The last case is currently handled by deferring the probe of the bus
> >master device as well. A mechanism to either configure the bus master
> >device without an IOMMU or to fail the bus master device probe depending
> >on whether the IOMMU is optional or mandatory would be a good
> >enhancement.
> >
> >Tested-by: Hanjun Guo 
> >Reviewed-by: Robin Murphy 
> >[Lorenzo: Added fixes for dma_coherent_mask overflow, acpi_dma_configure
> >   called multiple times for same device]
> >Signed-off-by: Lorenzo Pieralisi 
> >Signed-off-by: Sricharan R 
> >---
> >  drivers/acpi/arm64/iort.c  | 33 -
> >  drivers/acpi/scan.c| 11 ---
> >  drivers/base/dma-mapping.c |  2 +-
> >  include/acpi/acpi_bus.h|  2 +-
> >  include/linux/acpi.h   |  7 +--
> >  5 files changed, 47 insertions(+), 8 deletions(-)
> >
> >diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> >index 3dd9ec3..e323ece 100644
> >--- a/drivers/acpi/arm64/iort.c
> >+++ b/drivers/acpi/arm64/iort.c
> >@@ -543,6 +543,14 @@ static const struct iommu_ops *iort_iommu_xlate(struct 
> >device *dev,
> > const struct iommu_ops *ops = NULL;
> > int ret = -ENODEV;
> > struct fwnode_handle *iort_fwnode;
> >+struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> >+
> >+/*
> >+ * If we already translated the fwspec there
> >+ * is nothing left to do, return the iommu_ops.
> >+ */
> >+if (fwspec && fwspec->ops)
> >+return fwspec->ops;
> 
> Is this logic strictly required? It breaks masters with multiple SIDs
> as only the first SID is actually added to the master's fwspec.

My bad, that's indeed a silly bug I introduced. Please let me know if the
patch below fixes it, we will send it upstream shortly.

Lorenzo

-- >8 --
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index c5fecf9..e326f2a 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -666,14 +666,6 @@ static const struct iommu_ops *iort_iommu_xlate(struct 
device *dev,
int ret = -ENODEV;
struct fwnode_handle *iort_fwnode;
 
-   /*
-* If we already translated the fwspec there
-* is nothing left to do, return the iommu_ops.
-*/
-   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
-   if (ops)
-   return ops;
-
if (node) {
iort_fwnode = iort_get_fwnode(node);
if (!iort_fwnode)
@@ -735,6 +727,14 @@ const struct iommu_ops *iort_iommu_configure(struct device 
*dev)
u32 streamid = 0;
int err;
 
+   /*
+* If we already translated the fwspec there
+* is nothing left to do, return the iommu_ops.
+*/
+   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
+   if (ops)
+   return ops;
+
if (dev_is_pci(dev)) {
struct pci_bus *bus = to_pci_dev(dev)->bus;
u32 rid;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 04/30] iommu/arm-smmu-v3: Add support for PCI ATS

2017-05-23 Thread Leizhen (ThunderTown)


On 2017/2/28 3:54, Jean-Philippe Brucker wrote:
> PCIe devices can implement their own TLB, named Address Translation Cache
> (ATC). Steps involved in the use and maintenance of such caches are:
> 
> * Device sends an Address Translation Request for a given IOVA to the
>   IOMMU. If the translation succeeds, the IOMMU returns the corresponding
>   physical address, which is stored in the device's ATC.
> 
> * Device can then use the physical address directly in a transaction.
>   A PCIe device does so by setting the TLP AT field to 0b10 - translated.
>   The SMMU might check that the device is allowed to send translated
>   transactions, and let it pass through.
> 
> * When an address is unmapped, CPU sends a CMD_ATC_INV command to the
>   SMMU, that is relayed to the device.
> 
> In theory, this doesn't require a lot of software intervention. The IOMMU
> driver needs to enable ATS when adding a PCI device, and send an
> invalidation request when unmapping. Note that this invalidation is
> allowed to take up to a minute, according to the PCIe spec. In
> addition, the invalidation queue on the ATC side is fairly small, 32 by
> default, so we cannot keep many invalidations in flight (see ATS spec
> section 3.5, Invalidate Flow Control).
> 
> Handling these constraints properly would require to postpone
> invalidations, and keep the stale mappings until we're certain that all
> devices forgot about them. This requires major work in the page table
> managers, and is therefore not done by this patch.
> 
>   Range calculation
>   -
> 
> The invalidation packet itself is a bit awkward: range must be naturally
> aligned, which means that the start address is a multiple of the range
> size. In addition, the size must be a power of two number of 4k pages. We
> have a few options to enforce this constraint:
> 
> (1) Find the smallest naturally aligned region that covers the requested
> range. This is simple to compute and only takes one ATC_INV, but it
> will spill on lots of neighbouring ATC entries.
> 
> (2) Align the start address to the region size (rounded up to a power of
> two), and send a second invalidation for the next range of the same
> size. Still not great, but reduces spilling.
> 
> (3) Cover the range exactly with the smallest number of naturally aligned
> regions. This would be interesting to implement but as for (2),
> requires multiple ATC_INV.
> 
> As I suspect ATC invalidation packets will be a very scarce resource,
> we'll go with option (1) for now, and only send one big invalidation.
> 
> Note that with io-pgtable, the unmap function is called for each page, so
> this doesn't matter. The problem shows up when sharing page tables with
> the MMU.
Suppose this is true, I'd like to choose option (2). Because the worst cases of
both (1) and (2) will not be happened, but the code of (2) will look clearer.
And (2) is technically more acceptable.

> 
>   Locking
>   ---
> 
> The atc_invalidate function is called from arm_smmu_unmap, with pgtbl_lock
> held (hardirq-safe). When sharing page tables with the MMU, we will have a
> few more call sites:
> 
> * When unbinding an address space from a device, to invalidate the whole
>   address space.
> * When a task bound to a device does an mlock, munmap, etc. This comes
>   from an MMU notifier, with mmap_sem and pte_lock held.
> 
> Given this, all locks take on the ATC invalidation path must be hardirq-
> safe.
> 
>   Timeout
>   ---
> 
> Some SMMU implementations will raise a CERROR_ATC_INV_SYNC when a CMD_SYNC
> fails because of an ATC invalidation. Some will just fail the CMD_SYNC.
> Others might let CMD_SYNC complete and have an asynchronous IMPDEF
> mechanism to record the error. When we receive a CERROR_ATC_INV_SYNC, we
> could retry sending all ATC_INV since last successful CMD_SYNC. When a
> CMD_SYNC fails without CERROR_ATC_INV_SYNC, we could retry sending *all*
> commands since last successful CMD_SYNC. This patch doesn't properly
> handle timeout, and ignores devices that don't behave. It might lead to
> memory corruption.
> 
>   Optional support
>   
> 
> For the moment, enable ATS whenever a device advertises it. Later, we
> might want to allow users to opt-in for the whole system or individual
> devices via sysfs or cmdline. Some firmware interfaces also provide a
> description of ATS capabilities in the root complex, and we might want to
> add a similar capability in DT. For instance, the following could be added
> to bindings/pci/pci-iommu.txt, as an optional property to PCI RC:
> 
> - ats-map: describe Address Translation Service support by the root
>   complex. This property is an arbitrary number of tuples of
>   (rid-base,length). Any RID in this interval is allowed to issue address
>   translation requests.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/arm-smmu-v3.c | 262 
> ++--
>  1 file changed, 250 insertions(

Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function

2017-05-23 Thread Liu, Yi L
On Fri, Apr 28, 2017 at 01:51:42PM +0100, Jean-Philippe Brucker wrote:
> On 28/04/17 10:04, Liu, Yi L wrote:
Hi Jean,

Sorry for the delay response. Still have some follow-up comments on
per-device or per-group. Pls refer to comments inline.

> > On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> >> Hi Yi, Jacob,
> >>
> >> On 26/04/17 11:11, Liu, Yi L wrote:
> >>> From: Jacob Pan 
> >>>
> >>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> >>> case in the guest:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >>>
> >>> As part of the proposed architecture, when a SVM capable PCI
> >>> device is assigned to a guest, nested mode is turned on. Guest owns the
> >>> first level page tables (request with PASID) and performs GVA->GPA
> >>> translation. Second level page tables are owned by the host for GPA->HPA
> >>> translation for both request with and without PASID.
> >>>
> >>> A new IOMMU driver interface is therefore needed to perform tasks as
> >>> follows:
> >>> * Enable nested translation and appropriate translation type
> >>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >>>
> >>> This patch introduces new functions called iommu_(un)bind_pasid_table()
> >>> to IOMMU APIs. Architecture specific IOMMU function can be added later
> >>> to perform the specific steps for binding pasid table of assigned devices.
> >>>
> >>> This patch also adds model definition in iommu.h. It would be used to
> >>> check if the bind request is from a compatible entity. e.g. a bind
> >>> request from an intel_iommu emulator may not be supported by an ARM SMMU
> >>> driver.
> >>>
> >>> Signed-off-by: Jacob Pan 
> >>> Signed-off-by: Liu, Yi L 
> >>> ---
> >>>  drivers/iommu/iommu.c | 19 +++
> >>>  include/linux/iommu.h | 31 +++
> >>>  2 files changed, 50 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>> index dbe7f65..f2da636 100644
> >>> --- a/drivers/iommu/iommu.c
> >>> +++ b/drivers/iommu/iommu.c
> >>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain 
> >>> *domain, struct device *dev)
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >>>  
> >>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device 
> >>> *dev,
> >>> + struct pasid_table_info *pasidt_binfo)
> >>
> >> I guess that domain can always be deduced from dev using
> >> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
> >>
> >> For the next version of my SVM series, I was thinking of passing group
> >> instead of device to iommu_bind. Since all devices in a group are expected
> >> to share the same mappings (whether they want it or not), users will have
> > 
> > Virtual address space is not tied to protection domain as I/O virtual 
> > address
> > space does. Is it really necessary to affect all the devices in this group.
> > Or it is just for consistence?
> 
> It's mostly about consistency, and also avoid hiding implicit behavior in
> the IOMMU driver. I have the following example, described using group and
> domain structures from the IOMMU API:
>  
> |IOMMU   |
> |  |DOM  __ ||
> |  ||GRP   ||| bind
> |  ||A<-Task 1
> |  ||B |||
> |  ||__|||
> |  | __ ||
> |  ||GRP   |||
> |  ||C |||
> |  ||__|||
> |  |||
> |    |
> |  |DOM  __ ||
> |  ||GRP   |||
> |  ||D |||
> |  ||__|||
> |  |||
> ||
> 
> Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due
> to some hardware limitation (in the bus, the device or the IOMMU), B can
> see all DMA transactions issued by A. A and B are therefore in the same
> IOMMU group. C and D can be isolated by the IOMMU, so they each have their
> own group.
> 
> (As far as I know, in the SVM world at the moment, devices are neatly
> integrated and there is no need for putting multiple devices in the same
> IOMMU group, but I don't think we should expect all future SVM systems to
> be well-behaved.)
>
> So when a user binds Task 1 to device A, it is *implicitly* giving device
> B access to Task 1 as well. Simply because the IOMMU is unable to isolate
> A from B, PASID or not. B could access the same address space as A, even
> if you don't call bind again to explicitly attach the PASID table to B.
> 
> If the bind is done with device as argument, maybe users will believe that
> using PASIDs provides an additional level o