On 04/07/2016 04:38 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On Thu, Apr 07, 2016 at 11:33:42AM +0200, Eric Auger wrote:
>> Alex,
>> On 04/07/2016 01:12 AM, Alex Williamson wrote:
>>> On Mon,  4 Apr 2016 08:07:01 +0000
>>> Eric Auger <eric.au...@linaro.org> wrote:
>>>
>>>> This patch introduces iommu_get/put_single_reserved.
>>>>
>>>> iommu_get_single_reserved allows to allocate a new reserved iova page
>>>> and map it onto the physical page that contains a given physical address.
>>>> Page size is the IOMMU page one. It is the responsability of the
>>>> system integrator to make sure the in use IOMMU page size corresponds
>>>> to the granularity of the MSI frame.
>>>>
>>>> It returns the iova that is mapped onto the provided physical address.
>>>> Hence the physical address passed in argument does not need to be aligned.
>>>>
>>>> In case a mapping already exists between both pages, the IOVA mapped
>>>> to the PA is directly returned.
>>>>
>>>> Each time an iova is successfully returned a binding ref count is
>>>> incremented.
>>>>
>>>> iommu_put_single_reserved decrements the ref count and when this latter
>>>> is null, the mapping is destroyed and the iova is released.
>>>>
>>>> Signed-off-by: Eric Auger <eric.au...@linaro.org>
>>>> Signed-off-by: Ankit Jindal <ajin...@apm.com>
>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavku...@linaro.org>
>>>> Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
>>>>
>>>> ---
>>>>
>>>> v5 -> v6:
>>>> - revisit locking with spin_lock instead of mutex
>>>> - do not kref_get on 1st get
>>>> - add size parameter to the get function following Marc's request
>>>> - use the iova domain shift instead of using the smallest supported page 
>>>> size
>>>>
>>>> v3 -> v4:
>>>> - formerly in iommu: iommu_get/put_single_reserved &
>>>>   iommu/arm-smmu: implement iommu_get/put_single_reserved
>>>> - Attempted to address Marc's doubts about missing size/alignment
>>>>   at VFIO level (user-space knows the IOMMU page size and the number
>>>>   of IOVA pages to provision)
>>>>
>>>> v2 -> v3:
>>>> - remove static implementation of iommu_get_single_reserved &
>>>>   iommu_put_single_reserved when CONFIG_IOMMU_API is not set
>>>>
>>>> v1 -> v2:
>>>> - previously a VFIO API, named vfio_alloc_map/unmap_free_reserved_iova
>>>> ---
>>>>  drivers/iommu/dma-reserved-iommu.c | 146 
>>>> +++++++++++++++++++++++++++++++++++++
>>>>  include/linux/dma-reserved-iommu.h |  28 +++++++
>>>>  2 files changed, 174 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/dma-reserved-iommu.c 
>>>> b/drivers/iommu/dma-reserved-iommu.c
>>>> index f592118..3c759d9 100644
>>>> --- a/drivers/iommu/dma-reserved-iommu.c
>>>> +++ b/drivers/iommu/dma-reserved-iommu.c
>>>> @@ -136,3 +136,149 @@ void iommu_free_reserved_iova_domain(struct 
>>>> iommu_domain *domain)
>>>>    spin_unlock_irqrestore(&domain->reserved_lock, flags);
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(iommu_free_reserved_iova_domain);
>>>> +
>>>> +static void delete_reserved_binding(struct iommu_domain *domain,
>>>> +                              struct iommu_reserved_binding *b)
>>>> +{
>>>> +  struct iova_domain *iovad =
>>>> +          (struct iova_domain *)domain->reserved_iova_cookie;
>>>> +  unsigned long order = iova_shift(iovad);
>>>> +
>>>> +  iommu_unmap(domain, b->iova, b->size);
>>>> +  free_iova(iovad, b->iova >> order);
>>>> +  kfree(b);
>>>> +}
>>>> +
>>>> +int iommu_get_reserved_iova(struct iommu_domain *domain,
>>>> +                        phys_addr_t addr, size_t size, int prot,
>>>> +                        dma_addr_t *iova)
>>>> +{
>>>> +  struct iova_domain *iovad =
>>>> +          (struct iova_domain *)domain->reserved_iova_cookie;
>>>> +  unsigned long order = iova_shift(iovad);
> 
> Another nit: this call should be after the !iovad check belo

Yes definitively
> 
>>>> +  unsigned long  base_pfn, end_pfn, nb_iommu_pages;
>>>> +  size_t iommu_page_size = 1 << order, binding_size;
>>>> +  phys_addr_t aligned_base, offset;
>>>> +  struct iommu_reserved_binding *b, *newb;
>>>> +  unsigned long flags;
>>>> +  struct iova *p_iova;
>>>> +  bool unmap = false;
>>>> +  int ret;
>>>> +
>>>> +  base_pfn = addr >> order;
>>>> +  end_pfn = (addr + size - 1) >> order;
>>>> +  nb_iommu_pages = end_pfn - base_pfn + 1;
>>>> +  aligned_base = base_pfn << order;
>>>> +  offset = addr - aligned_base;
>>>> +  binding_size = nb_iommu_pages * iommu_page_size;
>>>> +
>>>> +  if (!iovad)
>>>> +          return -EINVAL;
>>>> +
>>>> +  spin_lock_irqsave(&domain->reserved_lock, flags);
>>>> +
>>>> +  b = find_reserved_binding(domain, aligned_base, binding_size);
>>>> +  if (b) {
>>>> +          *iova = b->iova + offset;
>>>> +          kref_get(&b->kref);
>>>> +          ret = 0;
>>>> +          goto unlock;
>>>> +  }
>>>> +
>>>> +  spin_unlock_irqrestore(&domain->reserved_lock, flags);
>>>> +
>>>> +  /*
>>>> +   * no reserved IOVA was found for this PA, start allocating and
>>>> +   * registering one while the spin-lock is not held. iommu_map/unmap
>>>> +   * are not supposed to be atomic
>>>> +   */
>>>> +
>>>> +  p_iova = alloc_iova(iovad, nb_iommu_pages, iovad->dma_32bit_pfn, true);
>>>> +  if (!p_iova)
>>>> +          return -ENOMEM;
>>>
>>> Here we're using iovad, which was the reserved_iova_cookie outside of
>>> the locking, which makes the locking ineffective.  Isn't this lock also
>>> protecting our iova domain, I'm confused.
>> I think I was too when I wrote that :-(
>>>
>>>> +
>>>> +  *iova = iova_dma_addr(iovad, p_iova);
>>>> +
>>>> +  newb = kzalloc(sizeof(*b), GFP_KERNEL);
>> needs to to be GPF_ATOMIC as Jean-Philippe stated before.
>>>> +  if (!newb) {
>>>> +          free_iova(iovad, p_iova->pfn_lo);
>>>> +          return -ENOMEM;
>>>> +  }
>>>> +
>>>> +  ret = iommu_map(domain, *iova, aligned_base, binding_size, prot);
>> one problem I have is I would need iommu_map to be atomic (because of
>> the call sequence reported by Jean-Philippe) and I have no guarantee it
>> is in general I think . It is for arm-smmu(-v3).c which covers the ARM
>> use case. But what about other smmus potentially used in that process?
> 
> Hmm, indeed. How about we move all the heavy mapping work to
> msi_domain_prepare_irqs instead? It is allowed to sleep and, more
> importantly, fail. It is called much earlier, by pci_enable_msi_range.

Indeed this could be an option for setup.

However a substitute to msi_domain_set_affinity should also be found I
think, to handle a change in affinity (which can change the doorbell):

We have this path and irq_migrate_all_off_this_cpu takes the irq_desc
raw_spin_lock.

cpuhotplug.c:irq_migrate_all_off_this_cpu
cpuhotplug.c:migrate_one_irq
irq_do_set_affinity
chip->irq_set_affinity
msi_domain_set_affinity
../..
iommu_map/unmap

> 
> All we are missing is details about doorbells, which we could retrieve
> from the MSI controller's driver, using one or more additional callbacks
> in msi_domain_ops. Currently, we already need one such callbacks for
> querying the number of doorbell pages,
Yes currently I assume a single page per MSI controller which is
arbitrary. I can add such callback in my next version.

Thank you for your time

Eric
 maybe we could also ask the
> driver to provide their addresses? And in msi_domain_activate, simply
> search for the IOVA already associated with the MSI message?
> 
> I only briefly though about it from the host point of view, not sure how
> VFIO would cope with this method.
> 
> Thanks,
> Jean-Philippe
> 

Reply via email to