Re: [PATCH V5 3/5] iommu/dma-iommu: Handle deferred devices

2019-08-16 Thread Hillf Danton


On Thu, 15 Aug 2019 12:09:41 +0100 Tom Murphy wrote:
> 
> Handle devices which defer their attach to the iommu in the dma-iommu api
> 
> Signed-off-by: Tom Murphy 
> ---
>  drivers/iommu/dma-iommu.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 2712fbc68b28..906b7fa14d3c 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/crash_dump.h>
>  
>  struct iommu_dma_msi_page {
>  	struct list_head	list;
> @@ -351,6 +352,21 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   return iova_reserve_iommu_regions(dev, domain);
>  }
>  
> +static int handle_deferred_device(struct device *dev,
> +		struct iommu_domain *domain)
> +{
> +	const struct iommu_ops *ops = domain->ops;
> +
> +	if (!is_kdump_kernel())
> +		return 0;
> +
> +	if (unlikely(ops->is_attach_deferred &&
> +		     ops->is_attach_deferred(domain, dev)))
> +		return iommu_attach_device(domain, dev);
> +
> +	return 0;
> +}
> +
>  /**
>   * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
>   *                    page flags.
> @@ -463,6 +479,9 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>  	size_t iova_off = iova_offset(iovad, phys);
>  	dma_addr_t iova;
>  
> +	if (unlikely(handle_deferred_device(dev, domain)))
> +		return DMA_MAPPING_ERROR;
> +
>  	size = iova_align(iovad, size + iova_off);
>  
>  	iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);

iommu_map_atomic() is applied to __iommu_dma_map() in 2/5, which means
this path may be entered from atomic context. Is the mutex_lock() in
iommu_attach_device() safe to take here?
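
For reference, a rough sketch of the call chain in question (the caller
name and the group-mutex detail are taken from mainline around this
series and simplified, so they may not match the patched tree exactly):

/*
 *   iommu_dma_map_page()               may run in atomic context
 *     __iommu_dma_map()
 *       handle_deferred_device()       added by this patch
 *         iommu_attach_device()
 *           mutex_lock(&group->mutex)  <-- may sleep
 *       ...
 *       iommu_map_atomic()             introduced in 2/5 because the
 *                                      caller may not be able to sleep
 */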



Re: DANGER WILL ROBINSON, DANGER

2019-08-16 Thread Jason Gunthorpe
On Thu, Aug 15, 2019 at 04:16:30PM -0400, Jerome Glisse wrote:
> On Thu, Aug 15, 2019 at 03:19:29PM -0400, Jerome Glisse wrote:
> > On Tue, Aug 13, 2019 at 02:01:35PM +0300, Adalbert Lazăr wrote:
> > > On Fri, 9 Aug 2019 09:24:44 -0700, Matthew Wilcox wrote:
> > > > On Fri, Aug 09, 2019 at 07:00:26PM +0300, Adalbert Lazăr wrote:
> > > > > +++ b/include/linux/page-flags.h
> > > > > @@ -417,8 +417,10 @@ PAGEFLAG(Idle, idle, PF_ANY)
> > > > >   */
> > > > >  #define PAGE_MAPPING_ANON	0x1
> > > > >  #define PAGE_MAPPING_MOVABLE	0x2
> > > > > +#define PAGE_MAPPING_REMOTE	0x4
> > > > 
> > > > Uh.  How do you know page->mapping would otherwise have bit 2 clear?
> > > > Who's guaranteeing that?
> > > > 
> > > > This is an awfully big patch to the memory management code, buried in
> > > > the middle of a gigantic series which almost guarantees nobody would
> > > > look at it.  I call shenanigans.
> > > > 
> > > > > @@ -1021,7 +1022,7 @@ void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma)
> > > > >   * __page_set_anon_rmap - set up new anonymous rmap
> > > > >   * @page:	Page or Hugepage to add to rmap
> > > > >   * @vma: VM area to add page to.
> > > > > - * @address: User virtual address of the mapping 
> > > > > + * @address: User virtual address of the mapping
> > > > 
> > > > And mixing in fluff changes like this is a real no-no.  Try again.
> > > > 
> > > 
> > > No bad intentions, just overzealous.
> > > I didn't want to hide anything from our patches.
> > > Once we advance with the introspection patches related to KVM, we'll
> > > be back with the remote mapping patch, split and cleaned up.
> > 
> > There are no bits left in struct page! Looking at the patch, it seems
> > you want to have your own pin count just for KVM. This is bad; we are
> > already trying to solve the GUP problem (see the various patchsets
> > about GUP posted recently).
> > 
> > You need to rethink how you want to achieve this. Why not simply do a
> > remote read()/write() into the process memory, i.e. KVMI would call
> > an ioctl that allows reading or writing a remote process's memory,
> > like ptrace() but on steroids ...
> > 
> > Adding this whole big, complex infrastructure without justifying why
> > we need to avoid the round trip is just too much, really.
> 
> Thinking a bit more about this, you can achieve the same thing without
> adding a single line to any mm code. Instead of an mmap with
> PROT_NONE | MAP_LOCKED, you have userspace mmap some kvm device file
> (I am assuming this is something you already have, so you can control
> the mmap callback).
> 
> So now, kernel side, you have a vma with a vm_operations_struct under
> your control. This means that everything you want to block mm-wise
> from within the inspector process can be blocked through those
> callbacks (find_special_page() specifically, for which you have to
> return NULL all the time).
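
A minimal sketch of the device-mmap idea described above, assuming a
kvm/kvmi character device; the kvmi_* names are invented here for
illustration, while find_special_page(), vm_operations_struct and the
file_operations mmap hook are existing kernel interfaces:

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/module.h>

/* Hide the remotely mapped pages from the rest of mm. */
static struct page *kvmi_find_special_page(struct vm_area_struct *vma,
					   unsigned long addr)
{
	return NULL;
}

static const struct vm_operations_struct kvmi_vm_ops = {
	.find_special_page	= kvmi_find_special_page,
	/* .fault, .close, etc. also stay under the driver's control */
};

/* mmap handler of the hypothetical kvmi device file. */
static int kvmi_dev_mmap(struct file *file, struct vm_area_struct *vma)
{
	vma->vm_ops = &kvmi_vm_ops;
	/* Don't let the mapping grow or end up in core dumps. */
	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
	return 0;
}

static const struct file_operations kvmi_dev_fops = {
	.owner	= THIS_MODULE,
	.mmap	= kvmi_dev_mmap,
};

The inspector then mmap()s this device instead of an anonymous
PROT_NONE | MAP_LOCKED region, so whatever it can do with the mapping
is mediated by those callbacks.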

I'm actually aware of a couple of use cases that would like to
mirror the VA of one process into another. One big one in the HPC
world is the out-of-tree 'xpmem', still in wide use today. xpmem is
basically what Jerome described above.

If you do an approach like the one Jerome describes, it would be nice
if it was a general facility and not buried in kvm.

I know past xpmem adventures ran into trouble with locking etc. - i.e.
getting the mm_struct of the victim seemed a bit hard for some reason,
but maybe that could be done with an FD pass plus an 'ioctl(I AM THE
VICTIM)'?
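
Purely as an illustration of that kind of interface (the xmem_* names,
the ioctl split and the struct layout are all invented here;
access_remote_vm() and the mmgrab()/mmget_not_zero()/mmput() helpers are
what ptrace-style access already uses), with permission checks left out:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct xmem_ctx {			/* per-open-file state */
	struct mm_struct *mm;		/* the victim's mm, once registered */
};

struct xmem_read {
	__u64 remote_addr;		/* address in the victim */
	__u64 len;
	__u64 buf;			/* buffer in the inspector */
};

/* Victim side: "ioctl(I AM THE VICTIM)" -- remember current->mm. */
static long xmem_register_victim(struct xmem_ctx *ctx)
{
	ctx->mm = current->mm;
	mmgrab(ctx->mm);		/* mmdrop() on release, not shown */
	return 0;
}

/* Inspector side: read the victim's memory, ptrace_access_vm() style. */
static long xmem_read_victim(struct xmem_ctx *ctx, struct xmem_read *req)
{
	void *tmp;
	long ret;

	if (!req->len || req->len > PAGE_SIZE)
		return -EINVAL;		/* keep the bounce buffer small */

	if (!ctx->mm || !mmget_not_zero(ctx->mm))
		return -ESRCH;		/* not registered, or victim exited */

	tmp = kmalloc(req->len, GFP_KERNEL);
	if (!tmp) {
		mmput(ctx->mm);
		return -ENOMEM;
	}

	ret = access_remote_vm(ctx->mm, req->remote_addr, tmp, req->len,
			       FOLL_FORCE);
	if (ret > 0 && copy_to_user(u64_to_user_ptr(req->buf), tmp, ret))
		ret = -EFAULT;

	kfree(tmp);
	mmput(ctx->mm);
	return ret;
}

The locking question above is then reduced to the lifetime of ctx->mm;
deciding who may register as the victim would still need
mm_access()-style permission checks, which are omitted in this sketch.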

Jason