Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-09-05 Thread Michael S. Tsirkin
On Mon, Aug 12, 2019 at 02:15:32PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 11, 2019 at 04:55:27AM -0400, Michael S. Tsirkin wrote:
> > On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> > > So we need a flag on the virtio device, exposed by the
> > > hypervisor (or hardware for hw virtio devices) that says:  hey, I'm real,
> > > don't take a shortcut.
> > 
> > The point here is that it's actually still not real. So we would still
> > use a physical address. However Linux decides that it wants extra
> > security by moving all data through the bounce buffer.  The distinction
> > made is that one can actually give device a physical address of the
> > bounce buffer.
> 
> Sure.  The problem is just that you keep piling hacks on top of hacks.
> We need the per-device flag anyway to properly support hardware virtio
> device in all circumstances.  Instead of coming up with another ad-hoc
> hack to force DMA use, implement that one proper bit and reuse it here.

The flag that you mention literally means "I am a real device" so for
example, you can use VFIO with it. And this device isn't a real one,
and you can't use VFIO with it, even though it's part of a POWER
system, which always has an IOMMU.



Or here's another way to put it: we have a broken device that can only
access physical addresses, not DMA addresses. But to enable SEV, Linux
requires the DMA API.  So we can still make it work if the DMA address
happens to be a physical address (not necessarily of the same page).


This is where dma_addr_is_a_phys_addr() comes in: it tells us that this
weird configuration can still work.  What are we going to do for SEV if
dma_addr_is_a_phys_addr() does not apply? Fail probe, I guess.


So the proposal is really about making things safe; to this end,
add this in probe:

	if (sev_active() &&
	    !dma_addr_is_a_phys_addr(dev) &&
	    !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		return -EINVAL;


the point being to prevent loading the driver where it would
corrupt guest memory. Put this way, any objections to adding
dma_addr_is_a_phys_addr() to the DMA API?





-- 
MST


Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Dongchun Zhu
On Fri, 2019-09-06 at 06:58 +0800, Nicolas Boichat wrote:
> On Fri, Sep 6, 2019 at 12:05 AM Sakari Ailus
>  wrote:
> >
> > On Thu, Sep 05, 2019 at 07:53:37PM +0900, Tomasz Figa wrote:
> > > On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
> > >  wrote:
> > > >
> > > > Hi Dongchun,
> > > >
> > > > On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
> > > >
> > > > ...
> > > >
> > > > > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, 
> > > > > > > ov02a10->supplies);
> > > > > > > + if (ret < 0) {
> > > > > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > > > > + goto disable_clk;
> > > > > > > + }
> > > > > > > + msleep_range(7);
> > > > > >
> > > > > > This has some potential of clashing with more generic functions in 
> > > > > > the
> > > > > > future. Please use usleep_range directly, or msleep.
> > > > > >
> > > > >
> > > > > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > > > > https://patchwork.kernel.org/patch/10957225/
> > > >
> > > > Yes, please.
> > >
> > > Why not just msleep()?
> >
> > msleep() is usually less accurate. I'm not sure it makes a big difference
> > in this case. Perhaps it matters if someone wants the sensor to be powered
> > on and streaming as soon as possible.
> 
> https://elixir.bootlin.com/linux/latest/source/Documentation/timers/timers-howto.txt#L70
> 
> Use usleep_range for delays up to 20ms (at least that's what the
> documentation (still) says?)
> 

Thank you for your clarifications.
From the doc,
"msleep(1~20) may not do what the caller intends, and
will often sleep longer (~20 ms actual sleep for any
value given in the 1~20ms range). In many cases this
is not the desired behavior."

So usleep_range should be used for shorter sleeps,
such as 5 ms.
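
Concretely, the delay in the patch then becomes:

	/* 7-8 ms power-on delay, same bounds as in v1 of the patch */
	usleep_range(7 * 1000, 8 * 1000);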

> > --
> > Sakari Ailus
> > sakari.ai...@linux.intel.com




Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Nicolas Boichat
On Fri, Sep 6, 2019 at 12:05 AM Sakari Ailus
 wrote:
>
> On Thu, Sep 05, 2019 at 07:53:37PM +0900, Tomasz Figa wrote:
> > On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
> >  wrote:
> > >
> > > Hi Dongchun,
> > >
> > > On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
> > >
> > > ...
> > >
> > > > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, 
> > > > > > ov02a10->supplies);
> > > > > > + if (ret < 0) {
> > > > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > > > + goto disable_clk;
> > > > > > + }
> > > > > > + msleep_range(7);
> > > > >
> > > > > This has some potential of clashing with more generic functions in the
> > > > > future. Please use usleep_range directly, or msleep.
> > > > >
> > > >
> > > > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > > > https://patchwork.kernel.org/patch/10957225/
> > >
> > > Yes, please.
> >
> > Why not just msleep()?
>
> msleep() is usually less accurate. I'm not sure it makes a big difference
> in this case. Perhaps it matters if someone wants the sensor to be powered
> on and streaming as soon as possible.

https://elixir.bootlin.com/linux/latest/source/Documentation/timers/timers-howto.txt#L70

Use usleep_range for delays up to 20ms (at least that's what the
documentation (still) says?)

> --
> Sakari Ailus
> sakari.ai...@linux.intel.com


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-09-05 Thread David Rientjes via iommu
On Thu, 5 Sep 2019, Christoph Hellwig wrote:

> > Hi Christoph, Jens, and Ming,
> > 
> > While booting a 5.2 SEV-enabled guest we have encountered the following 
> > WARNING that is followed up by a BUG because we are in atomic context 
> > while trying to call set_memory_decrypted:
> 
> Well, this really is a x86 / DMA API issue unfortunately.  Drivers
> are allowed to do GFP_ATOMIC dma allocation under locks / rcu critical
> sections and from interrupts.  And it seems like the SEV case can't
> handle that.  We have some semi-generic code to have a fixed sized
> pool in kernel/dma for non-coherent platforms that have similar issues
> that we could try to wire up, but I wonder if there is a better way
> to handle the issue, so I've added Tom and the x86 maintainers.
> 
> Now independent of that issue using DMA coherent memory for the nvme
> PRPs/SGLs doesn't actually feel very optimal.  We could do with
> normal kmalloc allocations and just sync it to the device and back.
> I wonder if we should create some general mempool-like helpers for that.
> 

Thanks for looking into this.  I assume it's a non-starter to try to 
address this in _vm_unmap_aliases() itself, i.e. rely on a purge spinlock 
to do all synchronization (or trylock if not forced) for 
purge_vmap_area_lazy() rather than only the vmap_area_lock within it.  In 
other words, no mutex.

If that's the case, and set_memory_encrypted() can't be fixed to not need 
to sleep by changing _vm_unmap_aliases() locking, then I assume dmapool is 
our only alternative?  I have no idea how large this should be.
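
If dmapool (or something shaped like it) is the answer, the idea would
presumably be to reserve coherent memory once at probe time, where
sleeping is fine, and carve GFP_ATOMIC requests out of that reservation
later.  A sketch under those assumptions -- the pool size is a
placeholder and does not answer the sizing question:

	#include <linux/dma-mapping.h>
	#include <linux/genalloc.h>
	#include <linux/sizes.h>

	static struct gen_pool *atomic_dma_pool;

	static int atomic_dma_pool_init(struct device *dev)
	{
		dma_addr_t dma_base;
		void *vaddr;

		atomic_dma_pool = gen_pool_create(PAGE_SHIFT, -1);
		if (!atomic_dma_pool)
			return -ENOMEM;

		/* sleepable context: set_memory_decrypted() is safe here */
		vaddr = dma_alloc_coherent(dev, SZ_256K, &dma_base, GFP_KERNEL);
		if (!vaddr)
			return -ENOMEM;		/* pool teardown omitted */

		return gen_pool_add_virt(atomic_dma_pool, (unsigned long)vaddr,
					 dma_base, SZ_256K, -1);
	}

	/* atomic contexts then use gen_pool_alloc(atomic_dma_pool, size) */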


Re: [RFC PATCH] iommu/amd: fix a race in increase_address_space()

2019-09-05 Thread Qian Cai
On Thu, 2019-09-05 at 13:43 +0200, Joerg Roedel wrote:
> Hi Qian,
> 
> On Wed, Sep 04, 2019 at 05:24:22PM -0400, Qian Cai wrote:
> > if (domain->mode == PAGE_MODE_6_LEVEL)
> > /* address space already 64 bit large */
> > return false;
> > 
> > This gives a clue that there must be a race between multiple concurrent
> > threads in increase_address_space().
> 
> Thanks for tracking this down, there is a race indeed.
> 
> > +   mutex_lock(&domain->api_lock);
> > *dma_addr = __map_single(dev, dma_dom, page_to_phys(page),
> >  size, DMA_BIDIRECTIONAL, dma_mask);
> > +   mutex_unlock(&domain->api_lock);
> >  
> > if (*dma_addr == DMA_MAPPING_ERROR)
> > goto out_free;
> > @@ -2696,7 +2698,9 @@ static void free_coherent(struct device *dev, size_t 
> > size,
> >  
> > dma_dom = to_dma_ops_domain(domain);
> >  
> > +   mutex_lock(&domain->api_lock);
> > __unmap_single(dma_dom, dma_addr, size, DMA_BIDIRECTIONAL);
> > +   mutex_unlock(&domain->api_lock);
> 
> But I think the right fix is to lock the operation in
> increase_address_space() directly, and not the calls around it, like in
> the diff below. It is untested, so can you please try it and report back
> if it fixes your issue?

Yes, it works great so far.

> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index b607a92791d3..1ff705f16239 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1424,18 +1424,21 @@ static void free_pagetable(struct protection_domain 
> *domain)
>   * another level increases the size of the address space by 9 bits to a size 
> up
>   * to 64 bits.
>   */
> -static bool increase_address_space(struct protection_domain *domain,
> +static void increase_address_space(struct protection_domain *domain,
>  gfp_t gfp)
>  {
> + unsigned long flags;
>   u64 *pte;
>  
> - if (domain->mode == PAGE_MODE_6_LEVEL)
> + spin_lock_irqsave(&domain->lock, flags);
> +
> + if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
>   /* address space already 64 bit large */
> - return false;
> + goto out;
>  
>   pte = (void *)get_zeroed_page(gfp);
>   if (!pte)
> - return false;
> + goto out;
>  
>   *pte = PM_LEVEL_PDE(domain->mode,
>   iommu_virt_to_phys(domain->pt_root));
> @@ -1443,7 +1446,10 @@ static bool increase_address_space(struct 
> protection_domain *domain,
>   domain->mode += 1;
>   domain->updated  = true;
>  
> - return true;
> +out:
> + spin_unlock_irqrestore(&domain->lock, flags);
> +
> + return;
>  }
>  
>  static u64 *alloc_pte(struct protection_domain *domain,
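
To spell out the interleaving the spinlock closes (a sketch from the
code above, not part of the patch):

	CPU0					CPU1
	mode == 5, check passes			mode == 5, check passes
	*pte0 = PDE(5, old pt_root)
	pt_root = pte0; mode = 6
						*pte1 = PDE(5, old pt_root)
						pt_root = pte1; mode = 7

Both threads link their new top level to the same old root, so one
level drops out of the walk and domain->mode ends up past
PAGE_MODE_6_LEVEL.  Re-checking the mode under domain->lock makes the
check and the update atomic, which is why the remaining WARN_ON_ONCE
can only fire on a genuine bug.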


[PATCH] amd/iommu: flush old domains in kdump kernel

2019-09-05 Thread Stuart Hayes
When devices are attached to the amd_iommu in a kdump kernel, the old device
table entries (DTEs), which were copied from the crashed kernel, will be
overwritten with a new domain number.  When the new DTE is written, the IOMMU
is told to flush the DTE from its internal cache--but it is not told to flush
the translation cache entries for the old domain number.

Without this patch, AMD systems using the tg3 network driver fail when kdump
tries to save the vmcore to a network system, showing network timeouts and
(sometimes) IOMMU errors in the kernel log.

This patch will flush IOMMU translation cache entries for the old domain when
a DTE gets overwritten with a new domain number.


Signed-off-by: Stuart Hayes 

---

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b607a92..f853b96 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1143,6 +1143,17 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu 
*iommu)
iommu_completion_wait(iommu);
 }
 
+static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
+{
+   struct iommu_cmd cmd;
+
+   build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
+ dom_id, 1);
+   iommu_queue_command(iommu, &cmd);
+
+   iommu_completion_wait(iommu);
+}
+
 static void amd_iommu_flush_all(struct amd_iommu *iommu)
 {
struct iommu_cmd cmd;
@@ -1873,6 +1884,7 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
 {
u64 pte_root = 0;
u64 flags = 0;
+   u32 old_domid;
 
if (domain->mode != PAGE_MODE_NONE)
pte_root = iommu_virt_to_phys(domain->pt_root);
@@ -1922,8 +1934,20 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
flags &= ~DEV_DOMID_MASK;
flags |= domain->id;
 
+   old_domid = amd_iommu_dev_table[devid].data[1] & DEV_DOMID_MASK;
amd_iommu_dev_table[devid].data[1]  = flags;
amd_iommu_dev_table[devid].data[0]  = pte_root;
+
+   /*
+* A kdump kernel might be replacing a domain ID that was copied from
+* the previous kernel--if so, it needs to flush the translation cache
+* entries for the old domain ID that is being overwritten
+*/
+   if (old_domid) {
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
+   amd_iommu_flush_tlb_domid(iommu, old_domid);
+   }
 }
 
 static void clear_dte_entry(u16 devid)




Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Sakari Ailus
On Thu, Sep 05, 2019 at 07:53:37PM +0900, Tomasz Figa wrote:
> On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
>  wrote:
> >
> > Hi Dongchun,
> >
> > On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
> >
> > ...
> >
> > > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, 
> > > > > ov02a10->supplies);
> > > > > + if (ret < 0) {
> > > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > > + goto disable_clk;
> > > > > + }
> > > > > + msleep_range(7);
> > > >
> > > > This has some potential of clashing with more generic functions in the
> > > > future. Please use usleep_range directly, or msleep.
> > > >
> > >
> > > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > > https://patchwork.kernel.org/patch/10957225/
> >
> > Yes, please.
> 
> Why not just msleep()?

msleep() is usually less accurate. I'm not sure it makes a big difference
in this case. Perhaps it matters if someone wants the sensor to be powered
on and streaming as soon as possible.

-- 
Sakari Ailus
sakari.ai...@linux.intel.com


Re: [RFC PATCH] iommu/amd: fix a race in increase_address_space()

2019-09-05 Thread Joerg Roedel
Hi Qian,

On Wed, Sep 04, 2019 at 05:24:22PM -0400, Qian Cai wrote:
>   if (domain->mode == PAGE_MODE_6_LEVEL)
>   /* address space already 64 bit large */
>   return false;
> 
> This gives a clue that there must be a race between multiple concurrent
> threads in increase_address_space().

Thanks for tracking this down, there is a race indeed.

> + mutex_lock(&domain->api_lock);
>   *dma_addr = __map_single(dev, dma_dom, page_to_phys(page),
>size, DMA_BIDIRECTIONAL, dma_mask);
> + mutex_unlock(&domain->api_lock);
>  
>   if (*dma_addr == DMA_MAPPING_ERROR)
>   goto out_free;
> @@ -2696,7 +2698,9 @@ static void free_coherent(struct device *dev, size_t 
> size,
>  
>   dma_dom = to_dma_ops_domain(domain);
>  
> + mutex_lock(&domain->api_lock);
>   __unmap_single(dma_dom, dma_addr, size, DMA_BIDIRECTIONAL);
> + mutex_unlock(&domain->api_lock);

But I think the right fix is to lock the operation in
increase_address_space() directly, and not the calls around it, like in
the diff below. It is untested, so can you please try it and report back
if it fixes your issue?

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b607a92791d3..1ff705f16239 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1424,18 +1424,21 @@ static void free_pagetable(struct protection_domain 
*domain)
  * another level increases the size of the address space by 9 bits to a size up
  * to 64 bits.
  */
-static bool increase_address_space(struct protection_domain *domain,
+static void increase_address_space(struct protection_domain *domain,
   gfp_t gfp)
 {
+   unsigned long flags;
u64 *pte;
 
-   if (domain->mode == PAGE_MODE_6_LEVEL)
+   spin_lock_irqsave(&domain->lock, flags);
+
+   if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
/* address space already 64 bit large */
-   return false;
+   goto out;
 
pte = (void *)get_zeroed_page(gfp);
if (!pte)
-   return false;
+   goto out;
 
*pte = PM_LEVEL_PDE(domain->mode,
iommu_virt_to_phys(domain->pt_root));
@@ -1443,7 +1446,10 @@ static bool increase_address_space(struct 
protection_domain *domain,
domain->mode += 1;
domain->updated  = true;
 
-   return true;
+out:
+   spin_unlock_irqrestore(&domain->lock, flags);
+
+   return;
 }
 
 static u64 *alloc_pte(struct protection_domain *domain,


[PATCH 11/11] arm64: use asm-generic/dma-mapping.h

2019-09-05 Thread Christoph Hellwig
Now that the Xen special cases are gone nothing worth mentioning is
left in the arm64 <asm/dma-mapping.h> file, so switch to use the
asm-generic version instead.

Signed-off-by: Christoph Hellwig 
Acked-by: Will Deacon 
Reviewed-by: Stefano Stabellini 
---
 arch/arm64/include/asm/Kbuild|  1 +
 arch/arm64/include/asm/dma-mapping.h | 22 --
 arch/arm64/mm/dma-mapping.c  |  1 +
 3 files changed, 2 insertions(+), 22 deletions(-)
 delete mode 100644 arch/arm64/include/asm/dma-mapping.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index c52e151afab0..98a5405c8558 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -4,6 +4,7 @@ generic-y += delay.h
 generic-y += div64.h
 generic-y += dma.h
 generic-y += dma-contiguous.h
+generic-y += dma-mapping.h
 generic-y += early_ioremap.h
 generic-y += emergency-restart.h
 generic-y += hw_irq.h
diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
deleted file mode 100644
index 67243255a858..
--- a/arch/arm64/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,22 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2012 ARM Ltd.
- */
-#ifndef __ASM_DMA_MAPPING_H
-#define __ASM_DMA_MAPPING_H
-
-#ifdef __KERNEL__
-
-#include 
-#include 
-
-#include 
-#include 
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-   return NULL;
-}
-
-#endif /* __KERNEL__ */
-#endif /* __ASM_DMA_MAPPING_H */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4b244a037349..6578abcfbbc7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include <xen/xen.h>
 #include 
 
 #include 
-- 
2.20.1
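
For reference, the asm-generic version being switched to provides the
same trivial hook the deleted arm64 header had, so behavior is
unchanged:

	static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
	{
		return NULL;	/* no bus-specific ops; dma-direct or per-device ops apply */
	}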



[PATCH 09/11] swiotlb-xen: simplify cache maintenance

2019-09-05 Thread Christoph Hellwig
Now that we know we always have the dma-noncoherent.h helpers available
if we are on an architecture with support for non-coherent devices,
we can just call them directly, and remove the calls to the dma-direct
routines, including the dma_direct_map_page calls whose return value
we were ignoring anyway.  Instead we now have
Xen wrappers for the arch_sync_dma_for_{device,cpu} helpers that call
the special Xen versions of those routines for foreign pages.

Note that the new helpers get the physical address passed in addition
to the dma address to avoid another translation for the local cache
maintenance.  The pfn_valid checks remain on the dma address as in
the old code, even if that looks a little funny.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/xen/mm.c| 64 +++-
 arch/x86/include/asm/xen/page-coherent.h | 14 --
 drivers/xen/swiotlb-xen.c| 20 
 include/xen/arm/page-coherent.h  | 63 ---
 include/xen/swiotlb-xen.h|  5 ++
 5 files changed, 32 insertions(+), 134 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 9d73fa4a5991..2b2c208408bb 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -60,63 +60,33 @@ static void dma_cache_maint(dma_addr_t handle, size_t size, 
u32 op)
} while (size);
 }
 
-static void __xen_dma_page_dev_to_cpu(struct device *hwdev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir)
+/*
+ * Dom0 is mapped 1:1, and while the Linux page can span across multiple Xen
+ * pages, it is not possible for it to contain a mix of local and foreign Xen
+ * pages.  Calling pfn_valid on a foreign mfn will always return false, so if
+ * pfn_valid returns true the page is local and we can use the native
+ * dma-direct functions, otherwise we call the Xen specific version.
+ */
+void xen_dma_sync_for_cpu(struct device *dev, dma_addr_t handle,
+   phys_addr_t paddr, size_t size, enum dma_data_direction dir)
 {
-   if (dir != DMA_TO_DEVICE)
+   if (pfn_valid(PFN_DOWN(handle)))
+   arch_sync_dma_for_cpu(dev, paddr, size, dir);
+   else if (dir != DMA_TO_DEVICE)
dma_cache_maint(handle, size, GNTTAB_CACHE_INVAL);
 }
 
-static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir)
+void xen_dma_sync_for_device(struct device *dev, dma_addr_t handle,
+   phys_addr_t paddr, size_t size, enum dma_data_direction dir)
 {
-   if (dir == DMA_FROM_DEVICE)
+   if (pfn_valid(PFN_DOWN(handle)))
+   arch_sync_dma_for_device(dev, paddr, size, dir);
+   else if (dir == DMA_FROM_DEVICE)
dma_cache_maint(handle, size, GNTTAB_CACHE_INVAL);
else
dma_cache_maint(handle, size, GNTTAB_CACHE_CLEAN);
 }
 
-void __xen_dma_map_page(struct device *hwdev, struct page *page,
-dma_addr_t dev_addr, unsigned long offset, size_t size,
-enum dma_data_direction dir, unsigned long attrs)
-{
-   if (dev_is_dma_coherent(hwdev))
-   return;
-   if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
-   return;
-
-   __xen_dma_page_cpu_to_dev(hwdev, dev_addr, size, dir);
-}
-
-void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir,
-   unsigned long attrs)
-
-{
-   if (dev_is_dma_coherent(hwdev))
-   return;
-   if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
-   return;
-
-   __xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
-}
-
-void __xen_dma_sync_single_for_cpu(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   if (dev_is_dma_coherent(hwdev))
-   return;
-   __xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
-}
-
-void __xen_dma_sync_single_for_device(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   if (dev_is_dma_coherent(hwdev))
-   return;
-   __xen_dma_page_cpu_to_dev(hwdev, handle, size, dir);
-}
-
 bool xen_arch_need_swiotlb(struct device *dev,
   phys_addr_t phys,
   dma_addr_t dev_addr)
diff --git a/arch/x86/include/asm/xen/page-coherent.h 
b/arch/x86/include/asm/xen/page-coherent.h
index 116777e7f387..63cd41b2e17a 100644
--- a/arch/x86/include/asm/xen/page-coherent.h
+++ b/arch/x86/include/asm/xen/page-coherent.h
@@ -21,18 +21,4 @@ static inline void xen_free_coherent_pages(struct device 
*hwdev, size_t size,
free_pages((unsigned long) cpu_addr, get_order(size));
 }
 
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-dma_addr_t dev_addr, unsigned long offset, size_t size,
-enum dma_data_direction dir, 
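
A sketch of the resulting call pattern -- illustrative only, not the
exact hunk -- showing how a caller passes both the dma handle and the
physical address so no extra translation is needed:

	if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
		xen_dma_sync_for_device(dev, dev_addr, phys, size, dir);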

[PATCH 10/11] swiotlb-xen: merge xen_unmap_single into xen_swiotlb_unmap_page

2019-09-05 Thread Christoph Hellwig
No need for a no-op wrapper.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/swiotlb-xen.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index f81031f0c1c7..1190934098eb 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -418,9 +418,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
  * After this call, reads by the cpu to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
-size_t size, enum dma_data_direction dir,
-unsigned long attrs)
+static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
phys_addr_t paddr = xen_bus_to_phys(dev_addr);
 
@@ -434,13 +433,6 @@ static void xen_unmap_single(struct device *hwdev, 
dma_addr_t dev_addr,
swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
 }
 
-static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
-   size_t size, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   xen_unmap_single(hwdev, dev_addr, size, dir, attrs);
-}
-
 static void
 xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
size_t size, enum dma_data_direction dir)
@@ -481,7 +473,8 @@ xen_swiotlb_unmap_sg(struct device *hwdev, struct 
scatterlist *sgl, int nelems,
BUG_ON(dir == DMA_NONE);
 
for_each_sg(sgl, sg, nelems, i)
-   xen_unmap_single(hwdev, sg->dma_address, sg_dma_len(sg), dir, 
attrs);
+   xen_swiotlb_unmap_page(hwdev, sg->dma_address, sg_dma_len(sg),
+   dir, attrs);
 
 }
 
-- 
2.20.1



[PATCH 08/11] swiotlb-xen: use the same foreign page check everywhere

2019-09-05 Thread Christoph Hellwig
xen_dma_map_page uses a different and more complicated check for foreign
pages than the other three cache maintenance helpers.  Switch it to the
simpler pfn_valid method as well, and document the scheme with a single
improved comment in xen_dma_map_page.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 include/xen/arm/page-coherent.h | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/include/xen/arm/page-coherent.h b/include/xen/arm/page-coherent.h
index a840d6949a87..a8d9c0678c27 100644
--- a/include/xen/arm/page-coherent.h
+++ b/include/xen/arm/page-coherent.h
@@ -53,23 +53,17 @@ static inline void xen_dma_map_page(struct device *hwdev, 
struct page *page,
 dma_addr_t dev_addr, unsigned long offset, size_t size,
 enum dma_data_direction dir, unsigned long attrs)
 {
-   unsigned long page_pfn = page_to_xen_pfn(page);
-   unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-   unsigned long compound_pages =
-   (1
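
A sketch of the simplified check described above -- illustrative only,
not the exact hunk:

	if (pfn_valid(PFN_DOWN(dev_addr)))
		dma_direct_map_page(hwdev, page, offset, size, dir, attrs);
	else
		__xen_dma_map_page(hwdev, page, dev_addr, offset, size,
				   dir, attrs);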

[PATCH 05/11] xen/arm: remove xen_dma_ops

2019-09-05 Thread Christoph Hellwig
arm and arm64 can just use xen_swiotlb_dma_ops directly like x86, no
need for a pointer indirection.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Julien Grall 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/mm/dma-mapping.c| 3 ++-
 arch/arm/xen/mm.c| 4 
 arch/arm64/mm/dma-mapping.c  | 3 ++-
 include/xen/arm/hypervisor.h | 2 --
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 738097396445..2661cad36359 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include <xen/swiotlb-xen.h>
 
 #include "dma.h"
 #include "mm.h"
@@ -2360,7 +2361,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
 
 #ifdef CONFIG_XEN
if (xen_initial_domain())
-   dev->dma_ops = xen_dma_ops;
+   dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
dev->archdata.dma_ops_setup = true;
 }
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 2fde161733b0..11d5ad26fcfe 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -162,16 +162,12 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, 
unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
-const struct dma_map_ops *xen_dma_ops;
-EXPORT_SYMBOL(xen_dma_ops);
-
 int __init xen_mm_init(void)
 {
struct gnttab_cache_flush cflush;
if (!xen_initial_domain())
return 0;
xen_swiotlb_init(1, false);
-   xen_dma_ops = &xen_swiotlb_dma_ops;
 
cflush.op = 0;
cflush.a.dev_bus_addr = 0;
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index bd2b039f43a6..4b244a037349 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include <xen/swiotlb-xen.h>
 
 #include 
 
@@ -64,6 +65,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 
size,
 
 #ifdef CONFIG_XEN
if (xen_initial_domain())
-   dev->dma_ops = xen_dma_ops;
+   dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
 }
diff --git a/include/xen/arm/hypervisor.h b/include/xen/arm/hypervisor.h
index 2982571f7cc1..43ef24dd030e 100644
--- a/include/xen/arm/hypervisor.h
+++ b/include/xen/arm/hypervisor.h
@@ -19,8 +19,6 @@ static inline enum paravirt_lazy_mode 
paravirt_get_lazy_mode(void)
return PARAVIRT_LAZY_NONE;
 }
 
-extern const struct dma_map_ops *xen_dma_ops;
-
 #ifdef CONFIG_XEN
 void __init xen_early_init(void);
 #else
-- 
2.20.1



[PATCH 06/11] xen: remove the exports for xen_{create,destroy}_contiguous_region

2019-09-05 Thread Christoph Hellwig
These routines are only used by swiotlb-xen, which cannot be modular.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/xen/mm.c | 2 --
 arch/x86/xen/mmu_pv.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 11d5ad26fcfe..9d73fa4a5991 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -154,13 +154,11 @@ int xen_create_contiguous_region(phys_addr_t pstart, 
unsigned int order,
*dma_handle = pstart;
return 0;
 }
-EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
 
 void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 {
return;
 }
-EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
 int __init xen_mm_init(void)
 {
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 26e8b326966d..c8dbee62ec2a 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2625,7 +2625,6 @@ int xen_create_contiguous_region(phys_addr_t pstart, 
unsigned int order,
*dma_handle = virt_to_machine(vstart).maddr;
return success ? 0 : -ENOMEM;
 }
-EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
 
 void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 {
@@ -2660,7 +2659,6 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, 
unsigned int order)
 
spin_unlock_irqrestore(&xen_reservation_lock, flags);
 }
-EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
 static noinline void xen_flush_tlb_all(void)
 {
-- 
2.20.1



[PATCH 04/11] xen/arm: simplify dma_cache_maint

2019-09-05 Thread Christoph Hellwig
Calculate the required operation in the caller, and pass it directly
instead of recalculating it for each page, and use simple arithmetic
to get from the physical address to Xen page size aligned chunks.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/xen/mm.c | 61 ---
 1 file changed, 21 insertions(+), 40 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 90574d89d0d4..2fde161733b0 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -35,64 +35,45 @@ unsigned long xen_get_swiotlb_free_pages(unsigned int order)
return __get_free_pages(flags, order);
 }
 
-enum dma_cache_op {
-   DMA_UNMAP,
-   DMA_MAP,
-};
 static bool hypercall_cflush = false;
 
-/* functions called by SWIOTLB */
-
-static void dma_cache_maint(dma_addr_t handle, unsigned long offset,
-   size_t size, enum dma_data_direction dir, enum dma_cache_op op)
+/* buffers in highmem or foreign pages cannot cross page boundaries */
+static void dma_cache_maint(dma_addr_t handle, size_t size, u32 op)
 {
struct gnttab_cache_flush cflush;
-   unsigned long xen_pfn;
-   size_t left = size;
 
-   xen_pfn = (handle >> XEN_PAGE_SHIFT) + offset / XEN_PAGE_SIZE;
-   offset %= XEN_PAGE_SIZE;
+   cflush.a.dev_bus_addr = handle & XEN_PAGE_MASK;
+   cflush.offset = xen_offset_in_page(handle);
+   cflush.op = op;
 
do {
-   size_t len = left;
-   
-   /* buffers in highmem or foreign pages cannot cross page
-* boundaries */
-   if (len + offset > XEN_PAGE_SIZE)
-   len = XEN_PAGE_SIZE - offset;
-
-   cflush.op = 0;
-   cflush.a.dev_bus_addr = xen_pfn << XEN_PAGE_SHIFT;
-   cflush.offset = offset;
-   cflush.length = len;
-
-   if (op == DMA_UNMAP && dir != DMA_TO_DEVICE)
-   cflush.op = GNTTAB_CACHE_INVAL;
-   if (op == DMA_MAP) {
-   if (dir == DMA_FROM_DEVICE)
-   cflush.op = GNTTAB_CACHE_INVAL;
-   else
-   cflush.op = GNTTAB_CACHE_CLEAN;
-   }
-   if (cflush.op)
-   HYPERVISOR_grant_table_op(GNTTABOP_cache_flush, &cflush, 1);
+   if (size + cflush.offset > XEN_PAGE_SIZE)
+   cflush.length = XEN_PAGE_SIZE - cflush.offset;
+   else
+   cflush.length = size;
+
+   HYPERVISOR_grant_table_op(GNTTABOP_cache_flush, &cflush, 1);
 
-   offset = 0;
-   xen_pfn++;
-   left -= len;
-   } while (left);
+   cflush.offset = 0;
+   cflush.a.dev_bus_addr += cflush.length;
+   size -= cflush.length;
+   } while (size);
 }
 
 static void __xen_dma_page_dev_to_cpu(struct device *hwdev, dma_addr_t handle,
size_t size, enum dma_data_direction dir)
 {
-   dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, 
DMA_UNMAP);
+   if (dir != DMA_TO_DEVICE)
+   dma_cache_maint(handle, size, GNTTAB_CACHE_INVAL);
 }
 
 static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
size_t size, enum dma_data_direction dir)
 {
-   dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, 
DMA_MAP);
+   if (dir == DMA_FROM_DEVICE)
+   dma_cache_maint(handle, size, GNTTAB_CACHE_INVAL);
+   else
+   dma_cache_maint(handle, size, GNTTAB_CACHE_CLEAN);
 }
 
 void __xen_dma_map_page(struct device *hwdev, struct page *page,
-- 
2.20.1
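
A worked example of the new loop arithmetic, assuming XEN_PAGE_SIZE is
0x1000: for handle = 0x12345680 and size = 0x1000, the first iteration
flushes dev_bus_addr = 0x12345000, offset = 0x680, length = 0x980; the
second flushes dev_bus_addr = 0x12345980, offset = 0, length = 0x680 --
the same two Xen-page chunks the old xen_pfn/left bookkeeping produced,
with plain additions.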



[PATCH 07/11] swiotlb-xen: remove xen_swiotlb_dma_mmap and xen_swiotlb_dma_get_sgtable

2019-09-05 Thread Christoph Hellwig
There is no need to wrap the common version, just wire them up directly.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/swiotlb-xen.c | 29 ++---
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index eee86cc7046b..b8808677ae1d 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -547,31 +547,6 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
return xen_virt_to_bus(xen_io_tlb_end - 1) <= mask;
 }
 
-/*
- * Create userspace mapping for the DMA-coherent memory.
- * This function should be called with the pages from the current domain only,
- * passing pages mapped from other domains would lead to memory corruption.
- */
-static int
-xen_swiotlb_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-void *cpu_addr, dma_addr_t dma_addr, size_t size,
-unsigned long attrs)
-{
-   return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
-/*
- * This function should be called with the pages from the current domain only,
- * passing pages mapped from other domains would lead to memory corruption.
- */
-static int
-xen_swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
-   void *cpu_addr, dma_addr_t handle, size_t size,
-   unsigned long attrs)
-{
-   return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs);
-}
-
 const struct dma_map_ops xen_swiotlb_dma_ops = {
.alloc = xen_swiotlb_alloc_coherent,
.free = xen_swiotlb_free_coherent,
@@ -584,6 +559,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.map_page = xen_swiotlb_map_page,
.unmap_page = xen_swiotlb_unmap_page,
.dma_supported = xen_swiotlb_dma_supported,
-   .mmap = xen_swiotlb_dma_mmap,
-   .get_sgtable = xen_swiotlb_get_sgtable,
+   .mmap = dma_common_mmap,
+   .get_sgtable = dma_common_get_sgtable,
 };
-- 
2.20.1



[PATCH 03/11] xen/arm: use dev_is_dma_coherent

2019-09-05 Thread Christoph Hellwig
Use the dma-noncoherent dev_is_dma_coherent helper instead of the
home-grown variant.  Note that both are always initialized to the same
value in arch_setup_dma_ops.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Julien Grall 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/include/asm/dma-mapping.h   |  6 --
 arch/arm/xen/mm.c| 12 ++--
 arch/arm64/include/asm/dma-mapping.h |  9 -
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index dba9355e2484..bdd80ddbca34 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -91,12 +91,6 @@ static inline dma_addr_t virt_to_dma(struct device *dev, 
void *addr)
 }
 #endif
 
-/* do not use this function in a driver */
-static inline bool is_device_dma_coherent(struct device *dev)
-{
-   return dev->archdata.dma_coherent;
-}
-
 /**
  * arm_dma_alloc - allocate consistent memory for DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index d33b77e9add3..90574d89d0d4 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 #include 
-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
 #include 
 #include 
 #include 
@@ -99,7 +99,7 @@ void __xen_dma_map_page(struct device *hwdev, struct page 
*page,
 dma_addr_t dev_addr, unsigned long offset, size_t size,
 enum dma_data_direction dir, unsigned long attrs)
 {
-   if (is_device_dma_coherent(hwdev))
+   if (dev_is_dma_coherent(hwdev))
return;
if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
return;
@@ -112,7 +112,7 @@ void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t 
handle,
unsigned long attrs)
 
 {
-   if (is_device_dma_coherent(hwdev))
+   if (dev_is_dma_coherent(hwdev))
return;
if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
return;
@@ -123,7 +123,7 @@ void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t 
handle,
 void __xen_dma_sync_single_for_cpu(struct device *hwdev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-   if (is_device_dma_coherent(hwdev))
+   if (dev_is_dma_coherent(hwdev))
return;
__xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
 }
@@ -131,7 +131,7 @@ void __xen_dma_sync_single_for_cpu(struct device *hwdev,
 void __xen_dma_sync_single_for_device(struct device *hwdev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-   if (is_device_dma_coherent(hwdev))
+   if (dev_is_dma_coherent(hwdev))
return;
__xen_dma_page_cpu_to_dev(hwdev, handle, size, dir);
 }
@@ -159,7 +159,7 @@ bool xen_arch_need_swiotlb(struct device *dev,
 * memory and we are not able to flush the cache.
 */
return (!hypercall_cflush && (xen_pfn != bfn) &&
-   !is_device_dma_coherent(dev));
+   !dev_is_dma_coherent(dev));
 }
 
 int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index bdcb0922a40c..67243255a858 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -18,14 +18,5 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return NULL;
 }
 
-/*
- * Do not use this function in a driver, it is only provided for
- * arch/arm/mm/xen.c, which is used by arm64 as well.
- */
-static inline bool is_device_dma_coherent(struct device *dev)
-{
-   return dev->dma_coherent;
-}
-
 #endif /* __KERNEL__ */
 #endif /* __ASM_DMA_MAPPING_H */
-- 
2.20.1
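
For reference, the dma-noncoherent helper being switched to is
essentially the same test as the deleted arm64 variant above (sketched;
on configurations without non-coherent DMA support it is hardwired to
true):

	static inline bool dev_is_dma_coherent(struct device *dev)
	{
		return dev->dma_coherent;
	}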



[PATCH 02/11] xen/arm: consolidate page-coherent.h

2019-09-05 Thread Christoph Hellwig
Share the duplicate arm/arm64 code in include/xen/arm/page-coherent.h.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/xen/page-coherent.h   | 75 
 arch/arm64/include/asm/xen/page-coherent.h | 75 
 include/xen/arm/page-coherent.h| 80 ++
 3 files changed, 80 insertions(+), 150 deletions(-)

diff --git a/arch/arm/include/asm/xen/page-coherent.h 
b/arch/arm/include/asm/xen/page-coherent.h
index 602ac02f154c..27e984977402 100644
--- a/arch/arm/include/asm/xen/page-coherent.h
+++ b/arch/arm/include/asm/xen/page-coherent.h
@@ -1,77 +1,2 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_ARM_XEN_PAGE_COHERENT_H
-#define _ASM_ARM_XEN_PAGE_COHERENT_H
-
-#include 
-#include 
 #include 
-
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-   dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
-{
-   return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-   void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
-{
-   dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
-}
-
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-
-   if (pfn_valid(pfn))
-   dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
-   else
-   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-   if (pfn_valid(pfn))
-   dma_direct_sync_single_for_device(hwdev, handle, size, dir);
-   else
-   __xen_dma_sync_single_for_device(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-dma_addr_t dev_addr, unsigned long offset, size_t size,
-enum dma_data_direction dir, unsigned long attrs)
-{
-   unsigned long page_pfn = page_to_xen_pfn(page);
-   unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-   unsigned long compound_pages =
-   (1<
-#include 
 #include 
-
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-   dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
-{
-   return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-   void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
-{
-   dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
-}
-
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-
-   if (pfn_valid(pfn))
-   dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
-   else
-   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-   if (pfn_valid(pfn))
-   dma_direct_sync_single_for_device(hwdev, handle, size, dir);
-   else
-   __xen_dma_sync_single_for_device(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-dma_addr_t dev_addr, unsigned long offset, size_t size,
-enum dma_data_direction dir, unsigned long attrs)
-{
-   unsigned long page_pfn = page_to_xen_pfn(page);
-   unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-   unsigned long compound_pages =
-   (1<
+#include 
+
 void __xen_dma_map_page(struct device *hwdev, struct page *page,
 dma_addr_t dev_addr, unsigned long offset, size_t size,
 enum dma_data_direction dir, unsigned long attrs);
@@ -13,4 +16,81 @@ void __xen_dma_sync_single_for_cpu(struct device *hwdev,
 void __xen_dma_sync_single_for_device(struct device *hwdev,
dma_addr_t handle, size_t size, enum dma_data_direction dir);
 
+static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
+   dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
+{
+   return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
+}
+
+static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
+   void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
+{
+   dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
+}
+
+static inline void 

[PATCH 01/11] xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintenance

2019-09-05 Thread Christoph Hellwig
Copy the arm64 code that uses the dma-direct/swiotlb helpers for DMA
non-coherent devices.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/device.h|  3 -
 arch/arm/include/asm/xen/page-coherent.h | 72 +---
 arch/arm/mm/dma-mapping.c|  8 +--
 drivers/xen/swiotlb-xen.c| 20 ---
 4 files changed, 28 insertions(+), 75 deletions(-)

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index f6955b55c544..c675bc0d5aa8 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,9 +14,6 @@ struct dev_archdata {
 #endif
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
struct dma_iommu_mapping*mapping;
-#endif
-#ifdef CONFIG_XEN
-   const struct dma_map_ops *dev_dma_ops;
 #endif
unsigned int dma_coherent:1;
unsigned int dma_ops_setup:1;
diff --git a/arch/arm/include/asm/xen/page-coherent.h 
b/arch/arm/include/asm/xen/page-coherent.h
index 2c403e7c782d..602ac02f154c 100644
--- a/arch/arm/include/asm/xen/page-coherent.h
+++ b/arch/arm/include/asm/xen/page-coherent.h
@@ -6,23 +6,37 @@
 #include 
 #include 
 
-static inline const struct dma_map_ops *xen_get_dma_ops(struct device *dev)
-{
-   if (dev && dev->archdata.dev_dma_ops)
-   return dev->archdata.dev_dma_ops;
-   return get_arch_dma_ops(NULL);
-}
-
 static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
 {
-   return xen_get_dma_ops(hwdev)->alloc(hwdev, size, dma_handle, flags, 
attrs);
+   return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
 }
 
 static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
 {
-   xen_get_dma_ops(hwdev)->free(hwdev, size, cpu_addr, dma_handle, attrs);
+   dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
+}
+
+static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
+   dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+   unsigned long pfn = PFN_DOWN(handle);
+
+   if (pfn_valid(pfn))
+   dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
+   else
+   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
+}
+
+static inline void xen_dma_sync_single_for_device(struct device *hwdev,
+   dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+   unsigned long pfn = PFN_DOWN(handle);
+   if (pfn_valid(pfn))
+   dma_direct_sync_single_for_device(hwdev, handle, size, dir);
+   else
+   __xen_dma_sync_single_for_device(hwdev, handle, size, dir);
 }
 
 static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
@@ -36,17 +50,8 @@ static inline void xen_dma_map_page(struct device *hwdev, 
struct page *page,
bool local = (page_pfn <= dev_pfn) &&
(dev_pfn - page_pfn < compound_pages);
 
-   /*
-* Dom0 is mapped 1:1, while the Linux page can span across
-* multiple Xen pages, it's not possible for it to contain a
-* mix of local and foreign Xen pages. So if the first xen_pfn
-* == mfn the page is local otherwise it's a foreign page
-* grant-mapped in dom0. If the page is local we can safely
-* call the native dma_ops function, otherwise we call the xen
-* specific function.
-*/
if (local)
-   xen_get_dma_ops(hwdev)->map_page(hwdev, page, offset, size, 
dir, attrs);
+   dma_direct_map_page(hwdev, page, offset, size, dir, attrs);
else
__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, 
attrs);
 }
@@ -63,33 +68,10 @@ static inline void xen_dma_unmap_page(struct device *hwdev, 
dma_addr_t handle,
 * safely call the native dma_ops function, otherwise we call the xen
 * specific function.
 */
-   if (pfn_valid(pfn)) {
-   if (xen_get_dma_ops(hwdev)->unmap_page)
-   xen_get_dma_ops(hwdev)->unmap_page(hwdev, handle, size, 
dir, attrs);
-   } else
+   if (pfn_valid(pfn))
+   dma_direct_unmap_page(hwdev, handle, size, dir, attrs);
+   else
__xen_dma_unmap_page(hwdev, handle, size, dir, attrs);
 }
 
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-   if (pfn_valid(pfn)) {
-   if (xen_get_dma_ops(hwdev)->sync_single_for_cpu)
-   xen_get_dma_ops(hwdev)->sync_single_for_cpu(hwdev, 
handle, size, dir);
-   } else
-   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-

swiotlb-xen cleanups v4

2019-09-05 Thread Christoph Hellwig
Hi Xen maintainers and friends,

please take a look at this series that cleans up the parts of swiotlb-xen
that deal with non-coherent caches.


Changes since v3:
 - don't use dma_direct_alloc on x86

Changes since v2:
 - further dma_cache_maint improvements
 - split the previous patch 1 into 3 patches

Changes since v1:
 - rewrite dma_cache_maint to be much simpler
 - improve various comments and commit logs
 - remove page-coherent.h entirely


Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Tomasz Figa
On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
 wrote:
>
> Hi Dongchun,
>
> On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
>
> ...
>
> > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, ov02a10->supplies);
> > > > + if (ret < 0) {
> > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > + goto disable_clk;
> > > > + }
> > > > + msleep_range(7);
> > >
> > > This has some potential of clashing with more generic functions in the
> > > future. Please use usleep_range directly, or msleep.
> > >
> >
> > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > https://patchwork.kernel.org/patch/10957225/
>
> Yes, please.

Why not just msleep()?


Re: [PATCH 1/2] iommu/ipmmu-vmsa: Move IMTTBCR_SL0_TWOBIT_* to restore sort order

2019-09-05 Thread Simon Horman
On Wed, Sep 04, 2019 at 02:08:01PM +0200, Geert Uytterhoeven wrote:
> Move the recently added IMTTBCR_SL0_TWOBIT_* definitions up, to make
> sure all IMTTBCR register bit definitions are sorted by decreasing bit
> index.  Add comments to make it clear that they exist on R-Car Gen3
> only.
> 
> Fixes: c295f504fb5a38ab ("iommu/ipmmu-vmsa: Allow two bit SL0")
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: Simon Horman 



Re: [PATCH 2/2] iommu/ipmmu-vmsa: Disable cache snoop transactions on R-Car Gen3

2019-09-05 Thread Simon Horman
On Wed, Sep 04, 2019 at 02:08:02PM +0200, Geert Uytterhoeven wrote:
> From: Hai Nguyen Pham 
> 
> According to the Hardware Manual Errata for Rev. 1.50 of April 10, 2019,
> cache snoop transactions for page table walk requests are not supported
> on R-Car Gen3.
> 
> Hence, this patch removes setting these fields in the IMTTBCR register,
> since it will have no effect, and adds comments to the register bit
> definitions, to make it clear they apply to R-Car Gen2 only.
> 
> Signed-off-by: Hai Nguyen Pham 
> [geert: Reword, add comments]
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: Simon Horman 



Re: [V2, 2/2] media: i2c: Add DW9768 VCM driver

2019-09-05 Thread Tomasz Figa
Hi Dongchun,

On Thu, Sep 5, 2019 at 4:22 PM  wrote:
>
> From: Dongchun Zhu 
>
> This patch adds a V4L2 sub-device driver for DW9768 lens voice coil,
> and provides control to set the desired focus.
>
> The DW9768 is a 10 bit DAC with 100mA output current sink capability
> from Dongwoon, designed for linear control of voice coil motor,
> and controlled via I2C serial interface.
>
> Signed-off-by: Dongchun Zhu 
> ---
>  MAINTAINERS|   1 +
>  drivers/media/i2c/Kconfig  |  10 ++
>  drivers/media/i2c/Makefile |   1 +
>  drivers/media/i2c/dw9768.c | 349 
> +
>  4 files changed, 361 insertions(+)
>  create mode 100644 drivers/media/i2c/dw9768.c
>

Thanks for v2! Please see my comments inline.

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 192a671..c5c9a0e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4976,6 +4976,7 @@ M:Dongchun Zhu 
>  L: linux-me...@vger.kernel.org
>  T: git git://linuxtv.org/media_tree.git
>  S: Maintained
> +F: drivers/media/i2c/dw9768.c
>  F: Documentation/devicetree/bindings/media/i2c/dongwoon,dw9768.txt
>
>  DONGWOON DW9807 LENS VOICE COIL DRIVER
> diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
> index 79ce9ec..dfb665c 100644
> --- a/drivers/media/i2c/Kconfig
> +++ b/drivers/media/i2c/Kconfig
> @@ -1016,6 +1016,16 @@ config VIDEO_DW9714
>   capability. This is designed for linear control of
>   voice coil motors, controlled via I2C serial interface.
>
> +config VIDEO_DW9768
> +   tristate "DW9768 lens voice coil support"
> +   depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER
> +   depends on VIDEO_V4L2_SUBDEV_API
> +   help
> + This is a driver for the DW9768 camera lens voice coil.
> + DW9768 is a 10 bit DAC with 100mA output current sink
> + capability. This is designed for linear control of
> + voice coil motors, controlled via I2C serial interface.
> +
>  config VIDEO_DW9807_VCM
> tristate "DW9807 lens voice coil support"
> depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER
> diff --git a/drivers/media/i2c/Makefile b/drivers/media/i2c/Makefile
> index fd4ea86..2561239 100644
> --- a/drivers/media/i2c/Makefile
> +++ b/drivers/media/i2c/Makefile
> @@ -24,6 +24,7 @@ obj-$(CONFIG_VIDEO_SAA6752HS) += saa6752hs.o
>  obj-$(CONFIG_VIDEO_AD5820)  += ad5820.o
>  obj-$(CONFIG_VIDEO_AK7375)  += ak7375.o
>  obj-$(CONFIG_VIDEO_DW9714)  += dw9714.o
> +obj-$(CONFIG_VIDEO_DW9768)  += dw9768.o
>  obj-$(CONFIG_VIDEO_DW9807_VCM)  += dw9807-vcm.o
>  obj-$(CONFIG_VIDEO_ADV7170) += adv7170.o
>  obj-$(CONFIG_VIDEO_ADV7175) += adv7175.o
> diff --git a/drivers/media/i2c/dw9768.c b/drivers/media/i2c/dw9768.c
> new file mode 100644
> index 000..66d1712
> --- /dev/null
> +++ b/drivers/media/i2c/dw9768.c
> @@ -0,0 +1,349 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 MediaTek Inc.
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DW9768_NAME            "dw9768"
> +#define DW9768_MAX_FOCUS_POS   1023
> +/*
> + * This sets the minimum granularity for the focus positions.
> + * A value of 1 gives maximum accuracy for a desired focus position
> + */
> +#define DW9768_FOCUS_STEPS 1
> +/*
> + * DW9768 separates two registers to control the VCM position.
> + * One for MSB value, another is LSB value.
> + */
> +#define DW9768_REG_MASK_MSB0x03
> +#define DW9768_REG_MASK_LSB0xff
> +#define DW9768_SET_POSITION_ADDR   0x03
> +
> +#define DW9768_CMD_DELAY   0xff
> +#define DW9768_CTRL_DELAY_US   5000
> +
> +#define DW9768_DAC_SHIFT   8
> +
> +/* dw9768 device structure */
> +struct dw9768 {
> +   struct v4l2_ctrl_handler ctrls;
> +   struct v4l2_subdev sd;
> +   struct regulator *vin;
> +   struct regulator *vdd;
> +};
> +
> +static inline struct dw9768 *to_dw9768_vcm(struct v4l2_ctrl *ctrl)
> +{
> +   return container_of(ctrl->handler, struct dw9768, ctrls);
> +}
> +
> +static inline struct dw9768 *sd_to_dw9768_vcm(struct v4l2_subdev *subdev)
> +{
> +   return container_of(subdev, struct dw9768, sd);
> +}
> +
> +struct regval_list {
> +   unsigned char reg_num;
> +   unsigned char value;
> +};
> +
> +static struct regval_list dw9768_init_regs[] = {
> +   {0x02, 0x02},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +   {0x06, 0x41},
> +   {0x07, 0x39},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +};
> +
> +static struct regval_list dw9768_release_regs[] = {
> +   {0x02, 0x00},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +   {0x01, 0x00},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +};
> +
> +static int dw9768_write_smbus(struct dw9768 *dw9768, unsigned char reg,
> + unsigned char 
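
For the MSB/LSB register split defined above, a 10-bit position divides
as follows -- a sketch using only the driver's masks and shift:

	u16 pos = 0x2a5;	/* 10-bit focus position */
	u8 msb = (pos >> DW9768_DAC_SHIFT) & DW9768_REG_MASK_MSB;	/* 0x02 */
	u8 lsb = pos & DW9768_REG_MASK_LSB;				/* 0xa5 */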

Re: [PATCH V5 0/5] iommu/amd: Convert the AMD iommu driver to the dma-iommu api

2019-09-05 Thread Christoph Hellwig
Dave, Joerg, Robin:

is there any chance we could at least pick up patches 2 and 4 ASAP
as they are clearly fixes for current deficits, even without the
amd conversion?


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-09-05 Thread Christoph Hellwig
On Wed, Sep 04, 2019 at 02:40:44PM -0700, David Rientjes wrote:
> Hi Christoph, Jens, and Ming,
> 
> While booting a 5.2 SEV-enabled guest we have encountered the following 
> WARNING that is followed up by a BUG because we are in atomic context 
> while trying to call set_memory_decrypted:

Well, this really is a x86 / DMA API issue unfortunately.  Drivers
are allowed to do GFP_ATOMIC dma allocation under locks / rcu critical
sections and from interrupts.  And it seems like the SEV case can't
handle that.  We have some semi-generic code to have a fixed sized
pool in kernel/dma for non-coherent platforms that have similar issues
that we could try to wire up, but I wonder if there is a better way
to handle the issue, so I've added Tom and the x86 maintainers.

Now independent of that issue using DMA coherent memory for the nvme
PRPs/SGLs doesn't actually feel very optimal.  We could do with
normal kmalloc allocations and just sync it to the device and back.
I wonder if we should create some general mempool-like helpers for that.
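
A sketch of that kmalloc-plus-sync alternative for a PRP-list-sized
buffer -- illustrative names, error unwinding trimmed; no
set_memory_decrypted() is involved, so it is atomic-safe even with SEV:

	__le64 *prps = kmalloc(PAGE_SIZE, GFP_ATOMIC);
	dma_addr_t dma;

	if (!prps)
		return -ENOMEM;

	/* ... fill in the PRP entries ... */

	/* dma_map_single() performs the initial sync to the device */
	dma = dma_map_single(dev, prps, PAGE_SIZE, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma)) {
		kfree(prps);
		return -ENOMEM;
	}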