Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

2019-04-22 Thread Christoph Hellwig
On Tue, Apr 23, 2019 at 09:58:19AM +0800, Lu Baolu wrote:
> 554         for (i = 0; i < nslots; i++)
> 555                 io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
>
> Could the tlb orig address be set to PAGE_ALIGN_DOWN(orig_addr)? We
> can't assume the bounce buffer just starts at the beginning of the
> slot. Or is there anything I missed?

I don't see why we need to align the orig_addr.  We only use
io_tlb_orig_addr to find the address(es) for the swiotlb_bounce calls,
and I don't see a good reason why we'd need to align those.
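(For reference, this is roughly how that lookup works in the swiotlb code of
this era; a paraphrased sketch of swiotlb_tbl_sync_single(), not the exact
upstream source:

	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
	phys_addr_t orig_addr = io_tlb_orig_addr[index];

	if (orig_addr == INVALID_PHYS_ADDR)
		return;
	/* the offset within the slot is re-applied here, so the stored
	 * address itself does not need to be aligned */
	orig_addr += (unsigned long)tlb_addr & ((1 << IO_TLB_SHIFT) - 1);

	swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);

so the recorded original address is only ever used as the other end of the
copy.)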


Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

2019-04-22 Thread Christoph Hellwig
>> Again, this and the option should not be in a specific iommu driver.
>>
>
> The option of whether bounce is ignored should be in the specific iommu
> driver.

Why?  As a user I could not care less which IOMMU driver my particular
system uses.


Re: [PATCH v5 1/6] iommu: add generic boot option iommu.dma_mode

2019-04-22 Thread Leizhen (ThunderTown)



On 2019/4/12 19:16, Joerg Roedel wrote:
> On Tue, Apr 09, 2019 at 08:53:03PM +0800, Zhen Lei wrote:
>> +static int __init iommu_dma_mode_setup(char *str)
>> +{
>> +if (!str)
>> +goto fail;
>> +
>> +if (!strncmp(str, "passthrough", 11))
>> +iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;
>> +else if (!strncmp(str, "lazy", 4))
>> +iommu_default_dma_mode = IOMMU_DMA_MODE_LAZY;
>> +else if (!strncmp(str, "strict", 6))
>> +iommu_default_dma_mode = IOMMU_DMA_MODE_STRICT;
>> +else
>> +goto fail;
>> +
>> +pr_info("Force dma mode to be %d\n", iommu_default_dma_mode);
> 
> Printing a number is not very descriptive or helpful to the user. Please
> print the name of the mode instead.

OK, thanks. I have given up on adding the iommu.dma_mode boot option,
following Robin and Will's suggestion. So this code will be removed in v6.

> 
> 
> Regards,
> 
>   Joerg
> 
> .
> 

-- 
Thanks!
Best Regards



Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

2019-04-22 Thread Lu Baolu

Hi,

On 4/23/19 12:47 AM, Christoph Hellwig wrote:

On Sun, Apr 21, 2019 at 09:17:17AM +0800, Lu Baolu wrote:

+static inline bool device_needs_bounce(struct device *dev)
+{
+   struct pci_dev *pdev = NULL;
+
+   if (intel_no_bounce)
+   return false;
+
+   if (dev_is_pci(dev))
+   pdev = to_pci_dev(dev);
+
+   return pdev ? pdev->untrusted : false;
+}


Again, this and the option should not be in a specific iommu driver.



The option of whether bounce is ignored should be in the specific iommu
driver. Or do you have any other thoughts?

Best regards,
Lu Baolu


Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

2019-04-22 Thread Lu Baolu

Hi Christoph,

Thanks for reviewing my patches.

On 4/23/19 12:45 AM, Christoph Hellwig wrote:

I looked over your swiotlb modifications and I don't think we really need
them.  The only thing we really need is to split the size parameter to
swiotlb_tbl_map_single and swiotlb_tbl_unmap_single into an alloc_size
and a mapping_size parameter, where the latter one is rounded up to the
iommu page size.  Below is an untested patch on top of your series to
show what I mean.


Good suggestion. The only problem as far as I can see is:

442 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
443 dma_addr_t tbl_dma_addr, phys_addr_t orig_addr,
444 size_t mapping_size, size_t alloc_size,
445 enum dma_data_direction dir, unsigned long attrs)
446 {
447 unsigned long flags;
448 phys_addr_t tlb_addr;
449 unsigned int nslots, stride, index, wrap;
450 int i;
451 unsigned long mask;
452 unsigned long offset_slots;
453 unsigned long max_slots;

[ ... snip ... ]

545 found:
546 io_tlb_used += nslots;
547 spin_unlock_irqrestore(&io_tlb_lock, flags);
548
549         /*
550          * Save away the mapping from the original address to the DMA address.
551          * This is needed when we sync the memory.  Then we sync the buffer if
552          * needed.
553          */
554         for (i = 0; i < nslots; i++)
555                 io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);


Could the tlb orig address be set to PAGE_ALIGN_DOWN(orig_addr)? We
can't assume the bounce buffer just starts at the beginning of the
slot. Or is there anything I missed?


556 if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
557 (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
558 swiotlb_bounce(orig_addr, tlb_addr, mapping_size,
559 DMA_TO_DEVICE);

Same here. We should sync from the place where the bounce buffer
starts.


560
561 return tlb_addr;
562 }



That being said - both the current series and the one
with my patch will still leak the content of the swiotlb buffer
allocated but not used to the untrusted external device.  Is that
acceptable?  If not we need to clear that part, at which point you don't
need swiotlb changes.


Good catch. I think the allocated buffer should be cleared, otherwise,
the untrusted device still has a chance to access the data belonging to
other swiotlb consumers.
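
(A minimal sketch of the kind of clearing meant here, assuming the
mapping_size/alloc_size split from Christoph's draft above; illustrative
only, not the merged fix:

	/* After the slots are reserved in swiotlb_tbl_map_single(): zero the
	 * slack between the real mapping and the full (page-aligned)
	 * allocation, so the untrusted device cannot read stale data left
	 * behind by earlier swiotlb users. */
	if (alloc_size > mapping_size)
		memset(phys_to_virt(tlb_addr) + mapping_size, 0,
		       alloc_size - mapping_size);
)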


 Another implication is that for untrusted devices
the size of the dma coherent allocations needs to be rounded up to the
iommu page size (if that can ever be larger than the host page size).


Agreed.

The Intel IOMMU driver already aligns the DMA coherent allocation size
to PAGE_SIZE, and alloc_coherent is essentially alloc plus map. Hence, it
eventually goes through bounce_map().

Best regards,
Lu Baolu



diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c4a078fb041..eb5c32ad4443 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2151,10 +2151,13 @@ static int bounce_map(struct device *dev, struct iommu_domain *domain,
 			 void *data)
 {
 	const struct iommu_ops *ops = domain->ops;
+	unsigned long page_size = domain_minimal_pgsize(domain);
 	phys_addr_t tlb_addr;
 	int prot = 0;
 	int ret;
 
+	if (WARN_ON_ONCE(size > page_size))
+		return -EINVAL;
 	if (unlikely(!ops->map || domain->pgsize_bitmap == 0UL))
 		return -ENODEV;
 
@@ -2164,16 +2167,16 @@ static int bounce_map(struct device *dev, struct iommu_domain *domain,
 	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
 		prot |= IOMMU_WRITE;
 
-	tlb_addr = phys_to_dma(dev, paddr);
-	if (!swiotlb_map(dev, &paddr, &tlb_addr, size,
-			 dir, attrs | DMA_ATTR_BOUNCE_PAGE))
+	tlb_addr = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
+			paddr, size, page_size, dir, attrs);
+	if (tlb_addr == DMA_MAPPING_ERROR)
 		return -ENOMEM;
 
 	ret = ops->map(domain, addr, tlb_addr, size, prot);
-	if (ret)
-		swiotlb_tbl_unmap_single(dev, tlb_addr, size,
-					 dir, attrs | DMA_ATTR_BOUNCE_PAGE);
-
+	if (ret) {
+		swiotlb_tbl_unmap_single(dev, tlb_addr, size, page_size,
+					 dir, attrs);
+	}
 	return ret;
 }
 
@@ -2194,11 +2197,12 @@ static int bounce_unmap(struct device *dev, struct iommu_domain *domain,
 
 	if (unlikely(!ops->unmap))
 		return -ENODEV;
-	ops->unmap(domain, ALIGN_DOWN(addr, page_size), page_size);
+	ops->unmap(domain, addr, page_size);
 
-	if (tlb_addr)
-		swiotlb_tbl_unmap_single(dev, tlb_addr, size,
-					 dir, attrs | DMA_ATTR_BOUNCE_PAGE);
+	if (tlb_addr) {
+  

Re: [PATCH v3 07/10] iommu/vt-d: Keep swiotlb on if bounce page is necessary

2019-04-22 Thread Lu Baolu

Hi,

On 4/23/19 12:47 AM, Christoph Hellwig wrote:

On Sun, Apr 21, 2019 at 09:17:16AM +0800, Lu Baolu wrote:

+static inline bool platform_has_untrusted_device(void)
 {
+	bool has_untrusted_device = false;
 	struct pci_dev *pdev = NULL;
 
 	for_each_pci_dev(pdev) {
 		if (pdev->untrusted) {
+			has_untrusted_device = true;
 			break;
 		}
 	}
 
+	return has_untrusted_device;


This shouldn't really be in the intel-iommu driver, should it?
This probably should be something like pci_has_untrusted_devices
and be moved to the PCI code.
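
(A minimal sketch of what such a PCI-level helper could look like;
pci_has_untrusted_devices() is hypothetical, not an existing API:

	/* Hypothetical helper, e.g. in drivers/pci/search.c: report whether
	 * any device in the system is marked untrusted. */
	bool pci_has_untrusted_devices(void)
	{
		struct pci_dev *pdev = NULL;

		for_each_pci_dev(pdev) {
			if (pdev->untrusted) {
				pci_dev_put(pdev);
				return true;
			}
		}

		return false;
	}
)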



Fair enough.

Best regards,
Lu Baolu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-22 Thread Kees Cook via iommu
On Thu, Apr 18, 2019 at 7:35 AM Khalid Aziz  wrote:
>
> On 4/17/19 11:41 PM, Kees Cook wrote:
> > On Wed, Apr 17, 2019 at 11:41 PM Andy Lutomirski  wrote:
> >> I don't think this type of NX goof was ever the argument for XPFO.
> >> The main argument I've heard is that a malicious user program writes a
> >> ROP payload into user memory (regular anonymous user memory) and then
> >> gets the kernel to erroneously set RSP (*not* RIP) to point there.
> >
> > Well, more than just ROP. Any of the various attack primitives. The NX
> > stuff is about moving RIP: SMEP-bypassing. But there is still basic
> > SMAP-bypassing for putting a malicious structure in userspace and
> > having the kernel access it via the linear mapping, etc.
> >
> >> I find this argument fairly weak for a couple reasons.  First, if
> >> we're worried about this, let's do in-kernel CFI, not XPFO, to
> >
> > CFI is getting much closer. Getting the kernel happy under Clang, LTO,
> > and CFI is under active development. (It's functional for arm64
> > already, and pieces have been getting upstreamed.)
> >
>
> CFI theoretically offers protection with fairly low overhead. I have not
> played much with CFI in clang. I agree with Linus that probability of
> bugs in XPFO implementation itself is a cause of concern. If CFI in
> Clang can provide us the same level of protection as XPFO does, I
> wouldn't want to push for an expensive change like XPFO.
>
> If Clang/CFI can't get us there for extended period of time, does it
> make sense to continue to poke at XPFO?

Well, I think CFI will certainly vastly narrow the execution paths
available to an attacker, but what I continue to see XPFO useful for
is stopping attacks that need to locate something in memory. (i.e. not
ret2dir but, like, read2dir.) It's arguable that such attacks would
just use heap, stack, etc to hold such things, but the linear map
remains relatively easy to find/target. But I agree: the protection is
getting more and more narrow (especially with CFI coming down the
pipe), and if it's still a 28% hit, that's not going to be tenable for
anyone but the truly paranoid. :)

All that said, there isn't a very good backward-edge CFI protection
(i.e. ROP defense) on x86 in Clang. The forward-edge looks decent, but
requires LTO, etc. My point is there is still a long path to gaining
CFI in upstream.

-- 
Kees Cook


[PATCH v4 09/10] NTB: Add MSI interrupt support to ntb_transport

2019-04-22 Thread Logan Gunthorpe
Introduce the module parameter 'use_msi' which, when set, uses
MSI interrupts instead of doorbells for each queue pair (QP). The
parameter is only available if NTB MSI support is configured into
the kernel. We also require there to be more than one memory window
(MW) so that an extra one is available to forward the APIC region.

To use MSIs, we request one interrupt per QP and forward the MSI address
and data to the peer using scratch pad registers (SPADS) above the MW
SPADS. (If there are not enough SPADS the MSI interrupt will not be used.)

Once registered, we simply use ntb_msi_peer_trigger and the receiving
ISR simply queues up the rxc_db_work for the queue.

This addition can significantly improve performance of ntb_transport.
In a simple, untuned, apples-to-apples comparison using ntb_netdev
and iperf with switchtec hardware, I see 3.88Gb/s without MSI
interrupts and 14.1Gb/s with MSI, which is a more than 3x improvement.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/ntb_transport.c | 169 +++-
 1 file changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index d4f39ba1d976..f1cf0942cb99 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -93,6 +93,12 @@ static bool use_dma;
 module_param(use_dma, bool, 0644);
 MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");
 
+static bool use_msi;
+#ifdef CONFIG_NTB_MSI
+module_param(use_msi, bool, 0644);
+MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
+#endif
+
 static struct dentry *nt_debugfs_dir;
 
 /* Only two-ports NTB devices are supported */
@@ -188,6 +194,11 @@ struct ntb_transport_qp {
u64 tx_err_no_buf;
u64 tx_memcpy;
u64 tx_async;
+
+   bool use_msi;
+   int msi_irq;
+   struct ntb_msi_desc msi_desc;
+   struct ntb_msi_desc peer_msi_desc;
 };
 
 struct ntb_transport_mw {
@@ -221,6 +232,10 @@ struct ntb_transport_ctx {
u64 qp_bitmap;
u64 qp_bitmap_free;
 
+   bool use_msi;
+   unsigned int msi_spad_offset;
+   u64 msi_db_mask;
+
bool link_is_up;
struct delayed_work link_work;
struct work_struct link_cleanup;
@@ -667,6 +682,114 @@ static int ntb_transport_setup_qp_mw(struct 
ntb_transport_ctx *nt,
return 0;
 }
 
+static irqreturn_t ntb_transport_isr(int irq, void *dev)
+{
+   struct ntb_transport_qp *qp = dev;
+
+   tasklet_schedule(&qp->rxc_db_work);
+
+   return IRQ_HANDLED;
+}
+
+static void ntb_transport_setup_qp_peer_msi(struct ntb_transport_ctx *nt,
+   unsigned int qp_num)
+{
+   struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+   int spad = qp_num * 2 + nt->msi_spad_offset;
+
+   if (!nt->use_msi)
+   return;
+
+   if (spad >= ntb_spad_count(nt->ndev))
+   return;
+
+   qp->peer_msi_desc.addr_offset =
+   ntb_peer_spad_read(qp->ndev, PIDX, spad);
+   qp->peer_msi_desc.data =
+   ntb_peer_spad_read(qp->ndev, PIDX, spad + 1);
+
+   dev_dbg(&qp->ndev->pdev->dev, "QP%d Peer MSI addr=%x data=%x\n",
+   qp_num, qp->peer_msi_desc.addr_offset, qp->peer_msi_desc.data);
+
+   if (qp->peer_msi_desc.addr_offset) {
+   qp->use_msi = true;
+   dev_info(&qp->ndev->pdev->dev,
+"Using MSI interrupts for QP%d\n", qp_num);
+   }
+}
+
+static void ntb_transport_setup_qp_msi(struct ntb_transport_ctx *nt,
+  unsigned int qp_num)
+{
+   struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+   int spad = qp_num * 2 + nt->msi_spad_offset;
+   int rc;
+
+   if (!nt->use_msi)
+   return;
+
+   if (spad >= ntb_spad_count(nt->ndev)) {
+   dev_warn_once(&qp->ndev->pdev->dev,
+ "Not enough SPADS to use MSI interrupts\n");
+   return;
+   }
+
+   ntb_spad_write(qp->ndev, spad, 0);
+   ntb_spad_write(qp->ndev, spad + 1, 0);
+
+   if (!qp->msi_irq) {
+   qp->msi_irq = ntbm_msi_request_irq(qp->ndev, ntb_transport_isr,
+  KBUILD_MODNAME, qp,
+  &qp->msi_desc);
+   if (qp->msi_irq < 0) {
+   dev_warn(&qp->ndev->pdev->dev,
+"Unable to allocate MSI interrupt for qp%d\n",
+qp_num);
+   return;
+   }
+   }
+
+   rc = ntb_spad_write(qp->ndev, spad, qp->msi_desc.addr_offset);
+   if (rc)
+   goto err_free_interrupt;
+
+   rc = ntb_spad_write(qp->ndev, spad + 1, qp->msi_desc.data);
+   if (rc)
+   goto err_free_interrupt;
+
+   dev_dbg(&qp->ndev->pdev->dev, "QP%d MSI %d addr=

[PATCH v4 10/10] NTB: Describe the ntb_msi_test client in the documentation.

2019-04-22 Thread Logan Gunthorpe
Add a blurb in Documentation/ntb.txt to describe the ntb_msi_test tool's
debugfs interface. Similar to the (out of date) ntb_tool description.

Signed-off-by: Logan Gunthorpe 
---
 Documentation/ntb.txt | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/ntb.txt b/Documentation/ntb.txt
index a043854d28df..802a539f1592 100644
--- a/Documentation/ntb.txt
+++ b/Documentation/ntb.txt
@@ -194,6 +194,33 @@ Debugfs Files:
This file is used to read and write peer scratchpads.  See
*spad* for details.
 
+NTB MSI Test Client (ntb\_msi\_test)
+
+
+The MSI test client serves to test and debug the MSI library which
+allows for passing MSI interrupts across NTB memory windows. The
+test client is interacted with through the debugfs filesystem:
+
+* *debugfs*/ntb\_tool/*hw*/
+   A directory in debugfs will be created for each
+   NTB device probed by the tool.  This directory is shortened to *hw*
+   below.
+* *hw*/port
+   This file describes the local port number
+* *hw*/irq*_occurrences
+   One occurrences file exists for each interrupt and, when read,
+   returns the number of times the interrupt has been triggered.
+* *hw*/peer*/port
+   This file describes the port number for each peer
+* *hw*/peer*/count
+   This file describes the number of interrupts that can be
+   triggered on each peer
+* *hw*/peer*/trigger
+   Writing an interrupt number (any number less than the value
+   specified in count) will trigger the interrupt on the
+   specified peer. That peer's interrupt's occurrence file
+   should be incremented.
+
 NTB Hardware Drivers
 
 
-- 
2.20.1



[PATCH v4 06/10] NTB: Introduce MSI library

2019-04-22 Thread Logan Gunthorpe
The NTB MSI library allows passing MSI interrupts across a memory
window. This offers similar functionality to doorbells or messages
except it will often have much better latency, and the client can
potentially use significantly more remote interrupts than typical hardware
provides for doorbells. (This can be important in high-multiport
setups.)

The library utilizes one memory window per peer and uses the highest
index memory windows. Before any ntb_msi function may be used, the user
must call ntb_msi_init(). It may then set up and tear down the memory
windows when the link state changes using ntb_msi_setup_mws() and
ntb_msi_clear_mws().

The peer which receives the interrupt must call ntbm_msi_request_irq()
to assign the interrupt handler (this function is functionally
similar to devm_request_irq()), and the returned descriptor must be
transferred to the peer which can use it to trigger the interrupt.
The triggering peer, once having received the descriptor, can
trigger the interrupt by calling ntb_msi_peer_trigger().
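
(A rough sketch of the client-side flow described above; hypothetical glue
code, assuming a two-peer setup and that the descriptors are exchanged out
of band, e.g. via scratchpads:

	static irqreturn_t demo_isr(int irq, void *ctx)
	{
		/* runs when the remote peer triggers our descriptor */
		return IRQ_HANDLED;
	}

	static int demo_msi_setup(struct ntb_dev *ntb)
	{
		struct ntb_msi_desc local_desc;
		struct ntb_msi_desc peer_desc = {};
		int irq, ret;

		ret = ntb_msi_init(ntb, NULL);	/* before any other ntb_msi call */
		if (ret)
			return ret;

		ret = ntb_msi_setup_mws(ntb);	/* once the link is up */
		if (ret)
			return ret;

		irq = ntbm_msi_request_irq(ntb, demo_isr, KBUILD_MODNAME,
					   ntb, &local_desc);
		if (irq < 0)
			return irq;

		/* publish local_desc to the peer and fill peer_desc from the
		 * peer's published values (scratchpads/messages, not shown) */

		/* the sending side then fires the remote interrupt: */
		return ntb_msi_peer_trigger(ntb, 0, &peer_desc);
	}
)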

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/Kconfig  |  11 ++
 drivers/ntb/Makefile |   3 +-
 drivers/ntb/msi.c| 415 +++
 include/linux/ntb.h  |  73 
 4 files changed, 501 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ntb/msi.c

diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index 95944e52fa36..5760764052be 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -12,6 +12,17 @@ menuconfig NTB
 
 if NTB
 
+config NTB_MSI
+   bool "MSI Interrupt Support"
+   depends on PCI_MSI
+   help
+Support using MSI interrupt forwarding instead of (or in addition to)
+hardware doorbells. MSI interrupts typically offer lower latency
+than doorbells and more MSI interrupts can be made available to
+clients. However this requires an extra memory window and support
+in the hardware driver for creating the MSI interrupts.
+
+If unsure, say N.
 source "drivers/ntb/hw/Kconfig"
 
 source "drivers/ntb/test/Kconfig"
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 537226f8e78d..cc27ad2ef150 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_NTB) += ntb.o hw/ test/
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
 
-ntb-y := core.o
+ntb-y  := core.o
+ntb-$(CONFIG_NTB_MSI)  += msi.o
diff --git a/drivers/ntb/msi.c b/drivers/ntb/msi.c
new file mode 100644
index ..9dddf133658f
--- /dev/null
+++ b/drivers/ntb/msi.c
@@ -0,0 +1,415 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION("0.1");
+MODULE_AUTHOR("Logan Gunthorpe ");
+MODULE_DESCRIPTION("NTB MSI Interrupt Library");
+
+struct ntb_msi {
+   u64 base_addr;
+   u64 end_addr;
+
+   void (*desc_changed)(void *ctx);
+
+   u32 __iomem *peer_mws[];
+};
+
+/**
+ * ntb_msi_init() - Initialize the MSI context
+ * @ntb:   NTB device context
+ *
+ * This function must be called before any other ntb_msi function.
+ * It initializes the context for MSI operations and maps
+ * the peer memory windows.
+ *
+ * This function reserves the last N outbound memory windows (where N
+ * is the number of peers).
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+int ntb_msi_init(struct ntb_dev *ntb,
+void (*desc_changed)(void *ctx))
+{
+   phys_addr_t mw_phys_addr;
+   resource_size_t mw_size;
+   size_t struct_size;
+   int peer_widx;
+   int peers;
+   int ret;
+   int i;
+
+   peers = ntb_peer_port_count(ntb);
+   if (peers <= 0)
+   return -EINVAL;
+
+   struct_size = sizeof(*ntb->msi) + sizeof(*ntb->msi->peer_mws) * peers;
+
+   ntb->msi = devm_kzalloc(&ntb->dev, struct_size, GFP_KERNEL);
+   if (!ntb->msi)
+   return -ENOMEM;
+
+   ntb->msi->desc_changed = desc_changed;
+
+   for (i = 0; i < peers; i++) {
+   peer_widx = ntb_peer_mw_count(ntb) - 1 - i;
+
+   ret = ntb_peer_mw_get_addr(ntb, peer_widx, &mw_phys_addr,
+  &mw_size);
+   if (ret)
+   goto unroll;
+
+   ntb->msi->peer_mws[i] = devm_ioremap(&ntb->dev, mw_phys_addr,
+mw_size);
+   if (!ntb->msi->peer_mws[i]) {
+   ret = -EFAULT;
+   goto unroll;
+   }
+   }
+
+   return 0;
+
+unroll:
+   for (i = 0; i < peers; i++)
+   if (ntb->msi->peer_mws[i])
+   devm_iounmap(&ntb->dev, ntb->msi->peer_mws[i]);
+
+   devm_kfree(&ntb->dev, ntb->msi);
+   ntb->msi = NULL;
+   return ret;
+}
+EXPORT_SYMBOL(ntb_msi_init);
+
+/**
+ * ntb_msi_setup_mws() - In

[PATCH v4 07/10] NTB: Introduce NTB MSI Test Client

2019-04-22 Thread Logan Gunthorpe
Introduce a tool to test NTB MSI interrupts similar to the other
NTB test tools. This tool creates a debugfs directory for each
NTB device with the following files:

port
irqX_occurrences
peerX/port
peerX/count
peerX/trigger

The 'port' file tells the user the local port number, and the
'occurrences' files report the number of local interrupts that
have been received for each interrupt.

For each peer, the 'port' file and the 'count' file give the
peer's port number and number of interrupts respectively. Writing
an interrupt number to the 'trigger' file triggers the interrupt
handler on that peer, which should increment its corresponding
'occurrences' file. The 'ready' file indicates whether a peer is
ready; writing to this file blocks until it is.

The module parameter num_irqs can be used to set the number of
local interrupts; by default this is 4. It is limited only by
the number of unused MSI interrupts registered by the hardware
(this requires support from the hardware driver), and there must
be at least 2*num_irqs + 1 scratchpad (SPAD) registers available.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/test/Kconfig|   9 +
 drivers/ntb/test/Makefile   |   1 +
 drivers/ntb/test/ntb_msi_test.c | 433 
 3 files changed, 443 insertions(+)
 create mode 100644 drivers/ntb/test/ntb_msi_test.c

diff --git a/drivers/ntb/test/Kconfig b/drivers/ntb/test/Kconfig
index a5d0eda44438..a3f3e2638935 100644
--- a/drivers/ntb/test/Kconfig
+++ b/drivers/ntb/test/Kconfig
@@ -25,3 +25,12 @@ config NTB_PERF
 to and from the window without additional software interaction.
 
 If unsure, say N.
+
+config NTB_MSI_TEST
+   tristate "NTB MSI Test Client"
+   depends on NTB_MSI
+   help
+ This tool demonstrates the use of the NTB MSI library to
+ send MSI interrupts between peers.
+
+ If unsure, say N.
diff --git a/drivers/ntb/test/Makefile b/drivers/ntb/test/Makefile
index 9e77e0b761c2..d2895ca995e4 100644
--- a/drivers/ntb/test/Makefile
+++ b/drivers/ntb/test/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_NTB_PINGPONG) += ntb_pingpong.o
 obj-$(CONFIG_NTB_TOOL) += ntb_tool.o
 obj-$(CONFIG_NTB_PERF) += ntb_perf.o
+obj-$(CONFIG_NTB_MSI_TEST) += ntb_msi_test.o
diff --git a/drivers/ntb/test/ntb_msi_test.c b/drivers/ntb/test/ntb_msi_test.c
new file mode 100644
index ..99d826ed9c34
--- /dev/null
+++ b/drivers/ntb/test/ntb_msi_test.c
@@ -0,0 +1,433 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION("0.1");
+MODULE_AUTHOR("Logan Gunthorpe ");
+MODULE_DESCRIPTION("Test for sending MSI interrupts over an NTB memory 
window");
+
+static int num_irqs = 4;
+module_param(num_irqs, int, 0644);
+MODULE_PARM_DESC(num_irqs, "number of irqs to use");
+
+struct ntb_msit_ctx {
+   struct ntb_dev *ntb;
+   struct dentry *dbgfs_dir;
+   struct work_struct setup_work;
+
+   struct ntb_msit_isr_ctx {
+   int irq_idx;
+   int irq_num;
+   int occurrences;
+   struct ntb_msit_ctx *nm;
+   struct ntb_msi_desc desc;
+   } *isr_ctx;
+
+   struct ntb_msit_peer {
+   struct ntb_msit_ctx *nm;
+   int pidx;
+   int num_irqs;
+   struct completion init_comp;
+   struct ntb_msi_desc *msi_desc;
+   } peers[];
+};
+
+static struct dentry *ntb_msit_dbgfs_topdir;
+
+static irqreturn_t ntb_msit_isr(int irq, void *dev)
+{
+   struct ntb_msit_isr_ctx *isr_ctx = dev;
+   struct ntb_msit_ctx *nm = isr_ctx->nm;
+
+   dev_dbg(&nm->ntb->dev, "Interrupt Occurred: %d",
+   isr_ctx->irq_idx);
+
+   isr_ctx->occurrences++;
+
+   return IRQ_HANDLED;
+}
+
+static void ntb_msit_setup_work(struct work_struct *work)
+{
+   struct ntb_msit_ctx *nm = container_of(work, struct ntb_msit_ctx,
+  setup_work);
+   int irq_count = 0;
+   int irq;
+   int ret;
+   uintptr_t i;
+
+   ret = ntb_msi_setup_mws(nm->ntb);
+   if (ret) {
+   dev_err(&nm->ntb->dev, "Unable to setup MSI windows: %d\n",
+   ret);
+   return;
+   }
+
+   for (i = 0; i < num_irqs; i++) {
+   nm->isr_ctx[i].irq_idx = i;
+   nm->isr_ctx[i].nm = nm;
+
+   if (!nm->isr_ctx[i].irq_num) {
+   irq = ntbm_msi_request_irq(nm->ntb, ntb_msit_isr,
+  KBUILD_MODNAME,
+  &nm->isr_ctx[i],
+  &nm->isr_ctx[i].desc);
+   if (irq < 0)
+   break;
+
+   nm->isr_ctx[i].irq_num = irq;
+   }
+
+ 

[PATCH v4 03/10] NTB: Introduce helper functions to calculate logical port number

2019-04-22 Thread Logan Gunthorpe
This patch introduces the "Logical Port Number" which is similar to the
"Port Number" in that it enumerates the ports in the system.

The original (or Physical) "Port Number" can be any number used by the
hardware to uniquely identify a port in the system. The "Logical Port
Number" enumerates all ports in the system from 0 to the number of
ports minus one.

For example, a system with 5 ports might have the following port numbers,
which would be enumerated as follows:

Port Number:   1  2  5  7  116
Logical Port Number:   0  1  2  3  4

The logical port number is useful when calculating which resources
to use for which peers. We thus define two helper functions:
ntb_logical_port_number() and ntb_peer_logical_port_number() which
provide the "Logical Port Number" for the local port and any peer
respectively.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
Cc: Serge Semin 
---
 include/linux/ntb.h | 53 -
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 56a92e3ae3ae..91cf492b16a0 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -616,7 +616,6 @@ static inline int ntb_port_number(struct ntb_dev *ntb)
 
return ntb->ops->port_number(ntb);
 }
-
 /**
  * ntb_peer_port_count() - get the number of peer device ports
  * @ntb:   NTB device context.
@@ -653,6 +652,58 @@ static inline int ntb_peer_port_number(struct ntb_dev 
*ntb, int pidx)
return ntb->ops->peer_port_number(ntb, pidx);
 }
 
+/**
+ * ntb_logical_port_number() - get the logical port number of the local port
+ * @ntb:   NTB device context.
+ *
+ * The Logical Port Number is defined to be a unique number for each
+ * port starting from zero through to the number of ports minus one.
+ * This is in contrast to the Port Number where each port can be assigned
+ * any unique physical number by the hardware.
+ *
+ * The logical port number is useful for calculating the resource indexes
+ * used by peers.
+ *
+ * Return: the logical port number or negative value indicating an error
+ */
+static inline int ntb_logical_port_number(struct ntb_dev *ntb)
+{
+   int lport = ntb_port_number(ntb);
+   int pidx;
+
+   if (lport < 0)
+   return lport;
+
+   for (pidx = 0; pidx < ntb_peer_port_count(ntb); pidx++)
+   if (lport <= ntb_peer_port_number(ntb, pidx))
+   return pidx;
+
+   return pidx;
+}
+
+/**
+ * ntb_peer_logical_port_number() - get the logical peer port by given index
+ * @ntb:   NTB device context.
+ * @pidx:  Peer port index.
+ *
+ * The Logical Port Number is defined to be a unique number for each
+ * port starting from zero through to the number of ports minus one.
+ * This is in contrast to the Port Number where each port can be assigned
+ * any unique physical number by the hardware.
+ *
+ * The logical port number is useful for calculating the resource indexes
+ * used by peers.
+ *
+ * Return: the peer's logical port number or negative value indicating an error
+ */
+static inline int ntb_peer_logical_port_number(struct ntb_dev *ntb, int pidx)
+{
+   if (ntb_peer_port_number(ntb, pidx) < ntb_port_number(ntb))
+   return pidx;
+   else
+   return pidx + 1;
+}
+
 /**
  * ntb_peer_port_idx() - get the peer device port index by given port number
  * @ntb:   NTB device context.
-- 
2.20.1



[PATCH v4 00/10] Support using MSI interrupts in ntb_transport

2019-04-22 Thread Logan Gunthorpe
This is version 4 of the MSI interrupts for ntb_transport patchset.
It's mostly a resend of v3 but with spelling and grammar fixes found by Bjorn.

I've addressed the feedback so far and rebased on the latest kernel
and would like this to be considered for merging this cycle.

The only outstanding issue I know of is that it still will not work
with IDT hardware, but ntb_transport doesn't work with IDT hardware
and there is still no sensible common infrastructure to support
ntb_peer_mw_set_trans(). Thus, I decline to consider that complication
in this patchset. However, I'll be happy to review work that adds this
feature in the future.

Also, as the port number and resource index stuff is a bit complicated,
I made a quick out of tree test fixture to ensure it's correct[1]. As
an exercise I also wrote some test code[2] using the upcoming KUnit
feature.

Logan

[1] https://repl.it/repls/ExcitingPresentFile
[2] https://github.com/sbates130272/linux-p2pmem/commits/ntb_kunit

--

Changes in v4:

* Rebased onto v5.1-rc6 (No changes)

* Numerous grammar and spelling mistakes spotted by Bjorn

--

Changes in v3:

* Rebased onto v5.1-rc1 (Dropped the first two patches as they have
  been merged, and cleaned up some minor conflicts in the PCI tree)

* Added a new patch (#3) to calculate logical port numbers that
  are port numbers from 0 to (number of ports - 1). This is
  then used in ntb_peer_resource_idx() to fix the issues brought
  up by Serge.

* Fixed missing __iomem and iowrite calls (as noticed by Serge)

* Added patch 10 which describes ntb_msi_test in the documentation
  file (as requested by Serge)

* A couple other minor nits and documentation fixes

--

Changes in v2:

* Cleaned up the changes in intel_irq_remapping.c to make them
  less confusing and add a comment. (Per discussion with Jacob and
  Joerg)

* Fixed a nit from Bjorn and collected his Ack

* Added a Kconfig dependency on CONFIG_PCI_MSI for CONFIG_NTB_MSI
  as the Kbuild robot hit a random config that didn't build
  without it.

* Worked in a callback for when the MSI descriptor changes so that
  the clients can resend the new address and data values to the peer.
  On my test system this was never necessary, but there may be
  other platforms where this can occur. I tested this by hacking
  in a path to rewrite the MSI descriptor when I change the cpu
  affinity of an IRQ. There's a bit of uncertainty over the latency
  of the change, but without hardware on which this can actually occur
  we can't test this. This was the result of a discussion with Dave.

--

This patch series adds optional support for using MSI interrupts instead
of NTB doorbells in ntb_transport. This is desirable seeing that doorbells on
current hardware are quite slow and therefore switching to MSI interrupts
provides a significant performance gain. On switchtec hardware, a simple
apples-to-apples comparison shows ntb_netdev/iperf numbers going from
3.88Gb/s to 14.1Gb/s when switching to MSI interrupts.

To do this, a couple changes are required outside of the NTB tree:

1) The IOMMU must know to accept MSI requests from aliased bus numbers,
seeing that NTB hardware typically sends proxied request IDs through
additional requester IDs. The first patch in this series adds support
for the Intel IOMMU. A quirk to add these aliases for switchtec hardware
was already accepted. See commit ad281ecf1c7d ("PCI: Add DMA alias quirk
for Microsemi Switchtec NTB") for a description of NTB proxy IDs and why
this is necessary.

2) NTB transport (and other clients) may often need more MSI interrupts
than the NTB hardware actually advertises support for. However, seeing
that these interrupts will not be triggered by the hardware but through
an NTB memory window, the hardware does not actually need to support
them or know about them. Therefore we add the concept of Virtual MSI
interrupts which are allocated just like any other MSI interrupt but
are not programmed into the hardware's MSI table. This is done in
Patch 2 and then made use of in Patch 3.

The remaining patches in this series add a library for dealing with MSI
interrupts, a test client and finally support in ntb_transport.

The series is based off of v5.1-rc6 plus the patches in ntb-next.
A git repo is available here:

https://github.com/sbates130272/linux-p2pmem/ ntb_transport_msi_v4

Thanks,

Logan

--

Logan Gunthorpe (10):
  PCI/MSI: Support allocating virtual MSI interrupts
  PCI/switchtec: Add module parameter to request more interrupts
  NTB: Introduce helper functions to calculate logical port number
  NTB: Introduce functions to calculate multi-port resource index
  NTB: Rename ntb.c to support multiple source files in the module
  NTB: Introduce MSI library
  NTB: Introduce NTB MSI Test Client
  NTB: Add ntb_msi_test support to ntb_test
  NTB: Add MSI interrupt support to ntb_transport
  NTB: Describe the ntb_msi_test client in the documentation.

 Documentation/ntb.txt   |  27 ++
 drivers/ntb/Kconfig  

[PATCH v4 05/10] NTB: Rename ntb.c to support multiple source files in the module

2019-04-22 Thread Logan Gunthorpe
The kbuild system does not support having multiple source files in
a module if one of those source files has the same name as the module.

Therefore, we must rename ntb.c to core.c, while the module remains
ntb.ko.

This is similar to the way the nvme modules are structured.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/Makefile  | 2 ++
 drivers/ntb/{ntb.c => core.c} | 0
 2 files changed, 2 insertions(+)
 rename drivers/ntb/{ntb.c => core.c} (100%)

diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 1921dec1949d..537226f8e78d 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_NTB) += ntb.o hw/ test/
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
+
+ntb-y := core.o
diff --git a/drivers/ntb/ntb.c b/drivers/ntb/core.c
similarity index 100%
rename from drivers/ntb/ntb.c
rename to drivers/ntb/core.c
-- 
2.20.1



[PATCH v4 02/10] PCI/switchtec: Add module parameter to request more interrupts

2019-04-22 Thread Logan Gunthorpe
Seeing that we want to use more interrupts in the NTB MSI code,
we need to be able to allocate more (sometimes virtual) interrupts
in the switchtec driver. Therefore, add a module parameter to
request the allocation of additional interrupts.

This puts virtually no limit on the number of MSI interrupts available
to NTB clients.

Signed-off-by: Logan Gunthorpe 
Cc: Bjorn Helgaas 
---
 drivers/pci/switch/switchtec.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index e22766c79fe9..8b1db78197d9 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -30,6 +30,10 @@ module_param(use_dma_mrpc, bool, 0644);
 MODULE_PARM_DESC(use_dma_mrpc,
 "Enable the use of the DMA MRPC feature");
 
+static int nirqs = 32;
+module_param(nirqs, int, 0644);
+MODULE_PARM_DESC(nirqs, "number of interrupts to allocate (more may be useful 
for NTB applications)");
+
 static dev_t switchtec_devt;
 static DEFINE_IDA(switchtec_minor_ida);
 
@@ -1247,8 +1251,12 @@ static int switchtec_init_isr(struct switchtec_dev 
*stdev)
int dma_mrpc_irq;
int rc;
 
-   nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, 4,
- PCI_IRQ_MSIX | PCI_IRQ_MSI);
+   if (nirqs < 4)
+   nirqs = 4;
+
+   nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, nirqs,
+ PCI_IRQ_MSIX | PCI_IRQ_MSI |
+ PCI_IRQ_VIRTUAL);
if (nvecs < 0)
return nvecs;
 
-- 
2.20.1



[PATCH v4 04/10] NTB: Introduce functions to calculate multi-port resource index

2019-04-22 Thread Logan Gunthorpe
When using multi-ports each port uses resources (dbs, msgs, mws, etc)
on every other port. Creating a mapping for these resources such that
each port has a corresponding resource on every other port is a bit
tricky.

Introduce the ntb_peer_resource_idx() function for this purpose.
It returns the peer resource number that will correspond with the
local peer index on the remote peer.

Also, introduce ntb_peer_highest_mw_idx() which will use
ntb_peer_resource_idx() but return the MW index starting with the
highest index and working down.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 include/linux/ntb.h | 70 +
 1 file changed, 70 insertions(+)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 91cf492b16a0..66552830544b 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -1557,4 +1557,74 @@ static inline int ntb_peer_msg_write(struct ntb_dev 
*ntb, int pidx, int midx,
return ntb->ops->peer_msg_write(ntb, pidx, midx, msg);
 }
 
+/**
+ * ntb_peer_resource_idx() - get a resource index for a given peer idx
+ * @ntb:   NTB device context.
+ * @pidx:  Peer port index.
+ *
+ * When constructing a graph of peers, each remote peer must use a different
+ * resource index (mw, doorbell, etc) to communicate with each other
+ * peer.
+ *
+ * In a two peer system, this function should always return 0 such that
+ * resource 0 points to the remote peer on both ports.
+ *
+ * In a 5 peer system, this function will return the following matrix
+ *
+ * pidx \ port01234
+ * 0  00123
+ * 1  01123
+ * 2  01223
+ * 3  01233
+ *
+ * For example, if this function is used to program peer's memory
+ * windows, port 0 will program MW 0 on all it's peers to point to itself.
+ * port 1 will program MW 0 in port 0 to point to itself and MW 1 on all
+ * other ports. etc.
+ *
+ * For the legacy two host case, ntb_port_number() and ntb_peer_port_number()
+ * both return zero and therefore this function will always return zero.
+ * So MW 0 on each host would be programmed to point to the other host.
+ *
+ * Return: the resource index to use for that peer.
+ */
+static inline int ntb_peer_resource_idx(struct ntb_dev *ntb, int pidx)
+{
+   int local_port, peer_port;
+
+   if (pidx >= ntb_peer_port_count(ntb))
+   return -EINVAL;
+
+   local_port = ntb_logical_port_number(ntb);
+   peer_port = ntb_peer_logical_port_number(ntb, pidx);
+
+   if (peer_port < local_port)
+   return local_port - 1;
+   else
+   return local_port;
+}
+
+/**
+ * ntb_peer_highest_mw_idx() - get a memory window index for a given peer idx
+ * using the highest index memory windows first
+ *
+ * @ntb:   NTB device context.
+ * @pidx:  Peer port index.
+ *
+ * Like ntb_peer_resource_idx(), except it returns indexes starting with
+ * last memory window index.
+ *
+ * Return: the resource index to use for that peer.
+ */
+static inline int ntb_peer_highest_mw_idx(struct ntb_dev *ntb, int pidx)
+{
+   int ret;
+
+   ret = ntb_peer_resource_idx(ntb, pidx);
+   if (ret < 0)
+   return ret;
+
+   return ntb_mw_count(ntb, pidx) - ret - 1;
+}
+
 #endif
-- 
2.20.1



[PATCH v4 01/10] PCI/MSI: Support allocating virtual MSI interrupts

2019-04-22 Thread Logan Gunthorpe
For NTB devices, we want to be able to trigger MSI interrupts
through a memory window. In these cases we may want to use
more interrupts than the NTB PCI device has available in its MSI-X
table.

We allow for this by creating a new 'virtual' interrupt. These
interrupts are allocated as usual but are not programmed into the
MSI-X table (as there may not be space for them).

The MSI address and data will then be handled through an NTB MSI library
introduced later in this series.
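
(A minimal sketch of how a driver could opt in to virtual vectors, mirroring
the switchtec change later in this series; the vector count here is purely
illustrative, and PCI_IRQ_VIRTUAL is the flag added by this patch:

	/* Ask for up to 32 vectors; entries beyond the device's MSI-X table
	 * size are allocated as "virtual" vectors and are never programmed
	 * into the hardware table. */
	nvecs = pci_alloc_irq_vectors(pdev, 1, 32,
				      PCI_IRQ_MSIX | PCI_IRQ_MSI |
				      PCI_IRQ_VIRTUAL);
	if (nvecs < 0)
		return nvecs;
)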

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/msi.c   | 54 +
 include/linux/msi.h |  8 +++
 include/linux/pci.h |  9 
 3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 73986825d221..668bc16ef4d1 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -192,6 +192,9 @@ static void msi_mask_irq(struct msi_desc *desc, u32 mask, 
u32 flag)
 
 static void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
 {
+   if (desc->msi_attrib.is_virtual)
+   return NULL;
+
return desc->mask_base +
desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
 }
@@ -206,14 +209,19 @@ static void __iomem *pci_msix_desc_addr(struct msi_desc 
*desc)
 u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
 {
u32 mask_bits = desc->masked;
+   void __iomem *desc_addr;
 
if (pci_msi_ignore_mask)
return 0;
+   desc_addr = pci_msix_desc_addr(desc);
+   if (!desc_addr)
+   return 0;
 
mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
if (flag)
mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
-   writel(mask_bits, pci_msix_desc_addr(desc) + 
PCI_MSIX_ENTRY_VECTOR_CTRL);
+
+   writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 
return mask_bits;
 }
@@ -273,6 +281,11 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct 
msi_msg *msg)
if (entry->msi_attrib.is_msix) {
void __iomem *base = pci_msix_desc_addr(entry);
 
+   if (!base) {
+   WARN_ON(1);
+   return;
+   }
+
msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
@@ -303,6 +316,9 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct 
msi_msg *msg)
} else if (entry->msi_attrib.is_msix) {
void __iomem *base = pci_msix_desc_addr(entry);
 
+   if (!base)
+   goto skip;
+
writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
@@ -327,7 +343,13 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct 
msi_msg *msg)
  msg->data);
}
}
+
+skip:
entry->msg = *msg;
+
+   if (entry->write_msi_msg)
+   entry->write_msi_msg(entry, entry->write_msi_msg_data);
+
 }
 
 void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg)
@@ -550,6 +572,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct 
irq_affinity *affd)
 
entry->msi_attrib.is_msix   = 0;
entry->msi_attrib.is_64 = !!(control & PCI_MSI_FLAGS_64BIT);
+   entry->msi_attrib.is_virtual= 0;
entry->msi_attrib.entry_nr  = 0;
entry->msi_attrib.maskbit   = !!(control & PCI_MSI_FLAGS_MASKBIT);
entry->msi_attrib.default_irq   = dev->irq; /* Save IOAPIC IRQ */
@@ -674,6 +697,7 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
struct irq_affinity_desc *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;
+   int vec_count = pci_msix_vec_count(dev);
 
if (affd)
masks = irq_create_affinity_masks(nvec, affd);
@@ -696,6 +720,10 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
entry->msi_attrib.entry_nr = entries[i].entry;
else
entry->msi_attrib.entry_nr = i;
+
+   entry->msi_attrib.is_virtual =
+   entry->msi_attrib.entry_nr >= vec_count;
+
entry->msi_attrib.default_irq   = dev->irq;
entry->mask_base= base;
 
@@ -714,12 +742,19 @@ static void msix_program_entries(struct pci_dev *dev,
 {
struct msi_desc *entry;
int i = 0;
+   void __iomem *desc_addr;
 
for_each_pci_msi_entry(entry, dev) {
if (entries)
entries[i++].vector = entry->irq;
-   entry->masked = readl(pci_msix_desc_addr(entry) +
-   PCI_MSIX_ENTRY_VECTOR_CTRL);
+
+

[PATCH v4 08/10] NTB: Add ntb_msi_test support to ntb_test

2019-04-22 Thread Logan Gunthorpe
When the ntb_msi_test module is available, the test code will trigger
each of the interrupts and ensure the corresponding occurrences files
gets incremented.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 tools/testing/selftests/ntb/ntb_test.sh | 54 -
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/ntb/ntb_test.sh 
b/tools/testing/selftests/ntb/ntb_test.sh
index 17ca36403d04..1a10b8f67727 100755
--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@@ -87,10 +87,10 @@ set -e
 
 function _modprobe()
 {
-   modprobe "$@"
+   modprobe "$@" || return 1
 
if [[ "$REMOTE_HOST" != "" ]]; then
-   ssh "$REMOTE_HOST" modprobe "$@"
+   ssh "$REMOTE_HOST" modprobe "$@" || return 1
fi
 }
 
@@ -451,6 +451,30 @@ function pingpong_test()
echo "  Passed"
 }
 
+function msi_test()
+{
+   LOC=$1
+   REM=$2
+
+   write_file 1 $LOC/ready
+
+   echo "Running MSI interrupt tests on: $(subdirname $LOC) / $(subdirname 
$REM)"
+
+   CNT=$(read_file "$LOC/count")
+   for ((i = 0; i < $CNT; i++)); do
+   START=$(read_file $REM/../irq${i}_occurrences)
+   write_file $i $LOC/trigger
+   END=$(read_file $REM/../irq${i}_occurrences)
+
+   if [[ $(($END - $START)) != 1 ]]; then
+   echo "MSI did not trigger the interrupt on the remote 
side!" >&2
+   exit 1
+   fi
+   done
+
+   echo "  Passed"
+}
+
 function perf_test()
 {
USE_DMA=$1
@@ -529,6 +553,29 @@ function ntb_pingpong_tests()
_modprobe -r ntb_pingpong
 }
 
+function ntb_msi_tests()
+{
+   LOCAL_MSI="$DEBUGFS/ntb_msi_test/$LOCAL_DEV"
+   REMOTE_MSI="$REMOTE_HOST:$DEBUGFS/ntb_msi_test/$REMOTE_DEV"
+
+   echo "Starting ntb_msi_test tests..."
+
+   if ! _modprobe ntb_msi_test 2> /dev/null; then
+   echo "  Not doing MSI tests seeing the module is not available."
+   return
+   fi
+
+   port_test $LOCAL_MSI $REMOTE_MSI
+
+   LOCAL_PEER="$LOCAL_MSI/peer$LOCAL_PIDX"
+   REMOTE_PEER="$REMOTE_MSI/peer$REMOTE_PIDX"
+
+   msi_test $LOCAL_PEER $REMOTE_PEER
+   msi_test $REMOTE_PEER $LOCAL_PEER
+
+   _modprobe -r ntb_msi_test
+}
+
 function ntb_perf_tests()
 {
LOCAL_PERF="$DEBUGFS/ntb_perf/$LOCAL_DEV"
@@ -550,6 +597,7 @@ function cleanup()
_modprobe -r ntb_perf 2> /dev/null
_modprobe -r ntb_pingpong 2> /dev/null
_modprobe -r ntb_transport 2> /dev/null
+   _modprobe -r ntb_msi_test 2> /dev/null
set -e
 }
 
@@ -586,5 +634,7 @@ ntb_tool_tests
 echo
 ntb_pingpong_tests
 echo
+ntb_msi_tests
+echo
 ntb_perf_tests
 echo
-- 
2.20.1



[PATCH AUTOSEL 4.14 41/43] iommu/amd: Reserve exclusion range in iova-domain

2019-04-22 Thread Sasha Levin
From: Joerg Roedel 

[ Upstream commit 8aafaaf2212192012f5bae305bb31cdf7681d777 ]

If a device has an exclusion range specified in the IVRS
table, this region needs to be reserved in the iova-domain
of that device. This hasn't happened until now and can cause
data corruption on data transferred with these devices.

Treat exclusion ranges as reserved regions in the iommu-core
to fix the problem.

Fixes: be2a022c0dd0 ('x86, AMD IOMMU: add functions to parse IOMMU memory 
mapping requirements for devices')
Signed-off-by: Joerg Roedel 
Reviewed-by: Gary R Hook 
Signed-off-by: Sasha Levin (Microsoft) 
---
 drivers/iommu/amd_iommu.c   | 9 ++---
 drivers/iommu/amd_iommu_init.c  | 7 ---
 drivers/iommu/amd_iommu_types.h | 2 ++
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index bd339bfe0d15..684f7cdd814b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3127,21 +3127,24 @@ static void amd_iommu_get_resv_regions(struct device 
*dev,
return;
 
list_for_each_entry(entry, &amd_iommu_unity_map, list) {
+   int type, prot = 0;
size_t length;
-   int prot = 0;
 
if (devid < entry->devid_start || devid > entry->devid_end)
continue;
 
+   type   = IOMMU_RESV_DIRECT;
length = entry->address_end - entry->address_start;
if (entry->prot & IOMMU_PROT_IR)
prot |= IOMMU_READ;
if (entry->prot & IOMMU_PROT_IW)
prot |= IOMMU_WRITE;
+   if (entry->prot & IOMMU_UNITY_MAP_FLAG_EXCL_RANGE)
+   /* Exclusion range */
+   type = IOMMU_RESV_RESERVED;
 
region = iommu_alloc_resv_region(entry->address_start,
-length, prot,
-IOMMU_RESV_DIRECT);
+length, prot, type);
if (!region) {
pr_err("Out of memory allocating dm-regions for %s\n",
dev_name(dev));
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index b97984a5ddad..91d7718625a6 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1980,6 +1980,9 @@ static int __init init_unity_map_range(struct ivmd_header 
*m)
if (e == NULL)
return -ENOMEM;
 
+   if (m->flags & IVMD_FLAG_EXCL_RANGE)
+   init_exclusion_range(m);
+
switch (m->type) {
default:
kfree(e);
@@ -2026,9 +2029,7 @@ static int __init init_memory_definitions(struct 
acpi_table_header *table)
 
while (p < end) {
m = (struct ivmd_header *)p;
-   if (m->flags & IVMD_FLAG_EXCL_RANGE)
-   init_exclusion_range(m);
-   else if (m->flags & IVMD_FLAG_UNITY_MAP)
+   if (m->flags & (IVMD_FLAG_UNITY_MAP | IVMD_FLAG_EXCL_RANGE))
init_unity_map_range(m);
 
p += m->length;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f6b24c7d8b70..3054c0971759 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -369,6 +369,8 @@
 #define IOMMU_PROT_IR 0x01
 #define IOMMU_PROT_IW 0x02
 
+#define IOMMU_UNITY_MAP_FLAG_EXCL_RANGE(1 << 2)
+
 /* IOMMU capabilities */
 #define IOMMU_CAP_IOTLB   24
 #define IOMMU_CAP_NPCACHE 26
-- 
2.19.1



[PATCH AUTOSEL 4.19 65/68] iommu/amd: Reserve exclusion range in iova-domain

2019-04-22 Thread Sasha Levin
From: Joerg Roedel 

[ Upstream commit 8aafaaf2212192012f5bae305bb31cdf7681d777 ]

If a device has an exclusion range specified in the IVRS
table, this region needs to be reserved in the iova-domain
of that device. This hasn't happened until now and can cause
data corruption on data transferred with these devices.

Treat exclusion ranges as reserved regions in the iommu-core
to fix the problem.

Fixes: be2a022c0dd0 ('x86, AMD IOMMU: add functions to parse IOMMU memory 
mapping requirements for devices')
Signed-off-by: Joerg Roedel 
Reviewed-by: Gary R Hook 
Signed-off-by: Sasha Levin (Microsoft) 
---
 drivers/iommu/amd_iommu.c   | 9 ++---
 drivers/iommu/amd_iommu_init.c  | 7 ---
 drivers/iommu/amd_iommu_types.h | 2 ++
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 0b3877681e4a..8d9920ff4134 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3119,21 +3119,24 @@ static void amd_iommu_get_resv_regions(struct device 
*dev,
return;
 
list_for_each_entry(entry, &amd_iommu_unity_map, list) {
+   int type, prot = 0;
size_t length;
-   int prot = 0;
 
if (devid < entry->devid_start || devid > entry->devid_end)
continue;
 
+   type   = IOMMU_RESV_DIRECT;
length = entry->address_end - entry->address_start;
if (entry->prot & IOMMU_PROT_IR)
prot |= IOMMU_READ;
if (entry->prot & IOMMU_PROT_IW)
prot |= IOMMU_WRITE;
+   if (entry->prot & IOMMU_UNITY_MAP_FLAG_EXCL_RANGE)
+   /* Exclusion range */
+   type = IOMMU_RESV_RESERVED;
 
region = iommu_alloc_resv_region(entry->address_start,
-length, prot,
-IOMMU_RESV_DIRECT);
+length, prot, type);
if (!region) {
pr_err("Out of memory allocating dm-regions for %s\n",
dev_name(dev));
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index e062ab9687c7..be3801d43d48 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2001,6 +2001,9 @@ static int __init init_unity_map_range(struct ivmd_header 
*m)
if (e == NULL)
return -ENOMEM;
 
+   if (m->flags & IVMD_FLAG_EXCL_RANGE)
+   init_exclusion_range(m);
+
switch (m->type) {
default:
kfree(e);
@@ -2047,9 +2050,7 @@ static int __init init_memory_definitions(struct 
acpi_table_header *table)
 
while (p < end) {
m = (struct ivmd_header *)p;
-   if (m->flags & IVMD_FLAG_EXCL_RANGE)
-   init_exclusion_range(m);
-   else if (m->flags & IVMD_FLAG_UNITY_MAP)
+   if (m->flags & (IVMD_FLAG_UNITY_MAP | IVMD_FLAG_EXCL_RANGE))
init_unity_map_range(m);
 
p += m->length;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index e2b342e65a7b..69f3d4c95b53 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -373,6 +373,8 @@
 #define IOMMU_PROT_IR 0x01
 #define IOMMU_PROT_IW 0x02
 
+#define IOMMU_UNITY_MAP_FLAG_EXCL_RANGE(1 << 2)
+
 /* IOMMU capabilities */
 #define IOMMU_CAP_IOTLB   24
 #define IOMMU_CAP_NPCACHE 26
-- 
2.19.1



[PATCH AUTOSEL 5.0 94/98] iommu/amd: Reserve exclusion range in iova-domain

2019-04-22 Thread Sasha Levin
From: Joerg Roedel 

[ Upstream commit 8aafaaf2212192012f5bae305bb31cdf7681d777 ]

If a device has an exclusion range specified in the IVRS
table, this region needs to be reserved in the iova-domain
of that device. This hasn't happened until now and can cause
data corruption on data transferred with these devices.

Treat exclusion ranges as reserved regions in the iommu-core
to fix the problem.

Fixes: be2a022c0dd0 ('x86, AMD IOMMU: add functions to parse IOMMU memory 
mapping requirements for devices')
Signed-off-by: Joerg Roedel 
Reviewed-by: Gary R Hook 
Signed-off-by: Sasha Levin (Microsoft) 
---
 drivers/iommu/amd_iommu.c   | 9 ++---
 drivers/iommu/amd_iommu_init.c  | 7 ---
 drivers/iommu/amd_iommu_types.h | 2 ++
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e628ef23418f..55b3e4b9d5dc 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3166,21 +3166,24 @@ static void amd_iommu_get_resv_regions(struct device 
*dev,
return;
 
list_for_each_entry(entry, &amd_iommu_unity_map, list) {
+   int type, prot = 0;
size_t length;
-   int prot = 0;
 
if (devid < entry->devid_start || devid > entry->devid_end)
continue;
 
+   type   = IOMMU_RESV_DIRECT;
length = entry->address_end - entry->address_start;
if (entry->prot & IOMMU_PROT_IR)
prot |= IOMMU_READ;
if (entry->prot & IOMMU_PROT_IW)
prot |= IOMMU_WRITE;
+   if (entry->prot & IOMMU_UNITY_MAP_FLAG_EXCL_RANGE)
+   /* Exclusion range */
+   type = IOMMU_RESV_RESERVED;
 
region = iommu_alloc_resv_region(entry->address_start,
-length, prot,
-IOMMU_RESV_DIRECT);
+length, prot, type);
if (!region) {
pr_err("Out of memory allocating dm-regions for %s\n",
dev_name(dev));
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 66123b911ec8..84fa5b22371e 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2013,6 +2013,9 @@ static int __init init_unity_map_range(struct ivmd_header 
*m)
if (e == NULL)
return -ENOMEM;
 
+   if (m->flags & IVMD_FLAG_EXCL_RANGE)
+   init_exclusion_range(m);
+
switch (m->type) {
default:
kfree(e);
@@ -2059,9 +2062,7 @@ static int __init init_memory_definitions(struct 
acpi_table_header *table)
 
while (p < end) {
m = (struct ivmd_header *)p;
-   if (m->flags & IVMD_FLAG_EXCL_RANGE)
-   init_exclusion_range(m);
-   else if (m->flags & IVMD_FLAG_UNITY_MAP)
+   if (m->flags & (IVMD_FLAG_UNITY_MAP | IVMD_FLAG_EXCL_RANGE))
init_unity_map_range(m);
 
p += m->length;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index eae0741f72dc..87965e4d9647 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -374,6 +374,8 @@
 #define IOMMU_PROT_IR 0x01
 #define IOMMU_PROT_IW 0x02
 
+#define IOMMU_UNITY_MAP_FLAG_EXCL_RANGE(1 << 2)
+
 /* IOMMU capabilities */
 #define IOMMU_CAP_IOTLB   24
 #define IOMMU_CAP_NPCACHE 26
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-22 Thread Khalid Aziz
On 4/18/19 8:34 AM, Khalid Aziz wrote:
> On 4/17/19 11:41 PM, Kees Cook wrote:
>> On Wed, Apr 17, 2019 at 11:41 PM Andy Lutomirski  wrote:
>>> I don't think this type of NX goof was ever the argument for XPFO.
>>> The main argument I've heard is that a malicious user program writes a
>>> ROP payload into user memory (regular anonymous user memory) and then
>>> gets the kernel to erroneously set RSP (*not* RIP) to point there.
>>
>> Well, more than just ROP. Any of the various attack primitives. The NX
>> stuff is about moving RIP: SMEP-bypassing. But there is still basic
>> SMAP-bypassing for putting a malicious structure in userspace and
>> having the kernel access it via the linear mapping, etc.
>>
>>> I find this argument fairly weak for a couple reasons.  First, if
>>> we're worried about this, let's do in-kernel CFI, not XPFO, to
>>
>> CFI is getting much closer. Getting the kernel happy under Clang, LTO,
>> and CFI is under active development. (It's functional for arm64
>> already, and pieces have been getting upstreamed.)
>>
> 
> CFI theoretically offers protection with fairly low overhead. I have not
> played much with CFI in clang. I agree with Linus that probability of
> bugs in XPFO implementation itself is a cause of concern. If CFI in
> Clang can provide us the same level of protection as XPFO does, I
> wouldn't want to push for an expensive change like XPFO.
> 
> If Clang/CFI can't get us there for extended period of time, does it
> make sense to continue to poke at XPFO?

Any feedback on the continued effort on XPFO? If it makes sense to keep XPFO
available as a solution for the ret2dir issue in case Clang/CFI does not
work out, I will continue to refine it.

--
Khalid

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 26/26] arm64: trim includes in dma-mapping.c

2019-04-22 Thread Christoph Hellwig
With most of the previous functionality now elsewhere, a lot of the
headers included in this file are no longer needed.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 184ef9ccd69d..15bd768ceb7e 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -5,20 +5,9 @@
  */
 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
-
 #include 
 
 pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 25/26] arm64: switch copyright boilerplace to SPDX in dma-mapping.c

2019-04-22 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Acked-by: Robin Murphy 
Reviewed-by: Mukesh Ojha 
---
 arch/arm64/mm/dma-mapping.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index d1661f78eb4d..184ef9ccd69d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -1,20 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
- * SWIOTLB-based DMA API implementation
- *
  * Copyright (C) 2012 ARM Ltd.
  * Author: Catalin Marinas 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
  */
 
 #include 
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 24/26] iommu/dma: Switch copyright boilerplace to SPDX

2019-04-22 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Acked-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 13 +
 include/linux/dma-iommu.h | 13 +
 2 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 278a9a960107..b6bc342a8163 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * A fairly generic DMA-API to IOMMU-API glue layer.
  *
@@ -5,18 +6,6 @@
  *
  * based in part on arch/arm/mm/dma-mapping.c:
  * Copyright (C) 2000-2004 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
  */
 
 #include 
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index dadf4383f555..3fc76918e9bf 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -1,17 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Copyright (C) 2014-2015 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
  */
 #ifndef __DMA_IOMMU_H
 #define __DMA_IOMMU_H
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH] dma-mapping: create iommu mapping for newly allocated dma coherent mem

2019-04-22 Thread Christoph Hellwig
On Mon, Apr 22, 2019 at 07:51:25PM +0300, laurentiu.tu...@nxp.com wrote:
> From: Laurentiu Tudor 
> 
> If possible / available call into the DMA API to get a proper iommu
> mapping and a dma address for the newly allocated coherent dma memory.

I don't think this is so simple.  The original use case of
dma_declare_coherent_memory was memory that is local to a device, where
we copy in data through a MMIO mapping and the device can then access
it.  This use case still seems to be alive in the ohci-sm501 and
ohci-tmio drivers.  Going through the iommu in those cases would be
counter productive.

The other use cases, including the OF ones seem to be (and Marek who
added that support should correct me if I'm wrong) normal host DRAM
that is just set aside for the device.

So if we want to support these prealloc pools with an iommu we need to
split the use cases.  I have to say I really hate the way we do the DMA
"coherent" allocations, so I'm all in favor of that, it just hasn't
bubbled up towards the front of my todo list yet.

My high-level plan here would be:

 a) move the two OHCI drivers away from our current DMA declare
coherent code, and just use a trivial genalloc allocator for
their MMIO space, given that the USB layer already wraps the
dma_alloc/free APIs anyway (a rough sketch of this follows
below)
 b) move the rest of the DMA coherent stuff into the actual dma
map ops, similar to how we handle CMA allocations.  In fact
we should probably do this by a new page allocation helper
that tries CMA, "coherent" or the page allocator as fallback
 c) also merge the OF-side handling of CMA vs "coherent" into a
single implementation.  Preferably dropping the misleading
"coherent" name while we are at it.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-22 Thread Christoph Hellwig
On Thu, Apr 18, 2019 at 07:15:00PM +0100, Robin Murphy wrote:
> Still, I've worked in the vm_map_pages() stuff pending in MM and given them 
> the same treatment to finish the picture.

I had to drop this - given that the changes are only in -mm and not
an actual git tree we can't pull a tree for that in.  But on the plus
side we can trivially replace __iommu_dma_mmap with vm_map_pages at
any later point in time.
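
For reference, a minimal sketch of what that later replacement could look
like, assuming vm_map_pages() lands with the signature from the pending
series, i.e. int vm_map_pages(struct vm_area_struct *vma, struct page
**pages, unsigned long num); this is illustrative only, not the dropped
patch:

/*
 * Hedged sketch: __iommu_dma_mmap() reduced to a vm_map_pages() call.
 * vm_map_pages() applies vma->vm_pgoff and validates the range itself,
 * so the caller just passes the whole page array and its length.
 */
static int __iommu_dma_mmap(struct page **pages, size_t size,
			    struct vm_area_struct *vma)
{
	return vm_map_pages(vma, pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
}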
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 22/26] iommu/dma: Refactor iommu_dma_mmap

2019-04-22 Thread Christoph Hellwig
Inline __iommu_dma_mmap_pfn into the main function, and use the
fact that __iommu_dma_get_pages returns NULL for remapped contiguous
allocations to simplify the code flow a bit.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 36 +++-
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 138b85e675c8..8fc6098c1eeb 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1025,21 +1025,12 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
return cpu_addr;
 }
 
-static int __iommu_dma_mmap_pfn(struct vm_area_struct *vma,
- unsigned long pfn, size_t size)
-{
-   return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
-  vma->vm_end - vma->vm_start,
-  vma->vm_page_prot);
-}
-
 static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long off = vma->vm_pgoff;
-   struct page **pages;
+   unsigned long pfn, off = vma->vm_pgoff;
int ret;
 
vma->vm_page_prot = arch_dma_mmap_pgprot(dev, vma->vm_page_prot, attrs);
@@ -1050,24 +1041,19 @@ static int iommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
if (off >= nr_pages || vma_pages(vma) > nr_pages - off)
return -ENXIO;
 
-   if (!is_vmalloc_addr(cpu_addr)) {
-   unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
-   return __iommu_dma_mmap_pfn(vma, pfn, size);
-   }
+   if (is_vmalloc_addr(cpu_addr)) {
+   struct page **pages = __iommu_dma_get_pages(cpu_addr);
 
-   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   unsigned long pfn = vmalloc_to_pfn(cpu_addr);
-   return __iommu_dma_mmap_pfn(vma, pfn, size);
+   if (pages)
+   return __iommu_dma_mmap(pages, size, vma);
+   pfn = vmalloc_to_pfn(cpu_addr);
+   } else {
+   pfn = page_to_pfn(virt_to_page(cpu_addr));
}
 
-   pages = __iommu_dma_get_pages(cpu_addr);
-   if (WARN_ON_ONCE(!pages))
-   return -ENXIO;
-   return __iommu_dma_mmap(pages, size, vma);
+   return remap_pfn_range(vma, vma->vm_start, pfn + off,
+  vma->vm_end - vma->vm_start,
+  vma->vm_page_prot);
 }
 
 static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 23/26] iommu/dma: Don't depend on CONFIG_DMA_DIRECT_REMAP

2019-04-22 Thread Christoph Hellwig
For entirely dma coherent architectures there is no requirement to ever
remap dma coherent allocations.  Move all the remap and pool code under
IS_ENABLED() checks and drop the Kconfig dependency.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/Kconfig |  1 -
 drivers/iommu/dma-iommu.c | 16 +---
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index bdc14baf2ee5..6f07f3b21816 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -95,7 +95,6 @@ config IOMMU_DMA
select IOMMU_API
select IOMMU_IOVA
select NEED_SG_DMA_LENGTH
-   depends on DMA_DIRECT_REMAP
 
 config FSL_PAMU
bool "Freescale IOMMU support"
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8fc6098c1eeb..278a9a960107 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -923,10 +923,11 @@ static void __iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr)
struct page *page = NULL;
 
/* Non-coherent atomic allocation? Easy */
-   if (dma_free_from_pool(cpu_addr, alloc_size))
+   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+   dma_free_from_pool(cpu_addr, alloc_size))
return;
 
-   if (is_vmalloc_addr(cpu_addr)) {
+   if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
/*
 * If it the address is remapped, then it's either non-coherent
 * or highmem CMA, or an iommu_dma_alloc_remap() construction.
@@ -972,7 +973,7 @@ static void *iommu_dma_alloc_pages(struct device *dev, 
size_t size,
if (!page)
return NULL;
 
-   if (!coherent || PageHighMem(page)) {
+   if (IS_ENABLED(CONFIG_DMA_REMAP) && (!coherent || PageHighMem(page))) {
pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
 
cpu_addr = dma_common_contiguous_remap(page, alloc_size,
@@ -1005,11 +1006,12 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 
gfp |= __GFP_ZERO;
 
-   if (gfpflags_allow_blocking(gfp) &&
+   if (IS_ENABLED(CONFIG_DMA_REMAP) && gfpflags_allow_blocking(gfp) &&
!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs);
 
-   if (!gfpflags_allow_blocking(gfp) && !coherent)
+   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+   !gfpflags_allow_blocking(gfp) && !coherent)
cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
@@ -1041,7 +1043,7 @@ static int iommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
if (off >= nr_pages || vma_pages(vma) > nr_pages - off)
return -ENXIO;
 
-   if (is_vmalloc_addr(cpu_addr)) {
+   if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
struct page **pages = __iommu_dma_get_pages(cpu_addr);
 
if (pages)
@@ -1063,7 +1065,7 @@ static int iommu_dma_get_sgtable(struct device *dev, 
struct sg_table *sgt,
struct page *page;
int ret;
 
-   if (is_vmalloc_addr(cpu_addr)) {
+   if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
struct page **pages = __iommu_dma_get_pages(cpu_addr);
 
if (pages) {
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 19/26] iommu/dma: Cleanup variable naming in iommu_dma_alloc

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

Most importantly, clear up the size / iosize confusion.  Also rename addr
to cpu_addr to match the surrounding code and make the intention a little
more clear.

Signed-off-by: Robin Murphy 
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 45 +++
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 95a12e975994..9b269f0792f3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -960,64 +960,63 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 {
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
-   size_t iosize = size;
+   size_t alloc_size = PAGE_ALIGN(size);
struct page *page = NULL;
-   void *addr;
+   void *cpu_addr;
 
-   size = PAGE_ALIGN(size);
gfp |= __GFP_ZERO;
 
if (gfpflags_allow_blocking(gfp) &&
!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
-   return iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
+   return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs);
 
if (!gfpflags_allow_blocking(gfp) && !coherent) {
-   addr = dma_alloc_from_pool(size, &page, gfp);
-   if (!addr)
+   cpu_addr = dma_alloc_from_pool(alloc_size, &page, gfp);
+   if (!cpu_addr)
return NULL;
 
-   *handle = __iommu_dma_map(dev, page_to_phys(page), iosize,
+   *handle = __iommu_dma_map(dev, page_to_phys(page), size,
  ioprot);
if (*handle == DMA_MAPPING_ERROR) {
-   dma_free_from_pool(addr, size);
+   dma_free_from_pool(cpu_addr, alloc_size);
return NULL;
}
-   return addr;
+   return cpu_addr;
}
 
if (gfpflags_allow_blocking(gfp))
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-get_order(size),
+   page = dma_alloc_from_contiguous(dev, alloc_size >> PAGE_SHIFT,
+get_order(alloc_size),
 gfp & __GFP_NOWARN);
if (!page)
-   page = alloc_pages(gfp, get_order(size));
+   page = alloc_pages(gfp, get_order(alloc_size));
if (!page)
return NULL;
 
-   *handle = __iommu_dma_map(dev, page_to_phys(page), iosize, ioprot);
+   *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot);
if (*handle == DMA_MAPPING_ERROR)
goto out_free_pages;
 
if (!coherent || PageHighMem(page)) {
pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
 
-   addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
-   __builtin_return_address(0));
-   if (!addr)
+   cpu_addr = dma_common_contiguous_remap(page, alloc_size,
+   VM_USERMAP, prot, __builtin_return_address(0));
+   if (!cpu_addr)
goto out_unmap;
 
if (!coherent)
-   arch_dma_prep_coherent(page, iosize);
+   arch_dma_prep_coherent(page, size);
} else {
-   addr = page_address(page);
+   cpu_addr = page_address(page);
}
-   memset(addr, 0, size);
-   return addr;
+   memset(cpu_addr, 0, alloc_size);
+   return cpu_addr;
 out_unmap:
-   __iommu_dma_unmap(dev, *handle, iosize);
+   __iommu_dma_unmap(dev, *handle, size);
 out_free_pages:
-   if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
-   __free_pages(page, get_order(size));
+   if (!dma_release_from_contiguous(dev, page, alloc_size >> PAGE_SHIFT))
+   __free_pages(page, get_order(alloc_size));
return NULL;
 }
 
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 20/26] iommu/dma: Refactor iommu_dma_alloc, part 2

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

Apart from the iommu_dma_alloc_remap() case which remains sufficiently
different that it's better off being self-contained, the rest of the
logic can now be consolidated into a single flow which separates the
logically-distinct steps of allocating pages, getting the CPU address,
and finally getting the IOMMU address.

Signed-off-by: Robin Murphy 
[hch: split the page allocation into a new helper to simplify the flow]
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 65 +--
 1 file changed, 35 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9b269f0792f3..acdfe866cb29 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -955,35 +955,14 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
__iommu_dma_free(dev, size, cpu_addr);
 }
 
-static void *iommu_dma_alloc(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
+static void *iommu_dma_alloc_pages(struct device *dev, size_t size,
+   struct page **pagep, gfp_t gfp, unsigned long attrs)
 {
bool coherent = dev_is_dma_coherent(dev);
-   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
size_t alloc_size = PAGE_ALIGN(size);
struct page *page = NULL;
void *cpu_addr;
 
-   gfp |= __GFP_ZERO;
-
-   if (gfpflags_allow_blocking(gfp) &&
-   !(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
-   return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs);
-
-   if (!gfpflags_allow_blocking(gfp) && !coherent) {
-   cpu_addr = dma_alloc_from_pool(alloc_size, &page, gfp);
-   if (!cpu_addr)
-   return NULL;
-
-   *handle = __iommu_dma_map(dev, page_to_phys(page), size,
- ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   dma_free_from_pool(cpu_addr, alloc_size);
-   return NULL;
-   }
-   return cpu_addr;
-   }
-
if (gfpflags_allow_blocking(gfp))
page = dma_alloc_from_contiguous(dev, alloc_size >> PAGE_SHIFT,
 get_order(alloc_size),
@@ -993,33 +972,59 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
if (!page)
return NULL;
 
-   *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot);
-   if (*handle == DMA_MAPPING_ERROR)
-   goto out_free_pages;
-
if (!coherent || PageHighMem(page)) {
pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
 
cpu_addr = dma_common_contiguous_remap(page, alloc_size,
VM_USERMAP, prot, __builtin_return_address(0));
if (!cpu_addr)
-   goto out_unmap;
+   goto out_free_pages;
 
if (!coherent)
arch_dma_prep_coherent(page, size);
} else {
cpu_addr = page_address(page);
}
+
+   *pagep = page;
memset(cpu_addr, 0, alloc_size);
return cpu_addr;
-out_unmap:
-   __iommu_dma_unmap(dev, *handle, size);
 out_free_pages:
if (!dma_release_from_contiguous(dev, page, alloc_size >> PAGE_SHIFT))
__free_pages(page, get_order(alloc_size));
return NULL;
 }
 
+static void *iommu_dma_alloc(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
+{
+   bool coherent = dev_is_dma_coherent(dev);
+   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   struct page *page = NULL;
+   void *cpu_addr;
+
+   gfp |= __GFP_ZERO;
+
+   if (gfpflags_allow_blocking(gfp) &&
+   !(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
+   return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs);
+
+   if (!gfpflags_allow_blocking(gfp) && !coherent)
+   cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+   else
+   cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
+   if (!cpu_addr)
+   return NULL;
+
+   *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot);
+   if (*handle == DMA_MAPPING_ERROR) {
+   __iommu_dma_free(dev, size, cpu_addr);
+   return NULL;
+   }
+
+   return cpu_addr;
+}
+
 static int __iommu_dma_mmap_pfn(struct vm_area_struct *vma,
  unsigned long pfn, size_t size)
 {
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 21/26] iommu/dma: Refactor iommu_dma_get_sgtable

2019-04-22 Thread Christoph Hellwig
Inline __iommu_dma_get_sgtable_page into the main function, and use the
fact that __iommu_dma_get_pages returns NULL for remapped contiguous
allocations to simplify the code flow a bit.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 45 +++
 1 file changed, 17 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index acdfe866cb29..138b85e675c8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1070,42 +1070,31 @@ static int iommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
return __iommu_dma_mmap(pages, size, vma);
 }
 
-static int __iommu_dma_get_sgtable_page(struct sg_table *sgt, struct page 
*page,
-   size_t size)
-{
-   int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-
-   if (!ret)
-   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-   return ret;
-}
-
 static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
-   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   struct page **pages;
+   struct page *page;
+   int ret;
 
-   if (!is_vmalloc_addr(cpu_addr)) {
-   struct page *page = virt_to_page(cpu_addr);
-   return __iommu_dma_get_sgtable_page(sgt, page, size);
-   }
+   if (is_vmalloc_addr(cpu_addr)) {
+   struct page **pages = __iommu_dma_get_pages(cpu_addr);
 
-   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   struct page *page = vmalloc_to_page(cpu_addr);
-   return __iommu_dma_get_sgtable_page(sgt, page, size);
+   if (pages) {
+   return sg_alloc_table_from_pages(sgt, pages,
+   PAGE_ALIGN(size) >> PAGE_SHIFT,
+   0, size, GFP_KERNEL);
+   }
+
+   page = vmalloc_to_page(cpu_addr);
+   } else {
+   page = virt_to_page(cpu_addr);
}
 
-   pages = __iommu_dma_get_pages(cpu_addr);
-   if (WARN_ON_ONCE(!pages))
-   return -ENXIO;
-   return sg_alloc_table_from_pages(sgt, pages, count, 0, size,
-GFP_KERNEL);
+   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
+   if (!ret)
+   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+   return ret;
 }
 
 static const struct dma_map_ops iommu_dma_ops = {
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 18/26] iommu/dma: Split iommu_dma_free

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

Most of it can double up to serve the failure cleanup path for
iommu_dma_alloc().

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a1b8c232ad42..95a12e975994 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -916,15 +916,12 @@ static void iommu_dma_unmap_resource(struct device *dev, 
dma_addr_t handle,
__iommu_dma_unmap(dev, handle, size);
 }
 
-static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t handle, unsigned long attrs)
+static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
 {
size_t alloc_size = PAGE_ALIGN(size);
int count = alloc_size >> PAGE_SHIFT;
struct page *page = NULL;
 
-   __iommu_dma_unmap(dev, handle, size);
-
/* Non-coherent atomic allocation? Easy */
if (dma_free_from_pool(cpu_addr, alloc_size))
return;
@@ -951,6 +948,13 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
__free_pages(page, get_order(alloc_size));
 }
 
+static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
+   dma_addr_t handle, unsigned long attrs)
+{
+   __iommu_dma_unmap(dev, handle, size);
+   __iommu_dma_free(dev, size, cpu_addr);
+}
+
 static void *iommu_dma_alloc(struct device *dev, size_t size,
dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
 {
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 15/26] iommu/dma: Refactor iommu_dma_alloc

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

Shuffle around the self-contained atomic and non-contiguous cases to
return early and get out of the way of the CMA case that we're about to
work on next.

Signed-off-by: Robin Murphy 
[hch: slight changes to the code flow]
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 60 +++
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9658c4cc3cfe..504dd27312bb 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -956,14 +956,19 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 {
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
size_t iosize = size;
+   struct page *page;
void *addr;
 
size = PAGE_ALIGN(size);
gfp |= __GFP_ZERO;
 
+   if (gfpflags_allow_blocking(gfp) &&
+   !(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
+   return iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
+
if (!gfpflags_allow_blocking(gfp)) {
-   struct page *page;
/*
 * In atomic context we can't remap anything, so we'll only
 * get the virtually contiguous buffer we need by way of a
@@ -985,39 +990,34 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
__free_pages(page, get_order(size));
else
dma_free_from_pool(addr, size);
-   addr = NULL;
-   }
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
-   struct page *page;
-
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-   get_order(size), gfp & __GFP_NOWARN);
-   if (!page)
return NULL;
-
-   *handle = __iommu_dma_map(dev, page_to_phys(page), iosize, 
ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   dma_release_from_contiguous(dev, page,
-   size >> PAGE_SHIFT);
-   return NULL;
-   }
-   addr = dma_common_contiguous_remap(page, size, VM_USERMAP,
-  prot,
-  __builtin_return_address(0));
-   if (addr) {
-   if (!coherent)
-   arch_dma_prep_coherent(page, iosize);
-   memset(addr, 0, size);
-   } else {
-   __iommu_dma_unmap(dev, *handle, iosize);
-   dma_release_from_contiguous(dev, page,
-   size >> PAGE_SHIFT);
}
-   } else {
-   addr = iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
+   return addr;
}
+
+   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
+get_order(size), gfp & __GFP_NOWARN);
+   if (!page)
+   return NULL;
+
+   *handle = __iommu_dma_map(dev, page_to_phys(page), iosize, ioprot);
+   if (*handle == DMA_MAPPING_ERROR)
+   goto out_free_pages;
+
+   addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
+   __builtin_return_address(0));
+   if (!addr)
+   goto out_unmap;
+
+   if (!coherent)
+   arch_dma_prep_coherent(page, iosize);
+   memset(addr, 0, size);
return addr;
+out_unmap:
+   __iommu_dma_unmap(dev, *handle, iosize);
+out_free_pages:
+   dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+   return NULL;
 }
 
 static int __iommu_dma_mmap_pfn(struct vm_area_struct *vma,
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 16/26] iommu/dma: Don't remap CMA unnecessarily

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

Always remapping CMA allocations was largely a bodge to keep the freeing
logic manageable when it was split between here and an arch wrapper. Now
that it's all together and streamlined, we can relax that limitation.

Signed-off-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 504dd27312bb..6f4febf5e1de 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -956,7 +956,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 {
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
-   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
size_t iosize = size;
struct page *page;
void *addr;
@@ -1004,13 +1003,19 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
if (*handle == DMA_MAPPING_ERROR)
goto out_free_pages;
 
-   addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
-   __builtin_return_address(0));
-   if (!addr)
-   goto out_unmap;
+   if (!coherent || PageHighMem(page)) {
+   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
 
-   if (!coherent)
-   arch_dma_prep_coherent(page, iosize);
+   addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
+   __builtin_return_address(0));
+   if (!addr)
+   goto out_unmap;
+
+   if (!coherent)
+   arch_dma_prep_coherent(page, iosize);
+   } else {
+   addr = page_address(page);
+   }
memset(addr, 0, size);
return addr;
 out_unmap:
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 13/26] iommu/dma: Remove __iommu_dma_free

2019-04-22 Thread Christoph Hellwig
We only have a single caller of this function left, so open code it there.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 21 ++---
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b8e46e89a60a..4632b9d301a1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -534,24 +534,6 @@ static struct page **__iommu_dma_get_pages(void *cpu_addr)
return area->pages;
 }
 
-/**
- * iommu_dma_free - Free a buffer allocated by iommu_dma_alloc_remap()
- * @dev: Device which owns this buffer
- * @pages: Array of buffer pages as returned by __iommu_dma_alloc_remap()
- * @size: Size of buffer in bytes
- * @handle: DMA address of buffer
- *
- * Frees both the pages associated with the buffer, and the array
- * describing them
- */
-static void __iommu_dma_free(struct device *dev, struct page **pages,
-   size_t size, dma_addr_t *handle)
-{
-   __iommu_dma_unmap(dev, *handle, size);
-   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
-   *handle = DMA_MAPPING_ERROR;
-}
-
 /**
  * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
  * @dev: Device to allocate memory for. Must be a real device
@@ -1034,7 +1016,8 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
 
if (WARN_ON(!pages))
return;
-   __iommu_dma_free(dev, pages, iosize, &handle);
+   __iommu_dma_unmap(dev, handle, iosize);
+   __iommu_dma_free_pages(pages, size >> PAGE_SHIFT);
dma_common_free_remap(cpu_addr, size, VM_USERMAP);
} else {
__iommu_dma_unmap(dev, handle, iosize);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 10/26] iommu/dma: Squash __iommu_dma_{map,unmap}_page helpers

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

The remaining internal callsites don't care about having prototypes
compatible with the relevant dma_map_ops callbacks, so the extra
level of indirection just wastes space and complicates things.

Signed-off-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4ebd08e3a83a..b52c5d6be7b4 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -698,18 +698,6 @@ static void iommu_dma_sync_sg_for_device(struct device 
*dev,
arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
 
-static dma_addr_t __iommu_dma_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, int prot)
-{
-   return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
-}
-
-static void __iommu_dma_unmap_page(struct device *dev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-   __iommu_dma_unmap(dev, handle, size);
-}
-
 static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, enum dma_data_direction dir,
unsigned long attrs)
@@ -955,7 +943,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
if (!addr)
return NULL;
 
-   *handle = __iommu_dma_map_page(dev, page, 0, iosize, ioprot);
+   *handle = __iommu_dma_map(dev, page_to_phys(page), iosize,
+ ioprot);
if (*handle == DMA_MAPPING_ERROR) {
if (coherent)
__free_pages(page, get_order(size));
@@ -972,7 +961,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
if (!page)
return NULL;
 
-   *handle = __iommu_dma_map_page(dev, page, 0, iosize, ioprot);
+   *handle = __iommu_dma_map(dev, page_to_phys(page), iosize, 
ioprot);
if (*handle == DMA_MAPPING_ERROR) {
dma_release_from_contiguous(dev, page,
size >> PAGE_SHIFT);
@@ -986,7 +975,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
arch_dma_prep_coherent(page, iosize);
memset(addr, 0, size);
} else {
-   __iommu_dma_unmap_page(dev, *handle, iosize, 0, attrs);
+   __iommu_dma_unmap(dev, *handle, iosize);
dma_release_from_contiguous(dev, page,
size >> PAGE_SHIFT);
}
@@ -1025,12 +1014,12 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
 * Hence how dodgy the below logic looks...
 */
if (dma_in_atomic_pool(cpu_addr, size)) {
-   __iommu_dma_unmap_page(dev, handle, iosize, 0, 0);
+   __iommu_dma_unmap(dev, handle, iosize);
dma_free_from_pool(cpu_addr, size);
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
struct page *page = vmalloc_to_page(cpu_addr);
 
-   __iommu_dma_unmap_page(dev, handle, iosize, 0, attrs);
+   __iommu_dma_unmap(dev, handle, iosize);
dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
dma_common_free_remap(cpu_addr, size, VM_USERMAP);
} else if (is_vmalloc_addr(cpu_addr)){
@@ -1041,7 +1030,7 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
__iommu_dma_free(dev, area->pages, iosize, &handle);
dma_common_free_remap(cpu_addr, size, VM_USERMAP);
} else {
-   __iommu_dma_unmap_page(dev, handle, iosize, 0, 0);
+   __iommu_dma_unmap(dev, handle, iosize);
__free_pages(virt_to_page(cpu_addr), get_order(size));
}
 }
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 12/26] iommu/dma: Refactor the page array remapping allocator

2019-04-22 Thread Christoph Hellwig
Move the call to dma_common_pages_remap into __iommu_dma_alloc and
rename it to iommu_dma_alloc_remap.  This creates a self-contained
helper for remapped pages allocation and mapping.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 54 +++
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8e2d9733113e..b8e46e89a60a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -535,9 +535,9 @@ static struct page **__iommu_dma_get_pages(void *cpu_addr)
 }
 
 /**
- * iommu_dma_free - Free a buffer allocated by __iommu_dma_alloc()
+ * iommu_dma_free - Free a buffer allocated by iommu_dma_alloc_remap()
  * @dev: Device which owns this buffer
- * @pages: Array of buffer pages as returned by __iommu_dma_alloc()
+ * @pages: Array of buffer pages as returned by __iommu_dma_alloc_remap()
  * @size: Size of buffer in bytes
  * @handle: DMA address of buffer
  *
@@ -553,33 +553,35 @@ static void __iommu_dma_free(struct device *dev, struct 
page **pages,
 }
 
 /**
- * __iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
+ * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
  * @dev: Device to allocate memory for. Must be a real device
  *  attached to an iommu_dma_domain
  * @size: Size of buffer in bytes
+ * @dma_handle: Out argument for allocated DMA handle
  * @gfp: Allocation flags
  * @attrs: DMA attributes for this allocation
- * @prot: IOMMU mapping flags
- * @handle: Out argument for allocated DMA handle
  *
  * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
  * but an IOMMU which supports smaller pages might not map the whole thing.
  *
- * Return: Array of struct page pointers describing the buffer,
- *or NULL on failure.
+ * Return: Mapped virtual address, or NULL on failure.
  */
-static struct page **__iommu_dma_alloc(struct device *dev, size_t size,
-   gfp_t gfp, unsigned long attrs, int prot, dma_addr_t *handle)
+static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
+   bool coherent = dev_is_dma_coherent(dev);
+   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
+   unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
struct sg_table sgt;
dma_addr_t iova;
-   unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
+   void *vaddr;
 
-   *handle = DMA_MAPPING_ERROR;
+   *dma_handle = DMA_MAPPING_ERROR;
 
min_size = alloc_sizes & -alloc_sizes;
if (min_size < PAGE_SIZE) {
@@ -605,7 +607,7 @@ static struct page **__iommu_dma_alloc(struct device *dev, 
size_t size,
if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
goto out_free_iova;
 
-   if (!(prot & IOMMU_CACHE)) {
+   if (!(ioprot & IOMMU_CACHE)) {
struct scatterlist *sg;
int i;
 
@@ -613,14 +615,21 @@ static struct page **__iommu_dma_alloc(struct device 
*dev, size_t size,
arch_dma_prep_coherent(sg_page(sg), sg->length);
}
 
-   if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot)
+   if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)
< size)
goto out_free_sg;
 
-   *handle = iova;
+   vaddr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
+   __builtin_return_address(0));
+   if (!vaddr)
+   goto out_unmap;
+
+   *dma_handle = iova;
sg_free_table(&sgt);
-   return pages;
+   return vaddr;
 
+out_unmap:
+   __iommu_dma_unmap(dev, iova, size);
 out_free_sg:
sg_free_table(&sgt);
 out_free_iova:
@@ -989,18 +998,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
size >> PAGE_SHIFT);
}
} else {
-   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
-   struct page **pages;
-
-   pages = __iommu_dma_alloc(dev, iosize, gfp, attrs, ioprot,
-   handle);
-   if (!pages)
-   return NULL;
-
-   addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
- __builtin_return_address(0));
-   if (!addr)
-   __iommu_dma_free(dev, pages, iosize, handle);
+   addr = iommu_dma_alloc_remap(dev, iosize, handle,

[PATCH 14/26] iommu/dma: Refactor iommu_dma_free

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

The freeing logic was made particularly horrible by part of it being
opaque to the arch wrapper, which led to a lot of convoluted repetition
to ensure each path did everything in the right order. Now that it's
all private, we can pick apart and consolidate the logically-distinct
steps of freeing the IOMMU mapping, the underlying pages, and the CPU
remap (if necessary) into something much more manageable.

Signed-off-by: Robin Murphy 
[various cosmetic changes to the code flow]
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 75 ++-
 1 file changed, 35 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4632b9d301a1..9658c4cc3cfe 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -916,6 +916,41 @@ static void iommu_dma_unmap_resource(struct device *dev, 
dma_addr_t handle,
__iommu_dma_unmap(dev, handle, size);
 }
 
+static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
+   dma_addr_t handle, unsigned long attrs)
+{
+   size_t alloc_size = PAGE_ALIGN(size);
+   int count = alloc_size >> PAGE_SHIFT;
+   struct page *page = NULL;
+
+   __iommu_dma_unmap(dev, handle, size);
+
+   /* Non-coherent atomic allocation? Easy */
+   if (dma_free_from_pool(cpu_addr, alloc_size))
+   return;
+
+   if (is_vmalloc_addr(cpu_addr)) {
+   /*
+* If it the address is remapped, then it's either non-coherent
+* or highmem CMA, or an iommu_dma_alloc_remap() construction.
+*/
+   struct page **pages = __iommu_dma_get_pages(cpu_addr);
+
+   if (pages)
+   __iommu_dma_free_pages(pages, count);
+   else
+   page = vmalloc_to_page(cpu_addr);
+
+   dma_common_free_remap(cpu_addr, alloc_size, VM_USERMAP);
+   } else {
+   /* Lowmem means a coherent atomic or CMA allocation */
+   page = virt_to_page(cpu_addr);
+   }
+
+   if (page && !dma_release_from_contiguous(dev, page, count))
+   __free_pages(page, get_order(alloc_size));
+}
+
 static void *iommu_dma_alloc(struct device *dev, size_t size,
dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
 {
@@ -985,46 +1020,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
return addr;
 }
 
-static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t handle, unsigned long attrs)
-{
-   size_t iosize = size;
-
-   size = PAGE_ALIGN(size);
-   /*
-* @cpu_addr will be one of 4 things depending on how it was allocated:
-* - A remapped array of pages for contiguous allocations.
-* - A remapped array of pages from iommu_dma_alloc_remap(), for all
-*   non-atomic allocations.
-* - A non-cacheable alias from the atomic pool, for atomic
-*   allocations by non-coherent devices.
-* - A normal lowmem address, for atomic allocations by
-*   coherent devices.
-* Hence how dodgy the below logic looks...
-*/
-   if (dma_in_atomic_pool(cpu_addr, size)) {
-   __iommu_dma_unmap(dev, handle, iosize);
-   dma_free_from_pool(cpu_addr, size);
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   struct page *page = vmalloc_to_page(cpu_addr);
-
-   __iommu_dma_unmap(dev, handle, iosize);
-   dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
-   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
-   } else if (is_vmalloc_addr(cpu_addr)){
-   struct page **pages = __iommu_dma_get_pages(cpu_addr);
-
-   if (WARN_ON(!pages))
-   return;
-   __iommu_dma_unmap(dev, handle, iosize);
-   __iommu_dma_free_pages(pages, size >> PAGE_SHIFT);
-   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
-   } else {
-   __iommu_dma_unmap(dev, handle, iosize);
-   __free_pages(virt_to_page(cpu_addr), get_order(size));
-   }
-}
-
 static int __iommu_dma_mmap_pfn(struct vm_area_struct *vma,
  unsigned long pfn, size_t size)
 {
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 17/26] iommu/dma: Merge the CMA and alloc_pages allocation paths

2019-04-22 Thread Christoph Hellwig
Instead of having separate code paths for the non-blocking alloc_pages
and CMA allocations, merge them into one.  There is a slight
behavior change here in that we try the page allocator if CMA fails.
This matches what dma-direct and other iommu drivers do and will be
needed to use the dma-iommu code on architectures without DMA remapping
later on.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 32 
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 6f4febf5e1de..a1b8c232ad42 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -957,7 +957,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
size_t iosize = size;
-   struct page *page;
+   struct page *page = NULL;
void *addr;
 
size = PAGE_ALIGN(size);
@@ -967,35 +967,26 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
return iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
 
-   if (!gfpflags_allow_blocking(gfp)) {
-   /*
-* In atomic context we can't remap anything, so we'll only
-* get the virtually contiguous buffer we need by way of a
-* physically contiguous allocation.
-*/
-   if (coherent) {
-   page = alloc_pages(gfp, get_order(size));
-   addr = page ? page_address(page) : NULL;
-   } else {
-   addr = dma_alloc_from_pool(size, &page, gfp);
-   }
+   if (!gfpflags_allow_blocking(gfp) && !coherent) {
+   addr = dma_alloc_from_pool(size, &page, gfp);
if (!addr)
return NULL;
 
*handle = __iommu_dma_map(dev, page_to_phys(page), iosize,
  ioprot);
if (*handle == DMA_MAPPING_ERROR) {
-   if (coherent)
-   __free_pages(page, get_order(size));
-   else
-   dma_free_from_pool(addr, size);
+   dma_free_from_pool(addr, size);
return NULL;
}
return addr;
}
 
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-get_order(size), gfp & __GFP_NOWARN);
+   if (gfpflags_allow_blocking(gfp))
+   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
+get_order(size),
+gfp & __GFP_NOWARN);
+   if (!page)
+   page = alloc_pages(gfp, get_order(size));
if (!page)
return NULL;
 
@@ -1021,7 +1012,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 out_unmap:
__iommu_dma_unmap(dev, *handle, iosize);
 out_free_pages:
-   dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+   if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
+   __free_pages(page, get_order(size));
return NULL;
 }
 
-- 
2.20.1
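
The fallback order described in the commit message, reduced to a
self-contained sketch for readers skimming past the hunk above; the
helper name is made up here and size is assumed to be page aligned
already:

/* Sketch only: try CMA when we may block, else (or on failure) buddy pages. */
static struct page *__alloc_dma_pages(struct device *dev, size_t size,
				      gfp_t gfp)
{
	struct page *page = NULL;

	if (gfpflags_allow_blocking(gfp))
		page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
						 get_order(size),
						 gfp & __GFP_NOWARN);
	if (!page)	/* no CMA area configured, or it is exhausted */
		page = alloc_pages(gfp, get_order(size));
	return page;
}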

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 11/26] iommu/dma: Factor out remapped pages lookup

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

Since we duplicate the find_vm_area() logic a few times in places where
we only care about the pages, factor out a helper to abstract it.

Signed-off-by: Robin Murphy 
[hch: don't warn when not finding a region, as we'll rely on that later]
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b52c5d6be7b4..8e2d9733113e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -525,6 +525,15 @@ static struct page **__iommu_dma_alloc_pages(struct device 
*dev,
return pages;
 }
 
+static struct page **__iommu_dma_get_pages(void *cpu_addr)
+{
+   struct vm_struct *area = find_vm_area(cpu_addr);
+
+   if (!area || !area->pages)
+   return NULL;
+   return area->pages;
+}
+
 /**
  * iommu_dma_free - Free a buffer allocated by __iommu_dma_alloc()
  * @dev: Device which owns this buffer
@@ -1023,11 +1032,11 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
dma_common_free_remap(cpu_addr, size, VM_USERMAP);
} else if (is_vmalloc_addr(cpu_addr)){
-   struct vm_struct *area = find_vm_area(cpu_addr);
+   struct page **pages = __iommu_dma_get_pages(cpu_addr);
 
-   if (WARN_ON(!area || !area->pages))
+   if (WARN_ON(!pages))
return;
-   __iommu_dma_free(dev, area->pages, iosize, &handle);
+   __iommu_dma_free(dev, pages, iosize, &handle);
dma_common_free_remap(cpu_addr, size, VM_USERMAP);
} else {
__iommu_dma_unmap(dev, handle, iosize);
@@ -1049,7 +1058,7 @@ static int iommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
 {
unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
unsigned long off = vma->vm_pgoff;
-   struct vm_struct *area;
+   struct page **pages;
int ret;
 
vma->vm_page_prot = arch_dma_mmap_pgprot(dev, vma->vm_page_prot, attrs);
@@ -1074,11 +1083,10 @@ static int iommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
return __iommu_dma_mmap_pfn(vma, pfn, size);
}
 
-   area = find_vm_area(cpu_addr);
-   if (WARN_ON(!area || !area->pages))
+   pages = __iommu_dma_get_pages(cpu_addr);
+   if (WARN_ON_ONCE(!pages))
return -ENXIO;
-
-   return __iommu_dma_mmap(area->pages, size, vma);
+   return __iommu_dma_mmap(pages, size, vma);
 }
 
 static int __iommu_dma_get_sgtable_page(struct sg_table *sgt, struct page 
*page,
@@ -1096,7 +1104,7 @@ static int iommu_dma_get_sgtable(struct device *dev, 
struct sg_table *sgt,
unsigned long attrs)
 {
unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   struct vm_struct *area = find_vm_area(cpu_addr);
+   struct page **pages;
 
if (!is_vmalloc_addr(cpu_addr)) {
struct page *page = virt_to_page(cpu_addr);
@@ -1112,10 +1120,10 @@ static int iommu_dma_get_sgtable(struct device *dev, 
struct sg_table *sgt,
return __iommu_dma_get_sgtable_page(sgt, page, size);
}
 
-   if (WARN_ON(!area || !area->pages))
+   pages = __iommu_dma_get_pages(cpu_addr);
+   if (WARN_ON_ONCE(!pages))
return -ENXIO;
-
-   return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+   return sg_alloc_table_from_pages(sgt, pages, count, 0, size,
 GFP_KERNEL);
 }
 
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 09/26] iommu/dma: Move domain lookup into __iommu_dma_{map, unmap}

2019-04-22 Thread Christoph Hellwig
From: Robin Murphy 

Most of the callers don't care, and the couple that do already have the
domain to hand for other reasons are in slow paths where the (trivial)
overhead of a repeated lookup will be utterly immaterial.

Signed-off-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 34 --
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e33724497c7b..4ebd08e3a83a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -419,9 +419,10 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie 
*cookie,
size >> iova_shift(iovad));
 }
 
-static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
+static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
size_t size)
 {
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
size_t iova_off = iova_offset(iovad, dma_addr);
@@ -436,8 +437,9 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, 
dma_addr_t dma_addr,
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot, struct iommu_domain *domain)
+   size_t size, int prot)
 {
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
size_t iova_off = 0;
dma_addr_t iova;
@@ -536,7 +538,7 @@ static struct page **__iommu_dma_alloc_pages(struct device 
*dev,
 static void __iommu_dma_free(struct device *dev, struct page **pages,
size_t size, dma_addr_t *handle)
 {
-   __iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
+   __iommu_dma_unmap(dev, *handle, size);
__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
*handle = DMA_MAPPING_ERROR;
 }
@@ -699,14 +701,13 @@ static void iommu_dma_sync_sg_for_device(struct device 
*dev,
 static dma_addr_t __iommu_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, int prot)
 {
-   return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
-   iommu_get_dma_domain(dev));
+   return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
 }
 
 static void __iommu_dma_unmap_page(struct device *dev, dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-   __iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
+   __iommu_dma_unmap(dev, handle, size);
 }
 
 static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
@@ -715,11 +716,10 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
 {
phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
+   int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
 
-   dma_handle =__iommu_dma_map(dev, phys, size,
-   dma_info_to_prot(dir, coherent, attrs),
-   iommu_get_dma_domain(dev));
+   dma_handle =__iommu_dma_map(dev, phys, size, prot);
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(dev, phys, size, dir);
@@ -731,7 +731,7 @@ static void iommu_dma_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
 {
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);
-   __iommu_dma_unmap(iommu_get_dma_domain(dev), dma_handle, size);
+   __iommu_dma_unmap(dev, dma_handle, size);
 }
 
 /*
@@ -912,21 +912,20 @@ static void iommu_dma_unmap_sg(struct device *dev, struct 
scatterlist *sg,
sg = tmp;
}
end = sg_dma_address(sg) + sg_dma_len(sg);
-   __iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
+   __iommu_dma_unmap(dev, start, end - start);
 }
 
 static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
return __iommu_dma_map(dev, phys, size,
-   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
-   iommu_get_dma_domain(dev));
+   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
 }
 
 static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-   __iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
+   __iommu_dma_unmap(dev, handle, size);
 }
 
 static void *iommu_dma_alloc(struct device *dev, size_t size,
@@ -1176,9 +1175,8 @@

[PATCH 06/26] iommu/dma: Use for_each_sg in iommu_dma_alloc

2019-04-22 Thread Christoph Hellwig
arch_dma_prep_coherent can handle physically contiguous ranges larger
than PAGE_SIZE just fine, which means we don't need a page-based
iterator.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 77d704c8f565..f915cb7c46e6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -577,15 +577,11 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
goto out_free_iova;
 
if (!(prot & IOMMU_CACHE)) {
-   struct sg_mapping_iter miter;
-   /*
-* The CPU-centric flushing implied by SG_MITER_TO_SG isn't
-* sufficient here, so skip it by using the "wrong" direction.
-*/
-   sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
-   while (sg_miter_next(&miter))
-   arch_dma_prep_coherent(miter.page, PAGE_SIZE);
-   sg_miter_stop(&miter);
+   struct scatterlist *sg;
+   int i;
+
+   for_each_sg(sgt.sgl, sg, sgt.orig_nents, i)
+   arch_dma_prep_coherent(sg_page(sg), sg->length);
}
 
if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot)
-- 
2.20.1



[PATCH 07/26] iommu/dma: move the arm64 wrappers to common code

2019-04-22 Thread Christoph Hellwig
There is nothing really arm64 specific in the iommu_dma_ops
implementation, so move it to dma-iommu.c and keep a lot of symbols
self-contained.  Note the implementation does depend on the
DMA_DIRECT_REMAP infrastructure for now, so we'll have to make the
DMA_IOMMU support depend on it, but this will be relaxed soon.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 389 +---
 drivers/iommu/Kconfig   |   1 +
 drivers/iommu/dma-iommu.c   | 388 ---
 include/linux/dma-iommu.h   |  43 +---
 4 files changed, 369 insertions(+), 452 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 636fa7c64370..d1661f78eb4d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -58,27 +59,6 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
__dma_flush_area(page_address(page), size);
 }
 
-#ifdef CONFIG_IOMMU_DMA
-static int __swiotlb_get_sgtable_page(struct sg_table *sgt,
- struct page *page, size_t size)
-{
-   int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-
-   if (!ret)
-   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-
-   return ret;
-}
-
-static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
- unsigned long pfn, size_t size)
-{
-   return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
- vma->vm_end - vma->vm_start,
- vma->vm_page_prot);
-}
-#endif /* CONFIG_IOMMU_DMA */
-
 static int __init arm64_dma_init(void)
 {
WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
@@ -90,379 +70,18 @@ static int __init arm64_dma_init(void)
 arch_initcall(arm64_dma_init);
 
 #ifdef CONFIG_IOMMU_DMA
-#include 
-#include 
-#include 
-
-static void *__iommu_alloc_attrs(struct device *dev, size_t size,
-dma_addr_t *handle, gfp_t gfp,
-unsigned long attrs)
-{
-   bool coherent = dev_is_dma_coherent(dev);
-   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
-   size_t iosize = size;
-   void *addr;
-
-   if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
-   return NULL;
-
-   size = PAGE_ALIGN(size);
-
-   /*
-* Some drivers rely on this, and we probably don't want the
-* possibility of stale kernel data being read by devices anyway.
-*/
-   gfp |= __GFP_ZERO;
-
-   if (!gfpflags_allow_blocking(gfp)) {
-   struct page *page;
-   /*
-* In atomic context we can't remap anything, so we'll only
-* get the virtually contiguous buffer we need by way of a
-* physically contiguous allocation.
-*/
-   if (coherent) {
-   page = alloc_pages(gfp, get_order(size));
-   addr = page ? page_address(page) : NULL;
-   } else {
-   addr = dma_alloc_from_pool(size, &page, gfp);
-   }
-   if (!addr)
-   return NULL;
-
-   *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   if (coherent)
-   __free_pages(page, get_order(size));
-   else
-   dma_free_from_pool(addr, size);
-   addr = NULL;
-   }
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
-   struct page *page;
-
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-   get_order(size), gfp & __GFP_NOWARN);
-   if (!page)
-   return NULL;
-
-   *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   dma_release_from_contiguous(dev, page,
-   size >> PAGE_SHIFT);
-   return NULL;
-   }
-   addr = dma_common_contiguous_remap(page, size, VM_USERMAP,
-  prot,
-  __builtin_return_address(0));
-   if (addr) {
-   if (!coherent)
-   __dma_flush_area(page_to_virt(page), iosize);
-   memset(addr, 0, size);
-   } else {
-   iommu_dma_unmap_page(dev, *handle, iosize, 0, attrs);
-   dma_release_from_contiguous(dev, page,
-

[PATCH 08/26] iommu/dma: Move __iommu_dma_map

2019-04-22 Thread Christoph Hellwig
Moving this function up to its unmap counterpart helps to keep related
code together for the following changes.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 46 +++
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 622123551bba..e33724497c7b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -435,6 +435,29 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
+static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
+   size_t size, int prot, struct iommu_domain *domain)
+{
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+   size_t iova_off = 0;
+   dma_addr_t iova;
+
+   if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
+   iova_off = iova_offset(&cookie->iovad, phys);
+   size = iova_align(&cookie->iovad, size + iova_off);
+   }
+
+   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
+   if (!iova)
+   return DMA_MAPPING_ERROR;
+
+   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
+   iommu_dma_free_iova(cookie, iova, size);
+   return DMA_MAPPING_ERROR;
+   }
+   return iova + iova_off;
+}
+
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
while (count--)
@@ -673,29 +696,6 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
 
-static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot, struct iommu_domain *domain)
-{
-   struct iommu_dma_cookie *cookie = domain->iova_cookie;
-   size_t iova_off = 0;
-   dma_addr_t iova;
-
-   if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
-   iova_off = iova_offset(&cookie->iovad, phys);
-   size = iova_align(&cookie->iovad, size + iova_off);
-   }
-
-   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
-   if (!iova)
-   return DMA_MAPPING_ERROR;
-
-   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
-   iommu_dma_free_iova(cookie, iova, size);
-   return DMA_MAPPING_ERROR;
-   }
-   return iova + iova_off;
-}
-
 static dma_addr_t __iommu_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, int prot)
 {
-- 
2.20.1



[PATCH 03/26] dma-mapping: add a Kconfig symbol to indicated arch_dma_prep_coherent presence

2019-04-22 Thread Christoph Hellwig
Add a Kconfig symbol that indicates an architecture provides an
arch_dma_prep_coherent implementation, and provide a stub otherwise.

This will allow the generic dma-iommu code to use it while still
allowing it to be built for cache coherent architectures.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 
---
 arch/arm64/Kconfig  | 1 +
 arch/csky/Kconfig   | 1 +
 include/linux/dma-noncoherent.h | 6 ++
 kernel/dma/Kconfig  | 3 +++
 4 files changed, 11 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7e34b9eba5de..adda078d6df7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -13,6 +13,7 @@ config ARM64
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_DMA_COHERENT_TO_PFN
select ARCH_HAS_DMA_MMAP_PGPROT
+   select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FAST_MULTIPLIER
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 725a115759c9..2c3178848b7d 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -1,6 +1,7 @@
 config CSKY
def_bool y
select ARCH_32BIT_OFF_T
+   select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_USE_BUILTIN_BSWAP
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 69b36ed31a99..9741767e400f 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -72,6 +72,12 @@ static inline void arch_sync_dma_for_cpu_all(struct device *dev)
 }
 #endif /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL */
 
+#ifdef CONFIG_ARCH_HAS_DMA_PREP_COHERENT
 void arch_dma_prep_coherent(struct page *page, size_t size);
+#else
+static inline void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+}
+#endif /* CONFIG_ARCH_HAS_DMA_PREP_COHERENT */
 
 #endif /* _LINUX_DMA_NONCOHERENT_H */
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index a06ba3013b3b..feff2d21d8ee 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -38,6 +38,9 @@ config ARCH_HAS_SYNC_DMA_FOR_CPU
 config ARCH_HAS_SYNC_DMA_FOR_CPU_ALL
bool
 
+config ARCH_HAS_DMA_PREP_COHERENT
+   bool
+
 config ARCH_HAS_DMA_COHERENT_TO_PFN
bool
 
-- 
2.20.1



[PATCH 04/26] iommu/dma: Cleanup dma-iommu.h

2019-04-22 Thread Christoph Hellwig
There is no need for a __KERNEL__ guard outside uapi, so drop it, and add a
missing comment describing the #else cpp statement.  Last but not least,
include the linux/ version of the header instead of the asm one, which is
frowned upon.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 
---
 include/linux/dma-iommu.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index e760dc5d1fa8..8741637941ca 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -16,9 +16,8 @@
 #ifndef __DMA_IOMMU_H
 #define __DMA_IOMMU_H
 
-#ifdef __KERNEL__
+#include 
 #include 
-#include 
 
 #ifdef CONFIG_IOMMU_DMA
 #include 
@@ -74,7 +73,7 @@ void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
-#else
+#else /* CONFIG_IOMMU_DMA */
 
 struct iommu_domain;
 struct msi_msg;
@@ -108,5 +107,4 @@ static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 }
 
 #endif /* CONFIG_IOMMU_DMA */
-#endif /* __KERNEL__ */
 #endif /* __DMA_IOMMU_H */
-- 
2.20.1



[PATCH 05/26] iommu/dma: Remove the flush_page callback

2019-04-22 Thread Christoph Hellwig
We now have an arch_dma_prep_coherent architecture hook that is used
for the generic DMA remap allocator, and we should use the same
interface for the dma-iommu code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 
---
 arch/arm64/mm/dma-mapping.c | 8 +---
 drivers/iommu/dma-iommu.c   | 8 +++-
 include/linux/dma-iommu.h   | 3 +--
 3 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 604c638b2787..636fa7c64370 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -94,12 +94,6 @@ arch_initcall(arm64_dma_init);
 #include 
 #include 
 
-/* Thankfully, all cache ops are by VA so we can ignore phys here */
-static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
-{
-   __dma_flush_area(virt, PAGE_SIZE);
-}
-
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 dma_addr_t *handle, gfp_t gfp,
 unsigned long attrs)
@@ -176,7 +170,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t size,
struct page **pages;
 
pages = iommu_dma_alloc(dev, iosize, gfp, attrs, ioprot,
-   handle, flush_page);
+   handle);
if (!pages)
return NULL;
 
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 77aabe637a60..77d704c8f565 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -531,8 +532,6 @@ void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
  * @attrs: DMA attributes for this allocation
  * @prot: IOMMU mapping flags
  * @handle: Out argument for allocated DMA handle
- * @flush_page: Arch callback which must ensure PAGE_SIZE bytes from the
- * given VA/PA are visible to the given non-coherent device.
  *
  * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
  * but an IOMMU which supports smaller pages might not map the whole thing.
@@ -541,8 +540,7 @@ void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
  *or NULL on failure.
  */
 struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
-   unsigned long attrs, int prot, dma_addr_t *handle,
-   void (*flush_page)(struct device *, const void *, phys_addr_t))
+   unsigned long attrs, int prot, dma_addr_t *handle)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -586,7 +584,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 */
		sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
while (sg_miter_next(&miter))
-   flush_page(dev, miter.addr, page_to_phys(miter.page));
+   arch_dma_prep_coherent(miter.page, PAGE_SIZE);
sg_miter_stop(&miter);
}
 
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 8741637941ca..3216447178a7 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -44,8 +44,7 @@ int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
  * the arch code to take care of attributes and cache maintenance
  */
 struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
-   unsigned long attrs, int prot, dma_addr_t *handle,
-   void (*flush_page)(struct device *, const void *, phys_addr_t));
+   unsigned long attrs, int prot, dma_addr_t *handle);
 void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
dma_addr_t *handle);
 
-- 
2.20.1



[PATCH 02/26] arm64/iommu: improve mmap bounds checking

2019-04-22 Thread Christoph Hellwig
The nr_pages checks should be done for all mmap requests, not just those
using remap_pfn_range.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 674860e3e478..604c638b2787 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -73,19 +73,9 @@ static int __swiotlb_get_sgtable_page(struct sg_table *sgt,
 static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
  unsigned long pfn, size_t size)
 {
-   int ret = -ENXIO;
-   unsigned long nr_vma_pages = vma_pages(vma);
-   unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long off = vma->vm_pgoff;
-
-   if (off < nr_pages && nr_vma_pages <= (nr_pages - off)) {
-   ret = remap_pfn_range(vma, vma->vm_start,
- pfn + off,
- vma->vm_end - vma->vm_start,
- vma->vm_page_prot);
-   }
-
-   return ret;
+   return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+ vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
 }
 #endif /* CONFIG_IOMMU_DMA */
 
@@ -241,6 +231,8 @@ static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
  void *cpu_addr, dma_addr_t dma_addr, size_t size,
  unsigned long attrs)
 {
+   unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   unsigned long off = vma->vm_pgoff;
struct vm_struct *area;
int ret;
 
@@ -249,6 +241,9 @@ static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
return ret;
 
+   if (off >= nr_pages || vma_pages(vma) > nr_pages - off)
+   return -ENXIO;
+
if (!is_vmalloc_addr(cpu_addr)) {
unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
return __swiotlb_mmap_pfn(vma, pfn, size);
-- 
2.20.1



implement generic dma_map_ops for IOMMUs v3

2019-04-22 Thread Christoph Hellwig
Hi Robin,

please take a look at this series, which implements a completely generic
set of dma_map_ops for IOMMU drivers.  This is done by taking the
existing arm64 code, moving it to drivers/iommu and then massaging it
so that it can also work for architectures with DMA remapping.  This
should help future ports to support IOMMUs more easily, and also allow
removing various custom IOMMU dma_map_ops implementations, like Tom
was planning to do for the AMD one.

A git tree is also available at:

git://git.infradead.org/users/hch/misc.git dma-iommu-ops.3

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops.3

Changes since v2:
 - address various review comments and include patches from Robin

Changes since v1:
 - only include other headers in dma-iommu.h if CONFIG_DMA_IOMMU is enabled
 - keep using a scatterlist in iommu_dma_alloc
 - split out mmap/sgtable fixes and move them early in the series
 - updated a few commit logs


[PATCH 01/26] arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable

2019-04-22 Thread Christoph Hellwig
DMA allocations that can't sleep may return non-remapped addresses, but
we do not properly handle them in the mmap and get_sgtable methods.
Resolve non-vmalloc addresses using virt_to_page to handle this corner
case.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 
---
 arch/arm64/mm/dma-mapping.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 78c0a72f822c..674860e3e478 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -249,6 +249,11 @@ static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
return ret;
 
+   if (!is_vmalloc_addr(cpu_addr)) {
+   unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
+   return __swiotlb_mmap_pfn(vma, pfn, size);
+   }
+
if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
/*
 * DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
@@ -272,6 +277,11 @@ static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
struct vm_struct *area = find_vm_area(cpu_addr);
 
+   if (!is_vmalloc_addr(cpu_addr)) {
+   struct page *page = virt_to_page(cpu_addr);
+   return __swiotlb_get_sgtable_page(sgt, page, size);
+   }
+
if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
/*
 * DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-- 
2.20.1



Re: revert dma direct internals abuse

2019-04-22 Thread h...@lst.de
On Wed, Apr 10, 2019 at 03:01:14PM +, Thomas Hellstrom wrote:
> > So can you please respin a version acceptable to you and submit it
> > for 5.1 ASAP?  Otherwise I'll need to move ahead with the simple
> > revert.
> 
> I will. 
> I need to do some testing to investigate how to best choose between the
> options, but will have something ready for 5.1.

I still don't see anything in -rc6..


[RFC PATCH] dma-mapping: create iommu mapping for newly allocated dma coherent mem

2019-04-22 Thread laurentiu . tudor
From: Laurentiu Tudor 

If possible / available, call into the DMA API to get a proper iommu
mapping and a dma address for the newly allocated coherent dma memory.

Signed-off-by: Laurentiu Tudor 
---
 arch/arm/mm/dma-mapping-nommu.c |  3 ++-
 include/linux/dma-mapping.h | 12 ++---
 kernel/dma/coherent.c   | 45 +++--
 kernel/dma/mapping.c|  3 ++-
 4 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index f304b10e23a4..2c42e83a6995 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -74,7 +74,8 @@ static void arm_nommu_dma_free(struct device *dev, size_t size,
dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
} else {
int ret = dma_release_from_global_coherent(get_order(size),
-  cpu_addr);
+  cpu_addr, size,
+  dma_addr);
 
WARN_ON_ONCE(ret == 0);
}
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6309a721394b..cb23334608a7 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -161,19 +161,21 @@ static inline int is_device_dma_capable(struct device *dev)
  */
 int dma_alloc_from_dev_coherent(struct device *dev, ssize_t size,
   dma_addr_t *dma_handle, void **ret);
-int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr);
+int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr,
+ ssize_t size, dma_addr_t dma_handle);
 
 int dma_mmap_from_dev_coherent(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, size_t size, int *ret);
 
 void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle);
-int dma_release_from_global_coherent(int order, void *vaddr);
+int dma_release_from_global_coherent(int order, void *vaddr, ssize_t size,
+dma_addr_t dma_handle);
 int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *cpu_addr,
  size_t size, int *ret);
 
 #else
 #define dma_alloc_from_dev_coherent(dev, size, handle, ret) (0)
-#define dma_release_from_dev_coherent(dev, order, vaddr) (0)
+#define dma_release_from_dev_coherent(dev, order, vaddr, size, dma_handle) (0)
 #define dma_mmap_from_dev_coherent(dev, vma, vaddr, order, ret) (0)
 
 static inline void *dma_alloc_from_global_coherent(ssize_t size,
@@ -182,7 +184,9 @@ static inline void *dma_alloc_from_global_coherent(ssize_t size,
return NULL;
 }
 
-static inline int dma_release_from_global_coherent(int order, void *vaddr)
+static inline int dma_release_from_global_coherent(int order, void *vaddr
+  ssize_t size,
+  dma_addr_t dma_handle)
 {
return 0;
 }
diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 29fd6590dc1e..b40439d6feaa 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -135,13 +135,15 @@ void dma_release_declared_memory(struct device *dev)
 }
 EXPORT_SYMBOL(dma_release_declared_memory);
 
-static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem,
-   ssize_t size, dma_addr_t *dma_handle)
+static void *__dma_alloc_from_coherent(struct device *dev,
+  struct dma_coherent_mem *mem,
+  ssize_t size, dma_addr_t *dma_handle)
 {
int order = get_order(size);
unsigned long flags;
int pageno;
void *ret;
+   const struct dma_map_ops *ops = dev ? get_dma_ops(dev) : NULL;
 
spin_lock_irqsave(&mem->spinlock, flags);
 
@@ -155,10 +157,16 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem,
/*
 * Memory was found in the coherent area.
 */
-   *dma_handle = mem->device_base + (pageno << PAGE_SHIFT);
ret = mem->virt_base + (pageno << PAGE_SHIFT);
spin_unlock_irqrestore(&mem->spinlock, flags);
memset(ret, 0, size);
+   if (ops && ops->map_resource)
+   *dma_handle = ops->map_resource(dev,
+   mem->device_base +
+   (pageno << PAGE_SHIFT),
+   size, DMA_BIDIRECTIONAL, 0);
+   else
+   *dma_handle = mem->device_base + (pageno << PAGE_SHIFT);
return ret;
 err:
spin_unlock_irqrestore(&mem->spinlock, flags);
@@ -187,7 +195,7 @@ int dma_alloc_from_dev_coherent(struct device *dev, ssize_t size,
if (!mem)
return 0;
 
-   *ret 

Re: [PATCH v3 08/10] iommu/vt-d: Check whether device requires bounce buffer

2019-04-22 Thread Christoph Hellwig
On Sun, Apr 21, 2019 at 09:17:17AM +0800, Lu Baolu wrote:
> +static inline bool device_needs_bounce(struct device *dev)
> +{
> + struct pci_dev *pdev = NULL;
> +
> + if (intel_no_bounce)
> + return false;
> +
> + if (dev_is_pci(dev))
> + pdev = to_pci_dev(dev);
> +
> + return pdev ? pdev->untrusted : false;
> +}

Again, this and the option should not be in a specific iommu driver.
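
A driver-agnostic check could be as small as the sketch below (illustration
only; the helper name is an assumption, not an existing API in this series,
and it assumes linux/device.h and linux/pci.h):

static inline bool dev_is_untrusted(struct device *dev)
{
	/* pdev->untrusted is set by the PCI core for devices behind
	 * externally exposed ports (e.g. Thunderbolt). */
	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
}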


Re: [PATCH v3 07/10] iommu/vt-d: Keep swiotlb on if bounce page is necessary

2019-04-22 Thread Christoph Hellwig
On Sun, Apr 21, 2019 at 09:17:16AM +0800, Lu Baolu wrote:
> +static inline bool platform_has_untrusted_device(void)
>  {
> + bool has_untrusted_device = false;
>   struct pci_dev *pdev = NULL;
>  
>   for_each_pci_dev(pdev) {
>   if (pdev->untrusted) {
> + has_untrusted_device = true;
>   break;
>   }
>   }
>  
> + return has_untrusted_device;

This shouldn't really be in the intel-iommu driver, should it?
This probably should be something like pci_has_untrusted_devices
and be moved to the PCI code.
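
Something along these lines, perhaps (a rough sketch only; the name
pci_has_untrusted_devices() is just the one suggested above and does not
exist in the tree, and it assumes linux/pci.h):

static bool pci_has_untrusted_devices(void)
{
	struct pci_dev *pdev = NULL;

	for_each_pci_dev(pdev) {
		if (pdev->untrusted) {
			/* drop the reference held by the iterator */
			pci_dev_put(pdev);
			return true;
		}
	}

	return false;
}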


Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

2019-04-22 Thread Christoph Hellwig
I looked over your swiotlb modifications and I don't think we really need
them.  The only thing we really need is to split the size parameter to
swiotlb_tbl_map_single and swiotlb_tbl_unmap_single into an alloc_size
and a mapping_size parameter, where the latter one is rounded up to the
iommu page size.  Below is an untested patch on top of your series to
show what I mean.  That being said, both the current series and the one
with my patch will still leak the content of the swiotlb buffer that is
allocated but not used to the untrusted external device.  Is that
acceptable?  If not, we need to clear that part, at which point you don't
need swiotlb changes.  Another implication is that for untrusted devices
the size of the dma coherent allocations needs to be rounded up to the
iommu page size (if that can ever be larger than the host page size).
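
As a rough sketch, the split and the clearing of the unused tail could look
like this (illustration only, mirroring the calls in the untested patch
below; the helper and the exact argument order are assumptions, not final
swiotlb code):

phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
				   dma_addr_t tbl_dma_addr,
				   phys_addr_t orig_addr,
				   size_t mapping_size,	/* bytes actually bounced */
				   size_t alloc_size,	/* rounded up, e.g. to the IOMMU page size */
				   enum dma_data_direction dir,
				   unsigned long attrs);

/* one possible way to avoid leaking stale swiotlb contents to an
 * untrusted device: clear the slack past the bounced data */
static void swiotlb_clear_slack(phys_addr_t tlb_addr,
				size_t mapping_size, size_t alloc_size)
{
	if (alloc_size > mapping_size)
		memset(phys_to_virt(tlb_addr) + mapping_size, 0,
		       alloc_size - mapping_size);
}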

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c4a078fb041..eb5c32ad4443 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2151,10 +2151,13 @@ static int bounce_map(struct device *dev, struct iommu_domain *domain,
  void *data)
 {
const struct iommu_ops *ops = domain->ops;
+   unsigned long page_size = domain_minimal_pgsize(domain);
phys_addr_t tlb_addr;
int prot = 0;
int ret;
 
+   if (WARN_ON_ONCE(size > page_size))
+   return -EINVAL;
if (unlikely(!ops->map || domain->pgsize_bitmap == 0UL))
return -ENODEV;
 
@@ -2164,16 +2167,16 @@ static int bounce_map(struct device *dev, struct iommu_domain *domain,
if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
prot |= IOMMU_WRITE;
 
-   tlb_addr = phys_to_dma(dev, paddr);
-   if (!swiotlb_map(dev, &paddr, &tlb_addr, size,
-dir, attrs | DMA_ATTR_BOUNCE_PAGE))
+   tlb_addr = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
+   paddr, size, page_size, dir, attrs);
+   if (tlb_addr == DMA_MAPPING_ERROR)
return -ENOMEM;
 
ret = ops->map(domain, addr, tlb_addr, size, prot);
-   if (ret)
-   swiotlb_tbl_unmap_single(dev, tlb_addr, size,
-dir, attrs | DMA_ATTR_BOUNCE_PAGE);
-
+   if (ret) {
+   swiotlb_tbl_unmap_single(dev, tlb_addr, size, page_size,
+dir, attrs);
+   }
return ret;
 }
 
@@ -2194,11 +2197,12 @@ static int bounce_unmap(struct device *dev, struct iommu_domain *domain,
 
if (unlikely(!ops->unmap))
return -ENODEV;
-   ops->unmap(domain, ALIGN_DOWN(addr, page_size), page_size);
+   ops->unmap(domain, addr, page_size);
 
-   if (tlb_addr)
-   swiotlb_tbl_unmap_single(dev, tlb_addr, size,
-dir, attrs | DMA_ATTR_BOUNCE_PAGE);
+   if (tlb_addr) {
+   swiotlb_tbl_unmap_single(dev, tlb_addr, size, page_size,
+dir, attrs);
+   }
 
return 0;
 }
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 877baf2a94f4..3b6ce643bffa 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -404,7 +404,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 */
trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
 
-   map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
+   map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, size, dir,
 attrs);
if (map == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
@@ -420,7 +420,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
return dev_addr;
 
attrs |= DMA_ATTR_SKIP_CPU_SYNC;
-   swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
+   swiotlb_tbl_unmap_single(dev, map, size, size, dir, attrs);
 
return DMA_MAPPING_ERROR;
 }
@@ -445,7 +445,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
 
/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr))
-   swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
+   swiotlb_tbl_unmap_single(hwdev, paddr, size, size, dir, attrs);
 }
 
 static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
@@ -556,6 +556,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
 start_dma_addr,
 sg_phys(sg),
 sg->length,
+sg->length,
 dir, attrs);
   

Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst

2019-04-22 Thread Logan Gunthorpe



On 2019-04-22 7:27 a.m., Mauro Carvalho Chehab wrote:

> 
> Later patches will move them to a better place and remove the
> :orphan: markup.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/ABI/removed/sysfs-class-rfkill  |  2 +-
>  Documentation/ABI/stable/sysfs-class-rfkill   |  2 +-
>  Documentation/ABI/stable/sysfs-devices-node   |  2 +-
>  Documentation/ABI/testing/procfs-diskstats|  2 +-
>  Documentation/ABI/testing/sysfs-block |  2 +-
>  .../ABI/testing/sysfs-class-switchtec |  2 +-
>  .../ABI/testing/sysfs-devices-system-cpu  |  4 +-
>  .../{DMA-API-HOWTO.txt => DMA-API-HOWTO.rst}  |  2 +
>  Documentation/{DMA-API.txt => DMA-API.rst}|  8 ++-
>  .../{DMA-ISA-LPC.txt => DMA-ISA-LPC.rst}  |  4 +-
>  ...{DMA-attributes.txt => DMA-attributes.rst} |  2 +
>  Documentation/{IPMI.txt => IPMI.rst}  |  2 +
>  .../{IRQ-affinity.txt => IRQ-affinity.rst}|  2 +
>  .../{IRQ-domain.txt => IRQ-domain.rst}|  2 +
>  Documentation/{IRQ.txt => IRQ.rst}|  2 +
>  .../{Intel-IOMMU.txt => Intel-IOMMU.rst}  |  2 +
>  Documentation/PCI/pci.txt |  8 +--
>  Documentation/{SAK.txt => SAK.rst}|  2 +
>  Documentation/{SM501.txt => SM501.rst}|  2 +
>  .../admin-guide/kernel-parameters.txt |  6 +-
>  Documentation/admin-guide/l1tf.rst|  2 +-
>  .../{atomic_bitops.txt => atomic_bitops.rst}  |  2 +
>  Documentation/block/biodoc.txt|  2 +-
>  .../{bt8xxgpio.txt => bt8xxgpio.rst}  |  2 +
>  Documentation/{btmrvl.txt => btmrvl.rst}  |  2 +
>  ...-mapping.txt => bus-virt-phys-mapping.rst} |  4 +-
>  ...g-warn-once.txt => clearing-warn-once.rst} |  2 +
>  Documentation/{cpu-load.txt => cpu-load.rst}  |  2 +
>  .../{cputopology.txt => cputopology.rst}  |  2 +
>  Documentation/{crc32.txt => crc32.rst}|  2 +
>  Documentation/{dcdbas.txt => dcdbas.rst}  |  2 +
>  ...ging-modules.txt => debugging-modules.rst} |  2 +
>  ...hci1394.txt => debugging-via-ohci1394.rst} |  2 +
>  Documentation/{dell_rbu.txt => dell_rbu.rst}  |  2 +
>  Documentation/device-mapper/statistics.rst|  4 +-
>  .../devicetree/bindings/phy/phy-bindings.txt  |  2 +-
>  Documentation/{digsig.txt => digsig.rst}  |  2 +
>  Documentation/driver-api/usb/dma.rst  |  6 +-
>  Documentation/driver-model/device.rst |  2 +-
>  Documentation/{efi-stub.txt => efi-stub.rst}  |  2 +
>  Documentation/{eisa.txt => eisa.rst}  |  2 +
>  Documentation/fb/vesafb.rst   |  2 +-
>  Documentation/filesystems/sysfs.txt   |  2 +-
>  ...ex-requeue-pi.txt => futex-requeue-pi.rst} |  2 +
>  .../{gcc-plugins.txt => gcc-plugins.rst}  |  2 +
>  Documentation/gpu/drm-mm.rst  |  2 +-
>  Documentation/{highuid.txt => highuid.rst}|  2 +
>  .../{hw_random.txt => hw_random.rst}  |  2 +
>  .../{hwspinlock.txt => hwspinlock.rst}|  2 +
>  Documentation/ia64/IRQ-redir.txt  |  2 +-
>  .../{intel_txt.txt => intel_txt.rst}  |  2 +
>  .../{io-mapping.txt => io-mapping.rst}|  2 +
>  .../{io_ordering.txt => io_ordering.rst}  |  2 +
>  Documentation/{iostats.txt => iostats.rst}|  2 +
>  ...flags-tracing.txt => irqflags-tracing.rst} |  2 +
>  Documentation/{isa.txt => isa.rst}|  2 +
>  Documentation/{isapnp.txt => isapnp.rst}  |  2 +
>  ...hreads.txt => kernel-per-CPU-kthreads.rst} |  4 +-
>  Documentation/{kobject.txt => kobject.rst}|  4 +-
>  Documentation/{kprobes.txt => kprobes.rst}|  2 +
>  Documentation/{kref.txt => kref.rst}  |  2 +
>  Documentation/laptops/thinkpad-acpi.txt   |  6 +-
>  Documentation/{ldm.txt => ldm.rst}|  2 +
>  Documentation/locking/rt-mutex.rst|  2 +-
>  ...kup-watchdogs.txt => lockup-watchdogs.rst} |  2 +
>  Documentation/{lsm.txt => lsm.rst}|  2 +
>  Documentation/{lzo.txt => lzo.rst}|  2 +
>  Documentation/{mailbox.txt => mailbox.rst}|  2 +
>  Documentation/memory-barriers.txt |  6 +-
>  ...hameleon-bus.txt => men-chameleon-bus.rst} |  2 +
>  Documentation/networking/scaling.rst  |  4 +-
>  .../{nommu-mmap.txt => nommu-mmap.rst}|  2 +
>  Documentation/{ntb.txt => ntb.rst}|  2 +
>  Documentation/{numastat.txt => numastat.rst}  |  2 +
>  Documentation/{padata.txt => padata.rst}  |  2 +
>  ...port-lowlevel.txt => parport-lowlevel.rst} |  2 +
>  ...-semaphore.txt => percpu-rw-semaphore.rst} |  2 +
>  Documentation/{phy.txt => phy.rst}|  2 +
>  Documentation/{pi-futex.txt => pi-futex.rst}  |  2 +
>  Documentation/{pnp.txt => pnp.rst}|  2 +
>  ...reempt-locking.txt => preempt-locking.rst} |  2 +
>  Documentation/{pwm.txt => pwm.rst}|  2 +
>  Documentation/{rbtree.txt => rbtree.rst}  |  2 +
>  .../{remoteproc.txt => remoteproc.rst}|  4 +-
>  Documentation/{rfkill.txt => rfkill.rst}  |  2 +
>  ...ust-futex

Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled

2019-04-22 Thread Bhupesh Sharma

Hi Will,

On 04/16/2019 02:44 PM, Will Deacon wrote:

On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:

On 2019/4/4 23:30, Will Deacon wrote:

On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:

v1 --> v2:
1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
(Report abort to device, no event recorded) to suppress the event messages
caused by the unexpected devices.
2. rewrite the patch description.


This issue came up a while back:

https://lore.kernel.org/linux-pci/20180302103032.gb19...@arm.com/

and I'd still prefer to solve it using the disable_bypass logic which we
already have. Something along the lines of the diff below?


Yes, my patches also use disable_bypass=1 (set ste.config=0b000). If
SMMU_IDR0.ST_LEVEL=0 (Linear Stream table supported), then all STE entries
are allocated and initialized (set ste.config=0b000). But if
SMMU_IDR0.ST_LEVEL=1 (2-level Stream Table), we only allocate and initialize
the first level tables, but leave the level 2 tables dynamically allocated.
That means C_BAD_STREAMID (eventid=0x2) will be reported if an unexpected
device accesses memory without being reinitialized in the kdump kernel.


So is your problem just that the C_BAD_STREAMID events are noisy? If so,
perhaps we should be disabling fault reporting entirely in the kdump kernel.

How about the updated diff below? I'm keen to have this as simple as
possible, so we don't end up introducing rarely tested, complex code on
the crash path.

Will

--->8

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cf..d8b73da6447d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
/* Clear CR0 and sync (disables SMMU and queue processing) */
reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
if (reg & CR0_SMMUEN) {
-   if (is_kdump_kernel()) {
-   arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
-   arm_smmu_device_disable(smmu);
-   return -EBUSY;
-   }
-
dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
+   WARN_ON(is_kdump_kernel() && !disable_bypass);
+   arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
}
  
  	ret = arm_smmu_device_disable(smmu);

@@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
return ret;
}
  
+	if (is_kdump_kernel())

+   enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
  
  	/* Enable the SMMU interface, or ensure bypass */

if (!bypass || disable_bypass) {



Thanks for the fix.

I can confirm that with this fix the kdump kernel boots well for me on
Huawei boards, so feel free to add:


Tested-by: Bhupesh Sharma 

Here are the kdump kernel logs without this fix:

[4.514181] arm-smmu-v3 arm-smmu-v3.1.auto: EVTQ overflow detected -- events lost


.. And then repeating messages like the following ..

[4.521654] arm-smmu-v3 arm-smmu-v3.1.auto: event 0x02 received:
[4.527654] arm-smmu-v3 arm-smmu-v3.1.auto:  0x7d020002
[4.533567] arm-smmu-v3 arm-smmu-v3.1.auto:  0x0001017e
[4.539478] arm-smmu-v3 arm-smmu-v3.1.auto:  0xff6de000
[4.545390] arm-smmu-v3 arm-smmu-v3.1.auto:  0x0eee03e8

And with the fix applied, kdump kernel logs can be seen below:

[ 9136.361094] Starting crashdump kernel...
[ 9136.365007] Bye!
[0.00] Booting Linux on physical CPU 0x070002 [0x480fd010]
[0.00] Linux version 5.1.0-rc6+

<..snip..>

[3.424103] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
[3.429674] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit (features 0x0fef)
[3.437780] arm-smmu-v3 arm-smmu-v3.0.auto: SMMU currently enabled! Resetting...

[3.445431] arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0


<..snip..>

Thanks,
Bhupesh


[PATCH v2 1/1] iommu/arm-smmu: Log CBFRSYNRA register on context fault

2019-04-22 Thread Vivek Gautam
Bits[15:0] in the CBFRSYNRA register contain information about the
StreamID of the incoming transaction that generated the
fault. Dump the CBFRSYNRA register to get this info.
This is especially useful in a distributed SMMU architecture
where multiple masters are connected to the SMMU.
The SID information helps to quickly identify the faulting
master device.

Signed-off-by: Vivek Gautam 
Reviewed-by: Bjorn Andersson 
---

Changes since v1:
 - Addressed review comments, given by Bjorn, for nits.

 drivers/iommu/arm-smmu-regs.h | 2 ++
 drivers/iommu/arm-smmu.c  | 7 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
index a1226e4ab5f8..e9132a926761 100644
--- a/drivers/iommu/arm-smmu-regs.h
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -147,6 +147,8 @@ enum arm_smmu_s2cr_privcfg {
 #define CBAR_IRPTNDX_SHIFT 24
 #define CBAR_IRPTNDX_MASK  0xff
 
+#define ARM_SMMU_GR1_CBFRSYNRA(n)  (0x400 + ((n) << 2))
+
 #define ARM_SMMU_GR1_CBA2R(n)  (0x800 + ((n) << 2))
 #define CBA2R_RW64_32BIT   (0 << 0)
 #define CBA2R_RW64_64BIT   (1 << 0)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..e000473f8205 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -575,7 +575,9 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
struct arm_smmu_device *smmu = smmu_domain->smmu;
+   void __iomem *gr1_base = ARM_SMMU_GR1(smmu);
void __iomem *cb_base;
+   u32 cbfrsynra;
 
cb_base = ARM_SMMU_CB(smmu, cfg->cbndx);
fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
@@ -585,10 +587,11 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 
fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
 
dev_err_ratelimited(smmu->dev,
-   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cb=%d\n",
-   fsr, iova, fsynr, cfg->cbndx);
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, cfg->cbndx);
 
writel(fsr, cb_base + ARM_SMMU_CB_FSR);
return IRQ_HANDLED;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
