date:20160306

[RFC PATCH v4 3/7] PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set

2016-03-06 Thread Yongji Xie

The resource_alignment will releases memory resources
allocated by firmware so that kernel can reassign new
resources later on. But this will cause the problem
that no resources can be allocated by kernel if
PCI_PROBE_ONLY was set, e.g. on pSeries platform
because PCI_PROBE_ONLY force kernel to use firmware
setup and not to reassign any resources.

To solve this problem, this patch ignores
resource_alignment if PCI_PROBE_ONLY was set.

Signed-off-by: Yongji Xie 
---
 Documentation/kernel-parameters.txt |2 ++
 drivers/pci/probe.c |3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index d8b29ab..8028631 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2922,6 +2922,8 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
windows need to be expanded.
noresize: Don't change the resources' sizes when
reassigning alignment.
+   Note that this option will not work if
+   PCI_PROBE_ONLY is set.
ecrc=   Enable/disable PCIe ECRC (transaction layer
end-to-end CRC checking).
bios: Use BIOS/firmware settings. This is the
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d7ab9b..bc31cad 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1719,7 +1719,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus 
*bus)
pci_fixup_device(pci_fixup_header, dev);
 
/* moved out from quirk header fixup code */
-   pci_reassigndev_resource_alignment(dev);
+   if (!pci_has_flag(PCI_PROBE_ONLY))
+   pci_reassigndev_resource_alignment(dev);
 
/* Clear the state_saved flag. */
dev->state_saved = false;
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH v4 7/7] powerpc/powernv/pci-ioda: Add IOMMU_CAP_INTR_REMAP for IODA host bridge

2016-03-06 Thread Yongji Xie

This patch adds IOMMU_CAP_INTR_REMAP for IODA host bridge so that
we can mmap MSI-X table in vfio driver.

Signed-off-by: Yongji Xie 
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index f90dc04..f01b9ab 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1955,6 +1955,20 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
.free = pnv_ioda2_table_free,
 };
 
+static bool pnv_ioda_iommu_capable(enum iommu_cap cap)
+{
+   switch (cap) {
+   case IOMMU_CAP_INTR_REMAP:
+   return true;
+   default:
+   return false;
+   }
+}
+
+static struct iommu_ops pnv_ioda_iommu_ops = {
+   .capable = pnv_ioda_iommu_capable,
+};
+
 static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
  struct pnv_ioda_pe *pe, unsigned int base,
  unsigned int segs)
@@ -3078,6 +3092,9 @@ static void pnv_pci_ioda_fixup(void)
 
/* Link NPU IODA tables to their PCI devices. */
pnv_npu_ioda_fixup();
+
+   /* Add IOMMU_CAP_INTR_REMAP */
+   bus_set_iommu(_bus_type, _ioda_iommu_ops);
 }
 
 /*
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH v4 6/7] vfio-pci: Allow to mmap MSI-X table if IOMMU_CAP_INTR_REMAP was set

2016-03-06 Thread Yongji Xie

Current vfio-pci implementation disallows to mmap MSI-X
table in case that user get to touch this directly.

But we should allow to mmap these MSI-X tables if IOMMU
supports interrupt remapping which can ensure that a
given pci device can only shoot the MSIs assigned for it.

Signed-off-by: Yongji Xie 
---
 drivers/vfio/pci/vfio_pci.c  |8 +---
 drivers/vfio/pci/vfio_pci_rdwr.c |4 +++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 49d7a69..d6f4788 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -592,13 +592,14 @@ static long vfio_pci_ioctl(void *device_data,
IORESOURCE_MEM && !pci_resources_share_page(pdev,
info.index)) {
info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
-   if (info.index == vdev->msix_bar) {
+   if (!iommu_capable(pdev->dev.bus,
+   IOMMU_CAP_INTR_REMAP) &&
+   info.index == vdev->msix_bar) {
ret = msix_sparse_mmap_cap(vdev, );
if (ret)
return ret;
}
}
-
break;
case VFIO_PCI_ROM_REGION_INDEX:
{
@@ -1029,7 +1030,8 @@ static int vfio_pci_mmap(void *device_data, struct 
vm_area_struct *vma)
if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
return -EINVAL;
 
-   if (index == vdev->msix_bar) {
+   if (!iommu_capable(pdev->dev.bus, IOMMU_CAP_INTR_REMAP) &&
+   index == vdev->msix_bar) {
/*
 * Disallow mmaps overlapping the MSI-X table; users don't
 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 5ffd1d9..1c46c29 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vfio_pci_private.h"
 
@@ -164,7 +165,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char 
__user *buf,
} else
io = vdev->barmap[bar];
 
-   if (bar == vdev->msix_bar) {
+   if (!iommu_capable(pdev->dev.bus, IOMMU_CAP_INTR_REMAP) &&
+   bar == vdev->msix_bar) {
x_start = vdev->msix_offset;
x_end = vdev->msix_offset + vdev->msix_size;
}
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH v4 4/7] PCI: Modify resource_alignment to support multiple devices

2016-03-06 Thread Yongji Xie

When vfio passthrough a PCI device of which MMIO BARs
are smaller than PAGE_SIZE, guest will not handle the
mmio accesses to the BARs which leads to mmio emulations
in host.

This is because vfio will not allow to passthrough one
BAR's mmio page which may be shared with other BARs.

To solve this performance issue, this patch modifies
resource_alignment to support syntax where multiple
devices get the same alignment. So we can use something
like "pci=resource_alignment=*:*:*.*:noresize" to
enforce the alignment of all MMIO BARs to be at least
PAGE_SIZE so that one BAR's mmio page would not be
shared with other BARs.

Signed-off-by: Yongji Xie 
---
 Documentation/kernel-parameters.txt |2 +
 drivers/pci/pci.c   |   90 ++-
 include/linux/pci.h |4 ++
 3 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 8028631..74b38ab 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2918,6 +2918,8 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
aligned memory resources.
If  is not specified,
PAGE_SIZE is used as alignment.
+   , ,  and  can be set to
+   "*" which means match all values.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
noresize: Don't change the resources' sizes when
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 760cce5..44ab59f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -102,6 +102,8 @@ unsigned int pcibios_max_latency = 255;
 /* If set, the PCIe ARI capability will not be used. */
 static bool pcie_ari_disabled;
 
+bool pci_resources_page_aligned;
+
 /**
  * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
  * @bus: pointer to PCI bus structure to search
@@ -4604,6 +4606,7 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
int seg, bus, slot, func, align_order, count;
resource_size_t align = 0;
char *p;
+   bool invalid = false;
 
spin_lock(_alignment_lock);
p = resource_alignment_param;
@@ -4615,16 +4618,49 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
} else {
align_order = -1;
}
-   if (sscanf(p, "%x:%x:%x.%x%n",
-   , , , , ) != 4) {
+   if (p[0] == '*' && p[1] == ':') {
+   seg = -1;
+   count = 1;
+   } else if (sscanf(p, "%x%n", , ) != 1 ||
+   p[count] != ':') {
+   invalid = true;
+   break;
+   }
+   p += count + 1;
+   if (*p == '*') {
+   bus = -1;
+   count = 1;
+   } else if (sscanf(p, "%x%n", , ) != 1) {
+   invalid = true;
+   break;
+   }
+   p += count;
+   if (*p == '.') {
+   slot = bus;
+   bus = seg;
seg = 0;
-   if (sscanf(p, "%x:%x.%x%n",
-   , , , ) != 3) {
-   /* Invalid format */
-   printk(KERN_ERR "PCI: Can't parse 
resource_alignment parameter: %s\n",
-   p);
+   p++;
+   } else if (*p == ':') {
+   p++;
+   if (p[0] == '*' && p[1] == '.') {
+   slot = -1;
+   count = 1;
+   } else if (sscanf(p, "%x%n", , ) != 1 ||
+   p[count] != '.') {
+   invalid = true;
break;
}
+   p += count + 1;
+   } else {
+   invalid = true;
+   break;
+   }
+   if (*p == '*') {
+   func = -1;
+   count = 1;
+   } else if (sscanf(p, "%x%n", , ) != 1) {
+   invalid = true;
+   break;
}
p += count;
if (!strncmp(p, ":noresize", 9)) {
@@ -4632,23 +4668,34 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
p += 9;
} else
*resize = true;
-

[RFC PATCH v4 5/7] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive

2016-03-06 Thread Yongji Xie

Current vfio-pci implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio
page may be shared with other BARs.

But we should allow to mmap these sub-page MMIO BARs if PCI
resource allocator can make sure these BARs' mmio page will
not be shared with other BARs.

Signed-off-by: Yongji Xie 
---
 drivers/vfio/pci/vfio_pci.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 1ce1d36..49d7a69 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -589,7 +589,8 @@ static long vfio_pci_ioctl(void *device_data,
 VFIO_REGION_INFO_FLAG_WRITE;
if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) &&
pci_resource_flags(pdev, info.index) &
-   IORESOURCE_MEM && info.size >= PAGE_SIZE) {
+   IORESOURCE_MEM && !pci_resources_share_page(pdev,
+   info.index)) {
info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
if (info.index == vdev->msix_bar) {
ret = msix_sparse_mmap_cap(vdev, );
@@ -1016,6 +1017,10 @@ static int vfio_pci_mmap(void *device_data, struct 
vm_area_struct *vma)
return -EINVAL;
 
phys_len = pci_resource_len(pdev, index);
+
+   if (!pci_resources_share_page(pdev, index))
+   phys_len = PAGE_ALIGN(phys_len);
+
req_len = vma->vm_end - vma->vm_start;
pgoff = vma->vm_pgoff &
((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH v4 2/7] PCI: Use IORESOURCE_WINDOW to identify bridge resources

2016-03-06 Thread Yongji Xie

Now we use the IORESOURCE_STARTALIGN to identify bridge
resources in __assign_resources_sorted(). But there would
be some problems because some PCI devices' resources may
also use IORESOURCE_STARTALIGN, e.g. using "noresize"
option of resource_alignment kernel parameter.

So this patch replaces IORESOURCE_STARTALIGN with
IORESOURCE_WINDOW.

Signed-off-by: Yongji Xie 
---
 drivers/pci/setup-bus.c |   21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7796d0a..4ff10ca 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -411,11 +411,11 @@ static void __assign_resources_sorted(struct list_head 
*head,
 
/*
 * There are two kinds of additional resources in the list:
-* 1. bridge resource  -- IORESOURCE_STARTALIGN
-* 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
+* 1. bridge resource -- IORESOURCE_WINDOW
+* 2. SR-IOV resource
 * Here just fix the additional alignment for bridge
 */
-   if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+   if (!(dev_res->res->flags & IORESOURCE_WINDOW))
continue;
 
add_align = get_res_add_align(realloc_head, dev_res->res);
@@ -956,7 +956,7 @@ static void pbus_size_io(struct pci_bus *bus, 
resource_size_t min_size,
 
b_res->start = min_align;
b_res->end = b_res->start + size0 - 1;
-   b_res->flags |= IORESOURCE_STARTALIGN;
+   b_res->flags |= IORESOURCE_STARTALIGN | IORESOURCE_WINDOW;
if (size1 > size0 && realloc_head) {
add_to_list(realloc_head, bus->self, b_res, size1-size0,
min_align);
@@ -1104,7 +1104,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned 
long mask,
}
b_res->start = min_align;
b_res->end = size0 + min_align - 1;
-   b_res->flags |= IORESOURCE_STARTALIGN;
+   b_res->flags |= IORESOURCE_STARTALIGN | IORESOURCE_WINDOW;
if (size1 > size0 && realloc_head) {
add_to_list(realloc_head, bus->self, b_res, size1-size0, 
add_align);
dev_printk(KERN_DEBUG, >self->dev, "bridge window %pR to 
%pR add_size %llx add_align %llx\n",
@@ -1140,7 +1140,8 @@ static void pci_bus_size_cardbus(struct pci_bus *bus,
 */
b_res[0].start = pci_cardbus_io_size;
b_res[0].end = b_res[0].start + pci_cardbus_io_size - 1;
-   b_res[0].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN;
+   b_res[0].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN |
+ IORESOURCE_WINDOW;
if (realloc_head) {
b_res[0].end -= pci_cardbus_io_size;
add_to_list(realloc_head, bridge, b_res, pci_cardbus_io_size,
@@ -1152,7 +1153,8 @@ handle_b_res_1:
goto handle_b_res_2;
b_res[1].start = pci_cardbus_io_size;
b_res[1].end = b_res[1].start + pci_cardbus_io_size - 1;
-   b_res[1].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN;
+   b_res[1].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN |
+ IORESOURCE_WINDOW;
if (realloc_head) {
b_res[1].end -= pci_cardbus_io_size;
add_to_list(realloc_head, bridge, b_res+1, pci_cardbus_io_size,
@@ -1190,7 +1192,7 @@ handle_b_res_2:
b_res[2].start = pci_cardbus_mem_size;
b_res[2].end = b_res[2].start + pci_cardbus_mem_size - 1;
b_res[2].flags |= IORESOURCE_MEM | IORESOURCE_PREFETCH |
- IORESOURCE_STARTALIGN;
+ IORESOURCE_STARTALIGN | IORESOURCE_WINDOW;
if (realloc_head) {
b_res[2].end -= pci_cardbus_mem_size;
add_to_list(realloc_head, bridge, b_res+2,
@@ -1206,7 +1208,8 @@ handle_b_res_3:
goto handle_done;
b_res[3].start = pci_cardbus_mem_size;
b_res[3].end = b_res[3].start + b_res_3_size - 1;
-   b_res[3].flags |= IORESOURCE_MEM | IORESOURCE_STARTALIGN;
+   b_res[3].flags |= IORESOURCE_MEM | IORESOURCE_STARTALIGN |
+ IORESOURCE_WINDOW;
if (realloc_head) {
b_res[3].end -= b_res_3_size;
add_to_list(realloc_head, bridge, b_res+3, b_res_3_size,
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH v4 1/7] PCI: Add a new option for resource_alignment to reassign alignment

2016-03-06 Thread Yongji Xie

When using resource_alignment kernel parameter, the current
implement reassigns the alignment by changing resources' size
which can potentially break some drivers.

So this patch adds a new option "noresize" for the parameter
to solve this problem.

Signed-off-by: Yongji Xie 
---
 Documentation/kernel-parameters.txt |5 -
 drivers/pci/pci.c   |   36 +--
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 9a53c92..d8b29ab 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2912,13 +2912,16 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
window. The default value is 64 megabytes.
resource_alignment=
Format:
-   [@][:]:.[; ...]
+   [@][:]:.
+   [:noresize][; ...]
Specifies alignment and device to reassign
aligned memory resources.
If  is not specified,
PAGE_SIZE is used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
+   noresize: Don't change the resources' sizes when
+   reassigning alignment.
ecrc=   Enable/disable PCIe ECRC (transaction layer
end-to-end CRC checking).
bios: Use BIOS/firmware settings. This is the
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 602eb42..760cce5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4598,7 +4598,8 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
  * RETURNS: Resource alignment if it is specified.
  *  Zero if it is not specified.
  */
-static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
+static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
+   bool *resize)
 {
int seg, bus, slot, func, align_order, count;
resource_size_t align = 0;
@@ -4626,6 +4627,11 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev)
}
}
p += count;
+   if (!strncmp(p, ":noresize", 9)) {
+   *resize = false;
+   p += 9;
+   } else
+   *resize = true;
if (seg == pci_domain_nr(dev->bus) &&
bus == dev->bus->number &&
slot == PCI_SLOT(dev->devfn) &&
@@ -4658,11 +4664,12 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
 {
int i;
struct resource *r;
+   bool resize;
resource_size_t align, size;
u16 command;
 
/* check if specified PCI is target device to reassign */
-   align = pci_specified_resource_alignment(dev);
+   align = pci_specified_resource_alignment(dev, );
if (!align)
return;
 
@@ -4684,15 +4691,24 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
if (!(r->flags & IORESOURCE_MEM))
continue;
size = resource_size(r);
-   if (size < align) {
-   size = align;
-   dev_info(>dev,
-   "Rounding up size of resource #%d to %#llx.\n",
-   i, (unsigned long long)size);
+   if (resize) {
+   if (size < align) {
+   size = align;
+   dev_info(>dev,
+   "Rounding up size of resource #%d to 
%#llx.\n",
+   i, (unsigned long long)size);
+   }
+   r->flags |= IORESOURCE_UNSET;
+   r->end = size - 1;
+   r->start = 0;
+   } else {
+   if (size > align)
+   align = size;
+   r->flags &= ~IORESOURCE_SIZEALIGN;
+   r->flags |= IORESOURCE_STARTALIGN | IORESOURCE_UNSET;
+   r->start = align;
+   r->end = r->start + size - 1;
}
-   r->flags |= IORESOURCE_UNSET;
-   r->end = size - 1;
-   r->start = 0;
}
/* Need to disable bridge's resource window,
 * to enable the kernel to reassign new resource
-- 
1.7.9.5

___
Linuxppc-dev mailing list

[RFC PATCH v4 0/7] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table

2016-03-06 Thread Yongji Xie

Current vfio-pci implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs and MSI-X table. This is because
sub-page BARs' mmio page may be shared with other BARs and MSI-X table
should not be accessed directly from the guest for security reasons.

But these will easily cause some performance issues for mmio accesses
in guest when vfio passthrough sub-page BARs or BARs containing MSI-X
table on PPC64 platform. This is because PAGE_SIZE is 64KB by default
on PPC64 platform and the big page may easily hit the sub-page MMIO
BARs' unmmapping and cause the unmmaping of the mmio page which
MSI-X table locate in, which lead to mmio emulation in host.

For sub-page MMIO BARs' unmmapping, this patchset modifies
resource_alignment kernel parameter to enforce the alignment of all
MMIO BARs to be at least PAGE_SZIE so that sub-page BAR's mmio page
will not be shared with other BARs. Then we can mmap sub-page MMIO BARs
in vfio-pci driver with the modified resource_alignment.

For MSI-X table's unmmapping, we think MSI-X table is safe to access
directly from userspace if PCI host bridge support filtering of MSIs
which can ensure that a given pci device can only shoot the MSIs
assigned for it. So we allow to mmap MSI-X table if
IOMMU_CAP_INTR_REMAP was set. And we add IOMMU_CAP_INTR_REMAP for
IODA host bridge on PPC64 platform.

With this patchset applied, we can get almost 100% improvement on
performance for mmio accesses when we passthrough sub-page BARs to guest
in our test.

The two vfio related patches(patch 5 and patch 6) are based on the
proposed patchset[1].

Changelog v4:
- Rebase on v4.5-rc6 with patchset[1] applied.
- Remove resource_page_aligned kernel parameter
- Fix some problems with resource_alignment kernel parameter
- Modify resource_alignment kernel parameter to support multiple
  devices.
- Remove host bridge attribute: msi_filtered
- Use IOMMU_CAP_INTR_REMAP to check if MSI-X table can be mmapped
- Add IOMMU_CAP_INTR_REMAP for IODA host bridge on PPC64 platform

Changelog v3:
- Rebase on new linux kernel mainline with the patchset[1] applied.
- Add a function to check whether PCI BARs'mmio page is shared with
  other BARs.
- Add a host bridge attribute to indicate PCI host bridge support
  filtering of MSIs.
- Use the new host bridge attribute to check if MSI-X table can
  be mmapped instead of CONFIG_EEH.
- Remove Kconfig option VFIO_PCI_MMAP_MSIX

Changelog v2:
- Rebase on v4.4-rc6 with the patchset[1] applied.
- Use kernel parameter to enforce all MMIO BARs to be page aligned
  on PCI core code instead of doing it on PPC64 arch code.
- Remove flags: VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED
VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP
- Add a Kconfig option to support for mmapping MSI-X table.

[1] http://www.spinics.net/lists/kvm/msg127812.html

Yongji Xie (7):
  PCI: Add a new option for resource_alignment to reassign alignment
  PCI: Use IORESOURCE_WINDOW to identify bridge resources
  PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set
  PCI: Modify resource_alignment to support multiple devices
  vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
  vfio-pci: Allow to mmap MSI-X table if IOMMU_CAP_INTR_REMAP was set
  powerpc/powernv/pci-ioda: Add IOMMU_CAP_INTR_REMAP for IODA host bridge

 Documentation/kernel-parameters.txt   |9 ++-
 arch/powerpc/platforms/powernv/pci-ioda.c |   17 
 drivers/pci/pci.c |  126 -
 drivers/pci/probe.c   |3 +-
 drivers/pci/setup-bus.c   |   21 ++---
 drivers/vfio/pci/vfio_pci.c   |   15 +++-
 drivers/vfio/pci/vfio_pci_rdwr.c  |4 +-
 include/linux/pci.h   |4 +
 8 files changed, 162 insertions(+), 37 deletions(-)

-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel 4/9] powerpc/powernv/iommu: Add real mode version of xchg()

2016-03-06 Thread Alexey Kardashevskiy


On 03/07/2016 05:05 PM, David Gibson wrote:

On Mon, Mar 07, 2016 at 02:41:12PM +1100, Alexey Kardashevskiy wrote:

In real mode, TCE tables are invalidated using different
cache-inhibited store instructions which is different from
the virtual mode.

This defines and implements exchange_rm() callback. This does not
define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
exchange/exchange_rm are only to be used by KVM for VFIO.

The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.

This replaces list_for_each_entry_rcu with its lockless version as
from now on pnv_pci_ioda2_tce_invalidate() can be called in
the real mode too.

Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/include/asm/iommu.h  |  7 +++
  arch/powerpc/kernel/iommu.c   | 15 +++
  arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++-
  3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7b87bab..3ca877a 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -64,6 +64,11 @@ struct iommu_table_ops {
long index,
unsigned long *hpa,
enum dma_data_direction *direction);
+   /* Real mode */
+   int (*exchange_rm)(struct iommu_table *tbl,
+   long index,
+   unsigned long *hpa,
+   enum dma_data_direction *direction);
  #endif
void (*clear)(struct iommu_table *tbl,
long index, long npages);
@@ -208,6 +213,8 @@ extern void iommu_del_device(struct device *dev);
  extern int __init tce_iommu_bus_notifier_init(void);
  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
unsigned long *hpa, enum dma_data_direction *direction);
+extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
+   unsigned long *hpa, enum dma_data_direction *direction);
  #else
  static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..2fcc48b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1062,6 +1062,21 @@ void iommu_release_ownership(struct iommu_table *tbl)
  }
  EXPORT_SYMBOL_GPL(iommu_release_ownership);

+long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
+   unsigned long *hpa, enum dma_data_direction *direction)
+{
+   long ret;
+
+   ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
+
+   if (!ret && ((*direction == DMA_FROM_DEVICE) ||
+   (*direction == DMA_BIDIRECTIONAL)))
+   SetPageDirty(realmode_pfn_to_page(*hpa >> PAGE_SHIFT));
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);



  int iommu_add_device(struct device *dev)
  {
struct iommu_table *tbl;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index c5baaf3..bed1944 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1791,6 +1791,18 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, 
long index,

return ret;
  }
+
+static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
+   unsigned long *hpa, enum dma_data_direction *direction)
+{
+   long ret = pnv_tce_xchg(tbl, index, hpa, direction);
+
+   if (!ret && (tbl->it_type &
+   (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+   pnv_pci_ioda1_tce_invalidate(tbl, index, 1, true);
+
+   return ret;
+}
  #endif


Both your _rm variants are identical to the non _rm versions.  Why not
just set the function poiinter to the same thing, rather than copying
the whole function.



The last parameter - "rm" - to pnv_pci_ioda1_tce_invalidate() is different.





  static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
@@ -1806,6 +1818,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
.set = pnv_ioda1_tce_build,
  #ifdef CONFIG_IOMMU_API
.exchange = pnv_ioda1_tce_xchg,
+   .exchange_rm = pnv_ioda1_tce_xchg_rm,
  #endif
.clear = pnv_ioda1_tce_free,
.get = pnv_tce_get,
@@ -1866,7 +1879,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
iommu_table *tbl,
  {
struct iommu_table_group_link *tgl;

-   list_for_each_entry_rcu(tgl, >it_group_list, next) {
+   list_for_each_entry_lockless(tgl, >it_group_list, next) {
struct pnv_ioda_pe *npe;
struct pnv_ioda_pe *pe = container_of(tgl->table_group,
struct pnv_ioda_pe, table_group);
@@ -1918,6 +1931,18 @@ static int pnv_ioda2_tce_xchg(struct

Re: [PATCH kernel 6/9] KVM: PPC: Associate IOMMU group with guest view of TCE table

2016-03-06 Thread David Gibson

On Mon, Mar 07, 2016 at 02:41:14PM +1100, Alexey Kardashevskiy wrote:
> The existing in-kernel TCE table for emulated devices contains
> guest physical addresses which are accesses by emulated devices.
> Since we need to keep this information for VFIO devices too
> in order to implement H_GET_TCE, we are reusing it.
> 
> This adds IOMMU group list to kvmppc_spapr_tce_table. Each group
> will have an iommu_table pointer.
> 
> This adds kvm_spapr_tce_attach_iommu_group() helper and its detach
> counterpart to manage the lists.
> 
> This puts a group when:
> - guest copy of TCE table is destroyed when TCE table fd is closed;
> - kvm_spapr_tce_detach_iommu_group() is called from
> the KVM_DEV_VFIO_GROUP_DEL ioctl handler in the case vfio-pci hotunplug
> (will be added in the following patch).
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/include/asm/kvm_host.h |   8 +++
>  arch/powerpc/include/asm/kvm_ppc.h  |   6 ++
>  arch/powerpc/kvm/book3s_64_vio.c| 108 
> 
>  3 files changed, 122 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 2e7c791..2c5c823 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -178,6 +178,13 @@ struct kvmppc_pginfo {
>   atomic_t refcnt;
>  };
>  
> +struct kvmppc_spapr_tce_group {
> + struct list_head next;
> + struct rcu_head rcu;
> + struct iommu_group *refgrp;/* for reference counting only */
> + struct iommu_table *tbl;
> +};
> +
>  struct kvmppc_spapr_tce_table {
>   struct list_head list;
>   struct kvm *kvm;
> @@ -186,6 +193,7 @@ struct kvmppc_spapr_tce_table {
>   u32 page_shift;
>   u64 offset; /* in pages */
>   u64 size;   /* window size in pages */
> + struct list_head groups;
>   struct page *pages[0];
>  };
>  
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 2544eda..d1482dc 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -164,6 +164,12 @@ extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
>   struct kvm_memory_slot *memslot, unsigned long porder);
>  extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>  
> +extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
> + unsigned long liobn,
> + phys_addr_t start_addr,
> + struct iommu_group *grp);
> +extern void kvm_spapr_tce_detach_iommu_group(struct kvm *kvm,
> + struct iommu_group *grp);
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>   struct kvm_create_spapr_tce_64 *args);
>  extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> b/arch/powerpc/kvm/book3s_64_vio.c
> index 2c2d103..846d16d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -95,10 +96,18 @@ static void release_spapr_tce_table(struct rcu_head *head)
>   struct kvmppc_spapr_tce_table *stt = container_of(head,
>   struct kvmppc_spapr_tce_table, rcu);
>   unsigned long i, npages = kvmppc_tce_pages(stt->size);
> + struct kvmppc_spapr_tce_group *kg;
>  
>   for (i = 0; i < npages; i++)
>   __free_page(stt->pages[i]);
>  
> + while (!list_empty(>groups)) {
> + kg = list_first_entry(>groups,
> + struct kvmppc_spapr_tce_group, next);
> + list_del(>next);
> + kfree(kg);
> + }
> +
>   kfree(stt);
>  }
>  
> @@ -129,9 +138,15 @@ static int kvm_spapr_tce_mmap(struct file *file, struct 
> vm_area_struct *vma)
>  static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>  {
>   struct kvmppc_spapr_tce_table *stt = filp->private_data;
> + struct kvmppc_spapr_tce_group *kg;
>  
>   list_del_rcu(>list);
>  
> + list_for_each_entry_rcu(kg, >groups, next) {
> + iommu_group_put(kg->refgrp);
> + kg->refgrp = NULL;
> + }

What's the reason for this kind of two-phase deletion?  Dereffing the
group here, and setting to NULL, then actually removing from the liast above.


>   kvm_put_kvm(stt->kvm);
>  
>   kvmppc_account_memlimit(
> @@ -146,6 +161,98 @@ static const struct file_operations kvm_spapr_tce_fops = 
> {
>   .release= kvm_spapr_tce_release,
>  };
>  
> +extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
> + unsigned long liobn,
> + phys_addr_t start_addr,
> + struct iommu_group *grp)
> +{
> + struct kvmppc_spapr_tce_table *stt = NULL;
>

Re: [PATCH kernel 3/9] KVM: PPC: Use preregistered memory API to access TCE list

2016-03-06 Thread David Gibson

On Mon, Mar 07, 2016 at 02:41:11PM +1100, Alexey Kardashevskiy wrote:
> VFIO on sPAPR already implements guest memory pre-registration
> when the entire guest RAM gets pinned. This can be used to translate
> the physical address of a guest page containing the TCE list
> from H_PUT_TCE_INDIRECT.
> 
> This makes use of the pre-registrered memory API to access TCE list
> pages in order to avoid unnecessary locking on the KVM memory
> reverse map.
> 
> Signed-off-by: Alexey Kardashevskiy 

Ok.. so, what's the benefit of not having to lock the rmap?

> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 86 
> ++---
>  1 file changed, 70 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 44be73e..af155f6 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -180,6 +180,38 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +static mm_context_t *kvmppc_mm_context(struct kvm_vcpu *vcpu)
> +{
> + struct task_struct *task;
> +
> + task = vcpu->arch.run_task;
> + if (unlikely(!task || !task->mm))
> + return NULL;
> +
> + return >mm->context;
> +}
> +
> +static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu)
> +{
> + mm_context_t *mm = kvmppc_mm_context(vcpu);
> +
> + if (unlikely(!mm))
> + return false;
> +
> + return mm_iommu_preregistered(mm);
> +}
> +
> +static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
> + struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size)
> +{
> + mm_context_t *mm = kvmppc_mm_context(vcpu);
> +
> + if (unlikely(!mm))
> + return NULL;
> +
> + return mm_iommu_lookup_rm(mm, ua, size);
> +}
> +
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba, unsigned long tce)
>  {
> @@ -261,23 +293,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>   if (ret != H_SUCCESS)
>   return ret;
>  
> - if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , ))
> - return H_TOO_HARD;
> + if (kvmppc_preregistered(vcpu)) {
> + /*
> +  * We get here if guest memory was pre-registered which
> +  * is normally VFIO case and gpa->hpa translation does not
> +  * depend on hpt.
> +  */
> + struct mm_iommu_table_group_mem_t *mem;
>  
> - rmap = (void *) vmalloc_to_phys(rmap);
> + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , NULL))
> + return H_TOO_HARD;
>  
> - /*
> -  * Synchronize with the MMU notifier callbacks in
> -  * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> -  * While we have the rmap lock, code running on other CPUs
> -  * cannot finish unmapping the host real page that backs
> -  * this guest real page, so we are OK to access the host
> -  * real page.
> -  */
> - lock_rmap(rmap);
> - if (kvmppc_rm_ua_to_hpa(vcpu, ua, )) {
> - ret = H_TOO_HARD;
> - goto unlock_exit;
> + mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K);
> + if (!mem || mm_iommu_rm_ua_to_hpa(mem, ua, ))
> + return H_TOO_HARD;
> + } else {
> + /*
> +  * This is emulated devices case.
> +  * We do not require memory to be preregistered in this case
> +  * so lock rmap and do __find_linux_pte_or_hugepte().
> +  */
> + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , ))
> + return H_TOO_HARD;
> +
> + rmap = (void *) vmalloc_to_phys(rmap);
> +
> + /*
> +  * Synchronize with the MMU notifier callbacks in
> +  * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> +  * While we have the rmap lock, code running on other CPUs
> +  * cannot finish unmapping the host real page that backs
> +  * this guest real page, so we are OK to access the host
> +  * real page.
> +  */
> + lock_rmap(rmap);
> + if (kvmppc_rm_ua_to_hpa(vcpu, ua, )) {
> + ret = H_TOO_HARD;
> + goto unlock_exit;
> + }
>   }
>  
>   for (i = 0; i < npages; ++i) {
> @@ -291,7 +344,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>   }
>  
>  unlock_exit:
> - unlock_rmap(rmap);
> + if (rmap)

I don't see where rmap is initialized to NULL in the case where it's
not being used.

> + unlock_rmap(rmap);
>  
>   return ret;
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_

Re: [PATCH kernel 4/9] powerpc/powernv/iommu: Add real mode version of xchg()

2016-03-06 Thread David Gibson

On Mon, Mar 07, 2016 at 02:41:12PM +1100, Alexey Kardashevskiy wrote:
> In real mode, TCE tables are invalidated using different
> cache-inhibited store instructions which is different from
> the virtual mode.
> 
> This defines and implements exchange_rm() callback. This does not
> define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
> exchange/exchange_rm are only to be used by KVM for VFIO.
> 
> The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.
> 
> This replaces list_for_each_entry_rcu with its lockless version as
> from now on pnv_pci_ioda2_tce_invalidate() can be called in
> the real mode too.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/include/asm/iommu.h  |  7 +++
>  arch/powerpc/kernel/iommu.c   | 15 +++
>  arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++-
>  3 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index 7b87bab..3ca877a 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -64,6 +64,11 @@ struct iommu_table_ops {
>   long index,
>   unsigned long *hpa,
>   enum dma_data_direction *direction);
> + /* Real mode */
> + int (*exchange_rm)(struct iommu_table *tbl,
> + long index,
> + unsigned long *hpa,
> + enum dma_data_direction *direction);
>  #endif
>   void (*clear)(struct iommu_table *tbl,
>   long index, long npages);
> @@ -208,6 +213,8 @@ extern void iommu_del_device(struct device *dev);
>  extern int __init tce_iommu_bus_notifier_init(void);
>  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>   unsigned long *hpa, enum dma_data_direction *direction);
> +extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> + unsigned long *hpa, enum dma_data_direction *direction);
>  #else
>  static inline void iommu_register_group(struct iommu_table_group 
> *table_group,
>   int pci_domain_number,
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index a8e3490..2fcc48b 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1062,6 +1062,21 @@ void iommu_release_ownership(struct iommu_table *tbl)
>  }
>  EXPORT_SYMBOL_GPL(iommu_release_ownership);
>  
> +long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> + unsigned long *hpa, enum dma_data_direction *direction)
> +{
> + long ret;
> +
> + ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
> +
> + if (!ret && ((*direction == DMA_FROM_DEVICE) ||
> + (*direction == DMA_BIDIRECTIONAL)))
> + SetPageDirty(realmode_pfn_to_page(*hpa >> PAGE_SHIFT));
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);

>  int iommu_add_device(struct device *dev)
>  {
>   struct iommu_table *tbl;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c5baaf3..bed1944 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1791,6 +1791,18 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, 
> long index,
>  
>   return ret;
>  }
> +
> +static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
> + unsigned long *hpa, enum dma_data_direction *direction)
> +{
> + long ret = pnv_tce_xchg(tbl, index, hpa, direction);
> +
> + if (!ret && (tbl->it_type &
> + (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
> + pnv_pci_ioda1_tce_invalidate(tbl, index, 1, true);
> +
> + return ret;
> +}
>  #endif

Both your _rm variants are identical to the non _rm versions.  Why not
just set the function poiinter to the same thing, rather than copying
the whole function.

>  static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
> @@ -1806,6 +1818,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
>   .set = pnv_ioda1_tce_build,
>  #ifdef CONFIG_IOMMU_API
>   .exchange = pnv_ioda1_tce_xchg,
> + .exchange_rm = pnv_ioda1_tce_xchg_rm,
>  #endif
>   .clear = pnv_ioda1_tce_free,
>   .get = pnv_tce_get,
> @@ -1866,7 +1879,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
> iommu_table *tbl,
>  {
>   struct iommu_table_group_link *tgl;
>  
> - list_for_each_entry_rcu(tgl, >it_group_list, next) {
> + list_for_each_entry_lockless(tgl, >it_group_list, next) {
>   struct pnv_ioda_pe *npe;
>   struct pnv_ioda_pe *pe = container_of(tgl->table_group,
>   struct pnv_ioda_pe, table_group);
> @@ -1918,6 +1931,18 @@ static int

Re: [PATCH v6 14/20] cxl: Support to flash a new image on the adapter from a guest

2016-03-06 Thread Ian Munsie

> +struct cxl_adapter_image {
> +__u64 flags;
> +__u64 data;
> +__u64 len_data;
> +__u64 len_image;
> +__u64 reserved1;
> +__u64 reserved2;
> +__u64 reserved3;
> +__u64 reserved4;
> +};

Thanks, that looks better now :)

Acked-by: Ian Munsie 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v6 00/20] cxl: Add support for powerVM guest

2016-03-06 Thread Ian Munsie

Thanks guys - I'm pretty happy with this series now and am happy for
this to be merged, unless @mpe has any comments.

Cheers,
-Ian

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel 1/9] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number

2016-03-06 Thread David Gibson

On Mon, Mar 07, 2016 at 02:41:09PM +1100, Alexey Kardashevskiy wrote:
> This adds a capability number for in-kernel support for VFIO on
> SPAPR platform.
> 
> The capability will tell the user space whether in-kernel handlers of
> H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
> must not attempt allocating a TCE table in the host kernel via
> the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
> will not be passed to the user space which is desired action in
> the situation like that.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  include/uapi/linux/kvm.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c251f06..080ffbf 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -863,6 +863,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_HYPERV_SYNIC 123
>  #define KVM_CAP_S390_RI 124
>  #define KVM_CAP_SPAPR_TCE_64 125
> +#define KVM_CAP_SPAPR_TCE_VFIO 126
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel 2/9] powerpc/mmu: Add real mode support for IOMMU preregistered memory

2016-03-06 Thread David Gibson

On Mon, Mar 07, 2016 at 02:41:10PM +1100, Alexey Kardashevskiy wrote:
> This makes mm_iommu_lookup() able to work in realmode by replacing
> list_for_each_entry_rcu() (which can do debug stuff which can fail in
> real mode) with list_for_each_entry_lockless().
> 
> This adds realmode version of mm_iommu_ua_to_hpa() which adds
> explicit vmalloc'd-to-linear address conversion.
> Unlike mm_iommu_ua_to_hpa(), mm_iommu_rm_ua_to_hpa() can fail.
> 
> This changes mm_iommu_preregistered() to receive @mm as in real mode
> @current does not always have a correct pointer.

So, I'd generally expect a parameter called @mm to be an mm_struct *,
not a mm_context_t.

> 
> This adds realmode version of mm_iommu_lookup() which receives @mm
> (for the same reason as for mm_iommu_preregistered()) and uses
> lockless version of list_for_each_entry_rcu().
> 
> Signed-off-by: Alexey Kardashevskiy 



> ---
>  arch/powerpc/include/asm/mmu_context.h |  6 -
>  arch/powerpc/mm/mmu_context_iommu.c| 45 
> ++
>  2 files changed, 45 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index 878c277..3ba652a 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,7 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> -extern bool mm_iommu_preregistered(void);
> +extern bool mm_iommu_preregistered(mm_context_t *mm);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>   struct mm_iommu_table_group_mem_t **pmem);
>  extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
> @@ -26,10 +26,14 @@ extern void mm_iommu_init(mm_context_t *ctx);
>  extern void mm_iommu_cleanup(mm_context_t *ctx);
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
>   unsigned long size);
> +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(mm_context_t 
> *mm,
> + unsigned long ua, unsigned long size);
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,
>   unsigned long entries);
>  extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>   unsigned long ua, unsigned long *hpa);
> +extern long mm_iommu_rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> + unsigned long ua, unsigned long *hpa);
>  extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
>  extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
>  #endif
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c 
> b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..aa1565d 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -63,12 +63,9 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
>   return ret;
>  }
>  
> -bool mm_iommu_preregistered(void)
> +bool mm_iommu_preregistered(mm_context_t *mm)
>  {
> - if (!current || !current->mm)
> - return false;
> -
> - return !list_empty(>mm->context.iommu_group_mem_list);
> + return !list_empty(>iommu_group_mem_list);
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>  
> @@ -231,6 +228,24 @@ unlock_exit:
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_put);
>  
> +struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(mm_context_t *mm,
> + unsigned long ua, unsigned long size)
> +{
> + struct mm_iommu_table_group_mem_t *mem, *ret = NULL;

I think you could do with a comment here explaining why the lockless
traversal is safe.

> + list_for_each_entry_lockless(mem, >iommu_group_mem_list, next) {
> + if ((mem->ua <= ua) &&
> + (ua + size <= mem->ua +
> +  (mem->entries << PAGE_SHIFT))) {
> + ret = mem;
> + break;
> + }
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_lookup_rm);
> +
>  struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
>   unsigned long size)
>  {
> @@ -284,6 +299,26 @@ long mm_iommu_ua_to_hpa(struct 
> mm_iommu_table_group_mem_t *mem,
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
>  
> +long mm_iommu_rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> + unsigned long ua, unsigned long *hpa)
> +{
> + const long entry = (ua - mem->ua) >> PAGE_SHIFT;
> + void *va = >hpas[entry];
> + unsigned long *ra;
> +
> + if (entry >= mem->entries)
> + return -EFAULT;
> +
> + ra = (void *) vmalloc_to_phys(va);
> + if (!ra)
> + return -EFAULT;
> +
> + *hpa = *ra | (ua & ~PAGE_MASK);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_rm_ua_to_hpa);
> +
>  long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
>  {
>   if

Re: [PATCH v6 20/20] cxl: Remove cxl_get_phys_dev() kernel API

2016-03-06 Thread Ian Munsie

Acked-by: Ian Munsie 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v6 13/20] cxl: sysfs support for guests

2016-03-06 Thread Ian Munsie

Acked-by: Ian Munsie 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/process: fix altivec SPR not being saved

2016-03-06 Thread Anton Blanchard via Linuxppc-dev

Hi Oliver,

> In save_sprs() in process.c contains the following test:
> 
>   if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
>   t->vrsave = mfspr(SPRN_VRSAVE);
> 
> CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE so the test
> is equivilent to:
> 
>   if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
>   cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
> 
> On CPUs without support for both (i.e G5) this results in vrsave not
> being saved between context switches. The vector register
> save/restore code doesn't use VRSAVE to determine which registers to
> save/restore, but the value of VRSAVE is used to determine if altivec
> is being used in several code paths.

Nice catch, not sure how I missed that. As Ben suggests, it should
definitely go to -stable as well.

Feel free to add my sign off:

Signed-off-by: Anton Blanchard 

Anton

> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/powerpc/kernel/process.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c index 8224852..5a4d4d1 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -855,7 +855,7 @@ void restore_tm_state(struct pt_regs *regs)
>  static inline void save_sprs(struct thread_struct *t)
>  {
>  #ifdef CONFIG_ALTIVEC
> - if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
> + if (cpu_has_feature(CPU_FTR_ALTIVEC))
>   t->vrsave = mfspr(SPRN_VRSAVE);
>  #endif
>  #ifdef CONFIG_PPC_BOOK3S_64

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/process: fix altivec SPR not being saved

2016-03-06 Thread Benjamin Herrenschmidt

On Mon, 2016-03-07 at 09:33 +1100, Oliver O'Halloran wrote:
> In save_sprs() in process.c contains the following test:
> 
>   if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
>   t->vrsave = mfspr(SPRN_VRSAVE);
> 
> CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE so the test
> is equivilent to:
> 
>   if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
>   cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
> 
> On CPUs without support for both (i.e G5) this results in vrsave not
> being
> saved between context switches. The vector register save/restore code
> doesn't use VRSAVE to determine which registers to save/restore,
> but the value of VRSAVE is used to determine if altivec is being used
> in several code paths.

Nice one, should probably go to stable !

> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/powerpc/kernel/process.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c
> index 8224852..5a4d4d1 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -855,7 +855,7 @@ void restore_tm_state(struct pt_regs *regs)
>  static inline void save_sprs(struct thread_struct *t)
>  {
>  #ifdef CONFIG_ALTIVEC
> - if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
> + if (cpu_has_feature(CPU_FTR_ALTIVEC))
>   t->vrsave = mfspr(SPRN_VRSAVE);
>  #endif
>  #ifdef CONFIG_PPC_BOOK3S_64

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH kernel 5/9] KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently

2016-03-06 Thread Alexey Kardashevskiy

It does not make much sense to have KVM in book3s-64 and
not to have IOMMU bits for PCI pass through support as it costs little
and allows VFIO to function on book3s KVM.

Having IOMMU_API always enabled makes it unnecessary to have a lot of
"#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those
ifdef's we could have only user space emulated devices accelerated
(but not VFIO) which do not seem to be very useful.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c2024ac..1059846 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -64,6 +64,7 @@ config KVM_BOOK3S_64
select KVM_BOOK3S_64_HANDLER
select KVM
select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE
+   select SPAPR_TCE_IOMMU if IOMMU_SUPPORT
---help---
  Support running unmodified book3s_64 and book3s_32 guest kernels
  in virtual machines on book3s_64 host processors.
-- 
2.5.0.rc3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH kernel 8/9] KVM: PPC: Add in-kernel handling for VFIO

2016-03-06 Thread Alexey Kardashevskiy

This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO
without passing them to user space which saves time on switching
to user space and back.

Both real and virtual modes are supported. The kernel tries to
handle a TCE request in the real mode, if fails it passes the request
to the virtual mode to complete the operation. If it a virtual mode
handler fails, the request is passed to user space; this is not expected
to happen ever though.

The first user of this is VFIO on POWER. Trampolines to the VFIO external
user API functions are required for this patch.

This uses a VFIO KVM device to associate a logical bus number (LIOBN)
with an VFIO IOMMU group fd and enable in-kernel handling of map/unmap
requests.

To make use of the feature, the user space has to create a guest view
of the TCE table via KVM_CAP_SPAPR_TCE/KVM_CAP_SPAPR_TCE_64 and
then associate a LIOBN with this table via VFIO KVM device,
a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN property (which is added in
the next patch).

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kvm/book3s_64_vio.c| 184 +++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 186 
 2 files changed, 370 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 7965fc7..9417d12 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -317,11 +318,161 @@ fail:
return ret;
 }
 
+static long kvmppc_tce_iommu_mapped_dec(struct iommu_table *tbl,
+   unsigned long entry)
+{
+   struct mm_iommu_table_group_mem_t *mem = NULL;
+   const unsigned long pgsize = 1ULL << tbl->it_page_shift;
+   unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+
+   if (!pua)
+   return H_HARDWARE;
+
+   mem = mm_iommu_lookup(*pua, pgsize);
+   if (!mem)
+   return H_HARDWARE;
+
+   mm_iommu_mapped_dec(mem);
+
+   *pua = 0;
+
+   return H_SUCCESS;
+}
+
+static long kvmppc_tce_iommu_unmap(struct iommu_table *tbl,
+   unsigned long entry)
+{
+   enum dma_data_direction dir = DMA_NONE;
+   unsigned long hpa = 0;
+
+   if (iommu_tce_xchg(tbl, entry, , ))
+   return H_HARDWARE;
+
+   if (dir == DMA_NONE)
+   return H_SUCCESS;
+
+   return kvmppc_tce_iommu_mapped_dec(tbl, entry);
+}
+
+long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
+   unsigned long entry, unsigned long gpa,
+   enum dma_data_direction dir)
+{
+   long ret;
+   unsigned long hpa, ua, *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+   struct mm_iommu_table_group_mem_t *mem;
+
+   if (!pua)
+   return H_HARDWARE;
+
+   if (kvmppc_gpa_to_ua(kvm, gpa, , NULL))
+   return H_HARDWARE;
+
+   mem = mm_iommu_lookup(ua, 1ULL << tbl->it_page_shift);
+   if (!mem)
+   return H_HARDWARE;
+
+   if (mm_iommu_ua_to_hpa(mem, ua, ))
+   return H_HARDWARE;
+
+   if (mm_iommu_mapped_inc(mem))
+   return H_HARDWARE;
+
+   ret = iommu_tce_xchg(tbl, entry, , );
+   if (ret) {
+   mm_iommu_mapped_dec(mem);
+   return H_TOO_HARD;
+   }
+
+   if (dir != DMA_NONE)
+   kvmppc_tce_iommu_mapped_dec(tbl, entry);
+
+   *pua = ua;
+
+   return 0;
+}
+
+long kvmppc_h_put_tce_iommu(struct kvm_vcpu *vcpu,
+   struct iommu_table *tbl,
+   unsigned long liobn, unsigned long ioba,
+   unsigned long tce)
+{
+   long idx, ret = H_HARDWARE;
+   const unsigned long entry = ioba >> tbl->it_page_shift;
+   const unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
+   const enum dma_data_direction dir = iommu_tce_direction(tce);
+
+   /* Clear TCE */
+   if (dir == DMA_NONE) {
+   if (iommu_tce_clear_param_check(tbl, ioba, 0, 1))
+   return H_PARAMETER;
+
+   return kvmppc_tce_iommu_unmap(tbl, entry);
+   }
+
+   /* Put TCE */
+   if (iommu_tce_put_param_check(tbl, ioba, tce))
+   return H_PARAMETER;
+
+   idx = srcu_read_lock(>kvm->srcu);
+   ret = kvmppc_tce_iommu_map(vcpu->kvm, tbl, entry, gpa, dir);
+   srcu_read_unlock(>kvm->srcu, idx);
+
+   return ret;
+}
+
+static long kvmppc_h_put_tce_indirect_iommu(struct kvm_vcpu *vcpu,
+   struct iommu_table *tbl, unsigned long ioba,
+   u64 __user *tces, unsigned long npages)
+{
+   unsigned long i, ret, tce, gpa;
+   const unsigned long entry = ioba >>

[PATCH kernel 3/9] KVM: PPC: Use preregistered memory API to access TCE list

2016-03-06 Thread Alexey Kardashevskiy

VFIO on sPAPR already implements guest memory pre-registration
when the entire guest RAM gets pinned. This can be used to translate
the physical address of a guest page containing the TCE list
from H_PUT_TCE_INDIRECT.

This makes use of the pre-registrered memory API to access TCE list
pages in order to avoid unnecessary locking on the KVM memory
reverse map.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kvm/book3s_64_vio_hv.c | 86 ++---
 1 file changed, 70 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 44be73e..af155f6 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -180,6 +180,38 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
 EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+static mm_context_t *kvmppc_mm_context(struct kvm_vcpu *vcpu)
+{
+   struct task_struct *task;
+
+   task = vcpu->arch.run_task;
+   if (unlikely(!task || !task->mm))
+   return NULL;
+
+   return >mm->context;
+}
+
+static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu)
+{
+   mm_context_t *mm = kvmppc_mm_context(vcpu);
+
+   if (unlikely(!mm))
+   return false;
+
+   return mm_iommu_preregistered(mm);
+}
+
+static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
+   struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size)
+{
+   mm_context_t *mm = kvmppc_mm_context(vcpu);
+
+   if (unlikely(!mm))
+   return NULL;
+
+   return mm_iommu_lookup_rm(mm, ua, size);
+}
+
 long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
  unsigned long ioba, unsigned long tce)
 {
@@ -261,23 +293,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
if (ret != H_SUCCESS)
return ret;
 
-   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , ))
-   return H_TOO_HARD;
+   if (kvmppc_preregistered(vcpu)) {
+   /*
+* We get here if guest memory was pre-registered which
+* is normally VFIO case and gpa->hpa translation does not
+* depend on hpt.
+*/
+   struct mm_iommu_table_group_mem_t *mem;
 
-   rmap = (void *) vmalloc_to_phys(rmap);
+   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , NULL))
+   return H_TOO_HARD;
 
-   /*
-* Synchronize with the MMU notifier callbacks in
-* book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
-* While we have the rmap lock, code running on other CPUs
-* cannot finish unmapping the host real page that backs
-* this guest real page, so we are OK to access the host
-* real page.
-*/
-   lock_rmap(rmap);
-   if (kvmppc_rm_ua_to_hpa(vcpu, ua, )) {
-   ret = H_TOO_HARD;
-   goto unlock_exit;
+   mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K);
+   if (!mem || mm_iommu_rm_ua_to_hpa(mem, ua, ))
+   return H_TOO_HARD;
+   } else {
+   /*
+* This is emulated devices case.
+* We do not require memory to be preregistered in this case
+* so lock rmap and do __find_linux_pte_or_hugepte().
+*/
+   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, , ))
+   return H_TOO_HARD;
+
+   rmap = (void *) vmalloc_to_phys(rmap);
+
+   /*
+* Synchronize with the MMU notifier callbacks in
+* book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
+* While we have the rmap lock, code running on other CPUs
+* cannot finish unmapping the host real page that backs
+* this guest real page, so we are OK to access the host
+* real page.
+*/
+   lock_rmap(rmap);
+   if (kvmppc_rm_ua_to_hpa(vcpu, ua, )) {
+   ret = H_TOO_HARD;
+   goto unlock_exit;
+   }
}
 
for (i = 0; i < npages; ++i) {
@@ -291,7 +344,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
}
 
 unlock_exit:
-   unlock_rmap(rmap);
+   if (rmap)
+   unlock_rmap(rmap);
 
return ret;
 }
-- 
2.5.0.rc3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH kernel 2/9] powerpc/mmu: Add real mode support for IOMMU preregistered memory

2016-03-06 Thread Alexey Kardashevskiy

This makes mm_iommu_lookup() able to work in realmode by replacing
list_for_each_entry_rcu() (which can do debug stuff which can fail in
real mode) with list_for_each_entry_lockless().

This adds realmode version of mm_iommu_ua_to_hpa() which adds
explicit vmalloc'd-to-linear address conversion.
Unlike mm_iommu_ua_to_hpa(), mm_iommu_rm_ua_to_hpa() can fail.

This changes mm_iommu_preregistered() to receive @mm as in real mode
@current does not always have a correct pointer.

This adds realmode version of mm_iommu_lookup() which receives @mm
(for the same reason as for mm_iommu_preregistered()) and uses
lockless version of list_for_each_entry_rcu().

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/mmu_context.h |  6 -
 arch/powerpc/mm/mmu_context_iommu.c| 45 ++
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 878c277..3ba652a 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,7 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
-extern bool mm_iommu_preregistered(void);
+extern bool mm_iommu_preregistered(mm_context_t *mm);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
struct mm_iommu_table_group_mem_t **pmem);
 extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
@@ -26,10 +26,14 @@ extern void mm_iommu_init(mm_context_t *ctx);
 extern void mm_iommu_cleanup(mm_context_t *ctx);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
unsigned long size);
+extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(mm_context_t *mm,
+   unsigned long ua, unsigned long size);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,
unsigned long entries);
 extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
unsigned long ua, unsigned long *hpa);
+extern long mm_iommu_rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
+   unsigned long ua, unsigned long *hpa);
 extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
 #endif
diff --git a/arch/powerpc/mm/mmu_context_iommu.c 
b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..aa1565d 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -63,12 +63,9 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
return ret;
 }
 
-bool mm_iommu_preregistered(void)
+bool mm_iommu_preregistered(mm_context_t *mm)
 {
-   if (!current || !current->mm)
-   return false;
-
-   return !list_empty(>mm->context.iommu_group_mem_list);
+   return !list_empty(>iommu_group_mem_list);
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
@@ -231,6 +228,24 @@ unlock_exit:
 }
 EXPORT_SYMBOL_GPL(mm_iommu_put);
 
+struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(mm_context_t *mm,
+   unsigned long ua, unsigned long size)
+{
+   struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
+
+   list_for_each_entry_lockless(mem, >iommu_group_mem_list, next) {
+   if ((mem->ua <= ua) &&
+   (ua + size <= mem->ua +
+(mem->entries << PAGE_SHIFT))) {
+   ret = mem;
+   break;
+   }
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_lookup_rm);
+
 struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
unsigned long size)
 {
@@ -284,6 +299,26 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t 
*mem,
 }
 EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
 
+long mm_iommu_rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
+   unsigned long ua, unsigned long *hpa)
+{
+   const long entry = (ua - mem->ua) >> PAGE_SHIFT;
+   void *va = >hpas[entry];
+   unsigned long *ra;
+
+   if (entry >= mem->entries)
+   return -EFAULT;
+
+   ra = (void *) vmalloc_to_phys(va);
+   if (!ra)
+   return -EFAULT;
+
+   *hpa = *ra | (ua & ~PAGE_MASK);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_rm_ua_to_hpa);
+
 long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
 {
if (atomic64_inc_not_zero(>mapped))
-- 
2.5.0.rc3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH kernel 1/9] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number

2016-03-06 Thread Alexey Kardashevskiy

This adds a capability number for in-kernel support for VFIO on
SPAPR platform.

The capability will tell the user space whether in-kernel handlers of
H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
must not attempt allocating a TCE table in the host kernel via
the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
will not be passed to the user space which is desired action in
the situation like that.

Signed-off-by: Alexey Kardashevskiy 
---
 include/uapi/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c251f06..080ffbf 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -863,6 +863,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_HYPERV_SYNIC 123
 #define KVM_CAP_S390_RI 124
 #define KVM_CAP_SPAPR_TCE_64 125
+#define KVM_CAP_SPAPR_TCE_VFIO 126
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.5.0.rc3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH kernel 9/9] KVM: PPC: VFIO device: support SPAPR TCE

2016-03-06 Thread Alexey Kardashevskiy

sPAPR TCE IOMMU is para-virtualized and the guest does map/unmap
via hypercalls which take a logical bus id (LIOBN) as a target IOMMU
identifier. LIOBNs are made up, advertised to guest systems and
linked to IOMMU groups by the user space.
In order to enable acceleration for IOMMU operations in KVM, we need
to tell KVM the information about the LIOBN-to-group mapping.

For that, a new KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN parameter
is added which accepts:
- a VFIO group fd and IO base address to find the actual hardware
TCE table;
- a LIOBN to assign to the found table.

Before notifying KVM about new link, this check the group for being
registered with KVM device in order to release them at unexpected KVM
finish.

This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
space.

While we are here, this also fixes VFIO KVM device compiling to let it
link to a KVM module.

Signed-off-by: Alexey Kardashevskiy 
---
 Documentation/virtual/kvm/devices/vfio.txt |  21 +-
 arch/powerpc/kvm/Kconfig   |   1 +
 arch/powerpc/kvm/Makefile  |   5 +-
 arch/powerpc/kvm/powerpc.c |   1 +
 include/uapi/linux/kvm.h   |   9 +++
 virt/kvm/vfio.c| 106 +
 6 files changed, 140 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..c0d3eb7 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -16,7 +16,24 @@ Groups:
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
+   kvm_device_attr.addr points to an int32_t file descriptor
+   for the VFIO group.
+
   KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
+   kvm_device_attr.addr points to an int32_t file descriptor
+   for the VFIO group.
 
-For each, kvm_device_attr.addr points to an int32_t file descriptor
-for the VFIO group.
+  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: sets a liobn for a VFIO group
+   kvm_device_attr.addr points to a struct:
+   struct kvm_vfio_spapr_tce_liobn {
+   __u32   argsz;
+   __s32   fd;
+   __u32   liobn;
+   __u8pad[4];
+   __u64   start_addr;
+   };
+   where
+   @argsz is the size of kvm_vfio_spapr_tce_liobn;
+   @fd is a file descriptor for a VFIO group;
+   @liobn is a logical bus id to be associated with the group;
+   @start_addr is a DMA window offset on the IO (PCI) bus
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 1059846..dfa3488 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -65,6 +65,7 @@ config KVM_BOOK3S_64
select KVM
select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE
select SPAPR_TCE_IOMMU if IOMMU_SUPPORT
+   select KVM_VFIO if VFIO
---help---
  Support running unmodified book3s_64 and book3s_32 guest kernels
  in virtual machines on book3s_64 host processors.
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 7f7b6d8..71f577c 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -8,7 +8,7 @@ ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 KVM := ../../../virt/kvm
 
 common-objs-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
-   $(KVM)/eventfd.o $(KVM)/vfio.o
+   $(KVM)/eventfd.o
 
 CFLAGS_e500_mmu.o := -I.
 CFLAGS_e500_mmu_host.o := -I.
@@ -87,6 +87,9 @@ endif
 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
book3s_xics.o
 
+kvm-book3s_64-objs-$(CONFIG_KVM_VFIO) += \
+   $(KVM)/vfio.o \
+
 kvm-book3s_64-module-objs += \
$(KVM)/kvm_main.o \
$(KVM)/eventfd.o \
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 19aa59b..63f188d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -521,6 +521,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
case KVM_CAP_SPAPR_TCE_64:
+   case KVM_CAP_SPAPR_TCE_VFIO:
case KVM_CAP_PPC_ALLOC_HTAB:
case KVM_CAP_PPC_RTAS:
case KVM_CAP_PPC_FIXUP_HCALL:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 080ffbf..f1abbea 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1056,6 +1056,7 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP1
 #define   KVM_DEV_VFIO_GROUP_ADD   1
 #define   KVM_DEV_VFIO_GROUP_DEL   2
+#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN   3
 
 enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20= 1,
@@ -1075,6 +1076,14 @@ enum kvm_device_type {

[PATCH kernel 7/9] KVM: PPC: Create a virtual-mode only TCE table handlers

2016-03-06 Thread Alexey Kardashevskiy

In-kernel VFIO acceleration needs different handling in real and virtual
modes which makes it hard to support both modes in the same handler.

This creates a copy of kvmppc_rm_h_stuff_tce and kvmppc_rm_h_put_tce
in addition to the existing kvmppc_rm_h_put_tce_indirect.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kvm/book3s_64_vio.c| 52 +
 arch/powerpc/kvm/book3s_64_vio_hv.c |  8 ++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  4 +--
 3 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 846d16d..7965fc7 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -317,6 +317,32 @@ fail:
return ret;
 }
 
+long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
+ unsigned long ioba, unsigned long tce)
+{
+   struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
+   long ret;
+
+   /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
+   /*  liobn, ioba, tce); */
+
+   if (!stt)
+   return H_TOO_HARD;
+
+   ret = kvmppc_ioba_validate(stt, ioba, 1);
+   if (ret != H_SUCCESS)
+   return ret;
+
+   ret = kvmppc_tce_validate(stt, tce);
+   if (ret != H_SUCCESS)
+   return ret;
+
+   kvmppc_tce_put(stt, ioba >> stt->page_shift, tce);
+
+   return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
+
 long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
unsigned long liobn, unsigned long ioba,
unsigned long tce_list, unsigned long npages)
@@ -372,3 +398,29 @@ unlock_exit:
return ret;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
+
+long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+   unsigned long liobn, unsigned long ioba,
+   unsigned long tce_value, unsigned long npages)
+{
+   struct kvmppc_spapr_tce_table *stt;
+   long i, ret;
+
+   stt = kvmppc_find_table(vcpu, liobn);
+   if (!stt)
+   return H_TOO_HARD;
+
+   ret = kvmppc_ioba_validate(stt, ioba, npages);
+   if (ret != H_SUCCESS)
+   return ret;
+
+   /* Check permission bits only to allow userspace poison TCE for debug */
+   if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
+   return H_PARAMETER;
+
+   for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift))
+   kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
+
+   return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index af155f6..11163ae 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -212,8 +212,8 @@ static struct mm_iommu_table_group_mem_t 
*kvmppc_rm_iommu_lookup(
return mm_iommu_lookup_rm(mm, ua, size);
 }
 
-long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
- unsigned long ioba, unsigned long tce)
+long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
+   unsigned long ioba, unsigned long tce)
 {
struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
long ret;
@@ -236,7 +236,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long 
liobn,
 
return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
 
 static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
unsigned long ua, unsigned long *phpa)
@@ -350,7 +349,7 @@ unlock_exit:
return ret;
 }
 
-long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
unsigned long liobn, unsigned long ioba,
unsigned long tce_value, unsigned long npages)
 {
@@ -374,7 +373,6 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 
return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
 
 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
  unsigned long ioba)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index ed16182..d6dad2c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1928,7 +1928,7 @@ hcall_real_table:
.long   DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
.long   DOTSYM(kvmppc_h_protect) - hcall_real_table
.long   DOTSYM(kvmppc_h_get_tce) - hcall_real_table
-   .long   DOTSYM(kvmppc_h_put_tce) - hcall_real_table
+   .long   DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
.long   0   /* 0x24 - H_SET_SPRG0 */
.long   DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
.long   0   /* 0x2c */
@@ -2006,7 +2006,7 @@ hcall_real_table:
.long   0   /* 0x12c */
.long   0

[PATCH kernel 6/9] KVM: PPC: Associate IOMMU group with guest view of TCE table

2016-03-06 Thread Alexey Kardashevskiy

The existing in-kernel TCE table for emulated devices contains
guest physical addresses which are accesses by emulated devices.
Since we need to keep this information for VFIO devices too
in order to implement H_GET_TCE, we are reusing it.

This adds IOMMU group list to kvmppc_spapr_tce_table. Each group
will have an iommu_table pointer.

This adds kvm_spapr_tce_attach_iommu_group() helper and its detach
counterpart to manage the lists.

This puts a group when:
- guest copy of TCE table is destroyed when TCE table fd is closed;
- kvm_spapr_tce_detach_iommu_group() is called from
the KVM_DEV_VFIO_GROUP_DEL ioctl handler in the case vfio-pci hotunplug
(will be added in the following patch).

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/kvm_host.h |   8 +++
 arch/powerpc/include/asm/kvm_ppc.h  |   6 ++
 arch/powerpc/kvm/book3s_64_vio.c| 108 
 3 files changed, 122 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2e7c791..2c5c823 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -178,6 +178,13 @@ struct kvmppc_pginfo {
atomic_t refcnt;
 };
 
+struct kvmppc_spapr_tce_group {
+   struct list_head next;
+   struct rcu_head rcu;
+   struct iommu_group *refgrp;/* for reference counting only */
+   struct iommu_table *tbl;
+};
+
 struct kvmppc_spapr_tce_table {
struct list_head list;
struct kvm *kvm;
@@ -186,6 +193,7 @@ struct kvmppc_spapr_tce_table {
u32 page_shift;
u64 offset; /* in pages */
u64 size;   /* window size in pages */
+   struct list_head groups;
struct page *pages[0];
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2544eda..d1482dc 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -164,6 +164,12 @@ extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot, unsigned long porder);
 extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
+extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
+   unsigned long liobn,
+   phys_addr_t start_addr,
+   struct iommu_group *grp);
+extern void kvm_spapr_tce_detach_iommu_group(struct kvm *kvm,
+   struct iommu_group *grp);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce_64 *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 2c2d103..846d16d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -95,10 +96,18 @@ static void release_spapr_tce_table(struct rcu_head *head)
struct kvmppc_spapr_tce_table *stt = container_of(head,
struct kvmppc_spapr_tce_table, rcu);
unsigned long i, npages = kvmppc_tce_pages(stt->size);
+   struct kvmppc_spapr_tce_group *kg;
 
for (i = 0; i < npages; i++)
__free_page(stt->pages[i]);
 
+   while (!list_empty(>groups)) {
+   kg = list_first_entry(>groups,
+   struct kvmppc_spapr_tce_group, next);
+   list_del(>next);
+   kfree(kg);
+   }
+
kfree(stt);
 }
 
@@ -129,9 +138,15 @@ static int kvm_spapr_tce_mmap(struct file *file, struct 
vm_area_struct *vma)
 static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
 {
struct kvmppc_spapr_tce_table *stt = filp->private_data;
+   struct kvmppc_spapr_tce_group *kg;
 
list_del_rcu(>list);
 
+   list_for_each_entry_rcu(kg, >groups, next) {
+   iommu_group_put(kg->refgrp);
+   kg->refgrp = NULL;
+   }
+
kvm_put_kvm(stt->kvm);
 
kvmppc_account_memlimit(
@@ -146,6 +161,98 @@ static const struct file_operations kvm_spapr_tce_fops = {
.release= kvm_spapr_tce_release,
 };
 
+extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
+   unsigned long liobn,
+   phys_addr_t start_addr,
+   struct iommu_group *grp)
+{
+   struct kvmppc_spapr_tce_table *stt = NULL;
+   struct iommu_table_group *table_group;
+   long i;
+   bool found = false;
+   struct kvmppc_spapr_tce_group *kg;
+   struct iommu_table *tbltmp;
+
+   /* Check this LIOBN hasn't been previously allocated */
+   list_for_each_entry_rcu(stt, >arch.spapr_tce_tables, list) {
+   if (stt->liobn == liobn) {
+   if ((stt->offset <<

[PATCH kernel 4/9] powerpc/powernv/iommu: Add real mode version of xchg()

2016-03-06 Thread Alexey Kardashevskiy

In real mode, TCE tables are invalidated using different
cache-inhibited store instructions which is different from
the virtual mode.

This defines and implements exchange_rm() callback. This does not
define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
exchange/exchange_rm are only to be used by KVM for VFIO.

The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.

This replaces list_for_each_entry_rcu with its lockless version as
from now on pnv_pci_ioda2_tce_invalidate() can be called in
the real mode too.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h  |  7 +++
 arch/powerpc/kernel/iommu.c   | 15 +++
 arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7b87bab..3ca877a 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -64,6 +64,11 @@ struct iommu_table_ops {
long index,
unsigned long *hpa,
enum dma_data_direction *direction);
+   /* Real mode */
+   int (*exchange_rm)(struct iommu_table *tbl,
+   long index,
+   unsigned long *hpa,
+   enum dma_data_direction *direction);
 #endif
void (*clear)(struct iommu_table *tbl,
long index, long npages);
@@ -208,6 +213,8 @@ extern void iommu_del_device(struct device *dev);
 extern int __init tce_iommu_bus_notifier_init(void);
 extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
unsigned long *hpa, enum dma_data_direction *direction);
+extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
+   unsigned long *hpa, enum dma_data_direction *direction);
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..2fcc48b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1062,6 +1062,21 @@ void iommu_release_ownership(struct iommu_table *tbl)
 }
 EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
+long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
+   unsigned long *hpa, enum dma_data_direction *direction)
+{
+   long ret;
+
+   ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
+
+   if (!ret && ((*direction == DMA_FROM_DEVICE) ||
+   (*direction == DMA_BIDIRECTIONAL)))
+   SetPageDirty(realmode_pfn_to_page(*hpa >> PAGE_SHIFT));
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);
+
 int iommu_add_device(struct device *dev)
 {
struct iommu_table *tbl;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index c5baaf3..bed1944 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1791,6 +1791,18 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, 
long index,
 
return ret;
 }
+
+static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
+   unsigned long *hpa, enum dma_data_direction *direction)
+{
+   long ret = pnv_tce_xchg(tbl, index, hpa, direction);
+
+   if (!ret && (tbl->it_type &
+   (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+   pnv_pci_ioda1_tce_invalidate(tbl, index, 1, true);
+
+   return ret;
+}
 #endif
 
 static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
@@ -1806,6 +1818,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
.set = pnv_ioda1_tce_build,
 #ifdef CONFIG_IOMMU_API
.exchange = pnv_ioda1_tce_xchg,
+   .exchange_rm = pnv_ioda1_tce_xchg_rm,
 #endif
.clear = pnv_ioda1_tce_free,
.get = pnv_tce_get,
@@ -1866,7 +1879,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
iommu_table *tbl,
 {
struct iommu_table_group_link *tgl;
 
-   list_for_each_entry_rcu(tgl, >it_group_list, next) {
+   list_for_each_entry_lockless(tgl, >it_group_list, next) {
struct pnv_ioda_pe *npe;
struct pnv_ioda_pe *pe = container_of(tgl->table_group,
struct pnv_ioda_pe, table_group);
@@ -1918,6 +1931,18 @@ static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, 
long index,
 
return ret;
 }
+
+static int pnv_ioda2_tce_xchg_rm(struct iommu_table *tbl, long index,
+   unsigned long *hpa, enum dma_data_direction *direction)
+{
+   long ret = pnv_tce_xchg(tbl, index, hpa, direction);
+
+   if (!ret && (tbl->it_type &
+   (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+

[PATCH kernel 0/9] KVM, PPC, VFIO: Enable in-kernel acceleration

2016-03-06 Thread Alexey Kardashevskiy

This enables in-kernel acceleration of H_PUT_TCE/etc hypercalls for pseries
guests using VFIO. As pseries is a para-virtualized environment, the guest
can see and control IOMMUs via special hypercalls which let the guest
to add and remove mappings in real hardware IOMMU.

This was posted last time quite a long time ago so I dropped versions now,
this re-respin is v1. This was successfully used in the PowerKVM product
for quite a while now.

This is based on git://git.kernel.org/pub/scm/virt/kvm/kvm.git , "next"
branch which got "multi-tce in-kernel acceleration" and "64 bit in-kernel
TCE" support.

Please comment. Thanks!


Alexey Kardashevskiy (9):
  KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
  powerpc/mmu: Add real mode support for IOMMU preregistered memory
  KVM: PPC: Use preregistered memory API to access TCE list
  powerpc/powernv/iommu: Add real mode version of xchg()
  KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently
  KVM: PPC: Associate IOMMU group with guest view of TCE table
  KVM: PPC: Create a virtual-mode only TCE table handlers
  KVM: PPC: Add in-kernel handling for VFIO
  KVM: PPC: VFIO device: support SPAPR TCE

 Documentation/virtual/kvm/devices/vfio.txt |  21 +-
 arch/powerpc/include/asm/iommu.h   |   7 +
 arch/powerpc/include/asm/kvm_host.h|   8 +
 arch/powerpc/include/asm/kvm_ppc.h |   6 +
 arch/powerpc/include/asm/mmu_context.h |   6 +-
 arch/powerpc/kernel/iommu.c|  15 ++
 arch/powerpc/kvm/Kconfig   |   2 +
 arch/powerpc/kvm/Makefile  |   5 +-
 arch/powerpc/kvm/book3s_64_vio.c   | 344 +
 arch/powerpc/kvm/book3s_64_vio_hv.c| 280 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S|   4 +-
 arch/powerpc/kvm/powerpc.c |   1 +
 arch/powerpc/mm/mmu_context_iommu.c|  45 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c  |  28 ++-
 include/uapi/linux/kvm.h   |  10 +
 virt/kvm/vfio.c| 106 +
 16 files changed, 855 insertions(+), 33 deletions(-)

-- 
2.5.0.rc3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc

2016-03-06 Thread Michael Ellerman

On Fri, 2016-03-04 at 10:31 +0100, Miroslav Benes wrote:
> On Fri, 4 Mar 2016, Michael Ellerman wrote:
> > On Thu, 2016-03-03 at 17:52 +0100, Petr Mladek wrote:
> > 
> > > 3. Added an error message when including
> > >powerpc/include/asm/livepatch.h without HAVE_LIVEPATCH
> > 
> > I don't know why we want to do that, I don't see how it is helpful. It 
> > doesn't
> > even do what it says:
> > 
> > > +#ifdef CONFIG_LIVEPATCH
> > ...
> > > +#else /* CONFIG_LIVEPATCH */
> > > +#error Include linux/livepatch.h, not asm/livepatch.h
> > > +#endif /* CONFIG_LIVEPATCH */
> > 
> > If I turn on CONFIG_LIVEPATCH then I can quite happily include 
> > asm/livepatch.h
> > and not get an error. So the check doesn't do what the message suggests.
> 
> Well, yes. I looked into the archives to find if there was a reason to 
> even introduce it. It was not. It came up during a review process of the 
> livepatching patch set somehow and we left it there. I only changed the 
> error message to the mentioned one because we deemed it was better.

Thanks for looking into it.

> > And on x86 & s390 it does:
> > 
> >   #else
> >   #error Live patching support is disabled; check CONFIG_LIVEPATCH
> >   #endif
> 
> This is the old message. See 383bf44d1a8b ("livepatch: change the error 
> message in asm/livepatch.h header files").
> 
> Anyway, it really does not mean much. I'll send a patch for s390 and x86 
> to remove it completely in a minute.

Thanks. I know it's not a big deal, but the kernel is complicated enough
without extra code we don't really need :)

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Problems with swapping in v4.5-rc on POWER

2016-03-06 Thread Michael Ellerman

On Fri, 2016-03-04 at 09:58 -0800, Hugh Dickins wrote:
> 
> The alternative bisection was as unsatisfactory as the first:
> again it fingered an irrelevant merge (rather than any commit
> pulled in by that merge) as the bad commit.
> 
> It seems this issue is too intermittent for bisection to be useful,
> on my load anyway.

Darn. Thanks for trying.

> The best I can do now is try v4.4 for a couple of days, to verify that
> still comes out good (rather than the machine going bad coincident with
> v4.5-rc), then try v4.5-rc7 to verify that that still comes out bad.

Thanks, that would still be helpful.

> I'll report back on those; but beyond that, I'll have to leave it to you.

I haven't had any luck here :/

Can you give us a more verbose description of your test setup?

 - G5, which exact model?
 - 4k pages, no THP.
 - how much ram & swap?
 - building linus' tree, make -j ?
 - source and output on tmpfs? (how big?)
 - what device is the swap device? (you said SSD I think?)
 - anything else I've forgotten?

Oh and can you send us your bisect logs, we can at least trust the bad results
I think.

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH v3 3/7] QE: Add uqe_serial document to bindings

2016-03-06 Thread Qiang Zhao

On Tue, Mar 05, 2016 at 12:26PM, Rob Herring wrote:
> -Original Message-
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: Saturday, March 05, 2016 12:26 PM
> To: Qiang Zhao 
> Cc: o...@buserror.net; Yang-Leo Li ; Xiaobo Xie
> ; linux-ker...@vger.kernel.org;
> devicet...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH v3 3/7] QE: Add uqe_serial document to bindings
> 
> On Tue, Mar 01, 2016 at 03:09:39PM +0800, Zhao Qiang wrote:
> > Add uqe_serial document to
> > Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> >
> > Signed-off-by: Zhao Qiang 
> > ---
> > Changes for v2
> > - modify tx/rx-clock-name specification Changes for v2
> > - NA
> >
> >  .../bindings/powerpc/fsl/cpm_qe/uqe_serial.txt| 19
> +++
> >  1 file changed, 19 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> >
> > diff --git
> > a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > new file mode 100644
> > index 000..436c71c
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.
> > +++ txt
> > @@ -0,0 +1,19 @@
> > +* Serial
> > +
> > +Currently defined compatibles:
> > +- ucc_uart
> 
> I guess this is in use already and okay. However, looking at the driver there
> really should be SoC specific compatible strings here since the driver is 
> looking
> up the SoC compatible string and composing the firmware filename from that.

Ok, I will changed both driver and this compatible.

> 
> > +
> > +Properties for ucc_uart:
> > +port-number : port number of UCC-UART tx/rx-clock-name : should be
> > +"brg1"-"brg16" for internal clock source,
> > +  should be "clk1"-"clk28" for external clock source.
> > +
> > +Example:
> > +
> > +   ucc_serial: ucc@2200 {
> > +   device_type = "serial";
> 
> Drop device_type. It should only be used in a few legacy cases.
> 
> Looks like the driver is matching on this. Please drop it from the driver 
> too. I'd
> leave dts files for now, but they should be updated too later.

Ok, Thank you for your Reviewing, I will drop it

> 
> > +   compatible = "ucc_uart";
> > +   port-number = <1>;
> > +   rx-clock-name = "brg2";
> > +   tx-clock-name = "brg2";
> > +   };
> > --
> > 2.1.0.27.g96db324
> >
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 8/8] QE-UART: modify of_device_id for qe-uart driver

2016-03-06 Thread Zhao Qiang

Drop device type and modify compatible to SoC specific compatible.

Signed-off-by: Zhao Qiang 
---
 drivers/tty/serial/ucc_uart.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/tty/serial/ucc_uart.c b/drivers/tty/serial/ucc_uart.c
index 1a7dc3c..ff6c1ab 100644
--- a/drivers/tty/serial/ucc_uart.c
+++ b/drivers/tty/serial/ucc_uart.c
@@ -1475,8 +1475,7 @@ static int ucc_uart_remove(struct platform_device *ofdev)
 
 static const struct of_device_id ucc_uart_match[] = {
{
-   .type = "serial",
-   .compatible = "ucc_uart",
+   .compatible = "t1040-ucc-uart",
},
{},
 };
-- 
2.1.0.27.g96db324

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 7/8] T104xQDS: Add qe node to t104xqds

2016-03-06 Thread Zhao Qiang

add qe node to t104xqds.dtsi

Signed-off-by: Zhao Qiang 
---
Changes for v2
- rebase
Changes for v3
- rebase
Changes for v4
- rebase

 arch/powerpc/boot/dts/fsl/t104xqds.dtsi | 38 +
 1 file changed, 38 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t104xqds.dtsi 
b/arch/powerpc/boot/dts/fsl/t104xqds.dtsi
index 1498d1e..8e72041 100644
--- a/arch/powerpc/boot/dts/fsl/t104xqds.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t104xqds.dtsi
@@ -190,4 +190,42 @@
  0 0x0001>;
};
};
+
+   qe: qe@ffe14 {
+   ranges = <0x0 0xf 0xfe14 0x4>;
+   reg = <0xf 0xfe14 0 0x480>;
+   brg-frequency = <0>;
+   bus-frequency = <0>;
+
+   si1: si@700 {
+   compatible = "fsl,t1040-qe-si";
+   reg = <0x700 0x80>;
+   };
+
+   siram1: siram@1000 {
+   compatible = "fsl,t1040-qe-siram";
+   reg = <0x1000 0x800>;
+   };
+
+   ucc_hdlc: ucc@2000 {
+   compatible = "fsl,ucc-hdlc";
+   rx-clock-name = "clk8";
+   tx-clock-name = "clk9";
+   fsl,rx-sync-clock = "rsync_pin";
+   fsl,tx-sync-clock = "tsync_pin";
+   fsl,tx-timeslot-mask = <0xfffe>;
+   fsl,rx-timeslot-mask = <0xfffe>;
+   fsl,tdm-framer-type = "e1";
+   fsl,tdm-id = <0>;
+   fsl,siram-entry-id = <0>;
+   fsl,tdm-interface;
+   };
+
+   ucc_serial: ucc@2200 {
+   compatible = "t1040-ucc-uart";
+   port-number = <0>;
+   rx-clock-name = "brg2";
+   tx-clock-name = "brg2";
+   };
+   };
 };
-- 
2.1.0.27.g96db324

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 6/8] T104xRDB: Add qe node to t104xrdb

2016-03-06 Thread Zhao Qiang

add qe node to t104xrdb.dtsi

Signed-off-by: Zhao Qiang 
---
Changes for v2
- rebase
Changes for v3
- rebase
Changes for v4
- rebase

 arch/powerpc/boot/dts/fsl/t104xrdb.dtsi | 38 +
 1 file changed, 38 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi 
b/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi
index 830ea48..dd7fc2b 100644
--- a/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi
@@ -186,4 +186,42 @@
  0 0x0001>;
};
};
+
+   qe: qe@ffe14 {
+   ranges = <0x0 0xf 0xfe14 0x4>;
+   reg = <0xf 0xfe14 0 0x480>;
+   brg-frequency = <0>;
+   bus-frequency = <0>;
+
+   si1: si@700 {
+   compatible = "fsl,t1040-qe-si";
+   reg = <0x700 0x80>;
+   };
+
+   siram1: siram@1000 {
+   compatible = "fsl,t1040-qe-siram";
+   reg = <0x1000 0x800>;
+   };
+
+   ucc_hdlc: ucc@2000 {
+   compatible = "fsl,ucc-hdlc";
+   rx-clock-name = "clk8";
+   tx-clock-name = "clk9";
+   fsl,rx-sync-clock = "rsync_pin";
+   fsl,tx-sync-clock = "tsync_pin";
+   fsl,tx-timeslot-mask = <0xfffe>;
+   fsl,rx-timeslot-mask = <0xfffe>;
+   fsl,tdm-framer-type = "e1";
+   fsl,tdm-id = <0>;
+   fsl,siram-entry-id = <0>;
+   fsl,tdm-interface;
+   };
+
+   ucc_serial: ucc@2200 {
+   compatible = "t1040-ucc-uart";
+   port-number = <0>;
+   rx-clock-name = "brg2";
+   tx-clock-name = "brg2";
+   };
+   };
 };
-- 
2.1.0.27.g96db324

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 5/8] T104xD4RDB: Add qe node to t104xd4rdb

2016-03-06 Thread Zhao Qiang

add qe node to t104xd4rdb.dtsi and t1040si-post.dtsi.

Signed-off-by: Zhao Qiang 
---
Changes for v2
- rebase
Changes for v3
- rebase
Changes for v4
- rebase

 arch/powerpc/boot/dts/fsl/t1040si-post.dtsi | 45 +
 arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi   | 38 
 2 files changed, 83 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
index e0f4da5..012f813 100644
--- a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
@@ -673,3 +673,48 @@
};
};
 };
+
+ {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   device_type = "qe";
+   compatible = "fsl,qe";
+   fsl,qe-num-riscs = <1>;
+   fsl,qe-num-snums = <28>;
+
+   qeic: interrupt-controller@80 {
+   interrupt-controller;
+   compatible = "fsl,qe-ic";
+   #address-cells = <0>;
+   #interrupt-cells = <1>;
+   reg = <0x80 0x80>;
+   interrupts = <95 2 0 0  94 2 0 0>; //high:79 low:78
+   };
+
+   ucc@2000 {
+   cell-index = <1>;
+   reg = <0x2000 0x200>;
+   interrupts = <32>;
+   interrupt-parent = <>;
+   };
+
+   ucc@2200 {
+   cell-index = <3>;
+   reg = <0x2200 0x200>;
+   interrupts = <34>;
+   interrupt-parent = <>;
+   };
+
+   muram@1 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "fsl,qe-muram", "fsl,cpm-muram";
+   ranges = <0x0 0x1 0x6000>;
+
+   data-only@0 {
+   compatible = "fsl,qe-muram-data",
+   "fsl,cpm-muram-data";
+   reg = <0x0 0x6000>;
+   };
+   };
+};
diff --git a/arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi 
b/arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi
index 3f6d7c6..41ed3a6 100644
--- a/arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi
@@ -212,4 +212,42 @@
  0 0x0001>;
};
};
+
+   qe: qe@ffe14 {
+   ranges = <0x0 0xf 0xfe14 0x4>;
+   reg = <0xf 0xfe14 0 0x480>;
+   brg-frequency = <0>;
+   bus-frequency = <0>;
+
+   si1: si@700 {
+   compatible = "fsl,t1040-qe-si";
+   reg = <0x700 0x80>;
+   };
+
+   siram1: siram@1000 {
+   compatible = "fsl,t1040-qe-siram";
+   reg = <0x1000 0x800>;
+   };
+
+   ucc_hdlc: ucc@2000 {
+   compatible = "fsl,ucc-hdlc";
+   rx-clock-name = "clk8";
+   tx-clock-name = "clk9";
+   fsl,rx-sync-clock = "rsync_pin";
+   fsl,tx-sync-clock = "tsync_pin";
+   fsl,tx-timeslot-mask = <0xfffe>;
+   fsl,rx-timeslot-mask = <0xfffe>;
+   fsl,tdm-framer-type = "e1";
+   fsl,tdm-id = <0>;
+   fsl,siram-entry-id = <0>;
+   fsl,tdm-interface;
+   };
+
+   ucc_serial: ucc@2200 {
+   compatible = "t1040-ucc-uart";
+   port-number = <0>;
+   rx-clock-name = "brg2";
+   tx-clock-name = "brg2";
+   };
+   };
 };
-- 
2.1.0.27.g96db324

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 4/8] bindings: move cpm_qe binding from powerpc/fsl to soc/fsl

2016-03-06 Thread Zhao Qiang

cpm_qe is supported on both powerpc and arm.
and the QE code has been moved from arch/powerpc into
drivers/soc/fsl, so move cpm_qe binding from powerpc/fsl
to soc/fsl

Signed-off-by: Zhao Qiang 
Acked-by: Rob Herring
---
Changes for v3
- NA
Changes for v4
- NA

 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/brg.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/i2c.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/pic.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/usb.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/gpio.txt| 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/network.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe.txt  | 0
 .../devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/firmware.txt   | 0
 .../devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/par_io.txt | 0
 .../devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/pincfg.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/ucc.txt  | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/usb.txt  | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/serial.txt  | 0
 .../devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/uqe_serial.txt| 0
 15 files changed, 0 insertions(+), 0 deletions(-)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm.txt 
(100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/cpm/brg.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/cpm/i2c.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/cpm/pic.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/cpm/usb.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/gpio.txt 
(100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/network.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe.txt 
(100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/qe/firmware.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/qe/par_io.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/qe/pincfg.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/qe/ucc.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/qe/usb.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/serial.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => 
soc}/fsl/cpm_qe/uqe_serial.txt (100%)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm.txt 
b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/brg.txt 
b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/brg.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/brg.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/brg.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/i2c.txt 
b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/i2c.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/i2c.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/i2c.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/pic.txt 
b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/pic.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/pic.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/pic.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/usb.txt 
b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/usb.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/usb.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/usb.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/gpio.txt 
b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/gpio.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/gpio.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/gpio.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt 
b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt
rename to

[PATCH v4 3/8] QE: Add uqe_serial document to bindings

2016-03-06 Thread Zhao Qiang

Add uqe_serial document to
Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt

Signed-off-by: Zhao Qiang 
---
Changes for v2
- modify tx/rx-clock-name specification
Changes for v3
- NA 
Changes for v4
- drop device_type
- modify to SoC specific compatible 

 .../bindings/powerpc/fsl/cpm_qe/uqe_serial.txt | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt

diff --git 
a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt 
b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
new file mode 100644
index 000..c2de8ba
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
@@ -0,0 +1,18 @@
+* Serial
+
+Currently defined compatibles:
+- t1040-ucc-uart
+
+Properties for t1040-ucc-uart:
+port-number : port number of UCC-UART
+tx/rx-clock-name : should be "brg1"-"brg16" for internal clock source,
+  should be "clk1"-"clk28" for external clock source.
+
+Example:
+
+   ucc_serial: ucc@2200 {
+   compatible = "t1040-ucc-uart";
+   port-number = <0>;
+   rx-clock-name = "brg2";
+   tx-clock-name = "brg2";
+   };
-- 
2.1.0.27.g96db324

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 2/8] QE: Add ucc hdlc document to bindings

2016-03-06 Thread Zhao Qiang

Add ucc hdlc document to
Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt

Signed-off-by: Zhao Qiang 
Acked-by: Rob Herring 
---
Changes for v2
- use ucc-hdlc instead of ucc_hdlc
- add more information to properties.
Changes for v3
- use fsl,tx-timeslot-mask instead of fsl,tx-timeslot 
- use fsl,rx-timeslot-mask instead of fsl,rx-timeslot 
- add more info
Changes for v4
- NA 

 .../bindings/powerpc/fsl/cpm_qe/network.txt| 81 ++
 1 file changed, 81 insertions(+)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt 
b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt
index 29b28b8..03c7416 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt
@@ -41,3 +41,84 @@ Example:
fsl,mdio-pin = <12>;
fsl,mdc-pin = <13>;
};
+
+* HDLC
+
+Currently defined compatibles:
+- fsl,ucc-hdlc
+
+Properties for fsl,ucc-hdlc:
+- rx-clock-name
+- tx-clock-name
+   Usage: required
+   Value type: 
+   Definition : Must be "brg1"-"brg16" for internal clock source,
+Must be "clk1"-"clk24" for external clock source.
+
+- fsl,tdm-interface
+   Usage: optional
+   Value type: 
+   Definition : Specify that hdlc is based on tdm-interface
+
+The property below is dependent on fsl,tdm-interface:
+- fsl,rx-sync-clock
+   Usage: required
+   Value type: 
+   Definition : Must be "none", "rsync_pin", "brg9-11" and "brg13-15".
+
+- fsl,tx-sync-clock
+   Usage: required
+   Value type: 
+   Definition : Must be "none", "tsync_pin", "brg9-11" and "brg13-15".
+
+- fsl,tdm-framer-type
+   Usage: required for tdm interface
+   Value type: 
+   Definition : "e1" or "t1".Now e1 and t1 are used, other framer types
+are not supported.
+
+- fsl,tdm-id
+   Usage: required for tdm interface
+   Value type: 
+   Definition : number of TDM ID
+
+- fsl,tx-timeslot-mask
+- fsl,rx-timeslot-mask
+   Usage: required for tdm interface
+   Value type: 
+   Definition : time slot mask for TDM operation. Indicates which time
+slots used for transmitting and receiving.
+
+- fsl,siram-entry-id
+   Usage: required for tdm interface
+   Value type: 
+   Definition : Must be 0,2,4...64. the number of TDM entry.
+
+- fsl,tdm-internal-loopback
+   usage: optional for tdm interface
+   value type: 
+   Definition : Internal loopback connecting on TDM layer.
+
+Example for tdm interface:
+
+   ucc@2000 {
+   compatible = "fsl,ucc-hdlc";
+   rx-clock-name = "clk8";
+   tx-clock-name = "clk9";
+   fsl,rx-sync-clock = "rsync_pin";
+   fsl,tx-sync-clock = "tsync_pin";
+   fsl,tx-timeslot-mask = <0xfffe>;
+   fsl,rx-timeslot-mask = <0xfffe>;
+   fsl,tdm-framer-type = "e1";
+   fsl,tdm-id = <0>;
+   fsl,siram-entry-id = <0>;
+   fsl,tdm-interface;
+   };
+
+Example for hdlc without tdm interface:
+
+   ucc@2000 {
+   compatible = "fsl,ucc-hdlc";
+   rx-clock-name = "brg1";
+   tx-clock-name = "brg1";
+   };
-- 
2.1.0.27.g96db324

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 1/8] QE: Add IC, SI and SIRAM document to device tree bindings.

2016-03-06 Thread Zhao Qiang

Add IC, SI and SIRAM document of QE to
Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt

Signed-off-by: Zhao Qiang 
Acked-by: Rob Herring 
---
changes for v2
- Add interrupt-controller in Required properties
- delete address-cells and size-cells for qe-si and qe-siram
Changes for v3
- Add SoC specific caompatible strings to qe-si and qe-siram
Changes for v4
- NA 

 .../devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt  | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt 
b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt
index 4f89302..7ab21cb 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt
@@ -69,6 +69,56 @@ Example:
};
  };
 
+* Interrupt Controller (IC)
+
+Required properties:
+- compatible : should be "fsl,qe-ic".
+- reg : Address range of IC register set.
+- interrupts : interrupts generated by the device.
+- interrupt-controller : this device is a interrupt controller.
+
+Example:
+
+   qeic: interrupt-controller@80 {
+   interrupt-controller;
+   compatible = "fsl,qe-ic";
+   #address-cells = <0>;
+   #interrupt-cells = <1>;
+   reg = <0x80 0x80>;
+   interrupts = <95 2 0 0  94 2 0 0>; //high:79 low:78
+   };
+
+* Serial Interface Block (SI)
+
+The SI manages the routing of eight TDM lines to the QE block serial drivers
+, the MCC and the UCCs, for receive and transmit.
+
+Required properties:
+- compatible : should be "fsl,t1040-qe-si".
+- reg : Address range of SI register set.
+
+Example:
+
+   si1: si@700 {
+   compatible = "fsl,t1040-qe-si";
+   reg = <0x700 0x80>;
+   };
+
+* Serial Interface Block RAM(SIRAM)
+
+store the routing entries of SI
+
+Required properties:
+- compatible : should be "fsl,t1040-qe-siram".
+- reg : Address range of SI RAM.
+
+Example:
+
+   siram1: siram@1000 {
+   compatible = "fsl,t1040-qe-siram";
+   reg = <0x1000 0x800>;
+   };
+
 * QE Firmware Node
 
 This node defines a firmware binary that is embedded in the device tree, for
-- 
2.1.0.27.g96db324

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] freescale:Make the function ucc_geth_tx have a return type of void

2016-03-06 Thread Nicholas Krause

This makes the function ucc_geth_tx have a return type of void now
due to this particular function always completing without ever
executing a non recoverable error.

Signed-off-by: Nicholas Krause 
---
 drivers/net/ethernet/freescale/ucc_geth.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c 
b/drivers/net/ethernet/freescale/ucc_geth.c
index 78ebd73..70f3045 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -3226,7 +3226,7 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 
rxQ, int rx_work_limit
return howmany;
 }
 
-static int ucc_geth_tx(struct net_device *dev, u8 txQ)
+static void ucc_geth_tx(struct net_device *dev, u8 txQ)
 {
/* Start from the next BD that should be filled */
struct ucc_geth_private *ugeth = netdev_priv(dev);
@@ -3269,7 +3269,6 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
bd_status = in_be32((u32 __iomem *)bd);
}
ugeth->confBd[txQ] = bd;
-   return 0;
 }
 
 static int ucc_geth_poll(struct napi_struct *napi, int budget)
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc

2016-03-06 Thread Balbir Singh



On 04/03/16 23:42, Torsten Duwe wrote:
> On Thu, Mar 03, 2016 at 05:52:01PM +0100, Petr Mladek wrote:
> [...]
>> index ec7f8aada697..2d5333c228f1 100644
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -1265,6 +1271,31 @@ ftrace_call:
>>  ld  r0, LRSAVE(r1)
>>  mtlrr0
>>  
>> +#ifdef CONFIG_LIVEPATCH
>> +beq+4f  /* likely(old_NIP == new_NIP) */
>> +/*
>> + * For a local call, restore this TOC after calling the patch function.
>> + * For a global call, it does not matter what we restore here,
>> + * since the global caller does its own restore right afterwards,
>> + * anyway. Just insert a klp_return_helper frame in any case,
>> + * so a patch function can always count on the changed stack offsets.
>> + * The patch introduces a frame such that from the patched function
>> + * we return back to klp_return helper. For ABI compliance r12,
>> + * lr and LRSAVE(r1) contain the address of klp_return_helper.
>> + * We loaded ctr with the address of the patched function earlier
>> + */
>> +stdur1, -32(r1) /* open new mini stack frame */
>> +std r2, 24(r1)  /* save TOC now, unconditionally. */
>> +bl  5f
>> +5:  mflrr12
>> +addir12, r12, (klp_return_helper + 4 - .)@l
>> +std r12, LRSAVE(r1)
>> +mtlrr12
>> +mfctr   r12 /* allow for TOC calculation in newfunc */
>> +bctr
>> +4:
>> +#endif
>> +
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>  stdur1, -112(r1)
>>  .globl ftrace_graph_call
>> @@ -1281,6 +1312,25 @@ _GLOBAL(ftrace_graph_stub)
>>  
>>  _GLOBAL(ftrace_stub)
>>  blr
>> +#ifdef CONFIG_LIVEPATCH
>> +/* Helper function for local calls that are becoming global
>> + * due to live patching.
>> + * We can't simply patch the NOP after the original call,
>> + * because, depending on the consistency model, some kernel
>> + * threads may still have called the original, local function
>> + * *without* saving their TOC in the respective stack frame slot,
>> + * so the decision is made per-thread during function return by
>> + * maybe inserting a klp_return_helper frame or not.
>> +*/
>> +klp_return_helper:
>> +ld  r2, 24(r1)  /* restore TOC (saved by ftrace_caller) */
>> +addi r1, r1, 32 /* destroy mini stack frame */
>> +ld  r0, LRSAVE(r1)  /* get the real return address */
>> +mtlrr0
>> +blr
>> +#endif
>> +
>> +
>>  #else
>>  _GLOBAL_TOC(_mcount)
>>  /* Taken from output of objdump from lib64/glibc */
> We need a caveat here, at least in the comments, even better
> in some documentation, that the klp_return_helper shifts the stack layout.
>
> This is relevant for functions with more than 8 fixed integer arguments
> or for any varargs creator. As soon as the patch function is to replace
> an original with arguments on the stack, the extra stack frame needs to
> be accounted for.
>
> Where shall we put this warning?
Good catch! We should just document it in livepatch.c (I suppose). I wonder if 
we can reuse the previous stack frame -- the caller into ftrace_caller. I think 
our arch.trampoline does bunch of the work anyway, klp_return_helper would just 
need to restore the right set of values

I hope I am thinking clearly on a Monday morning
Balbir Singh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/process: fix altivec SPR not being saved

2016-03-06 Thread Oliver O'Halloran

In save_sprs() in process.c contains the following test:

if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
t->vrsave = mfspr(SPRN_VRSAVE);

CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE so the test
is equivilent to:

if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
cpu_has_feature(CPU_FTR_COHERENT_ICACHE))

On CPUs without support for both (i.e G5) this results in vrsave not being
saved between context switches. The vector register save/restore code
doesn't use VRSAVE to determine which registers to save/restore,
but the value of VRSAVE is used to determine if altivec is being used
in several code paths.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8224852..5a4d4d1 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -855,7 +855,7 @@ void restore_tm_state(struct pt_regs *regs)
 static inline void save_sprs(struct thread_struct *t)
 {
 #ifdef CONFIG_ALTIVEC
-   if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
t->vrsave = mfspr(SPRN_VRSAVE);
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

44 matches

Mail list logo