[RFC PATCH v4 3/7] PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set
The resource_alignment parameter releases memory resources allocated by
firmware so that the kernel can reassign new resources later on. This
causes a problem when PCI_PROBE_ONLY is set, e.g. on the pSeries platform:
the kernel cannot allocate any resources, because PCI_PROBE_ONLY forces it
to use the firmware setup and not to reassign any resources.

To solve this problem, this patch ignores resource_alignment if
PCI_PROBE_ONLY is set.

Signed-off-by: Yongji Xie
---
 Documentation/kernel-parameters.txt |    2 ++
 drivers/pci/probe.c                 |    3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index d8b29ab..8028631 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2922,6 +2922,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				windows need to be expanded.
 				noresize: Don't change the resources' sizes when
 					reassigning alignment.
+				Note that this option will not work if
+				PCI_PROBE_ONLY is set.
 		ecrc=		Enable/disable PCIe ECRC (transaction layer
 				end-to-end CRC checking).
 				bios: Use BIOS/firmware settings. This is the
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d7ab9b..bc31cad 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1719,7 +1719,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	pci_fixup_device(pci_fixup_header, dev);

 	/* moved out from quirk header fixup code */
-	pci_reassigndev_resource_alignment(dev);
+	if (!pci_has_flag(PCI_PROBE_ONLY))
+		pci_reassigndev_resource_alignment(dev);

 	/* Clear the state_saved flag. */
 	dev->state_saved = false;
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH v4 7/7] powerpc/powernv/pci-ioda: Add IOMMU_CAP_INTR_REMAP for IODA host bridge
This patch adds IOMMU_CAP_INTR_REMAP for the IODA host bridge so that the
MSI-X table can be mmapped in the vfio driver.

Signed-off-by: Yongji Xie
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f90dc04..f01b9ab 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1955,6 +1955,20 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.free = pnv_ioda2_table_free,
 };

+static bool pnv_ioda_iommu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_INTR_REMAP:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static struct iommu_ops pnv_ioda_iommu_ops = {
+	.capable = pnv_ioda_iommu_capable,
+};
+
 static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 				      struct pnv_ioda_pe *pe, unsigned int base,
 				      unsigned int segs)
@@ -3078,6 +3092,9 @@ static void pnv_pci_ioda_fixup(void)

 	/* Link NPU IODA tables to their PCI devices. */
 	pnv_npu_ioda_fixup();
+
+	/* Add IOMMU_CAP_INTR_REMAP */
+	bus_set_iommu(&pci_bus_type, &pnv_ioda_iommu_ops);
 }

 /*
-- 
1.7.9.5
[RFC PATCH v4 6/7] vfio-pci: Allow to mmap MSI-X table if IOMMU_CAP_INTR_REMAP was set
The current vfio-pci implementation disallows mmapping the MSI-X table so
that users cannot touch it directly. But we should allow mmapping the MSI-X
table if the IOMMU supports interrupt remapping, which ensures that a given
PCI device can only trigger the MSIs assigned to it.

Signed-off-by: Yongji Xie
---
 drivers/vfio/pci/vfio_pci.c      |    8 +++++---
 drivers/vfio/pci/vfio_pci_rdwr.c |    4 +++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 49d7a69..d6f4788 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -592,13 +592,14 @@ static long vfio_pci_ioctl(void *device_data,
 			    IORESOURCE_MEM && !pci_resources_share_page(pdev,
 			    info.index)) {
 				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
-				if (info.index == vdev->msix_bar) {
+				if (!iommu_capable(pdev->dev.bus,
+						IOMMU_CAP_INTR_REMAP) &&
+				    info.index == vdev->msix_bar) {
 					ret = msix_sparse_mmap_cap(vdev, );
 					if (ret)
 						return ret;
 				}
 			}
-
 			break;
 		case VFIO_PCI_ROM_REGION_INDEX:
 		{
@@ -1029,7 +1030,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
 		return -EINVAL;

-	if (index == vdev->msix_bar) {
+	if (!iommu_capable(pdev->dev.bus, IOMMU_CAP_INTR_REMAP) &&
+	    index == vdev->msix_bar) {
 		/*
 		 * Disallow mmaps overlapping the MSI-X table; users don't
 		 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 5ffd1d9..1c46c29 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include

 #include "vfio_pci_private.h"

@@ -164,7 +165,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 	} else
 		io = vdev->barmap[bar];

-	if (bar == vdev->msix_bar) {
+	if (!iommu_capable(pdev->dev.bus, IOMMU_CAP_INTR_REMAP) &&
+	    bar == vdev->msix_bar) {
 		x_start = vdev->msix_offset;
 		x_end = vdev->msix_offset + vdev->msix_size;
 	}
-- 
1.7.9.5
[RFC PATCH v4 4/7] PCI: Modify resource_alignment to support multiple devices
When vfio passes through a PCI device whose MMIO BARs are smaller than
PAGE_SIZE, the guest cannot access those BARs directly, so the MMIO accesses
are emulated in the host. This is because vfio does not allow passing
through a BAR's MMIO page if that page may be shared with other BARs.

To solve this performance issue, this patch modifies resource_alignment to
support a syntax where multiple devices get the same alignment. So we can
use something like "pci=resource_alignment=*:*:*.*:noresize" to force the
alignment of all MMIO BARs to be at least PAGE_SIZE, so that one BAR's MMIO
page is not shared with other BARs.

Signed-off-by: Yongji Xie
---
 Documentation/kernel-parameters.txt |    2 +
 drivers/pci/pci.c                   |   90 ++-
 include/linux/pci.h                 |    4 ++
 3 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 8028631..74b38ab 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2918,6 +2918,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				aligned memory resources.
 				If <order of align> is not specified,
 				PAGE_SIZE is used as alignment.
+				<domain>, <bus>, <slot> and <func> can be set to
+				"*" which means match all values.
 				PCI-PCI bridge can be specified, if resource
 				windows need to be expanded.
 				noresize: Don't change the resources' sizes when
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 760cce5..44ab59f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -102,6 +102,8 @@ unsigned int pcibios_max_latency = 255;
 /* If set, the PCIe ARI capability will not be used. */
 static bool pcie_ari_disabled;

+bool pci_resources_page_aligned;
+
 /**
  * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
  * @bus: pointer to PCI bus structure to search
@@ -4604,6 +4606,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 	int seg, bus, slot, func, align_order, count;
 	resource_size_t align = 0;
 	char *p;
+	bool invalid = false;

 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
@@ -4615,16 +4618,49 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 		} else {
 			align_order = -1;
 		}
-		if (sscanf(p, "%x:%x:%x.%x%n",
-			&seg, &bus, &slot, &func, &count) != 4) {
+		if (p[0] == '*' && p[1] == ':') {
+			seg = -1;
+			count = 1;
+		} else if (sscanf(p, "%x%n", &seg, &count) != 1 ||
+				p[count] != ':') {
+			invalid = true;
+			break;
+		}
+		p += count + 1;
+		if (*p == '*') {
+			bus = -1;
+			count = 1;
+		} else if (sscanf(p, "%x%n", &bus, &count) != 1) {
+			invalid = true;
+			break;
+		}
+		p += count;
+		if (*p == '.') {
+			slot = bus;
+			bus = seg;
 			seg = 0;
-			if (sscanf(p, "%x:%x.%x%n",
-				&bus, &slot, &func, &count) != 3) {
-				/* Invalid format */
-				printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: %s\n",
-					p);
+			p++;
+		} else if (*p == ':') {
+			p++;
+			if (p[0] == '*' && p[1] == '.') {
+				slot = -1;
+				count = 1;
+			} else if (sscanf(p, "%x%n", &slot, &count) != 1 ||
+					p[count] != '.') {
+				invalid = true;
 				break;
 			}
+			p += count + 1;
+		} else {
+			invalid = true;
+			break;
+		}
+		if (*p == '*') {
+			func = -1;
+			count = 1;
+		} else if (sscanf(p, "%x%n", &func, &count) != 1) {
+			invalid = true;
+			break;
 		}
 		p += count;
 		if (!strncmp(p, ":noresize", 9)) {
@@ -4632,23 +4668,34 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 			p += 9;
 		} else
 			*resize = true;
-
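
The wildcard matching that this patch adds to the parser can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct pci_addr`, `parse_field()`, `parse_pattern()` and `addr_matches()` are hypothetical names, `strtol()` stands in for the kernel's `sscanf()` calls, and the optional alignment prefix and ":noresize" suffix are left out. As in the patch, -1 marks a field given as "*", and a pattern without a segment defaults to segment 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define ANY (-1)	/* wildcard marker, like the -1 used in the patch */

/* Hypothetical container for a parsed segment:bus:slot.func address. */
struct pci_addr {
	int seg, bus, slot, func;
};

/* Parse one hex field or "*" and advance *p past it. */
static bool parse_field(const char **p, int *out)
{
	char *end;
	long v;

	if (**p == '*') {
		*out = ANY;
		(*p)++;
		return true;
	}
	v = strtol(*p, &end, 16);
	if (end == *p || v < 0)
		return false;
	*out = (int)v;
	*p = end;
	return true;
}

/* Parse "[seg:]bus:slot.func"; every field is hex or "*". */
static bool parse_pattern(const char *s, struct pci_addr *pat)
{
	int a, b;

	assert(s != NULL && pat != NULL);

	if (!parse_field(&s, &a) || *s++ != ':')
		return false;
	if (!parse_field(&s, &b))
		return false;
	if (*s == ':') {		/* long form: seg:bus:slot.func */
		s++;
		pat->seg = a;
		pat->bus = b;
		if (!parse_field(&s, &pat->slot))
			return false;
	} else {			/* short form: bus:slot.func */
		pat->seg = 0;
		pat->bus = a;
		pat->slot = b;
	}
	if (*s++ != '.')
		return false;
	return parse_field(&s, &pat->func) && *s == '\0';
}

static bool field_matches(int pat, int val)
{
	return pat == ANY || pat == val;
}

/* A device matches when every field is equal or wildcarded. */
static bool addr_matches(const struct pci_addr *pat,
			 const struct pci_addr *dev)
{
	return field_matches(pat->seg, dev->seg) &&
	       field_matches(pat->bus, dev->bus) &&
	       field_matches(pat->slot, dev->slot) &&
	       field_matches(pat->func, dev->func);
}
```

With this sketch, the pattern "*:*:*.*" matches every device, while "1a:3.*" matches all functions of slot 3 on bus 0x1a in segment 0.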
[RFC PATCH v4 5/7] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
The current vfio-pci implementation disallows mmapping sub-page
(size < PAGE_SIZE) MMIO BARs because the MMIO page of such a BAR may be
shared with other BARs. But we should allow mmapping these sub-page MMIO
BARs if the PCI resource allocator can make sure that their MMIO pages are
not shared with other BARs.

Signed-off-by: Yongji Xie
---
 drivers/vfio/pci/vfio_pci.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 1ce1d36..49d7a69 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -589,7 +589,8 @@ static long vfio_pci_ioctl(void *device_data,
 				     VFIO_REGION_INFO_FLAG_WRITE;
 			if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) &&
 			    pci_resource_flags(pdev, info.index) &
-			    IORESOURCE_MEM && info.size >= PAGE_SIZE) {
+			    IORESOURCE_MEM && !pci_resources_share_page(pdev,
+			    info.index)) {
 				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
 				if (info.index == vdev->msix_bar) {
 					ret = msix_sparse_mmap_cap(vdev, );
@@ -1016,6 +1017,10 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 		return -EINVAL;

 	phys_len = pci_resource_len(pdev, index);
+
+	if (!pci_resources_share_page(pdev, index))
+		phys_len = PAGE_ALIGN(phys_len);
+
 	req_len = vma->vm_end - vma->vm_start;
 	pgoff = vma->vm_pgoff &
 		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
-- 
1.7.9.5
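
The body of pci_resources_share_page() is not part of this message, so the following is only a guess at the kind of check it performs, written as a userspace sketch. `struct bar` and `bar_shares_page()` are hypothetical names: the function reports whether the pages spanned by one BAR also contain part of any other BAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BAR description: inclusive physical address range. */
struct bar {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if the pages spanned by bars[idx] also contain part of
 * another BAR. page_size must be a power of two.
 */
static bool bar_shares_page(const struct bar *bars, size_t n, size_t idx,
			    uint64_t page_size)
{
	uint64_t mask = page_size - 1;
	uint64_t first, last;
	size_t i;

	assert(idx < n && page_size != 0 && (page_size & mask) == 0);

	first = bars[idx].start & ~mask;	/* first byte of first page */
	last = bars[idx].end | mask;		/* last byte of last page */

	for (i = 0; i < n; i++) {
		if (i == idx)
			continue;
		/* Standard overlap test on two inclusive ranges. */
		if (bars[i].start <= last && bars[i].end >= first)
			return true;
	}
	return false;
}
```

Two adjacent 4KB BARs share a page when pages are 64KB but not when pages are 4KB, which is why the problem shows up mainly with large page sizes.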
[RFC PATCH v4 2/7] PCI: Use IORESOURCE_WINDOW to identify bridge resources
Currently __assign_resources_sorted() uses IORESOURCE_STARTALIGN to identify
bridge resources. This is a problem because some PCI devices' resources may
also use IORESOURCE_STARTALIGN, e.g. when the "noresize" option of the
resource_alignment kernel parameter is used. So this patch identifies bridge
resources by IORESOURCE_WINDOW instead of IORESOURCE_STARTALIGN.

Signed-off-by: Yongji Xie
---
 drivers/pci/setup-bus.c |   21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7796d0a..4ff10ca 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -411,11 +411,11 @@ static void __assign_resources_sorted(struct list_head *head,

 		/*
 		 * There are two kinds of additional resources in the list:
-		 * 1. bridge resource  -- IORESOURCE_STARTALIGN
-		 * 2. SR-IOV resource  -- IORESOURCE_SIZEALIGN
+		 * 1. bridge resource  -- IORESOURCE_WINDOW
+		 * 2. SR-IOV resource
 		 * Here just fix the additional alignment for bridge
 		 */
-		if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+		if (!(dev_res->res->flags & IORESOURCE_WINDOW))
 			continue;

 		add_align = get_res_add_align(realloc_head, dev_res->res);
@@ -956,7 +956,7 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,

 	b_res->start = min_align;
 	b_res->end = b_res->start + size0 - 1;
-	b_res->flags |= IORESOURCE_STARTALIGN;
+	b_res->flags |= IORESOURCE_STARTALIGN | IORESOURCE_WINDOW;
 	if (size1 > size0 && realloc_head) {
 		add_to_list(realloc_head, bus->self, b_res, size1-size0,
 			    min_align);
@@ -1104,7 +1104,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	}
 	b_res->start = min_align;
 	b_res->end = size0 + min_align - 1;
-	b_res->flags |= IORESOURCE_STARTALIGN;
+	b_res->flags |= IORESOURCE_STARTALIGN | IORESOURCE_WINDOW;
 	if (size1 > size0 && realloc_head) {
 		add_to_list(realloc_head, bus->self, b_res, size1-size0, add_align);
 		dev_printk(KERN_DEBUG, &bus->self->dev, "bridge window %pR to %pR add_size %llx add_align %llx\n",
@@ -1140,7 +1140,8 @@ static void pci_bus_size_cardbus(struct pci_bus *bus,
 	 */
 	b_res[0].start = pci_cardbus_io_size;
 	b_res[0].end = b_res[0].start + pci_cardbus_io_size - 1;
-	b_res[0].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN;
+	b_res[0].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN |
+			  IORESOURCE_WINDOW;
 	if (realloc_head) {
 		b_res[0].end -= pci_cardbus_io_size;
 		add_to_list(realloc_head, bridge, b_res, pci_cardbus_io_size,
@@ -1152,7 +1153,8 @@ handle_b_res_1:
 		goto handle_b_res_2;
 	b_res[1].start = pci_cardbus_io_size;
 	b_res[1].end = b_res[1].start + pci_cardbus_io_size - 1;
-	b_res[1].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN;
+	b_res[1].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN |
+			  IORESOURCE_WINDOW;
 	if (realloc_head) {
 		b_res[1].end -= pci_cardbus_io_size;
 		add_to_list(realloc_head, bridge, b_res+1, pci_cardbus_io_size,
@@ -1190,7 +1192,7 @@ handle_b_res_2:
 	b_res[2].start = pci_cardbus_mem_size;
 	b_res[2].end = b_res[2].start + pci_cardbus_mem_size - 1;
 	b_res[2].flags |= IORESOURCE_MEM | IORESOURCE_PREFETCH |
-			  IORESOURCE_STARTALIGN;
+			  IORESOURCE_STARTALIGN | IORESOURCE_WINDOW;
 	if (realloc_head) {
 		b_res[2].end -= pci_cardbus_mem_size;
 		add_to_list(realloc_head, bridge, b_res+2,
@@ -1206,7 +1208,8 @@ handle_b_res_3:
 		goto handle_done;
 	b_res[3].start = pci_cardbus_mem_size;
 	b_res[3].end = b_res[3].start + b_res_3_size - 1;
-	b_res[3].flags |= IORESOURCE_MEM | IORESOURCE_STARTALIGN;
+	b_res[3].flags |= IORESOURCE_MEM | IORESOURCE_STARTALIGN |
+			  IORESOURCE_WINDOW;
 	if (realloc_head) {
 		b_res[3].end -= b_res_3_size;
 		add_to_list(realloc_head, bridge, b_res+3, b_res_3_size,
-- 
1.7.9.5
[RFC PATCH v4 1/7] PCI: Add a new option for resource_alignment to reassign alignment
When the resource_alignment kernel parameter is used, the current
implementation reassigns the alignment by changing the resources' sizes,
which can potentially break some drivers. So this patch adds a new option
"noresize" to the parameter to solve this problem.

Signed-off-by: Yongji Xie
---
 Documentation/kernel-parameters.txt |    5 ++++-
 drivers/pci/pci.c                   |   36 +--
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9a53c92..d8b29ab 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2912,13 +2912,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				window. The default value is 64 megabytes.
 		resource_alignment=
 				Format:
-				[<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
+				[<order of align>@][<domain>:]<bus>:<slot>.<func>
+				[:noresize][; ...]
 				Specifies alignment and device to reassign
 				aligned memory resources.
 				If <order of align> is not specified,
 				PAGE_SIZE is used as alignment.
 				PCI-PCI bridge can be specified, if resource
 				windows need to be expanded.
+				noresize: Don't change the resources' sizes when
+					reassigning alignment.
 		ecrc=		Enable/disable PCIe ECRC (transaction layer
 				end-to-end CRC checking).
 				bios: Use BIOS/firmware settings. This is the
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 602eb42..760cce5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4598,7 +4598,8 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
  * RETURNS: Resource alignment if it is specified.
  *          Zero if it is not specified.
  */
-static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
+static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
+		bool *resize)
 {
 	int seg, bus, slot, func, align_order, count;
 	resource_size_t align = 0;
@@ -4626,6 +4627,11 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 			}
 		}
 		p += count;
+		if (!strncmp(p, ":noresize", 9)) {
+			*resize = false;
+			p += 9;
+		} else
+			*resize = true;
 		if (seg == pci_domain_nr(dev->bus) &&
 			bus == dev->bus->number &&
 			slot == PCI_SLOT(dev->devfn) &&
@@ -4658,11 +4664,12 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 {
 	int i;
 	struct resource *r;
+	bool resize;
 	resource_size_t align, size;
 	u16 command;

 	/* check if specified PCI is target device to reassign */
-	align = pci_specified_resource_alignment(dev);
+	align = pci_specified_resource_alignment(dev, &resize);
 	if (!align)
 		return;

@@ -4684,15 +4691,24 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 		if (!(r->flags & IORESOURCE_MEM))
 			continue;
 		size = resource_size(r);
-		if (size < align) {
-			size = align;
-			dev_info(&dev->dev,
-				"Rounding up size of resource #%d to %#llx.\n",
-				i, (unsigned long long)size);
+		if (resize) {
+			if (size < align) {
+				size = align;
+				dev_info(&dev->dev,
+					"Rounding up size of resource #%d to %#llx.\n",
+					i, (unsigned long long)size);
+			}
+			r->flags |= IORESOURCE_UNSET;
+			r->end = size - 1;
+			r->start = 0;
+		} else {
+			if (size > align)
+				align = size;
+			r->flags &= ~IORESOURCE_SIZEALIGN;
+			r->flags |= IORESOURCE_STARTALIGN | IORESOURCE_UNSET;
+			r->start = align;
+			r->end = r->start + size - 1;
 		}
-		r->flags |= IORESOURCE_UNSET;
-		r->end = size - 1;
-		r->start = 0;
 	}
 	/* Need to disable bridge's resource window,
 	 * to enable the kernel to reassign new resource
-- 
1.7.9.5
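
The difference between the two modes in this patch can be sketched in userspace C. This is a simplified illustration, not kernel code: `struct res`, `res_size()` and `realign()` are hypothetical stand-ins for `struct resource`, `resource_size()` and the loop body of pci_reassigndev_resource_alignment(), and the flag handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end fields of struct resource. */
struct res {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

static uint64_t res_size(const struct res *r)
{
	return r->end - r->start + 1;
}

/*
 * Mirror the two branches of the patch. With resize, a resource smaller
 * than the alignment grows to the alignment and start is cleared. With
 * noresize, the size is kept and the requested alignment is stored in
 * start, which is the IORESOURCE_STARTALIGN convention.
 */
static void realign(struct res *r, uint64_t align, bool resize)
{
	uint64_t size = res_size(r);

	assert(align != 0);

	if (resize) {
		if (size < align)
			size = align;
		r->start = 0;
		r->end = size - 1;
	} else {
		if (size > align)
			align = size;
		r->start = align;
		r->end = r->start + size - 1;
	}
}
```

For a 4KB BAR and a 64KB alignment, the resize branch reports a 64KB resource, while the noresize branch keeps the 4KB size and only records the 64KB start alignment, so a driver that checks the BAR size still sees the value it expects.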
[RFC PATCH v4 0/7] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table
The current vfio-pci implementation disallows mmapping sub-page
(size < PAGE_SIZE) MMIO BARs and the MSI-X table. This is because a sub-page
BAR's MMIO page may be shared with other BARs, and the MSI-X table should
not be accessed directly from the guest for security reasons.

But this easily causes performance issues for MMIO accesses in the guest
when vfio passes through sub-page BARs or BARs containing the MSI-X table on
the PPC64 platform. PAGE_SIZE is 64KB by default on PPC64, so BARs are much
more likely to be smaller than a page and therefore cannot be mmapped, and
the whole MMIO page in which the MSI-X table is located cannot be mmapped
either. In both cases MMIO accesses are emulated in the host.

For sub-page MMIO BARs, this patchset modifies the resource_alignment kernel
parameter to force the alignment of all MMIO BARs to be at least PAGE_SIZE,
so that a sub-page BAR's MMIO page is not shared with other BARs. Then we
can mmap sub-page MMIO BARs in the vfio-pci driver with the modified
resource_alignment.

For the MSI-X table, we think it is safe to access it directly from
userspace if the PCI host bridge supports filtering of MSIs, which ensures
that a given PCI device can only trigger the MSIs assigned to it. So we
allow mmapping the MSI-X table if IOMMU_CAP_INTR_REMAP is set, and we add
IOMMU_CAP_INTR_REMAP for the IODA host bridge on the PPC64 platform.

With this patchset applied, we get almost 100% improvement in performance
for MMIO accesses when we pass through sub-page BARs to the guest in our
test.

The two vfio related patches (patch 5 and patch 6) are based on the proposed
patchset[1].

Changelog v4:
- Rebase on v4.5-rc6 with patchset[1] applied.
- Remove resource_page_aligned kernel parameter
- Fix some problems with resource_alignment kernel parameter
- Modify resource_alignment kernel parameter to support multiple devices.
- Remove host bridge attribute: msi_filtered
- Use IOMMU_CAP_INTR_REMAP to check if MSI-X table can be mmapped
- Add IOMMU_CAP_INTR_REMAP for IODA host bridge on PPC64 platform

Changelog v3:
- Rebase on new linux kernel mainline with the patchset[1] applied.
- Add a function to check whether a PCI BAR's MMIO page is shared with
  other BARs.
- Add a host bridge attribute to indicate that the PCI host bridge supports
  filtering of MSIs.
- Use the new host bridge attribute to check if MSI-X table can be mmapped
  instead of CONFIG_EEH.
- Remove Kconfig option VFIO_PCI_MMAP_MSIX

Changelog v2:
- Rebase on v4.4-rc6 with the patchset[1] applied.
- Use kernel parameter to enforce all MMIO BARs to be page aligned on PCI
  core code instead of doing it on PPC64 arch code.
- Remove flags: VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED
                VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP
- Add a Kconfig option to support mmapping the MSI-X table.

[1] http://www.spinics.net/lists/kvm/msg127812.html

Yongji Xie (7):
  PCI: Add a new option for resource_alignment to reassign alignment
  PCI: Use IORESOURCE_WINDOW to identify bridge resources
  PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set
  PCI: Modify resource_alignment to support multiple devices
  vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
  vfio-pci: Allow to mmap MSI-X table if IOMMU_CAP_INTR_REMAP was set
  powerpc/powernv/pci-ioda: Add IOMMU_CAP_INTR_REMAP for IODA host bridge

 Documentation/kernel-parameters.txt       |    9 ++-
 arch/powerpc/platforms/powernv/pci-ioda.c |   17
 drivers/pci/pci.c                         |  126 -
 drivers/pci/probe.c                       |    3 +-
 drivers/pci/setup-bus.c                   |   21 ++---
 drivers/vfio/pci/vfio_pci.c               |   15 +++-
 drivers/vfio/pci/vfio_pci_rdwr.c          |    4 +-
 include/linux/pci.h                       |    4 +
 8 files changed, 162 insertions(+), 37 deletions(-)

-- 
1.7.9.5
Re: [PATCH kernel 4/9] powerpc/powernv/iommu: Add real mode version of xchg()
On 03/07/2016 05:05 PM, David Gibson wrote:
> On Mon, Mar 07, 2016 at 02:41:12PM +1100, Alexey Kardashevskiy wrote:
>> In real mode, TCE tables are invalidated using different
>> cache-inhibited store instructions which is different from
>> the virtual mode.
>>
>> This defines and implements exchange_rm() callback. This does not
>> define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
>> exchange/exchange_rm are only to be used by KVM for VFIO.
>>
>> The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.
>>
>> This replaces list_for_each_entry_rcu with its lockless version as
>> from now on pnv_pci_ioda2_tce_invalidate() can be called in
>> the real mode too.
>>
>> Signed-off-by: Alexey Kardashevskiy
>> ---
>>  arch/powerpc/include/asm/iommu.h          |  7 +++++++
>>  arch/powerpc/kernel/iommu.c               | 15 +++++++++++++++
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++++++++++++++++++++++++++-
>>  3 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index 7b87bab..3ca877a 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -64,6 +64,11 @@ struct iommu_table_ops {
>>  			long index,
>>  			unsigned long *hpa,
>>  			enum dma_data_direction *direction);
>> +	/* Real mode */
>> +	int (*exchange_rm)(struct iommu_table *tbl,
>> +			long index,
>> +			unsigned long *hpa,
>> +			enum dma_data_direction *direction);
>>  #endif
>>  	void (*clear)(struct iommu_table *tbl,
>>  			long index, long npages);
>> @@ -208,6 +213,8 @@ extern void iommu_del_device(struct device *dev);
>>  extern int __init tce_iommu_bus_notifier_init(void);
>>  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>>  		unsigned long *hpa, enum dma_data_direction *direction);
>> +extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
>> +		unsigned long *hpa, enum dma_data_direction *direction);
>>  #else
>>  static inline void iommu_register_group(struct iommu_table_group *table_group,
>>  					int pci_domain_number,
>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>> index a8e3490..2fcc48b 100644
>> --- a/arch/powerpc/kernel/iommu.c
>> +++ b/arch/powerpc/kernel/iommu.c
>> @@ -1062,6 +1062,21 @@ void iommu_release_ownership(struct iommu_table *tbl)
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_release_ownership);
>>
>> +long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
>> +		unsigned long *hpa, enum dma_data_direction *direction)
>> +{
>> +	long ret;
>> +
>> +	ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
>> +
>> +	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
>> +			(*direction == DMA_BIDIRECTIONAL)))
>> +		SetPageDirty(realmode_pfn_to_page(*hpa >> PAGE_SHIFT));
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);
>> +
>>  int iommu_add_device(struct device *dev)
>>  {
>>  	struct iommu_table *tbl;
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index c5baaf3..bed1944 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1791,6 +1791,18 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
>>
>>  	return ret;
>>  }
>> +
>> +static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
>> +		unsigned long *hpa, enum dma_data_direction *direction)
>> +{
>> +	long ret = pnv_tce_xchg(tbl, index, hpa, direction);
>> +
>> +	if (!ret && (tbl->it_type &
>> +			(TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
>> +		pnv_pci_ioda1_tce_invalidate(tbl, index, 1, true);
>> +
>> +	return ret;
>> +}
>>  #endif
>
> Both your _rm variants are identical to the non _rm versions.  Why not
> just set the function pointer to the same thing, rather than copying
> the whole function.

The last parameter - "rm" - to pnv_pci_ioda1_tce_invalidate() is different.

>>
>>  static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
>> @@ -1806,6 +1818,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
>>  	.set = pnv_ioda1_tce_build,
>>  #ifdef CONFIG_IOMMU_API
>>  	.exchange = pnv_ioda1_tce_xchg,
>> +	.exchange_rm = pnv_ioda1_tce_xchg_rm,
>>  #endif
>>  	.clear = pnv_ioda1_tce_free,
>>  	.get = pnv_tce_get,
>> @@ -1866,7 +1879,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>>  {
>>  	struct iommu_table_group_link *tgl;
>>
>> -	list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
>> +	list_for_each_entry_lockless(tgl, &tbl->it_group_list, next) {
>>  		struct pnv_ioda_pe *npe;
>>  		struct pnv_ioda_pe *pe = container_of(tgl->table_group,
>>  				struct pnv_ioda_pe, table_group);
>> @@ -1918,6 +1931,18 @@ static int pnv_ioda2_tce_xchg(struct
Re: [PATCH kernel 6/9] KVM: PPC: Associate IOMMU group with guest view of TCE table
On Mon, Mar 07, 2016 at 02:41:14PM +1100, Alexey Kardashevskiy wrote:
> The existing in-kernel TCE table for emulated devices contains
> guest physical addresses which are accessed by emulated devices.
> Since we need to keep this information for VFIO devices too
> in order to implement H_GET_TCE, we are reusing it.
>
> This adds IOMMU group list to kvmppc_spapr_tce_table. Each group
> will have an iommu_table pointer.
>
> This adds kvm_spapr_tce_attach_iommu_group() helper and its detach
> counterpart to manage the lists.
>
> This puts a group when:
> - guest copy of TCE table is destroyed when TCE table fd is closed;
> - kvm_spapr_tce_detach_iommu_group() is called from
> the KVM_DEV_VFIO_GROUP_DEL ioctl handler in the case vfio-pci hotunplug
> (will be added in the following patch).
>
> Signed-off-by: Alexey Kardashevskiy
> ---
>  arch/powerpc/include/asm/kvm_host.h |   8 +++
>  arch/powerpc/include/asm/kvm_ppc.h  |   6 ++
>  arch/powerpc/kvm/book3s_64_vio.c    | 108
>  3 files changed, 122 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 2e7c791..2c5c823 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -178,6 +178,13 @@ struct kvmppc_pginfo {
>  	atomic_t refcnt;
>  };
>
> +struct kvmppc_spapr_tce_group {
> +	struct list_head next;
> +	struct rcu_head rcu;
> +	struct iommu_group *refgrp;/* for reference counting only */
> +	struct iommu_table *tbl;
> +};
> +
>  struct kvmppc_spapr_tce_table {
>  	struct list_head list;
>  	struct kvm *kvm;
> @@ -186,6 +193,7 @@ struct kvmppc_spapr_tce_table {
>  	u32 page_shift;
>  	u64 offset;		/* in pages */
>  	u64 size;		/* window size in pages */
> +	struct list_head groups;
>  	struct page *pages[0];
>  };
>
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 2544eda..d1482dc 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -164,6 +164,12 @@ extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
>  			struct kvm_memory_slot *memslot, unsigned long porder);
>  extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>
> +extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
> +		unsigned long liobn,
> +		phys_addr_t start_addr,
> +		struct iommu_group *grp);
> +extern void kvm_spapr_tce_detach_iommu_group(struct kvm *kvm,
> +		struct iommu_group *grp);
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  				struct kvm_create_spapr_tce_64 *args);
>  extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 2c2d103..846d16d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -27,6 +27,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -95,10 +96,18 @@ static void release_spapr_tce_table(struct rcu_head *head)
>  	struct kvmppc_spapr_tce_table *stt = container_of(head,
>  			struct kvmppc_spapr_tce_table, rcu);
>  	unsigned long i, npages = kvmppc_tce_pages(stt->size);
> +	struct kvmppc_spapr_tce_group *kg;
>
>  	for (i = 0; i < npages; i++)
>  		__free_page(stt->pages[i]);
>
> +	while (!list_empty(&stt->groups)) {
> +		kg = list_first_entry(&stt->groups,
> +				struct kvmppc_spapr_tce_group, next);
> +		list_del(&kg->next);
> +		kfree(kg);
> +	}
> +
>  	kfree(stt);
>  }
>
> @@ -129,9 +138,15 @@ static int kvm_spapr_tce_mmap(struct file *file, struct vm_area_struct *vma)
>  static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>  {
>  	struct kvmppc_spapr_tce_table *stt = filp->private_data;
> +	struct kvmppc_spapr_tce_group *kg;
>
>  	list_del_rcu(&stt->list);
>
> +	list_for_each_entry_rcu(kg, &stt->groups, next) {
> +		iommu_group_put(kg->refgrp);
> +		kg->refgrp = NULL;
> +	}

What's the reason for this kind of two-phase deletion?  Dereffing the
group here, and setting to NULL, then actually removing from the list
above.

> +
>  	kvm_put_kvm(stt->kvm);
>
>  	kvmppc_account_memlimit(
> @@ -146,6 +161,98 @@ static const struct file_operations kvm_spapr_tce_fops = {
>  	.release	= kvm_spapr_tce_release,
>  };
>
> +extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
> +		unsigned long liobn,
> +		phys_addr_t start_addr,
> +		struct iommu_group *grp)
> +{
> +	struct kvmppc_spapr_tce_table *stt = NULL;
>
Re: [PATCH kernel 3/9] KVM: PPC: Use preregistered memory API to access TCE list
On Mon, Mar 07, 2016 at 02:41:11PM +1100, Alexey Kardashevskiy wrote:
> VFIO on sPAPR already implements guest memory pre-registration
> when the entire guest RAM gets pinned. This can be used to translate
> the physical address of a guest page containing the TCE list
> from H_PUT_TCE_INDIRECT.
>
> This makes use of the pre-registered memory API to access TCE list
> pages in order to avoid unnecessary locking on the KVM memory
> reverse map.
>
> Signed-off-by: Alexey Kardashevskiy

Ok.. so, what's the benefit of not having to lock the rmap?

> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 86 ++---
>  1 file changed, 70 insertions(+), 16 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 44be73e..af155f6 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -180,6 +180,38 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +static mm_context_t *kvmppc_mm_context(struct kvm_vcpu *vcpu)
> +{
> +	struct task_struct *task;
> +
> +	task = vcpu->arch.run_task;
> +	if (unlikely(!task || !task->mm))
> +		return NULL;
> +
> +	return &task->mm->context;
> +}
> +
> +static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu)
> +{
> +	mm_context_t *mm = kvmppc_mm_context(vcpu);
> +
> +	if (unlikely(!mm))
> +		return false;
> +
> +	return mm_iommu_preregistered(mm);
> +}
> +
> +static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
> +		struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size)
> +{
> +	mm_context_t *mm = kvmppc_mm_context(vcpu);
> +
> +	if (unlikely(!mm))
> +		return NULL;
> +
> +	return mm_iommu_lookup_rm(mm, ua, size);
> +}
> +
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		unsigned long ioba, unsigned long tce)
>  {
> @@ -261,23 +293,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  	if (ret != H_SUCCESS)
>  		return ret;
>
> -	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> -		return H_TOO_HARD;
> +	if (kvmppc_preregistered(vcpu)) {
> +		/*
> +		 * We get here if guest memory was pre-registered which
> +		 * is normally VFIO case and gpa->hpa translation does not
> +		 * depend on hpt.
> +		 */
> +		struct mm_iommu_table_group_mem_t *mem;
>
> -	rmap = (void *) vmalloc_to_phys(rmap);
> +		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
> +			return H_TOO_HARD;
>
> -	/*
> -	 * Synchronize with the MMU notifier callbacks in
> -	 * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> -	 * While we have the rmap lock, code running on other CPUs
> -	 * cannot finish unmapping the host real page that backs
> -	 * this guest real page, so we are OK to access the host
> -	 * real page.
> -	 */
> -	lock_rmap(rmap);
> -	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> -		ret = H_TOO_HARD;
> -		goto unlock_exit;
> +		mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K);
> +		if (!mem || mm_iommu_rm_ua_to_hpa(mem, ua, &tces))
> +			return H_TOO_HARD;
> +	} else {
> +		/*
> +		 * This is emulated devices case.
> +		 * We do not require memory to be preregistered in this case
> +		 * so lock rmap and do __find_linux_pte_or_hugepte().
> +		 */
> +		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> +			return H_TOO_HARD;
> +
> +		rmap = (void *) vmalloc_to_phys(rmap);
> +
> +		/*
> +		 * Synchronize with the MMU notifier callbacks in
> +		 * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> +		 * While we have the rmap lock, code running on other CPUs
> +		 * cannot finish unmapping the host real page that backs
> +		 * this guest real page, so we are OK to access the host
> +		 * real page.
> +		 */
> +		lock_rmap(rmap);
> +		if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> +			ret = H_TOO_HARD;
> +			goto unlock_exit;
> +		}
>  	}
>
>  	for (i = 0; i < npages; ++i) {
> @@ -291,7 +344,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  	}
>
>  unlock_exit:
> -	unlock_rmap(rmap);
> +	if (rmap)

I don't see where rmap is initialized to NULL in the case where it's
not being used.

> +		unlock_rmap(rmap);
>
>  	return ret;
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
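
The review question about rmap comes down to a common pattern: a conditional unlock on the exit path is only safe if the pointer starts out as NULL on the path that never takes the lock. Below is a minimal user-space sketch of that pattern. The names (demo_lock, demo_handler, the "fail" flag) are hypothetical and are not taken from the KVM code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rmap lock; not a kernel type. */
struct demo_lock {
	int held;
};

static void demo_lock_acquire(struct demo_lock *l) { l->held = 1; }
static void demo_lock_release(struct demo_lock *l) { l->held = 0; }

/*
 * Mirrors the shape of the handler under review: one path works
 * without the lock, the other takes it and may bail out early.
 * Returns 0 on success, -1 on the simulated failure.
 */
static long demo_handler(bool preregistered, bool fail,
			 struct demo_lock *lock)
{
	/* Must start as NULL so the exit path can tell the paths apart. */
	struct demo_lock *rmap = NULL;
	long ret = 0;

	if (!preregistered) {
		rmap = lock;
		demo_lock_acquire(rmap);
		if (fail) {
			ret = -1;
			goto unlock_exit;
		}
	}

	/* ... the per-entry work would happen here ... */

unlock_exit:
	if (rmap)
		demo_lock_release(rmap);

	return ret;
}
```

Without the "= NULL" initializer, the preregistered path would reach the "if (rmap)" test with an indeterminate pointer.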
Re: [PATCH kernel 4/9] powerpc/powernv/iommu: Add real mode version of xchg()
On Mon, Mar 07, 2016 at 02:41:12PM +1100, Alexey Kardashevskiy wrote:
> In real mode, TCE tables are invalidated using cache-inhibited
> store instructions, which differs from the virtual mode.
>
> This defines and implements exchange_rm() callback. This does not
> define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
> exchange/exchange_rm are only to be used by KVM for VFIO.
>
> The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.
>
> This replaces list_for_each_entry_rcu with its lockless version as
> from now on pnv_pci_ioda2_tce_invalidate() can be called in
> the real mode too.
>
> Signed-off-by: Alexey Kardashevskiy
> ---
>  arch/powerpc/include/asm/iommu.h          |  7 +++
>  arch/powerpc/kernel/iommu.c               | 15 +++
>  arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++-
>  3 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 7b87bab..3ca877a 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -64,6 +64,11 @@ struct iommu_table_ops {
>  			long index,
>  			unsigned long *hpa,
>  			enum dma_data_direction *direction);
> +	/* Real mode */
> +	int (*exchange_rm)(struct iommu_table *tbl,
> +			long index,
> +			unsigned long *hpa,
> +			enum dma_data_direction *direction);
>  #endif
>  	void (*clear)(struct iommu_table *tbl,
>  			long index, long npages);
> @@ -208,6 +213,8 @@ extern void iommu_del_device(struct device *dev);
>  extern int __init tce_iommu_bus_notifier_init(void);
>  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>  		unsigned long *hpa, enum dma_data_direction *direction);
> +extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> +		unsigned long *hpa, enum dma_data_direction *direction);
>  #else
>  static inline void iommu_register_group(struct iommu_table_group *table_group,
>  					int pci_domain_number,
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index a8e3490..2fcc48b 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1062,6 +1062,21 @@ void iommu_release_ownership(struct iommu_table *tbl)
>  }
>  EXPORT_SYMBOL_GPL(iommu_release_ownership);
>
> +long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> +		unsigned long *hpa, enum dma_data_direction *direction)
> +{
> +	long ret;
> +
> +	ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
> +
> +	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
> +			(*direction == DMA_BIDIRECTIONAL)))
> +		SetPageDirty(realmode_pfn_to_page(*hpa >> PAGE_SHIFT));
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);
> +
>  int iommu_add_device(struct device *dev)
>  {
>  	struct iommu_table *tbl;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c5baaf3..bed1944 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1791,6 +1791,18 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
>
>  	return ret;
>  }
> +
> +static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
> +		unsigned long *hpa, enum dma_data_direction *direction)
> +{
> +	long ret = pnv_tce_xchg(tbl, index, hpa, direction);
> +
> +	if (!ret && (tbl->it_type &
> +			(TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
> +		pnv_pci_ioda1_tce_invalidate(tbl, index, 1, true);
> +
> +	return ret;
> +}
>  #endif

Both your _rm variants are identical to the non _rm versions. Why not
just set the function pointer to the same thing, rather than copying
the whole function?

>
>  static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
> @@ -1806,6 +1818,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
>  	.set = pnv_ioda1_tce_build,
>  #ifdef CONFIG_IOMMU_API
>  	.exchange = pnv_ioda1_tce_xchg,
> +	.exchange_rm = pnv_ioda1_tce_xchg_rm,
>  #endif
>  	.clear = pnv_ioda1_tce_free,
>  	.get = pnv_tce_get,
> @@ -1866,7 +1879,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>  {
>  	struct iommu_table_group_link *tgl;
>
> -	list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
> +	list_for_each_entry_lockless(tgl, &tbl->it_group_list, next) {
>  		struct pnv_ioda_pe *npe;
>  		struct pnv_ioda_pe *pe = container_of(tgl->table_group,
>  				struct pnv_ioda_pe, table_group);
> @@ -1918,6 +1931,18 @@ static int
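
For readers unfamiliar with the suggestion in the review comment: if two callbacks in an ops table really do behave identically, both slots can point at one function. The sketch below shows only that mechanism, with a toy table and hypothetical names (demo_table_ops, demo_xchg). It takes no position on whether the real and virtual mode variants in this patch are in fact identical.

```c
#define DEMO_TABLE_SIZE 4

/* Toy stand-in for a TCE table; purely illustrative. */
static unsigned long demo_table[DEMO_TABLE_SIZE];

struct demo_table_ops {
	int (*exchange)(long index, unsigned long *hpa);
	int (*exchange_rm)(long index, unsigned long *hpa);
};

/*
 * Swap *hpa with the table entry at index.
 * Returns 0 on success, -1 if the index is out of range.
 */
static int demo_xchg(long index, unsigned long *hpa)
{
	unsigned long old;

	if (index < 0 || index >= DEMO_TABLE_SIZE)
		return -1;

	old = demo_table[index];
	demo_table[index] = *hpa;
	*hpa = old;

	return 0;
}

/* Both callbacks share one implementation instead of a copy. */
static const struct demo_table_ops demo_ops = {
	.exchange = demo_xchg,
	.exchange_rm = demo_xchg,
};
```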
Re: [PATCH v6 14/20] cxl: Support to flash a new image on the adapter from a guest
> +struct cxl_adapter_image {
> +	__u64 flags;
> +	__u64 data;
> +	__u64 len_data;
> +	__u64 len_image;
> +	__u64 reserved1;
> +	__u64 reserved2;
> +	__u64 reserved3;
> +	__u64 reserved4;
> +};

Thanks, that looks better now :)

Acked-by: Ian Munsie
Re: [PATCH v6 00/20] cxl: Add support for powerVM guest
Thanks guys - I'm pretty happy with this series now and am happy for
this to be merged, unless @mpe has any comments.

Cheers,
-Ian
Re: [PATCH kernel 1/9] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
On Mon, Mar 07, 2016 at 02:41:09PM +1100, Alexey Kardashevskiy wrote:
> This adds a capability number for in-kernel support for VFIO on
> the SPAPR platform.
>
> The capability will tell the user space whether in-kernel handlers of
> H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
> must not attempt allocating a TCE table in the host kernel via
> the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
> will not be passed to the user space, which is the desired action in
> a situation like that.
>
> Signed-off-by: Alexey Kardashevskiy

Reviewed-by: David Gibson

> ---
>  include/uapi/linux/kvm.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c251f06..080ffbf 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -863,6 +863,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_HYPERV_SYNIC 123
>  #define KVM_CAP_S390_RI 124
>  #define KVM_CAP_SPAPR_TCE_64 125
> +#define KVM_CAP_SPAPR_TCE_VFIO 126
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH kernel 2/9] powerpc/mmu: Add real mode support for IOMMU preregistered memory
On Mon, Mar 07, 2016 at 02:41:10PM +1100, Alexey Kardashevskiy wrote:
> This makes mm_iommu_lookup() able to work in realmode by replacing
> list_for_each_entry_rcu() (which can do debug stuff which can fail in
> real mode) with list_for_each_entry_lockless().
>
> This adds a realmode version of mm_iommu_ua_to_hpa() which adds
> explicit vmalloc'd-to-linear address conversion.
> Unlike mm_iommu_ua_to_hpa(), mm_iommu_rm_ua_to_hpa() can fail.
>
> This changes mm_iommu_preregistered() to receive @mm as in real mode
> @current does not always have a correct pointer.

So, I'd generally expect a parameter called @mm to be an mm_struct *,
not a mm_context_t.

>
> This adds a realmode version of mm_iommu_lookup() which receives @mm
> (for the same reason as for mm_iommu_preregistered()) and uses
> the lockless version of list_for_each_entry_rcu().
>
> Signed-off-by: Alexey Kardashevskiy
> ---
>  arch/powerpc/include/asm/mmu_context.h |  6 -
>  arch/powerpc/mm/mmu_context_iommu.c    | 45 ++
>  2 files changed, 45 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 878c277..3ba652a 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,7 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>
> -extern bool mm_iommu_preregistered(void);
> +extern bool mm_iommu_preregistered(mm_context_t *mm);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>  		struct mm_iommu_table_group_mem_t **pmem);
>  extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
> @@ -26,10 +26,14 @@ extern void mm_iommu_init(mm_context_t *ctx);
>  extern void mm_iommu_cleanup(mm_context_t *ctx);
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
>  		unsigned long size);
> +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(mm_context_t *mm,
> +		unsigned long ua, unsigned long size);
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,
>  		unsigned long entries);
>  extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  		unsigned long ua, unsigned long *hpa);
> +extern long mm_iommu_rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> +		unsigned long ua, unsigned long *hpa);
>  extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
>  extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
>  #endif
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..aa1565d 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -63,12 +63,9 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
>  	return ret;
>  }
>
> -bool mm_iommu_preregistered(void)
> +bool mm_iommu_preregistered(mm_context_t *mm)
>  {
> -	if (!current || !current->mm)
> -		return false;
> -
> -	return !list_empty(&current->mm->context.iommu_group_mem_list);
> +	return !list_empty(&mm->iommu_group_mem_list);
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>
> @@ -231,6 +228,24 @@ unlock_exit:
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_put);
>
> +struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(mm_context_t *mm,
> +		unsigned long ua, unsigned long size)
> +{
> +	struct mm_iommu_table_group_mem_t *mem, *ret = NULL;

I think you could do with a comment here explaining why the lockless
traversal is safe.

> +
> +	list_for_each_entry_lockless(mem, &mm->iommu_group_mem_list, next) {
> +		if ((mem->ua <= ua) &&
> +				(ua + size <= mem->ua +
> +				 (mem->entries << PAGE_SHIFT))) {
> +			ret = mem;
> +			break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_lookup_rm);
> +
>  struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
>  		unsigned long size)
>  {
> @@ -284,6 +299,26 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
>
> +long mm_iommu_rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> +		unsigned long ua, unsigned long *hpa)
> +{
> +	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
> +	void *va = &mem->hpas[entry];
> +	unsigned long *ra;
> +
> +	if (entry >= mem->entries)
> +		return -EFAULT;
> +
> +	ra = (void *) vmalloc_to_phys(va);
> +	if (!ra)
> +		return -EFAULT;
> +
> +	*hpa = *ra | (ua & ~PAGE_MASK);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_rm_ua_to_hpa);
> +
>  long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
>  {
>  	if
Re: [PATCH v6 20/20] cxl: Remove cxl_get_phys_dev() kernel API
Acked-by: Ian Munsie
Re: [PATCH v6 13/20] cxl: sysfs support for guests
Acked-by: Ian Munsie
Re: [PATCH] powerpc/process: fix altivec SPR not being saved
Hi Oliver,

> save_sprs() in process.c contains the following test:
>
> 	if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
> 		t->vrsave = mfspr(SPRN_VRSAVE);
>
> CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE so the test
> is equivalent to:
>
> 	if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
> 		cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
>
> On CPUs without support for both (i.e. G5) this results in vrsave not
> being saved between context switches. The vector register
> save/restore code doesn't use VRSAVE to determine which registers to
> save/restore, but the value of VRSAVE is used to determine if altivec
> is being used in several code paths.

Nice catch, not sure how I missed that. As Ben suggests, it should
definitely go to -stable as well. Feel free to add my sign off:

Signed-off-by: Anton Blanchard

Anton

> Signed-off-by: Oliver O'Halloran
> ---
>  arch/powerpc/kernel/process.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 8224852..5a4d4d1 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -855,7 +855,7 @@ void restore_tm_state(struct pt_regs *regs)
>  static inline void save_sprs(struct thread_struct *t)
>  {
>  #ifdef CONFIG_ALTIVEC
> -	if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
> +	if (cpu_has_feature(CPU_FTR_ALTIVEC))
>  		t->vrsave = mfspr(SPRN_VRSAVE);
>  #endif
>  #ifdef CONFIG_PPC_BOOK3S_64
Re: [PATCH] powerpc/process: fix altivec SPR not being saved
On Mon, 2016-03-07 at 09:33 +1100, Oliver O'Halloran wrote:
> save_sprs() in process.c contains the following test:
>
> 	if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
> 		t->vrsave = mfspr(SPRN_VRSAVE);
>
> CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE so the test
> is equivalent to:
>
> 	if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
> 		cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
>
> On CPUs without support for both (i.e. G5) this results in vrsave not
> being saved between context switches. The vector register save/restore
> code doesn't use VRSAVE to determine which registers to save/restore,
> but the value of VRSAVE is used to determine if altivec is being used
> in several code paths.

Nice one, should probably go to stable!

> Signed-off-by: Oliver O'Halloran
> ---
>  arch/powerpc/kernel/process.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 8224852..5a4d4d1 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -855,7 +855,7 @@ void restore_tm_state(struct pt_regs *regs)
>  static inline void save_sprs(struct thread_struct *t)
>  {
>  #ifdef CONFIG_ALTIVEC
> -	if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
> +	if (cpu_has_feature(CPU_FTR_ALTIVEC))
>  		t->vrsave = mfspr(SPRN_VRSAVE);
>  #endif
>  #ifdef CONFIG_PPC_BOOK3S_64
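
The nested call in the original test can be reproduced in a few lines of user-space C. This is only a sketch: the helper names are hypothetical, the feature word is passed in explicitly instead of being read from the CPU spec, and the DEMO_FTR_ALTIVEC value is an arbitrary placeholder. The only value taken from the thread is that mask 0x1 means CPU_FTR_COHERENT_ICACHE.

```c
#include <stdbool.h>

/* Mask 0x1 is CPU_FTR_COHERENT_ICACHE, as stated in the patch description. */
#define DEMO_FTR_COHERENT_ICACHE 0x1UL
/* Placeholder bit for illustration; not checked against the kernel headers. */
#define DEMO_FTR_ALTIVEC         0x8UL

/* Returns true if any bit of mask is set in the feature word. */
static bool demo_has_feature(unsigned long cpu_features, unsigned long mask)
{
	return (cpu_features & mask) != 0;
}

/*
 * The buggy form: the inner call yields 0 or 1, and that result is then
 * used as the mask for the outer call, so the outer call tests bit 0x1.
 */
static bool demo_buggy_test(unsigned long cpu_features)
{
	return demo_has_feature(cpu_features,
			demo_has_feature(cpu_features, DEMO_FTR_ALTIVEC));
}

/* The fixed form tests the AltiVec bit directly. */
static bool demo_fixed_test(unsigned long cpu_features)
{
	return demo_has_feature(cpu_features, DEMO_FTR_ALTIVEC);
}
```

With only the AltiVec bit set, demo_buggy_test() is false while demo_fixed_test() is true, which matches the "AltiVec and coherent icache" equivalence described in the commit message.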
[PATCH kernel 5/9] KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently
It does not make much sense to have KVM in book3s-64 and
not to have IOMMU bits for PCI pass through support as it costs little
and allows VFIO to function on book3s KVM.

Having IOMMU_API always enabled makes it unnecessary to have a lot of
"#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those
ifdefs we could have only user space emulated devices accelerated
(but not VFIO), which does not seem to be very useful.

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c2024ac..1059846 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -64,6 +64,7 @@ config KVM_BOOK3S_64
 	select KVM_BOOK3S_64_HANDLER
 	select KVM
 	select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE
+	select SPAPR_TCE_IOMMU if IOMMU_SUPPORT
 	---help---
 	  Support running unmodified book3s_64 and book3s_32 guest kernels
 	  in virtual machines on book3s_64 host processors.
-- 
2.5.0.rc3
[PATCH kernel 8/9] KVM: PPC: Add in-kernel handling for VFIO
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted at an IOMMU TCE table used for VFIO
without passing them to user space, which saves the time spent
switching to user space and back.

Both real and virtual modes are supported. The kernel tries to
handle a TCE request in real mode; if that fails, it passes the request
to virtual mode to complete the operation. If the virtual mode
handler also fails, the request is passed to user space; this is not
expected to ever happen, though.

The first user of this is VFIO on POWER. Trampolines to the VFIO external
user API functions are required for this patch.

This uses a VFIO KVM device to associate a logical bus number (LIOBN)
with a VFIO IOMMU group fd and enable in-kernel handling of map/unmap
requests.

To make use of the feature, user space has to create a guest view
of the TCE table via KVM_CAP_SPAPR_TCE/KVM_CAP_SPAPR_TCE_64 and
then associate a LIOBN with this table via the VFIO KVM device's
KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN property (which is added in
the next patch).

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card).

Signed-off-by: Alexey Kardashevskiy--- arch/powerpc/kvm/book3s_64_vio.c| 184 +++ arch/powerpc/kvm/book3s_64_vio_hv.c | 186 2 files changed, 370 insertions(+) diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index 7965fc7..9417d12 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -317,11 +318,161 @@ fail: return ret; } +static long kvmppc_tce_iommu_mapped_dec(struct iommu_table *tbl, + unsigned long entry) +{ + struct mm_iommu_table_group_mem_t *mem = NULL; + const unsigned long pgsize = 1ULL << tbl->it_page_shift; + unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry); + + if (!pua) + return H_HARDWARE; + + mem = mm_iommu_lookup(*pua, pgsize); + if (!mem) + return H_HARDWARE; + + mm_iommu_mapped_dec(mem); + + *pua = 0; + + return H_SUCCESS; +} + +static long kvmppc_tce_iommu_unmap(struct iommu_table *tbl, + unsigned long entry) +{ + enum dma_data_direction dir = DMA_NONE; + unsigned long hpa = 0; + + if (iommu_tce_xchg(tbl, entry, , )) + return H_HARDWARE; + + if (dir == DMA_NONE) + return H_SUCCESS; + + return kvmppc_tce_iommu_mapped_dec(tbl, entry); +} + +long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl, + unsigned long entry, unsigned long gpa, + enum dma_data_direction dir) +{ + long ret; + unsigned long hpa, ua, *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry); + struct mm_iommu_table_group_mem_t *mem; + + if (!pua) + return H_HARDWARE; + + if (kvmppc_gpa_to_ua(kvm, gpa, , NULL)) + return H_HARDWARE; + + mem = mm_iommu_lookup(ua, 1ULL << tbl->it_page_shift); + if (!mem) + return H_HARDWARE; + + if (mm_iommu_ua_to_hpa(mem, ua, )) + return H_HARDWARE; + + if (mm_iommu_mapped_inc(mem)) + return H_HARDWARE; + + ret = iommu_tce_xchg(tbl, entry, , ); + if (ret) { + mm_iommu_mapped_dec(mem); + return H_TOO_HARD; + } + + if (dir != DMA_NONE) + kvmppc_tce_iommu_mapped_dec(tbl, entry); + + *pua = 
ua; + + return 0; +} + +long kvmppc_h_put_tce_iommu(struct kvm_vcpu *vcpu, + struct iommu_table *tbl, + unsigned long liobn, unsigned long ioba, + unsigned long tce) +{ + long idx, ret = H_HARDWARE; + const unsigned long entry = ioba >> tbl->it_page_shift; + const unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE); + const enum dma_data_direction dir = iommu_tce_direction(tce); + + /* Clear TCE */ + if (dir == DMA_NONE) { + if (iommu_tce_clear_param_check(tbl, ioba, 0, 1)) + return H_PARAMETER; + + return kvmppc_tce_iommu_unmap(tbl, entry); + } + + /* Put TCE */ + if (iommu_tce_put_param_check(tbl, ioba, tce)) + return H_PARAMETER; + + idx = srcu_read_lock(>kvm->srcu); + ret = kvmppc_tce_iommu_map(vcpu->kvm, tbl, entry, gpa, dir); + srcu_read_unlock(>kvm->srcu, idx); + + return ret; +} + +static long kvmppc_h_put_tce_indirect_iommu(struct kvm_vcpu *vcpu, + struct iommu_table *tbl, unsigned long ioba, + u64 __user *tces, unsigned long npages) +{ + unsigned long i, ret, tce, gpa; + const unsigned long entry = ioba >>
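
The argument handling at the top of kvmppc_h_put_tce_iommu() in the patch above splits a TCE into a table index, a guest physical address and a DMA direction. A small user-space sketch of that decoding follows. The permission bit values and the mapping from bits to direction are assumptions made for illustration (read as bit 0x1, write as bit 0x2), and every demo_ name is hypothetical.

```c
/* Assumed permission bits in the low end of a TCE; illustrative only. */
#define DEMO_TCE_READ  0x1UL
#define DEMO_TCE_WRITE 0x2UL

enum demo_dir {
	DEMO_NONE,
	DEMO_TO_DEVICE,
	DEMO_FROM_DEVICE,
	DEMO_BIDIRECTIONAL,
};

struct demo_tce {
	unsigned long entry;	/* index into the TCE table */
	unsigned long gpa;	/* guest physical address of the page */
	enum demo_dir dir;	/* direction implied by the permission bits */
};

/* Assumed mapping: a TCE the device may read means data goes to the device. */
static enum demo_dir demo_tce_direction(unsigned long tce)
{
	switch (tce & (DEMO_TCE_READ | DEMO_TCE_WRITE)) {
	case DEMO_TCE_READ | DEMO_TCE_WRITE:
		return DEMO_BIDIRECTIONAL;
	case DEMO_TCE_READ:
		return DEMO_TO_DEVICE;
	case DEMO_TCE_WRITE:
		return DEMO_FROM_DEVICE;
	default:
		return DEMO_NONE;
	}
}

/* Same three steps as the patch: shift, mask, classify. */
static struct demo_tce demo_decode(unsigned long ioba, unsigned long tce,
				   unsigned int page_shift)
{
	struct demo_tce d;

	d.entry = ioba >> page_shift;
	d.gpa = tce & ~(DEMO_TCE_READ | DEMO_TCE_WRITE);
	d.dir = demo_tce_direction(tce);

	return d;
}
```

A TCE with no permission bits decodes to DEMO_NONE, which corresponds to the "Clear TCE" branch in the patch.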
[PATCH kernel 3/9] KVM: PPC: Use preregistered memory API to access TCE list
VFIO on sPAPR already implements guest memory pre-registration
when the entire guest RAM gets pinned. This can be used to translate
the physical address of a guest page containing the TCE list
from H_PUT_TCE_INDIRECT.

This makes use of the pre-registered memory API to access TCE list
pages in order to avoid unnecessary locking on the KVM memory
reverse map.

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/kvm/book3s_64_vio_hv.c | 86 ++---
 1 file changed, 70 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 44be73e..af155f6 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -180,6 +180,38 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
 EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+static mm_context_t *kvmppc_mm_context(struct kvm_vcpu *vcpu)
+{
+	struct task_struct *task;
+
+	task = vcpu->arch.run_task;
+	if (unlikely(!task || !task->mm))
+		return NULL;
+
+	return &task->mm->context;
+}
+
+static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu)
+{
+	mm_context_t *mm = kvmppc_mm_context(vcpu);
+
+	if (unlikely(!mm))
+		return false;
+
+	return mm_iommu_preregistered(mm);
+}
+
+static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
+		struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size)
+{
+	mm_context_t *mm = kvmppc_mm_context(vcpu);
+
+	if (unlikely(!mm))
+		return NULL;
+
+	return mm_iommu_lookup_rm(mm, ua, size);
+}
+
 long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba, unsigned long tce)
 {
@@ -261,23 +293,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	if (ret != H_SUCCESS)
 		return ret;
 
-	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
-		return H_TOO_HARD;
+	if (kvmppc_preregistered(vcpu)) {
+		/*
+		 * We get here if guest memory was pre-registered which
+		 * is normally VFIO case and gpa->hpa translation does not
+		 * depend on hpt.
+		 */
+		struct mm_iommu_table_group_mem_t *mem;
 
-	rmap = (void *) vmalloc_to_phys(rmap);
+		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
+			return H_TOO_HARD;
 
-	/*
-	 * Synchronize with the MMU notifier callbacks in
-	 * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
-	 * While we have the rmap lock, code running on other CPUs
-	 * cannot finish unmapping the host real page that backs
-	 * this guest real page, so we are OK to access the host
-	 * real page.
-	 */
-	lock_rmap(rmap);
-	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
-		ret = H_TOO_HARD;
-		goto unlock_exit;
+		mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K);
+		if (!mem || mm_iommu_rm_ua_to_hpa(mem, ua, &tces))
+			return H_TOO_HARD;
+	} else {
+		/*
+		 * This is emulated devices case.
+		 * We do not require memory to be preregistered in this case
+		 * so lock rmap and do __find_linux_pte_or_hugepte().
+		 */
+		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
+			return H_TOO_HARD;
+
+		rmap = (void *) vmalloc_to_phys(rmap);
+
+		/*
+		 * Synchronize with the MMU notifier callbacks in
+		 * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
+		 * While we have the rmap lock, code running on other CPUs
+		 * cannot finish unmapping the host real page that backs
+		 * this guest real page, so we are OK to access the host
+		 * real page.
+		 */
+		lock_rmap(rmap);
+		if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
+			ret = H_TOO_HARD;
+			goto unlock_exit;
+		}
 	}
 
 	for (i = 0; i < npages; ++i) {
@@ -291,7 +344,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	}
 
 unlock_exit:
-	unlock_rmap(rmap);
+	if (rmap)
+		unlock_rmap(rmap);
 
 	return ret;
 }
-- 
2.5.0.rc3
[PATCH kernel 2/9] powerpc/mmu: Add real mode support for IOMMU preregistered memory
This makes mm_iommu_lookup() able to work in realmode by replacing
list_for_each_entry_rcu() (which can do debug stuff which can fail in
real mode) with list_for_each_entry_lockless().

This adds a realmode version of mm_iommu_ua_to_hpa() which adds
explicit vmalloc'd-to-linear address conversion.
Unlike mm_iommu_ua_to_hpa(), mm_iommu_rm_ua_to_hpa() can fail.

This changes mm_iommu_preregistered() to receive @mm as in real mode
@current does not always have a correct pointer.

This adds a realmode version of mm_iommu_lookup() which receives @mm
(for the same reason as for mm_iommu_preregistered()) and uses
the lockless version of list_for_each_entry_rcu().

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/include/asm/mmu_context.h |  6 -
 arch/powerpc/mm/mmu_context_iommu.c    | 45 ++
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 878c277..3ba652a 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,7 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
-extern bool mm_iommu_preregistered(void);
+extern bool mm_iommu_preregistered(mm_context_t *mm);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
 extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
@@ -26,10 +26,14 @@ extern void mm_iommu_init(mm_context_t *ctx);
 extern void mm_iommu_cleanup(mm_context_t *ctx);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
 		unsigned long size);
+extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(mm_context_t *mm,
+		unsigned long ua, unsigned long size);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,
 		unsigned long entries);
 extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
 		unsigned long ua, unsigned long *hpa);
+extern long mm_iommu_rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
+		unsigned long ua, unsigned long *hpa);
 extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
 #endif
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..aa1565d 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -63,12 +63,9 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
 	return ret;
 }
 
-bool mm_iommu_preregistered(void)
+bool mm_iommu_preregistered(mm_context_t *mm)
 {
-	if (!current || !current->mm)
-		return false;
-
-	return !list_empty(&current->mm->context.iommu_group_mem_list);
+	return !list_empty(&mm->iommu_group_mem_list);
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
@@ -231,6 +228,24 @@ unlock_exit:
 }
 EXPORT_SYMBOL_GPL(mm_iommu_put);
 
+struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(mm_context_t *mm,
+		unsigned long ua, unsigned long size)
+{
+	struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
+
+	list_for_each_entry_lockless(mem, &mm->iommu_group_mem_list, next) {
+		if ((mem->ua <= ua) &&
+				(ua + size <= mem->ua +
+				 (mem->entries << PAGE_SHIFT))) {
+			ret = mem;
+			break;
+		}
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_lookup_rm);
+
 struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
 		unsigned long size)
 {
@@ -284,6 +299,26 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
 }
 EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
 
+long mm_iommu_rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
+		unsigned long ua, unsigned long *hpa)
+{
+	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
+	void *va = &mem->hpas[entry];
+	unsigned long *ra;
+
+	if (entry >= mem->entries)
+		return -EFAULT;
+
+	ra = (void *) vmalloc_to_phys(va);
+	if (!ra)
+		return -EFAULT;
+
+	*hpa = *ra | (ua & ~PAGE_MASK);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mm_iommu_rm_ua_to_hpa);
+
 long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
 {
 	if (atomic64_inc_not_zero(&mem->mapped))
-- 
2.5.0.rc3
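
The two helpers added by this patch rest on simple address arithmetic: a range containment check to find the preregistered region, then a page index plus in-page offset to build the host address. The user-space sketch below walks through that arithmetic on a plain array. All demo_ names are hypothetical, the page size is assumed to be 4K, and the kernel's list walking and vmalloc_to_phys() conversion are deliberately left out.

```c
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Simplified stand-in for a preregistered memory region. */
struct demo_mem {
	unsigned long ua;		/* userspace address of the first page */
	unsigned long entries;		/* number of pages in the region */
	const unsigned long *hpas;	/* host physical address of each page */
};

/* Find the region that fully contains [ua, ua + size), or NULL. */
static const struct demo_mem *demo_lookup(const struct demo_mem *regions,
					  size_t count, unsigned long ua,
					  unsigned long size)
{
	size_t i;

	for (i = 0; i < count; i++) {
		const struct demo_mem *mem = &regions[i];

		if (mem->ua <= ua &&
		    ua + size <= mem->ua + (mem->entries << DEMO_PAGE_SHIFT))
			return mem;
	}

	return NULL;
}

/* Translate ua to a host address; returns 0 on success, -1 if out of range. */
static long demo_ua_to_hpa(const struct demo_mem *mem, unsigned long ua,
			   unsigned long *hpa)
{
	unsigned long entry;

	if (ua < mem->ua)
		return -1;

	entry = (ua - mem->ua) >> DEMO_PAGE_SHIFT;
	if (entry >= mem->entries)
		return -1;

	/* Page base from the table, offset within the page from ua. */
	*hpa = mem->hpas[entry] | (ua & ~DEMO_PAGE_MASK);

	return 0;
}
```

Unlike the patch, the sketch does the bounds check before touching the hpas array and uses an unsigned index; that is a choice made for the example, not a statement about the kernel code.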
[PATCH kernel 1/9] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
This adds a capability number for in-kernel support for VFIO on
the SPAPR platform.

The capability will tell the user space whether in-kernel handlers of
H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
must not attempt allocating a TCE table in the host kernel via
the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
will not be passed to the user space, which is the desired action in
a situation like that.

Signed-off-by: Alexey Kardashevskiy
---
 include/uapi/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c251f06..080ffbf 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -863,6 +863,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_HYPERV_SYNIC 123
 #define KVM_CAP_S390_RI 124
 #define KVM_CAP_SPAPR_TCE_64 125
+#define KVM_CAP_SPAPR_TCE_VFIO 126
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.5.0.rc3
[PATCH kernel 9/9] KVM: PPC: VFIO device: support SPAPR TCE
sPAPR TCE IOMMU is para-virtualized and the guest does map/unmap via hypercalls which take a logical bus id (LIOBN) as a target IOMMU identifier. LIOBNs are made up, advertised to guest systems and linked to IOMMU groups by the user space. In order to enable acceleration for IOMMU operations in KVM, we need to tell KVM the information about the LIOBN-to-group mapping. For that, a new KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN parameter is added which accepts: - a VFIO group fd and IO base address to find the actual hardware TCE table; - a LIOBN to assign to the found table. Before notifying KVM about new link, this check the group for being registered with KVM device in order to release them at unexpected KVM finish. This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user space. While we are here, this also fixes VFIO KVM device compiling to let it link to a KVM module. Signed-off-by: Alexey Kardashevskiy--- Documentation/virtual/kvm/devices/vfio.txt | 21 +- arch/powerpc/kvm/Kconfig | 1 + arch/powerpc/kvm/Makefile | 5 +- arch/powerpc/kvm/powerpc.c | 1 + include/uapi/linux/kvm.h | 9 +++ virt/kvm/vfio.c| 106 + 6 files changed, 140 insertions(+), 3 deletions(-) diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt index ef51740..c0d3eb7 100644 --- a/Documentation/virtual/kvm/devices/vfio.txt +++ b/Documentation/virtual/kvm/devices/vfio.txt @@ -16,7 +16,24 @@ Groups: KVM_DEV_VFIO_GROUP attributes: KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking + kvm_device_attr.addr points to an int32_t file descriptor + for the VFIO group. + KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking + kvm_device_attr.addr points to an int32_t file descriptor + for the VFIO group. -For each, kvm_device_attr.addr points to an int32_t file descriptor -for the VFIO group. 
+  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: sets a liobn for a VFIO group
+	kvm_device_attr.addr points to a struct:
+		struct kvm_vfio_spapr_tce_liobn {
+			__u32	argsz;
+			__s32	fd;
+			__u32	liobn;
+			__u8	pad[4];
+			__u64	start_addr;
+		};
+		where
+		@argsz is the size of kvm_vfio_spapr_tce_liobn;
+		@fd is a file descriptor for a VFIO group;
+		@liobn is a logical bus id to be associated with the group;
+		@start_addr is a DMA window offset on the IO (PCI) bus
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 1059846..dfa3488 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -65,6 +65,7 @@ config KVM_BOOK3S_64
 	select KVM
 	select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE
 	select SPAPR_TCE_IOMMU if IOMMU_SUPPORT
+	select KVM_VFIO if VFIO
 	---help---
 	  Support running unmodified book3s_64 and book3s_32 guest kernels
 	  in virtual machines on book3s_64 host processors.
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 7f7b6d8..71f577c 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -8,7 +8,7 @@ ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 KVM := ../../../virt/kvm
 
 common-objs-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
-		$(KVM)/eventfd.o $(KVM)/vfio.o
+		$(KVM)/eventfd.o
 
 CFLAGS_e500_mmu.o := -I.
 CFLAGS_e500_mmu_host.o := -I.
@@ -87,6 +87,9 @@ endif
 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
 	book3s_xics.o
 
+kvm-book3s_64-objs-$(CONFIG_KVM_VFIO) += \
+	$(KVM)/vfio.o \
+
 kvm-book3s_64-module-objs += \
 	$(KVM)/kvm_main.o \
 	$(KVM)/eventfd.o \
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 19aa59b..63f188d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -521,6 +521,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_CAP_SPAPR_TCE:
 	case KVM_CAP_SPAPR_TCE_64:
+	case KVM_CAP_SPAPR_TCE_VFIO:
 	case KVM_CAP_PPC_ALLOC_HTAB:
 	case KVM_CAP_PPC_RTAS:
 	case KVM_CAP_PPC_FIXUP_HCALL:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 080ffbf..f1abbea 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1056,6 +1056,7 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP			1
 #define   KVM_DEV_VFIO_GROUP_ADD			1
 #define   KVM_DEV_VFIO_GROUP_DEL			2
+#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN	3
 
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
@@ -1075,6 +1076,14 @@ enum kvm_device_type {
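As an illustration of the parameter block documented in the vfio.txt hunk, here is a small user-space sketch that fills it in. The struct is re-declared under a hypothetical `sketch_` name with fixed-width types standing in for `__u32` and friends, and `make_liobn_param()` is an invented helper; a VMM would then point kvm_device_attr.addr at the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the documented layout of struct kvm_vfio_spapr_tce_liobn. */
struct sketch_vfio_spapr_tce_liobn {
	uint32_t argsz;
	int32_t  fd;
	uint32_t liobn;
	uint8_t  pad[4];
	uint64_t start_addr;
};

/* 4 + 4 + 4 + 4 + 8 bytes; the explicit pad keeps start_addr 8-byte aligned. */
_Static_assert(sizeof(struct sketch_vfio_spapr_tce_liobn) == 24,
	       "unexpected layout");

/* Hypothetical helper: build a fully initialised parameter block. */
static struct sketch_vfio_spapr_tce_liobn
make_liobn_param(int32_t group_fd, uint32_t liobn, uint64_t start_addr)
{
	struct sketch_vfio_spapr_tce_liobn p;

	assert(group_fd >= 0);
	memset(&p, 0, sizeof(p));	/* also clears pad[] */
	p.argsz = sizeof(p);
	p.fd = group_fd;
	p.liobn = liobn;
	p.start_addr = start_addr;
	return p;
}
```

Setting `argsz` to the structure size follows the documented meaning of that field and leaves room for the kernel to detect a mismatched layout.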
[PATCH kernel 7/9] KVM: PPC: Create a virtual-mode only TCE table handlers
In-kernel VFIO acceleration needs different handling in real and virtual modes, which makes it hard to support both modes in the same handler.

This creates virtual-mode copies of the real-mode H_STUFF_TCE and H_PUT_TCE handlers (now named kvmppc_rm_h_stuff_tce and kvmppc_rm_h_put_tce), in addition to the existing kvmppc_rm_h_put_tce_indirect.

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/kvm/book3s_64_vio.c        | 52 +
 arch/powerpc/kvm/book3s_64_vio_hv.c     |  8 ++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  4 +--
 3 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 846d16d..7965fc7 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -317,6 +317,32 @@ fail:
 	return ret;
 }
 
+long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
+		unsigned long ioba, unsigned long tce)
+{
+	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
+	long ret;
+
+	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
+	/* 	    liobn, ioba, tce); */
+
+	if (!stt)
+		return H_TOO_HARD;
+
+	ret = kvmppc_ioba_validate(stt, ioba, 1);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	ret = kvmppc_tce_validate(stt, tce);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	kvmppc_tce_put(stt, ioba >> stt->page_shift, tce);
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
+
 long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		unsigned long liobn, unsigned long ioba,
 		unsigned long tce_list, unsigned long npages)
@@ -372,3 +398,29 @@ unlock_exit:
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
+
+long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	/* Check permission bits only to allow userspace poison TCE for debug */
+	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
+		return H_PARAMETER;
+
+	for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift))
+		kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index af155f6..11163ae 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -212,8 +212,8 @@ static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
 	return mm_iommu_lookup_rm(mm, ua, size);
 }
 
-long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
-		unsigned long ioba, unsigned long tce)
+long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
+		unsigned long ioba, unsigned long tce)
 {
 	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
 	long ret;
@@ -236,7 +236,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
 
 static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
 		unsigned long ua, unsigned long *phpa)
@@ -350,7 +349,7 @@ unlock_exit:
 	return ret;
 }
 
-long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 		unsigned long liobn, unsigned long ioba,
 		unsigned long tce_value, unsigned long npages)
 {
@@ -374,7 +373,6 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
 
 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index ed16182..d6dad2c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1928,7 +1928,7 @@ hcall_real_table:
 	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
 	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
 	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
-	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
 	.long	0		/* 0x24 - H_SET_SPRG0 */
 	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
 	.long	0		/* 0x2c */
@@ -2006,7 +2006,7 @@ hcall_real_table:
 	.long	0		/* 0x12c */
 	.long	0
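The control flow of the virtual-mode H_STUFF_TCE handler can be modelled in user space roughly as follows. This is only a sketch: the `sketch_` types and helpers are hypothetical, the window check is my assumption about what kvmppc_ioba_validate() does (its body is not part of this patch), and the return codes and permission bits are the values as I recall them from the powerpc headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; check asm/hvcall.h and asm/tce.h before relying on them. */
#define H_SUCCESS	0L
#define H_PARAMETER	(-4L)
#define H_TOO_HARD	9999L
#define TCE_PCI_READ	0x1UL
#define TCE_PCI_WRITE	0x2UL

#define SKETCH_PAGES	8UL

/* Simplified stand-in for kvmppc_spapr_tce_table: a window of SKETCH_PAGES TCEs. */
struct sketch_tce_table {
	unsigned int page_shift;	/* IOMMU page size as a shift */
	unsigned long offset;		/* window start, in pages */
	unsigned long tces[SKETCH_PAGES];
};

/* Assumed validation: ioba is page aligned and the range fits the window. */
static long sketch_ioba_validate(const struct sketch_tce_table *stt,
		unsigned long ioba, unsigned long npages)
{
	unsigned long mask, idx;

	assert(stt->page_shift < 8 * sizeof(unsigned long));
	mask = (1UL << stt->page_shift) - 1;
	idx = ioba >> stt->page_shift;

	if ((ioba & mask) || idx < stt->offset || npages > SKETCH_PAGES ||
			idx - stt->offset > SKETCH_PAGES - npages)
		return H_PARAMETER;
	return H_SUCCESS;
}

/* Same sequence of checks as kvmppc_h_stuff_tce() in the patch. */
static long sketch_h_stuff_tce(struct sketch_tce_table *stt, unsigned long ioba,
		unsigned long tce_value, unsigned long npages)
{
	unsigned long i;
	long ret;

	if (!stt)
		return H_TOO_HARD;	/* unknown LIOBN: punt to user space */

	ret = sketch_ioba_validate(stt, ioba, npages);
	if (ret != H_SUCCESS)
		return ret;

	/* A value carrying permission bits is rejected; only poison values pass. */
	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
		return H_PARAMETER;

	for (i = 0; i < npages; ++i, ioba += 1UL << stt->page_shift)
		stt->tces[(ioba >> stt->page_shift) - stt->offset] = tce_value;

	return H_SUCCESS;
}
```

Returning H_TOO_HARD for a missing table matches the handlers in the patch, where that code hands the hypercall over to user space.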
[PATCH kernel 6/9] KVM: PPC: Associate IOMMU group with guest view of TCE table
The existing in-kernel TCE table for emulated devices contains guest physical addresses which are accessed by emulated devices. Since we need to keep this information for VFIO devices too in order to implement H_GET_TCE, we are reusing it.

This adds an IOMMU group list to kvmppc_spapr_tce_table. Each group will have an iommu_table pointer.

This adds the kvm_spapr_tce_attach_iommu_group() helper and its detach counterpart to manage the lists.

This puts a group when:
- the guest copy of the TCE table is destroyed because the TCE table fd is closed;
- kvm_spapr_tce_detach_iommu_group() is called from the KVM_DEV_VFIO_GROUP_DEL ioctl handler in the case of vfio-pci hotunplug (will be added in the following patch).

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/include/asm/kvm_host.h |   8 +++
 arch/powerpc/include/asm/kvm_ppc.h  |   6 ++
 arch/powerpc/kvm/book3s_64_vio.c    | 108 
 3 files changed, 122 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2e7c791..2c5c823 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -178,6 +178,13 @@ struct kvmppc_pginfo {
 	atomic_t refcnt;
 };
 
+struct kvmppc_spapr_tce_group {
+	struct list_head next;
+	struct rcu_head rcu;
+	struct iommu_group *refgrp;/* for reference counting only */
+	struct iommu_table *tbl;
+};
+
 struct kvmppc_spapr_tce_table {
 	struct list_head list;
 	struct kvm *kvm;
@@ -186,6 +193,7 @@ struct kvmppc_spapr_tce_table {
 	u32 page_shift;
 	u64 offset;		/* in pages */
 	u64 size;		/* window size in pages */
+	struct list_head groups;
 	struct page *pages[0];
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2544eda..d1482dc 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -164,6 +164,12 @@ extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
 			struct kvm_memory_slot *memslot, unsigned long porder);
 extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
+extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
+		unsigned long liobn,
+		phys_addr_t start_addr,
+		struct iommu_group *grp);
+extern void kvm_spapr_tce_detach_iommu_group(struct kvm *kvm,
+		struct iommu_group *grp);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 		struct kvm_create_spapr_tce_64 *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 2c2d103..846d16d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -95,10 +96,18 @@ static void release_spapr_tce_table(struct rcu_head *head)
 	struct kvmppc_spapr_tce_table *stt = container_of(head,
 			struct kvmppc_spapr_tce_table, rcu);
 	unsigned long i, npages = kvmppc_tce_pages(stt->size);
+	struct kvmppc_spapr_tce_group *kg;
 
 	for (i = 0; i < npages; i++)
 		__free_page(stt->pages[i]);
 
+	while (!list_empty(&stt->groups)) {
+		kg = list_first_entry(&stt->groups,
+				struct kvmppc_spapr_tce_group, next);
+		list_del(&kg->next);
+		kfree(kg);
+	}
+
 	kfree(stt);
 }
 
@@ -129,9 +138,15 @@ static int kvm_spapr_tce_mmap(struct file *file, struct vm_area_struct *vma)
 static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
 {
 	struct kvmppc_spapr_tce_table *stt = filp->private_data;
+	struct kvmppc_spapr_tce_group *kg;
 
 	list_del_rcu(&stt->list);
 
+	list_for_each_entry_rcu(kg, &stt->groups, next) {
+		iommu_group_put(kg->refgrp);
+		kg->refgrp = NULL;
+	}
+
 	kvm_put_kvm(stt->kvm);
 
 	kvmppc_account_memlimit(
@@ -146,6 +161,98 @@ static const struct file_operations kvm_spapr_tce_fops = {
 	.release	= kvm_spapr_tce_release,
 };
 
+extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
+		unsigned long liobn,
+		phys_addr_t start_addr,
+		struct iommu_group *grp)
+{
+	struct kvmppc_spapr_tce_table *stt = NULL;
+	struct iommu_table_group *table_group;
+	long i;
+	bool found = false;
+	struct kvmppc_spapr_tce_group *kg;
+	struct iommu_table *tbltmp;
+
+	/* Check this LIOBN hasn't been previously allocated */
+	list_for_each_entry_rcu(stt, &kvm->arch.spapr_tce_tables, list) {
+		if (stt->liobn == liobn) {
+			if ((stt->offset <<
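The bookkeeping described in this patch (a per-table list of groups, each holding a reference that is put when the table goes away) can be sketched in plain C. Everything here is hypothetical: a singly linked list stands in for list_head and RCU, a counter stands in for the iommu_group reference, and the duplicate check in `sketch_attach()` is an assumption, since the body of the real attach helper is cut off in this archive.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct iommu_group: only a reference count. */
struct sketch_group {
	int refs;
};

/* Stand-in for kvmppc_spapr_tce_group: one list node per attached group. */
struct sketch_tce_group {
	struct sketch_tce_group *next;
	struct sketch_group *refgrp;	/* held for reference counting only */
};

/* Stand-in for kvmppc_spapr_tce_table, reduced to its group list. */
struct sketch_tce_table {
	struct sketch_tce_group *groups;
};

/* Attach: refuse duplicates, otherwise take a reference and link the node. */
static int sketch_attach(struct sketch_tce_table *stt, struct sketch_group *grp)
{
	struct sketch_tce_group *kg;

	assert(stt && grp);
	for (kg = stt->groups; kg; kg = kg->next)
		if (kg->refgrp == grp)
			return -1;	/* already attached */

	kg = malloc(sizeof(*kg));
	if (!kg)
		return -1;

	grp->refs++;
	kg->refgrp = grp;
	kg->next = stt->groups;
	stt->groups = kg;
	return 0;
}

/* Teardown: put every group reference, then free the list nodes. */
static void sketch_release(struct sketch_tce_table *stt)
{
	assert(stt);
	while (stt->groups) {
		struct sketch_tce_group *kg = stt->groups;

		stt->groups = kg->next;
		kg->refgrp->refs--;
		free(kg);
	}
}
```

In the patch itself the two halves of the teardown are split: the references are put when the TCE table fd is released, and the nodes are freed later in the RCU callback. The sketch merges them for brevity.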
[PATCH kernel 4/9] powerpc/powernv/iommu: Add real mode version of xchg()
In real mode, TCE tables are invalidated using cache-inhibited store instructions, which differ from those used in virtual mode.

This defines and implements the exchange_rm() callback. This does not define set_rm/clear_rm/flush_rm callbacks as there is no user for those: exchange/exchange_rm are only to be used by KVM for VFIO.

The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.

This replaces list_for_each_entry_rcu with its lockless version, as from now on pnv_pci_ioda2_tce_invalidate() can be called in real mode too.

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/include/asm/iommu.h          |  7 +++
 arch/powerpc/kernel/iommu.c               | 15 +++
 arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7b87bab..3ca877a 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -64,6 +64,11 @@ struct iommu_table_ops {
 			long index,
 			unsigned long *hpa,
 			enum dma_data_direction *direction);
+	/* Real mode */
+	int (*exchange_rm)(struct iommu_table *tbl,
+			long index,
+			unsigned long *hpa,
+			enum dma_data_direction *direction);
 #endif
 	void (*clear)(struct iommu_table *tbl,
 			long index, long npages);
@@ -208,6 +213,8 @@ extern void iommu_del_device(struct device *dev);
 extern int __init tce_iommu_bus_notifier_init(void);
 extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
 		unsigned long *hpa, enum dma_data_direction *direction);
+extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
+		unsigned long *hpa, enum dma_data_direction *direction);
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
 		int pci_domain_number,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..2fcc48b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1062,6 +1062,21 @@ void iommu_release_ownership(struct iommu_table *tbl)
 }
 EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
+long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
+		unsigned long *hpa, enum dma_data_direction *direction)
+{
+	long ret;
+
+	ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
+
+	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
+			(*direction == DMA_BIDIRECTIONAL)))
+		SetPageDirty(realmode_pfn_to_page(*hpa >> PAGE_SHIFT));
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);
+
 int iommu_add_device(struct device *dev)
 {
 	struct iommu_table *tbl;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c5baaf3..bed1944 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1791,6 +1791,18 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
 
 	return ret;
 }
+
+static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
+		unsigned long *hpa, enum dma_data_direction *direction)
+{
+	long ret = pnv_tce_xchg(tbl, index, hpa, direction);
+
+	if (!ret && (tbl->it_type &
+			(TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+		pnv_pci_ioda1_tce_invalidate(tbl, index, 1, true);
+
+	return ret;
+}
 #endif
 
 static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
@@ -1806,6 +1818,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
 	.set = pnv_ioda1_tce_build,
 #ifdef CONFIG_IOMMU_API
 	.exchange = pnv_ioda1_tce_xchg,
+	.exchange_rm = pnv_ioda1_tce_xchg_rm,
 #endif
 	.clear = pnv_ioda1_tce_free,
 	.get = pnv_tce_get,
@@ -1866,7 +1879,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
 {
 	struct iommu_table_group_link *tgl;
 
-	list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
+	list_for_each_entry_lockless(tgl, &tbl->it_group_list, next) {
 		struct pnv_ioda_pe *npe;
 		struct pnv_ioda_pe *pe = container_of(tgl->table_group,
 				struct pnv_ioda_pe, table_group);
@@ -1918,6 +1931,18 @@ static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, long index,
 
 	return ret;
 }
+
+static int pnv_ioda2_tce_xchg_rm(struct iommu_table *tbl, long index,
+		unsigned long *hpa, enum dma_data_direction *direction)
+{
+	long ret = pnv_tce_xchg(tbl, index, hpa, direction);
+
+	if (!ret && (tbl->it_type &
+			(TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+
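The exchange-then-dirty pattern of iommu_tce_xchg_rm() can be illustrated with a small user-space model. The `sketch_` names are hypothetical, and the assumption that the exchange callback hands back the old address and direction through its pointer arguments is my reading of pnv_tce_xchg(), which is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as the kernel's enum dma_data_direction. */
enum sketch_dma_dir {
	SKETCH_DMA_BIDIRECTIONAL = 0,
	SKETCH_DMA_TO_DEVICE = 1,
	SKETCH_DMA_FROM_DEVICE = 2,
	SKETCH_DMA_NONE = 3,
};

/* One TCE: host physical address plus the allowed DMA direction. */
struct sketch_tce {
	unsigned long hpa;
	enum sketch_dma_dir dir;
};

/* Install the new entry and return the old one through *hpa and *dir. */
static long sketch_exchange(struct sketch_tce *e, unsigned long *hpa,
		enum sketch_dma_dir *dir)
{
	struct sketch_tce old = *e;

	e->hpa = *hpa;
	e->dir = *dir;
	*hpa = old.hpa;
	*dir = old.dir;
	return 0;
}

/*
 * Mirrors the wrapper in the patch: after a successful exchange, the page
 * behind the old entry is reported dirty if the device could write to it.
 */
static long sketch_tce_xchg(struct sketch_tce *e, unsigned long *hpa,
		enum sketch_dma_dir *dir, bool *old_page_dirty)
{
	long ret;

	assert(e && hpa && dir && old_page_dirty);
	ret = sketch_exchange(e, hpa, dir);
	*old_page_dirty = !ret && (*dir == SKETCH_DMA_FROM_DEVICE ||
				   *dir == SKETCH_DMA_BIDIRECTIONAL);
	return ret;
}
```

The dirty flag replaces SetPageDirty() here so that the behaviour can be observed without kernel page structures.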
[PATCH kernel 0/9] KVM, PPC, VFIO: Enable in-kernel acceleration
This enables in-kernel acceleration of H_PUT_TCE and related hypercalls for pseries guests using VFIO. As pseries is a para-virtualized environment, the guest can see and control IOMMUs via special hypercalls which let it add and remove mappings in the real hardware IOMMU.

This was last posted quite a long time ago, so I have dropped the version numbers; this respin counts as v1. It has been used successfully in the PowerKVM product for quite a while now.

This is based on the "next" branch of git://git.kernel.org/pub/scm/virt/kvm/kvm.git, which has the "multi-tce in-kernel acceleration" and "64 bit in-kernel TCE" support.

Please comment. Thanks!

Alexey Kardashevskiy (9):
  KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
  powerpc/mmu: Add real mode support for IOMMU preregistered memory
  KVM: PPC: Use preregistered memory API to access TCE list
  powerpc/powernv/iommu: Add real mode version of xchg()
  KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently
  KVM: PPC: Associate IOMMU group with guest view of TCE table
  KVM: PPC: Create a virtual-mode only TCE table handlers
  KVM: PPC: Add in-kernel handling for VFIO
  KVM: PPC: VFIO device: support SPAPR TCE

 Documentation/virtual/kvm/devices/vfio.txt |  21 +-
 arch/powerpc/include/asm/iommu.h           |   7 +
 arch/powerpc/include/asm/kvm_host.h        |   8 +
 arch/powerpc/include/asm/kvm_ppc.h         |   6 +
 arch/powerpc/include/asm/mmu_context.h     |   6 +-
 arch/powerpc/kernel/iommu.c                |  15 ++
 arch/powerpc/kvm/Kconfig                   |   2 +
 arch/powerpc/kvm/Makefile                  |   5 +-
 arch/powerpc/kvm/book3s_64_vio.c           | 344 +
 arch/powerpc/kvm/book3s_64_vio_hv.c        | 280 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S    |   4 +-
 arch/powerpc/kvm/powerpc.c                 |   1 +
 arch/powerpc/mm/mmu_context_iommu.c        |  45 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c  |  28 ++-
 include/uapi/linux/kvm.h                   |  10 +
 virt/kvm/vfio.c                            | 106 +
 16 files changed, 855 insertions(+), 33 deletions(-)

--
2.5.0.rc3
Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc
On Fri, 2016-03-04 at 10:31 +0100, Miroslav Benes wrote:
> On Fri, 4 Mar 2016, Michael Ellerman wrote:
> > On Thu, 2016-03-03 at 17:52 +0100, Petr Mladek wrote:
> >
> > > 3. Added an error message when including
> > >    powerpc/include/asm/livepatch.h without HAVE_LIVEPATCH
> >
> > I don't know why we want to do that, I don't see how it is helpful. It
> > doesn't even do what it says:
> >
> > > +#ifdef CONFIG_LIVEPATCH
> > ...
> > > +#else /* CONFIG_LIVEPATCH */
> > > +#error Include linux/livepatch.h, not asm/livepatch.h
> > > +#endif /* CONFIG_LIVEPATCH */
> >
> > If I turn on CONFIG_LIVEPATCH then I can quite happily include
> > asm/livepatch.h and not get an error. So the check doesn't do what the
> > message suggests.
>
> Well, yes. I looked into the archives to find if there was a reason to
> even introduce it. It was not. It came up during a review process of the
> livepatching patch set somehow and we left it there. I only changed the
> error message to the mentioned one because we deemed it was better.

Thanks for looking into it.

> > And on x86 & s390 it does:
> >
> > #else
> > #error Live patching support is disabled; check CONFIG_LIVEPATCH
> > #endif
>
> This is the old message. See 383bf44d1a8b ("livepatch: change the error
> message in asm/livepatch.h header files").
>
> Anyway, it really does not mean much. I'll send a patch for s390 and x86
> to remove it completely in a minute.

Thanks. I know it's not a big deal, but the kernel is complicated enough
without extra code we don't really need :)

cheers
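The point about the guard can be reproduced outside the kernel with a few lines. This is only a sketch: the guard is pasted into a single file instead of a header, `CONFIG_LIVEPATCH` is defined by hand to stand in for the Kconfig option, and `sketch_livepatch_header_included()` is a hypothetical marker function.

```c
#include <assert.h>

#define CONFIG_LIVEPATCH 1	/* pretend the option is enabled */

/* Same shape as the guard quoted in the thread. */
#ifdef CONFIG_LIVEPATCH
static inline int sketch_livepatch_header_included(void)
{
	return 1;
}
#else
#error Include linux/livepatch.h, not asm/livepatch.h
#endif
```

With the option defined, the file compiles whether or not it was reached through linux/livepatch.h, so the #error only ever reports a disabled option, not a wrong include path.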
Re: Problems with swapping in v4.5-rc on POWER
On Fri, 2016-03-04 at 09:58 -0800, Hugh Dickins wrote:
>
> The alternative bisection was as unsatisfactory as the first:
> again it fingered an irrelevant merge (rather than any commit
> pulled in by that merge) as the bad commit.
>
> It seems this issue is too intermittent for bisection to be useful,
> on my load anyway.

Darn. Thanks for trying.

> The best I can do now is try v4.4 for a couple of days, to verify that
> still comes out good (rather than the machine going bad coincident with
> v4.5-rc), then try v4.5-rc7 to verify that that still comes out bad.

Thanks, that would still be helpful.

> I'll report back on those; but beyond that, I'll have to leave it to you.

I haven't had any luck here :/

Can you give us a more verbose description of your test setup?

 - G5, which exact model?
 - 4k pages, no THP.
 - how much ram & swap?
 - building linus' tree, make -j ?
 - source and output on tmpfs? (how big?)
 - what device is the swap device? (you said SSD I think?)
 - anything else I've forgotten?

Oh and can you send us your bisect logs, we can at least trust the bad
results I think.

cheers
RE: [PATCH v3 3/7] QE: Add uqe_serial document to bindings
On Sat, Mar 05, 2016 at 12:26PM, Rob Herring wrote:
> -----Original Message-----
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: Saturday, March 05, 2016 12:26 PM
> To: Qiang Zhao
> Cc: o...@buserror.net; Yang-Leo Li; Xiaobo Xie;
> linux-ker...@vger.kernel.org; devicet...@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH v3 3/7] QE: Add uqe_serial document to bindings
>
> On Tue, Mar 01, 2016 at 03:09:39PM +0800, Zhao Qiang wrote:
> > Add uqe_serial document to
> > Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> >
> > Signed-off-by: Zhao Qiang
> > ---
> > Changes for v2
> > 	- modify tx/rx-clock-name specification
> > Changes for v2
> > 	- NA
> >
> >  .../bindings/powerpc/fsl/cpm_qe/uqe_serial.txt | 19 +++
> >  1 file changed, 19 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> >
> > diff --git
> > a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > new file mode 100644
> > index 000..436c71c
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > @@ -0,0 +1,19 @@
> > +* Serial
> > +
> > +Currently defined compatibles:
> > +- ucc_uart
>
> I guess this is in use already and okay. However, looking at the driver there
> really should be SoC specific compatible strings here since the driver is
> looking up the SoC compatible string and composing the firmware filename
> from that.

OK, I will change both the driver and this compatible string.

> > +
> > +Properties for ucc_uart:
> > +port-number : port number of UCC-UART
> > +tx/rx-clock-name : should be "brg1"-"brg16" for internal clock source,
> > +		should be "clk1"-"clk28" for external clock source.
> > +
> > +Example:
> > +
> > +	ucc_serial: ucc@2200 {
> > +		device_type = "serial";
>
> Drop device_type. It should only be used in a few legacy cases.
>
> Looks like the driver is matching on this. Please drop it from the driver
> too. I'd leave dts files for now, but they should be updated too later.

OK, thank you for reviewing. I will drop it.

> > +		compatible = "ucc_uart";
> > +		port-number = <1>;
> > +		rx-clock-name = "brg2";
> > +		tx-clock-name = "brg2";
> > +	};
> > --
> > 2.1.0.27.g96db324
[PATCH v4 8/8] QE-UART: modify of_device_id for qe-uart driver
Drop the device type match and change the compatible string to an SoC-specific one.

Signed-off-by: Zhao Qiang
---
 drivers/tty/serial/ucc_uart.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/tty/serial/ucc_uart.c b/drivers/tty/serial/ucc_uart.c
index 1a7dc3c..ff6c1ab 100644
--- a/drivers/tty/serial/ucc_uart.c
+++ b/drivers/tty/serial/ucc_uart.c
@@ -1475,8 +1475,7 @@ static int ucc_uart_remove(struct platform_device *ofdev)
 
 static const struct of_device_id ucc_uart_match[] = {
 	{
-		.type = "serial",
-		.compatible = "ucc_uart",
+		.compatible = "t1040-ucc-uart",
 	},
 	{},
 };
--
2.1.0.27.g96db324
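The effect of the table change can be illustrated with a deliberately simplified model of device matching. The `sketch_` types and `sketch_matches()` are hypothetical; the real OF matching code also handles lists of compatible strings and ranks candidate matches, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced form of a match table entry: NULL means "do not care". */
struct sketch_of_id {
	const char *type;
	const char *compatible;
};

/* Reduced form of a device tree node. */
struct sketch_node {
	const char *device_type;
	const char *compatible;
};

/* Simplified rule: every field set in the entry must equal the node's value. */
static int sketch_matches(const struct sketch_of_id *id,
		const struct sketch_node *np)
{
	assert(id && np);
	if (id->type &&
	    (!np->device_type || strcmp(id->type, np->device_type) != 0))
		return 0;
	if (id->compatible &&
	    (!np->compatible || strcmp(id->compatible, np->compatible) != 0))
		return 0;
	return 1;
}

/* Entry before and after this patch. */
static const struct sketch_of_id sketch_old_id = { "serial", "ucc_uart" };
static const struct sketch_of_id sketch_new_id = { NULL, "t1040-ucc-uart" };
```

Under this model a node without device_type and with the SoC-specific compatible string matches only the new entry, which is why the driver and the device tree sources have to change together.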
[PATCH v4 7/8] T104xQDS: Add qe node to t104xqds
Add qe node to t104xqds.dtsi.

Signed-off-by: Zhao Qiang
---
Changes for v2
	- rebase
Changes for v3
	- rebase
Changes for v4
	- rebase

 arch/powerpc/boot/dts/fsl/t104xqds.dtsi | 38 +
 1 file changed, 38 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t104xqds.dtsi b/arch/powerpc/boot/dts/fsl/t104xqds.dtsi
index 1498d1e..8e72041 100644
--- a/arch/powerpc/boot/dts/fsl/t104xqds.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t104xqds.dtsi
@@ -190,4 +190,42 @@
 				0 0x0001>;
 		};
 	};
+
+	qe: qe@ffe14 {
+		ranges = <0x0 0xf 0xfe14 0x4>;
+		reg = <0xf 0xfe14 0 0x480>;
+		brg-frequency = <0>;
+		bus-frequency = <0>;
+
+		si1: si@700 {
+			compatible = "fsl,t1040-qe-si";
+			reg = <0x700 0x80>;
+		};
+
+		siram1: siram@1000 {
+			compatible = "fsl,t1040-qe-siram";
+			reg = <0x1000 0x800>;
+		};
+
+		ucc_hdlc: ucc@2000 {
+			compatible = "fsl,ucc-hdlc";
+			rx-clock-name = "clk8";
+			tx-clock-name = "clk9";
+			fsl,rx-sync-clock = "rsync_pin";
+			fsl,tx-sync-clock = "tsync_pin";
+			fsl,tx-timeslot-mask = <0xfffe>;
+			fsl,rx-timeslot-mask = <0xfffe>;
+			fsl,tdm-framer-type = "e1";
+			fsl,tdm-id = <0>;
+			fsl,siram-entry-id = <0>;
+			fsl,tdm-interface;
+		};
+
+		ucc_serial: ucc@2200 {
+			compatible = "t1040-ucc-uart";
+			port-number = <0>;
+			rx-clock-name = "brg2";
+			tx-clock-name = "brg2";
+		};
+	};
 };
--
2.1.0.27.g96db324
[PATCH v4 6/8] T104xRDB: Add qe node to t104xrdb
Add qe node to t104xrdb.dtsi.

Signed-off-by: Zhao Qiang
---
Changes for v2
	- rebase
Changes for v3
	- rebase
Changes for v4
	- rebase

 arch/powerpc/boot/dts/fsl/t104xrdb.dtsi | 38 +
 1 file changed, 38 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi b/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi
index 830ea48..dd7fc2b 100644
--- a/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi
@@ -186,4 +186,42 @@
 				0 0x0001>;
 		};
 	};
+
+	qe: qe@ffe14 {
+		ranges = <0x0 0xf 0xfe14 0x4>;
+		reg = <0xf 0xfe14 0 0x480>;
+		brg-frequency = <0>;
+		bus-frequency = <0>;
+
+		si1: si@700 {
+			compatible = "fsl,t1040-qe-si";
+			reg = <0x700 0x80>;
+		};
+
+		siram1: siram@1000 {
+			compatible = "fsl,t1040-qe-siram";
+			reg = <0x1000 0x800>;
+		};
+
+		ucc_hdlc: ucc@2000 {
+			compatible = "fsl,ucc-hdlc";
+			rx-clock-name = "clk8";
+			tx-clock-name = "clk9";
+			fsl,rx-sync-clock = "rsync_pin";
+			fsl,tx-sync-clock = "tsync_pin";
+			fsl,tx-timeslot-mask = <0xfffe>;
+			fsl,rx-timeslot-mask = <0xfffe>;
+			fsl,tdm-framer-type = "e1";
+			fsl,tdm-id = <0>;
+			fsl,siram-entry-id = <0>;
+			fsl,tdm-interface;
+		};
+
+		ucc_serial: ucc@2200 {
+			compatible = "t1040-ucc-uart";
+			port-number = <0>;
+			rx-clock-name = "brg2";
+			tx-clock-name = "brg2";
+		};
+	};
 };
--
2.1.0.27.g96db324
[PATCH v4 5/8] T104xD4RDB: Add qe node to t104xd4rdb
Add qe node to t104xd4rdb.dtsi and t1040si-post.dtsi.

Signed-off-by: Zhao Qiang
---
Changes for v2
	- rebase
Changes for v3
	- rebase
Changes for v4
	- rebase

 arch/powerpc/boot/dts/fsl/t1040si-post.dtsi | 45 +
 arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi   | 38 
 2 files changed, 83 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
index e0f4da5..012f813 100644
--- a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
@@ -673,3 +673,48 @@
 		};
 	};
 };
+
+&qe {
+	#address-cells = <1>;
+	#size-cells = <1>;
+	device_type = "qe";
+	compatible = "fsl,qe";
+	fsl,qe-num-riscs = <1>;
+	fsl,qe-num-snums = <28>;
+
+	qeic: interrupt-controller@80 {
+		interrupt-controller;
+		compatible = "fsl,qe-ic";
+		#address-cells = <0>;
+		#interrupt-cells = <1>;
+		reg = <0x80 0x80>;
+		interrupts = <95 2 0 0 94 2 0 0>; //high:79 low:78
+	};
+
+	ucc@2000 {
+		cell-index = <1>;
+		reg = <0x2000 0x200>;
+		interrupts = <32>;
+		interrupt-parent = <&qeic>;
+	};
+
+	ucc@2200 {
+		cell-index = <3>;
+		reg = <0x2200 0x200>;
+		interrupts = <34>;
+		interrupt-parent = <&qeic>;
+	};
+
+	muram@1 {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		compatible = "fsl,qe-muram", "fsl,cpm-muram";
+		ranges = <0x0 0x1 0x6000>;
+
+		data-only@0 {
+			compatible = "fsl,qe-muram-data",
+			"fsl,cpm-muram-data";
+			reg = <0x0 0x6000>;
+		};
+	};
+};
diff --git a/arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi b/arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi
index 3f6d7c6..41ed3a6 100644
--- a/arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t104xd4rdb.dtsi
@@ -212,4 +212,42 @@
 				0 0x0001>;
 		};
 	};
+
+	qe: qe@ffe14 {
+		ranges = <0x0 0xf 0xfe14 0x4>;
+		reg = <0xf 0xfe14 0 0x480>;
+		brg-frequency = <0>;
+		bus-frequency = <0>;
+
+		si1: si@700 {
+			compatible = "fsl,t1040-qe-si";
+			reg = <0x700 0x80>;
+		};
+
+		siram1: siram@1000 {
+			compatible = "fsl,t1040-qe-siram";
+			reg = <0x1000 0x800>;
+		};
+
+		ucc_hdlc: ucc@2000 {
+			compatible = "fsl,ucc-hdlc";
+			rx-clock-name = "clk8";
+			tx-clock-name = "clk9";
+			fsl,rx-sync-clock = "rsync_pin";
+			fsl,tx-sync-clock = "tsync_pin";
+			fsl,tx-timeslot-mask = <0xfffe>;
+			fsl,rx-timeslot-mask = <0xfffe>;
+			fsl,tdm-framer-type = "e1";
+			fsl,tdm-id = <0>;
+			fsl,siram-entry-id = <0>;
+			fsl,tdm-interface;
+		};
+
+		ucc_serial: ucc@2200 {
+			compatible = "t1040-ucc-uart";
+			port-number = <0>;
+			rx-clock-name = "brg2";
+			tx-clock-name = "brg2";
+		};
+	};
 };
--
2.1.0.27.g96db324
[PATCH v4 4/8] bindings: move cpm_qe binding from powerpc/fsl to soc/fsl
cpm_qe is supported on both powerpc and arm, and the QE code has been moved from arch/powerpc into drivers/soc/fsl, so move the cpm_qe bindings from powerpc/fsl to soc/fsl.

Signed-off-by: Zhao Qiang
Acked-by: Rob Herring
---
Changes for v3
	- NA
Changes for v4
	- NA

 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm.txt     | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/brg.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/i2c.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/pic.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/usb.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/gpio.txt    | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/network.txt | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe.txt      | 0
 .../devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/firmware.txt       | 0
 .../devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/par_io.txt         | 0
 .../devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/pincfg.txt         | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/ucc.txt  | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/usb.txt  | 0
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/serial.txt  | 0
 .../devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/uqe_serial.txt        | 0
 15 files changed, 0 insertions(+), 0 deletions(-)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/brg.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/i2c.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/pic.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/cpm/usb.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/gpio.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/network.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/firmware.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/par_io.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/pincfg.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/ucc.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/qe/usb.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/serial.txt (100%)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/cpm_qe/uqe_serial.txt (100%)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/brg.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/brg.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/brg.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/brg.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/i2c.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/i2c.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/i2c.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/i2c.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/pic.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/pic.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/pic.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/pic.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/usb.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/usb.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/cpm/usb.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/cpm/usb.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/gpio.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/gpio.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/gpio.txt
rename to Documentation/devicetree/bindings/soc/fsl/cpm_qe/gpio.txt
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt
rename to
[PATCH v4 3/8] QE: Add uqe_serial document to bindings
Add uqe_serial document to
Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt

Signed-off-by: Zhao Qiang
---
Changes for v2
	- modify tx/rx-clock-name specification
Changes for v3
	- NA
Changes for v4
	- drop device_type
	- modify to SoC specific compatible

 .../bindings/powerpc/fsl/cpm_qe/uqe_serial.txt | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
new file mode 100644
index 000..c2de8ba
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
@@ -0,0 +1,18 @@
+* Serial
+
+Currently defined compatibles:
+- t1040-ucc-uart
+
+Properties for t1040-ucc-uart:
+port-number : port number of UCC-UART
+tx/rx-clock-name : should be "brg1"-"brg16" for internal clock source,
+		   should be "clk1"-"clk28" for external clock source.
+
+Example:
+
+	ucc_serial: ucc@2200 {
+		compatible = "t1040-ucc-uart";
+		port-number = <0>;
+		rx-clock-name = "brg2";
+		tx-clock-name = "brg2";
+	};
--
2.1.0.27.g96db324
[PATCH v4 2/8] QE: Add ucc hdlc document to bindings
Add ucc hdlc document to
Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt

Signed-off-by: Zhao Qiang
Acked-by: Rob Herring
---
Changes for v2
	- use ucc-hdlc instead of ucc_hdlc
	- add more information to properties.
Changes for v3
	- use fsl,tx-timeslot-mask instead of fsl,tx-timeslot
	- use fsl,rx-timeslot-mask instead of fsl,rx-timeslot
	- add more info
Changes for v4
	- NA

 .../bindings/powerpc/fsl/cpm_qe/network.txt | 81 ++
 1 file changed, 81 insertions(+)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt
index 29b28b8..03c7416 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/network.txt
@@ -41,3 +41,84 @@ Example:
 		fsl,mdio-pin = <12>;
 		fsl,mdc-pin = <13>;
 	};
+
+* HDLC
+
+Currently defined compatibles:
+- fsl,ucc-hdlc
+
+Properties for fsl,ucc-hdlc:
+- rx-clock-name
+- tx-clock-name
+	Usage: required
+	Value type:
+	Definition : Must be "brg1"-"brg16" for internal clock source,
+		     Must be "clk1"-"clk24" for external clock source.
+
+- fsl,tdm-interface
+	Usage: optional
+	Value type:
+	Definition : Specify that hdlc is based on tdm-interface
+
+The property below is dependent on fsl,tdm-interface:
+- fsl,rx-sync-clock
+	Usage: required
+	Value type:
+	Definition : Must be "none", "rsync_pin", "brg9-11" and "brg13-15".
+
+- fsl,tx-sync-clock
+	Usage: required
+	Value type:
+	Definition : Must be "none", "tsync_pin", "brg9-11" and "brg13-15".
+
+- fsl,tdm-framer-type
+	Usage: required for tdm interface
+	Value type:
+	Definition : "e1" or "t1".Now e1 and t1 are used, other framer types
+		     are not supported.
+
+- fsl,tdm-id
+	Usage: required for tdm interface
+	Value type:
+	Definition : number of TDM ID
+
+- fsl,tx-timeslot-mask
+- fsl,rx-timeslot-mask
+	Usage: required for tdm interface
+	Value type:
+	Definition : time slot mask for TDM operation. Indicates which time
+		     slots used for transmitting and receiving.
+
+- fsl,siram-entry-id
+	Usage: required for tdm interface
+	Value type:
+	Definition : Must be 0,2,4...64. the number of TDM entry.
+
+- fsl,tdm-internal-loopback
+	usage: optional for tdm interface
+	value type:
+	Definition : Internal loopback connecting on TDM layer.
+
+Example for tdm interface:
+
+	ucc@2000 {
+		compatible = "fsl,ucc-hdlc";
+		rx-clock-name = "clk8";
+		tx-clock-name = "clk9";
+		fsl,rx-sync-clock = "rsync_pin";
+		fsl,tx-sync-clock = "tsync_pin";
+		fsl,tx-timeslot-mask = <0xfffe>;
+		fsl,rx-timeslot-mask = <0xfffe>;
+		fsl,tdm-framer-type = "e1";
+		fsl,tdm-id = <0>;
+		fsl,siram-entry-id = <0>;
+		fsl,tdm-interface;
+	};
+
+Example for hdlc without tdm interface:
+
+	ucc@2000 {
+		compatible = "fsl,ucc-hdlc";
+		rx-clock-name = "brg1";
+		tx-clock-name = "brg1";
+	};
--
2.1.0.27.g96db324
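The clock-name rule in the ucc-hdlc binding ("brg1"-"brg16" for an internal clock source, "clk1"-"clk24" for an external one) could be checked with a small helper along the following lines. This is only a sketch: hdlc_clock_name_valid() and parse_index() are hypothetical names, not part of the patch or of any kernel API, and rejecting leading zeros such as "clk08" is an assumption the binding text does not spell out.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a decimal index with no leading zero and check that it lies
 * in 1..max. Returns false for an empty or malformed suffix.
 */
static bool parse_index(const char *s, long max)
{
	char *end;
	long n;

	if (s[0] < '1' || s[0] > '9')
		return false;
	n = strtol(s, &end, 10);
	return *end == '\0' && n >= 1 && n <= max;
}

/*
 * Hypothetical check for rx-clock-name / tx-clock-name values:
 * "brg1".."brg16" (internal) or "clk1".."clk24" (external).
 */
bool hdlc_clock_name_valid(const char *name)
{
	if (strncmp(name, "brg", 3) == 0)
		return parse_index(name + 3, 16);
	if (strncmp(name, "clk", 3) == 0)
		return parse_index(name + 3, 24);
	return false;
}
```

With this helper, the values used in the binding examples ("clk8", "clk9", "brg1") are accepted, while something like "clk25" or "rsync_pin" is rejected.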
[PATCH v4 1/8] QE: Add IC, SI and SIRAM document to device tree bindings.
Add IC, SI and SIRAM document of QE to
Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt

Signed-off-by: Zhao Qiang
Acked-by: Rob Herring
---
Changes for v2
	- Add interrupt-controller in Required properties
	- delete address-cells and size-cells for qe-si and qe-siram
Changes for v3
	- Add SoC specific compatible strings to qe-si and qe-siram
Changes for v4
	- NA

 .../devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt
index 4f89302..7ab21cb 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/qe.txt
@@ -69,6 +69,56 @@ Example:
 		};
 	};
 
+* Interrupt Controller (IC)
+
+Required properties:
+- compatible : should be "fsl,qe-ic".
+- reg : Address range of IC register set.
+- interrupts : interrupts generated by the device.
+- interrupt-controller : this device is a interrupt controller.
+
+Example:
+
+	qeic: interrupt-controller@80 {
+		interrupt-controller;
+		compatible = "fsl,qe-ic";
+		#address-cells = <0>;
+		#interrupt-cells = <1>;
+		reg = <0x80 0x80>;
+		interrupts = <95 2 0 0 94 2 0 0>; //high:79 low:78
+	};
+
+* Serial Interface Block (SI)
+
+The SI manages the routing of eight TDM lines to the QE block serial drivers
+, the MCC and the UCCs, for receive and transmit.
+
+Required properties:
+- compatible : should be "fsl,t1040-qe-si".
+- reg : Address range of SI register set.
+
+Example:
+
+	si1: si@700 {
+		compatible = "fsl,t1040-qe-si";
+		reg = <0x700 0x80>;
+	};
+
+* Serial Interface Block RAM(SIRAM)
+
+store the routing entries of SI
+
+Required properties:
+- compatible : should be "fsl,t1040-qe-siram".
+- reg : Address range of SI RAM.
+
+Example:
+
+	siram1: siram@1000 {
+		compatible = "fsl,t1040-qe-siram";
+		reg = <0x1000 0x800>;
+	};
+
 * QE Firmware Node
 
 This node defines a firmware binary that is embedded in the device tree, for
--
2.1.0.27.g96db324
[PATCH] freescale: Make the function ucc_geth_tx have a return type of void
Change the return type of ucc_geth_tx() to void. The function always
completes and never reports an unrecoverable error, so its return value
(always 0) carries no information.

Signed-off-by: Nicholas Krause
---
 drivers/net/ethernet/freescale/ucc_geth.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 78ebd73..70f3045 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -3226,7 +3226,7 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 	return howmany;
 }
 
-static int ucc_geth_tx(struct net_device *dev, u8 txQ)
+static void ucc_geth_tx(struct net_device *dev, u8 txQ)
 {
 	/* Start from the next BD that should be filled */
 	struct ucc_geth_private *ugeth = netdev_priv(dev);
@@ -3269,7 +3269,6 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
 		bd_status = in_be32((u32 __iomem *)bd);
 	}
 	ugeth->confBd[txQ] = bd;
-	return 0;
 }
 
 static int ucc_geth_poll(struct napi_struct *napi, int budget)
--
2.1.4
Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc
On 04/03/16 23:42, Torsten Duwe wrote:
> On Thu, Mar 03, 2016 at 05:52:01PM +0100, Petr Mladek wrote:
> [...]
>> index ec7f8aada697..2d5333c228f1 100644
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -1265,6 +1271,31 @@ ftrace_call:
>>  	ld	r0, LRSAVE(r1)
>>  	mtlr	r0
>>
>> +#ifdef CONFIG_LIVEPATCH
>> +	beq+	4f		/* likely(old_NIP == new_NIP) */
>> +	/*
>> +	 * For a local call, restore this TOC after calling the patch function.
>> +	 * For a global call, it does not matter what we restore here,
>> +	 * since the global caller does its own restore right afterwards,
>> +	 * anyway. Just insert a klp_return_helper frame in any case,
>> +	 * so a patch function can always count on the changed stack offsets.
>> +	 * The patch introduces a frame such that from the patched function
>> +	 * we return back to klp_return helper. For ABI compliance r12,
>> +	 * lr and LRSAVE(r1) contain the address of klp_return_helper.
>> +	 * We loaded ctr with the address of the patched function earlier
>> +	 */
>> +	stdu	r1, -32(r1)	/* open new mini stack frame */
>> +	std	r2, 24(r1)	/* save TOC now, unconditionally. */
>> +	bl	5f
>> +5:	mflr	r12
>> +	addi	r12, r12, (klp_return_helper + 4 - .)@l
>> +	std	r12, LRSAVE(r1)
>> +	mtlr	r12
>> +	mfctr	r12		/* allow for TOC calculation in newfunc */
>> +	bctr
>> +4:
>> +#endif
>> +
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>  	stdu	r1, -112(r1)
>>  .globl ftrace_graph_call
>> @@ -1281,6 +1312,25 @@ _GLOBAL(ftrace_graph_stub)
>>
>>  _GLOBAL(ftrace_stub)
>>  	blr
>> +#ifdef CONFIG_LIVEPATCH
>> +/* Helper function for local calls that are becoming global
>> + * due to live patching.
>> + * We can't simply patch the NOP after the original call,
>> + * because, depending on the consistency model, some kernel
>> + * threads may still have called the original, local function
>> + * *without* saving their TOC in the respective stack frame slot,
>> + * so the decision is made per-thread during function return by
>> + * maybe inserting a klp_return_helper frame or not.
>> +*/
>> +klp_return_helper:
>> +	ld	r2, 24(r1)	/* restore TOC (saved by ftrace_caller) */
>> +	addi	r1, r1, 32	/* destroy mini stack frame */
>> +	ld	r0, LRSAVE(r1)	/* get the real return address */
>> +	mtlr	r0
>> +	blr
>> +#endif
>> +
>> +
>>  #else
>>  _GLOBAL_TOC(_mcount)
>>  	/* Taken from output of objdump from lib64/glibc */
> We need a caveat here, at least in the comments, even better
> in some documentation, that the klp_return_helper shifts the stack layout.
>
> This is relevant for functions with more than 8 fixed integer arguments
> or for any varargs creator. As soon as the patch function is to replace
> an original with arguments on the stack, the extra stack frame needs to
> be accounted for.
>
> Where shall we put this warning?

Good catch! We should just document it in livepatch.c (I suppose).

I wonder if we can reuse the previous stack frame -- the caller into
ftrace_caller. I think our arch.trampoline does a bunch of the work anyway;
klp_return_helper would just need to restore the right set of values.

I hope I am thinking clearly on a Monday morning.

Balbir Singh
[PATCH] powerpc/process: fix altivec SPR not being saved
save_sprs() in process.c contains the following test:

	if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
		t->vrsave = mfspr(SPRN_VRSAVE);

The CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE, so the test
is equivalent to:

	if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
	    cpu_has_feature(CPU_FTR_COHERENT_ICACHE))

On CPUs that do not support both features (e.g. the G5) this results in
vrsave not being saved between context switches. The vector register
save/restore code doesn't use VRSAVE to determine which registers to
save/restore, but the value of VRSAVE is used to determine if altivec is
being used in several code paths.

Signed-off-by: Oliver O'Halloran
---
 arch/powerpc/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8224852..5a4d4d1 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -855,7 +855,7 @@ void restore_tm_state(struct pt_regs *regs)
 static inline void save_sprs(struct thread_struct *t)
 {
 #ifdef CONFIG_ALTIVEC
-	if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
+	if (cpu_has_feature(CPU_FTR_ALTIVEC))
 		t->vrsave = mfspr(SPRN_VRSAVE);
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
--
2.5.0
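To see why the nested call collapses into a test of mask 0x1, here is a simplified user-space model of the check. It is a sketch under stated assumptions, not the kernel's implementation: the demo_* names are hypothetical, the real cpu_has_feature() is more involved, and only the 0x1 value for the coherent-icache feature comes from the commit message (the AltiVec mask below is made up for illustration).

```c
#include <stdbool.h>

/*
 * Illustrative mask values. Only 0x1 for the coherent-icache feature is
 * taken from the commit message; the AltiVec value is an assumption.
 */
#define DEMO_FTR_COHERENT_ICACHE 0x1UL
#define DEMO_FTR_ALTIVEC         0x8UL

static unsigned long demo_cpu_features;

/* Simplified stand-in for cpu_has_feature(): true if any mask bit is set. */
static bool demo_has_feature(unsigned long mask)
{
	return (demo_cpu_features & mask) != 0;
}

/*
 * Buggy form: the inner call yields 0 or 1, and that value is then used
 * as a feature mask, so the outer call really tests mask 0x1 (or 0x0).
 */
bool demo_vrsave_check_buggy(unsigned long features)
{
	demo_cpu_features = features;
	return demo_has_feature(demo_has_feature(DEMO_FTR_ALTIVEC));
}

/* Fixed form: test the AltiVec feature bit directly. */
bool demo_vrsave_check_fixed(unsigned long features)
{
	demo_cpu_features = features;
	return demo_has_feature(DEMO_FTR_ALTIVEC);
}
```

For a feature set with AltiVec but without the coherent-icache bit, the buggy check returns false while the fixed check returns true, which matches the behaviour the commit message describes.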