RE: [PATCH V3] powerpc/85xx: workaround for chips with MSI hardware errata
Hi Scott, I sent V3 of this patch a few days ago. Comments are welcome. Thanks. -Hongtao > -Original Message- > From: Jia Hongtao [mailto:hongtao@freescale.com] > Sent: Thursday, February 26, 2015 3:23 PM > To: linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; asolo...@kb.kras.ru > Cc: ga...@kernel.crashing.org; Li Yang-Leo-R58472; Jia Hongtao-B38951 > Subject: [PATCH V3] powerpc/85xx: workaround for chips with MSI hardware > errata > > From: Hongtao Jia > > The MPIC version 2.0 has a MSI errata (errata PIC1 of mpc8544), It causes > that neither MSI nor MSI-X can work fine. This is a workaround to allow > MSI-X to function properly. > > Signed-off-by: Liu Shuo > Signed-off-by: Li Yang > Signed-off-by: Jia Hongtao > --- > Changes for V3: > * remove mpic_has_erratum_pic1() function. Test erratum directly. > * rebase on latest kernel update. > > Changes for V2: > * change the name of function mpic_has_errata() to > mpic_has_erratum_pic1(). > * move MSI_HW_ERRATA_ENDIAN define to fsl_msi.h with all other defines. > > arch/powerpc/sysdev/fsl_msi.c | 29 ++--- > arch/powerpc/sysdev/fsl_msi.h | 2 ++ > 2 files changed, 28 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/sysdev/fsl_msi.c > b/arch/powerpc/sysdev/fsl_msi.c > index 4bbb4b8..f086c6f 100644 > --- a/arch/powerpc/sysdev/fsl_msi.c > +++ b/arch/powerpc/sysdev/fsl_msi.c > @@ -162,7 +162,17 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, > int hwirq, > msg->address_lo = lower_32_bits(address); > msg->address_hi = upper_32_bits(address); > > - msg->data = hwirq; > + /* > + * MPIC version 2.0 has erratum PIC1. It causes > + * that neither MSI nor MSI-X can work fine. > + * This is a workaround to allow MSI-X to function > + * properly. It only works for MSI-X, we prevent > + * MSI on buggy chips in fsl_setup_msi_irqs(). 
> + */ > + if (msi_data->feature & MSI_HW_ERRATA_ENDIAN) > + msg->data = __swab32(hwirq); > + else > + msg->data = hwirq; > > pr_debug("%s: allocated srs: %d, ibs: %d\n", __func__, >(hwirq >> msi_data->srs_shift) & MSI_SRS_MASK, > @@ -180,8 +190,16 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, > int nvec, int type) > struct msi_msg msg; > struct fsl_msi *msi_data; > > - if (type == PCI_CAP_ID_MSIX) > - pr_debug("fslmsi: MSI-X untested, trying anyway.\n"); > + if (type == PCI_CAP_ID_MSI) { > + /* > + * MPIC version 2.0 has erratum PIC1. For now MSI > + * could not work. So check to prevent MSI from > + * being used on the board with this erratum. > + */ > + list_for_each_entry(msi_data, &msi_head, list) > + if (msi_data->feature & MSI_HW_ERRATA_ENDIAN) > + return -EINVAL; > + } > > /* >* If the PCI node has an fsl,msi property, then we need to use it > @@ -446,6 +464,11 @@ static int fsl_of_msi_probe(struct platform_device > *dev) > > msi->feature = features->fsl_pic_ip; > > + /* For erratum PIC1 on MPIC version 2.0*/ > + if ((features->fsl_pic_ip & FSL_PIC_IP_MASK) == FSL_PIC_IP_MPIC > + && (fsl_mpic_primary_get_version() == 0x0200)) > + msi->feature |= MSI_HW_ERRATA_ENDIAN; > + > /* >* Remember the phandle, so that we can match with any PCI nodes >* that have an "fsl,msi" property. > diff --git a/arch/powerpc/sysdev/fsl_msi.h > b/arch/powerpc/sysdev/fsl_msi.h > index 420cfcb..a67359d 100644 > --- a/arch/powerpc/sysdev/fsl_msi.h > +++ b/arch/powerpc/sysdev/fsl_msi.h > @@ -27,6 +27,8 @@ > #define FSL_PIC_IP_IPIC 0x0002 > #define FSL_PIC_IP_VMPIC 0x0003 > > +#define MSI_HW_ERRATA_ENDIAN 0x0010 > + > struct fsl_msi_cascade_data; > > struct fsl_msi { > -- > 2.1.0.27.g96db324 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
On Mon, 2015-03-02 at 09:42 -0600, Emil Medve wrote:
> On 03/02/2015 09:32 AM, Emil Medve wrote:
> > From: Igal Liberman
> >
> > Describe the PHY topology for all configurations supported by each board
> >
> > Based on prior work by Andy Fleming
> >
> > Change-Id: I4fbcc5df9ee7c4f784afae9dab5d1e78cdc24f0f
>
> Bah, I'll remove this...

Something like:

  $ cat .git/hooks/commit-msg
  #!/bin/sh

  grep "Change-Id:" $1 > /dev/null
  if [ $? -eq 0 ]; then
          echo "***: Error commit message includes Change-Id" >&2
          exit 1
  fi

Ought to do it.

cheers
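For reference, the same check can be written with grep -q instead of a /dev/null redirect. check_msg here is an illustrative wrapper (a real commit-msg hook would just run the grep on "$1" and exit non-zero on a match), and anchoring the pattern with ^ is a choice — the original hook matches "Change-Id:" anywhere in a line.

```shell
#!/bin/sh
# Reject a commit message that carries a Gerrit "Change-Id:" trailer.
# check_msg is an illustrative helper, not the original hook.
check_msg() {
	if grep -q '^Change-Id:' "$1"; then
		echo "***: Error commit message includes Change-Id" >&2
		return 1
	fi
	return 0
}
```

Installed as .git/hooks/commit-msg and made executable, git invokes the hook with the path to the message file as $1 and aborts the commit when it exits non-zero.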
[PATCH V13 21/21] powerpc/pci: Add PCI resource alignment documentation
In order to enable SRIOV on PowerNV platform, the PF's IOV BAR needs to be adjusted: 1. size expanded 2. aligned to M64BT size This patch documents this change on the reason and how. [bhelgaas: reformat, clarify, expand] Signed-off-by: Wei Yang --- .../powerpc/pci_iov_resource_on_powernv.txt| 305 1 file changed, 305 insertions(+) create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt b/Documentation/powerpc/pci_iov_resource_on_powernv.txt new file mode 100644 index 000..4e9bb28 --- /dev/null +++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt @@ -0,0 +1,305 @@ +Wei Yang +Benjamin Herrenschmidt +26 Aug 2014 + +This document describes the requirement from hardware for PCI MMIO resource +sizing and assignment on PowerNV platform and how generic PCI code handles +this requirement. The first two sections describe the concepts of +Partitionable Endpoints and the implementation on P8 (IODA2). + +1. Introduction to Partitionable Endpoints + +A Partitionable Endpoint (PE) is a way to group the various resources +associated with a device or a set of device to provide isolation between +partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism +to freeze a device that is causing errors in order to limit the possibility +of propagation of bad data. + +There is thus, in HW, a table of PE states that contains a pair of "frozen" +state bits (one for MMIO and one for DMA, they get set together but can be +cleared independently) for each PE. + +When a PE is frozen, all stores in any direction are dropped and all loads +return all 1's value. MSIs are also blocked. There's a bit more state +that captures things like the details of the error that caused the freeze +etc., but that's not critical. + +The interesting part is how the various PCIe transactions (MMIO, DMA, ...) +are matched to their corresponding PEs. 
+ +The following section provides a rough description of what we have on P8 +(IODA2). Keep in mind that this is all per PHB (PCI host bridge). Each +PHB is a completely separate HW entity that replicates the entire logic, +so has its own set of PEs, etc. + +2. Implementation of Partitionable Endpoints on P8 (IODA2) + +P8 supports up to 256 Partitionable Endpoints per PHB. + + * Inbound + +For DMA, MSIs and inbound PCIe error messages, we have a table (in +memory but accessed in HW by the chip) that provides a direct +correspondence between a PCIe RID (bus/dev/fn) with a PE number. +We call this the RTT. + +- For DMA we then provide an entire address space for each PE that can + contains two "windows", depending on the value of PCI address bit 59. + Each window can be configured to be remapped via a "TCE table" (IOMMU + translation table), which has various configurable characteristics + not described here. + +- For MSIs, we have two windows in the address space (one at the top of + the 32-bit space and one much higher) which, via a combination of the + address and MSI value, will result in one of the 2048 interrupts per + bridge being triggered. There's a PE# in the interrupt controller + descriptor table as well which is compared with the PE# obtained from + the RTT to "authorize" the device to emit that specific interrupt. + +- Error messages just use the RTT. + + * Outbound. That's where the tricky part is. + +Like other PCI host bridges, the Power8 IODA2 PHB supports "windows" +from the CPU address space to the PCI address space. There is one M32 +window and sixteen M64 windows. They have different characteristics. +First what they have in common: they forward a configurable portion of +the CPU address space to the PCIe bus and must be naturally aligned +power of two in size. The rest is different: + +- The M32 window: + + * Is limited to 4GB in size. + + * Drops the top bits of the address (above the size) and replaces + them with a configurable value. 
This is typically used to generate + 32-bit PCIe accesses. We configure that window at boot from FW and + don't touch it from Linux; it's usually set to forward a 2GB + portion of address space from the CPU to PCIe + 0x8000_..0x_. (Note: The top 64KB are actually + reserved for MSIs but this is not a problem at this point; we just + need to ensure Linux doesn't assign anything there, the M32 logic + ignores that however and will forward in that space if we try). + + * It is divided into 256 segments of equal size. A table in the chip + maps each segment to a PE#. That allows portions of the MMIO space + to be assigned to PEs on a segment granularity. For a 2GB window, + the segment granularity is 2GB/256 = 8MB. + +Now, this is the "main" window we use in Linux today (excluding +SR-IOV). We
[PATCH V13 20/21] powerpc/pci: Remove unused struct pci_dn.pcidev field
In struct pci_dn, the pcidev field is assigned but not used, so remove it.

Signed-off-by: Wei Yang
Acked-by: Gavin Shan
---
 arch/powerpc/include/asm/pci-bridge.h     |    1 -
 arch/powerpc/platforms/powernv/pci-ioda.c |    1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 958ea86..109efba 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -168,7 +168,6 @@ struct pci_dn {
 	int	pci_ext_config_space;	/* for pci devices */
-	struct pci_dev *pcidev;		/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
 	struct eeh_dev *edev;		/* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 47c46b7..47780c3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1024,7 +1024,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 				pci_name(dev));
 			continue;
 		}
-		pdn->pcidev = dev;
 		pdn->pe_number = pe->pe_number;
 		pe->dma_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-- 
1.7.9.5
[PATCH V13 19/21] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3
When IOV BAR is big, each is covered by 4 M64 windows. This leads to several VF PE sits in one PE in terms of M64. Group VF PEs according to the M64 allocation. [bhelgaas: use dev_printk() when possible] Signed-off-by: Wei Yang --- arch/powerpc/include/asm/pci-bridge.h |2 +- arch/powerpc/platforms/powernv/pci-ioda.c | 197 ++--- 2 files changed, 154 insertions(+), 45 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index d824bb1..958ea86 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -182,7 +182,7 @@ struct pci_dn { #define M64_PER_IOV 4 int m64_per_iov; #define IODA_INVALID_M64(-1) - int m64_wins[PCI_SRIOV_NUM_BARS]; + int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; #endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index b1e936e..47c46b7 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1152,26 +1152,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev) struct pci_controller *hose; struct pnv_phb*phb; struct pci_dn *pdn; - inti; + inti, j; bus = pdev->bus; hose = pci_bus_to_host(bus); phb = hose->private_data; pdn = pci_get_pdn(pdev); - for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { - if (pdn->m64_wins[i] == IODA_INVALID_M64) - continue; - opal_pci_phb_mmio_enable(phb->opal_id, - OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0); - clear_bit(pdn->m64_wins[i], &phb->ioda.m64_bar_alloc); - pdn->m64_wins[i] = IODA_INVALID_M64; - } + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) + for (j = 0; j < M64_PER_IOV; j++) { + if (pdn->m64_wins[i][j] == IODA_INVALID_M64) + continue; + opal_pci_phb_mmio_enable(phb->opal_id, + OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0); + clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc); + pdn->m64_wins[i][j] = IODA_INVALID_M64; + } return 0; } -static int 
pnv_pci_vf_assign_m64(struct pci_dev *pdev) +static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs) { struct pci_bus*bus; struct pci_controller *hose; @@ -1179,17 +1180,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) struct pci_dn *pdn; unsigned int win; struct resource *res; - inti; + inti, j; int64_trc; + inttotal_vfs; + resource_size_tsize, start; + intpe_num; + intvf_groups; + intvf_per_group; bus = pdev->bus; hose = pci_bus_to_host(bus); phb = hose->private_data; pdn = pci_get_pdn(pdev); + total_vfs = pci_sriov_get_totalvfs(pdev); /* Initialize the m64_wins to IODA_INVALID_M64 */ for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) - pdn->m64_wins[i] = IODA_INVALID_M64; + for (j = 0; j < M64_PER_IOV; j++) + pdn->m64_wins[i][j] = IODA_INVALID_M64; + + if (pdn->m64_per_iov == M64_PER_IOV) { + vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV; + vf_per_group = (num_vfs <= M64_PER_IOV)? 1: + roundup_pow_of_two(num_vfs) / pdn->m64_per_iov; + } else { + vf_groups = 1; + vf_per_group = 1; + } for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = &pdev->resource[i + PCI_IOV_RESOURCES]; @@ -1199,35 +1216,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) if (!pnv_pci_is_mem_pref_64(res->flags)) continue; - do { - win = find_next_zero_bit(&phb->ioda.m64_bar_alloc, - phb->ioda.m64_bar_idx + 1, 0); - - if (win >= phb->ioda.m64_bar_idx + 1) - goto m64_failed; - } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc)); + for (j = 0; j < vf_groups; j++) { + do { + win = find_next_zero_bit(&phb->ioda.m64_bar_alloc, + phb->ioda.m64_bar_idx + 1, 0); + + if (win >= phb->ioda.m64_bar_idx + 1) +
[PATCH V13 18/21] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
M64 aperture size is limited on PHB3. When the IOV BAR is too big, this will exceed the limitation and failed to be assigned. Introduce a different mechanism based on the IOV BAR size: - if IOV BAR size is smaller than 64MB, expand to total_pe - if IOV BAR size is bigger than 64MB, roundup power2 [bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS] Signed-off-by: Wei Yang --- arch/powerpc/include/asm/pci-bridge.h |2 ++ arch/powerpc/platforms/powernv/pci-ioda.c | 33 ++--- 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 011340d..d824bb1 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -179,6 +179,8 @@ struct pci_dn { u16 max_vfs;/* number of VFs IOV BAR expended */ u16 vf_pes; /* VF PE# under this PF */ int offset; /* PE# for the first VF PE */ +#define M64_PER_IOV 4 + int m64_per_iov; #define IODA_INVALID_M64(-1) int m64_wins[PCI_SRIOV_NUM_BARS]; #endif /* CONFIG_PCI_IOV */ diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 6f4ae91..b1e936e 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2242,6 +2242,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) int i; resource_size_t size; struct pci_dn *pdn; + int mul, total_vfs; if (!pdev->is_physfn || pdev->is_added) return; @@ -2252,6 +2253,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) pdn = pci_get_pdn(pdev); pdn->max_vfs = 0; + total_vfs = pci_sriov_get_totalvfs(pdev); + pdn->m64_per_iov = 1; + mul = phb->ioda.total_pe; + + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { + res = &pdev->resource[i + PCI_IOV_RESOURCES]; + if (!res->flags || res->parent) + continue; + if (!pnv_pci_is_mem_pref_64(res->flags)) { + dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", +i, res); + continue; + } + + size = 
pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); + + /* bigger than 64M */ + if (size > (1 << 26)) { + dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n", +i, res); + pdn->m64_per_iov = M64_PER_IOV; + mul = roundup_pow_of_two(total_vfs); + break; + } + } + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = &pdev->resource[i + PCI_IOV_RESOURCES]; if (!res->flags || res->parent) @@ -2264,12 +2291,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res); size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); - res->end = res->start + size * phb->ioda.total_pe - 1; + res->end = res->start + size * mul - 1; dev_dbg(&pdev->dev, " %pR\n", res); dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)", - i, res, phb->ioda.total_pe); +i, res, mul); } - pdn->max_vfs = phb->ioda.total_pe; + pdn->max_vfs = mul; } static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus) -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V13 17/21] powerpc/powernv: Shift VF resource with an offset
On PowerNV platform, resource position in M64 implies the PE# the resource belongs to. In some cases, adjustment of a resource is necessary to locate it to a correct position in M64. Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address according to an offset. [bhelgaas: rework loops, rework overlap check, index resource[] conventionally, remove pci_regs.h include, squashed with next patch] Signed-off-by: Wei Yang --- arch/powerpc/include/asm/pci-bridge.h |4 + arch/powerpc/kernel/pci_dn.c | 11 + arch/powerpc/platforms/powernv/pci-ioda.c | 520 - arch/powerpc/platforms/powernv/pci.c | 18 + arch/powerpc/platforms/powernv/pci.h |7 + 5 files changed, 543 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index de11de7..011340d 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -177,6 +177,10 @@ struct pci_dn { int pe_number; #ifdef CONFIG_PCI_IOV u16 max_vfs;/* number of VFs IOV BAR expended */ + u16 vf_pes; /* VF PE# under this PF */ + int offset; /* PE# for the first VF PE */ +#define IODA_INVALID_M64(-1) + int m64_wins[PCI_SRIOV_NUM_BARS]; #endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index f3a1a81..5faf7ca 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -217,6 +217,17 @@ void remove_dev_pci_info(struct pci_dev *pdev) struct pci_dn *pdn, *tmp; int i; + /* +* VF and VF PE are created/released dynamically, so we need to +* bind/unbind them. Otherwise the VF and VF PE would be mismatched +* when re-enabling SR-IOV. 
+*/ + if (pdev->is_virtfn) { + pdn = pci_get_pdn(pdev); + pdn->pe_number = IODA_INVALID_PE; + return; + } + /* Only support IOV PF for now */ if (!pdev->is_physfn) return; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 50155c9..6f4ae91 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -44,6 +44,9 @@ #include "powernv.h" #include "pci.h" +/* 256M DMA window, 4K TCE pages, 8 bytes TCE */ +#define TCE32_TABLE_SIZE ((0x1000 / 0x1000) * 8) + static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, const char *fmt, ...) { @@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, vaf.fmt = fmt; vaf.va = &args; - if (pe->pdev) + if (pe->flags & PNV_IODA_PE_DEV) strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix)); - else + else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) sprintf(pfix, "%04x:%02x ", pci_domain_nr(pe->pbus), pe->pbus->number); +#ifdef CONFIG_PCI_IOV + else if (pe->flags & PNV_IODA_PE_VF) + sprintf(pfix, "%04x:%02x:%2x.%d", + pci_domain_nr(pe->parent_dev->bus), + (pe->rid & 0xff00) >> 8, + PCI_SLOT(pe->rid), PCI_FUNC(pe->rid)); +#endif /* CONFIG_PCI_IOV*/ printk("%spci %s: [PE# %.3d] %pV", level, pfix, pe->pe_number, &vaf); @@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, bool is_add) { struct pnv_ioda_pe *slave; - struct pci_dev *pdev; + struct pci_dev *pdev = NULL; int ret; /* @@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS)) pdev = pe->pbus->self; - else + else if (pe->flags & PNV_IODA_PE_DEV) pdev = pe->pdev->bus->self; +#ifdef CONFIG_PCI_IOV + else if (pe->flags & PNV_IODA_PE_VF) + pdev = pe->parent_dev->bus->self; +#endif /* CONFIG_PCI_IOV */ while (pdev) { struct pci_dn *pdn = pci_get_pdn(pdev); struct pnv_ioda_pe *parent; @@ -649,6 +663,87 @@ static int 
pnv_ioda_set_peltv(struct pnv_phb *phb, return 0; } +#ifdef CONFIG_PCI_IOV +static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe) +{ + struct pci_dev *parent; + uint8_t bcomp, dcomp, fcomp; + int64_t rc; + long rid_end, rid; + + /* Currently, we just deconfigure VF PE. Bus PE will always there.*/ + if (pe->pbus) { + int count; + + dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER; + fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER; +
[PATCH V13 16/21] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
Implement pcibios_iov_resource_alignment() on powernv platform. On PowerNV platform, there are 3 cases for the IOV BAR: 1. initial state, the IOV BAR size is multiple times of VF BAR size 2. after expanded, the IOV BAR size is expanded to meet the M64 segment size 3. sizing stage, the IOV BAR is truncated to 0 pnv_pci_iov_resource_alignment() handle these three cases respectively. [bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size() if no ppc_md machdep_call version] Signed-off-by: Wei Yang --- arch/powerpc/include/asm/machdep.h|1 + arch/powerpc/kernel/pci-common.c | 10 ++ arch/powerpc/platforms/powernv/pci-ioda.c | 20 3 files changed, 31 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 965547c..045448f 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -252,6 +252,7 @@ struct machdep_calls { #ifdef CONFIG_PCI_IOV void (*pcibios_fixup_sriov)(struct pci_bus *bus); + resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno); #endif /* CONFIG_PCI_IOV */ /* Called to shutdown machine specific hardware not already controlled diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 022e9fe..b91eff3 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -130,6 +130,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev) pci_reset_secondary_bus(dev); } +#ifdef CONFIG_PCI_IOV +resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno) +{ + if (ppc_md.pcibios_iov_resource_alignment) + return ppc_md.pcibios_iov_resource_alignment(pdev, resno); + + return pci_iov_resource_size(pdev, resno); +} +#endif /* CONFIG_PCI_IOV */ + static resource_size_t pcibios_io_size(const struct pci_controller *hose) { #ifdef CONFIG_PPC64 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 958c7a3..50155c9 100644 --- 
a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1983,6 +1983,25 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus, return phb->ioda.io_segsize; } +#ifdef CONFIG_PCI_IOV +static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev, + int resno) +{ + struct pci_dn *pdn = pci_get_pdn(pdev); + resource_size_t align, iov_align; + + iov_align = resource_size(&pdev->resource[resno]); + if (iov_align) + return iov_align; + + align = pci_iov_resource_size(pdev, resno); + if (pdn->max_vfs) + return pdn->max_vfs * align; + + return align; +} +#endif /* CONFIG_PCI_IOV */ + /* Prevent enabling devices for which we couldn't properly * assign a PE */ @@ -2185,6 +2204,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus; #ifdef CONFIG_PCI_IOV ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_sriov; + ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment; #endif /* CONFIG_PCI_IOV */ pci_add_flags(PCI_REASSIGN_ALL_RSRC); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V13 15/21] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
On PHB3, PF IOV BAR will be covered by M64 window to have better PE isolation. The total_pe number is usually different from total_VFs, which can lead to a conflict between MMIO space and the PE number. For example, if total_VFs is 128 and total_pe is 256, the second half of M64 window will be part of other PCI device, which may already belong to other PEs. Prevent the conflict by reserving additional space for the PF IOV BAR, which is total_pe number of VF's BAR size. [bhelgaas: make dev_printk() output more consistent, index resource[] conventionally] Signed-off-by: Wei Yang --- arch/powerpc/include/asm/machdep.h|4 ++ arch/powerpc/include/asm/pci-bridge.h |3 ++ arch/powerpc/kernel/pci-common.c |5 +++ arch/powerpc/kernel/pci-hotplug.c |4 ++ arch/powerpc/platforms/powernv/pci-ioda.c | 61 + 5 files changed, 77 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index c8175a3..965547c 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -250,6 +250,10 @@ struct machdep_calls { /* Reset the secondary bus of bridge */ void (*pcibios_reset_secondary_bus)(struct pci_dev *dev); +#ifdef CONFIG_PCI_IOV + void (*pcibios_fixup_sriov)(struct pci_bus *bus); +#endif /* CONFIG_PCI_IOV */ + /* Called to shutdown machine specific hardware not already controlled * by other drivers. 
*/ diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 513f8f2..de11de7 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -175,6 +175,9 @@ struct pci_dn { #define IODA_INVALID_PE(-1) #ifdef CONFIG_PPC_POWERNV int pe_number; +#ifdef CONFIG_PCI_IOV + u16 max_vfs;/* number of VFs IOV BAR expended */ +#endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; struct list_head list; diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 8203101..022e9fe 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -1646,6 +1646,11 @@ void pcibios_scan_phb(struct pci_controller *hose) if (ppc_md.pcibios_fixup_phb) ppc_md.pcibios_fixup_phb(hose); +#ifdef CONFIG_PCI_IOV + if (ppc_md.pcibios_fixup_sriov) + ppc_md.pcibios_fixup_sriov(bus); +#endif /* CONFIG_PCI_IOV */ + /* Configure PCI Express settings */ if (bus && !pci_has_flag(PCI_PROBE_ONLY)) { struct pci_bus *child; diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c index 5b78917..7d238ae 100644 --- a/arch/powerpc/kernel/pci-hotplug.c +++ b/arch/powerpc/kernel/pci-hotplug.c @@ -94,6 +94,10 @@ void pcibios_add_pci_devices(struct pci_bus * bus) */ slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); +#ifdef CONFIG_PCI_IOV + if (ppc_md.pcibios_fixup_sriov) + ppc_md.pcibios_fixup_sriov(bus); +#endif /* CONFIG_PCI_IOV */ pcibios_setup_bus_devices(bus); max = bus->busn_res.start; for (pass = 0; pass < 2; pass++) { diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 1b37066..958c7a3 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1749,6 +1749,64 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { } #endif /* CONFIG_PCI_MSI */ +#ifdef 
CONFIG_PCI_IOV +static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) +{ + struct pci_controller *hose; + struct pnv_phb *phb; + struct resource *res; + int i; + resource_size_t size; + struct pci_dn *pdn; + + if (!pdev->is_physfn || pdev->is_added) + return; + + hose = pci_bus_to_host(pdev->bus); + phb = hose->private_data; + + pdn = pci_get_pdn(pdev); + pdn->max_vfs = 0; + + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { + res = &pdev->resource[i + PCI_IOV_RESOURCES]; + if (!res->flags || res->parent) + continue; + if (!pnv_pci_is_mem_pref_64(res->flags)) { + dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n", +i, res); + continue; + } + + dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res); + size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); + res->end = res->start + size * phb->ioda.total_pe - 1; + dev_dbg(&pdev->dev, " %pR\n", res);
[PATCH V13 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
Current iommu_table of a PE is a static field. This will have a problem when iommu_free_table() is called. Allocate iommu_table dynamically. Signed-off-by: Wei Yang --- arch/powerpc/include/asm/iommu.h |3 +++ arch/powerpc/platforms/powernv/pci-ioda.c | 26 ++ arch/powerpc/platforms/powernv/pci.h |2 +- 3 files changed, 18 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 9cfa370..5574eeb 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -78,6 +78,9 @@ struct iommu_table { struct iommu_group *it_group; #endif void (*set_bypass)(struct iommu_table *tbl, bool enable); +#ifdef CONFIG_PPC_POWERNV + void *data; +#endif }; /* Pure 2^n version of get_order */ diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index df4a295..1b37066 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all) return; } + pe->tce32_table = kzalloc_node(sizeof(struct iommu_table), + GFP_KERNEL, hose->node); + pe->tce32_table->data = pe; + /* Associate it with all child devices */ pnv_ioda_setup_same_PE(bus, pe); @@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev pe = &phb->ioda.pe_array[pdn->pe_number]; WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops); - set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table); + set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table); } static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb, @@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb, } else { dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n"); set_dma_ops(&pdev->dev, &dma_iommu_ops); - set_iommu_table_base(&pdev->dev, &pe->tce32_table); + set_iommu_table_base(&pdev->dev, pe->tce32_table); } *pdev->dev.dma_mask = dma_mask; return 0; 
@@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, list_for_each_entry(dev, &bus->devices, bus_list) { if (add_to_iommu_group) set_iommu_table_base_and_group(&dev->dev, - &pe->tce32_table); + pe->tce32_table); else - set_iommu_table_base(&dev->dev, &pe->tce32_table); + set_iommu_table_base(&dev->dev, pe->tce32_table); if (dev->subordinate) pnv_ioda_setup_bus_dma(pe, dev->subordinate, @@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe, void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl, __be64 *startp, __be64 *endp, bool rm) { - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, - tce32_table); + struct pnv_ioda_pe *pe = tbl->data; struct pnv_phb *phb = pe->phb; if (phb->type == PNV_PHB_IODA1) @@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, } /* Setup linux iommu table */ - tbl = &pe->tce32_table; + tbl = pe->tce32_table; pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs, base << 28, IOMMU_PAGE_SHIFT_4K); @@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable) { - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, - tce32_table); + struct pnv_ioda_pe *pe = tbl->data; uint16_t window_id = (pe->pe_number << 1 ) + 1; int64_t rc; @@ -1312,10 +1314,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb, pe->tce_bypass_base = 1ull << 59; /* Install set_bypass callback for VFIO */ - pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass; + pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass; /* Enable bypass by default */ - pnv_pci_ioda2_set_bypass(&pe->tce32_table, true); + pnv_pci_ioda2_set_bypass(pe->tce32_table, true); } static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, @@ -1363,7 +1365,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, } /* Setup linux iommu table */ - tbl = &pe->tce32_table; + 
tbl = pe->tce32_table; pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
[PATCH V13 13/21] powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
The PCI config accessors previously relied on device_node. Unfortunately, VFs don't have a corresponding device_node, so change the accessors to use pci_dn instead. [bhelgaas: changelog] Signed-off-by: Gavin Shan --- arch/powerpc/platforms/powernv/eeh-powernv.c | 14 +- arch/powerpc/platforms/powernv/pci.c | 69 ++ arch/powerpc/platforms/powernv/pci.h |4 +- 3 files changed, 40 insertions(+), 47 deletions(-) diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c index e261869..7a5021b 100644 --- a/arch/powerpc/platforms/powernv/eeh-powernv.c +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -430,21 +430,31 @@ static inline bool powernv_eeh_cfg_blocked(struct device_node *dn) static int powernv_eeh_read_config(struct device_node *dn, int where, int size, u32 *val) { + struct pci_dn *pdn = PCI_DN(dn); + + if (!pdn) + return PCIBIOS_DEVICE_NOT_FOUND; + if (powernv_eeh_cfg_blocked(dn)) { *val = 0x; return PCIBIOS_SET_FAILED; } - return pnv_pci_cfg_read(dn, where, size, val); + return pnv_pci_cfg_read(pdn, where, size, val); } static int powernv_eeh_write_config(struct device_node *dn, int where, int size, u32 val) { + struct pci_dn *pdn = PCI_DN(dn); + + if (!pdn) + return PCIBIOS_DEVICE_NOT_FOUND; + if (powernv_eeh_cfg_blocked(dn)) return PCIBIOS_SET_FAILED; - return pnv_pci_cfg_write(dn, where, size, val); + return pnv_pci_cfg_write(pdn, where, size, val); } /** diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index e69142f..6c20d6e 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -366,9 +366,9 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no) spin_unlock_irqrestore(&phb->lock, flags); } -static void pnv_pci_config_check_eeh(struct pnv_phb *phb, -struct device_node *dn) +static void pnv_pci_config_check_eeh(struct pci_dn *pdn) { + struct pnv_phb *phb = pdn->phb->private_data; u8 fstate; __be16 pcierr; int 
pe_no; @@ -379,7 +379,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, * setup that yet. So all ER errors should be mapped to * reserved PE. */ - pe_no = PCI_DN(dn)->pe_number; + pe_no = pdn->pe_number; if (pe_no == IODA_INVALID_PE) { if (phb->type == PNV_PHB_P5IOC2) pe_no = 0; @@ -407,8 +407,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, } cfg_dbg(" -> EEH check, bdfn=%04x PE#%d fstate=%x\n", - (PCI_DN(dn)->busno << 8) | (PCI_DN(dn)->devfn), - pe_no, fstate); + (pdn->busno << 8) | (pdn->devfn), pe_no, fstate); /* Clear the frozen state if applicable */ if (fstate == OPAL_EEH_STOPPED_MMIO_FREEZE || @@ -425,10 +424,9 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, } } -int pnv_pci_cfg_read(struct device_node *dn, +int pnv_pci_cfg_read(struct pci_dn *pdn, int where, int size, u32 *val) { - struct pci_dn *pdn = PCI_DN(dn); struct pnv_phb *phb = pdn->phb->private_data; u32 bdfn = (pdn->busno << 8) | pdn->devfn; s64 rc; @@ -462,10 +460,9 @@ int pnv_pci_cfg_read(struct device_node *dn, return PCIBIOS_SUCCESSFUL; } -int pnv_pci_cfg_write(struct device_node *dn, +int pnv_pci_cfg_write(struct pci_dn *pdn, int where, int size, u32 val) { - struct pci_dn *pdn = PCI_DN(dn); struct pnv_phb *phb = pdn->phb->private_data; u32 bdfn = (pdn->busno << 8) | pdn->devfn; @@ -489,18 +486,17 @@ int pnv_pci_cfg_write(struct device_node *dn, } #if CONFIG_EEH -static bool pnv_pci_cfg_check(struct pci_controller *hose, - struct device_node *dn) +static bool pnv_pci_cfg_check(struct pci_dn *pdn) { struct eeh_dev *edev = NULL; - struct pnv_phb *phb = hose->private_data; + struct pnv_phb *phb = pdn->phb->private_data; /* EEH not enabled ? */ if (!(phb->flags & PNV_PHB_FLAG_EEH)) return true; /* PE reset or device removed ? 
*/ - edev = of_node_to_eeh_dev(dn); + edev = pdn->edev; if (edev) { if (edev->pe && (edev->pe->state & EEH_PE_CFG_BLOCKED)) @@ -513,8 +509,7 @@ static bool pnv_pci_cfg_check(struct pci_controller *hose, return true; } #else -static inline pnv_pci_cfg_check(struct pci_controller *hose, - struct device_node *dn) +static inline pnv_pci_cfg_check(struct pci_dn *pdn)
[PATCH V13 12/21] powerpc/pci: Refactor pci_dn
From: Gavin Shan pci_dn is the extension of PCI device node and is created from device node. Unfortunately, VFs are enabled dynamically by PF's driver and they don't have corresponding device nodes, and pci_dn. Refactor pci_dn to support VFs: * pci_dn is organized as a hierarchy tree. VF's pci_dn is put to the child list of pci_dn of PF's bridge. pci_dn of other device put to the child list of pci_dn of its upstream bridge. * VF's pci_dn is expected to be created dynamically when PF enabling VFs. VF's pci_dn will be destroyed when PF disabling VFs. pci_dn of other device is still created from device node as before. * For one particular PCI device (VF or not), its pci_dn can be found from pdev->dev.archdata.firmware_data, PCI_DN(devnode), or parent's list. The fast path (fetching pci_dn through PCI device instance) is populated during early fixup time. [bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()] Signed-off-by: Gavin Shan --- arch/powerpc/include/asm/device.h |3 + arch/powerpc/include/asm/pci-bridge.h | 14 +- arch/powerpc/kernel/pci_dn.c | 245 - arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++ 4 files changed, 272 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h index 38faede..29992cd 100644 --- a/arch/powerpc/include/asm/device.h +++ b/arch/powerpc/include/asm/device.h @@ -34,6 +34,9 @@ struct dev_archdata { #ifdef CONFIG_SWIOTLB dma_addr_t max_direct_dma_addr; #endif +#ifdef CONFIG_PPC64 + void*firmware_data; +#endif #ifdef CONFIG_EEH struct eeh_dev *edev; #endif diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 546d036..513f8f2 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -89,6 +89,7 @@ struct pci_controller { #ifdef CONFIG_PPC64 unsigned long buid; + void *firmware_data; #endif /* CONFIG_PPC64 */ void *private_data; @@ -154,9 +155,13 @@ static inline int 
isa_vaddr_is_ioport(void __iomem *address) struct iommu_table; struct pci_dn { + int flags; +#define PCI_DN_FLAG_IOV_VF 0x01 + int busno; /* pci bus number */ int devfn; /* pci device and function number */ + struct pci_dn *parent; struct pci_controller *phb;/* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ struct device_node *node; /* back-pointer to the device_node */ @@ -171,14 +176,19 @@ struct pci_dn { #ifdef CONFIG_PPC_POWERNV int pe_number; #endif + struct list_head child_list; + struct list_head list; }; /* Get the pointer to a device_node's pci_dn */ #define PCI_DN(dn) ((struct pci_dn *) (dn)->data) +extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus, + int devfn); extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev); - -extern void * update_dn_pci_info(struct device_node *dn, void *data); +extern struct pci_dn *add_dev_pci_info(struct pci_dev *pdev); +extern void remove_dev_pci_info(struct pci_dev *pdev); +extern void *update_dn_pci_info(struct device_node *dn, void *data); static inline int pci_device_from_OF_node(struct device_node *np, u8 *bus, u8 *devfn) diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index 83df307..f3a1a81 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -32,12 +32,223 @@ #include #include +/* + * The function is used to find the firmware data of one + * specific PCI device, which is attached to the indicated + * PCI bus. For VFs, their firmware data is linked to that + * one of PF's bridge. For other devices, their firmware + * data is linked to that of their bridge. + */ +static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus) +{ + struct pci_bus *pbus; + struct device_node *dn; + struct pci_dn *pdn; + + /* +* We probably have virtual bus which doesn't +* have associated bridge. 
+*/ + pbus = bus; + while (pbus) { + if (pci_is_root_bus(pbus) || pbus->self) + break; + + pbus = pbus->parent; + } + + /* +* Except virtual bus, all PCI buses should +* have device nodes. +*/ + dn = pci_bus_to_OF_node(pbus); + pdn = dn ? PCI_DN(dn) : NULL; + + return pdn; +} + +struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus, + int devfn) +{ + struct device_node *dn = NULL; + struct pci_dn *parent, *pdn; + struct pci_dev *pdev = NULL; + + /* Fast path: fetch
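The bridge-lookup walk in `pci_bus_to_pdn()` above can be modeled in a few lines of plain C. This is a userspace sketch with simplified stand-in types, not the kernel's structures; it only demonstrates the climb past bridgeless "virtual" buses up to the nearest bus that has an associated `pci_dn`:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's pci_dn / pci_bus. */
struct pci_dn { int id; };

struct pci_bus {
    struct pci_bus *parent;    /* upstream bus, NULL for the root */
    int is_root;
    struct pci_dn *bridge_pdn; /* pci_dn of the ->self bridge, if any */
};

/* Walk up past "virtual" buses (no bridge of their own) until we reach
 * the root bus or a bus with a bridge, then return that pci_dn. */
static struct pci_dn *bus_to_pdn(struct pci_bus *bus)
{
    struct pci_bus *pbus = bus;

    while (pbus) {
        if (pbus->is_root || pbus->bridge_pdn)
            break;
        pbus = pbus->parent;
    }
    return pbus ? pbus->bridge_pdn : NULL;
}
```

A virtual bus hanging off the root resolves to the root's `pci_dn`, which is exactly what lets VFs (which have no device node) find firmware data through their PF's bridge.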
[PATCH V13 11/21] powerpc/pci: Don't unset PCI resources for VFs
Flag PCI_REASSIGN_ALL_RSRC is used to ignore resources information setup by firmware, so that kernel would re-assign all resources of pci devices. On powerpc arch, this happens in a header fixup function pcibios_fixup_resources(), which will clean up the resources if this flag is set. This works fine for PFs, since after clean up, kernel will re-assign the resources in pcibios_resource_survey(). Below is a simple call flow on how it works: pcibios_init pcibios_scan_phb pci_scan_child_bus ... pci_device_add pci_fixup_device(pci_fixup_header) pcibios_fixup_resources # header fixup for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) dev->resource[i].start = 0 pcibios_resource_survey # re-assign pcibios_allocate_resources However, the VF resources won't be re-assigned, since the VF resources are completely determined by the PF resources, and the PF resources have already been reassigned. This means we need to leave VF's resources un-cleared in pcibios_fixup_resources(). In this patch, we skip the resource unset process in pcibios_fixup_resources(), if the pci_dev is a VF. Signed-off-by: Wei Yang --- arch/powerpc/kernel/pci-common.c |4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 2a525c9..8203101 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev) pci_name(dev)); return; } + + if (dev->is_virtfn) + return; + for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { struct resource *res = dev->resource + i; struct pci_bus_region reg; -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
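The "VF resources are completely determined by the PF resources" point can be made concrete with a small userspace sketch (invented values, not kernel code): each VF's BAR window is just an offset into the PF's IOV BAR, so clearing and reassigning it independently would be meaningless.

```c
#include <assert.h>
#include <stdint.h>

struct res { uint64_t start, end; };

/* A VF's BAR n window is derived from the PF's IOV BAR n:
 * the n-th VF occupies the vf_id-th slot of size vf_bar_size
 * (the size of ONE VF's BAR, not the whole IOV BAR). */
static struct res vf_bar(uint64_t pf_iov_start, uint64_t vf_bar_size, int vf_id)
{
    struct res r;

    r.start = pf_iov_start + vf_bar_size * vf_id;
    r.end   = r.start + vf_bar_size - 1;
    return r;
}
```

Since the PF's IOV BAR has already been reassigned by the time VFs appear, skipping the unset for `dev->is_virtfn` keeps these derived windows intact.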
[PATCH V13 10/21] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning
When sizing and assigning resources, we divide the resources into two lists: the requested list and the additional list. We don't consider the alignment of additional VF(n) BAR space. This is reasonable because the alignment required for the VF(n) BAR space is the size of an individual VF BAR, not the size of the space for *all* VFs. But some platforms, e.g., PowerNV, require additional alignment. Consider the additional IOV BAR alignment when sizing and assigning resources. When there is not enough system MMIO space, the PF's IOV BAR alignment will not contribute to the bridge. When there is enough system MMIO space, the additional alignment will contribute to the bridge. Also, take advantage of pci_dev_resource::min_align to store this additional alignment. [bhelgaas: changelog, printk cast] Signed-off-by: Wei Yang --- drivers/pci/setup-bus.c | 83 +++ 1 file changed, 70 insertions(+), 13 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index e3e17f3..affbcea 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head, } } -static resource_size_t get_res_add_size(struct list_head *head, - struct resource *res) +static struct pci_dev_resource *res_to_dev_res(struct list_head *head, + struct resource *res) { struct pci_dev_resource *dev_res; @@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head, int idx = res - &dev_res->dev->resource[0]; dev_printk(KERN_DEBUG, &dev_res->dev->dev, -"res[%d]=%pR get_res_add_size add_size %llx\n", +"res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n", idx, dev_res->res, -(unsigned long long)dev_res->add_size); +(unsigned long long)dev_res->add_size, +(unsigned long long)dev_res->min_align); - return dev_res->add_size; + return dev_res; } } - return 0; + return NULL; +} + +static resource_size_t get_res_add_size(struct list_head *head, + struct resource *res) +{ + struct pci_dev_resource 
*dev_res; + + dev_res = res_to_dev_res(head, res); + return dev_res ? dev_res->add_size : 0; +} + +static resource_size_t get_res_add_align(struct list_head *head, +struct resource *res) +{ + struct pci_dev_resource *dev_res; + + dev_res = res_to_dev_res(head, res); + return dev_res ? dev_res->min_align : 0; } + /* Sort resources by alignment */ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head) { @@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head, LIST_HEAD(save_head); LIST_HEAD(local_fail_head); struct pci_dev_resource *save_res; - struct pci_dev_resource *dev_res, *tmp_res; + struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; unsigned long fail_type; + resource_size_t add_align, align; /* Check if optional add_size is there */ if (!realloc_head || list_empty(realloc_head)) @@ -384,10 +405,38 @@ static void __assign_resources_sorted(struct list_head *head, } /* Update res in head list with add_size in realloc_head list */ - list_for_each_entry(dev_res, head, list) + list_for_each_entry_safe(dev_res, tmp_res, head, list) { dev_res->res->end += get_res_add_size(realloc_head, dev_res->res); + /* +* There are two kinds of additional resources in the list: +* 1. bridge resource -- IORESOURCE_STARTALIGN +* 2. SR-IOV resource -- IORESOURCE_SIZEALIGN +* Here just fix the additional alignment for bridge +*/ + if (!(dev_res->res->flags & IORESOURCE_STARTALIGN)) + continue; + + add_align = get_res_add_align(realloc_head, dev_res->res); + + /* Reorder the list by their alignment */ + if (add_align > dev_res->res->start) { + dev_res->res->start = add_align; + dev_res->res->end = add_align + + resource_size(dev_res->res); + + list_for_each_entry(dev_res2, head, list) { + align = pci_resource_alignment(dev_res2->dev, + dev_res2->res); + if (
[PATCH V13 09/21] PCI: Add pcibios_iov_resource_alignment() interface
Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV BAR is the size of an individual VF BAR, and the size consumed is the individual VF BAR size times NumVFs. The PowerNV platform has additional alignment requirements to help support its Partitionable Endpoint device isolation feature (see Documentation/powerpc/pci_iov_resource_on_powernv.txt). Add a pcibios_iov_resource_alignment() interface to allow platforms to request additional alignment. [bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(), drop "align" parameter] Signed-off-by: Wei Yang --- drivers/pci/iov.c |8 +++- include/linux/pci.h |1 + 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 64c4692..ee0ebff 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -569,6 +569,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno) 4 * (resno - PCI_IOV_RESOURCES); } +resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev, + int resno) +{ + return pci_iov_resource_size(dev, resno); +} + /** * pci_sriov_resource_alignment - get resource alignment for VF BAR * @dev: the PCI device @@ -581,7 +587,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno) */ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno) { - return pci_iov_resource_size(dev, resno); + return pcibios_iov_resource_alignment(dev, resno); } /** diff --git a/include/linux/pci.h b/include/linux/pci.h index 99ea948..4e1f17d 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1174,6 +1174,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus); void pci_setup_bridge(struct pci_bus *bus); resource_size_t pcibios_window_alignment(struct pci_bus *bus, unsigned long type); +resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno); #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0) #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1) -- 1.7.9.5 ___ Linuxppc-dev mailing list 
Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
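The split between the spec-mandated alignment and a platform's extra requirement can be sketched in userspace C. This is an illustration only, with an invented "segment size" standing in for PowerNV's PE segment constraint; the real hook is the `__weak pcibios_iov_resource_alignment()` above.

```c
#include <assert.h>
#include <stdint.h>

/* Default behavior: the required alignment of a PF's IOV BAR is the
 * size of a single VF BAR (SR-IOV spec r1.1, sec 3.3.14). */
static uint64_t default_iov_align(uint64_t vf_bar_size)
{
    return vf_bar_size;
}

/* A PowerNV-style override may demand more, e.g. rounding up to a
 * PE segment size (seg_size here is a made-up parameter). */
static uint64_t platform_iov_align(uint64_t vf_bar_size, uint64_t seg_size)
{
    return vf_bar_size > seg_size ? vf_bar_size : seg_size;
}
```

With the weak default, platforms that don't care get exactly the old behavior; platforms with device-isolation hardware can return the larger value.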
[PATCH V13 08/21] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()
VFs are dynamically created when a driver enables them. On some platforms, like PowerNV, special resources are necessary to enable VFs. Add platform hooks for enabling and disabling VFs. Signed-off-by: Wei Yang --- drivers/pci/iov.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 5643a10..64c4692 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset) pci_dev_put(dev); } +int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs) +{ + return 0; +} + static int sriov_enable(struct pci_dev *dev, int nr_virtfn) { int rc; @@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) struct pci_sriov *iov = dev->sriov; int bars = 0; int bus; + int retval; if (!nr_virtfn) return 0; @@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) if (nr_virtfn < initial) initial = nr_virtfn; + if ((retval = pcibios_sriov_enable(dev, initial))) { + dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n", + retval); + return retval; + } + for (i = 0; i < initial; i++) { rc = virtfn_add(dev, i, 0); if (rc) @@ -335,6 +347,11 @@ failed: return rc; } +int __weak pcibios_sriov_disable(struct pci_dev *pdev) +{ + return 0; +} + static void sriov_disable(struct pci_dev *dev) { int i; @@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev) for (i = 0; i < iov->num_VFs; i++) virtfn_remove(dev, i, 0); + pcibios_sriov_disable(dev); + iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE); pci_cfg_access_lock(dev); pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
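The control-flow change in `sriov_enable()` is easy to model: the platform hook runs before any VF is created, and a nonzero return aborts enablement entirely. The sketch below is userspace C with invented stand-in functions, not the kernel code; `-12` is used as a canned ENOMEM-like error.

```c
#include <assert.h>

/* Stand-in for a platform's pcibios_sriov_enable(): pretend the
 * platform can only provision resources for up to 4 VFs. */
static int fake_pcibios_sriov_enable(int num_vfs)
{
    return num_vfs > 4 ? -12 /* ENOMEM-like */ : 0;
}

/* Sketch of the new sriov_enable() flow: hook first, VFs after. */
static int sriov_enable_sketch(int nr_virtfn, int *vfs_added)
{
    int rc = fake_pcibios_sriov_enable(nr_virtfn);

    *vfs_added = 0;
    if (rc)
        return rc;              /* no VFs are created on hook failure */

    for (int i = 0; i < nr_virtfn; i++)
        (*vfs_added)++;         /* stands in for virtfn_add() */
    return 0;
}
```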
[PATCH V13 07/21] PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()
On PowerNV, some resource reservation is needed for SR-IOV VFs that don't exist at the bootup stage. To do the match between resources and VFs, the code need to get the VF's BDF in advance. Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and pci_iov_virtfn_devfn() and export them. [bhelgaas: changelog, make "busnr" int] Signed-off-by: Wei Yang --- drivers/pci/iov.c | 28 include/linux/pci.h | 11 +++ 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 2ae921f..5643a10 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -19,16 +19,20 @@ #define VIRTFN_ID_LEN 16 -static inline u8 virtfn_bus(struct pci_dev *dev, int id) +int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id) { + if (!dev->is_physfn) + return -EINVAL; return dev->bus->number + ((dev->devfn + dev->sriov->offset + - dev->sriov->stride * id) >> 8); + dev->sriov->stride * vf_id) >> 8); } -static inline u8 virtfn_devfn(struct pci_dev *dev, int id) +int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id) { + if (!dev->is_physfn) + return -EINVAL; return (dev->devfn + dev->sriov->offset + - dev->sriov->stride * id) & 0xff; + dev->sriov->stride * vf_id) & 0xff; } /* @@ -58,11 +62,11 @@ static inline u8 virtfn_max_buses(struct pci_dev *dev) struct pci_sriov *iov = dev->sriov; int nr_virtfn; u8 max = 0; - u8 busnr; + int busnr; for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) { pci_iov_set_numvfs(dev, nr_virtfn); - busnr = virtfn_bus(dev, nr_virtfn - 1); + busnr = pci_iov_virtfn_bus(dev, nr_virtfn - 1); if (busnr > max) max = busnr; } @@ -116,7 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) struct pci_bus *bus; mutex_lock(&iov->dev->sriov->lock); - bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id)); + bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id)); if (!bus) goto failed; @@ -124,7 +128,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) if (!virtfn) goto failed0; - 
virtfn->devfn = virtfn_devfn(dev, id); + virtfn->devfn = pci_iov_virtfn_devfn(dev, id); virtfn->vendor = dev->vendor; pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device); pci_setup_device(virtfn); @@ -186,8 +190,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset) struct pci_sriov *iov = dev->sriov; virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus), -virtfn_bus(dev, id), -virtfn_devfn(dev, id)); +pci_iov_virtfn_bus(dev, id), +pci_iov_virtfn_devfn(dev, id)); if (!virtfn) return; @@ -226,7 +230,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) struct pci_dev *pdev; struct pci_sriov *iov = dev->sriov; int bars = 0; - u8 bus; + int bus; if (!nr_virtfn) return 0; @@ -263,7 +267,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) iov->offset = offset; iov->stride = stride; - bus = virtfn_bus(dev, nr_virtfn - 1); + bus = pci_iov_virtfn_bus(dev, nr_virtfn - 1); if (bus > dev->bus->busn_res.end) { dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n", nr_virtfn, bus, &dev->bus->busn_res); diff --git a/include/linux/pci.h b/include/linux/pci.h index 1559658..99ea948 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1669,6 +1669,9 @@ int pci_ext_cfg_avail(void); void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar); #ifdef CONFIG_PCI_IOV +int pci_iov_virtfn_bus(struct pci_dev *dev, int id); +int pci_iov_virtfn_devfn(struct pci_dev *dev, int id); + int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); void pci_disable_sriov(struct pci_dev *dev); int pci_num_vf(struct pci_dev *dev); @@ -1677,6 +1680,14 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs); int pci_sriov_get_totalvfs(struct pci_dev *dev); resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno); #else +static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id) +{ + return -ENOSYS; +} +static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id) +{ + return 
-ENOSYS; +} static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn) { return -ENODEV; } static inline void pci_disable_sriov(stru
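The routing-ID arithmetic behind `pci_iov_virtfn_bus()` and `pci_iov_virtfn_devfn()` can be checked in isolation. This userspace sketch mirrors the formulas in the patch (PF devfn + First VF Offset + VF Stride × vf_id, with the high bits selecting extra bus numbers); the example values are invented.

```c
#include <assert.h>

/* Bus number consumed by the vf_id-th VF: high bits of the routing ID
 * carry into buses beyond the PF's own bus. */
static int vf_bus(int pf_bus, int pf_devfn, int offset, int stride, int vf_id)
{
    return pf_bus + ((pf_devfn + offset + stride * vf_id) >> 8);
}

/* Device/function number of the vf_id-th VF on that bus. */
static int vf_devfn(int pf_devfn, int offset, int stride, int vf_id)
{
    return (pf_devfn + offset + stride * vf_id) & 0xff;
}
```

With enough VFs the routing ID overflows 8 bits, which is exactly why the return type was widened to `int` and why callers must check for bus-range overruns.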
[PATCH V13 06/21] PCI: Calculate maximum number of buses required for VFs
An SR-IOV device can change its First VF Offset and VF Stride based on the values of ARI Capable Hierarchy and NumVFs. The number of buses required for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see SR-IOV spec r1.1, sec 2.1.2). Previously pci_iov_bus_range() computed how many buses would be required by TotalVFs, but this was based on a single NumVFs value and may not have been the maximum for all NumVFs configurations. Iterate over all valid NumVFs and calculate the maximum number of bus numbers that could ever be required for VFs of this device. [bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove kernel-doc comment marker] Signed-off-by: Wei Yang --- drivers/pci/iov.c | 31 +++ drivers/pci/pci.h |1 + 2 files changed, 28 insertions(+), 4 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index a8752c2..2ae921f 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -46,6 +46,30 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride); } +/* + * The PF consumes one bus number. NumVFs, First VF Offset, and VF Stride + * determine how many additional bus numbers will be consumed by VFs. + * + * Iterate over all valid NumVFs and calculate the maximum number of bus + * numbers that could ever be required. 
+ */ +static inline u8 virtfn_max_buses(struct pci_dev *dev) +{ + struct pci_sriov *iov = dev->sriov; + int nr_virtfn; + u8 max = 0; + u8 busnr; + + for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) { + pci_iov_set_numvfs(dev, nr_virtfn); + busnr = virtfn_bus(dev, nr_virtfn - 1); + if (busnr > max) + max = busnr; + } + + return max; +} + static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr) { struct pci_bus *child; @@ -427,6 +451,7 @@ found: dev->sriov = iov; dev->is_physfn = 1; + iov->max_VF_buses = virtfn_max_buses(dev); return 0; @@ -556,15 +581,13 @@ void pci_restore_iov_state(struct pci_dev *dev) int pci_iov_bus_range(struct pci_bus *bus) { int max = 0; - u8 busnr; struct pci_dev *dev; list_for_each_entry(dev, &bus->devices, bus_list) { if (!dev->is_physfn) continue; - busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1); - if (busnr > max) - max = busnr; + if (dev->sriov->max_VF_buses > max) + max = dev->sriov->max_VF_buses; } return max ? max - bus->number : 0; diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 5732964..bae593c 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -243,6 +243,7 @@ struct pci_sriov { u16 stride; /* following VF stride */ u32 pgsz; /* page size for BAR alignment */ u8 link;/* Function Dependency Link */ + u8 max_VF_buses;/* max buses consumed by VFs */ u16 driver_max_VFs; /* max num VFs driver supports */ struct pci_dev *dev;/* lowest numbered PF */ struct pci_dev *self; /* this PF */ -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
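Why iterating over every valid NumVFs matters (rather than checking only TotalVFs) can be shown with a tiny model. Here the per-NumVFs offset/stride tables are invented lookup arrays standing in for the hardware's First VF Offset / VF Stride registers, which may change with NumVFs:

```c
#include <assert.h>

/* Model of virtfn_max_buses(): offset/stride can differ for each
 * NumVFs setting, so the maximum bus number may occur at a NumVFs
 * value other than TotalVFs. Tables are indexed by NumVFs (1-based). */
static int max_vf_buses(int pf_bus, int pf_devfn, int total_vfs,
                        const int *offset_by_numvfs,
                        const int *stride_by_numvfs)
{
    int max = pf_bus;

    for (int n = 1; n <= total_vfs; n++) {
        int off = offset_by_numvfs[n];
        int stride = stride_by_numvfs[n];
        /* bus of the last (n-th) VF at this NumVFs setting */
        int bus = pf_bus + ((pf_devfn + off + stride * (n - 1)) >> 8);

        if (bus > max)
            max = bus;
    }
    return max;
}
```

In the test below, NumVFs=1 uses a large offset and needs an extra bus, while NumVFs=2 (the TotalVFs case) does not; checking only TotalVFs would under-reserve.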
[PATCH V13 05/21] PCI: Refresh First VF Offset and VF Stride when updating NumVFs
The First VF Offset and VF Stride fields depend on the NumVFs setting, so refresh the cached fields in struct pci_sriov when updating NumVFs. See the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10. [bhelgaas: changelog, remove kernel-doc comment marker] Signed-off-by: Wei Yang --- drivers/pci/iov.c | 23 +++ 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 27b98c3..a8752c2 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -31,6 +31,21 @@ static inline u8 virtfn_devfn(struct pci_dev *dev, int id) dev->sriov->stride * id) & 0xff; } +/* + * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may + * change when NumVFs changes. + * + * Update iov->offset and iov->stride when NumVFs is written. + */ +static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) +{ + struct pci_sriov *iov = dev->sriov; + + pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn); + pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset); + pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride); +} + static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr) { struct pci_bus *child; @@ -253,7 +268,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) return rc; } - pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn); + pci_iov_set_numvfs(dev, nr_virtfn); iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE; pci_cfg_access_lock(dev); pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); @@ -282,7 +297,7 @@ failed: iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE); pci_cfg_access_lock(dev); pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); - pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0); + pci_iov_set_numvfs(dev, 0); ssleep(1); pci_cfg_access_unlock(dev); @@ -313,7 +328,7 @@ static void sriov_disable(struct pci_dev *dev) sysfs_remove_link(&dev->dev.kobj, "dep_link"); iov->num_VFs = 
0; - pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0); + pci_iov_set_numvfs(dev, 0); } static int sriov_init(struct pci_dev *dev, int pos) @@ -452,7 +467,7 @@ static void sriov_restore_state(struct pci_dev *dev) pci_update_resource(dev, i); pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz); - pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs); + pci_iov_set_numvfs(dev, iov->num_VFs); pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); if (iov->ctrl & PCI_SRIOV_CTRL_VFE) msleep(100); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V13 04/21] PCI: Index IOV resources in the conventional style
From: Bjorn Helgaas Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i". Use that style in iov.c also. No functional change. Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 5bca0e1..27b98c3 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -95,7 +95,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) virtfn->multifunction = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { - res = dev->resource + PCI_IOV_RESOURCES + i; + res = &dev->resource[i + PCI_IOV_RESOURCES]; if (!res->parent) continue; virtfn->resource[i].name = pci_name(virtfn); @@ -212,7 +212,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) nres = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { bars |= (1 << (i + PCI_IOV_RESOURCES)); - res = dev->resource + PCI_IOV_RESOURCES + i; + res = &dev->resource[i + PCI_IOV_RESOURCES]; if (res->parent) nres++; } @@ -373,7 +373,7 @@ found: nres = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { - res = dev->resource + PCI_IOV_RESOURCES + i; + res = &dev->resource[i + PCI_IOV_RESOURCES]; bar64 = __pci_read_base(dev, pci_bar_unknown, res, pos + PCI_SRIOV_BAR + i * 4); if (!res->flags) @@ -417,7 +417,7 @@ found: failed: for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { - res = dev->resource + PCI_IOV_RESOURCES + i; + res = &dev->resource[i + PCI_IOV_RESOURCES]; res->flags = 0; } -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V13 03/21] PCI: Keep individual VF BAR size in struct pci_sriov
Currently we don't store the individual VF BAR size. We calculate it when needed by dividing the PF's IOV resource size (which contains space for *all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability again. Keep the individual VF BAR size in struct pci_sriov.barsz[], add pci_iov_resource_size() to retrieve it, and use that instead of doing the division or reading the SR-IOV capability BAR. [bhelgaas: rename to "barsz[]", simplify barsz[] index computation, remove SR-IOV capability BAR sizing] Signed-off-by: Wei Yang --- drivers/pci/iov.c | 39 --- drivers/pci/pci.h |1 + include/linux/pci.h |3 +++ 3 files changed, 24 insertions(+), 19 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 05f9d97..5bca0e1 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -57,6 +57,14 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus) pci_remove_bus(virtbus); } +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno) +{ + if (!dev->is_physfn) + return 0; + + return dev->sriov->barsz[resno - PCI_IOV_RESOURCES]; +} + static int virtfn_add(struct pci_dev *dev, int id, int reset) { int i; @@ -92,8 +100,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) continue; virtfn->resource[i].name = pci_name(virtfn); virtfn->resource[i].flags = res->flags; - size = resource_size(res); - do_div(size, iov->total_VFs); + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); virtfn->resource[i].start = res->start + size * id; virtfn->resource[i].end = virtfn->resource[i].start + size - 1; rc = request_resource(res, &virtfn->resource[i]); @@ -311,7 +318,7 @@ static void sriov_disable(struct pci_dev *dev) static int sriov_init(struct pci_dev *dev, int pos) { - int i; + int i, bar64; int rc; int nres; u32 pgsz; @@ -360,29 +367,29 @@ found: pgsz &= ~(pgsz - 1); pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz); + iov = kzalloc(sizeof(*iov), GFP_KERNEL); + if (!iov) + return -ENOMEM; + 
nres = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = dev->resource + PCI_IOV_RESOURCES + i; - i += __pci_read_base(dev, pci_bar_unknown, res, -pos + PCI_SRIOV_BAR + i * 4); + bar64 = __pci_read_base(dev, pci_bar_unknown, res, + pos + PCI_SRIOV_BAR + i * 4); if (!res->flags) continue; if (resource_size(res) & (PAGE_SIZE - 1)) { rc = -EIO; goto failed; } + iov->barsz[i] = resource_size(res); res->end = res->start + resource_size(res) * total - 1; dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n", i, res, i, total); + i += bar64; nres++; } - iov = kzalloc(sizeof(*iov), GFP_KERNEL); - if (!iov) { - rc = -ENOMEM; - goto failed; - } - iov->pos = pos; iov->nres = nres; iov->ctrl = ctrl; @@ -414,6 +421,7 @@ failed: res->flags = 0; } + kfree(iov); return rc; } @@ -510,14 +518,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno) */ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno) { - struct resource tmp; - int reg = pci_iov_resource_bar(dev, resno); - - if (!reg) - return 0; - -__pci_read_base(dev, pci_bar_unknown, &tmp, reg); - return resource_alignment(&tmp); + return pci_iov_resource_size(dev, resno); } /** diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 4091f82..5732964 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -247,6 +247,7 @@ struct pci_sriov { struct pci_dev *dev;/* lowest numbered PF */ struct pci_dev *self; /* this PF */ struct mutex lock; /* lock for VF bus */ + resource_size_t barsz[PCI_SRIOV_NUM_BARS]; /* VF BAR size */ }; #ifdef CONFIG_PCI_ATS diff --git a/include/linux/pci.h b/include/linux/pci.h index 211e9da..1559658 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1675,6 +1675,7 @@ int pci_num_vf(struct pci_dev *dev); int pci_vfs_assigned(struct pci_dev *dev); int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs); int pci_sriov_get_totalvfs(struct pci_dev *dev); +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno); #else static 
inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
[PATCH V13 02/21] PCI: Print PF SR-IOV resource that contains all VF(n) BAR space
When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we learn the alignment requirement and amount of space consumed by a single VF. But when VFs are enabled, *each* of the NumVFs consumes that amount of space, so the total size of the PF resource is "VF BAR size * NumVFs". Add a printk of the total space consumed by the VFs corresponding to what we already do for normal non-IOV BARs. No functional change; new message only. [bhelgaas: split out into its own patch] Signed-off-by: Wei Yang --- drivers/pci/iov.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index c4c33ea..05f9d97 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -372,6 +372,8 @@ found: goto failed; } res->end = res->start + resource_size(res) * total - 1; + dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n", +i, res, i, total); nres++; } -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V13 01/21] PCI: Print more info in sriov_enable() error message
From: Bjorn Helgaas If we don't have space for all the bus numbers required to enable VFs, print the largest bus number required and the range available. No functional change; improved error message only. Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c |7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 4b3a4ea..c4c33ea 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -180,6 +180,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) struct pci_dev *pdev; struct pci_sriov *iov = dev->sriov; int bars = 0; + u8 bus; if (!nr_virtfn) return 0; @@ -216,8 +217,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) iov->offset = offset; iov->stride = stride; - if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) { - dev_err(&dev->dev, "SR-IOV: bus number out of range\n"); + bus = virtfn_bus(dev, nr_virtfn - 1); + if (bus > dev->bus->busn_res.end) { + dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n", + nr_virtfn, bus, &dev->bus->busn_res); return -ENOMEM; } -- 1.7.9.5
[PATCH V13 00/21] Enable SRIOV on Power8
This patchset enables SRIOV on POWER8. The general idea is to put each VF into one individual PE and allocate required resources like MMIO/DMA/MSI. The major difficulty comes from the MMIO allocation and adjustment for the PF's IOV BAR. On P8, we use M64BT to cover a PF's IOV BAR, which could make an individual VF sit in its own PE. This gives more flexibility, while at the same time it brings some restrictions on the PF's IOV BAR size and alignment. To achieve this effect, we need to do some hacking on PCI devices' resources. 1. Expand the IOV BAR properly. Done by pnv_pci_ioda_fixup_iov_resources(). 2. Shift the IOV BAR properly. Done by pnv_pci_vf_resource_shift(). 3. IOV BAR alignment is calculated by an arch-dependent function instead of an individual VF BAR size. Done by pnv_pcibios_sriov_resource_alignment(). 4. Take the IOV BAR alignment into consideration in the sizing and assigning. This is achieved by commit: "PCI: Take additional IOV BAR alignment in sizing and assigning" Test Environment: The SRIOV devices tested are Emulex Lancer (10df:e220) and Mellanox ConnectX-3 (15b3:1003) on POWER8. Example of passing a VF through to a guest via vfio: 1. unbind the original driver and bind to the vfio-pci driver echo :06:0d.0 > /sys/bus/pci/devices/:06:0d.0/driver/unbind echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id Note: this should be done for each device in the same iommu_group 2. Start qemu and pass the device through vfio /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \ -M pseries -m 2048 -enable-kvm -nographic \ -drive file=/home/ywywyang/kvm/fc19.img \ -monitor telnet:localhost:5435,server,nowait -boot cd \ -device "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6" Verify this is the exact VF response: 1. ping from a machine in the same subnet (the broadcast domain) 2. run arp -n on this machine 9.115.251.20 ether 00:00:c9:df:ed:bf C eth0 3.
ifconfig in the guest # ifconfig eth1 eth1: flags=4163 mtu 1500 inet 9.115.251.20 netmask 255.255.255.0 broadcast 9.115.251.255 inet6 fe80::200:c9ff:fedf:edbf prefixlen 64 scopeid 0x20 ether 00:00:c9:df:ed:bf txqueuelen 1000 (Ethernet) RX packets 175 bytes 13278 (12.9 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 58 bytes 9276 (9.0 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 4. They have the same MAC address Note: make sure you shutdown other network interfaces in guest. --- v13: * fix error in pcibios_iov_resource_alignment(), use pdev instead of dev * rename vf_num to num_vfs in pcibios_sriov_enable(), pnv_pci_vf_resource_shift(), pnv_pci_sriov_disable(), pnv_pci_sriov_enable(), pnv_pci_ioda2_setup_dma_pe() * add more explanation in commit "powerpc/pci: Don't unset PCI resources for VFs" * fix IOV BAR in hotplug path as well, and don't fixup an already added device * use roundup_pow_of_two() instead of __roundup_pow_of_two() * this is based on v4.0-rc1 v12: * remove "align" parameter from pcibios_iov_resource_alignment() default version returns pci_iov_resource_size() instead of the "align" parameter * in powerpc pcibios_iov_resource_alignment(), return pci_iov_resource_size() if there's no ppc_md function pointer * in pci_sriov_resource_alignment(), don't re-read base, since we saved the required alignment when reading it the first time * remove "vf_num" parameter from add_dev_pci_info() and remove_dev_pci_info(); use pci_sriov_get_totalvfs() instead * use dev_warn() instead of pr_warn() when possible * check to be sure IOV BAR is still in range after shifting, change pnv_pci_vf_resource_shift() from void to int * improve sriov_enable() error message * improve SR-IOV BAR sizing message * index IOV resources in conventional style * include preamble patches (refresh offset/stride when updating numVFs, calculate max buses required * restructure pci_iov_max_bus_range() to return value instead of updating internally, rename to 
virtfn_max_buses() * fix typos & formatting * expand documentation v11: * fix some compile warning v10: * remove weak function pcibios_iov_resource_size() the VF BAR size is stored in pci_sriov structure and retrieved from pci_iov_resource_size() * Use "Reserve additional" instead of "Expand" to be more acurate in the change log * add log message to show the PF's IOV BAR final size * add pcibios_sriov_enable/disable() weak funcion in sriov_enable/disable() for arch setup before enable VFs. Like the arch could fix up the BDF for VFs, since the change of
[git pull] Please pull mpe/linux.git powerpc-4.0-2 tag
Hi Linus, Please pull some powerpc fixes for 4.0: The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539: Linux 4.0-rc1 (2015-02-22 18:21:14 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux.git tags/powerpc-4.0-2 for you to fetch changes up to 4ad04e5987115ece5fa8a0cf1dc72fcd4707e33e: powerpc/iommu: Remove IOMMU device references via bus notifier (2015-03-04 13:19:33 +1100) powerpc fixes for 4.0 - Fix for dynticks. - Fix for smpboot bug. - Fix for IOMMU group refcounting. Michael Ellerman (1): powerpc/smp: Wait until secondaries are active & online Nishanth Aravamudan (1): powerpc/iommu: Remove IOMMU device references via bus notifier Paul Clarke (1): powerpc: Re-enable dynticks arch/powerpc/include/asm/iommu.h | 6 ++ arch/powerpc/include/asm/irq_work.h| 9 + arch/powerpc/kernel/iommu.c| 26 ++ arch/powerpc/kernel/smp.c | 4 ++-- arch/powerpc/platforms/powernv/pci.c | 26 -- arch/powerpc/platforms/pseries/iommu.c | 2 ++ 6 files changed, 45 insertions(+), 28 deletions(-) create mode 100644 arch/powerpc/include/asm/irq_work.h
Re: [PATCH v2 0/5] split ET_DYN ASLR from mmap ASLR
* Kees Cook wrote: > On Mon, Mar 2, 2015 at 11:31 PM, Ingo Molnar wrote: > > > > * Kees Cook wrote: > > > >> To address the "offset2lib" ASLR weakness[1], this separates ET_DYN > >> ASLR from mmap ASLR, as already done on s390. The architectures > >> that are already randomizing mmap (arm, arm64, mips, powerpc, s390, > >> and x86), have their various forms of arch_mmap_rnd() made available > >> via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures, > >> arch_randomize_brk() is collapsed as well. > >> > >> This is an alternative to the solutions in: > >> https://lkml.org/lkml/2015/2/23/442 > > > > Looks good so far: > > > > Reviewed-by: Ingo Molnar > > > > While reviewing this series I also noticed that the following code > > could be factored out from architecture mmap code as well: > > > > - arch_pick_mmap_layout() uses very similar patterns across the > > platforms, with only few variations. Many architectures use > > the same duplicated mmap_is_legacy() helper as well. There's > > usually just trivial differences between mmap_legacy_base() > > approaches as well. > > I was nervous to start refactoring this code, but it's true: most of > it is the same. Well, it still needs to be done if we want to add new randomization features: code fractured over multiple architectures is a recipe for bugs, as this series demonstrates. So it first has to be made more maintainable. > > - arch_mmap_rnd(): the PF_RANDOMIZE checks are needlessly > > exposed to the arch routine - the arch routine should only > > concentrate on arch details, not generic flags like > > PF_RANDOMIZE. > > Yeah, excellent point. I will send a follow-up patch to move this > into binfmt_elf instead. I'd like to avoid removing it in any of the > other patches since each was attempting a single step in the > refactoring. Finegrained patches are ideal! > > In theory the mmap layout could be fully parametrized as well: > > i.e.
no callback functions to architectures by default at all: > just declarations of bits of randomization desired (or, available > address space bits), and perhaps an arch helper to allow 32-bit > vs. 64-bit address space distinctions. > > Yeah, I was considering that too, since each architecture has a > nearly identical arch_mmap_rnd() at this point. Only the size of the > entropy was changing. > > > 'Weird' architectures could provide special routines, but only by > > overriding the default behavior, which should be generic, safe and > > robust. > > Yeah, quite true. Should entropy size be a #define like > ELF_ET_DYN_BASE? Something like ASLR_MMAP_ENTROPY and > ASLR_MMAP_ENTROPY_32? [...] That would work I suspect. > [...] Is there a common function for determining a compat task? That > seemed to be per-arch too. Maybe arch_mmap_entropy()? Compat flags are a bit of a mess, and since they often tie into arch low level assembly code, they are hard to untangle. So maybe as an intermediate step add an is_compat() generic method, and make that obvious and self-defined function a per arch thing? But I'm just handwaving here - I suspect it has to be tried to see all the complications and to determine whether that's the best structure and whether it's a win ... Only one thing is certain: the current code is not compact and reviewable enough, and VM bits hiding in arch/*/mm/mmap.c tends to reduce net attention paid to these details. Thanks, Ingo
Re: [PATCH 4/5] mm: split ET_DYN ASLR from mmap ASLR
On Mon, 2015-03-02 at 16:19 -0800, Kees Cook wrote: > This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips, > powerpc, and x86. The problem is that if there is a leak of ASLR from > the executable (ET_DYN), it means a leak of shared library offset as > well (mmap), and vice versa. Further details and a PoC of this attack > are available here: > http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html > > With this patch, a PIE linked executable (ET_DYN) has its own ASLR region: > > $ ./show_mmaps_pie > 54859ccd6000-54859ccd7000 r-xp ... /tmp/show_mmaps_pie > 54859ced6000-54859ced7000 r--p ... /tmp/show_mmaps_pie > 54859ced7000-54859ced8000 rw-p ... /tmp/show_mmaps_pie Just to be clear, it's the fact that the above vmas are in a different address range to those below that shows the patch is working, right? > 7f75be764000-7f75be91f000 r-xp ... /lib/x86_64-linux-gnu/libc.so.6 > 7f75be91f000-7f75beb1f000 ---p ... /lib/x86_64-linux-gnu/libc.so.6 On powerpc I'm seeing: # /bin/dash # cat /proc/$$/maps 524e-5251 r-xp 08:03 129814 /bin/dash 5251-5252 rw-p 0002 08:03 129814 /bin/dash 10034f2-10034f5 rw-p 00:00 0 [heap] 3fffaeaf-3fffaeca r-xp 08:03 13529 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fffaeca-3fffaecb rw-p 001a 08:03 13529 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fffaecc-3fffaecd rw-p 00:00 0 3fffaecd-3fffaecf r-xp 00:00 0 [vdso] 3fffaecf-3fffaed2 r-xp 08:03 13539 /lib/powerpc64le-linux-gnu/ld-2.19.so 3fffaed2-3fffaed3 rw-p 0002 08:03 13539 /lib/powerpc64le-linux-gnu/ld-2.19.so 3fffc707-3fffc70a rw-p 00:00 0 [stack] Whereas previously the /bin/dash vmas were up at 3fff.. So looks good to me for powerpc. Acked-by: Michael Ellerman cheers
Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset
On Tue, Feb 24, 2015 at 03:00:37AM -0600, Bjorn Helgaas wrote: >On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote: >> From: Wei Yang >> >> On PowerNV platform, resource position in M64 implies the PE# the resource >> belongs to. In some cases, adjustment of a resource is necessary to locate >> it to a correct position in M64. >> >> Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address >> according to an offset. >> >> [bhelgaas: rework loops, rework overlap check, index resource[] >> conventionally, remove pci_regs.h include, squashed with next patch] >> Signed-off-by: Wei Yang >> Signed-off-by: Bjorn Helgaas > >... > >> +#ifdef CONFIG_PCI_IOV >> +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) >> +{ >> +struct pci_dn *pdn = pci_get_pdn(dev); >> +int i; >> +struct resource *res, res2; >> +resource_size_t size; >> +u16 vf_num; >> + >> +if (!dev->is_physfn) >> +return -EINVAL; >> + >> +/* >> + * "offset" is in VFs. The M64 windows are sized so that when they >> + * are segmented, each segment is the same size as the IOV BAR. >> + * Each segment is in a separate PE, and the high order bits of the >> + * address are the PE number. Therefore, each VF's BAR is in a >> + * separate PE, and changing the IOV BAR start address changes the >> + * range of PEs the VFs are in. >> + */ >> +vf_num = pdn->vf_pes; >> +for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { >> +res = &dev->resource[i + PCI_IOV_RESOURCES]; >> +if (!res->flags || !res->parent) >> +continue; >> + >> +if (!pnv_pci_is_mem_pref_64(res->flags)) >> +continue; >> + >> +/* >> + * The actual IOV BAR range is determined by the start address >> + * and the actual size for vf_num VFs BAR. This check is to >> + * make sure that after shifting, the range will not overlap >> + * with another device. 
>> + */ >> +size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); >> +res2.flags = res->flags; >> +res2.start = res->start + (size * offset); >> +res2.end = res2.start + (size * vf_num) - 1; >> + >> +if (res2.end > res->end) { >> +dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR >> (trying to enable %d VFs shifted by %d)\n", >> +i, &res2, res, vf_num, offset); >> +return -EBUSY; >> +} >> +} >> + >> +for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { >> +res = &dev->resource[i + PCI_IOV_RESOURCES]; >> +if (!res->flags || !res->parent) >> +continue; >> + >> +if (!pnv_pci_is_mem_pref_64(res->flags)) >> +continue; >> + >> +size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); >> +res2 = *res; >> +res->start += size * offset; > >I'm still not happy about this fiddling with res->start. > >Increasing res->start means that in principle, the "size * offset" bytes >that we just removed from res are now available for allocation to somebody >else. I don't think we *will* give that space to anything else because of >the alignment restrictions you're enforcing, but "res" now doesn't >correctly describe the real resource map. > >Would you be able to just update the BAR here while leaving the struct >resource alone? In that case, it would look a little funny that lspci >would show a BAR value in the middle of the region in /proc/iomem, but >the /proc/iomem region would be more correct. Bjorn, I did some tests, while the result is not good. What I did is still write the shifted resource address to the device by pci_update_resource(), but I revert the res->start to the original one. If this step is not correct, please let me know. This can't work since after we revert the res->start, those VFs will be given resources from res->start instead of (res->start + offset * size). This is not what we expect. I have rebased/clean/change the code according to your comments based on this patch set. Will send it out v13 soon. 
> >> + >> +dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d >> VFs shifted by %d)\n", >> + i, &res2, res, vf_num, offset); >> +pci_update_resource(dev, i + PCI_IOV_RESOURCES); >> +} >> +pdn->max_vfs -= offset; >> +return 0; >> +} >> +#endif /* CONFIG_PCI_IOV */ -- Richard Yang Help you, Help me
[PATCH v3 06/10] s390: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of mmap_rnd() to be used similarly to arm and x86, and extracts the checking of PF_RANDOMIZE. Signed-off-by: Kees Cook --- arch/s390/mm/mmap.c | 34 +++--- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c index 179a2c20b01f..db57078075c5 100644 --- a/arch/s390/mm/mmap.c +++ b/arch/s390/mm/mmap.c @@ -62,20 +62,18 @@ static inline int mmap_is_legacy(void) static unsigned long mmap_rnd(void) { - if (!(current->flags & PF_RANDOMIZE)) - return 0; if (is_32bit_task()) return (get_random_int() & 0x7ff) << PAGE_SHIFT; else return (get_random_int() & mmap_rnd_mask) << PAGE_SHIFT; } -static unsigned long mmap_base_legacy(void) +static unsigned long mmap_base_legacy(unsigned long rnd) { - return TASK_UNMAPPED_BASE + mmap_rnd(); + return TASK_UNMAPPED_BASE + rnd; } -static inline unsigned long mmap_base(void) +static inline unsigned long mmap_base(unsigned long rnd) { unsigned long gap = rlimit(RLIMIT_STACK); @@ -84,7 +82,7 @@ static inline unsigned long mmap_base(void) else if (gap > MAX_GAP) gap = MAX_GAP; gap &= PAGE_MASK; - return STACK_TOP - stack_maxrandom_size() - mmap_rnd() - gap; + return STACK_TOP - stack_maxrandom_size() - rnd - gap; } unsigned long @@ -187,7 +185,11 @@ unsigned long randomize_et_dyn(void) if (!is_32bit_task()) /* Align to 4GB */ base &= ~((1UL << 32) - 1); - return base + mmap_rnd(); + + if (current->flags & PF_RANDOMIZE) + base += mmap_rnd(); + + return base; } #ifndef CONFIG_64BIT @@ -198,15 +200,20 @@ unsigned long randomize_et_dyn(void) */ void arch_pick_mmap_layout(struct mm_struct *mm) { + unsigned long random_factor = 0UL; + + if (current->flags & PF_RANDOMIZE) + random_factor = mmap_rnd(); + /* * Fall back to the standard layout if the personality * bit is set, or if the expected stack growth is unlimited: */ if (mmap_is_legacy()) { - mm->mmap_base = mmap_base_legacy(); + mm->mmap_base = 
mmap_base_legacy(random_factor); mm->get_unmapped_area = arch_get_unmapped_area; } else { - mm->mmap_base = mmap_base(); + mm->mmap_base = mmap_base(random_factor); mm->get_unmapped_area = arch_get_unmapped_area_topdown; } } @@ -273,15 +280,20 @@ s390_get_unmapped_area_topdown(struct file *filp, const unsigned long addr, */ void arch_pick_mmap_layout(struct mm_struct *mm) { + unsigned long random_factor = 0UL; + + if (current->flags & PF_RANDOMIZE) + random_factor = mmap_rnd(); + /* * Fall back to the standard layout if the personality * bit is set, or if the expected stack growth is unlimited: */ if (mmap_is_legacy()) { - mm->mmap_base = mmap_base_legacy(); + mm->mmap_base = mmap_base_legacy(random_factor); mm->get_unmapped_area = s390_get_unmapped_area; } else { - mm->mmap_base = mmap_base(); + mm->mmap_base = mmap_base(random_factor); mm->get_unmapped_area = s390_get_unmapped_area_topdown; } } -- 1.9.1
[PATCH v3 09/10] mm: split ET_DYN ASLR from mmap ASLR
This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips, powerpc, and x86. The problem is that if there is a leak of ASLR from the executable (ET_DYN), it means a leak of shared library offset as well (mmap), and vice versa. Further details and a PoC of this attack is available here: http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html With this patch, a PIE linked executable (ET_DYN) has its own ASLR region: $ ./show_mmaps_pie 54859ccd6000-54859ccd7000 r-xp ... /tmp/show_mmaps_pie 54859ced6000-54859ced7000 r--p ... /tmp/show_mmaps_pie 54859ced7000-54859ced8000 rw-p ... /tmp/show_mmaps_pie 7f75be764000-7f75be91f000 r-xp ... /lib/x86_64-linux-gnu/libc.so.6 7f75be91f000-7f75beb1f000 ---p ... /lib/x86_64-linux-gnu/libc.so.6 7f75beb1f000-7f75beb23000 r--p ... /lib/x86_64-linux-gnu/libc.so.6 7f75beb23000-7f75beb25000 rw-p ... /lib/x86_64-linux-gnu/libc.so.6 7f75beb25000-7f75beb2a000 rw-p ... 7f75beb2a000-7f75beb4d000 r-xp ... /lib64/ld-linux-x86-64.so.2 7f75bed45000-7f75bed46000 rw-p ... 7f75bed46000-7f75bed47000 r-xp ... 7f75bed47000-7f75bed4c000 rw-p ... 7f75bed4c000-7f75bed4d000 r--p ... /lib64/ld-linux-x86-64.so.2 7f75bed4d000-7f75bed4e000 rw-p ... /lib64/ld-linux-x86-64.so.2 7f75bed4e000-7f75bed4f000 rw-p ... 7fffb3741000-7fffb3762000 rw-p ... [stack] 7fffb377b000-7fffb377d000 r--p ... [vvar] 7fffb377d000-7fffb377f000 r-xp ... [vdso] The change is to add a call the newly created arch_mmap_rnd() into the ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR, as was already done on s390. Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE, which is no longer needed. 
Reported-by: Hector Marco-Gisbert Signed-off-by: Kees Cook --- arch/arm/Kconfig| 1 - arch/arm64/Kconfig | 1 - arch/mips/Kconfig | 1 - arch/powerpc/Kconfig| 1 - arch/s390/include/asm/elf.h | 5 ++--- arch/s390/mm/mmap.c | 8 arch/x86/Kconfig| 1 - fs/Kconfig.binfmt | 3 --- fs/binfmt_elf.c | 18 -- 9 files changed, 6 insertions(+), 33 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 248d99cabaa8..e2f0ef9c6ee3 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1,7 +1,6 @@ config ARM bool default y - select ARCH_BINFMT_ELF_RANDOMIZE_PIE select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 5f469095e0e2..07e0fc7adc88 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1,6 +1,5 @@ config ARM64 def_bool y - select ARCH_BINFMT_ELF_RANDOMIZE_PIE select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_GCOV_PROFILE_ALL diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 72ce5cece768..557c5f1772c1 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -23,7 +23,6 @@ config MIPS select HAVE_KRETPROBES select HAVE_DEBUG_KMEMLEAK select HAVE_SYSCALL_TRACEPOINTS - select ARCH_BINFMT_ELF_RANDOMIZE_PIE select ARCH_HAS_ELF_RANDOMIZE select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT select RTC_LIB if !MACH_LOONGSON diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 14fe1c411489..910fa4f9ad1e 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -88,7 +88,6 @@ config PPC select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO select BINFMT_ELF - select ARCH_BINFMT_ELF_RANDOMIZE_PIE select ARCH_HAS_ELF_RANDOMIZE select OF select OF_EARLY_FLATTREE diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h index 2e63de8aac7c..d0db9d944b6d 100644 --- a/arch/s390/include/asm/elf.h +++ b/arch/s390/include/asm/elf.h @@ 
-163,10 +163,9 @@ extern unsigned int vdso_enabled; the loader. We need to make sure that it is out of the way of the program that it will "exec", and that there is sufficient room for the brk. 64-bit tasks are aligned to 4GB. */ -extern unsigned long randomize_et_dyn(void); -#define ELF_ET_DYN_BASE (randomize_et_dyn() + (is_32bit_task() ? \ +#define ELF_ET_DYN_BASE (is_32bit_task() ? \ (STACK_TOP / 3 * 2) : \ - (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1))) + (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)) /* This yields a mask that user programs can use to figure out what instruction set this CPU supports. */ diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c index 8c11536f972d..bb3367c5cb0b 100644 --- a/arch/s390/mm/mmap.c +++ b/arch/s390/mm/mmap.c @@ -177,14 +177,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, return addr; } -unsigned long randomize_et_dyn(void
[PATCH v3 04/10] mips: extract logic for mmap_rnd()
In preparation for splitting out ET_DYN ASLR, extract the mmap ASLR selection into a separate function. Signed-off-by: Kees Cook --- arch/mips/mm/mmap.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c index f1baadd56e82..673a5cfe082f 100644 --- a/arch/mips/mm/mmap.c +++ b/arch/mips/mm/mmap.c @@ -142,18 +142,26 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp, addr0, len, pgoff, flags, DOWN); } +static unsigned long mmap_rnd(void) +{ + unsigned long rnd; + + rnd = (unsigned long)get_random_int(); + rnd <<= PAGE_SHIFT; + if (TASK_IS_32BIT_ADDR) + rnd &= 0xfful; + else + rnd &= 0xffful; + + return rnd; +} + void arch_pick_mmap_layout(struct mm_struct *mm) { unsigned long random_factor = 0UL; - if (current->flags & PF_RANDOMIZE) { - random_factor = get_random_int(); - random_factor = random_factor << PAGE_SHIFT; - if (TASK_IS_32BIT_ADDR) - random_factor &= 0xfful; - else - random_factor &= 0xffful; - } + if (current->flags & PF_RANDOMIZE) + random_factor = mmap_rnd(); if (mmap_is_legacy()) { mm->mmap_base = TASK_UNMAPPED_BASE + random_factor; -- 1.9.1
[PATCH v3 07/10] mm: expose arch_mmap_rnd when available
When an architecture fully supports randomizing the ELF load location, a per-arch mmap_rnd() function is used to find a randomized mmap base. In preparation for randomizing the location of ET_DYN binaries separately from mmap, this renames and exports these functions as arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE for describing this feature on architectures that support it (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already supports a separated ET_DYN ASLR from mmap ASLR without the ARCH_BINFMT_ELF_RANDOMIZE_PIE logic). Signed-off-by: Kees Cook --- arch/Kconfig | 7 +++ arch/arm/Kconfig | 1 + arch/arm/mm/mmap.c| 4 ++-- arch/arm64/Kconfig| 1 + arch/arm64/mm/mmap.c | 4 ++-- arch/mips/Kconfig | 1 + arch/mips/mm/mmap.c | 4 ++-- arch/powerpc/Kconfig | 1 + arch/powerpc/mm/mmap.c| 4 ++-- arch/s390/Kconfig | 1 + arch/s390/mm/mmap.c | 8 arch/x86/Kconfig | 1 + arch/x86/mm/mmap.c| 4 ++-- include/linux/elf-randomize.h | 10 ++ 14 files changed, 37 insertions(+), 14 deletions(-) create mode 100644 include/linux/elf-randomize.h diff --git a/arch/Kconfig b/arch/Kconfig index 05d7a8a458d5..9ff5aa8fa2c1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -484,6 +484,13 @@ config HAVE_IRQ_EXIT_ON_IRQ_STACK This spares a stack switch and improves cache usage on softirq processing. +config ARCH_HAS_ELF_RANDOMIZE + bool + help + An architecture supports choosing randomized locations for + stack, mmap, brk, and ET_DYN. 
Defined functions: + - arch_mmap_rnd() + # # ABI hall of shame # diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 9f1f09a2bc9b..248d99cabaa8 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -3,6 +3,7 @@ config ARM default y select ARCH_BINFMT_ELF_RANDOMIZE_PIE select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE + select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAVE_CUSTOM_GPIO_H select ARCH_HAS_GCOV_PROFILE_ALL diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c index 15a8160096b3..407dc786583a 100644 --- a/arch/arm/mm/mmap.c +++ b/arch/arm/mm/mmap.c @@ -169,7 +169,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, return addr; } -static unsigned long mmap_rnd(void) +unsigned long arch_mmap_rnd(void) { unsigned long rnd; @@ -184,7 +184,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm) unsigned long random_factor = 0UL; if (current->flags & PF_RANDOMIZE) - random_factor = mmap_rnd(); + random_factor = arch_mmap_rnd(); if (mmap_is_legacy()) { mm->mmap_base = TASK_UNMAPPED_BASE + random_factor; diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 1b8e97331ffb..5f469095e0e2 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -2,6 +2,7 @@ config ARM64 def_bool y select ARCH_BINFMT_ELF_RANDOMIZE_PIE select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE + select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_SG_CHAIN select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c index 837626d0e142..c25f8ed6d7b6 100644 --- a/arch/arm64/mm/mmap.c +++ b/arch/arm64/mm/mmap.c @@ -47,7 +47,7 @@ static int mmap_is_legacy(void) return sysctl_legacy_va_layout; } -static unsigned long mmap_rnd(void) +unsigned long arch_mmap_rnd(void) { unsigned long rnd; @@ -77,7 +77,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm) unsigned long random_factor = 0UL; if (current->flags & PF_RANDOMIZE) - random_factor 
= mmap_rnd(); + random_factor = arch_mmap_rnd(); /* * Fall back to the standard layout if the personality bit is set, or diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index c7a16904cd03..72ce5cece768 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -24,6 +24,7 @@ config MIPS select HAVE_DEBUG_KMEMLEAK select HAVE_SYSCALL_TRACEPOINTS select ARCH_BINFMT_ELF_RANDOMIZE_PIE + select ARCH_HAS_ELF_RANDOMIZE select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT select RTC_LIB if !MACH_LOONGSON select GENERIC_ATOMIC64 if !64BIT diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c index 673a5cfe082f..ce54fff57c93 100644 --- a/arch/mips/mm/mmap.c +++ b/arch/mips/mm/mmap.c @@ -142,7 +142,7 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp, addr0, len, pgoff, flags, DOWN); } -static unsigned long mmap_rnd(void) +unsigned long arch_mmap_rnd(void) {
[PATCH v3 03/10] arm64: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86. This additionally enables
mmap ASLR on legacy mmap layouts, which appeared to be missing on arm64,
and was already supported on arm. Additionally removes a copy/pasted
declaration of an unused function.

Signed-off-by: Kees Cook
---
 arch/arm64/include/asm/elf.h |  1 -
 arch/arm64/mm/mmap.c         | 18 +++++++++++-------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 1f65be393139..f724db00b235 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -125,7 +125,6 @@ typedef struct user_fpsimd_state elf_fpregset_t;
  * the loader. We need to make sure that it is out of the way of the program
  * that it will "exec", and that there is sufficient room for the brk.
  */
-extern unsigned long randomize_et_dyn(unsigned long base);
 #define ELF_ET_DYN_BASE	(2 * TASK_SIZE_64 / 3)
 
 /*
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 54922d1275b8..837626d0e142 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -49,15 +49,14 @@ static int mmap_is_legacy(void)
 
 static unsigned long mmap_rnd(void)
 {
-	unsigned long rnd = 0;
+	unsigned long rnd;
 
-	if (current->flags & PF_RANDOMIZE)
-		rnd = (long)get_random_int() & STACK_RND_MASK;
+	rnd = (unsigned long)get_random_int() & STACK_RND_MASK;
 
 	return rnd << PAGE_SHIFT;
 }
 
-static unsigned long mmap_base(void)
+static unsigned long mmap_base(unsigned long base)
 {
 	unsigned long gap = rlimit(RLIMIT_STACK);
 
@@ -66,7 +65,7 @@ static unsigned long mmap_base(void)
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
 
-	return PAGE_ALIGN(STACK_TOP - gap - mmap_rnd());
+	return PAGE_ALIGN(STACK_TOP - gap - base);
 }
 
 /*
@@ -75,15 +74,20 @@ static unsigned long mmap_base(void)
  */
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
+	unsigned long random_factor = 0UL;
+
+	if (current->flags & PF_RANDOMIZE)
+		random_factor = mmap_rnd();
+
 	/*
 	 * Fall back to the standard layout if the personality bit is set, or
 	 * if the expected stack growth is unlimited:
 	 */
 	if (mmap_is_legacy()) {
-		mm->mmap_base = TASK_UNMAPPED_BASE;
+		mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
 		mm->get_unmapped_area = arch_get_unmapped_area;
 	} else {
-		mm->mmap_base = mmap_base();
+		mm->mmap_base = mmap_base(random_factor);
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
-- 
1.9.1
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 10/10] mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
The arch_randomize_brk() function is used on several architectures, even those that don't support ET_DYN ASLR. To avoid bulky extern/#define tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for the architectures that support it, while still handling CONFIG_COMPAT_BRK. Signed-off-by: Kees Cook --- arch/Kconfig | 1 + arch/arm/include/asm/elf.h | 4 arch/arm64/include/asm/elf.h | 4 arch/mips/include/asm/elf.h| 4 arch/powerpc/include/asm/elf.h | 4 arch/s390/include/asm/elf.h| 3 --- arch/x86/include/asm/elf.h | 3 --- fs/binfmt_elf.c| 4 +--- include/linux/elf-randomize.h | 12 9 files changed, 14 insertions(+), 25 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 9ff5aa8fa2c1..d4f270a54fe6 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -490,6 +490,7 @@ config ARCH_HAS_ELF_RANDOMIZE An architecture supports choosing randomized locations for stack, mmap, brk, and ET_DYN. Defined functions: - arch_mmap_rnd() + - arch_randomize_brk() # # ABI hall of shame diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h index afb9cafd3786..c1ff8ab12914 100644 --- a/arch/arm/include/asm/elf.h +++ b/arch/arm/include/asm/elf.h @@ -125,10 +125,6 @@ int dump_task_regs(struct task_struct *t, elf_gregset_t *elfregs); extern void elf_set_personality(const struct elf32_hdr *); #define SET_PERSONALITY(ex)elf_set_personality(&(ex)) -struct mm_struct; -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - #ifdef CONFIG_MMU #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1 struct linux_binprm; diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index f724db00b235..faad6df49e5b 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -156,10 +156,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm, #define STACK_RND_MASK (0x3 >> (PAGE_SHIFT - 12)) #endif -struct mm_struct; -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define 
arch_randomize_brk arch_randomize_brk - #ifdef CONFIG_COMPAT #ifdef __AARCH64EB__ diff --git a/arch/mips/include/asm/elf.h b/arch/mips/include/asm/elf.h index 535f196ffe02..31d747d46a23 100644 --- a/arch/mips/include/asm/elf.h +++ b/arch/mips/include/asm/elf.h @@ -410,10 +410,6 @@ struct linux_binprm; extern int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp); -struct mm_struct; -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - struct arch_elf_state { int fp_abi; int interp_fp_abi; diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h index 57d289acb803..ee46ffef608e 100644 --- a/arch/powerpc/include/asm/elf.h +++ b/arch/powerpc/include/asm/elf.h @@ -128,10 +128,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm, (0x7ff >> (PAGE_SHIFT - 12)) : \ (0x3 >> (PAGE_SHIFT - 12))) -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - - #ifdef CONFIG_SPU_BASE /* Notes used in ET_CORE. Note name is "SPU//". 
*/ #define NT_SPU 1 diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h index d0db9d944b6d..fdda72e56404 100644 --- a/arch/s390/include/asm/elf.h +++ b/arch/s390/include/asm/elf.h @@ -226,9 +226,6 @@ struct linux_binprm; #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1 int arch_setup_additional_pages(struct linux_binprm *, int); -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - void *fill_cpu_elf_notes(void *ptr, struct save_area *sa, __vector128 *vxrs); #endif diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index ca3347a9dab5..bbdace22daf8 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -338,9 +338,6 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp); #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - /* * True on X86_32 or when emulating IA32 on X86_64 */ diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 6f08f5fa99dc..a115da230ce0 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1043,15 +1043,13 @@ static int load_elf_binary(struct linux_binprm *bprm) current->mm->end_data = end_data; current->mm->start_stack = bprm->p; -#ifdef arch_randomize_brk if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) { current->mm->brk = current->mm->start_brk = a
[PATCH v3 02/10] x86: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm, and extracts the checking of
PF_RANDOMIZE.

Signed-off-by: Kees Cook
---
 arch/x86/mm/mmap.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index df4552bd239e..ebfa52030d5c 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -67,22 +67,21 @@ static int mmap_is_legacy(void)
 
 static unsigned long mmap_rnd(void)
 {
-	unsigned long rnd = 0;
+	unsigned long rnd;
 
 	/*
-	 * 8 bits of randomness in 32bit mmaps, 20 address space bits
-	 * 28 bits of randomness in 64bit mmaps, 40 address space bits
-	 */
-	if (current->flags & PF_RANDOMIZE) {
-		if (mmap_is_ia32())
-			rnd = get_random_int() % (1<<8);
-		else
-			rnd = get_random_int() % (1<<28);
-	}
+	 * 8 bits of randomness in 32bit mmaps, 20 address space bits
+	 * 28 bits of randomness in 64bit mmaps, 40 address space bits
+	 */
+	if (mmap_is_ia32())
+		rnd = (unsigned long)get_random_int() % (1<<8);
+	else
+		rnd = (unsigned long)get_random_int() % (1<<28);
+
 	return rnd << PAGE_SHIFT;
 }
 
-static unsigned long mmap_base(void)
+static unsigned long mmap_base(unsigned long rnd)
 {
 	unsigned long gap = rlimit(RLIMIT_STACK);
 
@@ -91,19 +90,19 @@ static unsigned long mmap_base(void)
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
 
-	return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
+	return PAGE_ALIGN(TASK_SIZE - gap - rnd);
 }
 
 /*
  * Bottom-up (legacy) layout on X86_32 did not support randomization, X86_64
  * does, but not when emulating X86_32
  */
-static unsigned long mmap_legacy_base(void)
+static unsigned long mmap_legacy_base(unsigned long rnd)
 {
 	if (mmap_is_ia32())
 		return TASK_UNMAPPED_BASE;
 	else
-		return TASK_UNMAPPED_BASE + mmap_rnd();
+		return TASK_UNMAPPED_BASE + rnd;
 }
 
 /*
@@ -112,13 +111,18 @@ static unsigned long mmap_legacy_base(void)
  */
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
-	mm->mmap_legacy_base = mmap_legacy_base();
-	mm->mmap_base = mmap_base();
+	unsigned long random_factor = 0UL;
+
+	if (current->flags & PF_RANDOMIZE)
+		random_factor = mmap_rnd();
+
+	mm->mmap_legacy_base = mmap_legacy_base(random_factor);
 
 	if (mmap_is_legacy()) {
 		mm->mmap_base = mm->mmap_legacy_base;
 		mm->get_unmapped_area = arch_get_unmapped_area;
 	} else {
+		mm->mmap_base = mmap_base(random_factor);
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
-- 
1.9.1
[PATCH v3 01/10] arm: factor out mmap ASLR into mmap_rnd
In preparation for splitting out ET_DYN ASLR, this moves the ASLR
calculations for mmap on ARM into a separate routine, similar to x86.
This also removes the redundant check of personality (PF_RANDOMIZE is
already set before calling arch_pick_mmap_layout).

Signed-off-by: Kees Cook
---
 arch/arm/mm/mmap.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 5e85ed371364..15a8160096b3 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -169,14 +169,22 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	return addr;
 }
 
+static unsigned long mmap_rnd(void)
+{
+	unsigned long rnd;
+
+	/* 8 bits of randomness in 20 address space bits */
+	rnd = (unsigned long)get_random_int() % (1 << 8);
+
+	return rnd << PAGE_SHIFT;
+}
+
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
 	unsigned long random_factor = 0UL;
 
-	/* 8 bits of randomness in 20 address space bits */
-	if ((current->flags & PF_RANDOMIZE) &&
-	    !(current->personality & ADDR_NO_RANDOMIZE))
-		random_factor = (get_random_int() % (1 << 8)) << PAGE_SHIFT;
+	if (current->flags & PF_RANDOMIZE)
+		random_factor = mmap_rnd();
 
 	if (mmap_is_legacy()) {
 		mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
-- 
1.9.1
[PATCH v3 05/10] powerpc: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86.

Signed-off-by: Kees Cook
---
Can mmap ASLR be safely enabled in the legacy mmap case here?
Other archs use "mm->mmap_base = TASK_UNMAPPED_BASE + random_factor".
---
 arch/powerpc/mm/mmap.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
index cb8bdbe4972f..3d7088bfe93c 100644
--- a/arch/powerpc/mm/mmap.c
+++ b/arch/powerpc/mm/mmap.c
@@ -55,19 +55,18 @@ static inline int mmap_is_legacy(void)
 
 static unsigned long mmap_rnd(void)
 {
-	unsigned long rnd = 0;
+	unsigned long rnd;
+
+	/* 8MB for 32bit, 1GB for 64bit */
+	if (is_32bit_task())
+		rnd = (unsigned long)get_random_int() % (1<<(23-PAGE_SHIFT));
+	else
+		rnd = (unsigned long)get_random_int() % (1<<(30-PAGE_SHIFT));
 
-	if (current->flags & PF_RANDOMIZE) {
-		/* 8MB for 32bit, 1GB for 64bit */
-		if (is_32bit_task())
-			rnd = (long)(get_random_int() % (1<<(23-PAGE_SHIFT)));
-		else
-			rnd = (long)(get_random_int() % (1<<(30-PAGE_SHIFT)));
-	}
 	return rnd << PAGE_SHIFT;
 }
 
-static inline unsigned long mmap_base(void)
+static inline unsigned long mmap_base(unsigned long base)
 {
 	unsigned long gap = rlimit(RLIMIT_STACK);
 
@@ -76,7 +75,7 @@ static inline unsigned long mmap_base(void)
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
 
-	return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
+	return PAGE_ALIGN(TASK_SIZE - gap - base);
 }
 
 /*
@@ -85,6 +84,11 @@ static inline unsigned long mmap_base(void)
  */
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
+	unsigned long random_factor = 0UL;
+
+	if (current->flags & PF_RANDOMIZE)
+		random_factor = mmap_rnd();
+
 	/*
 	 * Fall back to the standard layout if the personality
 	 * bit is set, or if the expected stack growth is unlimited:
@@ -93,7 +97,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->mmap_base = TASK_UNMAPPED_BASE;
 		mm->get_unmapped_area = arch_get_unmapped_area;
 	} else {
-		mm->mmap_base = mmap_base();
+		mm->mmap_base = mmap_base(random_factor);
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
-- 
1.9.1
[PATCH v3 08/10] s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
In preparation for moving ET_DYN randomization into the ELF loader (which
requires a static ELF_ET_DYN_BASE), this redefines s390's existing ET_DYN
randomization in a call to arch_mmap_rnd(). This refactoring results in
the same ET_DYN randomization on s390.

Signed-off-by: Kees Cook
---
 arch/s390/include/asm/elf.h |  8 +++++---
 arch/s390/mm/mmap.c         | 11 ++---------
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index c9df40b5c0ac..2e63de8aac7c 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -161,10 +161,12 @@ extern unsigned int vdso_enabled;
 /* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
    use of this is to invoke "./ld.so someprog" to test out a new version of
    the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk. */
-
+   that it will "exec", and that there is sufficient room for the brk. 64-bit
+   tasks are aligned to 4GB. */
 extern unsigned long randomize_et_dyn(void);
-#define ELF_ET_DYN_BASE	randomize_et_dyn()
+#define ELF_ET_DYN_BASE	(randomize_et_dyn() + (is_32bit_task() ? \
+				(STACK_TOP / 3 * 2) : \
+				(STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)))
 
 /* This yields a mask that user programs can use to figure out what
    instruction set this CPU supports. */

diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index a94504d99c47..8c11536f972d 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -179,17 +179,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 
 unsigned long randomize_et_dyn(void)
 {
-	unsigned long base;
-
-	base = STACK_TOP / 3 * 2;
-	if (!is_32bit_task())
-		/* Align to 4GB */
-		base &= ~((1UL << 32) - 1);
-
 	if (current->flags & PF_RANDOMIZE)
-		base += arch_mmap_rnd();
+		return arch_mmap_rnd();
 
-	return base;
+	return 0UL;
 }
 
 #ifndef CONFIG_64BIT
-- 
1.9.1
[PATCH v3 0/10] split ET_DYN ASLR from mmap ASLR
To address the "offset2lib" ASLR weakness[1], this separates ET_DYN ASLR
from mmap ASLR, as already done on s390. The architectures that are
already randomizing mmap (arm, arm64, mips, powerpc, s390, and x86) have
their various forms of arch_mmap_rnd() made available via the new
CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures,
arch_randomize_brk() is collapsed as well.

This is an alternative to the solutions in:
https://lkml.org/lkml/2015/2/23/442

I've been able to test x86 and arm, and the buildbot (so far) seems
happy with building the rest.

Thanks!

-Kees

[1] http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

v3:
- split change on a per-arch basis for easier review
- moved PF_RANDOMIZE check out of per-arch code (ingo)

v2:
- verbosified the commit logs, especially 4/5 (akpm)
Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
On 03/03/2015 05:20 PM, Cyril Bur wrote: > On Tue, 2015-03-03 at 15:15 -0800, Tyrel Datwyler wrote: >> On 03/02/2015 01:49 PM, Tyrel Datwyler wrote: >>> On 03/01/2015 09:20 PM, Cyril Bur wrote: On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: > We currently use the device tree update code in the kernel after resuming > from a suspend operation to re-sync the kernels view of the device tree > with > that of the hypervisor. The code as it stands is not endian safe as it > relies > on parsing buffers returned by RTAS calls that thusly contains data in big > endian format. > > This patch annotates variables and structure members with __be types as > well > as performing necessary byte swaps to cpu endian for data that needs to be > parsed. > > Signed-off-by: Tyrel Datwyler > --- > arch/powerpc/platforms/pseries/mobility.c | 36 > --- > 1 file changed, 19 insertions(+), 17 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/mobility.c > b/arch/powerpc/platforms/pseries/mobility.c > index 29e4f04..0b1f70e 100644 > --- a/arch/powerpc/platforms/pseries/mobility.c > +++ b/arch/powerpc/platforms/pseries/mobility.c > @@ -25,10 +25,10 @@ > static struct kobject *mobility_kobj; > > struct update_props_workarea { > - u32 phandle; > - u32 state; > - u64 reserved; > - u32 nprops; > + __be32 phandle; > + __be32 state; > + __be64 reserved; > + __be32 nprops; > } __packed; > > #define NODE_ACTION_MASK 0xff00 > @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, > struct property **prop, > return 0; > } > > -static int update_dt_node(u32 phandle, s32 scope) > +static int update_dt_node(__be32 phandle, s32 scope) > { On line 153 of this function: dn = of_find_node_by_phandle(phandle); You're passing a __be32 to device tree code, if we can treat the phandle as a opaque value returned to us from the rtas call and pass it around like that then all good. >> >> After digging deeper the device_node->phandle is stored in cpu endian >> under the covers. 
So, for the of_find_node_by_phandle() we do need to >> convert the phandle to cpu endian first. It appears I got lucky with the >> update fixing the observed RMC issue because the phandle for the root >> node seems to always be 0x. >> > I think we've both switched opinions here, initially I thought an endian > conversion was necessary but turns out that all of_find_node_by_phandle > really does is: >for_each_of_allnodes(np) > if (np->phandle == handle) > break; >of_node_get(np); > > The == is safe either way and I think the of code might be trying to > imply that it doesn't matter by having a typedefed type 'phandle'. > > I'm still digging around, we want to get this right! When the device tree is unflattened the phandle is byte swapped to cpu endian. The following code is from unflatten_dt_node(). if (strcmp(pname, "ibm,phandle") == 0) np->phandle = be32_to_cpup(p); I added some debug to the of_find_node_by_phandle() and verified if the phandle isn't swapped to cpu endian we fail to find a matching node except in the case where the phandle is equivalent in both big and little endian. -Tyrel > > > Cyril >> -Tyrel >> >>> >>> Yes, of_find_node_by_phandle directly compares phandle passed in against >>> the handle stored in each device_node when searching for a matching >>> node. Since, the device tree is big endian it follows that the big >>> endian phandle received in the rtas buffer needs no conversion. >>> >>> Further, we need to pass the phandle to ibm,update-properties in the >>> work area which is also required to be big endian. So, again it seemed >>> that converting to cpu endian was a waste of effort just to convert it >>> back to big endian. >>> Its also hard to be sure if these need to be BE and have always been that way because we've always run BE so they've never actually wanted CPU endian its just that CPU endian has always been BE (I think I started rambling...) Just want to check that *not* converting them is done on purpose. 
>>> >>> Yes, I explicitly did not convert them on purpose. As mentioned above we >>> need phandle in BE for the ibm,update-properties rtas work area. >>> Similarly, drc_index needs to be in BE for the ibm,configure-connector >>> rtas work area. Outside, of that we do no other manipulation of those >>> values. >>> And having read on, I'm assuming the answer is yes since this observation is true for your changes which affect: delete_dt_node() update_dt_node() add_dt_node() Worth noting that you didn't change the definition of delete_dt_node() >>> >>> You are correct. Oversight. I will fix that as it should
Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
On Tue, 2015-03-03 at 15:15 -0800, Tyrel Datwyler wrote: > On 03/02/2015 01:49 PM, Tyrel Datwyler wrote: > > On 03/01/2015 09:20 PM, Cyril Bur wrote: > >> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: > >>> We currently use the device tree update code in the kernel after resuming > >>> from a suspend operation to re-sync the kernels view of the device tree > >>> with > >>> that of the hypervisor. The code as it stands is not endian safe as it > >>> relies > >>> on parsing buffers returned by RTAS calls that thusly contains data in big > >>> endian format. > >>> > >>> This patch annotates variables and structure members with __be types as > >>> well > >>> as performing necessary byte swaps to cpu endian for data that needs to be > >>> parsed. > >>> > >>> Signed-off-by: Tyrel Datwyler > >>> --- > >>> arch/powerpc/platforms/pseries/mobility.c | 36 > >>> --- > >>> 1 file changed, 19 insertions(+), 17 deletions(-) > >>> > >>> diff --git a/arch/powerpc/platforms/pseries/mobility.c > >>> b/arch/powerpc/platforms/pseries/mobility.c > >>> index 29e4f04..0b1f70e 100644 > >>> --- a/arch/powerpc/platforms/pseries/mobility.c > >>> +++ b/arch/powerpc/platforms/pseries/mobility.c > >>> @@ -25,10 +25,10 @@ > >>> static struct kobject *mobility_kobj; > >>> > >>> struct update_props_workarea { > >>> - u32 phandle; > >>> - u32 state; > >>> - u64 reserved; > >>> - u32 nprops; > >>> + __be32 phandle; > >>> + __be32 state; > >>> + __be64 reserved; > >>> + __be32 nprops; > >>> } __packed; > >>> > >>> #define NODE_ACTION_MASK 0xff00 > >>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, > >>> struct property **prop, > >>> return 0; > >>> } > >>> > >>> -static int update_dt_node(u32 phandle, s32 scope) > >>> +static int update_dt_node(__be32 phandle, s32 scope) > >>> { > >> > >> On line 153 of this function: > >>dn = of_find_node_by_phandle(phandle); > >> > >> You're passing a __be32 to device tree code, if we can treat the phandle > >> as a opaque value 
returned to us from the rtas call and pass it around > >> like that then all good. > > After digging deeper the device_node->phandle is stored in cpu endian > under the covers. So, for the of_find_node_by_phandle() we do need to > convert the phandle to cpu endian first. It appears I got lucky with the > update fixing the observed RMC issue because the phandle for the root > node seems to always be 0x. > I think we've both switched opinions here, initially I thought an endian conversion was necessary but turns out that all of_find_node_by_phandle really does is: for_each_of_allnodes(np) if (np->phandle == handle) break; of_node_get(np); The == is safe either way and I think the of code might be trying to imply that it doesn't matter by having a typedefed type 'phandle'. I'm still digging around, we want to get this right! Cyril > -Tyrel > > > > > Yes, of_find_node_by_phandle directly compares phandle passed in against > > the handle stored in each device_node when searching for a matching > > node. Since, the device tree is big endian it follows that the big > > endian phandle received in the rtas buffer needs no conversion. > > > > Further, we need to pass the phandle to ibm,update-properties in the > > work area which is also required to be big endian. So, again it seemed > > that converting to cpu endian was a waste of effort just to convert it > > back to big endian. > > > >> Its also hard to be sure if these need to be BE and have always been > >> that way because we've always run BE so they've never actually wanted > >> CPU endian its just that CPU endian has always been BE (I think I > >> started rambling...) > >> > >> Just want to check that *not* converting them is done on purpose. > > > > Yes, I explicitly did not convert them on purpose. As mentioned above we > > need phandle in BE for the ibm,update-properties rtas work area. > > Similarly, drc_index needs to be in BE for the ibm,configure-connector > > rtas work area. 
Outside, of that we do no other manipulation of those > > values. > > > >> > >> And having read on, I'm assuming the answer is yes since this > >> observation is true for your changes which affect: > >>delete_dt_node() > >>update_dt_node() > >> add_dt_node() > >> Worth noting that you didn't change the definition of delete_dt_node() > > > > You are correct. Oversight. I will fix that as it should generate a > > sparse complaint. > > > > -Tyrel > > > >> > >> I'll have a look once you address the non compiling in patch 1/3 (I'm > >> getting blocked the unused var because somehow Werror is on, odd it > >> didn't trip you up) but I also suspect this will have sparse go a bit > >> nuts. > >> I wonder if there is a nice way of shutting sparse up. > >> > >>> struct update_props_workarea *upwa; > >>> struct device_node *dn; > >>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope) > >>>
Re: [PATCH v2 3/5] crypto: talitos: Fix off-by-one and use all hardware slots
On Tue, 3 Mar 2015 08:21:35 -0500
Martin Hicks wrote:

> The submission count was off by one.
>
> Signed-off-by: Martin Hicks
> ---

sadly, this directly contradicts:

commit 4b24ea971a93f5d0bec34bf7bfd0939f70cfaae6
Author: Vishnu Suresh
Date:   Mon Oct 20 21:06:18 2008 +0800

    crypto: talitos - Preempt overflow interrupts off-by-one fix

My guess is your request submission pattern differs from that of
Vishnu's (probably IPSec and/or tcrypt), or later h/w versions have
gotten better about dealing with channel near-overflow conditions.

Either way, I'd prefer we not do this: it might break others, and I'm
guessing doesn't improve performance _that_ much?

If it does, we could risk it and restrict it to SEC versions 3.3 and
above maybe? Not sure what to do here exactly, barring digging up an
old 2.x SEC and testing.

Kim

p.s. I checked, Vishnu isn't with Freescale anymore, so I can't cc him.
Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling
On Tue, 3 Mar 2015 08:21:37 -0500
Martin Hicks wrote:

> @@ -1170,6 +1237,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
> 					  edesc->dma_len,
> 					  DMA_BIDIRECTIONAL);
> 	edesc->req.desc = &edesc->desc;
> +	/* A copy of the crypto_async_request to use the crypto_queue backlog */
> +	memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));

this seems backward, or, at least can be done more efficiently IMO:
talitos_cra_init should set the tfm's reqsize so the rest of the driver
can wholly embed its talitos_edesc (which should also wholly encapsulate
its talitos_request (i.e., not via a pointer)) into the crypto API's
request handle allocation.

This would absorb and eliminate the talitos_edesc kmalloc and frees, the
above memcpy, and replace the container_of after the
crypto_dequeue_request with an offset_of, right?

When scatter-gather buffers are needed, we can assume a slower-path and
make them do their own allocations, since their sizes vary depending on
each request. Of course, a pointer to those allocations would need to be
retained somewhere in the request handle.

Only potential problem is getting the crypto API to set the GFP_DMA flag
in the allocation request, but presumably a CRYPTO_TFM_REQ_DMA crt_flag
can be made to handle that.

Kim
Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
On 03/02/2015 01:49 PM, Tyrel Datwyler wrote: > On 03/01/2015 09:20 PM, Cyril Bur wrote: >> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: >>> We currently use the device tree update code in the kernel after resuming >>> from a suspend operation to re-sync the kernels view of the device tree with >>> that of the hypervisor. The code as it stands is not endian safe as it >>> relies >>> on parsing buffers returned by RTAS calls that thusly contains data in big >>> endian format. >>> >>> This patch annotates variables and structure members with __be types as well >>> as performing necessary byte swaps to cpu endian for data that needs to be >>> parsed. >>> >>> Signed-off-by: Tyrel Datwyler >>> --- >>> arch/powerpc/platforms/pseries/mobility.c | 36 >>> --- >>> 1 file changed, 19 insertions(+), 17 deletions(-) >>> >>> diff --git a/arch/powerpc/platforms/pseries/mobility.c >>> b/arch/powerpc/platforms/pseries/mobility.c >>> index 29e4f04..0b1f70e 100644 >>> --- a/arch/powerpc/platforms/pseries/mobility.c >>> +++ b/arch/powerpc/platforms/pseries/mobility.c >>> @@ -25,10 +25,10 @@ >>> static struct kobject *mobility_kobj; >>> >>> struct update_props_workarea { >>> - u32 phandle; >>> - u32 state; >>> - u64 reserved; >>> - u32 nprops; >>> + __be32 phandle; >>> + __be32 state; >>> + __be64 reserved; >>> + __be32 nprops; >>> } __packed; >>> >>> #define NODE_ACTION_MASK 0xff00 >>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, >>> struct property **prop, >>> return 0; >>> } >>> >>> -static int update_dt_node(u32 phandle, s32 scope) >>> +static int update_dt_node(__be32 phandle, s32 scope) >>> { >> >> On line 153 of this function: >>dn = of_find_node_by_phandle(phandle); >> >> You're passing a __be32 to device tree code, if we can treat the phandle >> as a opaque value returned to us from the rtas call and pass it around >> like that then all good. After digging deeper the device_node->phandle is stored in cpu endian under the covers. 
So, for the of_find_node_by_phandle() we do need to convert the phandle to cpu endian first. It appears I got lucky with the update fixing the observed RMC issue because the phandle for the root node seems to always be 0x. -Tyrel > > Yes, of_find_node_by_phandle directly compares phandle passed in against > the handle stored in each device_node when searching for a matching > node. Since, the device tree is big endian it follows that the big > endian phandle received in the rtas buffer needs no conversion. > > Further, we need to pass the phandle to ibm,update-properties in the > work area which is also required to be big endian. So, again it seemed > that converting to cpu endian was a waste of effort just to convert it > back to big endian. > >> Its also hard to be sure if these need to be BE and have always been >> that way because we've always run BE so they've never actually wanted >> CPU endian its just that CPU endian has always been BE (I think I >> started rambling...) >> >> Just want to check that *not* converting them is done on purpose. > > Yes, I explicitly did not convert them on purpose. As mentioned above we > need phandle in BE for the ibm,update-properties rtas work area. > Similarly, drc_index needs to be in BE for the ibm,configure-connector > rtas work area. Outside, of that we do no other manipulation of those > values. > >> >> And having read on, I'm assuming the answer is yes since this >> observation is true for your changes which affect: >> delete_dt_node() >> update_dt_node() >> add_dt_node() >> Worth noting that you didn't change the definition of delete_dt_node() > > You are correct. Oversight. I will fix that as it should generate a > sparse complaint. > > -Tyrel > >> >> I'll have a look once you address the non compiling in patch 1/3 (I'm >> getting blocked the unused var because somehow Werror is on, odd it >> didn't trip you up) but I also suspect this will have sparse go a bit >> nuts. 
>> I wonder if there is a nice way of shutting sparse up. >> >>> struct update_props_workarea *upwa; >>> struct device_node *dn; >>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope) >>> char *prop_data; >>> char *rtas_buf; >>> int update_properties_token; >>> + u32 nprops; >>> u32 vd; >>> >>> update_properties_token = rtas_token("ibm,update-properties"); >>> @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope) >>> break; >>> >>> prop_data = rtas_buf + sizeof(*upwa); >>> + nprops = be32_to_cpu(upwa->nprops); >>> >>> /* On the first call to ibm,update-properties for a node the >>> * the first property value descriptor contains an empty >>> @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope) >>> */ >>> if (*prop_data == 0) { >>>
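To make the endianness point above concrete, here is a minimal user-space model (not kernel code) of the conversion being discussed: the RTAS work area carries the phandle in big-endian byte order, while device_node stores phandles in cpu endianness, so on a little-endian host the value must be swapped before it is handed to of_find_node_by_phandle(). The helper name below is hypothetical; in the kernel this is simply be32_to_cpu().

```c
#include <assert.h>
#include <stdint.h>

/* Interpret four bytes as a big-endian 32-bit value, returning it in
 * host endianness. This is what be32_to_cpu() does with the phandle
 * pulled out of the RTAS buffer before the device tree lookup. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

On a big-endian host the swap is a no-op, which is why the bug only shows up on little-endian platforms.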
Re: [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr
On 03/02/2015 10:24 PM, Michael Ellerman wrote: > On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: >> Traditionally after a migration operation drmgr has coordinated the device >> tree >> update with the kernel in userspace via the ugly /proc/ppc64/ofdt interface. >> This >> can be better done fully in the kernel where support already exists. >> Currently, >> drmgr makes a faux ibm,suspend-me RTAS call which we intercept in the kernel >> so >> that we can check VASI state for suspendability. After the LPAR resumes and >> returns to drmgr that is followed by the necessary update-nodes and >> update-properties RTAS calls which are parsed and communicated back to the >> kernel >> through /proc/ppc64/ofdt for the device tree update. The drmgr tool should >> instead initiate the migration using the already existing >> /sysfs/kernel/mobility/migration entry that performs all this work in the >> kernel. >> >> This patch adds a show function to the sysfs "migration" attribute that >> returns >> 1 to indicate the kernel will perform the device tree update after a >> migration >> operation and that drmgr should initiate the migration through the sysfs >> "migration" attribute. > > I don't understand why we need this? > > Can't drmgr just check if /sysfs/kernel/mobility/migration exists, and if so > it > knows it should use it and that the kernel will handle the whole procedure? The problem is that this sysfs entry was originally added with the remainder of the in kernel device tree update code in 2.6.37, but drmgr was never modified to use it. By the time I started looking at the in-kernel device tree code I found it very broken. I had a bunch of fixes to get it working that went into 3.12. So, if somebody were to use a newer version of drmgr that simply checks for the existence of the migration sysfs entry on a pre-3.12 kernel their device-tree update experience is going to be sub-par.
The approach taken here is identical to what was done in 9da3489 when we hooked the device tree update code into the suspend code. However, in that case we were already using the sysfs entry to trigger the suspend and legitimately needed a way to tell drmgr the kernel was now taking care of updating the device tree. Here we are really just trying to inform drmgr that it is running on a new enough kernel that the kernel device tree code actually works properly. Now, I don't really care for this approach, but the only other thought I had was to change the sysfs entry from "migration" to "migrate". -Tyrel > > cheers > > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
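As a rough sketch of the proposal discussed in this thread, here is a user-space model (function names and buffer handling are hypothetical, not the actual kernel patch) of what reading the sysfs "migration" attribute would report to drmgr:

```c
#include <assert.h>
#include <stdio.h>

/* Model of the proposed show function: reading the attribute yields
 * "1\n", telling drmgr the kernel performs the post-migration device
 * tree update itself. Returns the number of characters written, as a
 * sysfs show callback does. */
static int migration_show(char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", 1);
}

/* Convenience wrapper: the integer value a reader would parse out. */
static int migration_value(void)
{
	char buf[8];

	migration_show(buf, sizeof(buf));
	return buf[0] - '0';
}
```

An older kernel simply lacks the show callback, so drmgr can distinguish "entry exists but is write-only" from "entry reports 1 and the kernel handles everything".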
[PATCH v1 2/2] sata_dwc_460ex: re-use hsdev->dev instead of dwc_dev
This patch re-uses hsdev->dev which is allocated on heap. Therefore, the private structure, which is global variable, is reduced by one field. In one case ap->dev is used and there it seems to be right decision. Signed-off-by: Andy Shevchenko --- drivers/ata/sata_dwc_460ex.c | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c index 08cd63f..5ab4849 100644 --- a/drivers/ata/sata_dwc_460ex.c +++ b/drivers/ata/sata_dwc_460ex.c @@ -194,7 +194,6 @@ struct sata_dwc_host_priv { void__iomem *scr_addr_sstatus; u32 sata_dwc_sactive_issued ; u32 sata_dwc_sactive_queued ; - struct device *dwc_dev; }; static struct sata_dwc_host_priv host_pvt; @@ -252,16 +251,16 @@ static const char *get_dma_dir_descript(int dma_dir) } } -static void sata_dwc_tf_dump(struct ata_taskfile *tf) +static void sata_dwc_tf_dump(struct ata_port *ap, struct ata_taskfile *tf) { - dev_vdbg(host_pvt.dwc_dev, + dev_vdbg(ap->dev, "taskfile cmd: 0x%02x protocol: %s flags: 0x%lx device: %x\n", tf->command, get_prot_descript(tf->protocol), tf->flags, tf->device); - dev_vdbg(host_pvt.dwc_dev, + dev_vdbg(ap->dev, "feature: 0x%02x nsect: 0x%x lbal: 0x%x lbam: 0x%x lbah: 0x%x\n", tf->feature, tf->nsect, tf->lbal, tf->lbam, tf->lbah); - dev_vdbg(host_pvt.dwc_dev, + dev_vdbg(ap->dev, "hob_feature: 0x%02x hob_nsect: 0x%x hob_lbal: 0x%x hob_lbam: 0x%x hob_lbah: 0x%x\n", tf->hob_feature, tf->hob_nsect, tf->hob_lbal, tf->hob_lbam, tf->hob_lbah); @@ -337,7 +336,7 @@ static struct dma_async_tx_descriptor *dma_dwc_xfer_setup(struct ata_queued_cmd desc->callback = dma_dwc_xfer_done; desc->callback_param = hsdev; - dev_dbg(host_pvt.dwc_dev, "%s sg: 0x%p, count: %d addr: %pad\n", + dev_dbg(hsdev->dev, "%s sg: 0x%p, count: %d addr: %pad\n", __func__, qc->sg, qc->n_elem, &addr); return desc; @@ -687,7 +686,7 @@ static void sata_dwc_clear_dmacr(struct sata_dwc_device_port *hsdevp, u8 tag) * This should not happen, it indicates the driver is out of * 
sync. If it does happen, clear dmacr anyway. */ - dev_err(host_pvt.dwc_dev, + dev_err(hsdev->dev, "%s DMA protocol RX and TX DMA not pending tag=0x%02x pending=%d dmacr: 0x%08x\n", __func__, tag, hsdevp->dma_pending[tag], in_le32(&hsdev->sata_dwc_regs->dmacr)); @@ -779,7 +778,7 @@ static void sata_dwc_enable_interrupts(struct sata_dwc_device *hsdev) */ out_le32(&hsdev->sata_dwc_regs->errmr, SATA_DWC_SERROR_ERR_BITS); - dev_dbg(host_pvt.dwc_dev, "%s: INTMR = 0x%08x, ERRMR = 0x%08x\n", + dev_dbg(hsdev->dev, "%s: INTMR = 0x%08x, ERRMR = 0x%08x\n", __func__, in_le32(&hsdev->sata_dwc_regs->intmr), in_le32(&hsdev->sata_dwc_regs->errmr)); } @@ -855,7 +854,7 @@ static int sata_dwc_port_start(struct ata_port *ap) hsdevp->hsdev = hsdev; hsdevp->dws = &sata_dwc_dma_dws; - hsdevp->dws->dma_dev = host_pvt.dwc_dev; + hsdevp->dws->dma_dev = hsdev->dev; dma_cap_zero(mask); dma_cap_set(DMA_SLAVE, mask); @@ -863,7 +862,7 @@ static int sata_dwc_port_start(struct ata_port *ap) /* Acquire DMA channel */ hsdevp->chan = dma_request_channel(mask, sata_dwc_dma_filter, hsdevp); if (!hsdevp->chan) { - dev_err(host_pvt.dwc_dev, "%s: dma channel unavailable\n", + dev_err(hsdev->dev, "%s: dma channel unavailable\n", __func__); err = -EAGAIN; goto CLEANUP_ALLOC; @@ -990,7 +989,7 @@ static void sata_dwc_bmdma_start_by_tag(struct ata_queued_cmd *qc, u8 tag) "%s qc=%p tag: %x cmd: 0x%02x dma_dir: %s start_dma? %x\n", __func__, qc, tag, qc->tf.command, get_dma_dir_descript(qc->dma_dir), start_dma); - sata_dwc_tf_dump(&(qc->tf)); + sata_dwc_tf_dump(ap, &qc->tf); if (start_dma) { reg = core_scr_read(SCR_ERROR); @@ -1244,7 +1243,7 @@ static int sata_dwc_probe(struct platform_device *ofdev) } /* Save dev for later use in dev_xxx() routines */ - host_pvt.dwc_dev = &ofdev->dev; + hsdev->dev = &ofdev->dev; hsdev->dma->dev = &ofdev->dev; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code
On 03/02/2015 10:10 PM, Michael Ellerman wrote: > On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: >> This patchset simplifies the usage of rtas_ibm_suspend_me() by removing an >> extraneous function parameter, fixes device tree updating on little endian >> platforms, and adds a mechanism for informing drmgr that the kernel is >> capable of >> performing the whole migration including device tree update itself. >> >> Tyrel Datwyler (3): >> powerpc/pseries: Simplify check for suspendability during >> suspend/migration >> powerpc/pseries: Little endian fixes for post mobility device tree >> update >> powerpc/pseries: Expose post-migration in kernel device tree update >> to drmgr > > Hi Tyrel, > > Firstly let me say how much I hate this code, so thanks for working on it :) I did it once. Might as well sacrifice my sanity a second time. :) > > But I need you to split this series, into 1) fixes for 4.0 (and stable?), and > 2) the rest. > > I *think* that would be patch 2, and then patches 1 & 3, but I don't want to > guess. So please resend. Sure. Your split seems correct as patch 2 is fixes while 1 and 3 are cosmetic/new feature. Seeing as patch 2 is endian fixes I'll Cc -stable as well. -Tyrel > > cheers
[PATCH v1 0/2] sata_dwc_460ex: move to generic DMA driver
The SATA implementation is based on two actually different devices, i.e. SATA and DMA controllers. For Synopsys DesignWare DMA we already have a generic implementation of the driver. Thus, patch 1/2 converts the code to use the DMAEngine framework and the dw_dmac driver. In future it will be better to split the devices inside DTS as well, like it's done on other platforms, and remove the hardcoded parameters of the DMA controller. Besides being a nice clean up, it removes a lot of warnings produced by the original code, which pissed off even Linus [1]. Though, this series doesn't re-enable COMPILE_TEST for this module. The driver is compile tested only on x86. So, it would be nice if anyone who has either an AMCC 460EX Canyonlands board or a similar SATA controller in possession can test this. [1] http://www.spinics.net/lists/linux-ide/msg50334.html Andy Shevchenko (2): sata_dwc_460ex: move to generic DMA driver sata_dwc_460ex: re-use hsdev->dev instead of dwc_dev drivers/ata/sata_dwc_460ex.c | 753 --- 1 file changed, 130 insertions(+), 623 deletions(-) -- 2.1.4
[PATCH v1 1/2] sata_dwc_460ex: move to generic DMA driver
The SATA implementation based on two actually different devices, i.e. SATA and DMA controllers. For Synopsys DesignWare DMA we have already a generic implementation of the driver. Thus, the patch converts the code to use DMAEngine framework and dw_dmac driver. In future it will be better to split the devices inside DTS as well like it's done on other platforms. Signed-off-by: Andy Shevchenko --- drivers/ata/sata_dwc_460ex.c | 736 +++ 1 file changed, 122 insertions(+), 614 deletions(-) diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c index 7bc0c12..08cd63f 100644 --- a/drivers/ata/sata_dwc_460ex.c +++ b/drivers/ata/sata_dwc_460ex.c @@ -36,11 +36,16 @@ #include #include #include + #include "libata.h" #include #include +/* Supported DMA engine drivers */ +#include +#include + /* These two are defined in "libata.h" */ #undef DRV_NAME #undef DRV_VERSION @@ -60,153 +65,9 @@ #define NO_IRQ 0 #endif -/* SATA DMA driver Globals */ -#define DMA_NUM_CHANS 1 -#define DMA_NUM_CHAN_REGS 8 - -/* SATA DMA Register definitions */ #define AHB_DMA_BRST_DFLT 64 /* 16 data items burst length*/ -struct dmareg { - u32 low;/* Low bits 0-31 */ - u32 high; /* High bits 32-63 */ -}; - -/* DMA Per Channel registers */ -struct dma_chan_regs { - struct dmareg sar; /* Source Address */ - struct dmareg dar; /* Destination address */ - struct dmareg llp; /* Linked List Pointer */ - struct dmareg ctl; /* Control */ - struct dmareg sstat;/* Source Status not implemented in core */ - struct dmareg dstat;/* Destination Status not implemented in core*/ - struct dmareg sstatar; /* Source Status Address not impl in core */ - struct dmareg dstatar; /* Destination Status Address not implemente */ - struct dmareg cfg; /* Config */ - struct dmareg sgr; /* Source Gather */ - struct dmareg dsr; /* Destination Scatter */ -}; - -/* Generic Interrupt Registers */ -struct dma_interrupt_regs { - struct dmareg tfr; /* Transfer Interrupt */ - struct dmareg block;/* Block Interrupt */ - struct 
dmareg srctran; /* Source Transfer Interrupt */ - struct dmareg dsttran; /* Dest Transfer Interrupt */ - struct dmareg error;/* Error */ -}; - -struct ahb_dma_regs { - struct dma_chan_regschan_regs[DMA_NUM_CHAN_REGS]; - struct dma_interrupt_regs interrupt_raw;/* Raw Interrupt */ - struct dma_interrupt_regs interrupt_status; /* Interrupt Status */ - struct dma_interrupt_regs interrupt_mask; /* Interrupt Mask */ - struct dma_interrupt_regs interrupt_clear; /* Interrupt Clear */ - struct dmareg statusInt; /* Interrupt combined*/ - struct dmareg rq_srcreg; /* Src Trans Req */ - struct dmareg rq_dstreg; /* Dst Trans Req */ - struct dmareg rq_sgl_srcreg; /* Sngl Src Trans Req*/ - struct dmareg rq_sgl_dstreg; /* Sngl Dst Trans Req*/ - struct dmareg rq_lst_srcreg; /* Last Src Trans Req*/ - struct dmareg rq_lst_dstreg; /* Last Dst Trans Req*/ - struct dmareg dma_cfg;/* DMA Config */ - struct dmareg dma_chan_en;/* DMA Channel Enable*/ - struct dmareg dma_id; /* DMA ID */ - struct dmareg dma_test; /* DMA Test */ - struct dmareg res1; /* reserved */ - struct dmareg res2; /* reserved */ - /* -* DMA Comp Params -* Param 6 = dma_param[0], Param 5 = dma_param[1], -* Param 4 = dma_param[2] ... -*/ - struct dmareg dma_params[6]; -}; - -/* Data structure for linked list item */ -struct lli { - u32 sar;/* Source Address */ - u32 dar;/* Destination address */ - u32 llp;/* Linked List Pointer */ - struct dmareg ctl;/* Control */ - struct dmareg dstat; /* Destination Status */ -}; - -enum { - SATA_DWC_DMAC_LLI_SZ = (sizeof(struct lli)), - SATA_DWC_DMAC_LLI_NUM = 256, - SATA_DWC_DMAC_LLI_TBL_SZ = (SATA_DWC_DMAC_LLI_SZ * \ - SATA_DWC_DMAC_LLI_NUM), - SATA_DWC_DMAC_TWIDTH_BYTES = 4, - SATA_DWC_DMAC_CTRL_TSIZE_MAX = (0x0800 * \ - SATA_DWC_DMAC_TWIDTH_BYTES), -}; - -/* DMA Register Operation Bits */ enum { - DMA_EN = 0x0001, /* Enable AHB DMA */ - DMA_CTL_LLP_SRCEN = 0x1000, /* Blk chain enable Src */ - DMA_CTL_LLP_DSTEN = 0x0800, /* Blk chain enable Dst */
Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
On 03/02/2015 10:15 PM, Michael Ellerman wrote: > On Mon, 2015-03-02 at 13:30 -0800, Tyrel Datwyler wrote: >> On 03/01/2015 08:19 PM, Cyril Bur wrote: >>> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: During suspend/migration operation we must wait for the VASI state reported by the hypervisor to become Suspending prior to making the ibm,suspend-me RTAS call. Calling routines to rtas_ibm_supend_me() pass a vasi_state variable that exposes the VASI state to the caller. This is unnecessary as the caller only really cares about the following three conditions; if there is an error we should bailout, success indicating we have suspended and woken back up so proceed to device tree updated, or we are not suspendable yet so try calling rtas_ibm_suspend_me again shortly. This patch removes the extraneous vasi_state variable and simply uses the return code to communicate how to proceed. We either succeed, fail, or get -EAGAIN in which case we sleep for a second before trying to call rtas_ibm_suspend_me again. u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32) | be32_to_cpu(args.args[1]); - rc = rtas_ibm_suspend_me(handle, &vasi_rc); - args.rets[0] = cpu_to_be32(vasi_rc); - if (rc) + rc = rtas_ibm_suspend_me(handle); + if (rc == -EAGAIN) + args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE); >>> >>> (continuing on...) so perhaps here have >>> rc = 0; >>> else if (rc == -EIO) >>> args.rets[0] = cpu_to_be32(-1); >>> rc = 0; >>> Which should keep the original behaviour, the last thing we want to do >>> is break BE. >> >> The biggest problem here is we are making what basically equates to a >> fake rtas call from drmgr which we intercept in ppc_rtas(). From there >> we make this special call to rtas_ibm_suspend_me() to check VASI state >> and do a bunch of other specialized work that needs to be setup prior to >> making the actual ibm,suspend-me rtas call. Since, we are cheating PAPR >> here I guess we can really handle it however we want. 
I chose to simply >> fail the rtas call in the case where rtas_ibm_suspend_me() fails with >> something other than -EAGAIN. In user space librtas will log errno for >> the failure and return RTAS_IO_ASSERT to drmgr which in turn will log >> that error and fail. > > We don't want to change the return values of the syscall unless we absolutely > have to. And I don't think that's the case here. I'd like to argue that the one case I changed makes sense, but it's just as easy to keep the original behavior. > > Sure we think drmgr is the only thing that uses this crap, but we don't know > for sure. I can't imagine how anybody else could possibly use this hack without a streamid from the hmc/hypervisor, but I've been wrong in the past more times than I can count. :) -Tyrel > > cheers
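The retry behavior the patch description relies on can be sketched with a small toy model (the "hypervisor" below is faked; in the real code rtas_ibm_suspend_me() maps a not-yet-Suspending VASI state to -EAGAIN and the caller sleeps about a second between polls):

```c
#include <assert.h>
#include <errno.h>

/* Pretend the VASI state only becomes Suspending after three polls. */
static int polls_left = 3;

static int fake_suspend_me(void)
{
	return polls_left-- > 0 ? -EAGAIN : 0;
}

/* Retry loop matching the described behavior: keep polling while the
 * partition is not suspendable yet, bail out on any other error. */
static int migrate_model(void)
{
	int rc;

	do {
		rc = fake_suspend_me();
		/* real code sleeps ~1s between polls; omitted here */
	} while (rc == -EAGAIN);

	return rc;
}
```

This is what lets the caller drop the separate vasi_state out-parameter: success, hard failure, and "try again" are all encoded in the single return code.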
Re: [PATCH v3] ibmveth: Add function to enable live MAC address changes
From: Thomas Falcon Date: Mon, 2 Mar 2015 11:56:12 -0600 > Add a function that will enable changing the MAC address > of an ibmveth interface while it is still running. > > Signed-off-by: Thomas Falcon Applied, thanks.
Re: [PATCH] spi: fsl-spi: use of_iomap() to map parameter ram on CPM1
On Thu, Feb 26, 2015 at 05:11:42PM +0100, Christophe Leroy wrote: > On CPM2, the SPI parameter RAM is dynamically allocated in the dualport RAM > whereas in CPM1, it is statically allocated to a default address with > capability to relocate it somewhere else via the use of CPM micropatch. > The address of the parameter RAM is given by the boot loader and expected > to be mapped via of_iomap() Why are we using of_iomap() rather than a generic I/O mapping function here?
Re: [PATCH v2 0/5] split ET_DYN ASLR from mmap ASLR
On Mon, Mar 2, 2015 at 11:31 PM, Ingo Molnar wrote: > > * Kees Cook wrote: > >> To address the "offset2lib" ASLR weakness[1], this separates ET_DYN >> ASLR from mmap ASLR, as already done on s390. The architectures >> that are already randomizing mmap (arm, arm64, mips, powerpc, s390, >> and x86), have their various forms of arch_mmap_rnd() made available >> via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures, >> arch_randomize_brk() is collapsed as well. >> >> This is an alternative to the solutions in: >> https://lkml.org/lkml/2015/2/23/442 > > Looks good so far: > > Reviewed-by: Ingo Molnar > > While reviewing this series I also noticed that the following code > could be factored out from architecture mmap code as well: > > - arch_pick_mmap_layout() uses very similar patterns across the > platforms, with only few variations. Many architectures use > the same duplicated mmap_is_legacy() helper as well. There's > usually just trivial differences between mmap_legacy_base() > approaches as well. I was nervous to start refactoring this code, but it's true: most of it is the same. > - arch_mmap_rnd(): the PF_RANDOMIZE checks are needlessly > exposed to the arch routine - the arch routine should only > concentrate on arch details, not generic flags like > PF_RANDOMIZE. Yeah, excellent point. I will send a follow-up patch to move this into binfmt_elf instead. I'd like to avoid removing it in any of the other patches since each was attempting a single step in the refactoring. > In theory the mmap layout could be fully parametrized as well: i.e. no > callback functions to architectures by default at all: just > declarations of bits of randomization desired (or, available address > space bits), and perhaps an arch helper to allow 32-bit vs. 64-bit > address space distinctions. Yeah, I was considering that too, since each architecture has a nearly identical arch_mmap_rnd() at this point. Only the size of the entropy was changing. 
> 'Weird' architectures could provide special routines, but only by > overriding the default behavior, which should be generic, safe and > robust. Yeah, quite true. Should entropy size be a #define like ELF_ET_DYN_BASE? Something like ASLR_MMAP_ENTROPY and ASLR_MMAP_ENTROPY_32? Is there a common function for determining a compat task? That seemed to be per-arch too. Maybe arch_mmap_entropy()? -Kees -- Kees Cook Chrome OS Security ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
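The parametrization being discussed can be sketched as follows. This is a hypothetical user-space model, not an actual kernel interface: each arch would only declare how many bits of entropy it wants (the names and the assumed PAGE_SHIFT of 12 are illustrative), and a single generic helper would turn a random value into a page-aligned mmap offset.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4kB pages for this sketch */

/* Generic randomization: mask off the requested number of entropy
 * bits, then shift into a page-aligned offset. An arch would supply
 * only rnd_bits (possibly different for compat/32-bit tasks). */
static unsigned long generic_mmap_rnd(unsigned int rnd_bits,
				      unsigned long random)
{
	unsigned long rnd = random & ((1UL << rnd_bits) - 1);

	return rnd << MODEL_PAGE_SHIFT;
}
```

With this shape, the per-arch arch_mmap_rnd() implementations collapse into two numbers, and only genuinely unusual architectures would override the default routine.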
Re: [PATCH v2] seccomp: switch to using asm-generic for seccomp.h
On Tue, Mar 3, 2015 at 12:30 AM, Ingo Molnar wrote: > > * Kees Cook wrote: > >> Most architectures don't need to do anything special for the strict >> seccomp syscall entries. Remove the redundant headers and reduce the >> others. > >> 19 files changed, 27 insertions(+), 137 deletions(-) > > Lovely cleanup factor. > > Just to make sure, are you sure the 32-bit details are identical > across architectures? I did "gcc -E -dM" style output comparisons on the architectures I had compilers for, and the buildbot hasn't complained on any of the others (though see the bottom of this email). > > For example some architectures did this: > >> --- a/arch/microblaze/include/asm/seccomp.h >> +++ /dev/null >> @@ -1,16 +0,0 @@ >> -#ifndef _ASM_MICROBLAZE_SECCOMP_H >> -#define _ASM_MICROBLAZE_SECCOMP_H >> - >> -#include >> - >> -#define __NR_seccomp_read__NR_read >> -#define __NR_seccomp_write __NR_write >> -#define __NR_seccomp_exit__NR_exit >> -#define __NR_seccomp_sigreturn __NR_sigreturn >> - >> -#define __NR_seccomp_read_32 __NR_read >> -#define __NR_seccomp_write_32__NR_write >> -#define __NR_seccomp_exit_32 __NR_exit >> -#define __NR_seccomp_sigreturn_32__NR_sigreturn The asm-generic uses the same syscall numbers from both 64 and 32, which matches most architectures, and those are the ones that had their seccomp.h entirely eliminated. > others did this: > >> diff --git a/arch/x86/include/asm/seccomp_64.h >> b/arch/x86/include/asm/seccomp_64.h >> deleted file mode 100644 >> index 84ec1bd161a5.. 
>> --- a/arch/x86/include/asm/seccomp_64.h >> +++ /dev/null >> @@ -1,17 +0,0 @@ >> -#ifndef _ASM_X86_SECCOMP_64_H >> -#define _ASM_X86_SECCOMP_64_H >> - >> -#include >> -#include >> - >> -#define __NR_seccomp_read __NR_read >> -#define __NR_seccomp_write __NR_write >> -#define __NR_seccomp_exit __NR_exit >> -#define __NR_seccomp_sigreturn __NR_rt_sigreturn >> - >> -#define __NR_seccomp_read_32 __NR_ia32_read >> -#define __NR_seccomp_write_32 __NR_ia32_write >> -#define __NR_seccomp_exit_32 __NR_ia32_exit >> -#define __NR_seccomp_sigreturn_32 __NR_ia32_sigreturn >> - >> -#endif /* _ASM_X86_SECCOMP_64_H */ Well, this was x86's split config that was consolidated into the file below: > > While in yet another case you kept the syscall mappings: > >> --- a/arch/x86/include/asm/seccomp.h >> +++ b/arch/x86/include/asm/seccomp.h >> @@ -1,5 +1,20 @@ >> +#ifndef _ASM_X86_SECCOMP_H >> +#define _ASM_X86_SECCOMP_H >> + >> +#include >> + >> +#ifdef CONFIG_COMPAT >> +#include >> +#define __NR_seccomp_read_32 __NR_ia32_read >> +#define __NR_seccomp_write_32__NR_ia32_write >> +#define __NR_seccomp_exit_32 __NR_ia32_exit >> +#define __NR_seccomp_sigreturn_32__NR_ia32_sigreturn >> +#endif >> + >> #ifdef CONFIG_X86_32 >> -# include >> -#else >> -# include >> +#define __NR_seccomp_sigreturn __NR_sigreturn >> #endif >> + >> +#include >> + >> +#endif /* _ASM_X86_SECCOMP_H */ > > It might all be correct, but it's not obvious to me. The x86 change was the most complex as it removed a seccomp_32. and seccomp_64.h file and merged into a single asm/seccomp.h to provide overrides for the _32 #defines. However, in looking at it now... I see some flip/flopping of __NR_sigreturn and __NR_rt_sigreturn between some of the architectures. Let me study that and send a v3. I think there are some accidental changes on microblaze and powerpc. -Kees -- Kees Cook Chrome OS Security ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
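The asm-generic pattern the cleanup relies on can be modeled like this (the macro names and numbers below are placeholders, not real syscall numbers): the compat `_32` definitions default to the native ones unless the arch header defined its own values before including the generic file, which is exactly how x86 overrides the ia32 numbers above.

```c
#include <assert.h>

#define MODEL_NR_read 63	/* placeholder "native" syscall number */

/* asm-generic style fallback: only define the compat variant if the
 * arch did not already provide one. */
#ifndef MODEL_NR_seccomp_read_32
#define MODEL_NR_seccomp_read_32 MODEL_NR_read
#endif

static int seccomp_read_32_nr(void)
{
	return MODEL_NR_seccomp_read_32;
}
```

Architectures whose 64-bit and 32-bit syscall numbers coincide can then delete their seccomp.h entirely, while x86-style split-number architectures keep only the overrides.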
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Tue, Mar 3, 2015 at 10:44 AM, Horia Geantă wrote: > On 3/3/2015 12:09 AM, Martin Hicks wrote: >> >> On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote: >>> >>> If crypto API allows to encrypt more sectors in one run >>> (handling IV internally) dmcrypt can be modified of course. >>> >>> But do not forget we can use another IV (not only sequential number) >>> e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people >>> are using it). >> >> Interesting, I'd not considered using XTS with an IV other than plain/64. >> The talitos hardware would not support aes/xts in any mode other than >> plain/plain64 I don't think...Although perhaps you could push in an 8-byte >> IV and the hardware would interpret it as the sector #. >> > > For talitos, there are two cases: > > 1. request data size is <= data unit / sector size > talitos can handle any IV / tweak scheme > > 2. request data size > sector size > since talitos internally generates the IV for the next sector by > incrementing the previous IV, only IV schemes that allocate consecutive > IV to consecutive sectors will function correctly. > it's not clear to me that #1 is right. I guess it could be, but the IV length would be limited to 8 bytes. This also points out that claiming that the XTS IV size is 16 bytes, as my current patch does, could be problematic. It's handy because the first 8 bytes should contain a plain64 sector #, and the second u64 can be used to encode the sector size but it would be a mistake for someone to use the second 8 bytes for the rest of a 16byte IV. mh -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On 3/3/2015 12:09 AM, Martin Hicks wrote: > > On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote: >> >> If crypto API allows to encrypt more sectors in one run >> (handling IV internally) dmcrypt can be modified of course. >> >> But do not forget we can use another IV (not only sequential number) >> e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people >> are using it). > > Interesting, I'd not considered using XTS with an IV other than plain/64. > The talitos hardware would not support aes/xts in any mode other than > plain/plain64 I don't think...Although perhaps you could push in an 8-byte > IV and the hardware would interpret it as the sector #. > For talitos, there are two cases: 1. request data size is <= data unit / sector size talitos can handle any IV / tweak scheme 2. request data size > sector size since talitos internally generates the IV for the next sector by incrementing the previous IV, only IV schemes that allocate consecutive IV to consecutive sectors will function correctly. Let's not forget what XTS standard says about IVs / tweak values: - each data unit (sector in this case) is assigned a non-negative tweak value and - tweak values are assigned *consecutively*, starting from an arbitrary non-negative value - there's no requirement for tweak values to be unpredictable Thus, in theory ESSIV is not supposed to be used with XTS mode: the IVs for consecutive sectors are not consecutive values. In practice, as Milan said, the combination is sometimes used. It functions correctly in SW (and also in talitos as long as req. data size <= sector size). >> Maybe the following question would be if the dmcrypt sector IV algorithms >> should moved into crypto API as well. >> (But because I misused dmcrypt IVs hooks for some additional operations >> for loopAES and old Truecrypt CBC mode, it is not so simple...) > > Speaking again with talitos in mind, there would be no advantage for this > hardware. 
Although larger requests are possible, only a single IV can be > provided per request, so for algorithms like AES-CBC and dm-crypt, 512-byte IOs > are the only option (short of switching to a 4kB block size). Right, as explained above talitos does what the XTS mode standard mandates. So it won't work properly in the case of cbc-aes:essiv with request sizes larger than the sector size. Still, in SW at least, XTS could be improved to process more sectors in one shot, regardless of the IV scheme used - as long as there's an IV.next() function and both data size and sector size are known. Horia
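The constraint described in this thread can be captured in a small toy model: hardware that derives the IV for sector n+1 by incrementing the IV of sector n matches only IV schemes that assign consecutive values to consecutive sectors (plain64), not permuting schemes like ESSIV. The ESSIV stand-in below is a toy mix function, not real ESSIV, chosen only because it breaks consecutiveness the same way an encrypted IV would.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*iv_fn)(uint64_t sector);

static uint64_t plain64_iv(uint64_t sector)
{
	return sector;	/* tweak == sector number */
}

static uint64_t essiv_like_iv(uint64_t sector)
{
	return sector * 2654435761u;	/* toy stand-in for an encrypted IV */
}

/* Returns 1 if auto-incrementing from iv(start), as the hardware does,
 * reproduces the scheme's IV for each of nsec consecutive sectors. */
static int hw_increment_matches(iv_fn iv, uint64_t start, unsigned int nsec)
{
	uint64_t hw = iv(start);
	unsigned int i;

	for (i = 0; i < nsec; i++, hw++)
		if (iv(start + i) != hw)
			return 0;
	return 1;
}
```

This is why multi-sector requests work on talitos with plain/plain64 but fall back to one-sector-per-request for other schemes.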
[PATCH v2 5/5] crypto: talitos: Add software backlog queue handling
I was running into situations where the hardware FIFO was filling up, and the code was returning EAGAIN to dm-crypt and just dropping the submitted crypto request. This adds support in talitos for a software backlog queue. When requests can't be queued to the hardware immediately EBUSY is returned. The queued requests are dispatched to the hardware in received order as hardware FIFO slots become available. Signed-off-by: Martin Hicks --- drivers/crypto/talitos.c | 135 -- drivers/crypto/talitos.h |3 ++ 2 files changed, 110 insertions(+), 28 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index b0c85ce..bb7fba0 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -182,55 +182,118 @@ static int init_device(struct device *dev) return 0; } -/** - * talitos_submit - submits a descriptor to the device for processing - * @dev: the SEC device to be used - * @ch:the SEC device channel to be used - * @edesc: the descriptor to be processed by the device - * - * desc must contain valid dma-mapped (bus physical) address pointers. - * callback must check err and feedback in descriptor header - * for device processing status. 
- */ -int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) +/* Dispatch 'request' if provided, otherwise a backlogged request */ +static int __talitos_handle_queue(struct device *dev, int ch, + struct talitos_edesc *edesc, + unsigned long *irq_flags) { struct talitos_private *priv = dev_get_drvdata(dev); - struct talitos_request *request = &edesc->req; - unsigned long flags; + struct talitos_request *request; + struct crypto_async_request *areq; int head; + int ret = -EINPROGRESS; - spin_lock_irqsave(&priv->chan[ch].head_lock, flags); - - if (!atomic_inc_not_zero(&priv->chan[ch].submit_count)) { + if (!atomic_inc_not_zero(&priv->chan[ch].submit_count)) /* h/w fifo is full */ - spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags); - return -EAGAIN; + return -EBUSY; + + if (!edesc) { + /* Dequeue the oldest request */ + areq = crypto_dequeue_request(&priv->chan[ch].queue); + request = container_of(areq, struct talitos_request, base); + } else { + request = &edesc->req; } - head = priv->chan[ch].head; request->dma_desc = dma_map_single(dev, request->desc, sizeof(*request->desc), DMA_BIDIRECTIONAL); /* increment fifo head */ + head = priv->chan[ch].head; priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1); - smp_wmb(); - priv->chan[ch].fifo[head] = request; + spin_unlock_irqrestore(&priv->chan[ch].head_lock, *irq_flags); + + /* +* Mark a backlogged request as in-progress. +*/ + if (!edesc) { + areq = request->context; + areq->complete(areq, -EINPROGRESS); + } + + spin_lock_irqsave(&priv->chan[ch].head_lock, *irq_flags); /* GO! 
*/ + priv->chan[ch].fifo[head] = request; wmb(); out_be32(priv->chan[ch].reg + TALITOS_FF, upper_32_bits(request->dma_desc)); out_be32(priv->chan[ch].reg + TALITOS_FF_LO, lower_32_bits(request->dma_desc)); + return ret; +} + +/** + * talitos_submit - performs submissions of a new descriptors + * + * @dev: the SEC device to be used + * @ch:the SEC device channel to be used + * @edesc: the request to be processed by the device + * + * edesc->req must contain valid dma-mapped (bus physical) address pointers. + * callback must check err and feedback in descriptor header + * for device processing status upon completion. + */ +int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) +{ + struct talitos_private *priv = dev_get_drvdata(dev); + struct talitos_request *request = &edesc->req; + unsigned long flags; + int ret = -EINPROGRESS; + + spin_lock_irqsave(&priv->chan[ch].head_lock, flags); + + if (priv->chan[ch].queue.qlen) { + /* +* There are backlogged requests. Just queue this new request +* and dispatch the oldest backlogged request to the hardware. +*/ + crypto_enqueue_request(&priv->chan[ch].queue, + &request->base); + __talitos_handle_queue(dev, ch, NULL, &flags); + ret = -EBUSY; + } else { + ret = __talitos_handle_queue(dev, ch, edesc, &flags); + if (ret == -EBUSY) +
[PATCH v2 4/5] crypto: talitos: Reorganize request submission data structures
This is preparatory work for moving to using the crypto async queue
handling code.  A talitos_request structure is buried into each
talitos_edesc so that when talitos_submit() is called, everything
required to defer the submission to the hardware is contained within
talitos_edesc.

Signed-off-by: Martin Hicks
---
 drivers/crypto/talitos.c | 95 +++---
 drivers/crypto/talitos.h | 41 +---
 2 files changed, 66 insertions(+), 70 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 7709805..b0c85ce 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -186,22 +186,16 @@ static int init_device(struct device *dev)
  * talitos_submit - submits a descriptor to the device for processing
  * @dev:	the SEC device to be used
  * @ch:		the SEC device channel to be used
- * @desc: the descriptor to be processed by the device
- * @callback: whom to call when processing is complete
- * @context: a handle for use by caller (optional)
+ * @edesc: the descriptor to be processed by the device
  *
  * desc must contain valid dma-mapped (bus physical) address pointers.
  * callback must check err and feedback in descriptor header
  * for device processing status.
  */
-int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc,
-		   void (*callback)(struct device *dev,
-				    struct talitos_desc *desc,
-				    void *context, int error),
-		   void *context)
+int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
 {
 	struct talitos_private *priv = dev_get_drvdata(dev);
-	struct talitos_request *request;
+	struct talitos_request *request = &edesc->req;
 	unsigned long flags;
 	int head;
 
@@ -214,19 +208,15 @@ int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc,
 	}
 
 	head = priv->chan[ch].head;
-	request = &priv->chan[ch].fifo[head];
-
-	/* map descriptor and save caller data */
-	request->dma_desc = dma_map_single(dev, desc, sizeof(*desc),
+	request->dma_desc = dma_map_single(dev, request->desc,
+					   sizeof(*request->desc),
 					   DMA_BIDIRECTIONAL);
-	request->callback = callback;
-	request->context = context;
 
 	/* increment fifo head */
 	priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1);
 
 	smp_wmb();
-	request->desc = desc;
+	priv->chan[ch].fifo[head] = request;
 
 	/* GO! */
 	wmb();
@@ -247,15 +237,16 @@ EXPORT_SYMBOL(talitos_submit);
 static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
 {
 	struct talitos_private *priv = dev_get_drvdata(dev);
-	struct talitos_request *request, saved_req;
+	struct talitos_request *request;
 	unsigned long flags;
 	int tail, status;
 
 	spin_lock_irqsave(&priv->chan[ch].tail_lock, flags);
 
 	tail = priv->chan[ch].tail;
-	while (priv->chan[ch].fifo[tail].desc) {
-		request = &priv->chan[ch].fifo[tail];
+	while (priv->chan[ch].fifo[tail]) {
+		request = priv->chan[ch].fifo[tail];
+		status = 0;
 
 		/* descriptors with their done bits set don't get the error */
 		rmb();
@@ -271,14 +262,9 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
 				 sizeof(struct talitos_desc),
 				 DMA_BIDIRECTIONAL);
 
-		/* copy entries so we can call callback outside lock */
-		saved_req.desc = request->desc;
-		saved_req.callback = request->callback;
-		saved_req.context = request->context;
-
 		/* release request entry in fifo */
 		smp_wmb();
-		request->desc = NULL;
+		priv->chan[ch].fifo[tail] = NULL;
 
 		/* increment fifo tail */
 		priv->chan[ch].tail = (tail + 1) & (priv->fifo_len - 1);
@@ -287,8 +273,8 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
 
 		atomic_dec(&priv->chan[ch].submit_count);
 
-		saved_req.callback(dev, saved_req.desc, saved_req.context,
-				   status);
+		request->callback(dev, request->desc, request->context, status);
+
 		/* channel may resume processing in single desc error case */
 		if (error && !reset_ch && status == error)
 			return;
@@ -352,7 +338,8 @@ static u32 current_desc_hdr(struct device *dev, int ch)
 	tail = priv->chan[ch].tail;
 
 	iter = tail;
-	while (priv->chan[ch].fifo[iter].dma_desc != cur_desc) {
+	while (priv->chan[ch].fifo[iter] &&
+
[PATCH v2 3/5] crypto: talitos: Fix off-by-one and use all hardware slots
The submission count was off by one.

Signed-off-by: Martin Hicks
---
 drivers/crypto/talitos.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 89cf4d5..7709805 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2722,8 +2722,7 @@ static int talitos_probe(struct platform_device *ofdev)
 			goto err_out;
 		}
 
-		atomic_set(&priv->chan[i].submit_count,
-			   -(priv->chfifo_len - 1));
+		atomic_set(&priv->chan[i].submit_count, -priv->chfifo_len);
 	}
 
 	dma_set_mask(dev, DMA_BIT_MASK(36));
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 2/5] crypto: talitos: Remove MD5_BLOCK_SIZE
This is properly defined in the md5 header file.

Signed-off-by: Martin Hicks
---
 drivers/crypto/talitos.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index c49d977..89cf4d5 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -637,8 +637,6 @@ static void talitos_unregister_rng(struct device *dev)
 #define TALITOS_MAX_KEY_SIZE		96
 #define TALITOS_MAX_IV_LENGTH		16 /* max of AES_BLOCK_SIZE, DES3_EDE_BLOCK_SIZE */
 
-#define MD5_BLOCK_SIZE	64
-
 struct talitos_ctx {
 	struct device *dev;
 	int ch;
@@ -2195,7 +2193,7 @@ static struct talitos_alg_template driver_algs[] = {
 		.halg.base = {
 			.cra_name = "md5",
 			.cra_driver_name = "md5-talitos",
-			.cra_blocksize = MD5_BLOCK_SIZE,
+			.cra_blocksize = MD5_HMAC_BLOCK_SIZE,
 			.cra_flags = CRYPTO_ALG_TYPE_AHASH |
 				     CRYPTO_ALG_ASYNC,
 		}
@@ -2285,7 +2283,7 @@ static struct talitos_alg_template driver_algs[] = {
 		.halg.base = {
 			.cra_name = "hmac(md5)",
 			.cra_driver_name = "hmac-md5-talitos",
-			.cra_blocksize = MD5_BLOCK_SIZE,
+			.cra_blocksize = MD5_HMAC_BLOCK_SIZE,
 			.cra_flags = CRYPTO_ALG_TYPE_AHASH |
 				     CRYPTO_ALG_ASYNC,
 		}
-- 
1.7.10.4
[PATCH v2 0/5] crypto: talitos: Add crypto async queue handling
I was testing dm-crypt performance with a Freescale P1022 board with
a recent kernel and was getting IO errors while doing testing with LUKS.
Investigation showed that all hardware FIFO slots were filling and the
driver was returning EAGAIN to the block layer, which is not an expected
response for an async crypto implementation.

The following patch series adds a few small fixes, and reworks the
submission path to use the crypto_queue mechanism to handle the request
backlog.

Changes since v1:
 - Ran checkpatch.pl
 - Split the path for submitting new requests vs. issuing backlogged
   requests.
 - Avoid enqueuing a submitted request to the crypto queue unnecessarily.
 - Fix return paths where CRYPTO_TFM_REQ_MAY_BACKLOG is not set.

Martin Hicks (5):
  crypto: talitos: Simplify per-channel initialization
  crypto: talitos: Remove MD5_BLOCK_SIZE
  crypto: talitos: Fix off-by-one and use all hardware slots
  crypto: talitos: Reorganize request submission data structures
  crypto: talitos: Add software backlog queue handling

 drivers/crypto/talitos.c | 240 +++---
 drivers/crypto/talitos.h |  44 +++--
 2 files changed, 177 insertions(+), 107 deletions(-)

-- 
1.7.10.4
[PATCH v2 1/5] crypto: talitos: Simplify per-channel initialization
There were multiple loops in a row, for each separate step of the
initialization of the channels.  Simplify to a single loop.

Signed-off-by: Martin Hicks
---
 drivers/crypto/talitos.c | 11 +++------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 067ec21..c49d977 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2706,20 +2706,16 @@ static int talitos_probe(struct platform_device *ofdev)
 		goto err_out;
 	}
 
+	priv->fifo_len = roundup_pow_of_two(priv->chfifo_len);
+
 	for (i = 0; i < priv->num_channels; i++) {
 		priv->chan[i].reg = priv->reg + TALITOS_CH_STRIDE * (i + 1);
 		if (!priv->irq[1] || !(i & 1))
 			priv->chan[i].reg += TALITOS_CH_BASE_OFFSET;
-	}
 
-	for (i = 0; i < priv->num_channels; i++) {
 		spin_lock_init(&priv->chan[i].head_lock);
 		spin_lock_init(&priv->chan[i].tail_lock);
-	}
 
-	priv->fifo_len = roundup_pow_of_two(priv->chfifo_len);
-
-	for (i = 0; i < priv->num_channels; i++) {
 		priv->chan[i].fifo = kzalloc(sizeof(struct talitos_request) *
 					     priv->fifo_len, GFP_KERNEL);
 		if (!priv->chan[i].fifo) {
@@ -2727,11 +2723,10 @@ static int talitos_probe(struct platform_device *ofdev)
 			err = -ENOMEM;
 			goto err_out;
 		}
-	}
 
-	for (i = 0; i < priv->num_channels; i++)
 		atomic_set(&priv->chan[i].submit_count,
 			   -(priv->chfifo_len - 1));
+	}
 
 	dma_set_mask(dev, DMA_BIT_MASK(36));
-- 
1.7.10.4
[PATCH v2] powerpc/pmac: Fix DT refcount imbalance in pmac_pic_probe_oldstyle
Internally, of_find_node_by_name() calls of_node_put() on its "from"
parameter, which must not be done on "master", as it's still in use,
and will be released manually later.  This may cause a zero kref
refcount.  Call of_node_get() before to compensate for this.

Signed-off-by: Geert Uytterhoeven
---
Compile-tested only

v2:
  - Avoid a logic change by adding a call to of_node_get() instead of
    replacing of_find_node_by_name() by of_get_child_by_name().

No-one seems to remember which machines are affected by this.
---
 arch/powerpc/platforms/powermac/pic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powermac/pic.c b/arch/powerpc/platforms/powermac/pic.c
index 4c24bf60d39d2834..59cfc9d63c2d51a7 100644
--- a/arch/powerpc/platforms/powermac/pic.c
+++ b/arch/powerpc/platforms/powermac/pic.c
@@ -321,6 +321,9 @@ static void __init pmac_pic_probe_oldstyle(void)
 		max_irqs = max_real_irqs = 64;
 
 	/* We might have a second cascaded heathrow */
+
+	/* Compensate for of_node_put() in of_find_node_by_name() */
+	of_node_get(master);
 	slave = of_find_node_by_name(master, "mac-io");
 
 	/* Check ordering of master & slave */
-- 
1.9.1
Re: [PATCH v2] seccomp: switch to using asm-generic for seccomp.h
* Kees Cook wrote:

> Most architectures don't need to do anything special for the strict
> seccomp syscall entries. Remove the redundant headers and reduce the
> others.

>  19 files changed, 27 insertions(+), 137 deletions(-)

Lovely cleanup factor.

Just to make sure, are you sure the 32-bit details are identical across
architectures? For example some architectures did this:

> --- a/arch/microblaze/include/asm/seccomp.h
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -#ifndef _ASM_MICROBLAZE_SECCOMP_H
> -#define _ASM_MICROBLAZE_SECCOMP_H
> -
> -#include
> -
> -#define __NR_seccomp_read		__NR_read
> -#define __NR_seccomp_write		__NR_write
> -#define __NR_seccomp_exit		__NR_exit
> -#define __NR_seccomp_sigreturn		__NR_sigreturn
> -
> -#define __NR_seccomp_read_32		__NR_read
> -#define __NR_seccomp_write_32		__NR_write
> -#define __NR_seccomp_exit_32		__NR_exit
> -#define __NR_seccomp_sigreturn_32	__NR_sigreturn

others did this:

> diff --git a/arch/x86/include/asm/seccomp_64.h
> b/arch/x86/include/asm/seccomp_64.h
> deleted file mode 100644
> index 84ec1bd161a5..
> --- a/arch/x86/include/asm/seccomp_64.h
> +++ /dev/null
> @@ -1,17 +0,0 @@
> -#ifndef _ASM_X86_SECCOMP_64_H
> -#define _ASM_X86_SECCOMP_64_H
> -
> -#include
> -#include
> -
> -#define __NR_seccomp_read		__NR_read
> -#define __NR_seccomp_write		__NR_write
> -#define __NR_seccomp_exit		__NR_exit
> -#define __NR_seccomp_sigreturn		__NR_rt_sigreturn
> -
> -#define __NR_seccomp_read_32		__NR_ia32_read
> -#define __NR_seccomp_write_32		__NR_ia32_write
> -#define __NR_seccomp_exit_32		__NR_ia32_exit
> -#define __NR_seccomp_sigreturn_32	__NR_ia32_sigreturn
> -
> -#endif /* _ASM_X86_SECCOMP_64_H */

While in yet another case you kept the syscall mappings:

> --- a/arch/x86/include/asm/seccomp.h
> +++ b/arch/x86/include/asm/seccomp.h
> @@ -1,5 +1,20 @@
> +#ifndef _ASM_X86_SECCOMP_H
> +#define _ASM_X86_SECCOMP_H
> +
> +#include
> +
> +#ifdef CONFIG_COMPAT
> +#include
> +#define __NR_seccomp_read_32		__NR_ia32_read
> +#define __NR_seccomp_write_32		__NR_ia32_write
> +#define __NR_seccomp_exit_32		__NR_ia32_exit
> +#define __NR_seccomp_sigreturn_32	__NR_ia32_sigreturn
> +#endif
> +
>  #ifdef CONFIG_X86_32
> -# include
> -#else
> -# include
> +#define __NR_seccomp_sigreturn		__NR_sigreturn
>  #endif
> +
> +#include
> +
> +#endif /* _ASM_X86_SECCOMP_H */

It might all be correct, but it's not obvious to me.

Thanks,

	Ingo
Re: [PATCH v3] ibmveth: Add function to enable live MAC address changes
Mon, Mar 02, 2015 at 06:56:12PM CET, tlfal...@linux.vnet.ibm.com wrote:
>Add a function that will enable changing the MAC address
>of an ibmveth interface while it is still running.
>
>Signed-off-by: Thomas Falcon

Reviewed-by: Jiri Pirko