RE: [PATCH V3] powerpc/85xx: workaround for chips with MSI hardware errata

2015-03-03 Thread Hongtao Jia
Hi Scott,

I sent V3 of this patch a few days ago.
Comments are welcome.

Thanks.
-Hongtao

> -----Original Message-----
> From: Jia Hongtao [mailto:hongtao@freescale.com]
> Sent: Thursday, February 26, 2015 3:23 PM
> To: linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; asolo...@kb.kras.ru
> Cc: ga...@kernel.crashing.org; Li Yang-Leo-R58472; Jia Hongtao-B38951
> Subject: [PATCH V3] powerpc/85xx: workaround for chips with MSI hardware
> errata
> 
> From: Hongtao Jia 
> 
> MPIC version 2.0 has an MSI erratum (erratum PIC1 of the MPC8544) that
> prevents both MSI and MSI-X from working correctly. This is a workaround
> to allow MSI-X to function properly.
> 
> Signed-off-by: Liu Shuo 
> Signed-off-by: Li Yang 
> Signed-off-by: Jia Hongtao 
> ---
> Changes for V3:
> * remove mpic_has_erratum_pic1() function. Test erratum directly.
> * rebase on latest kernel update.
> 
> Changes for V2:
> * change the name of function mpic_has_errata() to
> mpic_has_erratum_pic1().
> * move MSI_HW_ERRATA_ENDIAN define to fsl_msi.h with all other defines.
> 
>  arch/powerpc/sysdev/fsl_msi.c | 29 ++---
>  arch/powerpc/sysdev/fsl_msi.h |  2 ++
>  2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/fsl_msi.c
> b/arch/powerpc/sysdev/fsl_msi.c
> index 4bbb4b8..f086c6f 100644
> --- a/arch/powerpc/sysdev/fsl_msi.c
> +++ b/arch/powerpc/sysdev/fsl_msi.c
> @@ -162,7 +162,17 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev,
> int hwirq,
>   msg->address_lo = lower_32_bits(address);
>   msg->address_hi = upper_32_bits(address);
> 
> - msg->data = hwirq;
> + /*
> +  * MPIC version 2.0 has erratum PIC1, which prevents
> +  * both MSI and MSI-X from working correctly. This
> +  * workaround allows MSI-X to function properly. It
> +  * only helps MSI-X, so MSI is prevented on buggy
> +  * chips in fsl_setup_msi_irqs().
> +  */
> + if (msi_data->feature & MSI_HW_ERRATA_ENDIAN)
> + msg->data = __swab32(hwirq);
> + else
> + msg->data = hwirq;
> 
>   pr_debug("%s: allocated srs: %d, ibs: %d\n", __func__,
>(hwirq >> msi_data->srs_shift) & MSI_SRS_MASK,
> @@ -180,8 +190,16 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev,
> int nvec, int type)
>   struct msi_msg msg;
>   struct fsl_msi *msi_data;
> 
> - if (type == PCI_CAP_ID_MSIX)
> - pr_debug("fslmsi: MSI-X untested, trying anyway.\n");
> + if (type == PCI_CAP_ID_MSI) {
> + /*
> +  * MPIC version 2.0 has erratum PIC1, which keeps
> +  * MSI from working. Check for it and prevent MSI
> +  * from being used on boards with this erratum.
> +  */
> + list_for_each_entry(msi_data, &msi_head, list)
> + if (msi_data->feature & MSI_HW_ERRATA_ENDIAN)
> + return -EINVAL;
> + }
> 
>   /*
>* If the PCI node has an fsl,msi property, then we need to use it
> @@ -446,6 +464,11 @@ static int fsl_of_msi_probe(struct platform_device
> *dev)
> 
>   msi->feature = features->fsl_pic_ip;
> 
> + /* For erratum PIC1 on MPIC version 2.0 */
> + if ((features->fsl_pic_ip & FSL_PIC_IP_MASK) == FSL_PIC_IP_MPIC
> + && (fsl_mpic_primary_get_version() == 0x0200))
> + msi->feature |= MSI_HW_ERRATA_ENDIAN;
> +
>   /*
>* Remember the phandle, so that we can match with any PCI nodes
>* that have an "fsl,msi" property.
> diff --git a/arch/powerpc/sysdev/fsl_msi.h
> b/arch/powerpc/sysdev/fsl_msi.h
> index 420cfcb..a67359d 100644
> --- a/arch/powerpc/sysdev/fsl_msi.h
> +++ b/arch/powerpc/sysdev/fsl_msi.h
> @@ -27,6 +27,8 @@
>  #define FSL_PIC_IP_IPIC   0x0002
>  #define FSL_PIC_IP_VMPIC  0x0003
> 
> +#define MSI_HW_ERRATA_ENDIAN 0x0010
> +
>  struct fsl_msi_cascade_data;
> 
>  struct fsl_msi {
> --
> 2.1.0.27.g96db324

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)

2015-03-03 Thread Michael Ellerman
On Mon, 2015-03-02 at 09:42 -0600, Emil Medve wrote:
> On 03/02/2015 09:32 AM, Emil Medve wrote:
> > From: Igal Liberman 
> > 
> > Describe the PHY topology for all configurations supported by each board
> > 
> > Based on prior work by Andy Fleming 
> > 
> > Change-Id: I4fbcc5df9ee7c4f784afae9dab5d1e78cdc24f0f
> 
> Bah, I'll remove this...

Something like:

$ cat .git/hooks/commit-msg
#!/bin/sh

if grep -q "Change-Id:" "$1"; then
echo "***: Error commit message includes Change-Id" >&2
exit 1
fi


Ought to do it.

cheers



[PATCH V13 21/21] powerpc/pci: Add PCI resource alignment documentation

2015-03-03 Thread Wei Yang
In order to enable SRIOV on the PowerNV platform, the PF's IOV BAR needs to be
adjusted:

1. size expanded
2. aligned to M64BT size

This patch documents the reason for this change and how it is done.

[bhelgaas: reformat, clarify, expand]
Signed-off-by: Wei Yang 
---
 .../powerpc/pci_iov_resource_on_powernv.txt|  305 
 1 file changed, 305 insertions(+)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt 
b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
new file mode 100644
index 000..4e9bb28
--- /dev/null
+++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
@@ -0,0 +1,305 @@
+Wei Yang 
+Benjamin Herrenschmidt 
+26 Aug 2014
+
+This document describes the hardware requirements for PCI MMIO resource
+sizing and assignment on the PowerNV platform and how the generic PCI code
+handles them.  The first two sections describe the concept of
+Partitionable Endpoints and its implementation on P8 (IODA2).
+
+1. Introduction to Partitionable Endpoints
+
+A Partitionable Endpoint (PE) is a way to group the various resources
+associated with a device or a set of devices to provide isolation between
+partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism
+to freeze a device that is causing errors in order to limit the possibility
+of propagation of bad data.
+
+There is thus, in HW, a table of PE states that contains a pair of "frozen"
+state bits (one for MMIO and one for DMA, they get set together but can be
+cleared independently) for each PE.
+
+When a PE is frozen, all stores in any direction are dropped and all loads
+return all 1's value.  MSIs are also blocked.  There's a bit more state
+that captures things like the details of the error that caused the freeze
+etc., but that's not critical.
+
+The interesting part is how the various PCIe transactions (MMIO, DMA, ...)
+are matched to their corresponding PEs.
+
+The following section provides a rough description of what we have on P8
+(IODA2).  Keep in mind that this is all per PHB (PCI host bridge).  Each
+PHB is a completely separate HW entity that replicates the entire logic,
+so has its own set of PEs, etc.
+
+2. Implementation of Partitionable Endpoints on P8 (IODA2)
+
+P8 supports up to 256 Partitionable Endpoints per PHB.
+
+  * Inbound
+
+For DMA, MSIs and inbound PCIe error messages, we have a table (in
+memory but accessed in HW by the chip) that provides a direct
+correspondence between a PCIe RID (bus/dev/fn) with a PE number.
+We call this the RTT.
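The RID-to-PE correspondence can be modeled as a simple lookup table. The sketch below is illustrative only (the names `rtt`, `pci_rid` and `rtt_lookup` are invented here, not the kernel's): the real table lives in system memory and is consulted by the PHB hardware.

```c
#include <stdint.h>

/* P8 supports up to 256 PEs per PHB, so a PE# fits in 8 bits; the RTT
 * is indexed directly by the 16-bit PCIe RID (bus/dev/fn). */
#define RTT_ENTRIES 65536

static uint8_t rtt[RTT_ENTRIES];

/* Compose a RID from bus/device/function, as on the PCIe wire */
static inline uint16_t pci_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return ((uint16_t)bus << 8) | (uint16_t)((dev & 0x1f) << 3) | (fn & 0x7);
}

static inline uint8_t rtt_lookup(uint16_t rid)
{
	return rtt[rid];
}
```

DMA, MSIs and error messages all resolve the originating device to a PE through this one mapping; only what happens after the lookup differs.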
+
+- For DMA we then provide an entire address space for each PE that can
+  contain two "windows", depending on the value of PCI address bit 59.
+  Each window can be configured to be remapped via a "TCE table" (IOMMU
+  translation table), which has various configurable characteristics
+  not described here.
+
+- For MSIs, we have two windows in the address space (one at the top of
+  the 32-bit space and one much higher) which, via a combination of the
+  address and MSI value, will result in one of the 2048 interrupts per
+  bridge being triggered.  There's a PE# in the interrupt controller
+  descriptor table as well which is compared with the PE# obtained from
+  the RTT to "authorize" the device to emit that specific interrupt.
+
+- Error messages just use the RTT.
+
+  * Outbound.  That's where the tricky part is.
+
+Like other PCI host bridges, the Power8 IODA2 PHB supports "windows"
+from the CPU address space to the PCI address space.  There is one M32
+window and sixteen M64 windows.  They have different characteristics.
+First what they have in common: they forward a configurable portion of
+the CPU address space to the PCIe bus and must be naturally aligned
+power of two in size.  The rest is different:
+
+- The M32 window:
+
+  * Is limited to 4GB in size.
+
+  * Drops the top bits of the address (above the size) and replaces
+   them with a configurable value.  This is typically used to generate
+   32-bit PCIe accesses.  We configure that window at boot from FW and
+   don't touch it from Linux; it's usually set to forward a 2GB
+   portion of address space from the CPU to PCIe
+   0x8000_0000..0xffff_ffff.  (Note: The top 64KB are actually
+   reserved for MSIs but this is not a problem at this point; we just
+   need to ensure Linux doesn't assign anything there, the M32 logic
+   ignores that however and will forward in that space if we try).
+
+  * It is divided into 256 segments of equal size.  A table in the chip
+   maps each segment to a PE#.  That allows portions of the MMIO space
+   to be assigned to PEs on a segment granularity.  For a 2GB window,
+   the segment granularity is 2GB/256 = 8MB.
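The segment arithmetic above can be sketched in a few lines of C. This is a hypothetical helper, not kernel code; it only shows how an offset inside the M32 window selects a segment, which the chip's segment table then maps to a PE#.

```c
#include <stdint.h>

/* The M32 window is divided into 256 equal segments */
#define M32_SEGS 256

/* For a 2GB window each segment is 2GB/256 = 8MB */
static inline unsigned int m32_segment(uint64_t offset, uint64_t win_size)
{
	return (unsigned int)(offset / (win_size / M32_SEGS));
}
```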
+
+Now, this is the "main" window we use in Linux today (excluding
+SR-IOV).  We

[PATCH V13 20/21] powerpc/pci: Remove unused struct pci_dn.pcidev field

2015-03-03 Thread Wei Yang
In struct pci_dn, the pcidev field is assigned but not used, so remove it.

Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h |1 -
 arch/powerpc/platforms/powernv/pci-ioda.c |1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 958ea86..109efba 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -168,7 +168,6 @@ struct pci_dn {
 
int pci_ext_config_space;   /* for pci devices */
 
-   struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 47c46b7..47780c3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1024,7 +1024,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, 
struct pnv_ioda_pe *pe)
pci_name(dev));
continue;
}
-   pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
pe->dma_weight += pnv_ioda_dma_weight(dev);
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-- 
1.7.9.5


[PATCH V13 19/21] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3

2015-03-03 Thread Wei Yang
When the IOV BAR is big, each individual IOV BAR is covered by 4 M64
windows.  This leads to several VF PEs sitting in one PE in terms of M64.

Group VF PEs according to the M64 allocation.
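The grouping arithmetic in pnv_pci_vf_assign_m64() below can be sketched standalone. This is a model of the patch's logic with invented helper names, not the kernel code itself:

```c
/* Each big IOV BAR is covered by 4 M64 windows */
#define M64_PER_IOV 4

static unsigned long roundup_pow_of_two_ul(unsigned long n)
{
	unsigned long r = 1;

	while (r < n)
		r <<= 1;
	return r;
}

/* Number of VF groups: one group per VF up to M64_PER_IOV groups */
static int vf_groups_for(int num_vfs)
{
	return (num_vfs <= M64_PER_IOV) ? num_vfs : M64_PER_IOV;
}

/* VFs per group: 1 while each VF gets its own group, otherwise the
 * (power-of-two-rounded) VF count split across M64_PER_IOV groups */
static int vf_per_group_for(int num_vfs)
{
	return (num_vfs <= M64_PER_IOV) ? 1 :
		(int)(roundup_pow_of_two_ul(num_vfs) / M64_PER_IOV);
}
```

So enabling 6 VFs yields 4 groups of 2, matching the `vf_groups`/`vf_per_group` computation in the diff.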

[bhelgaas: use dev_printk() when possible]
Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  197 ++---
 2 files changed, 154 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index d824bb1..958ea86 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -182,7 +182,7 @@ struct pci_dn {
 #define M64_PER_IOV 4
int m64_per_iov;
 #define IODA_INVALID_M64(-1)
-   int m64_wins[PCI_SRIOV_NUM_BARS];
+   int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index b1e936e..47c46b7 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1152,26 +1152,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
struct pci_controller *hose;
struct pnv_phb*phb;
struct pci_dn *pdn;
-   inti;
+   inti, j;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
 
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   if (pdn->m64_wins[i] == IODA_INVALID_M64)
-   continue;
-   opal_pci_phb_mmio_enable(phb->opal_id,
-   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0);
-   clear_bit(pdn->m64_wins[i], &phb->ioda.m64_bar_alloc);
-   pdn->m64_wins[i] = IODA_INVALID_M64;
-   }
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+   for (j = 0; j < M64_PER_IOV; j++) {
+   if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+   continue;
+   opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
+   clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   }
 
return 0;
 }
 
-static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 {
struct pci_bus*bus;
struct pci_controller *hose;
@@ -1179,17 +1180,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
struct pci_dn *pdn;
unsigned int   win;
struct resource   *res;
-   inti;
+   inti, j;
int64_trc;
+   inttotal_vfs;
+   resource_size_tsize, start;
+   intpe_num;
+   intvf_groups;
+   intvf_per_group;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
+   total_vfs = pci_sriov_get_totalvfs(pdev);
 
/* Initialize the m64_wins to IODA_INVALID_M64 */
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   pdn->m64_wins[i] = IODA_INVALID_M64;
+   for (j = 0; j < M64_PER_IOV; j++)
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+
+   if (pdn->m64_per_iov == M64_PER_IOV) {
+   vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
+   vf_per_group = (num_vfs <= M64_PER_IOV)? 1:
+   roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
+   } else {
+   vf_groups = 1;
+   vf_per_group = 1;
+   }
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -1199,35 +1216,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
if (!pnv_pci_is_mem_pref_64(res->flags))
continue;
 
-   do {
-   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-   phb->ioda.m64_bar_idx + 1, 0);
-
-   if (win >= phb->ioda.m64_bar_idx + 1)
-   goto m64_failed;
-   } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+   for (j = 0; j < vf_groups; j++) {
+   do {
+   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
+   phb->ioda.m64_bar_idx + 1, 0);
+
+   if (win >= phb->ioda.m64_bar_idx + 1)
+

[PATCH V13 18/21] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported

2015-03-03 Thread Wei Yang
The M64 aperture size is limited on PHB3.  When the IOV BAR is too big, it
will exceed the limitation and fail to be assigned.

Introduce a different mechanism based on the IOV BAR size:

  - if the IOV BAR size is smaller than 64MB, expand it to total_pe times
    the per-VF size
  - if the IOV BAR size is bigger than 64MB, round it up to a power of two
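The two cases amount to a multiplier on the per-VF BAR size. A minimal sketch of the rule, mirroring the fixup in the diff below (function name invented):

```c
#include <stdint.h>

static unsigned long roundup_pow_of_two_ul(unsigned long n)
{
	unsigned long r = 1;

	while (r < n)
		r <<= 1;
	return r;
}

/* Multiplier applied to the per-VF BAR size when reserving the IOV BAR */
static unsigned long iov_bar_multiplier(uint64_t vf_bar_size, int total_pe,
					int total_vfs)
{
	if (vf_bar_size > (1ULL << 26))	/* bigger than 64MB: round up */
		return roundup_pow_of_two_ul(total_vfs);
	return total_pe;		/* otherwise expand to total_pe */
}
```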

[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |2 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   33 ++---
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 011340d..d824bb1 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -179,6 +179,8 @@ struct pci_dn {
u16 max_vfs;/* number of VFs IOV BAR expended */
u16 vf_pes; /* VF PE# under this PF */
int offset; /* PE# for the first VF PE */
+#define M64_PER_IOV 4
+   int m64_per_iov;
 #define IODA_INVALID_M64(-1)
int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 6f4ae91..b1e936e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2242,6 +2242,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
int i;
resource_size_t size;
struct pci_dn *pdn;
+   int mul, total_vfs;
 
if (!pdev->is_physfn || pdev->is_added)
return;
@@ -2252,6 +2253,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
pdn = pci_get_pdn(pdev);
pdn->max_vfs = 0;
 
+   total_vfs = pci_sriov_get_totalvfs(pdev);
+   pdn->m64_per_iov = 1;
+   mul = phb->ioda.total_pe;
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+i, res);
+   continue;
+   }
+
+   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+
+   /* bigger than 64M */
+   if (size > (1 << 26)) {
+   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
+i, res);
+   pdn->m64_per_iov = M64_PER_IOV;
+   mul = roundup_pow_of_two(total_vfs);
+   break;
+   }
+   }
+
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
if (!res->flags || res->parent)
@@ -2264,12 +2291,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
 
dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-   res->end = res->start + size * phb->ioda.total_pe - 1;
+   res->end = res->start + size * mul - 1;
dev_dbg(&pdev->dev, "   %pR\n", res);
   dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
-   i, res, phb->ioda.total_pe);
+i, res, mul);
}
-   pdn->max_vfs = phb->ioda.total_pe;
+   pdn->max_vfs = mul;
 }
 
 static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
-- 
1.7.9.5


[PATCH V13 17/21] powerpc/powernv: Shift VF resource with an offset

2015-03-03 Thread Wei Yang
On the PowerNV platform, a resource's position in M64 implies the PE# the
resource belongs to.  In some cases a resource must be adjusted to place it
at the correct position in M64.

Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address
according to an offset.
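Conceptually the shift is simple arithmetic: moving the BAR start by the PE offset times the per-VF size makes VF i land in segment/PE (offset + i). A hypothetical sketch (the real pnv_pci_vf_resource_shift() also fixes up the VFs' computed addresses):

```c
#include <stdint.h>

/* Shift the PF's IOV BAR start so that VF i's BAR lands in
 * segment/PE (pe_offset + i).  Illustrative only. */
static uint64_t shift_vf_bar_start(uint64_t iov_bar_start, uint64_t vf_size,
				   int pe_offset)
{
	return iov_bar_start + (uint64_t)pe_offset * vf_size;
}
```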

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |4 +
 arch/powerpc/kernel/pci_dn.c  |   11 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  520 -
 arch/powerpc/platforms/powernv/pci.c  |   18 +
 arch/powerpc/platforms/powernv/pci.h  |7 +
 5 files changed, 543 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index de11de7..011340d 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -177,6 +177,10 @@ struct pci_dn {
int pe_number;
 #ifdef CONFIG_PCI_IOV
u16 max_vfs;/* number of VFs IOV BAR expended */
+   u16 vf_pes; /* VF PE# under this PF */
+   int offset; /* PE# for the first VF PE */
+#define IODA_INVALID_M64(-1)
+   int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f3a1a81..5faf7ca 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -217,6 +217,17 @@ void remove_dev_pci_info(struct pci_dev *pdev)
struct pci_dn *pdn, *tmp;
int i;
 
+   /*
+* VF and VF PE are created/released dynamically, so we need to
+* bind/unbind them.  Otherwise the VF and VF PE would be mismatched
+* when re-enabling SR-IOV.
+*/
+   if (pdev->is_virtfn) {
+   pdn = pci_get_pdn(pdev);
+   pdn->pe_number = IODA_INVALID_PE;
+   return;
+   }
+
/* Only support IOV PF for now */
if (!pdev->is_physfn)
return;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 50155c9..6f4ae91 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -44,6 +44,9 @@
 #include "powernv.h"
 #include "pci.h"
 
+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
+#define TCE32_TABLE_SIZE   ((0x10000000 / 0x1000) * 8)
+
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
const char *fmt, ...)
 {
@@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, 
const char *level,
vaf.fmt = fmt;
vaf.va = &args;
 
-   if (pe->pdev)
+   if (pe->flags & PNV_IODA_PE_DEV)
strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix));
-   else
+   else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
sprintf(pfix, "%04x:%02x ",
pci_domain_nr(pe->pbus), pe->pbus->number);
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   sprintf(pfix, "%04x:%02x:%2x.%d",
+   pci_domain_nr(pe->parent_dev->bus),
+   (pe->rid & 0xff00) >> 8,
+   PCI_SLOT(pe->rid), PCI_FUNC(pe->rid));
+#endif /* CONFIG_PCI_IOV*/
 
printk("%spci %s: [PE# %.3d] %pV",
   level, pfix, pe->pe_number, &vaf);
@@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
  bool is_add)
 {
struct pnv_ioda_pe *slave;
-   struct pci_dev *pdev;
+   struct pci_dev *pdev = NULL;
int ret;
 
/*
@@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 
if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
pdev = pe->pbus->self;
-   else
+   else if (pe->flags & PNV_IODA_PE_DEV)
pdev = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   pdev = pe->parent_dev->bus->self;
+#endif /* CONFIG_PCI_IOV */
while (pdev) {
struct pci_dn *pdn = pci_get_pdn(pdev);
struct pnv_ioda_pe *parent;
@@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
return 0;
 }
 
+#ifdef CONFIG_PCI_IOV
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+{
+   struct pci_dev *parent;
+   uint8_t bcomp, dcomp, fcomp;
+   int64_t rc;
+   long rid_end, rid;
+
+   /* Currently, we just deconfigure VF PEs. Bus PEs will always be there. */
+   if (pe->pbus) {
+   int count;
+
+   dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
+   fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
+  

[PATCH V13 16/21] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv

2015-03-03 Thread Wei Yang
Implement pcibios_iov_resource_alignment() on powernv platform.

On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state, the IOV BAR size is multiple times of VF BAR size
2. after expanded, the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage, the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handles these three cases respectively.
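A standalone model of the three cases, mirroring the function added below (assumed parameter shapes, not the kernel signature):

```c
#include <stdint.h>

/*
 * Alignment decision for an IOV BAR:
 *  - case 2 (expanded): the current resource size itself is the alignment
 *  - case 3 (sizing, resource truncated to 0): fall back to max_vfs
 *    (total_pe) times the single-VF size
 *  - case 1 (initial, no expansion recorded): the single-VF size
 */
static uint64_t iov_resource_alignment(uint64_t cur_res_size,
				       uint64_t single_vf_size, int max_vfs)
{
	if (cur_res_size)
		return cur_res_size;
	if (max_vfs)
		return (uint64_t)max_vfs * single_vf_size;
	return single_vf_size;
}
```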

[bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/machdep.h|1 +
 arch/powerpc/kernel/pci-common.c  |   10 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   20 
 3 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 965547c..045448f 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -252,6 +252,7 @@ struct machdep_calls {
 
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_bus *bus);
+   resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int 
resno);
 #endif /* CONFIG_PCI_IOV */
 
/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 022e9fe..b91eff3 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -130,6 +130,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
pci_reset_secondary_bus(dev);
 }
 
+#ifdef CONFIG_PCI_IOV
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
+{
+   if (ppc_md.pcibios_iov_resource_alignment)
+   return ppc_md.pcibios_iov_resource_alignment(pdev, resno);
+
+   return pci_iov_resource_size(pdev, resno);
+}
+#endif /* CONFIG_PCI_IOV */
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 958c7a3..50155c9 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1983,6 +1983,25 @@ static resource_size_t pnv_pci_window_alignment(struct 
pci_bus *bus,
return phb->ioda.io_segsize;
 }
 
+#ifdef CONFIG_PCI_IOV
+static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
+ int resno)
+{
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+   resource_size_t align, iov_align;
+
+   iov_align = resource_size(&pdev->resource[resno]);
+   if (iov_align)
+   return iov_align;
+
+   align = pci_iov_resource_size(pdev, resno);
+   if (pdn->max_vfs)
+   return pdn->max_vfs * align;
+
+   return align;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
@@ -2185,6 +2204,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus;
 #ifdef CONFIG_PCI_IOV
ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_sriov;
+   ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
 #endif /* CONFIG_PCI_IOV */
pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
-- 
1.7.9.5


[PATCH V13 15/21] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe

2015-03-03 Thread Wei Yang
On PHB3, the PF IOV BAR will be covered by an M64 window to provide better
PE isolation.  The total_pe number is usually different from total_VFs, which
can lead to a conflict between MMIO space and the PE number.

For example, if total_VFs is 128 and total_pe is 256, the second half of
M64 window will be part of other PCI device, which may already belong
to other PEs.

Prevent the conflict by reserving additional space for the PF IOV BAR,
which is total_pe number of VF's BAR size.
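The conflict and the fix come down to arithmetic on segments. A hedged sketch with the example's numbers (helper names invented, not kernel code):

```c
#include <stdint.h>

/* In an M64 BAR segmented per-VF, VF i's BAR sits at i * vf_size.  If
 * only total_vfs (128) segments of a total_pe-segment (256) window were
 * reserved, segments 128..255 would cover other devices' MMIO. */
static uint64_t iov_bar_reserved_size(uint64_t vf_size, int total_pe)
{
	return (uint64_t)total_pe * vf_size;
}

/* Which segment (and hence which PE) an offset inside the BAR hits */
static int vf_segment(uint64_t offset, uint64_t vf_size)
{
	return (int)(offset / vf_size);
}
```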

[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/machdep.h|4 ++
 arch/powerpc/include/asm/pci-bridge.h |3 ++
 arch/powerpc/kernel/pci-common.c  |5 +++
 arch/powerpc/kernel/pci-hotplug.c |4 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   61 +
 5 files changed, 77 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index c8175a3..965547c 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -250,6 +250,10 @@ struct machdep_calls {
/* Reset the secondary bus of bridge */
void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
 
+#ifdef CONFIG_PCI_IOV
+   void (*pcibios_fixup_sriov)(struct pci_bus *bus);
+#endif /* CONFIG_PCI_IOV */
+
/* Called to shutdown machine specific hardware not already controlled
 * by other drivers.
 */
diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 513f8f2..de11de7 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -175,6 +175,9 @@ struct pci_dn {
 #define IODA_INVALID_PE(-1)
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
+#ifdef CONFIG_PCI_IOV
+   u16 max_vfs;/* number of VFs IOV BAR expended */
+#endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
struct list_head list;
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8203101..022e9fe 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1646,6 +1646,11 @@ void pcibios_scan_phb(struct pci_controller *hose)
if (ppc_md.pcibios_fixup_phb)
ppc_md.pcibios_fixup_phb(hose);
 
+#ifdef CONFIG_PCI_IOV
+   if (ppc_md.pcibios_fixup_sriov)
+   ppc_md.pcibios_fixup_sriov(bus);
+#endif /* CONFIG_PCI_IOV */
+
/* Configure PCI Express settings */
if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
struct pci_bus *child;
diff --git a/arch/powerpc/kernel/pci-hotplug.c 
b/arch/powerpc/kernel/pci-hotplug.c
index 5b78917..7d238ae 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -94,6 +94,10 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
 */
slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
+#ifdef CONFIG_PCI_IOV
+   if (ppc_md.pcibios_fixup_sriov)
+   ppc_md.pcibios_fixup_sriov(bus);
+#endif /* CONFIG_PCI_IOV */
pcibios_setup_bus_devices(bus);
max = bus->busn_res.start;
for (pass = 0; pass < 2; pass++) {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1b37066..958c7a3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1749,6 +1749,64 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #endif /* CONFIG_PCI_MSI */
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   struct resource *res;
+   int i;
+   resource_size_t size;
+   struct pci_dn *pdn;
+
+   if (!pdev->is_physfn || pdev->is_added)
+   return;
+
+   hose = pci_bus_to_host(pdev->bus);
+   phb = hose->private_data;
+
+   pdn = pci_get_pdn(pdev);
+   pdn->max_vfs = 0;
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
+i, res);
+   continue;
+   }
+
+   dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
+   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+   res->end = res->start + size * phb->ioda.total_pe - 1;
+   dev_dbg(&pdev->dev, "   %pR\n", res);

[PATCH V13 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically

2015-03-03 Thread Wei Yang
The iommu_table of a PE is currently a static field, which is a problem
when iommu_free_table() is called.

Allocate the iommu_table dynamically instead.
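A userspace sketch of the pattern change (simplified structs, not the kernel's): instead of embedding the table in the PE and recovering the PE with container_of(), the table is allocated separately and carries a back-pointer in its new data field, so the table can later be freed on its own.

```c
#include <stdlib.h>

struct iommu_table {
	void *data;		/* back-pointer, replaces container_of() */
};

struct pnv_ioda_pe {
	struct iommu_table *tce32_table;	/* was a static field */
};

static struct pnv_ioda_pe *pe_alloc(void)
{
	struct pnv_ioda_pe *pe = calloc(1, sizeof(*pe));

	pe->tce32_table = calloc(1, sizeof(*pe->tce32_table));
	pe->tce32_table->data = pe;
	return pe;
}

/* The TCE-invalidate path can now recover the PE from the table */
static int backptr_ok(void)
{
	struct pnv_ioda_pe *pe = pe_alloc();

	return pe->tce32_table->data == pe;
}
```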

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/iommu.h  |3 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   26 ++
 arch/powerpc/platforms/powernv/pci.h  |2 +-
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 9cfa370..5574eeb 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,9 @@ struct iommu_table {
struct iommu_group *it_group;
 #endif
void (*set_bypass)(struct iommu_table *tbl, bool enable);
+#ifdef CONFIG_PPC_POWERNV
+   void   *data;
+#endif
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index df4a295..1b37066 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int 
all)
return;
}
 
+   pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+   GFP_KERNEL, hose->node);
+   pe->tce32_table->data = pe;
+
/* Associate it with all child devices */
pnv_ioda_setup_same_PE(bus, pe);
 
@@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb 
*phb, struct pci_dev *pdev
 
pe = &phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-   set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
} else {
dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
set_dma_ops(&pdev->dev, &dma_iommu_ops);
-   set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base(&pdev->dev, pe->tce32_table);
}
*pdev->dev.dma_mask = dma_mask;
return 0;
@@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
list_for_each_entry(dev, &bus->devices, bus_list) {
if (add_to_iommu_group)
set_iommu_table_base_and_group(&dev->dev,
-  &pe->tce32_table);
+  pe->tce32_table);
else
-   set_iommu_table_base(&dev->dev, &pe->tce32_table);
+   set_iommu_table_base(&dev->dev, pe->tce32_table);
 
if (dev->subordinate)
pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 __be64 *startp, __be64 *endp, bool rm)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
struct pnv_phb *phb = pe->phb;
 
if (phb->type == PNV_PHB_IODA1)
@@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
  base << 28, IOMMU_PAGE_SHIFT_4K);
 
@@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
uint16_t window_id = (pe->pe_number << 1 ) + 1;
int64_t rc;
 
@@ -1312,10 +1314,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
pe->tce_bypass_base = 1ull << 59;
 
/* Install set_bypass callback for VFIO */
-   pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
+   pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass;
 
/* Enable bypass by default */
-   pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
+   pnv_pci_ioda2_set_bypass(pe->tce32_table, true);
 }
 
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
@@ -1363,7 +1365,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,

[PATCH V13 13/21] powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor

2015-03-03 Thread Wei Yang
The PCI config accessors previously relied on device_node.  Unfortunately,
VFs don't have a corresponding device_node, so change the accessors to use
pci_dn instead.

[bhelgaas: changelog]
Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |   14 +-
 arch/powerpc/platforms/powernv/pci.c |   69 ++
 arch/powerpc/platforms/powernv/pci.h |4 +-
 3 files changed, 40 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e261869..7a5021b 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -430,21 +430,31 @@ static inline bool powernv_eeh_cfg_blocked(struct device_node *dn)
 static int powernv_eeh_read_config(struct device_node *dn,
   int where, int size, u32 *val)
 {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
if (powernv_eeh_cfg_blocked(dn)) {
*val = 0x;
return PCIBIOS_SET_FAILED;
}
 
-   return pnv_pci_cfg_read(dn, where, size, val);
+   return pnv_pci_cfg_read(pdn, where, size, val);
 }
 
 static int powernv_eeh_write_config(struct device_node *dn,
int where, int size, u32 val)
 {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
if (powernv_eeh_cfg_blocked(dn))
return PCIBIOS_SET_FAILED;
 
-   return pnv_pci_cfg_write(dn, where, size, val);
+   return pnv_pci_cfg_write(pdn, where, size, val);
 }
 
 /**
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index e69142f..6c20d6e 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -366,9 +366,9 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
spin_unlock_irqrestore(&phb->lock, flags);
 }
 
-static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
-struct device_node *dn)
+static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 {
+   struct pnv_phb *phb = pdn->phb->private_data;
u8  fstate;
__be16  pcierr;
int pe_no;
@@ -379,7 +379,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
 * setup that yet. So all ER errors should be mapped to
 * reserved PE.
 */
-   pe_no = PCI_DN(dn)->pe_number;
+   pe_no = pdn->pe_number;
if (pe_no == IODA_INVALID_PE) {
if (phb->type == PNV_PHB_P5IOC2)
pe_no = 0;
@@ -407,8 +407,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
}
 
cfg_dbg(" -> EEH check, bdfn=%04x PE#%d fstate=%x\n",
-   (PCI_DN(dn)->busno << 8) | (PCI_DN(dn)->devfn),
-   pe_no, fstate);
+   (pdn->busno << 8) | (pdn->devfn), pe_no, fstate);
 
/* Clear the frozen state if applicable */
if (fstate == OPAL_EEH_STOPPED_MMIO_FREEZE ||
@@ -425,10 +424,9 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
}
 }
 
-int pnv_pci_cfg_read(struct device_node *dn,
+int pnv_pci_cfg_read(struct pci_dn *pdn,
 int where, int size, u32 *val)
 {
-   struct pci_dn *pdn = PCI_DN(dn);
struct pnv_phb *phb = pdn->phb->private_data;
u32 bdfn = (pdn->busno << 8) | pdn->devfn;
s64 rc;
@@ -462,10 +460,9 @@ int pnv_pci_cfg_read(struct device_node *dn,
return PCIBIOS_SUCCESSFUL;
 }
 
-int pnv_pci_cfg_write(struct device_node *dn,
+int pnv_pci_cfg_write(struct pci_dn *pdn,
  int where, int size, u32 val)
 {
-   struct pci_dn *pdn = PCI_DN(dn);
struct pnv_phb *phb = pdn->phb->private_data;
u32 bdfn = (pdn->busno << 8) | pdn->devfn;
 
@@ -489,18 +486,17 @@ int pnv_pci_cfg_write(struct device_node *dn,
 }
 
 #if CONFIG_EEH
-static bool pnv_pci_cfg_check(struct pci_controller *hose,
- struct device_node *dn)
+static bool pnv_pci_cfg_check(struct pci_dn *pdn)
 {
struct eeh_dev *edev = NULL;
-   struct pnv_phb *phb = hose->private_data;
+   struct pnv_phb *phb = pdn->phb->private_data;
 
/* EEH not enabled ? */
if (!(phb->flags & PNV_PHB_FLAG_EEH))
return true;
 
/* PE reset or device removed ? */
-   edev = of_node_to_eeh_dev(dn);
+   edev = pdn->edev;
if (edev) {
if (edev->pe &&
(edev->pe->state & EEH_PE_CFG_BLOCKED))
@@ -513,8 +509,7 @@ static bool pnv_pci_cfg_check(struct pci_controller *hose,
return true;
 }
 #else
-static inline pnv_pci_cfg_check(struct pci_controller *hose,
-   struct device_node *dn)
+static inline pnv_pci_cfg_check(struct pci_dn *pdn)

[PATCH V13 12/21] powerpc/pci: Refactor pci_dn

2015-03-03 Thread Wei Yang
From: Gavin Shan 

pci_dn is the extension of a PCI device node and is created from the
device node.  Unfortunately, VFs are enabled dynamically by the PF's
driver and don't have corresponding device nodes or pci_dn structures.
Refactor pci_dn to support VFs:

   * pci_dn is organized as a hierarchy tree.  A VF's pci_dn is put in
     the child list of the pci_dn of the PF's bridge.  The pci_dn of any
     other device is put in the child list of the pci_dn of its upstream
     bridge.

   * A VF's pci_dn is created dynamically when the PF enables VFs and
     destroyed when the PF disables them.  The pci_dn of any other
     device is still created from its device node, as before.

   * For any particular PCI device (VF or not), its pci_dn can be found
     via pdev->dev.archdata.firmware_data, PCI_DN(devnode), or its
     parent's child list.  The fast path (fetching pci_dn through the
     PCI device instance) is populated during early fixup.
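
The parent/child organization can be sketched in plain C (illustrative
only; `pdn_demo` and both helpers are invented, and the real code uses
the kernel's list_head machinery):

```c
#include <assert.h>
#include <stddef.h>

/* Invented miniature of the refactored pci_dn hierarchy. */
struct pdn_demo {
	int busno, devfn;
	struct pdn_demo *parent;
	struct pdn_demo *child;     /* first child */
	struct pdn_demo *sibling;   /* next child of the same parent */
};

static void pdn_add_child(struct pdn_demo *parent, struct pdn_demo *pdn)
{
	pdn->parent = parent;
	pdn->sibling = parent->child;
	parent->child = pdn;
}

/* Slow-path lookup by devfn, scanning the parent's child list the way
 * pci_get_pdn_by_devfn() does when the cached per-device pointer and
 * PCI_DN(devnode) both come up empty. */
static struct pdn_demo *pdn_find(struct pdn_demo *parent, int busno, int devfn)
{
	struct pdn_demo *p;

	for (p = parent->child; p; p = p->sibling)
		if (p->busno == busno && p->devfn == devfn)
			return p;
	return NULL;
}
```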

[bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()]
Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/device.h |3 +
 arch/powerpc/include/asm/pci-bridge.h |   14 +-
 arch/powerpc/kernel/pci_dn.c  |  245 -
 arch/powerpc/platforms/powernv/pci-ioda.c |   16 ++
 4 files changed, 272 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 38faede..29992cd 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -34,6 +34,9 @@ struct dev_archdata {
 #ifdef CONFIG_SWIOTLB
dma_addr_t  max_direct_dma_addr;
 #endif
+#ifdef CONFIG_PPC64
+   void*firmware_data;
+#endif
 #ifdef CONFIG_EEH
struct eeh_dev  *edev;
 #endif
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 546d036..513f8f2 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -89,6 +89,7 @@ struct pci_controller {
 
 #ifdef CONFIG_PPC64
unsigned long buid;
+   void *firmware_data;
 #endif /* CONFIG_PPC64 */
 
void *private_data;
@@ -154,9 +155,13 @@ static inline int isa_vaddr_is_ioport(void __iomem *address)
 struct iommu_table;
 
 struct pci_dn {
+   int flags;
+#define PCI_DN_FLAG_IOV_VF 0x01
+
int busno;  /* pci bus number */
int devfn;  /* pci device and function number */
 
+   struct  pci_dn *parent;
struct  pci_controller *phb;/* for pci devices */
struct  iommu_table *iommu_table;   /* for phb's or bridges */
struct  device_node *node;  /* back-pointer to the device_node */
@@ -171,14 +176,19 @@ struct pci_dn {
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
 #endif
+   struct list_head child_list;
+   struct list_head list;
 };
 
 /* Get the pointer to a device_node's pci_dn */
 #define PCI_DN(dn) ((struct pci_dn *) (dn)->data)
 
+extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
+  int devfn);
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
-
-extern void * update_dn_pci_info(struct device_node *dn, void *data);
+extern struct pci_dn *add_dev_pci_info(struct pci_dev *pdev);
+extern void remove_dev_pci_info(struct pci_dev *pdev);
+extern void *update_dn_pci_info(struct device_node *dn, void *data);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 83df307..f3a1a81 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -32,12 +32,223 @@
 #include 
 #include 
 
+/*
+ * The function is used to find the firmware data of one
+ * specific PCI device, which is attached to the indicated
+ * PCI bus. For VFs, their firmware data is linked to that
+ * one of PF's bridge. For other devices, their firmware
+ * data is linked to that of their bridge.
+ */
+static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
+{
+   struct pci_bus *pbus;
+   struct device_node *dn;
+   struct pci_dn *pdn;
+
+   /*
+* We probably have virtual bus which doesn't
+* have associated bridge.
+*/
+   pbus = bus;
+   while (pbus) {
+   if (pci_is_root_bus(pbus) || pbus->self)
+   break;
+
+   pbus = pbus->parent;
+   }
+
+   /*
+* Except virtual bus, all PCI buses should
+* have device nodes.
+*/
+   dn = pci_bus_to_OF_node(pbus);
+   pdn = dn ? PCI_DN(dn) : NULL;
+
+   return pdn;
+}
+
+struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
+   int devfn)
+{
+   struct device_node *dn = NULL;
+   struct pci_dn *parent, *pdn;
+   struct pci_dev *pdev = NULL;
+
+   /* Fast path: fetch 

[PATCH V13 11/21] powerpc/pci: Don't unset PCI resources for VFs

2015-03-03 Thread Wei Yang
Flag PCI_REASSIGN_ALL_RSRC is used to ignore resources information setup by
firmware, so that kernel would re-assign all resources of pci devices.

On the powerpc arch, this happens in a header fixup function,
pcibios_fixup_resources(), which cleans up the resources if this flag is
set. This works fine for PFs, since after the cleanup the kernel
re-assigns the resources in pcibios_resource_survey().

Below is a simple call flow on how it works:

pcibios_init
  pcibios_scan_phb
pci_scan_child_bus
  ...
pci_device_add
  pci_fixup_device(pci_fixup_header)
pcibios_fixup_resources # header fixup
  for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
dev->resource[i].start = 0
  pcibios_resource_survey   # re-assign
pcibios_allocate_resources

However, the VF resources won't be re-assigned, since the VF resources are
completely determined by the PF resources, and the PF resources have
already been reassigned. This means we need to leave VF's resources
un-cleared in pcibios_fixup_resources().

In this patch, we skip the resource unset process in
pcibios_fixup_resources(), if the pci_dev is a VF.

Signed-off-by: Wei Yang 
---
 arch/powerpc/kernel/pci-common.c |4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 2a525c9..8203101 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
   pci_name(dev));
return;
}
+
+   if (dev->is_virtfn)
+   return;
+
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
struct resource *res = dev->resource + i;
struct pci_bus_region reg;
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V13 10/21] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning

2015-03-03 Thread Wei Yang
When sizing and assigning resources, we divide the resources into two
lists: the requested list and the additional list.  We don't consider the
alignment of additional VF(n) BAR space.

This is reasonable because the alignment required for the VF(n) BAR space
is the size of an individual VF BAR, not the size of the space for *all*
VFs.  But some platforms, e.g., PowerNV, require additional alignment.

Consider the additional IOV BAR alignment when sizing and assigning
resources.  When there is not enough system MMIO space, the PF's IOV BAR
alignment will not contribute to the bridge.  When there is enough system
MMIO space, the additional alignment will contribute to the bridge.

Also, take advantage of pci_dev_resource::min_align to store this
additional alignment.
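
The reordering step can be illustrated with a small sketch (not the
kernel code; `res_demo` and `apply_add_align()` are invented): when the
platform's requested alignment exceeds the current window start, the
start is bumped up and the window size is preserved.

```c
#include <assert.h>

typedef unsigned long long resource_size_t;

/* Invented miniature of struct resource: start/end are inclusive. */
struct res_demo {
	resource_size_t start, end;
};

static resource_size_t res_size(const struct res_demo *r)
{
	return r->end - r->start + 1;
}

/* Apply an additional alignment (what the patch stashes in
 * pci_dev_resource::min_align): move the window start up to it,
 * keeping the window size unchanged. */
static void apply_add_align(struct res_demo *r, resource_size_t add_align)
{
	if (add_align > r->start) {
		resource_size_t size = res_size(r);

		r->start = add_align;
		r->end = add_align + size - 1;
	}
}
```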

[bhelgaas: changelog, printk cast]
Signed-off-by: Wei Yang 
---
 drivers/pci/setup-bus.c |   83 +++
 1 file changed, 70 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index e3e17f3..affbcea 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head,
}
 }
 
-static resource_size_t get_res_add_size(struct list_head *head,
-   struct resource *res)
+static struct pci_dev_resource *res_to_dev_res(struct list_head *head,
+  struct resource *res)
 {
struct pci_dev_resource *dev_res;
 
@@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head,
int idx = res - &dev_res->dev->resource[0];
 
dev_printk(KERN_DEBUG, &dev_res->dev->dev,
-"res[%d]=%pR get_res_add_size add_size %llx\n",
+"res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n",
 idx, dev_res->res,
-(unsigned long long)dev_res->add_size);
+(unsigned long long)dev_res->add_size,
+(unsigned long long)dev_res->min_align);
 
-   return dev_res->add_size;
+   return dev_res;
}
}
 
-   return 0;
+   return NULL;
+}
+
+static resource_size_t get_res_add_size(struct list_head *head,
+   struct resource *res)
+{
+   struct pci_dev_resource *dev_res;
+
+   dev_res = res_to_dev_res(head, res);
+   return dev_res ? dev_res->add_size : 0;
+}
+
+static resource_size_t get_res_add_align(struct list_head *head,
+struct resource *res)
+{
+   struct pci_dev_resource *dev_res;
+
+   dev_res = res_to_dev_res(head, res);
+   return dev_res ? dev_res->min_align : 0;
 }
 
+
 /* Sort resources by alignment */
 static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
@@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head,
LIST_HEAD(save_head);
LIST_HEAD(local_fail_head);
struct pci_dev_resource *save_res;
-   struct pci_dev_resource *dev_res, *tmp_res;
+   struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
unsigned long fail_type;
+   resource_size_t add_align, align;
 
/* Check if optional add_size is there */
if (!realloc_head || list_empty(realloc_head))
@@ -384,10 +405,38 @@ static void __assign_resources_sorted(struct list_head *head,
}
 
/* Update res in head list with add_size in realloc_head list */
-   list_for_each_entry(dev_res, head, list)
+   list_for_each_entry_safe(dev_res, tmp_res, head, list) {
dev_res->res->end += get_res_add_size(realloc_head,
dev_res->res);
 
+   /*
+* There are two kinds of additional resources in the list:
+* 1. bridge resource  -- IORESOURCE_STARTALIGN
+* 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
+* Here just fix the additional alignment for bridge
+*/
+   if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+   continue;
+
+   add_align = get_res_add_align(realloc_head, dev_res->res);
+
+   /* Reorder the list by their alignment */
+   if (add_align > dev_res->res->start) {
+   dev_res->res->start = add_align;
+   dev_res->res->end = add_align +
+   resource_size(dev_res->res);
+
+   list_for_each_entry(dev_res2, head, list) {
+   align = pci_resource_alignment(dev_res2->dev,
+  dev_res2->res);
+   if (

[PATCH V13 09/21] PCI: Add pcibios_iov_resource_alignment() interface

2015-03-03 Thread Wei Yang
Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV
BAR is the size of an individual VF BAR, and the size consumed is the
individual VF BAR size times NumVFs.

The PowerNV platform has additional alignment requirements to help support
its Partitionable Endpoint device isolation feature (see
Documentation/powerpc/pci_iov_resource_on_powernv.txt).

Add a pcibios_iov_resource_alignment() interface to allow platforms to
request additional alignment.
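
The numbers involved can be made concrete with a hedged sketch (every
value and name below is invented; the real PowerNV alignment comes from
the PHB's PE count, not the constant used here):

```c
#include <assert.h>

typedef unsigned long long resource_size_t;

/* Assumed per-VF BAR size for one IOV resource of a hypothetical PF. */
static const resource_size_t vf_bar_sz = 1ULL << 20;      /* 1 MiB */

/* Default behaviour of the __weak pcibios_iov_resource_alignment():
 * the required alignment is just one individual VF BAR. */
static resource_size_t default_iov_align(void)
{
	return vf_bar_sz;
}

/* A PowerNV-style override might instead align to the full
 * PE-segmented window; 256 PEs is an illustrative figure. */
static resource_size_t powernv_iov_align_demo(void)
{
	return 256 * vf_bar_sz;
}

/* Space consumed by the IOV BAR is the per-VF size times NumVFs
 * (SR-IOV spec r1.1, sec 3.3.14). */
static resource_size_t iov_bar_consumed(int num_vfs)
{
	return num_vfs * vf_bar_sz;
}
```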

[bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(),
drop "align" parameter]
Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c   |8 +++-
 include/linux/pci.h |1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 64c4692..ee0ebff 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -569,6 +569,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
4 * (resno - PCI_IOV_RESOURCES);
 }
 
+resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
+ int resno)
+{
+   return pci_iov_resource_size(dev, resno);
+}
+
 /**
  * pci_sriov_resource_alignment - get resource alignment for VF BAR
  * @dev: the PCI device
@@ -581,7 +587,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-   return pci_iov_resource_size(dev, resno);
+   return pcibios_iov_resource_alignment(dev, resno);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 99ea948..4e1f17d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1174,6 +1174,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 unsigned long type);
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
1.7.9.5


[PATCH V13 08/21] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()

2015-03-03 Thread Wei Yang
VFs are dynamically created when a driver enables them.  On some platforms,
like PowerNV, special resources are necessary to enable VFs.

Add platform hooks for enabling and disabling VFs.
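
The hook mechanism is the usual weak-symbol pattern: the generic stub is
linked in unless a platform supplies a strong definition. A standalone
sketch (the function name is invented; `__attribute__((weak))` is what
the kernel's `__weak` macro expands to on GCC):

```c
#include <assert.h>

/* Weak default, mirroring the stubs added below: with no strong
 * platform definition linked in, this no-op is used and reports
 * success, so generic platforms need no extra setup. */
int __attribute__((weak)) pcibios_sriov_enable_demo(int num_vfs)
{
	(void)num_vfs;
	return 0;
}
```

A platform such as PowerNV would provide a non-weak function of the same
name in its own object file, and the linker would pick that one instead.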

Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5643a10..64c4692 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
pci_dev_put(dev);
 }
 
+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
+{
+   return 0;
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
int rc;
@@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_sriov *iov = dev->sriov;
int bars = 0;
int bus;
+   int retval;
 
if (!nr_virtfn)
return 0;
@@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
if (nr_virtfn < initial)
initial = nr_virtfn;
 
+   if ((retval = pcibios_sriov_enable(dev, initial))) {
+   dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n",
+   retval);
+   return retval;
+   }
+
for (i = 0; i < initial; i++) {
rc = virtfn_add(dev, i, 0);
if (rc)
@@ -335,6 +347,11 @@ failed:
return rc;
 }
 
+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
+{
+   return 0;
+}
+
 static void sriov_disable(struct pci_dev *dev)
 {
int i;
@@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev)
for (i = 0; i < iov->num_VFs; i++)
virtfn_remove(dev, i, 0);
 
+   pcibios_sriov_disable(dev);
+
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-- 
1.7.9.5


[PATCH V13 07/21] PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()

2015-03-03 Thread Wei Yang
On PowerNV, some resource reservation is needed for SR-IOV VFs that don't
exist at the bootup stage.  To match resources with VFs, the code needs
to get the VF's BDF in advance.

Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and
pci_iov_virtfn_devfn() and export them.
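
The exported helpers implement the SR-IOV routing-ID arithmetic; as a
standalone sketch (invented function names, same formulas as
virtfn_bus()/virtfn_devfn()):

```c
#include <assert.h>

/* A VF's routing ID is the PF's devfn plus First VF Offset plus
 * VF Stride times the VF index; the high bits select the bus. */
static int vf_bus(int pf_bus, int pf_devfn, int offset, int stride, int vf_id)
{
	return pf_bus + ((pf_devfn + offset + stride * vf_id) >> 8);
}

static int vf_devfn(int pf_devfn, int offset, int stride, int vf_id)
{
	return (pf_devfn + offset + stride * vf_id) & 0xff;
}
```

For example, a PF at bus 3, devfn 0 with offset 1 and stride 1 places
VF 0 at 03:00.1, while VF 255 wraps onto the next bus as 04:00.0.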

[bhelgaas: changelog, make "busnr" int]
Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c   |   28 
 include/linux/pci.h |   11 +++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2ae921f..5643a10 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -19,16 +19,20 @@
 
 #define VIRTFN_ID_LEN  16
 
-static inline u8 virtfn_bus(struct pci_dev *dev, int id)
+int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return dev->bus->number + ((dev->devfn + dev->sriov->offset +
-   dev->sriov->stride * id) >> 8);
+   dev->sriov->stride * vf_id) >> 8);
 }
 
-static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return (dev->devfn + dev->sriov->offset +
-   dev->sriov->stride * id) & 0xff;
+   dev->sriov->stride * vf_id) & 0xff;
 }
 
 /*
@@ -58,11 +62,11 @@ static inline u8 virtfn_max_buses(struct pci_dev *dev)
struct pci_sriov *iov = dev->sriov;
int nr_virtfn;
u8 max = 0;
-   u8 busnr;
+   int busnr;
 
for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
pci_iov_set_numvfs(dev, nr_virtfn);
-   busnr = virtfn_bus(dev, nr_virtfn - 1);
+   busnr = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
if (busnr > max)
max = busnr;
}
@@ -116,7 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
struct pci_bus *bus;
 
mutex_lock(&iov->dev->sriov->lock);
-   bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id));
+   bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
if (!bus)
goto failed;
 
@@ -124,7 +128,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
if (!virtfn)
goto failed0;
 
-   virtfn->devfn = virtfn_devfn(dev, id);
+   virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
virtfn->vendor = dev->vendor;
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
pci_setup_device(virtfn);
@@ -186,8 +190,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
struct pci_sriov *iov = dev->sriov;
 
virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
-virtfn_bus(dev, id),
-virtfn_devfn(dev, id));
+pci_iov_virtfn_bus(dev, id),
+pci_iov_virtfn_devfn(dev, id));
if (!virtfn)
return;
 
@@ -226,7 +230,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_dev *pdev;
struct pci_sriov *iov = dev->sriov;
int bars = 0;
-   u8 bus;
+   int bus;
 
if (!nr_virtfn)
return 0;
@@ -263,7 +267,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
iov->offset = offset;
iov->stride = stride;
 
-   bus = virtfn_bus(dev, nr_virtfn - 1);
+   bus = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
if (bus > dev->bus->busn_res.end) {
dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
nr_virtfn, bus, &dev->bus->busn_res);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1559658..99ea948 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1669,6 +1669,9 @@ int pci_ext_cfg_avail(void);
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
+int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
+
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 int pci_num_vf(struct pci_dev *dev);
@@ -1677,6 +1680,14 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void pci_disable_sriov(stru

[PATCH V13 06/21] PCI: Calculate maximum number of buses required for VFs

2015-03-03 Thread Wei Yang
An SR-IOV device can change its First VF Offset and VF Stride based on the
values of ARI Capable Hierarchy and NumVFs.  The number of buses required
for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see
SR-IOV spec r1.1, sec 2.1.2).

Previously pci_iov_bus_range() computed how many buses would be required by
TotalVFs, but this was based on a single NumVFs value and may not have been
the maximum for all NumVFs configurations.

Iterate over all valid NumVFs and calculate the maximum number of bus
numbers that could ever be required for VFs of this device.
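
The iteration can be sketched as follows (illustrative only: the demo's
fixed offset/stride stand in for values the hardware may change per
NumVFs, which is why the real code rewrites NumVFs on each pass):

```c
#include <assert.h>

/* Invented device: offset/stride reported for a given NumVFs value.
 * Real hardware may return different values for different NumVFs. */
static void demo_offset_stride(int numvfs, int *offset, int *stride)
{
	(void)numvfs;
	*offset = 4;
	*stride = 2;
}

/* Highest bus number any valid NumVFs configuration could consume,
 * mirroring the new virtfn_max_buses(). */
static int max_vf_bus(int pf_bus, int pf_devfn, int total_vfs)
{
	int max = pf_bus, nr, offset, stride;

	for (nr = 1; nr <= total_vfs; nr++) {
		demo_offset_stride(nr, &offset, &stride);
		int bus = pf_bus + ((pf_devfn + offset + stride * (nr - 1)) >> 8);

		if (bus > max)
			max = bus;
	}
	return max;
}
```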

[bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove
kernel-doc comment marker]
Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c |   31 +++
 drivers/pci/pci.h |1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index a8752c2..2ae921f 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -46,6 +46,30 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
 
+/*
+ * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
+ * determine how many additional bus numbers will be consumed by VFs.
+ *
+ * Iterate over all valid NumVFs and calculate the maximum number of bus
+ * numbers that could ever be required.
+ */
+static inline u8 virtfn_max_buses(struct pci_dev *dev)
+{
+   struct pci_sriov *iov = dev->sriov;
+   int nr_virtfn;
+   u8 max = 0;
+   u8 busnr;
+
+   for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
+   pci_iov_set_numvfs(dev, nr_virtfn);
+   busnr = virtfn_bus(dev, nr_virtfn - 1);
+   if (busnr > max)
+   max = busnr;
+   }
+
+   return max;
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
struct pci_bus *child;
@@ -427,6 +451,7 @@ found:
 
dev->sriov = iov;
dev->is_physfn = 1;
+   iov->max_VF_buses = virtfn_max_buses(dev);
 
return 0;
 
@@ -556,15 +581,13 @@ void pci_restore_iov_state(struct pci_dev *dev)
 int pci_iov_bus_range(struct pci_bus *bus)
 {
int max = 0;
-   u8 busnr;
struct pci_dev *dev;
 
list_for_each_entry(dev, &bus->devices, bus_list) {
if (!dev->is_physfn)
continue;
-   busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1);
-   if (busnr > max)
-   max = busnr;
+   if (dev->sriov->max_VF_buses > max)
+   max = dev->sriov->max_VF_buses;
}
 
return max ? max - bus->number : 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5732964..bae593c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -243,6 +243,7 @@ struct pci_sriov {
u16 stride; /* following VF stride */
u32 pgsz;   /* page size for BAR alignment */
u8 link;/* Function Dependency Link */
+   u8 max_VF_buses;/* max buses consumed by VFs */
u16 driver_max_VFs; /* max num VFs driver supports */
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */
-- 
1.7.9.5


[PATCH V13 05/21] PCI: Refresh First VF Offset and VF Stride when updating NumVFs

2015-03-03 Thread Wei Yang
The First VF Offset and VF Stride fields depend on the NumVFs setting, so
refresh the cached fields in struct pci_sriov when updating NumVFs.  See
the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10.

[bhelgaas: changelog, remove kernel-doc comment marker]
Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c |   23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 27b98c3..a8752c2 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -31,6 +31,21 @@ static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
dev->sriov->stride * id) & 0xff;
 }
 
+/*
+ * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
+ * change when NumVFs changes.
+ *
+ * Update iov->offset and iov->stride when NumVFs is written.
+ */
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+{
+   struct pci_sriov *iov = dev->sriov;
+
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
struct pci_bus *child;
@@ -253,7 +268,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
return rc;
}
 
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+   pci_iov_set_numvfs(dev, nr_virtfn);
iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
@@ -282,7 +297,7 @@ failed:
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+   pci_iov_set_numvfs(dev, 0);
ssleep(1);
pci_cfg_access_unlock(dev);
 
@@ -313,7 +328,7 @@ static void sriov_disable(struct pci_dev *dev)
sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
iov->num_VFs = 0;
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+   pci_iov_set_numvfs(dev, 0);
 }
 
 static int sriov_init(struct pci_dev *dev, int pos)
@@ -452,7 +467,7 @@ static void sriov_restore_state(struct pci_dev *dev)
pci_update_resource(dev, i);
 
pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs);
+   pci_iov_set_numvfs(dev, iov->num_VFs);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
msleep(100);
-- 
1.7.9.5


[PATCH V13 04/21] PCI: Index IOV resources in the conventional style

2015-03-03 Thread Wei Yang
From: Bjorn Helgaas 

Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i".
Use that style in iov.c also.

No functional change.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5bca0e1..27b98c3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -95,7 +95,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
virtfn->multifunction = 0;
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
if (!res->parent)
continue;
virtfn->resource[i].name = pci_name(virtfn);
@@ -212,7 +212,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
bars |= (1 << (i + PCI_IOV_RESOURCES));
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
if (res->parent)
nres++;
}
@@ -373,7 +373,7 @@ found:
 
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
bar64 = __pci_read_base(dev, pci_bar_unknown, res,
pos + PCI_SRIOV_BAR + i * 4);
if (!res->flags)
@@ -417,7 +417,7 @@ found:
 
 failed:
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
res->flags = 0;
}
 
-- 
1.7.9.5


[PATCH V13 03/21] PCI: Keep individual VF BAR size in struct pci_sriov

2015-03-03 Thread Wei Yang
Currently we don't store the individual VF BAR size.  We calculate it when
needed by dividing the PF's IOV resource size (which contains space for
*all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability
again.

Keep the individual VF BAR size in struct pci_sriov.barsz[], add
pci_iov_resource_size() to retrieve it, and use that instead of doing the
division or reading the SR-IOV capability BAR.

[bhelgaas: rename to "barsz[]", simplify barsz[] index computation, remove
SR-IOV capability BAR sizing]
Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c   |   39 ---
 drivers/pci/pci.h   |1 +
 include/linux/pci.h |3 +++
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 05f9d97..5bca0e1 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -57,6 +57,14 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
pci_remove_bus(virtbus);
 }
 
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
+{
+   if (!dev->is_physfn)
+   return 0;
+
+   return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
+}
+
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
int i;
@@ -92,8 +100,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
continue;
virtfn->resource[i].name = pci_name(virtfn);
virtfn->resource[i].flags = res->flags;
-   size = resource_size(res);
-   do_div(size, iov->total_VFs);
+   size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
virtfn->resource[i].start = res->start + size * id;
virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
rc = request_resource(res, &virtfn->resource[i]);
@@ -311,7 +318,7 @@ static void sriov_disable(struct pci_dev *dev)
 
 static int sriov_init(struct pci_dev *dev, int pos)
 {
-   int i;
+   int i, bar64;
int rc;
int nres;
u32 pgsz;
@@ -360,29 +367,29 @@ found:
pgsz &= ~(pgsz - 1);
pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
 
+   iov = kzalloc(sizeof(*iov), GFP_KERNEL);
+   if (!iov)
+   return -ENOMEM;
+
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = dev->resource + PCI_IOV_RESOURCES + i;
-   i += __pci_read_base(dev, pci_bar_unknown, res,
-pos + PCI_SRIOV_BAR + i * 4);
+   bar64 = __pci_read_base(dev, pci_bar_unknown, res,
+   pos + PCI_SRIOV_BAR + i * 4);
if (!res->flags)
continue;
if (resource_size(res) & (PAGE_SIZE - 1)) {
rc = -EIO;
goto failed;
}
+   iov->barsz[i] = resource_size(res);
res->end = res->start + resource_size(res) * total - 1;
+   dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
+i, res, i, total);
+   i += bar64;
nres++;
}
 
-   iov = kzalloc(sizeof(*iov), GFP_KERNEL);
-   if (!iov) {
-   rc = -ENOMEM;
-   goto failed;
-   }
-
iov->pos = pos;
iov->nres = nres;
iov->ctrl = ctrl;
@@ -414,6 +421,7 @@ failed:
res->flags = 0;
}
 
+   kfree(iov);
return rc;
 }
 
@@ -510,14 +518,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-   struct resource tmp;
-   int reg = pci_iov_resource_bar(dev, resno);
-
-   if (!reg)
-   return 0;
-
-__pci_read_base(dev, pci_bar_unknown, &tmp, reg);
-   return resource_alignment(&tmp);
+   return pci_iov_resource_size(dev, resno);
 }
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4091f82..5732964 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -247,6 +247,7 @@ struct pci_sriov {
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */
struct mutex lock;  /* lock for VF bus */
+   resource_size_t barsz[PCI_SRIOV_NUM_BARS];  /* VF BAR size */
 };
 
 #ifdef CONFIG_PCI_ATS
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 211e9da..1559658 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1675,6 +1675,7 @@ int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 

[PATCH V13 02/21] PCI: Print PF SR-IOV resource that contains all VF(n) BAR space

2015-03-03 Thread Wei Yang
When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we
learn the alignment requirement and amount of space consumed by a single
VF.  But when VFs are enabled, *each* of the NumVFs consumes that amount of
space, so the total size of the PF resource is "VF BAR size * NumVFs".

Add a printk of the total space consumed by the VFs corresponding to what
we already do for normal non-IOV BARs.

No functional change; new message only.

[bhelgaas: split out into its own patch]
Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c4c33ea..05f9d97 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -372,6 +372,8 @@ found:
goto failed;
}
res->end = res->start + resource_size(res) * total - 1;
+   dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
+i, res, i, total);
nres++;
}
 
-- 
1.7.9.5


[PATCH V13 01/21] PCI: Print more info in sriov_enable() error message

2015-03-03 Thread Wei Yang
From: Bjorn Helgaas 

If we don't have space for all the bus numbers required to enable VFs,
print the largest bus number required and the range available.

No functional change; improved error message only.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4b3a4ea..c4c33ea 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -180,6 +180,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_dev *pdev;
struct pci_sriov *iov = dev->sriov;
int bars = 0;
+   u8 bus;
 
if (!nr_virtfn)
return 0;
@@ -216,8 +217,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
iov->offset = offset;
iov->stride = stride;
 
-   if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
-   dev_err(&dev->dev, "SR-IOV: bus number out of range\n");
+   bus = virtfn_bus(dev, nr_virtfn - 1);
+   if (bus > dev->bus->busn_res.end) {
+   dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
+   nr_virtfn, bus, &dev->bus->busn_res);
return -ENOMEM;
}
 
-- 
1.7.9.5


[PATCH V13 00/21] Enable SRIOV on Power8

2015-03-03 Thread Wei Yang
This patchset enables SRIOV on POWER8.

The general idea is to put each VF into an individual PE and allocate the
required resources like MMIO/DMA/MSI. The major difficulty comes from the MMIO
allocation and adjustment for the PF's IOV BAR.

On P8, we use M64BT to cover a PF's IOV BAR, which lets an individual VF
sit in its own PE. This gives more flexibility, but at the same time imposes
some restrictions on the PF's IOV BAR size and alignment.

To achieve this, we need to apply some hacks to the PCI devices' resources.
1. Expand the IOV BAR properly.
   Done by pnv_pci_ioda_fixup_iov_resources().
2. Shift the IOV BAR properly.
   Done by pnv_pci_vf_resource_shift().
3. IOV BAR alignment is calculated by arch dependent function instead of an
   individual VF BAR size.
   Done by pnv_pcibios_sriov_resource_alignment().
4. Take the IOV BAR alignment into consideration in the sizing and assigning.
   This is achieved by commit: "PCI: Take additional IOV BAR alignment in
   sizing and assigning"

Test Environment:
   The SRIOV device tested is Emulex Lancer(10df:e220) and
   Mellanox ConnectX-3(15b3:1003) on POWER8.

Example of passing a VF through to a guest via vfio:
1. unbind the original driver and bind to vfio-pci driver
   echo :06:0d.0 > /sys/bus/pci/devices/:06:0d.0/driver/unbind
   echo  1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
   Note: this should be done for each device in the same iommu_group
2. Start qemu and pass device through vfio
   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
   -M pseries -m 2048 -enable-kvm -nographic \
   -drive file=/home/ywywyang/kvm/fc19.img \
   -monitor telnet:localhost:5435,server,nowait -boot cd \
   -device "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6"

Verify that this is exactly the VF responding:
1. ping from a machine in the same subnet(the broadcast domain)
2. run arp -n on this machine
   9.115.251.20 ether   00:00:c9:df:ed:bf   C eth0
3. ifconfig in the guest
   # ifconfig eth1
   eth1: flags=4163  mtu 1500
inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20
ether 00:00:c9:df:ed:bf  txqueuelen 1000 (Ethernet)
RX packets 175  bytes 13278 (12.9 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 58  bytes 9276 (9.0 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
4. They have the same MAC address

Note: make sure you shutdown other network interfaces in guest.

---
v13:
   * fix error in pcibios_iov_resource_alignment(), use pdev instead of dev
   * rename vf_num to num_vfs in pcibios_sriov_enable(),
 pnv_pci_vf_resource_shift(), pnv_pci_sriov_disable(),
 pnv_pci_sriov_enable(), pnv_pci_ioda2_setup_dma_pe()
   * add more explanation in commit "powerpc/pci: Don't unset PCI resources
 for VFs"
   * fix IOV BAR in hotplug path as well, and don't fixup an already added
 device
   * use roundup_pow_of_two() instead of __roundup_pow_of_two()
   * this is based on v4.0-rc1
v12:
   * remove "align" parameter from pcibios_iov_resource_alignment()
 default version returns pci_iov_resource_size() instead of the
 "align" parameter
   * in powerpc pcibios_iov_resource_alignment(), return
 pci_iov_resource_size() if there's no ppc_md function pointer
   * in pci_sriov_resource_alignment(), don't re-read base, since we
 saved the required alignment when reading it the first time
   * remove "vf_num" parameter from add_dev_pci_info() and
 remove_dev_pci_info(); use pci_sriov_get_totalvfs() instead
   * use dev_warn() instead of pr_warn() when possible
   * check to be sure IOV BAR is still in range after shifting, change
 pnv_pci_vf_resource_shift() from void to int
   * improve sriov_enable() error message
   * improve SR-IOV BAR sizing message
   * index IOV resources in conventional style
   * include preamble patches (refresh offset/stride when updating numVFs,
 calculate max buses required)
   * restructure pci_iov_max_bus_range() to return value instead of updating
 internally, rename to virtfn_max_buses()
   * fix typos & formatting
   * expand documentation
v11:
   * fix some compile warning
v10:
   * remove weak function pcibios_iov_resource_size()
 the VF BAR size is stored in pci_sriov structure and retrieved from
 pci_iov_resource_size()
   * Use "Reserve additional" instead of "Expand" to be more accurate in the
 change log
   * add log message to show the PF's IOV BAR final size
   * add pcibios_sriov_enable/disable() weak function in sriov_enable/disable()
 for arch setup before enable VFs. Like the arch could fix up the BDF for
 VFs, since the change of 

[git pull] Please pull mpe/linux.git powerpc-4.0-2 tag

2015-03-03 Thread Michael Ellerman
Hi Linus,

Please pull some powerpc fixes for 4.0:

The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:

  Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux.git tags/powerpc-4.0-2

for you to fetch changes up to 4ad04e5987115ece5fa8a0cf1dc72fcd4707e33e:

  powerpc/iommu: Remove IOMMU device references via bus notifier (2015-03-04 13:19:33 +1100)


powerpc fixes for 4.0

- Fix for dynticks.
- Fix for smpboot bug.
- Fix for IOMMU group refcounting.


Michael Ellerman (1):
  powerpc/smp: Wait until secondaries are active & online

Nishanth Aravamudan (1):
  powerpc/iommu: Remove IOMMU device references via bus notifier

Paul Clarke (1):
  powerpc: Re-enable dynticks

 arch/powerpc/include/asm/iommu.h   |  6 ++
 arch/powerpc/include/asm/irq_work.h|  9 +
 arch/powerpc/kernel/iommu.c| 26 ++
 arch/powerpc/kernel/smp.c  |  4 ++--
 arch/powerpc/platforms/powernv/pci.c   | 26 --
 arch/powerpc/platforms/pseries/iommu.c |  2 ++
 6 files changed, 45 insertions(+), 28 deletions(-)
 create mode 100644 arch/powerpc/include/asm/irq_work.h






Re: [PATCH v2 0/5] split ET_DYN ASLR from mmap ASLR

2015-03-03 Thread Ingo Molnar

* Kees Cook  wrote:

> On Mon, Mar 2, 2015 at 11:31 PM, Ingo Molnar  wrote:
> >
> > * Kees Cook  wrote:
> >
> >> To address the "offset2lib" ASLR weakness[1], this separates ET_DYN
> >> ASLR from mmap ASLR, as already done on s390. The architectures
> >> that are already randomizing mmap (arm, arm64, mips, powerpc, s390,
> >> and x86), have their various forms of arch_mmap_rnd() made available
> >> via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures,
> >> arch_randomize_brk() is collapsed as well.
> >>
> >> This is an alternative to the solutions in:
> >> https://lkml.org/lkml/2015/2/23/442
> >
> > Looks good so far:
> >
> > Reviewed-by: Ingo Molnar 
> >
> > While reviewing this series I also noticed that the following code
> > could be factored out from architecture mmap code as well:
> >
> >   - arch_pick_mmap_layout() uses very similar patterns across the
> > platforms, with only few variations. Many architectures use
> > the same duplicated mmap_is_legacy() helper as well. There's
> > usually just trivial differences between mmap_legacy_base()
> > approaches as well.
> 
> I was nervous to start refactoring this code, but it's true: most of 
> it is the same.

Well, it still needs to be done if we want to add new randomization 
features: code fractured over multiple architectures is a recipe for 
bugs, as this series demonstrates. So it first has to be made more 
maintainable.

> >   - arch_mmap_rnd(): the PF_RANDOMIZE checks are needlessly
> > exposed to the arch routine - the arch routine should only
> > concentrate on arch details, not generic flags like
> > PF_RANDOMIZE.
> 
> Yeah, excellent point. I will send a follow-up patch to move this 
> into binfmt_elf instead. I'd like to avoid removing it in any of the 
> other patches since each was attempting a single step in the 
> refactoring.

Finegrained patches are ideal!

> > In theory the mmap layout could be fully parametrized as well: 
> > i.e. no callback functions to architectures by default at all: 
> > just declarations of bits of randomization desired (or, available 
> > address space bits), and perhaps an arch helper to allow 32-bit 
> > vs. 64-bit address space distinctions.
> 
> Yeah, I was considering that too, since each architecture has a 
> nearly identical arch_mmap_rnd() at this point. Only the size of the 
> entropy was changing.
>
> > 'Weird' architectures could provide special routines, but only by 
> > overriding the default behavior, which should be generic, safe and 
> > robust.
> 
> Yeah, quite true. Should entropy size be a #define like 
> ELF_ET_DYN_BASE? Something like ASLR_MMAP_ENTROPY and 
> ASLR_MMAP_ENTROPY_32? [...]

That would work I suspect.

> [...] Is there a common function for determining a compat task? That 
> seemed to be per-arch too. Maybe arch_mmap_entropy()?

Compat flags are a bit of a mess, and since they often tie into arch 
low level assembly code, they are hard to untangle. So maybe as an 
intermediate step add an is_compat() generic method, and make that 
obvious and self-defined function a per arch thing?

But I'm just handwaving here - I suspect it has to be tried to see all 
the complications and to determine whether that's the best structure 
and whether it's a win ... Only one thing is certain: the current code 
is not compact and reviewable enough, and VM bits hiding in 
arch/*/mm/mmap.c tends to reduce net attention paid to these details.

Thanks,

Ingo

Re: [PATCH 4/5] mm: split ET_DYN ASLR from mmap ASLR

2015-03-03 Thread Michael Ellerman
On Mon, 2015-03-02 at 16:19 -0800, Kees Cook wrote:
> This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips,
> powerpc, and x86. The problem is that if there is a leak of ASLR from
> the executable (ET_DYN), it means a leak of shared library offset as
> well (mmap), and vice versa. Further details and a PoC of this attack
> are available here:
> http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html
> 
> With this patch, a PIE linked executable (ET_DYN) has its own ASLR region:
> 
> $ ./show_mmaps_pie
> 54859ccd6000-54859ccd7000 r-xp  ...  /tmp/show_mmaps_pie
> 54859ced6000-54859ced7000 r--p  ...  /tmp/show_mmaps_pie
> 54859ced7000-54859ced8000 rw-p  ...  /tmp/show_mmaps_pie

Just to be clear, it's the fact that the above vmas are in a different
address range to those below that shows the patch is working, right?

> 7f75be764000-7f75be91f000 r-xp  ...  /lib/x86_64-linux-gnu/libc.so.6
> 7f75be91f000-7f75beb1f000 ---p  ...  /lib/x86_64-linux-gnu/libc.so.6


On powerpc I'm seeing:

# /bin/dash
# cat /proc/$$/maps
524e-5251 r-xp  08:03 129814 /bin/dash
5251-5252 rw-p 0002 08:03 129814 /bin/dash
10034f2-10034f5 rw-p  00:00 0  [heap]
3fffaeaf-3fffaeca r-xp  08:03 13529  /lib/powerpc64le-linux-gnu/libc-2.19.so
3fffaeca-3fffaecb rw-p 001a 08:03 13529  /lib/powerpc64le-linux-gnu/libc-2.19.so
3fffaecc-3fffaecd rw-p  00:00 0
3fffaecd-3fffaecf r-xp  00:00 0  [vdso]
3fffaecf-3fffaed2 r-xp  08:03 13539  /lib/powerpc64le-linux-gnu/ld-2.19.so
3fffaed2-3fffaed3 rw-p 0002 08:03 13539  /lib/powerpc64le-linux-gnu/ld-2.19.so
3fffc707-3fffc70a rw-p  00:00 0  [stack]


Whereas previously the /bin/dash vmas were up at 3fff..

So looks good to me for powerpc.

Acked-by: Michael Ellerman 

cheers




Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset

2015-03-03 Thread Wei Yang
On Tue, Feb 24, 2015 at 03:00:37AM -0600, Bjorn Helgaas wrote:
>On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote:
>> From: Wei Yang 
>> 
>> On PowerNV platform, resource position in M64 implies the PE# the resource
>> belongs to.  In some cases, adjustment of a resource is necessary to locate
>> it to a correct position in M64.
>> 
>> Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address
>> according to an offset.
>> 
>> [bhelgaas: rework loops, rework overlap check, index resource[]
>> conventionally, remove pci_regs.h include, squashed with next patch]
>> Signed-off-by: Wei Yang 
>> Signed-off-by: Bjorn Helgaas 
>
>...
>
>> +#ifdef CONFIG_PCI_IOV
>> +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>> +{
>> +struct pci_dn *pdn = pci_get_pdn(dev);
>> +int i;
>> +struct resource *res, res2;
>> +resource_size_t size;
>> +u16 vf_num;
>> +
>> +if (!dev->is_physfn)
>> +return -EINVAL;
>> +
>> +/*
>> + * "offset" is in VFs.  The M64 windows are sized so that when they
>> + * are segmented, each segment is the same size as the IOV BAR.
>> + * Each segment is in a separate PE, and the high order bits of the
>> + * address are the PE number.  Therefore, each VF's BAR is in a
>> + * separate PE, and changing the IOV BAR start address changes the
>> + * range of PEs the VFs are in.
>> + */
>> +vf_num = pdn->vf_pes;
>> +for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> +res = &dev->resource[i + PCI_IOV_RESOURCES];
>> +if (!res->flags || !res->parent)
>> +continue;
>> +
>> +if (!pnv_pci_is_mem_pref_64(res->flags))
>> +continue;
>> +
>> +/*
>> + * The actual IOV BAR range is determined by the start address
>> + * and the actual size for vf_num VFs BAR.  This check is to
>> + * make sure that after shifting, the range will not overlap
>> + * with another device.
>> + */
>> +size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>> +res2.flags = res->flags;
>> +res2.start = res->start + (size * offset);
>> +res2.end = res2.start + (size * vf_num) - 1;
>> +
>> +if (res2.end > res->end) {
>> +dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR (trying to enable %d VFs shifted by %d)\n",
>> +i, &res2, res, vf_num, offset);
>> +return -EBUSY;
>> +}
>> +}
>> +
>> +for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> +res = &dev->resource[i + PCI_IOV_RESOURCES];
>> +if (!res->flags || !res->parent)
>> +continue;
>> +
>> +if (!pnv_pci_is_mem_pref_64(res->flags))
>> +continue;
>> +
>> +size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>> +res2 = *res;
>> +res->start += size * offset;
>
>I'm still not happy about this fiddling with res->start.
>
>Increasing res->start means that in principle, the "size * offset" bytes
>that we just removed from res are now available for allocation to somebody
>else.  I don't think we *will* give that space to anything else because of
>the alignment restrictions you're enforcing, but "res" now doesn't
>correctly describe the real resource map.
>
>Would you be able to just update the BAR here while leaving the struct
>resource alone?  In that case, it would look a little funny that lspci
>would show a BAR value in the middle of the region in /proc/iomem, but
>the /proc/iomem region would be more correct.

Bjorn,

I did some tests, but the results are not good.

What I did was still to write the shifted resource address to the device with
pci_update_resource(), but then revert res->start to the original value. If
this step is not correct, please let me know.

This can't work, since after we revert res->start, those VFs will be given
resources from res->start instead of (res->start + offset * size). This is not
what we expect.

I have rebased/cleaned/changed the code according to your comments on this
patch set. Will send out v13 soon.

>
>> +
>> +dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d VFs shifted by %d)\n",
>> + i, &res2, res, vf_num, offset);
>> +pci_update_resource(dev, i + PCI_IOV_RESOURCES);
>> +}
>> +pdn->max_vfs -= offset;
>> +return 0;
>> +}
>> +#endif /* CONFIG_PCI_IOV */

-- 
Richard Yang
Help you, Help me


[PATCH v3 06/10] s390: standardize mmap_rnd() usage

2015-03-03 Thread Kees Cook
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86, and extracts the checking
of PF_RANDOMIZE.

Signed-off-by: Kees Cook 
---
 arch/s390/mm/mmap.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index 179a2c20b01f..db57078075c5 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -62,20 +62,18 @@ static inline int mmap_is_legacy(void)
 
 static unsigned long mmap_rnd(void)
 {
-   if (!(current->flags & PF_RANDOMIZE))
-   return 0;
if (is_32bit_task())
return (get_random_int() & 0x7ff) << PAGE_SHIFT;
else
return (get_random_int() & mmap_rnd_mask) << PAGE_SHIFT;
 }
 
-static unsigned long mmap_base_legacy(void)
+static unsigned long mmap_base_legacy(unsigned long rnd)
 {
-   return TASK_UNMAPPED_BASE + mmap_rnd();
+   return TASK_UNMAPPED_BASE + rnd;
 }
 
-static inline unsigned long mmap_base(void)
+static inline unsigned long mmap_base(unsigned long rnd)
 {
unsigned long gap = rlimit(RLIMIT_STACK);
 
@@ -84,7 +82,7 @@ static inline unsigned long mmap_base(void)
else if (gap > MAX_GAP)
gap = MAX_GAP;
gap &= PAGE_MASK;
-   return STACK_TOP - stack_maxrandom_size() - mmap_rnd() - gap;
+   return STACK_TOP - stack_maxrandom_size() - rnd - gap;
 }
 
 unsigned long
@@ -187,7 +185,11 @@ unsigned long randomize_et_dyn(void)
if (!is_32bit_task())
/* Align to 4GB */
base &= ~((1UL << 32) - 1);
-   return base + mmap_rnd();
+
+   if (current->flags & PF_RANDOMIZE)
+   base += mmap_rnd();
+
+   return base;
 }
 
 #ifndef CONFIG_64BIT
@@ -198,15 +200,20 @@ unsigned long randomize_et_dyn(void)
  */
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
+   unsigned long random_factor = 0UL;
+
+   if (current->flags & PF_RANDOMIZE)
+   random_factor = mmap_rnd();
+
/*
 * Fall back to the standard layout if the personality
 * bit is set, or if the expected stack growth is unlimited:
 */
if (mmap_is_legacy()) {
-   mm->mmap_base = mmap_base_legacy();
+   mm->mmap_base = mmap_base_legacy(random_factor);
mm->get_unmapped_area = arch_get_unmapped_area;
} else {
-   mm->mmap_base = mmap_base();
+   mm->mmap_base = mmap_base(random_factor);
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
}
 }
@@ -273,15 +280,20 @@ s390_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
  */
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
+   unsigned long random_factor = 0UL;
+
+   if (current->flags & PF_RANDOMIZE)
+   random_factor = mmap_rnd();
+
/*
 * Fall back to the standard layout if the personality
 * bit is set, or if the expected stack growth is unlimited:
 */
if (mmap_is_legacy()) {
-   mm->mmap_base = mmap_base_legacy();
+   mm->mmap_base = mmap_base_legacy(random_factor);
mm->get_unmapped_area = s390_get_unmapped_area;
} else {
-   mm->mmap_base = mmap_base();
+   mm->mmap_base = mmap_base(random_factor);
mm->get_unmapped_area = s390_get_unmapped_area_topdown;
}
 }
-- 
1.9.1


[PATCH v3 09/10] mm: split ET_DYN ASLR from mmap ASLR

2015-03-03 Thread Kees Cook
This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips,
powerpc, and x86. The problem is that if there is a leak of ASLR from
the executable (ET_DYN), it means a leak of shared library offset as
well (mmap), and vice versa. Further details and a PoC of this attack
are available here:
http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

With this patch, a PIE linked executable (ET_DYN) has its own ASLR region:

$ ./show_mmaps_pie
54859ccd6000-54859ccd7000 r-xp  ...  /tmp/show_mmaps_pie
54859ced6000-54859ced7000 r--p  ...  /tmp/show_mmaps_pie
54859ced7000-54859ced8000 rw-p  ...  /tmp/show_mmaps_pie
7f75be764000-7f75be91f000 r-xp  ...  /lib/x86_64-linux-gnu/libc.so.6
7f75be91f000-7f75beb1f000 ---p  ...  /lib/x86_64-linux-gnu/libc.so.6
7f75beb1f000-7f75beb23000 r--p  ...  /lib/x86_64-linux-gnu/libc.so.6
7f75beb23000-7f75beb25000 rw-p  ...  /lib/x86_64-linux-gnu/libc.so.6
7f75beb25000-7f75beb2a000 rw-p  ...
7f75beb2a000-7f75beb4d000 r-xp  ...  /lib64/ld-linux-x86-64.so.2
7f75bed45000-7f75bed46000 rw-p  ...
7f75bed46000-7f75bed47000 r-xp  ...
7f75bed47000-7f75bed4c000 rw-p  ...
7f75bed4c000-7f75bed4d000 r--p  ...  /lib64/ld-linux-x86-64.so.2
7f75bed4d000-7f75bed4e000 rw-p  ...  /lib64/ld-linux-x86-64.so.2
7f75bed4e000-7f75bed4f000 rw-p  ...
7fffb3741000-7fffb3762000 rw-p  ...  [stack]
7fffb377b000-7fffb377d000 r--p  ...  [vvar]
7fffb377d000-7fffb377f000 r-xp  ...  [vdso]

The change is to add a call to the newly created arch_mmap_rnd() in the
ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR,
as was already done on s390. Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE,
which is no longer needed.

Reported-by: Hector Marco-Gisbert 
Signed-off-by: Kees Cook 
---
 arch/arm/Kconfig|  1 -
 arch/arm64/Kconfig  |  1 -
 arch/mips/Kconfig   |  1 -
 arch/powerpc/Kconfig|  1 -
 arch/s390/include/asm/elf.h |  5 ++---
 arch/s390/mm/mmap.c |  8 
 arch/x86/Kconfig|  1 -
 fs/Kconfig.binfmt   |  3 ---
 fs/binfmt_elf.c | 18 --
 9 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 248d99cabaa8..e2f0ef9c6ee3 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1,7 +1,6 @@
 config ARM
bool
default y
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5f469095e0e2..07e0fc7adc88 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1,6 +1,5 @@
 config ARM64
def_bool y
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 72ce5cece768..557c5f1772c1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -23,7 +23,6 @@ config MIPS
select HAVE_KRETPROBES
select HAVE_DEBUG_KMEMLEAK
select HAVE_SYSCALL_TRACEPOINTS
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ELF_RANDOMIZE
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT
select RTC_LIB if !MACH_LOONGSON
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 14fe1c411489..910fa4f9ad1e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -88,7 +88,6 @@ config PPC
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select BINFMT_ELF
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ELF_RANDOMIZE
select OF
select OF_EARLY_FLATTREE
diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index 2e63de8aac7c..d0db9d944b6d 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -163,10 +163,9 @@ extern unsigned int vdso_enabled;
the loader.  We need to make sure that it is out of the way of the program
that it will "exec", and that there is sufficient room for the brk. 64-bit
tasks are aligned to 4GB. */
-extern unsigned long randomize_et_dyn(void);
-#define ELF_ET_DYN_BASE (randomize_et_dyn() + (is_32bit_task() ? \
+#define ELF_ET_DYN_BASE (is_32bit_task() ? \
(STACK_TOP / 3 * 2) : \
-   (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)))
+   (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1))
 
 /* This yields a mask that user programs can use to figure out what
instruction set this CPU supports. */
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index 8c11536f972d..bb3367c5cb0b 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -177,14 +177,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;
 }
 
-unsigned long randomize_et_dyn(void

[PATCH v3 04/10] mips: extract logic for mmap_rnd()

2015-03-03 Thread Kees Cook
In preparation for splitting out ET_DYN ASLR, extract the mmap ASLR
selection into a separate function.

Signed-off-by: Kees Cook 
---
 arch/mips/mm/mmap.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index f1baadd56e82..673a5cfe082f 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -142,18 +142,26 @@ unsigned long arch_get_unmapped_area_topdown(struct file 
*filp,
addr0, len, pgoff, flags, DOWN);
 }
 
+static unsigned long mmap_rnd(void)
+{
+   unsigned long rnd;
+
+   rnd = (unsigned long)get_random_int();
+   rnd <<= PAGE_SHIFT;
+   if (TASK_IS_32BIT_ADDR)
+   rnd &= 0xfffffful;
+   else
+   rnd &= 0xffffffful;
+
+   return rnd;
+}
+
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
unsigned long random_factor = 0UL;
 
-   if (current->flags & PF_RANDOMIZE) {
-   random_factor = get_random_int();
-   random_factor = random_factor << PAGE_SHIFT;
-   if (TASK_IS_32BIT_ADDR)
-   random_factor &= 0xfffffful;
-   else
-   random_factor &= 0xffffffful;
-   }
+   if (current->flags & PF_RANDOMIZE)
+   random_factor = mmap_rnd();
 
if (mmap_is_legacy()) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
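For reference, the shift-then-mask scheme above can be exercised in user space. The sketch below is hypothetical (the demo_* names and DEMO_PAGE_SHIFT == 12 are assumptions, and the masks follow the conventional mips values 0xfffffful / 0xffffffful); it only illustrates that the result is a page-aligned offset below a fixed ceiling:

```c
#include <assert.h>

/* Stand-in for the mips mmap_rnd() logic: shift the raw random value
 * up by the page shift, then mask it to bound the offset range. */
#define DEMO_PAGE_SHIFT 12UL	/* 4K pages, an assumption for the demo */

unsigned long demo_mmap_rnd(unsigned long raw, int is_32bit)
{
	unsigned long rnd = raw << DEMO_PAGE_SHIFT;

	if (is_32bit)
		rnd &= 0xfffffful;	/* offsets below 16MB */
	else
		rnd &= 0xffffffful;	/* offsets below 256MB */
	return rnd;
}
```

With 4K pages this leaves 12 bits of entropy for 32-bit tasks and 16 bits for 64-bit tasks, and the result is always page aligned.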
-- 
1.9.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 07/10] mm: expose arch_mmap_rnd when available

2015-03-03 Thread Kees Cook
When an architecture fully supports randomizing the ELF load location,
a per-arch mmap_rnd() function is used to find a randomized mmap base.
In preparation for randomizing the location of ET_DYN binaries
separately from mmap, this renames and exports these functions as
arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
for describing this feature on architectures that support it
(which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
already supports ET_DYN ASLR separated from mmap ASLR without the
ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).

Signed-off-by: Kees Cook 
---
 arch/Kconfig  |  7 +++
 arch/arm/Kconfig  |  1 +
 arch/arm/mm/mmap.c|  4 ++--
 arch/arm64/Kconfig|  1 +
 arch/arm64/mm/mmap.c  |  4 ++--
 arch/mips/Kconfig |  1 +
 arch/mips/mm/mmap.c   |  4 ++--
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/mm/mmap.c|  4 ++--
 arch/s390/Kconfig |  1 +
 arch/s390/mm/mmap.c   |  8 
 arch/x86/Kconfig  |  1 +
 arch/x86/mm/mmap.c|  4 ++--
 include/linux/elf-randomize.h | 10 ++
 14 files changed, 37 insertions(+), 14 deletions(-)
 create mode 100644 include/linux/elf-randomize.h
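The body of the new include/linux/elf-randomize.h falls past the point where the archive truncates this message; a plausible sketch, inferred from the stub pattern the commit message describes (not the verbatim header), would be:

```c
#ifndef _ELF_RANDOMIZE_H
#define _ELF_RANDOMIZE_H

#ifndef CONFIG_ARCH_HAS_ELF_RANDOMIZE
static inline unsigned long arch_mmap_rnd(void) { return 0; }
#else
extern unsigned long arch_mmap_rnd(void);
#endif

#endif
```

Architectures that do not select CONFIG_ARCH_HAS_ELF_RANDOMIZE fall back to the zero-returning stub.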

diff --git a/arch/Kconfig b/arch/Kconfig
index 05d7a8a458d5..9ff5aa8fa2c1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -484,6 +484,13 @@ config HAVE_IRQ_EXIT_ON_IRQ_STACK
  This spares a stack switch and improves cache usage on softirq
  processing.
 
+config ARCH_HAS_ELF_RANDOMIZE
+   bool
+   help
+ An architecture supports choosing randomized locations for
+ stack, mmap, brk, and ET_DYN. Defined functions:
+ - arch_mmap_rnd()
+
 #
 # ABI hall of shame
 #
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9f1f09a2bc9b..248d99cabaa8 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
default y
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+   select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 15a8160096b3..407dc786583a 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -169,7 +169,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
unsigned long addr0,
return addr;
 }
 
-static unsigned long mmap_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
unsigned long rnd;
 
@@ -184,7 +184,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
unsigned long random_factor = 0UL;
 
if (current->flags & PF_RANDOMIZE)
-   random_factor = mmap_rnd();
+   random_factor = arch_mmap_rnd();
 
if (mmap_is_legacy()) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1b8e97331ffb..5f469095e0e2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ config ARM64
def_bool y
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+   select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 837626d0e142..c25f8ed6d7b6 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -47,7 +47,7 @@ static int mmap_is_legacy(void)
return sysctl_legacy_va_layout;
 }
 
-static unsigned long mmap_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
unsigned long rnd;
 
@@ -77,7 +77,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
unsigned long random_factor = 0UL;
 
if (current->flags & PF_RANDOMIZE)
-   random_factor = mmap_rnd();
+   random_factor = arch_mmap_rnd();
 
/*
 * Fall back to the standard layout if the personality bit is set, or
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c7a16904cd03..72ce5cece768 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -24,6 +24,7 @@ config MIPS
select HAVE_DEBUG_KMEMLEAK
select HAVE_SYSCALL_TRACEPOINTS
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
+   select ARCH_HAS_ELF_RANDOMIZE
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT
select RTC_LIB if !MACH_LOONGSON
select GENERIC_ATOMIC64 if !64BIT
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index 673a5cfe082f..ce54fff57c93 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -142,7 +142,7 @@ unsigned long arch_get_unmapped_area_topdown(struct file 
*filp,
addr0, len, pgoff, flags, DOWN);
 }
 
-static unsigned long mmap_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
  

[PATCH v3 03/10] arm64: standardize mmap_rnd() usage

2015-03-03 Thread Kees Cook
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86. This additionally enables
mmap ASLR on legacy mmap layouts, which appeared to be missing on arm64,
and was already supported on arm. Additionally removes a copy/pasted
declaration of an unused function.

Signed-off-by: Kees Cook 
---
 arch/arm64/include/asm/elf.h |  1 -
 arch/arm64/mm/mmap.c | 18 +++---
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 1f65be393139..f724db00b235 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -125,7 +125,6 @@ typedef struct user_fpsimd_state elf_fpregset_t;
  * the loader.  We need to make sure that it is out of the way of the program
  * that it will "exec", and that there is sufficient room for the brk.
  */
-extern unsigned long randomize_et_dyn(unsigned long base);
 #define ELF_ET_DYN_BASE(2 * TASK_SIZE_64 / 3)
 
 /*
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 54922d1275b8..837626d0e142 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -49,15 +49,14 @@ static int mmap_is_legacy(void)
 
 static unsigned long mmap_rnd(void)
 {
-   unsigned long rnd = 0;
+   unsigned long rnd;
 
-   if (current->flags & PF_RANDOMIZE)
-   rnd = (long)get_random_int() & STACK_RND_MASK;
+   rnd = (unsigned long)get_random_int() & STACK_RND_MASK;
 
return rnd << PAGE_SHIFT;
 }
 
-static unsigned long mmap_base(void)
+static unsigned long mmap_base(unsigned long base)
 {
unsigned long gap = rlimit(RLIMIT_STACK);
 
@@ -66,7 +65,7 @@ static unsigned long mmap_base(void)
else if (gap > MAX_GAP)
gap = MAX_GAP;
 
-   return PAGE_ALIGN(STACK_TOP - gap - mmap_rnd());
+   return PAGE_ALIGN(STACK_TOP - gap - base);
 }
 
 /*
@@ -75,15 +74,20 @@ static unsigned long mmap_base(void)
  */
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
+   unsigned long random_factor = 0UL;
+
+   if (current->flags & PF_RANDOMIZE)
+   random_factor = mmap_rnd();
+
/*
 * Fall back to the standard layout if the personality bit is set, or
 * if the expected stack growth is unlimited:
 */
if (mmap_is_legacy()) {
-   mm->mmap_base = TASK_UNMAPPED_BASE;
+   mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
mm->get_unmapped_area = arch_get_unmapped_area;
} else {
-   mm->mmap_base = mmap_base();
+   mm->mmap_base = mmap_base(random_factor);
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
}
 }
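The mmap_base(base) helper that now receives the random factor clamps the stack gap before subtracting it. A user-space sketch of that arithmetic follows; the DEMO_* constants are placeholders, not arm64's real MIN_GAP/MAX_GAP:

```c
#include <assert.h>

/* Sketch of mmap_base(base): clamp the stack gap derived from
 * RLIMIT_STACK to [MIN_GAP, MAX_GAP], subtract the random factor,
 * and page-align (rounding up, as the kernel's PAGE_ALIGN does). */
#define DEMO_PAGE_SIZE	4096UL
#define DEMO_STACK_TOP	(1UL << 47)
#define DEMO_MIN_GAP	(128UL * 1024 * 1024)
#define DEMO_MAX_GAP	(DEMO_STACK_TOP / 6 * 5)

unsigned long demo_page_align(unsigned long x)
{
	return (x + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

unsigned long demo_mmap_base(unsigned long stack_rlimit, unsigned long rnd)
{
	unsigned long gap = stack_rlimit;

	if (gap < DEMO_MIN_GAP)
		gap = DEMO_MIN_GAP;
	else if (gap > DEMO_MAX_GAP)
		gap = DEMO_MAX_GAP;
	return demo_page_align(DEMO_STACK_TOP - gap - rnd);
}
```

A larger random factor simply pushes the top-down mmap base further below the stack gap.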
-- 
1.9.1


[PATCH v3 10/10] mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE

2015-03-03 Thread Kees Cook
The arch_randomize_brk() function is used on several architectures,
even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
the architectures that support it, while still handling CONFIG_COMPAT_BRK.

Signed-off-by: Kees Cook 
---
 arch/Kconfig   |  1 +
 arch/arm/include/asm/elf.h |  4 
 arch/arm64/include/asm/elf.h   |  4 
 arch/mips/include/asm/elf.h|  4 
 arch/powerpc/include/asm/elf.h |  4 
 arch/s390/include/asm/elf.h|  3 ---
 arch/x86/include/asm/elf.h |  3 ---
 fs/binfmt_elf.c|  4 +---
 include/linux/elf-randomize.h  | 12 
 9 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 9ff5aa8fa2c1..d4f270a54fe6 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -490,6 +490,7 @@ config ARCH_HAS_ELF_RANDOMIZE
  An architecture supports choosing randomized locations for
  stack, mmap, brk, and ET_DYN. Defined functions:
  - arch_mmap_rnd()
+ - arch_randomize_brk()
 
 #
 # ABI hall of shame
diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h
index afb9cafd3786..c1ff8ab12914 100644
--- a/arch/arm/include/asm/elf.h
+++ b/arch/arm/include/asm/elf.h
@@ -125,10 +125,6 @@ int dump_task_regs(struct task_struct *t, elf_gregset_t 
*elfregs);
 extern void elf_set_personality(const struct elf32_hdr *);
 #define SET_PERSONALITY(ex)elf_set_personality(&(ex))
 
-struct mm_struct;
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 #ifdef CONFIG_MMU
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
 struct linux_binprm;
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index f724db00b235..faad6df49e5b 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -156,10 +156,6 @@ extern int arch_setup_additional_pages(struct linux_binprm 
*bprm,
 #define STACK_RND_MASK (0x3ffff >> (PAGE_SHIFT - 12))
 #endif
 
-struct mm_struct;
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 #ifdef CONFIG_COMPAT
 
 #ifdef __AARCH64EB__
diff --git a/arch/mips/include/asm/elf.h b/arch/mips/include/asm/elf.h
index 535f196ffe02..31d747d46a23 100644
--- a/arch/mips/include/asm/elf.h
+++ b/arch/mips/include/asm/elf.h
@@ -410,10 +410,6 @@ struct linux_binprm;
 extern int arch_setup_additional_pages(struct linux_binprm *bprm,
   int uses_interp);
 
-struct mm_struct;
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 struct arch_elf_state {
int fp_abi;
int interp_fp_abi;
diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
index 57d289acb803..ee46ffef608e 100644
--- a/arch/powerpc/include/asm/elf.h
+++ b/arch/powerpc/include/asm/elf.h
@@ -128,10 +128,6 @@ extern int arch_setup_additional_pages(struct linux_binprm 
*bprm,
(0x7ff >> (PAGE_SHIFT - 12)) : \
(0x3ffff >> (PAGE_SHIFT - 12)))
 
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
-
 #ifdef CONFIG_SPU_BASE
 /* Notes used in ET_CORE. Note name is "SPU//". */
 #define NT_SPU 1
diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index d0db9d944b6d..fdda72e56404 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -226,9 +226,6 @@ struct linux_binprm;
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
 int arch_setup_additional_pages(struct linux_binprm *, int);
 
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 void *fill_cpu_elf_notes(void *ptr, struct save_area *sa, __vector128 *vxrs);
 
 #endif
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index ca3347a9dab5..bbdace22daf8 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -338,9 +338,6 @@ extern int compat_arch_setup_additional_pages(struct 
linux_binprm *bprm,
  int uses_interp);
 #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages
 
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 /*
  * True on X86_32 or when emulating IA32 on X86_64
  */
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6f08f5fa99dc..a115da230ce0 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1043,15 +1043,13 @@ static int load_elf_binary(struct linux_binprm *bprm)
current->mm->end_data = end_data;
current->mm->start_stack = bprm->p;
 
-#ifdef arch_randomize_brk
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
current->mm->brk = current->mm->start_brk =
a
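The fs/binfmt_elf.c hunk is cut off by the archive here. The consolidated include/linux/elf-randomize.h this patch describes would plausibly end up shaped like the sketch below — an inference from the commit message (stubbing arch_randomize_brk() to a no-op where the config is unset), not the verbatim file:

```c
struct mm_struct;

#ifndef CONFIG_ARCH_HAS_ELF_RANDOMIZE
static inline unsigned long arch_mmap_rnd(void) { return 0; }
# define arch_randomize_brk(mm)	(mm->brk)
#else
extern unsigned long arch_mmap_rnd(void);
extern unsigned long arch_randomize_brk(struct mm_struct *mm);
#endif
```

The CONFIG_COMPAT_BRK case would then be handled by the caller in binfmt_elf.c rather than per architecture.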

[PATCH v3 02/10] x86: standardize mmap_rnd() usage

2015-03-03 Thread Kees Cook
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm, and extracts the checking of
PF_RANDOMIZE.

Signed-off-by: Kees Cook 
---
 arch/x86/mm/mmap.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index df4552bd239e..ebfa52030d5c 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -67,22 +67,21 @@ static int mmap_is_legacy(void)
 
 static unsigned long mmap_rnd(void)
 {
-   unsigned long rnd = 0;
+   unsigned long rnd;
 
/*
-   *  8 bits of randomness in 32bit mmaps, 20 address space bits
-   * 28 bits of randomness in 64bit mmaps, 40 address space bits
-   */
-   if (current->flags & PF_RANDOMIZE) {
-   if (mmap_is_ia32())
-   rnd = get_random_int() % (1<<8);
-   else
-   rnd = get_random_int() % (1<<28);
-   }
+*  8 bits of randomness in 32bit mmaps, 20 address space bits
+* 28 bits of randomness in 64bit mmaps, 40 address space bits
+*/
+   if (mmap_is_ia32())
+   rnd = (unsigned long)get_random_int() % (1<<8);
+   else
+   rnd = (unsigned long)get_random_int() % (1<<28);
+
return rnd << PAGE_SHIFT;
 }
 
-static unsigned long mmap_base(void)
+static unsigned long mmap_base(unsigned long rnd)
 {
unsigned long gap = rlimit(RLIMIT_STACK);
 
@@ -91,19 +90,19 @@ static unsigned long mmap_base(void)
else if (gap > MAX_GAP)
gap = MAX_GAP;
 
-   return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
+   return PAGE_ALIGN(TASK_SIZE - gap - rnd);
 }
 
 /*
  * Bottom-up (legacy) layout on X86_32 did not support randomization, X86_64
  * does, but not when emulating X86_32
  */
-static unsigned long mmap_legacy_base(void)
+static unsigned long mmap_legacy_base(unsigned long rnd)
 {
if (mmap_is_ia32())
return TASK_UNMAPPED_BASE;
else
-   return TASK_UNMAPPED_BASE + mmap_rnd();
+   return TASK_UNMAPPED_BASE + rnd;
 }
 
 /*
@@ -112,13 +111,18 @@ static unsigned long mmap_legacy_base(void)
  */
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
-   mm->mmap_legacy_base = mmap_legacy_base();
-   mm->mmap_base = mmap_base();
+   unsigned long random_factor = 0UL;
+
+   if (current->flags & PF_RANDOMIZE)
+   random_factor = mmap_rnd();
+
+   mm->mmap_legacy_base = mmap_legacy_base(random_factor);
 
if (mmap_is_legacy()) {
mm->mmap_base = mm->mmap_legacy_base;
mm->get_unmapped_area = arch_get_unmapped_area;
} else {
+   mm->mmap_base = mmap_base(random_factor);
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
}
 }
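The entropy comment in the hunk above can be checked in user space. This is a hypothetical sketch (demo_* names and PAGE_SHIFT == 12 are assumptions for the demo):

```c
#include <assert.h>

/* Sketch of the x86 entropy comment: masking the raw value to 8 (ia32)
 * or 28 (64-bit) bits and shifting by PAGE_SHIFT spreads the mmap base
 * over 20 or 40 bits of address space respectively. */
#define DEMO_PAGE_SHIFT 12

unsigned long demo_x86_mmap_rnd(unsigned int raw, int ia32)
{
	unsigned long rnd;

	if (ia32)
		rnd = raw % (1 << 8);	/* 8 bits of entropy */
	else
		rnd = raw % (1 << 28);	/* 28 bits of entropy */
	return rnd << DEMO_PAGE_SHIFT;
}
```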
-- 
1.9.1


[PATCH v3 01/10] arm: factor out mmap ASLR into mmap_rnd

2015-03-03 Thread Kees Cook
In preparation for splitting out ET_DYN ASLR, this moves the ASLR calculations
for mmap on ARM into a separate routine, similar to x86. This also removes
the redundant check of personality (PF_RANDOMIZE is already set before calling
arch_pick_mmap_layout).

Signed-off-by: Kees Cook 
---
 arch/arm/mm/mmap.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 5e85ed371364..15a8160096b3 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -169,14 +169,22 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
unsigned long addr0,
return addr;
 }
 
+static unsigned long mmap_rnd(void)
+{
+   unsigned long rnd;
+
+   /* 8 bits of randomness in 20 address space bits */
+   rnd = (unsigned long)get_random_int() % (1 << 8);
+
+   return rnd << PAGE_SHIFT;
+}
+
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
unsigned long random_factor = 0UL;
 
-   /* 8 bits of randomness in 20 address space bits */
-   if ((current->flags & PF_RANDOMIZE) &&
-   !(current->personality & ADDR_NO_RANDOMIZE))
-   random_factor = (get_random_int() % (1 << 8)) << PAGE_SHIFT;
+   if (current->flags & PF_RANDOMIZE)
+   random_factor = mmap_rnd();
 
if (mmap_is_legacy()) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
-- 
1.9.1


[PATCH v3 05/10] powerpc: standardize mmap_rnd() usage

2015-03-03 Thread Kees Cook
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86.

Signed-off-by: Kees Cook 
---
Can mmap ASLR be safely enabled in the legacy mmap case here? Other archs
use "mm->mmap_base = TASK_UNMAPPED_BASE + random_factor".
---
 arch/powerpc/mm/mmap.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
index cb8bdbe4972f..3d7088bfe93c 100644
--- a/arch/powerpc/mm/mmap.c
+++ b/arch/powerpc/mm/mmap.c
@@ -55,19 +55,18 @@ static inline int mmap_is_legacy(void)
 
 static unsigned long mmap_rnd(void)
 {
-   unsigned long rnd = 0;
+   unsigned long rnd;
+
+   /* 8MB for 32bit, 1GB for 64bit */
+   if (is_32bit_task())
+   rnd = (unsigned long)get_random_int() % (1<<(23-PAGE_SHIFT));
+   else
+   rnd = (unsigned long)get_random_int() % (1<<(30-PAGE_SHIFT));
 
-   if (current->flags & PF_RANDOMIZE) {
-   /* 8MB for 32bit, 1GB for 64bit */
-   if (is_32bit_task())
-   rnd = (long)(get_random_int() % (1<<(23-PAGE_SHIFT)));
-   else
-   rnd = (long)(get_random_int() % (1<<(30-PAGE_SHIFT)));
-   }
return rnd << PAGE_SHIFT;
 }
 
-static inline unsigned long mmap_base(void)
+static inline unsigned long mmap_base(unsigned long base)
 {
unsigned long gap = rlimit(RLIMIT_STACK);
 
@@ -76,7 +75,7 @@ static inline unsigned long mmap_base(void)
else if (gap > MAX_GAP)
gap = MAX_GAP;
 
-   return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
+   return PAGE_ALIGN(TASK_SIZE - gap - base);
 }
 
 /*
@@ -85,6 +84,11 @@ static inline unsigned long mmap_base(void)
  */
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
+   unsigned long random_factor = 0UL;
+
+   if (current->flags & PF_RANDOMIZE)
+   random_factor = mmap_rnd();
+
/*
 * Fall back to the standard layout if the personality
 * bit is set, or if the expected stack growth is unlimited:
@@ -93,7 +97,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
mm->mmap_base = TASK_UNMAPPED_BASE;
mm->get_unmapped_area = arch_get_unmapped_area;
} else {
-   mm->mmap_base = mmap_base();
+   mm->mmap_base = mmap_base(random_factor);
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
}
 }
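The "8MB for 32bit, 1GB for 64bit" comment above can be confirmed with a small user-space sketch (demo_* names and PAGE_SHIFT == 12 are assumptions; powerpc configurations with other page sizes scale the page count accordingly):

```c
#include <assert.h>

/* Sketch of the powerpc entropy spans: (1 << (23 - PAGE_SHIFT)) pages
 * cover 2^23 bytes (8MB); (1 << (30 - PAGE_SHIFT)) pages cover 2^30
 * bytes (1GB). */
#define DEMO_PAGE_SHIFT 12

unsigned long demo_ppc_rnd_span(int is_32bit)
{
	unsigned long pages;

	if (is_32bit)
		pages = 1UL << (23 - DEMO_PAGE_SHIFT);
	else
		pages = 1UL << (30 - DEMO_PAGE_SHIFT);
	/* maximum byte span of the random offset */
	return pages << DEMO_PAGE_SHIFT;
}
```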
-- 
1.9.1


[PATCH v3 08/10] s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE

2015-03-03 Thread Kees Cook
In preparation for moving ET_DYN randomization into the ELF loader (which
requires a static ELF_ET_DYN_BASE), this redefines s390's existing ET_DYN
randomization in a call to arch_mmap_rnd(). This refactoring results in
the same ET_DYN randomization on s390.

Signed-off-by: Kees Cook 
---
 arch/s390/include/asm/elf.h |  8 +---
 arch/s390/mm/mmap.c | 11 ++-
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index c9df40b5c0ac..2e63de8aac7c 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -161,10 +161,12 @@ extern unsigned int vdso_enabled;
 /* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
use of this is to invoke "./ld.so someprog" to test out a new version of
the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk.  */
-
+   that it will "exec", and that there is sufficient room for the brk. 64-bit
+   tasks are aligned to 4GB. */
 extern unsigned long randomize_et_dyn(void);
-#define ELF_ET_DYN_BASErandomize_et_dyn()
+#define ELF_ET_DYN_BASE (randomize_et_dyn() + (is_32bit_task() ? \
+   (STACK_TOP / 3 * 2) : \
+   (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)))
 
 /* This yields a mask that user programs can use to figure out what
instruction set this CPU supports. */
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index a94504d99c47..8c11536f972d 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -179,17 +179,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
unsigned long addr0,
 
 unsigned long randomize_et_dyn(void)
 {
-   unsigned long base;
-
-   base = STACK_TOP / 3 * 2;
-   if (!is_32bit_task())
-   /* Align to 4GB */
-   base &= ~((1UL << 32) - 1);
-
if (current->flags & PF_RANDOMIZE)
-   base += arch_mmap_rnd();
+   return arch_mmap_rnd();
 
-   return base;
+   return 0UL;
 }
 
 #ifndef CONFIG_64BIT
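The ELF_ET_DYN_BASE arithmetic being moved into the header can be sketched in user space; STACK_TOP values here are placeholders, not s390's real limits:

```c
#include <assert.h>

/* Sketch of the s390 ET_DYN base: STACK_TOP / 3 * 2, additionally
 * aligned down to a 4GB boundary for 64-bit tasks. */
unsigned long demo_et_dyn_base(unsigned long stack_top, int is_32bit)
{
	unsigned long base = stack_top / 3 * 2;

	if (!is_32bit)
		base &= ~((1UL << 32) - 1);	/* align to 4GB */
	return base;
}
```

The randomize_et_dyn() call then only has to supply the additive random factor on top of this fixed base.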
-- 
1.9.1


[PATCH v3 0/10] split ET_DYN ASLR from mmap ASLR

2015-03-03 Thread Kees Cook
To address the "offset2lib" ASLR weakness[1], this separates ET_DYN
ASLR from mmap ASLR, as already done on s390. The architectures
that are already randomizing mmap (arm, arm64, mips, powerpc, s390,
and x86), have their various forms of arch_mmap_rnd() made available
via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures,
arch_randomize_brk() is collapsed as well.

This is an alternative to the solutions in:
https://lkml.org/lkml/2015/2/23/442

I've been able to test x86 and arm, and the buildbot (so far) seems
happy with building the rest.

Thanks!

-Kees

[1] http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html
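The weakness can be pictured as follows: when the PIE executable base and library mmap bases share one random factor, the distance between them is a constant, so leaking any library address reveals the executable's base. A toy model (all addresses below are made up for illustration):

```c
#include <assert.h>

struct layout { unsigned long exe_base, lib_base; };

/* Before the series: ET_DYN and mmap share one random factor, so
 * exe_base - lib_base is the same in every run. */
struct layout old_layout(unsigned long mmap_rnd)
{
	return (struct layout){ 0x400000UL + mmap_rnd,
				0x7f0000000000UL + mmap_rnd };
}

/* After the series: ET_DYN gets its own independent random factor. */
struct layout new_layout(unsigned long exe_rnd, unsigned long mmap_rnd)
{
	return (struct layout){ 0x400000UL + exe_rnd,
				0x7f0000000000UL + mmap_rnd };
}
```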

v3:
- split change on a per-arch basis for easier review
- moved PF_RANDOMIZE check out of per-arch code (ingo)
v2:
- verbosified the commit logs, especially 4/5 (akpm)


Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update

2015-03-03 Thread Tyrel Datwyler
On 03/03/2015 05:20 PM, Cyril Bur wrote:
> On Tue, 2015-03-03 at 15:15 -0800, Tyrel Datwyler wrote:
>> On 03/02/2015 01:49 PM, Tyrel Datwyler wrote:
>>> On 03/01/2015 09:20 PM, Cyril Bur wrote:
 On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> We currently use the device tree update code in the kernel after resuming
> from a suspend operation to re-sync the kernel's view of the device tree
> with that of the hypervisor. The code as it stands is not endian safe as
> it relies on parsing buffers returned by RTAS calls that thus contain
> data in big endian format.
>
> This patch annotates variables and structure members with __be types as 
> well
> as performing necessary byte swaps to cpu endian for data that needs to be
> parsed.
>
> Signed-off-by: Tyrel Datwyler 
> ---
>  arch/powerpc/platforms/pseries/mobility.c | 36 
> ---
>  1 file changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
> b/arch/powerpc/platforms/pseries/mobility.c
> index 29e4f04..0b1f70e 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -25,10 +25,10 @@
>  static struct kobject *mobility_kobj;
>  
>  struct update_props_workarea {
> - u32 phandle;
> - u32 state;
> - u64 reserved;
> - u32 nprops;
> + __be32 phandle;
> + __be32 state;
> + __be64 reserved;
> + __be32 nprops;
>  } __packed;
>  
>  #define NODE_ACTION_MASK 0xff000000
> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, 
> struct property **prop,
>   return 0;
>  }
>  
> -static int update_dt_node(u32 phandle, s32 scope)
> +static int update_dt_node(__be32 phandle, s32 scope)
>  {

 On line 153 of this function:
dn = of_find_node_by_phandle(phandle);

 You're passing a __be32 to device tree code; if we can treat the phandle
 as an opaque value returned to us from the rtas call and pass it around
 like that, then all good.
>>
>> After digging deeper the device_node->phandle is stored in cpu endian
>> under the covers. So, for the of_find_node_by_phandle() we do need to
>> convert the phandle to cpu endian first. It appears I got lucky with the
>> update fixing the observed RMC issue because the phandle for the root
>> node seems to always be 0xffffffff.
>>
> I think we've both switched opinions here, initially I thought an endian
> conversion was necessary but turns out that all of_find_node_by_phandle
> really does is:
>for_each_of_allnodes(np)
>   if (np->phandle == handle)
>  break;
>of_node_get(np);
> 
> The == is safe either way and I think the of code might be trying to
> imply that it doesn't matter by having a typedefed type 'phandle'.
> 
> I'm still digging around, we want to get this right!

When the device tree is unflattened the phandle is byte swapped to cpu
endian. The following code is from unflatten_dt_node().

if (strcmp(pname, "ibm,phandle") == 0)
np->phandle = be32_to_cpup(p);

I added some debug to of_find_node_by_phandle() and verified that if the
phandle isn't swapped to cpu endian we fail to find a matching node,
except in the case where the phandle is equivalent in both big and
little endian.

-Tyrel
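The verification described above can be reproduced outside the kernel. The sketch below open-codes the byte swap that be32_to_cpu() performs on a little-endian host (demo_* names are assumptions): a big-endian phandle taken straight from an RTAS buffer only compares equal to the cpu-endian value stored in device_node->phandle when the value is invariant under byte swap, as the root node's 0xffffffff happens to be.

```c
#include <assert.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap, standing in for be32_to_cpu() on a
 * little-endian host (on a big-endian host be32_to_cpu is a no-op). */
uint32_t demo_be32_to_cpu(uint32_t be)
{
	return ((be & 0x000000ffu) << 24) | ((be & 0x0000ff00u) << 8) |
	       ((be & 0x00ff0000u) >> 8)  | ((be & 0xff000000u) >> 24);
}
```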

> 
> 
> Cyril
>> -Tyrel
>>
>>>
>>> Yes, of_find_node_by_phandle directly compares phandle passed in against
>>> the handle stored in each device_node when searching for a matching
>>> node. Since, the device tree is big endian it follows that the big
>>> endian phandle received in the rtas buffer needs no conversion.
>>>
>>> Further, we need to pass the phandle to ibm,update-properties in the
>>> work area which is also required to be big endian. So, again it seemed
>>> that converting to cpu endian was a waste of effort just to convert it
>>> back to big endian.
>>>
 Its also hard to be sure if these need to be BE and have always been
 that way because we've always run BE so they've never actually wanted
 CPU endian its just that CPU endian has always been BE (I think I
 started rambling...)

 Just want to check that *not* converting them is done on purpose.
>>>
>>> Yes, I explicitly did not convert them on purpose. As mentioned above we
>>> need phandle in BE for the ibm,update-properties rtas work area.
>>> Similarly, drc_index needs to be in BE for the ibm,configure-connector
>>> rtas work area. Outside, of that we do no other manipulation of those
>>> values.
>>>

 And having read on, I'm assuming the answer is yes since this
 observation is true for your changes which affect:
delete_dt_node()
update_dt_node()
 add_dt_node()
 Worth noting that you didn't change the definition of delete_dt_node()
>>>
>>> You are correct. Oversight. I will fix that as it should generate a
>>> sparse complaint.

Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update

2015-03-03 Thread Cyril Bur
On Tue, 2015-03-03 at 15:15 -0800, Tyrel Datwyler wrote:
> On 03/02/2015 01:49 PM, Tyrel Datwyler wrote:
> > On 03/01/2015 09:20 PM, Cyril Bur wrote:
> >> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> >>> We currently use the device tree update code in the kernel after resuming
> >>> from a suspend operation to re-sync the kernel's view of the device tree
> >>> with that of the hypervisor. The code as it stands is not endian safe as
> >>> it relies on parsing buffers returned by RTAS calls that thus contain
> >>> data in big endian format.
> >>>
> >>> This patch annotates variables and structure members with __be types as 
> >>> well
> >>> as performing necessary byte swaps to cpu endian for data that needs to be
> >>> parsed.
> >>>
> >>> Signed-off-by: Tyrel Datwyler 
> >>> ---
> >>>  arch/powerpc/platforms/pseries/mobility.c | 36 
> >>> ---
> >>>  1 file changed, 19 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
> >>> b/arch/powerpc/platforms/pseries/mobility.c
> >>> index 29e4f04..0b1f70e 100644
> >>> --- a/arch/powerpc/platforms/pseries/mobility.c
> >>> +++ b/arch/powerpc/platforms/pseries/mobility.c
> >>> @@ -25,10 +25,10 @@
> >>>  static struct kobject *mobility_kobj;
> >>>  
> >>>  struct update_props_workarea {
> >>> - u32 phandle;
> >>> - u32 state;
> >>> - u64 reserved;
> >>> - u32 nprops;
> >>> + __be32 phandle;
> >>> + __be32 state;
> >>> + __be64 reserved;
> >>> + __be32 nprops;
> >>>  } __packed;
> >>>  
> >>>  #define NODE_ACTION_MASK 0xff000000
> >>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, 
> >>> struct property **prop,
> >>>   return 0;
> >>>  }
> >>>  
> >>> -static int update_dt_node(u32 phandle, s32 scope)
> >>> +static int update_dt_node(__be32 phandle, s32 scope)
> >>>  {
> >>
> >> On line 153 of this function:
> >>dn = of_find_node_by_phandle(phandle);
> >>
> >> You're passing a __be32 to device tree code; if we can treat the phandle
> >> as an opaque value returned to us from the rtas call and pass it around
> >> like that, then all good.
> 
> After digging deeper the device_node->phandle is stored in cpu endian
> under the covers. So, for the of_find_node_by_phandle() we do need to
> convert the phandle to cpu endian first. It appears I got lucky with the
> update fixing the observed RMC issue because the phandle for the root
> node seems to always be 0xffffffff.
> 
I think we've both switched opinions here, initially I thought an endian
conversion was necessary but turns out that all of_find_node_by_phandle
really does is:
   for_each_of_allnodes(np)
  if (np->phandle == handle)
 break;
   of_node_get(np);

The == is safe either way and I think the of code might be trying to
imply that it doesn't matter by having a typedefed type 'phandle'.

I'm still digging around, we want to get this right!


Cyril
> -Tyrel
> 
> > 
> > Yes, of_find_node_by_phandle directly compares phandle passed in against
> > the handle stored in each device_node when searching for a matching
> > node. Since, the device tree is big endian it follows that the big
> > endian phandle received in the rtas buffer needs no conversion.
> > 
> > Further, we need to pass the phandle to ibm,update-properties in the
> > work area which is also required to be big endian. So, again it seemed
> > that converting to cpu endian was a waste of effort just to convert it
> > back to big endian.
> > 
> >> Its also hard to be sure if these need to be BE and have always been
> >> that way because we've always run BE so they've never actually wanted
> >> CPU endian its just that CPU endian has always been BE (I think I
> >> started rambling...)
> >>
> >> Just want to check that *not* converting them is done on purpose.
> > 
> > Yes, I explicitly did not convert them on purpose. As mentioned above we
> > need phandle in BE for the ibm,update-properties rtas work area.
> > Similarly, drc_index needs to be in BE for the ibm,configure-connector
> > rtas work area. Outside of that, we do no other manipulation of those
> > values.
> > 
> >>
> >> And having read on, I'm assuming the answer is yes since this
> >> observation is true for your changes which affect:
> >>delete_dt_node()
> >>update_dt_node()
> >> add_dt_node()
> >> Worth noting that you didn't change the definition of delete_dt_node()
> > 
> > You are correct. Oversight. I will fix that as it should generate a
> > sparse complaint.
> > 
> > -Tyrel
> > 
> >>
> >> I'll have a look once you address the non-compiling code in patch 1/3
> >> (I'm getting blocked by the unused var because somehow -Werror is on;
> >> odd it didn't trip you up), but I also suspect this will make sparse
> >> go a bit nuts.
> >> I wonder if there is a nice way of shutting sparse up.
> >>
> >>>   struct update_props_workarea *upwa;
> >>>   struct device_node *dn;
> >>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
> >>> 

Re: [PATCH v2 3/5] crypto: talitos: Fix off-by-one and use all hardware slots

2015-03-03 Thread Kim Phillips
On Tue,  3 Mar 2015 08:21:35 -0500
Martin Hicks  wrote:

> The submission count was off by one.
> 
> Signed-off-by: Martin Hicks 
> ---
sadly, this directly contradicts:

commit 4b24ea971a93f5d0bec34bf7bfd0939f70cfaae6
Author: Vishnu Suresh 
Date:   Mon Oct 20 21:06:18 2008 +0800

crypto: talitos - Preempt overflow interrupts off-by-one fix

My guess is your request submission pattern differs from that of
Vishnu's (probably IPSec and/or tcrypt), or later h/w versions have
gotten better about dealing with channel near-overflow conditions.
Either way, I'd prefer we not do this: it might break others, and
I'm guessing doesn't improve performance _that_ much?

If it does, we could risk it and restrict it to SEC versions 3.3 and
above maybe?  Not sure what to do here exactly, barring digging up an
old 2.x SEC and testing.

Kim

p.s. I checked, Vishnu isn't with Freescale anymore, so I can't
cc him.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-03 Thread Kim Phillips
On Tue,  3 Mar 2015 08:21:37 -0500
Martin Hicks  wrote:

> @@ -1170,6 +1237,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct 
> device *dev,
>edesc->dma_len,
>DMA_BIDIRECTIONAL);
>   edesc->req.desc = &edesc->desc;
> + /* A copy of the crypto_async_request to use the crypto_queue backlog */
> + memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));

this seems backward, or, at least can be done more efficiently IMO:
talitos_cra_init should set the tfm's reqsize so the rest of
the driver can wholly embed its talitos_edesc (which should also
wholly encapsulate its talitos_request (i.e., not via a pointer))
into the crypto API's request handle allocation.  This
would absorb and eliminate the talitos_edesc kmalloc and frees, the
above memcpy, and replace the container_of after the
crypto_dequeue_request with an offset_of, right?

When scatter-gather buffers are needed, we can assume a slower-path
and make them do their own allocations, since their sizes vary
depending on each request.  Of course, a pointer to those
allocations would need to be retained somewhere in the request
handle.

Only potential problem is getting the crypto API to set the GFP_DMA
flag in the allocation request, but presumably a
CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

Kim

Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update

2015-03-03 Thread Tyrel Datwyler
On 03/02/2015 01:49 PM, Tyrel Datwyler wrote:
> On 03/01/2015 09:20 PM, Cyril Bur wrote:
>> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>>> We currently use the device tree update code in the kernel after resuming
>>> from a suspend operation to re-sync the kernel's view of the device tree
>>> with that of the hypervisor. The code as it stands is not endian safe, as
>>> it relies on parsing buffers returned by RTAS calls that thus contain
>>> data in big endian format.
>>>
>>> This patch annotates variables and structure members with __be types as well
>>> as performing necessary byte swaps to cpu endian for data that needs to be
>>> parsed.
>>>
>>> Signed-off-by: Tyrel Datwyler 
>>> ---
>>>  arch/powerpc/platforms/pseries/mobility.c | 36 
>>> ---
>>>  1 file changed, 19 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
>>> b/arch/powerpc/platforms/pseries/mobility.c
>>> index 29e4f04..0b1f70e 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -25,10 +25,10 @@
>>>  static struct kobject *mobility_kobj;
>>>  
>>>  struct update_props_workarea {
>>> -   u32 phandle;
>>> -   u32 state;
>>> -   u64 reserved;
>>> -   u32 nprops;
>>> +   __be32 phandle;
>>> +   __be32 state;
>>> +   __be64 reserved;
>>> +   __be32 nprops;
>>>  } __packed;
>>>  
>>>  #define NODE_ACTION_MASK   0xff00
>>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, 
>>> struct property **prop,
>>> return 0;
>>>  }
>>>  
>>> -static int update_dt_node(u32 phandle, s32 scope)
>>> +static int update_dt_node(__be32 phandle, s32 scope)
>>>  {
>>
>> On line 153 of this function:
>>dn = of_find_node_by_phandle(phandle);
>>
>> You're passing a __be32 to device tree code, if we can treat the phandle
>> as a opaque value returned to us from the rtas call and pass it around
>> like that then all good.

After digging deeper, the device_node->phandle is stored in cpu endian
under the covers. So, for of_find_node_by_phandle() we do need to
convert the phandle to cpu endian first. It appears I got lucky with the
update fixing the observed RMC issue because the phandle for the root
node seems to always be 0x.

-Tyrel

> 
> Yes, of_find_node_by_phandle directly compares the phandle passed in
> against the handle stored in each device_node when searching for a
> matching node. Since the device tree is big endian, it follows that the
> big endian phandle received in the rtas buffer needs no conversion.
> 
> Further, we need to pass the phandle to ibm,update-properties in the
> work area which is also required to be big endian. So, again it seemed
> that converting to cpu endian was a waste of effort just to convert it
> back to big endian.
> 
>> It's also hard to be sure whether these need to be BE and have always
>> been that way: because we've always run BE they've never actually
>> wanted CPU endian, it's just that CPU endian has always been BE (I
>> think I started rambling...)
>>
>> Just want to check that *not* converting them is done on purpose.
> 
> Yes, I explicitly did not convert them on purpose. As mentioned above we
> need phandle in BE for the ibm,update-properties rtas work area.
> Similarly, drc_index needs to be in BE for the ibm,configure-connector
> rtas work area. Outside of that, we do no other manipulation of those
> values.
> 
>>
>> And having read on, I'm assuming the answer is yes since this
>> observation is true for your changes which affect:
>>  delete_dt_node()
>>  update_dt_node()
>> add_dt_node()
>> Worth noting that you didn't change the definition of delete_dt_node()
> 
> You are correct. Oversight. I will fix that as it should generate a
> sparse complaint.
> 
> -Tyrel
> 
>>
>> I'll have a look once you address the non-compiling code in patch 1/3
>> (I'm getting blocked by the unused var because somehow -Werror is on;
>> odd it didn't trip you up), but I also suspect this will make sparse
>> go a bit nuts.
>> I wonder if there is a nice way of shutting sparse up.
>>
>>> struct update_props_workarea *upwa;
>>> struct device_node *dn;
>>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>> char *prop_data;
>>> char *rtas_buf;
>>> int update_properties_token;
>>> +   u32 nprops;
>>> u32 vd;
>>>  
>>> update_properties_token = rtas_token("ibm,update-properties");
>>> @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>> break;
>>>  
>>> prop_data = rtas_buf + sizeof(*upwa);
>>> +   nprops = be32_to_cpu(upwa->nprops);
>>>  
>>> /* On the first call to ibm,update-properties for a node the
>>>  * the first property value descriptor contains an empty
>>> @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>  */
>>> if (*prop_data == 0) {
>>>  

Re: [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr

2015-03-03 Thread Tyrel Datwyler
On 03/02/2015 10:24 PM, Michael Ellerman wrote:
> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>> Traditionally after a migration operation drmgr has coordinated the
>> device tree update with the kernel in userspace via the ugly
>> /proc/ppc64/ofdt interface. This can be better done fully in the
>> kernel, where support already exists. Currently, drmgr makes a faux
>> ibm,suspend-me RTAS call which we intercept in the kernel so that we
>> can check VASI state for suspendability. After the LPAR resumes and
>> returns to drmgr, that is followed by the necessary update-nodes and
>> update-properties RTAS calls, which are parsed and communicated back
>> to the kernel through /proc/ppc64/ofdt for the device tree update.
>> The drmgr tool should instead initiate the migration using the
>> already existing /sysfs/kernel/mobility/migration entry that performs
>> all this work in the kernel.
>>
>> This patch adds a show function to the sysfs "migration" attribute that
>> returns 1 to indicate the kernel will perform the device tree update
>> after a migration operation and that drmgr should initiate the
>> migration through the sysfs "migration" attribute.
> 
> I don't understand why we need this?
> 
> Can't drmgr just check if /sysfs/kernel/mobility/migration exists, and if so 
> it
> knows it should use it and that the kernel will handle the whole procedure?

The problem is that this sysfs entry was originally added with the
remainder of the in-kernel device tree update code in 2.6.37, but drmgr
was never modified to use it. By the time I started looking at the
in-kernel device tree code I found it very broken. I had a bunch of
fixes to get it working that went into 3.12.

So, if somebody were to use a newer version of drmgr that simply checks
for the existence of the migration sysfs entry on a pre-3.12 kernel
their device-tree update experience is going to be sub-par.

The approach taken here is identical to what was done in 9da3489 when we
hooked the device tree update code into the suspend code. However, in
that case we were already using the sysfs entry to trigger the suspend
and legitimately needed a way to tell drmgr the kernel was now taking
care of updating the device tree. Here we are really just trying to
inform drmgr that it is running on a new enough kernel that the kernel
device tree code actually works properly.

Now, I don't really care for this approach, but the only other thought I
had was to change the sysfs entry from "migration" to "migrate".

-Tyrel

> 
> cheers
> 
> 


[PATCH v1 2/2] sata_dwc_460ex: re-use hsdev->dev instead of dwc_dev

2015-03-03 Thread Andy Shevchenko
This patch re-uses hsdev->dev, which is allocated on the heap.
Therefore the private structure, which is a global variable, is reduced
by one field.

In one case ap->dev is used, and there it seems to be the right decision.

Signed-off-by: Andy Shevchenko 
---
 drivers/ata/sata_dwc_460ex.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c
index 08cd63f..5ab4849 100644
--- a/drivers/ata/sata_dwc_460ex.c
+++ b/drivers/ata/sata_dwc_460ex.c
@@ -194,7 +194,6 @@ struct sata_dwc_host_priv {
void__iomem  *scr_addr_sstatus;
u32 sata_dwc_sactive_issued ;
u32 sata_dwc_sactive_queued ;
-   struct  device  *dwc_dev;
 };
 
 static struct sata_dwc_host_priv host_pvt;
@@ -252,16 +251,16 @@ static const char *get_dma_dir_descript(int dma_dir)
}
 }
 
-static void sata_dwc_tf_dump(struct ata_taskfile *tf)
+static void sata_dwc_tf_dump(struct ata_port *ap, struct ata_taskfile *tf)
 {
-   dev_vdbg(host_pvt.dwc_dev,
+   dev_vdbg(ap->dev,
"taskfile cmd: 0x%02x protocol: %s flags: 0x%lx device: %x\n",
tf->command, get_prot_descript(tf->protocol), tf->flags,
tf->device);
-   dev_vdbg(host_pvt.dwc_dev,
+   dev_vdbg(ap->dev,
"feature: 0x%02x nsect: 0x%x lbal: 0x%x lbam: 0x%x lbah: 
0x%x\n",
tf->feature, tf->nsect, tf->lbal, tf->lbam, tf->lbah);
-   dev_vdbg(host_pvt.dwc_dev,
+   dev_vdbg(ap->dev,
"hob_feature: 0x%02x hob_nsect: 0x%x hob_lbal: 0x%x hob_lbam: 
0x%x hob_lbah: 0x%x\n",
tf->hob_feature, tf->hob_nsect, tf->hob_lbal, tf->hob_lbam,
tf->hob_lbah);
@@ -337,7 +336,7 @@ static struct dma_async_tx_descriptor 
*dma_dwc_xfer_setup(struct ata_queued_cmd
desc->callback = dma_dwc_xfer_done;
desc->callback_param = hsdev;
 
-   dev_dbg(host_pvt.dwc_dev, "%s sg: 0x%p, count: %d addr: %pad\n",
+   dev_dbg(hsdev->dev, "%s sg: 0x%p, count: %d addr: %pad\n",
__func__, qc->sg, qc->n_elem, &addr);
 
return desc;
@@ -687,7 +686,7 @@ static void sata_dwc_clear_dmacr(struct 
sata_dwc_device_port *hsdevp, u8 tag)
 * This should not happen, it indicates the driver is out of
 * sync.  If it does happen, clear dmacr anyway.
 */
-   dev_err(host_pvt.dwc_dev,
+   dev_err(hsdev->dev,
"%s DMA protocol RX and TX DMA not pending tag=0x%02x 
pending=%d dmacr: 0x%08x\n",
__func__, tag, hsdevp->dma_pending[tag],
in_le32(&hsdev->sata_dwc_regs->dmacr));
@@ -779,7 +778,7 @@ static void sata_dwc_enable_interrupts(struct 
sata_dwc_device *hsdev)
 */
out_le32(&hsdev->sata_dwc_regs->errmr, SATA_DWC_SERROR_ERR_BITS);
 
-   dev_dbg(host_pvt.dwc_dev, "%s: INTMR = 0x%08x, ERRMR = 0x%08x\n",
+   dev_dbg(hsdev->dev, "%s: INTMR = 0x%08x, ERRMR = 0x%08x\n",
 __func__, in_le32(&hsdev->sata_dwc_regs->intmr),
in_le32(&hsdev->sata_dwc_regs->errmr));
 }
@@ -855,7 +854,7 @@ static int sata_dwc_port_start(struct ata_port *ap)
hsdevp->hsdev = hsdev;
 
hsdevp->dws = &sata_dwc_dma_dws;
-   hsdevp->dws->dma_dev = host_pvt.dwc_dev;
+   hsdevp->dws->dma_dev = hsdev->dev;
 
dma_cap_zero(mask);
dma_cap_set(DMA_SLAVE, mask);
@@ -863,7 +862,7 @@ static int sata_dwc_port_start(struct ata_port *ap)
/* Acquire DMA channel */
hsdevp->chan = dma_request_channel(mask, sata_dwc_dma_filter, hsdevp);
if (!hsdevp->chan) {
-   dev_err(host_pvt.dwc_dev, "%s: dma channel unavailable\n",
+   dev_err(hsdev->dev, "%s: dma channel unavailable\n",
 __func__);
err = -EAGAIN;
goto CLEANUP_ALLOC;
@@ -990,7 +989,7 @@ static void sata_dwc_bmdma_start_by_tag(struct 
ata_queued_cmd *qc, u8 tag)
"%s qc=%p tag: %x cmd: 0x%02x dma_dir: %s start_dma? %x\n",
__func__, qc, tag, qc->tf.command,
get_dma_dir_descript(qc->dma_dir), start_dma);
-   sata_dwc_tf_dump(&(qc->tf));
+   sata_dwc_tf_dump(ap, &qc->tf);
 
if (start_dma) {
reg = core_scr_read(SCR_ERROR);
@@ -1244,7 +1243,7 @@ static int sata_dwc_probe(struct platform_device *ofdev)
}
 
/* Save dev for later use in dev_xxx() routines */
-   host_pvt.dwc_dev = &ofdev->dev;
+   hsdev->dev = &ofdev->dev;
 
hsdev->dma->dev = &ofdev->dev;
 
-- 
2.1.4


Re: [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code

2015-03-03 Thread Tyrel Datwyler
On 03/02/2015 10:10 PM, Michael Ellerman wrote:
> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>> This patchset simplifies the usage of rtas_ibm_suspend_me() by removing an
>> extraneous function parameter, fixes device tree updating on little endian
>> platforms, and adds a mechanism for informing drmgr that the kernel is
>> capable of performing the whole migration, including the device tree
>> update, itself.
>>
>> Tyrel Datwyler (3):
>>   powerpc/pseries: Simplify check for suspendability during
>> suspend/migration
>>   powerpc/pseries: Little endian fixes for post mobility device tree
>> update
>>   powerpc/pseries: Expose post-migration in kernel device tree update
>> to drmgr
> 
> Hi Tyrel,
> 
> Firstly let me say how much I hate this code, so thanks for working on it :)

I did it once. Might as well sacrifice my sanity a second time. :)

> 
> But I need you to split this series, into 1) fixes for 4.0 (and stable?), and
> 2) the rest.
> 
> I *think* that would be patch 2, and then patches 1 & 3, but I don't want to
> guess. So please resend.

Sure. Your split seems correct as patch 2 is fixes while 1 and 3 are
cosmetic/new feature. Seeing as patch 1 is endian fixes I'll Cc -stable
as well.

-Tyrel

> 
> cheers
> 
> 
> 
> 


[PATCH v1 0/2] sata_dwc_460ex: move to generic DMA driver

2015-03-03 Thread Andy Shevchenko
The SATA implementation is based on two actually different devices, i.e.
SATA and DMA controllers.

For Synopsys DesignWare DMA we already have a generic driver
implementation. Thus, patch 1/2 converts the code to use the DMAEngine
framework and the dw_dmac driver.

In the future it will be better to split the devices inside the DTS as
well, like it's done on other platforms, and remove the hardcoded DMA
controller parameters.

Besides being a nice cleanup, it removes a lot of warnings produced by
the original code that pissed off even Linus [1]. Though, this series
doesn't re-enable COMPILE_TEST for this module.

The driver is compile tested only on x86. So, it would be nice if anyone
who has either an AMCC 460EX Canyonlands board or a similar SATA
controller in their possession could test this.

[1] http://www.spinics.net/lists/linux-ide/msg50334.html

Andy Shevchenko (2):
  sata_dwc_460ex: move to generic DMA driver
  sata_dwc_460ex: re-use hsdev->dev instead of dwc_dev

 drivers/ata/sata_dwc_460ex.c | 753 ---
 1 file changed, 130 insertions(+), 623 deletions(-)

-- 
2.1.4


[PATCH v1 1/2] sata_dwc_460ex: move to generic DMA driver

2015-03-03 Thread Andy Shevchenko
The SATA implementation is based on two actually different devices, i.e.
SATA and DMA controllers.

For Synopsys DesignWare DMA we already have a generic driver
implementation. Thus, the patch converts the code to use the DMAEngine
framework and the dw_dmac driver.

In the future it will be better to split the devices inside the DTS as
well, like it's done on other platforms.

Signed-off-by: Andy Shevchenko 
---
 drivers/ata/sata_dwc_460ex.c | 736 +++
 1 file changed, 122 insertions(+), 614 deletions(-)

diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c
index 7bc0c12..08cd63f 100644
--- a/drivers/ata/sata_dwc_460ex.c
+++ b/drivers/ata/sata_dwc_460ex.c
@@ -36,11 +36,16 @@
 #include 
 #include 
 #include 
+
 #include "libata.h"
 
 #include 
 #include 
 
+/* Supported DMA engine drivers */
+#include 
+#include 
+
 /* These two are defined in "libata.h" */
 #undef DRV_NAME
 #undef DRV_VERSION
@@ -60,153 +65,9 @@
 #define NO_IRQ 0
 #endif
 
-/* SATA DMA driver Globals */
-#define DMA_NUM_CHANS  1
-#define DMA_NUM_CHAN_REGS  8
-
-/* SATA DMA Register definitions */
 #define AHB_DMA_BRST_DFLT  64  /* 16 data items burst length*/
 
-struct dmareg {
-   u32 low;/* Low bits 0-31 */
-   u32 high;   /* High bits 32-63 */
-};
-
-/* DMA Per Channel registers */
-struct dma_chan_regs {
-   struct dmareg sar;  /* Source Address */
-   struct dmareg dar;  /* Destination address */
-   struct dmareg llp;  /* Linked List Pointer */
-   struct dmareg ctl;  /* Control */
-   struct dmareg sstat;/* Source Status not implemented in core */
-   struct dmareg dstat;/* Destination Status not implemented in core*/
-   struct dmareg sstatar;  /* Source Status Address not impl in core */
-   struct dmareg dstatar;  /* Destination Status Address not implemente */
-   struct dmareg cfg;  /* Config */
-   struct dmareg sgr;  /* Source Gather */
-   struct dmareg dsr;  /* Destination Scatter */
-};
-
-/* Generic Interrupt Registers */
-struct dma_interrupt_regs {
-   struct dmareg tfr;  /* Transfer Interrupt */
-   struct dmareg block;/* Block Interrupt */
-   struct dmareg srctran;  /* Source Transfer Interrupt */
-   struct dmareg dsttran;  /* Dest Transfer Interrupt */
-   struct dmareg error;/* Error */
-};
-
-struct ahb_dma_regs {
-   struct dma_chan_regschan_regs[DMA_NUM_CHAN_REGS];
-   struct dma_interrupt_regs interrupt_raw;/* Raw Interrupt */
-   struct dma_interrupt_regs interrupt_status; /* Interrupt Status */
-   struct dma_interrupt_regs interrupt_mask;   /* Interrupt Mask */
-   struct dma_interrupt_regs interrupt_clear;  /* Interrupt Clear */
-   struct dmareg   statusInt;  /* Interrupt combined*/
-   struct dmareg   rq_srcreg;  /* Src Trans Req */
-   struct dmareg   rq_dstreg;  /* Dst Trans Req */
-   struct dmareg   rq_sgl_srcreg;  /* Sngl Src Trans Req*/
-   struct dmareg   rq_sgl_dstreg;  /* Sngl Dst Trans Req*/
-   struct dmareg   rq_lst_srcreg;  /* Last Src Trans Req*/
-   struct dmareg   rq_lst_dstreg;  /* Last Dst Trans Req*/
-   struct dmareg   dma_cfg;/* DMA Config */
-   struct dmareg   dma_chan_en;/* DMA Channel Enable*/
-   struct dmareg   dma_id; /* DMA ID */
-   struct dmareg   dma_test;   /* DMA Test */
-   struct dmareg   res1;   /* reserved */
-   struct dmareg   res2;   /* reserved */
-   /*
-* DMA Comp Params
-* Param 6 = dma_param[0], Param 5 = dma_param[1],
-* Param 4 = dma_param[2] ...
-*/
-   struct dmareg   dma_params[6];
-};
-
-/* Data structure for linked list item */
-struct lli {
-   u32 sar;/* Source Address */
-   u32 dar;/* Destination address */
-   u32 llp;/* Linked List Pointer */
-   struct dmareg   ctl;/* Control */
-   struct dmareg   dstat;  /* Destination Status */
-};
-
-enum {
-   SATA_DWC_DMAC_LLI_SZ =  (sizeof(struct lli)),
-   SATA_DWC_DMAC_LLI_NUM = 256,
-   SATA_DWC_DMAC_LLI_TBL_SZ = (SATA_DWC_DMAC_LLI_SZ * \
-   SATA_DWC_DMAC_LLI_NUM),
-   SATA_DWC_DMAC_TWIDTH_BYTES = 4,
-   SATA_DWC_DMAC_CTRL_TSIZE_MAX = (0x0800 * \
-   SATA_DWC_DMAC_TWIDTH_BYTES),
-};
-
-/* DMA Register Operation Bits */
 enum {
-   DMA_EN  =   0x0001, /* Enable AHB DMA */
-   DMA_CTL_LLP_SRCEN = 0x1000, /* Blk chain enable Src */
-   DMA_CTL_LLP_DSTEN = 0x0800, /* Blk chain enable Dst */

Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration

2015-03-03 Thread Tyrel Datwyler
On 03/02/2015 10:15 PM, Michael Ellerman wrote:
> On Mon, 2015-03-02 at 13:30 -0800, Tyrel Datwyler wrote:
>> On 03/01/2015 08:19 PM, Cyril Bur wrote:
>>> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
 During a suspend/migration operation we must wait for the VASI state
 reported by the hypervisor to become Suspending prior to making the
 ibm,suspend-me RTAS call. Callers of rtas_ibm_suspend_me() pass a
 vasi_state variable that exposes the VASI state to the caller. This is
 unnecessary, as the caller only really cares about three conditions:
 if there is an error we should bail out; on success we have suspended
 and woken back up, so we proceed to the device tree update; or we are
 not suspendable yet, so we try calling rtas_ibm_suspend_me again
 shortly.

 This patch removes the extraneous vasi_state variable and simply uses the
 return code to communicate how to proceed. We either succeed, fail, or get
 -EAGAIN in which case we sleep for a second before trying to call
 rtas_ibm_suspend_me again.

u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
  | be32_to_cpu(args.args[1]);
 -  rc = rtas_ibm_suspend_me(handle, &vasi_rc);
 -  args.rets[0] = cpu_to_be32(vasi_rc);
 -  if (rc)
 +  rc = rtas_ibm_suspend_me(handle);
 +  if (rc == -EAGAIN)
 +  args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
>>>
>>> (continuing on...) so perhaps here have
>>> rc = 0;
>>> else if (rc == -EIO)
>>> args.rets[0] = cpu_to_be32(-1);
>>> rc = 0;
>>> Which should keep the original behaviour, the last thing we want to do
>>> is break BE.
>>
>> The biggest problem here is we are making what basically equates to a
>> fake rtas call from drmgr which we intercept in ppc_rtas(). From there
>> we make this special call to rtas_ibm_suspend_me() to check VASI state
>> and do a bunch of other specialized work that needs to be setup prior to
>> making the actual ibm,suspend-me rtas call. Since we are cheating PAPR
>> here, I guess we can really handle it however we want. I chose to simply
>> fail the rtas call in the case where rtas_ibm_suspend_me() fails with
>> something other than -EAGAIN. In user space librtas will log errno for
>> the failure and return RTAS_IO_ASSERT to drmgr which in turn will log
>> that error and fail.
> 
> We don't want to change the return values of the syscall unless we absolutely
> have to. And I don't think that's the case here.

I'd like to argue that the one case I changed makes sense, but its just
as easy to keep the original behavior.

> 
> Sure we think drmgr is the only thing that uses this crap, but we don't know
> for sure.

I can't imagine how anybody else could possibly use this hack without a
streamid from the hmc/hypervisor, but I've been wrong in the past more
times than I can count. :)

-Tyrel

> 
> cheers
> 
> 


Re: [PATCH v3] ibmveth: Add function to enable live MAC address changes

2015-03-03 Thread David Miller
From: Thomas Falcon 
Date: Mon,  2 Mar 2015 11:56:12 -0600

> Add a function that will enable changing the MAC address
> of an ibmveth interface while it is still running.
> 
> Signed-off-by: Thomas Falcon 

Applied, thanks.

Re: [PATCH] spi: fsl-spi: use of_iomap() to map parameter ram on CPM1

2015-03-03 Thread Mark Brown
On Thu, Feb 26, 2015 at 05:11:42PM +0100, Christophe Leroy wrote:
> On CPM2, the SPI parameter RAM is dynamically allocated in the dualport RAM
> whereas in CPM1, it is statically allocated to a default address with
> capability to relocate it somewhere else via the use of CPM micropatch.
> The address of the parameter RAM is given by the boot loader and expected
> to be mapped via of_iomap().

Why are we using of_iomap() rather than a generic I/O mapping function
here?



Re: [PATCH v2 0/5] split ET_DYN ASLR from mmap ASLR

2015-03-03 Thread Kees Cook
On Mon, Mar 2, 2015 at 11:31 PM, Ingo Molnar  wrote:
>
> * Kees Cook  wrote:
>
>> To address the "offset2lib" ASLR weakness[1], this separates ET_DYN
>> ASLR from mmap ASLR, as already done on s390. The architectures
>> that are already randomizing mmap (arm, arm64, mips, powerpc, s390,
>> and x86), have their various forms of arch_mmap_rnd() made available
>> via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures,
>> arch_randomize_brk() is collapsed as well.
>>
>> This is an alternative to the solutions in:
>> https://lkml.org/lkml/2015/2/23/442
>
> Looks good so far:
>
> Reviewed-by: Ingo Molnar 
>
> While reviewing this series I also noticed that the following code
> could be factored out from architecture mmap code as well:
>
>   - arch_pick_mmap_layout() uses very similar patterns across the
> platforms, with only few variations. Many architectures use
> the same duplicated mmap_is_legacy() helper as well. There's
> usually just trivial differences between mmap_legacy_base()
> approaches as well.

I was nervous to start refactoring this code, but it's true: most of
it is the same.

>   - arch_mmap_rnd(): the PF_RANDOMIZE checks are needlessly
> exposed to the arch routine - the arch routine should only
> concentrate on arch details, not generic flags like
> PF_RANDOMIZE.

Yeah, excellent point. I will send a follow-up patch to move this into
binfmt_elf instead. I'd like to avoid removing it in any of the other
patches since each was attempting a single step in the refactoring.

> In theory the mmap layout could be fully parametrized as well: i.e. no
> callback functions to architectures by default at all: just
> declarations of bits of randomization desired (or, available address
> space bits), and perhaps an arch helper to allow 32-bit vs. 64-bit
> address space distinctions.

Yeah, I was considering that too, since each architecture has a nearly
identical arch_mmap_rnd() at this point. Only the size of the entropy
was changing.

> 'Weird' architectures could provide special routines, but only by
> overriding the default behavior, which should be generic, safe and
> robust.

Yeah, quite true. Should entropy size be a #define like
ELF_ET_DYN_BASE? Something like ASLR_MMAP_ENTROPY and
ASLR_MMAP_ENTROPY_32? Is there a common function for determining a
compat task? That seemed to be per-arch too. Maybe
arch_mmap_entropy()?

-Kees

-- 
Kees Cook
Chrome OS Security

Re: [PATCH v2] seccomp: switch to using asm-generic for seccomp.h

2015-03-03 Thread Kees Cook
On Tue, Mar 3, 2015 at 12:30 AM, Ingo Molnar  wrote:
>
> * Kees Cook  wrote:
>
>> Most architectures don't need to do anything special for the strict
>> seccomp syscall entries. Remove the redundant headers and reduce the
>> others.
>
>>  19 files changed, 27 insertions(+), 137 deletions(-)
>
> Lovely cleanup factor.
>
> Just to make sure, are you sure the 32-bit details are identical
> across architectures?

I did "gcc -E -dM" style output comparisons on the architectures I had
compilers for, and the buildbot hasn't complained on any of the others
(though see the bottom of this email).

>
> For example some architectures did this:
>
>> --- a/arch/microblaze/include/asm/seccomp.h
>> +++ /dev/null
>> @@ -1,16 +0,0 @@
>> -#ifndef _ASM_MICROBLAZE_SECCOMP_H
>> -#define _ASM_MICROBLAZE_SECCOMP_H
>> -
>> -#include 
>> -
>> -#define __NR_seccomp_read__NR_read
>> -#define __NR_seccomp_write   __NR_write
>> -#define __NR_seccomp_exit__NR_exit
>> -#define __NR_seccomp_sigreturn   __NR_sigreturn
>> -
>> -#define __NR_seccomp_read_32 __NR_read
>> -#define __NR_seccomp_write_32__NR_write
>> -#define __NR_seccomp_exit_32 __NR_exit
>> -#define __NR_seccomp_sigreturn_32__NR_sigreturn

The asm-generic header uses the same syscall numbers for both the
64-bit and 32-bit cases, which matches most architectures, and those
are the ones that had their seccomp.h entirely eliminated.

> others did this:
>
>> diff --git a/arch/x86/include/asm/seccomp_64.h 
>> b/arch/x86/include/asm/seccomp_64.h
>> deleted file mode 100644
>> index 84ec1bd161a5..
>> --- a/arch/x86/include/asm/seccomp_64.h
>> +++ /dev/null
>> @@ -1,17 +0,0 @@
>> -#ifndef _ASM_X86_SECCOMP_64_H
>> -#define _ASM_X86_SECCOMP_64_H
>> -
>> -#include 
>> -#include 
>> -
>> -#define __NR_seccomp_read __NR_read
>> -#define __NR_seccomp_write __NR_write
>> -#define __NR_seccomp_exit __NR_exit
>> -#define __NR_seccomp_sigreturn __NR_rt_sigreturn
>> -
>> -#define __NR_seccomp_read_32 __NR_ia32_read
>> -#define __NR_seccomp_write_32 __NR_ia32_write
>> -#define __NR_seccomp_exit_32 __NR_ia32_exit
>> -#define __NR_seccomp_sigreturn_32 __NR_ia32_sigreturn
>> -
>> -#endif /* _ASM_X86_SECCOMP_64_H */

Well, this was x86's split config that was consolidated into the file below:

>
> While in yet another case you kept the syscall mappings:
>
>> --- a/arch/x86/include/asm/seccomp.h
>> +++ b/arch/x86/include/asm/seccomp.h
>> @@ -1,5 +1,20 @@
>> +#ifndef _ASM_X86_SECCOMP_H
>> +#define _ASM_X86_SECCOMP_H
>> +
>> +#include 
>> +
>> +#ifdef CONFIG_COMPAT
>> +#include 
>> +#define __NR_seccomp_read_32 __NR_ia32_read
>> +#define __NR_seccomp_write_32__NR_ia32_write
>> +#define __NR_seccomp_exit_32 __NR_ia32_exit
>> +#define __NR_seccomp_sigreturn_32__NR_ia32_sigreturn
>> +#endif
>> +
>>  #ifdef CONFIG_X86_32
>> -# include 
>> -#else
>> -# include 
>> +#define __NR_seccomp_sigreturn   __NR_sigreturn
>>  #endif
>> +
>> +#include 
>> +
>> +#endif /* _ASM_X86_SECCOMP_H */
>
> It might all be correct, but it's not obvious to me.

The x86 change was the most complex, as it removed the seccomp_32.h and
seccomp_64.h files and merged them into a single asm/seccomp.h that
provides overrides for the _32 #defines.

However, in looking at it now... I see some flip/flopping of
__NR_sigreturn and __NR_rt_sigreturn between some of the
architectures. Let me study that and send a v3. I think there are some
accidental changes on microblaze and powerpc.

-Kees

-- 
Kees Cook
Chrome OS Security
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-03 Thread Martin Hicks
On Tue, Mar 3, 2015 at 10:44 AM, Horia Geantă
 wrote:
> On 3/3/2015 12:09 AM, Martin Hicks wrote:
>>
>> On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote:
>>>
>>> If crypto API allows to encrypt more sectors in one run
>>> (handling IV internally) dmcrypt can be modified of course.
>>>
>>> But do not forget we can use another IV (not only sequential number)
>>> e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people
>>> are using it).
>>
>> Interesting, I'd not considered using XTS with an IV other than plain/64.
>> The talitos hardware would not support aes/xts in any mode other than
>> plain/plain64 I don't think...Although perhaps you could push in an 8-byte
>> IV and the hardware would interpret it as the sector #.
>>
>
> For talitos, there are two cases:
>
> 1. request data size is <= data unit / sector size
> talitos can handle any IV / tweak scheme
>
> 2. request data size > sector size
> since talitos internally generates the IV for the next sector by
> incrementing the previous IV, only IV schemes that allocate consecutive
> IV to consecutive sectors will function correctly.
>

It's not clear to me that #1 is right.  I guess it could be, but the
IV length would be limited to 8 bytes.

This also points out that claiming the XTS IV size is 16 bytes,
as my current patch does, could be problematic.  It's handy because
the first 8 bytes should contain a plain64 sector number, and the
second u64 can be used to encode the sector size, but it would be a
mistake for someone to use the second 8 bytes as the rest of a
16-byte IV.

mh

-- 
Martin Hicks P.Eng.  | m...@bork.org
Bork Consulting Inc. |   +1 (613) 266-2296

Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-03 Thread Horia Geantă
On 3/3/2015 12:09 AM, Martin Hicks wrote:
> 
> On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote:
>>
>> If crypto API allows to encrypt more sectors in one run
>> (handling IV internally) dmcrypt can be modified of course.
>>
>> But do not forget we can use another IV (not only sequential number)
>> e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people
>> are using it).
> 
> Interesting, I'd not considered using XTS with an IV other than plain/64.
> The talitos hardware would not support aes/xts in any mode other than
> plain/plain64 I don't think...Although perhaps you could push in an 8-byte
> IV and the hardware would interpret it as the sector #.
> 

For talitos, there are two cases:

1. request data size is <= data unit / sector size
talitos can handle any IV / tweak scheme

2. request data size > sector size
since talitos internally generates the IV for the next sector by
incrementing the previous IV, only IV schemes that assign consecutive
IVs to consecutive sectors will function correctly.

Let's not forget what the XTS standard says about IVs / tweak values:
- each data unit (sector in this case) is assigned a non-negative tweak
value
- tweak values are assigned *consecutively*, starting from an arbitrary
non-negative value
- there's no requirement for tweak values to be unpredictable

Thus, in theory ESSIV is not supposed to be used with XTS mode: the IVs
for consecutive sectors are not consecutive values.
In practice, as Milan said, the combination is sometimes used. It
functions correctly in SW (and also in talitos as long as req. data size
<= sector size).

>> Maybe the following question would be if the dmcrypt sector IV algorithms
>> should moved into crypto API as well.
>> (But because I misused dmcrypt IVs hooks for some additional operations
>> for loopAES and old Truecrypt CBC mode, it is not so simple...)
> 
> Speaking again with talitos in mind, there would be no advantage for this
> hardware.  Although larger requests are possible, only a single IV can be
> provided per request, so for algorithms like AES-CBC under dm-crypt,
> 512-byte IOs are the only option (short of switching to a 4kB block size).

Right, as explained above, talitos does what the XTS standard mandates,
so it won't work properly for cbc-aes with ESSIV when the request size
is larger than the sector size.

Still, in SW at least, XTS could be improved to process more sectors in
one shot regardless of the IV scheme used, as long as there's an
IV.next() function and both the data size and sector size are known.

Horia


[PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-03 Thread Martin Hicks
I was running into situations where the hardware FIFO was filling up, and
the code was returning EAGAIN to dm-crypt, dropping the submitted
crypto request.

This adds support in talitos for a software backlog queue.  When requests
can't be queued to the hardware immediately, EBUSY is returned.  The queued
requests are dispatched to the hardware, in the order received, as hardware
FIFO slots become available.

Signed-off-by: Martin Hicks 
---
 drivers/crypto/talitos.c |  135 --
 drivers/crypto/talitos.h |3 ++
 2 files changed, 110 insertions(+), 28 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index b0c85ce..bb7fba0 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -182,55 +182,118 @@ static int init_device(struct device *dev)
return 0;
 }
 
-/**
- * talitos_submit - submits a descriptor to the device for processing
- * @dev:   the SEC device to be used
- * @ch:the SEC device channel to be used
- * @edesc: the descriptor to be processed by the device
- *
- * desc must contain valid dma-mapped (bus physical) address pointers.
- * callback must check err and feedback in descriptor header
- * for device processing status.
- */
-int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
+/* Dispatch 'request' if provided, otherwise a backlogged request */
+static int __talitos_handle_queue(struct device *dev, int ch,
+ struct talitos_edesc *edesc,
+ unsigned long *irq_flags)
 {
struct talitos_private *priv = dev_get_drvdata(dev);
-   struct talitos_request *request = &edesc->req;
-   unsigned long flags;
+   struct talitos_request *request;
+   struct crypto_async_request *areq;
int head;
+   int ret = -EINPROGRESS;
 
-   spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
-
-   if (!atomic_inc_not_zero(&priv->chan[ch].submit_count)) {
+   if (!atomic_inc_not_zero(&priv->chan[ch].submit_count))
/* h/w fifo is full */
-   spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
-   return -EAGAIN;
+   return -EBUSY;
+
+   if (!edesc) {
+   /* Dequeue the oldest request */
+   areq = crypto_dequeue_request(&priv->chan[ch].queue);
+   request = container_of(areq, struct talitos_request, base);
+   } else {
+   request = &edesc->req;
}
 
-   head = priv->chan[ch].head;
request->dma_desc = dma_map_single(dev, request->desc,
   sizeof(*request->desc),
   DMA_BIDIRECTIONAL);
 
/* increment fifo head */
+   head = priv->chan[ch].head;
priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1);
 
-   smp_wmb();
-   priv->chan[ch].fifo[head] = request;
+   spin_unlock_irqrestore(&priv->chan[ch].head_lock, *irq_flags);
+
+   /*
+* Mark a backlogged request as in-progress.
+*/
+   if (!edesc) {
+   areq = request->context;
+   areq->complete(areq, -EINPROGRESS);
+   }
+
+   spin_lock_irqsave(&priv->chan[ch].head_lock, *irq_flags);
 
/* GO! */
+   priv->chan[ch].fifo[head] = request;
wmb();
out_be32(priv->chan[ch].reg + TALITOS_FF,
 upper_32_bits(request->dma_desc));
out_be32(priv->chan[ch].reg + TALITOS_FF_LO,
 lower_32_bits(request->dma_desc));
 
+   return ret;
+}
+
+/**
+ * talitos_submit - performs submission of a new descriptor
+ *
+ * @dev:   the SEC device to be used
+ * @ch:the SEC device channel to be used
+ * @edesc: the request to be processed by the device
+ *
+ * edesc->req must contain valid dma-mapped (bus physical) address pointers.
+ * callback must check err and feedback in descriptor header
+ * for device processing status upon completion.
+ */
+int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
+{
+   struct talitos_private *priv = dev_get_drvdata(dev);
+   struct talitos_request *request = &edesc->req;
+   unsigned long flags;
+   int ret = -EINPROGRESS;
+
+   spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
+
+   if (priv->chan[ch].queue.qlen) {
+   /*
+* There are backlogged requests.  Just queue this new request
+* and dispatch the oldest backlogged request to the hardware.
+*/
+   crypto_enqueue_request(&priv->chan[ch].queue,
+  &request->base);
+   __talitos_handle_queue(dev, ch, NULL, &flags);
+   ret = -EBUSY;
+   } else {
+   ret = __talitos_handle_queue(dev, ch, edesc, &flags);
+   if (ret == -EBUSY)
+   

[PATCH v2 4/5] crypto: talitos: Reorganize request submission data structures

2015-03-03 Thread Martin Hicks
This is preparatory work for moving to the crypto async queue handling
code.  A talitos_request structure is embedded in each talitos_edesc so
that, when talitos_submit() is called, everything required to defer the
submission to the hardware is contained within the talitos_edesc.

Signed-off-by: Martin Hicks 
---
 drivers/crypto/talitos.c |   95 +++---
 drivers/crypto/talitos.h |   41 +---
 2 files changed, 66 insertions(+), 70 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 7709805..b0c85ce 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -186,22 +186,16 @@ static int init_device(struct device *dev)
  * talitos_submit - submits a descriptor to the device for processing
  * @dev:   the SEC device to be used
  * @ch:the SEC device channel to be used
- * @desc:  the descriptor to be processed by the device
- * @callback:  whom to call when processing is complete
- * @context:   a handle for use by caller (optional)
+ * @edesc: the descriptor to be processed by the device
  *
  * desc must contain valid dma-mapped (bus physical) address pointers.
  * callback must check err and feedback in descriptor header
  * for device processing status.
  */
-int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc,
-  void (*callback)(struct device *dev,
-   struct talitos_desc *desc,
-   void *context, int error),
-  void *context)
+int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
 {
struct talitos_private *priv = dev_get_drvdata(dev);
-   struct talitos_request *request;
+   struct talitos_request *request = &edesc->req;
unsigned long flags;
int head;
 
@@ -214,19 +208,15 @@ int talitos_submit(struct device *dev, int ch, struct 
talitos_desc *desc,
}
 
head = priv->chan[ch].head;
-   request = &priv->chan[ch].fifo[head];
-
-   /* map descriptor and save caller data */
-   request->dma_desc = dma_map_single(dev, desc, sizeof(*desc),
+   request->dma_desc = dma_map_single(dev, request->desc,
+  sizeof(*request->desc),
   DMA_BIDIRECTIONAL);
-   request->callback = callback;
-   request->context = context;
 
/* increment fifo head */
priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1);
 
smp_wmb();
-   request->desc = desc;
+   priv->chan[ch].fifo[head] = request;
 
/* GO! */
wmb();
@@ -247,15 +237,16 @@ EXPORT_SYMBOL(talitos_submit);
 static void flush_channel(struct device *dev, int ch, int error, int reset_ch)
 {
struct talitos_private *priv = dev_get_drvdata(dev);
-   struct talitos_request *request, saved_req;
+   struct talitos_request *request;
unsigned long flags;
int tail, status;
 
spin_lock_irqsave(&priv->chan[ch].tail_lock, flags);
 
tail = priv->chan[ch].tail;
-   while (priv->chan[ch].fifo[tail].desc) {
-   request = &priv->chan[ch].fifo[tail];
+   while (priv->chan[ch].fifo[tail]) {
+   request = priv->chan[ch].fifo[tail];
+   status = 0;
 
/* descriptors with their done bits set don't get the error */
rmb();
@@ -271,14 +262,9 @@ static void flush_channel(struct device *dev, int ch, int 
error, int reset_ch)
 sizeof(struct talitos_desc),
 DMA_BIDIRECTIONAL);
 
-   /* copy entries so we can call callback outside lock */
-   saved_req.desc = request->desc;
-   saved_req.callback = request->callback;
-   saved_req.context = request->context;
-
/* release request entry in fifo */
smp_wmb();
-   request->desc = NULL;
+   priv->chan[ch].fifo[tail] = NULL;
 
/* increment fifo tail */
priv->chan[ch].tail = (tail + 1) & (priv->fifo_len - 1);
@@ -287,8 +273,8 @@ static void flush_channel(struct device *dev, int ch, int 
error, int reset_ch)
 
atomic_dec(&priv->chan[ch].submit_count);
 
-   saved_req.callback(dev, saved_req.desc, saved_req.context,
-  status);
+   request->callback(dev, request->desc, request->context, status);
+
/* channel may resume processing in single desc error case */
if (error && !reset_ch && status == error)
return;
@@ -352,7 +338,8 @@ static u32 current_desc_hdr(struct device *dev, int ch)
tail = priv->chan[ch].tail;
 
iter = tail;
-   while (priv->chan[ch].fifo[iter].dma_desc != cur_desc) {
+   while (priv->chan[ch].fifo[iter] &&
+

[PATCH v2 3/5] crypto: talitos: Fix off-by-one and use all hardware slots

2015-03-03 Thread Martin Hicks
The submission count was off by one, which left one hardware FIFO slot
unused.

Signed-off-by: Martin Hicks 
---
 drivers/crypto/talitos.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 89cf4d5..7709805 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2722,8 +2722,7 @@ static int talitos_probe(struct platform_device *ofdev)
goto err_out;
}
 
-   atomic_set(&priv->chan[i].submit_count,
-  -(priv->chfifo_len - 1));
+   atomic_set(&priv->chan[i].submit_count, -priv->chfifo_len);
}
 
dma_set_mask(dev, DMA_BIT_MASK(36));
-- 
1.7.10.4


[PATCH v2 2/5] crypto: talitos: Remove MD5_BLOCK_SIZE

2015-03-03 Thread Martin Hicks
This is properly defined in the md5 header file.

Signed-off-by: Martin Hicks 
---
 drivers/crypto/talitos.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index c49d977..89cf4d5 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -637,8 +637,6 @@ static void talitos_unregister_rng(struct device *dev)
 #define TALITOS_MAX_KEY_SIZE   96
 #define TALITOS_MAX_IV_LENGTH  16 /* max of AES_BLOCK_SIZE, 
DES3_EDE_BLOCK_SIZE */
 
-#define MD5_BLOCK_SIZE64
-
 struct talitos_ctx {
struct device *dev;
int ch;
@@ -2195,7 +2193,7 @@ static struct talitos_alg_template driver_algs[] = {
.halg.base = {
.cra_name = "md5",
.cra_driver_name = "md5-talitos",
-   .cra_blocksize = MD5_BLOCK_SIZE,
+   .cra_blocksize = MD5_HMAC_BLOCK_SIZE,
.cra_flags = CRYPTO_ALG_TYPE_AHASH |
 CRYPTO_ALG_ASYNC,
}
@@ -2285,7 +2283,7 @@ static struct talitos_alg_template driver_algs[] = {
.halg.base = {
.cra_name = "hmac(md5)",
.cra_driver_name = "hmac-md5-talitos",
-   .cra_blocksize = MD5_BLOCK_SIZE,
+   .cra_blocksize = MD5_HMAC_BLOCK_SIZE,
.cra_flags = CRYPTO_ALG_TYPE_AHASH |
 CRYPTO_ALG_ASYNC,
}
-- 
1.7.10.4


[PATCH v2 0/5] crypto: talitos: Add crypto async queue handling

2015-03-03 Thread Martin Hicks
I was testing dm-crypt performance on a Freescale P1022 board with
a recent kernel and was getting IO errors while testing with LUKS.
Investigation showed that all hardware FIFO slots were filling and
the driver was returning EAGAIN to the block layer, which is not an
expected response from an async crypto implementation.

The following patch series adds a few small fixes, and reworks the
submission path to use the crypto_queue mechanism to handle the
request backlog.

Changes since v1:

- Ran checkpatch.pl
- Split the path for submitting new requests vs. issuing backlogged
  requests.
- Avoid enqueuing a submitted request to the crypto queue unnecessarily.
- Fix return paths where CRYPTO_TFM_REQ_MAY_BACKLOG is not set.


Martin Hicks (5):
  crypto: talitos: Simplify per-channel initialization
  crypto: talitos: Remove MD5_BLOCK_SIZE
  crypto: talitos: Fix off-by-one and use all hardware slots
  crypto: talitos: Reorganize request submission data structures
  crypto: talitos: Add software backlog queue handling

 drivers/crypto/talitos.c |  240 +++---
 drivers/crypto/talitos.h |   44 +++--
 2 files changed, 177 insertions(+), 107 deletions(-)

-- 
1.7.10.4


[PATCH v2 1/5] crypto: talitos: Simplify per-channel initialization

2015-03-03 Thread Martin Hicks
There were multiple loops in a row, one for each separate step of
channel initialization.  Simplify them into a single loop.

Signed-off-by: Martin Hicks 
---
 drivers/crypto/talitos.c |   11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 067ec21..c49d977 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2706,20 +2706,16 @@ static int talitos_probe(struct platform_device *ofdev)
goto err_out;
}
 
+   priv->fifo_len = roundup_pow_of_two(priv->chfifo_len);
+
for (i = 0; i < priv->num_channels; i++) {
priv->chan[i].reg = priv->reg + TALITOS_CH_STRIDE * (i + 1);
if (!priv->irq[1] || !(i & 1))
priv->chan[i].reg += TALITOS_CH_BASE_OFFSET;
-   }
 
-   for (i = 0; i < priv->num_channels; i++) {
spin_lock_init(&priv->chan[i].head_lock);
spin_lock_init(&priv->chan[i].tail_lock);
-   }
 
-   priv->fifo_len = roundup_pow_of_two(priv->chfifo_len);
-
-   for (i = 0; i < priv->num_channels; i++) {
priv->chan[i].fifo = kzalloc(sizeof(struct talitos_request) *
 priv->fifo_len, GFP_KERNEL);
if (!priv->chan[i].fifo) {
@@ -2727,11 +2723,10 @@ static int talitos_probe(struct platform_device *ofdev)
err = -ENOMEM;
goto err_out;
}
-   }
 
-   for (i = 0; i < priv->num_channels; i++)
atomic_set(&priv->chan[i].submit_count,
   -(priv->chfifo_len - 1));
+   }
 
dma_set_mask(dev, DMA_BIT_MASK(36));
 
-- 
1.7.10.4


[PATCH v2] powerpc/pmac: Fix DT refcount imbalance in pmac_pic_probe_oldstyle

2015-03-03 Thread Geert Uytterhoeven
Internally, of_find_node_by_name() calls of_node_put() on its "from"
parameter, which must not be done on "master", as it is still in use and
will be released manually later.  This may cause the kref refcount to
drop to zero.

Call of_node_get() before to compensate for this.

Signed-off-by: Geert Uytterhoeven 
---
Compile-tested only

v2:
  - Avoid a logic change by adding a call to of_node_get() instead of
replacing of_find_node_by_name() by of_get_child_by_name().
No-one seems to remember which machines are affected by this.
---
 arch/powerpc/platforms/powermac/pic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powermac/pic.c 
b/arch/powerpc/platforms/powermac/pic.c
index 4c24bf60d39d2834..59cfc9d63c2d51a7 100644
--- a/arch/powerpc/platforms/powermac/pic.c
+++ b/arch/powerpc/platforms/powermac/pic.c
@@ -321,6 +321,9 @@ static void __init pmac_pic_probe_oldstyle(void)
max_irqs = max_real_irqs = 64;
 
/* We might have a second cascaded heathrow */
+
+   /* Compensate for of_node_put() in of_find_node_by_name() */
+   of_node_get(master);
slave = of_find_node_by_name(master, "mac-io");
 
/* Check ordering of master & slave */
-- 
1.9.1


Re: [PATCH v2] seccomp: switch to using asm-generic for seccomp.h

2015-03-03 Thread Ingo Molnar

* Kees Cook  wrote:

> Most architectures don't need to do anything special for the strict
> seccomp syscall entries. Remove the redundant headers and reduce the
> others.

>  19 files changed, 27 insertions(+), 137 deletions(-)

Lovely cleanup factor.

Just to make sure, are you sure the 32-bit details are identical 
across architectures?

For example some architectures did this:

> --- a/arch/microblaze/include/asm/seccomp.h
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -#ifndef _ASM_MICROBLAZE_SECCOMP_H
> -#define _ASM_MICROBLAZE_SECCOMP_H
> -
> -#include 
> -
> -#define __NR_seccomp_read__NR_read
> -#define __NR_seccomp_write   __NR_write
> -#define __NR_seccomp_exit__NR_exit
> -#define __NR_seccomp_sigreturn   __NR_sigreturn
> -
> -#define __NR_seccomp_read_32 __NR_read
> -#define __NR_seccomp_write_32__NR_write
> -#define __NR_seccomp_exit_32 __NR_exit
> -#define __NR_seccomp_sigreturn_32__NR_sigreturn

others did this:

> diff --git a/arch/x86/include/asm/seccomp_64.h 
> b/arch/x86/include/asm/seccomp_64.h
> deleted file mode 100644
> index 84ec1bd161a5..
> --- a/arch/x86/include/asm/seccomp_64.h
> +++ /dev/null
> @@ -1,17 +0,0 @@
> -#ifndef _ASM_X86_SECCOMP_64_H
> -#define _ASM_X86_SECCOMP_64_H
> -
> -#include 
> -#include 
> -
> -#define __NR_seccomp_read __NR_read
> -#define __NR_seccomp_write __NR_write
> -#define __NR_seccomp_exit __NR_exit
> -#define __NR_seccomp_sigreturn __NR_rt_sigreturn
> -
> -#define __NR_seccomp_read_32 __NR_ia32_read
> -#define __NR_seccomp_write_32 __NR_ia32_write
> -#define __NR_seccomp_exit_32 __NR_ia32_exit
> -#define __NR_seccomp_sigreturn_32 __NR_ia32_sigreturn
> -
> -#endif /* _ASM_X86_SECCOMP_64_H */

While in yet another case you kept the syscall mappings:

> --- a/arch/x86/include/asm/seccomp.h
> +++ b/arch/x86/include/asm/seccomp.h
> @@ -1,5 +1,20 @@
> +#ifndef _ASM_X86_SECCOMP_H
> +#define _ASM_X86_SECCOMP_H
> +
> +#include 
> +
> +#ifdef CONFIG_COMPAT
> +#include 
> +#define __NR_seccomp_read_32 __NR_ia32_read
> +#define __NR_seccomp_write_32__NR_ia32_write
> +#define __NR_seccomp_exit_32 __NR_ia32_exit
> +#define __NR_seccomp_sigreturn_32__NR_ia32_sigreturn
> +#endif
> +
>  #ifdef CONFIG_X86_32
> -# include 
> -#else
> -# include 
> +#define __NR_seccomp_sigreturn   __NR_sigreturn
>  #endif
> +
> +#include 
> +
> +#endif /* _ASM_X86_SECCOMP_H */

It might all be correct, but it's not obvious to me.

Thanks,

Ingo

Re: [PATCH v3] ibmveth: Add function to enable live MAC address changes

2015-03-03 Thread Jiri Pirko
Mon, Mar 02, 2015 at 06:56:12PM CET, tlfal...@linux.vnet.ibm.com wrote:
>Add a function that will enable changing the MAC address
>of an ibmveth interface while it is still running.
>
>Signed-off-by: Thomas Falcon 

Reviewed-by: Jiri Pirko 