date:20130905

Re: [PATCH] can: pcan_usb_core: fix memory leak on failure paths in peak_usb_start()

2013-09-05 Thread Stephane Grosjean



On 06/09/2013 08:56, Marc Kleine-Budde wrote:

On 09/06/2013 08:52 AM, Stephane Grosjean wrote:

Tx and rx urbs are not deallocated if something goes wrong in peak_usb_start().
The patch fixes error handling to deallocate all the resources.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
Acked-by: Stephane Grosjean 

Tnx,
Marc

BTW: A simple reply to the original patch with your Acked-by is sufficient.



OK, thanks Marc. I'll keep it in mind for next time (if any ;-))

Stéphane
--
PEAK-System Technik GmbH, Otto-Roehm-Strasse 69, D-64293 Darmstadt 
Geschaeftsleitung: A.Gach/U.Wilhelm,St.Nr.:007/241/13586 FA Darmstadt 
HRB-9183 Darmstadt, Ust.IdNr.:DE 202220078, WEE-Reg.-Nr.: DE39305391 
Tel.+49 (0)6151-817320 / Fax:+49 (0)6151-817329, i...@peak-system.com


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ARM: OMAP2+: am335x-bone*: add DT for BeagleBone Black

2013-09-05 Thread George Cherian


On 9/6/2013 12:03 PM, Koen Kooi wrote:

The BeagleBone Black is basically a regular BeagleBone with eMMC and HDMI added,
so create a common dtsi both can use. MMC support for AM335x still isn't in, so
only the LDO change has been added.

Signed-off-by: Koen Kooi 
---
  .../{am335x-bone.dts => am335x-bone-common.dtsi}   |   3 -
  arch/arm/boot/dts/am335x-bone.dts  | 256 +
  arch/arm/boot/dts/am335x-boneblack.dts |  18 ++
  3 files changed, 19 insertions(+), 258 deletions(-)
  copy arch/arm/boot/dts/{am335x-bone.dts => am335x-bone-common.dtsi} (99%)
  create mode 100644 arch/arm/boot/dts/am335x-boneblack.dts

How did you test am335x-boneblack.dtb? Where are the Makefile changes
for boneblack?




diff --git a/arch/arm/boot/dts/am335x-bone.dts b/arch/arm/boot/dts/am335x-bone-common.dtsi
similarity index 99%
copy from arch/arm/boot/dts/am335x-bone.dts
copy to arch/arm/boot/dts/am335x-bone-common.dtsi
index d318987..2f66ded 100644
--- a/arch/arm/boot/dts/am335x-bone.dts
+++ b/arch/arm/boot/dts/am335x-bone-common.dtsi
@@ -5,9 +5,6 @@
   * it under the terms of the GNU General Public License version 2 as
   * published by the Free Software Foundation.
   */
-/dts-v1/;
-
-#include "am33xx.dtsi"
  
  / {

model = "TI AM335x BeagleBone";
diff --git a/arch/arm/boot/dts/am335x-bone.dts b/arch/arm/boot/dts/am335x-bone.dts
index d318987..7993c48 100644
--- a/arch/arm/boot/dts/am335x-bone.dts
+++ b/arch/arm/boot/dts/am335x-bone.dts
@@ -8,258 +8,4 @@
  /dts-v1/;
  
  #include "am33xx.dtsi"

-
-/ {
-   model = "TI AM335x BeagleBone";
-   compatible = "ti,am335x-bone", "ti,am33xx";
-
-   cpus {
-   cpu@0 {
-   cpu0-supply = <&dcdc2_reg>;
-   };
-   };
-
-   memory {
-   device_type = "memory";
-   reg = <0x80000000 0x10000000>; /* 256 MB */
-   };
-
-   am33xx_pinmux: pinmux@44e10800 {
-   pinctrl-names = "default";
-   pinctrl-0 = <&clkout2_pin>;
-
-   user_leds_s0: user_leds_s0 {
-   pinctrl-single,pins = <
-   0x54 (PIN_OUTPUT_PULLDOWN | MUX_MODE7) /* gpmc_a5.gpio1_21 */
-   0x58 (PIN_OUTPUT_PULLUP | MUX_MODE7) /* gpmc_a6.gpio1_22 */
-   0x5c (PIN_OUTPUT_PULLDOWN | MUX_MODE7) /* gpmc_a7.gpio1_23 */
-   0x60 (PIN_OUTPUT_PULLUP | MUX_MODE7) /* gpmc_a8.gpio1_24 */
-   >;
-   };
-
-   i2c0_pins: pinmux_i2c0_pins {
-   pinctrl-single,pins = <
-   0x188 (PIN_INPUT_PULLUP | MUX_MODE0) /* i2c0_sda.i2c0_sda */
-   0x18c (PIN_INPUT_PULLUP | MUX_MODE0) /* i2c0_scl.i2c0_scl */
-   >;
-   };
-
-   uart0_pins: pinmux_uart0_pins {
-   pinctrl-single,pins = <
-   0x170 (PIN_INPUT_PULLUP | MUX_MODE0) /* uart0_rxd.uart0_rxd */
-   0x174 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* uart0_txd.uart0_txd */
-   >;
-   };
-
-   clkout2_pin: pinmux_clkout2_pin {
-   pinctrl-single,pins = <
-   0x1b4 (PIN_OUTPUT_PULLDOWN | MUX_MODE3) /* xdma_event_intr1.clkout2 */
-   >;
-   };
-
-   cpsw_default: cpsw_default {
-   pinctrl-single,pins = <
-   /* Slave 1 */
-   0x110 (PIN_INPUT_PULLUP | MUX_MODE0) /* mii1_rxerr.mii1_rxerr */
-   0x114 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txen.mii1_txen */
-   0x118 (PIN_INPUT_PULLUP | MUX_MODE0) /* mii1_rxdv.mii1_rxdv */
-   0x11c (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txd3.mii1_txd3 */
-   0x120 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txd2.mii1_txd2 */
-   0x124 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txd1.mii1_txd1 */
-   0x128 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txd0.mii1_txd0 */
-   0x12c (PIN_INPUT_PULLUP | MUX_MODE0) /* mii1_txclk.mii1_txclk */
-   0x130 (PIN_INPUT_PULLUP | MUX_MODE0) /* mii1_rxclk.mii1_rxclk */
-   0x134 (PIN_INPUT_PULLUP | MUX_MODE0) /* mii1_rxd3.mii1_rxd3 */
-   0x138 (PIN_INPUT_PULLUP | MUX_MODE0) /* mii1_rxd2.mii1_rxd2 */
-   0x13c (PIN_INPUT_PULLUP | MUX_MODE0) /* mii1_rxd1.mii1_rxd1 */
-   0x140 (PIN_INPUT_PULLUP | MUX_MODE0) /* mii1_rxd0.mii1_rxd0 */
-   >;
-   };
-
-   cpsw_sl

Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-05 Thread Gleb Natapov

On Thu, Sep 05, 2013 at 02:05:09PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-09-03 at 13:53 +0300, Gleb Natapov wrote:
> > > Or supporting all IOMMU links (and leaving emulated stuff as is) in on
> > > "device" is the last thing I have to do and then you'll ack the patch?
> > > 
> > I am concerned more about API here. Internal implementation details I
> > leave to powerpc experts :)
> 
> So Gleb, I want to step in for a bit here.
> 
> While I understand that the new KVM device API is all nice and shiny and
> that this whole thing should probably have been KVM devices in the first
> place (had they existed or had we been told back then), the point is, the
> API for handling HW IOMMUs that Alexey is trying to add is an extension of
> an existing mechanism used for emulated IOMMUs.
> 
> The internal data structure is shared, and fundamentally, by forcing him to
> use that new KVM device for the "new stuff", we create a oddball API with
> an ioctl for one type of iommu and a KVM device for the other, which makes
> the implementation a complete mess in the kernel (and you should care :-)
> 
Is it an unfixable mess, even if Alexey does what you suggested earlier?

  - Convert *both* existing TCE objects to the new
  KVM_CREATE_DEVICE, and have some backward compat code for the old one.

The point is that an implementation can usually be changed later, but
changing an API is much harder.

> So for something completely new, I would tend to agree with you. However, I
> still think that for this specific case, we should just plonk-in the original
> ioctl proposed by Alexey and be done with it.
> 
Do you think this is the last extension to the IOMMU code, or will we see
more and use the same justification to keep adding ioctls?

--
Gleb.

Re: [PATCH] can: pcan_usb_core: fix memory leak on failure paths in peak_usb_start()

2013-09-05 Thread Marc Kleine-Budde

On 09/06/2013 08:52 AM, Stephane Grosjean wrote:
> Tx and rx urbs are not deallocated if something goes wrong in peak_usb_start().
> The patch fixes error handling to deallocate all the resources.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 
> Acked-by: Stephane Grosjean 

Tnx,
Marc

BTW: A simple reply to the original patch with your Acked-by is sufficient.

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
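
Archive note: the patch discussed in this thread fixes error paths that returned without releasing urbs that had already been allocated. The usual shape of such a fix can be sketched in plain user-space C. This is only an illustration of the unwind-on-failure idiom, not the pcan_usb_core code; `sketch_dev`, `sketch_start`, `sketch_stop`, and the `fail_at` parameter are hypothetical names, and `malloc` stands in for urb allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NR_BUFS 4

/* Hypothetical device state: a fixed set of buffers standing in for urbs. */
struct sketch_dev {
	void *bufs[SKETCH_NR_BUFS];
};

/*
 * Allocate every buffer, or none of them.
 * fail_at simulates an allocation failure at that index (-1 for none).
 * Returns 0 on success and -1 on failure.
 */
int sketch_start(struct sketch_dev *dev, int fail_at)
{
	int i;

	assert(dev != NULL);

	for (i = 0; i < SKETCH_NR_BUFS; i++) {
		dev->bufs[i] = (i == fail_at) ? NULL : malloc(64);
		if (!dev->bufs[i])
			goto err_free;
	}
	return 0;

err_free:
	/* Unwind only what was allocated before the failure. */
	while (--i >= 0) {
		free(dev->bufs[i]);
		dev->bufs[i] = NULL;
	}
	return -1;
}

/* Release everything; safe to call on a partially started device. */
void sketch_stop(struct sketch_dev *dev)
{
	assert(dev != NULL);

	for (int i = 0; i < SKETCH_NR_BUFS; i++) {
		free(dev->bufs[i]);
		dev->bufs[i] = NULL;
	}
}
```

A caller that sees `sketch_start()` fail can rely on no buffer being left allocated, which is the property the patch description asks for.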




Re: [PATCH] VMCI: fix to pass correct device identity to free_irq()

2013-09-05 Thread Dmitry Torokhov

On Fri, Sep 06, 2013 at 02:39:28PM +0800, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> free_irq() expects the same device identity that was passed to
> corresponding request_irq(), otherwise the IRQ is not freed.
> 
> Signed-off-by: Wei Yongjun 

Acked-by: Dmitry Torokhov 

> ---
>  drivers/misc/vmw_vmci/vmci_guest.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c
> index b3a2b76..c98b03b 100644
> --- a/drivers/misc/vmw_vmci/vmci_guest.c
> +++ b/drivers/misc/vmw_vmci/vmci_guest.c
> @@ -649,7 +649,7 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
>   return 0;
>  
>  err_free_irq:
> - free_irq(vmci_dev->irq, &vmci_dev);
> + free_irq(vmci_dev->irq, vmci_dev);
>   tasklet_kill(&vmci_dev->datagram_tasklet);
>   tasklet_kill(&vmci_dev->bm_tasklet);
>  
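
Archive note: the one-character fix above matters because the cookie given to free_irq() is compared against the one given at request time. The following is a user-space model of that matching rule, written only to illustrate why `&vmci_dev` (the address of the pointer variable) does not match `vmci_dev` (the pointer value). The table and the `model_*` names are hypothetical; the real kernel bookkeeping is different.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ACTIONS 8

/* One registered handler: the IRQ number plus the caller's cookie. */
struct model_action {
	unsigned int irq;
	void *dev_id;
	int in_use;
};

static struct model_action model_actions[MODEL_MAX_ACTIONS];

/* Record (irq, dev_id). Returns 0 on success, -1 if the table is full. */
int model_request_irq(unsigned int irq, void *dev_id)
{
	assert(dev_id != NULL);

	for (size_t i = 0; i < MODEL_MAX_ACTIONS; i++) {
		if (!model_actions[i].in_use) {
			model_actions[i].irq = irq;
			model_actions[i].dev_id = dev_id;
			model_actions[i].in_use = 1;
			return 0;
		}
	}
	return -1;
}

/*
 * Release the entry whose irq AND dev_id both match.
 * Returns 1 if an entry was released, 0 if nothing matched.
 */
int model_free_irq(unsigned int irq, void *dev_id)
{
	for (size_t i = 0; i < MODEL_MAX_ACTIONS; i++) {
		if (model_actions[i].in_use &&
		    model_actions[i].irq == irq &&
		    model_actions[i].dev_id == dev_id) {
			model_actions[i].in_use = 0;
			return 1;
		}
	}
	return 0;
}

/* Returns 1 if any handler is still registered for irq, else 0. */
int model_irq_is_registered(unsigned int irq)
{
	for (size_t i = 0; i < MODEL_MAX_ACTIONS; i++) {
		if (model_actions[i].in_use && model_actions[i].irq == irq)
			return 1;
	}
	return 0;
}
```

In this model, freeing with the address of the pointer variable leaves the handler registered, while freeing with the pointer itself releases it.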

[PATCH V3] PCI: exynos: add support for MSI

2013-09-05 Thread Jingoo Han

This patch adds support for Message Signaled Interrupts in the
Exynos PCIe driver using the Synopsys DesignWare PCIe core IP.

Signed-off-by: Siva Reddy Kallam 
Signed-off-by: Srikanth T Shivanand 
Signed-off-by: Jingoo Han 
Cc: Pratyush Anand 
Cc: Mohit KUMAR 
---
Changes since v2:
- fixed MAX_MSI_CTRLS because MAX_MSI_IRQS is 32 only
- used __get_free_pages() to allocate msi_data
- used one msi_data and msi_irq_in_use per one RC
- used irq_domain to represent the MSI controller
- removed msi-base irq number from device tree because this is not
  a hardware property.

Changes since v1:
- removed unnecessary exynos_pcie_clear_irq_level()
- updated the bindings documentation
- used new msi_chip infrastructure
- removed ARCH_SUPPORTS_MSI
- replaced #ifdef guards with IS_ENABLED(CONFIG_PCI_MSI)

 drivers/pci/host/pci-exynos.c  |   44 +++
 drivers/pci/host/pcie-designware.c |  240 
 drivers/pci/host/pcie-designware.h |   14 +++
 3 files changed, 298 insertions(+)

diff --git a/drivers/pci/host/pci-exynos.c b/drivers/pci/host/pci-exynos.c
index 94e096b..f062aca 100644
--- a/drivers/pci/host/pci-exynos.c
+++ b/drivers/pci/host/pci-exynos.c
@@ -48,6 +48,7 @@ struct exynos_pcie {
 #define PCIE_IRQ_SPECIAL   0x008
 #define PCIE_IRQ_EN_PULSE  0x00c
 #define PCIE_IRQ_EN_LEVEL  0x010
+#define IRQ_MSI_ENABLE (0x1 << 2)
 #define PCIE_IRQ_EN_SPECIAL0x014
 #define PCIE_PWR_RESET 0x018
 #define PCIE_CORE_RESET0x01c
@@ -342,9 +343,36 @@ static irqreturn_t exynos_pcie_irq_handler(int irq, void *arg)
return IRQ_HANDLED;
 }
 
+static irqreturn_t exynos_pcie_msi_irq_handler(int irq, void *arg)
+{
+   struct pcie_port *pp = arg;
+
+   dw_handle_msi_irq(pp);
+
+   return IRQ_HANDLED;
+}
+
+static void exynos_pcie_msi_init(struct pcie_port *pp)
+{
+   u32 val;
+   struct exynos_pcie *exynos_pcie = to_exynos_pcie(pp);
+
+   dw_pcie_msi_init(pp);
+
+   /* enable MSI interrupt */
+   val = exynos_elb_readl(exynos_pcie, PCIE_IRQ_EN_LEVEL);
+   val |= IRQ_MSI_ENABLE;
+   exynos_elb_writel(exynos_pcie, val, PCIE_IRQ_EN_LEVEL);
+   return;
+}
+
 static void exynos_pcie_enable_interrupts(struct pcie_port *pp)
 {
exynos_pcie_enable_irq_pulse(pp);
+
+   if (IS_ENABLED(CONFIG_PCI_MSI))
+   exynos_pcie_msi_init(pp);
+
return;
 }
 
@@ -430,6 +458,22 @@ static int add_pcie_port(struct pcie_port *pp, struct platform_device *pdev)
return ret;
}
 
+   if (IS_ENABLED(CONFIG_PCI_MSI)) {
+   pp->msi_irq = platform_get_irq(pdev, 0);
+   if (!pp->msi_irq) {
+   dev_err(&pdev->dev, "failed to get msi irq\n");
+   return -ENODEV;
+   }
+
+   ret = devm_request_irq(&pdev->dev, pp->msi_irq,
+   exynos_pcie_msi_irq_handler,
+   IRQF_SHARED, "exynos-pcie", pp);
+   if (ret) {
+   dev_err(&pdev->dev, "failed to request msi irq\n");
+   return ret;
+   }
+   }
+
pp->root_bus_nr = -1;
pp->ops = &exynos_pcie_host_ops;
 
diff --git a/drivers/pci/host/pcie-designware.c b/drivers/pci/host/pcie-designware.c
index c10e9ac..8963017 100644
--- a/drivers/pci/host/pcie-designware.c
+++ b/drivers/pci/host/pcie-designware.c
@@ -11,8 +11,11 @@
  * published by the Free Software Foundation.
  */
 
+#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -142,6 +145,204 @@ int dw_pcie_wr_own_conf(struct pcie_port *pp, int where, int size,
return ret;
 }
 
+static struct irq_chip dw_msi_irq_chip = {
+   .name = "PCI-MSI",
+   .irq_enable = unmask_msi_irq,
+   .irq_disable = mask_msi_irq,
+   .irq_mask = mask_msi_irq,
+   .irq_unmask = unmask_msi_irq,
+};
+
+/* MSI int handler */
+void dw_handle_msi_irq(struct pcie_port *pp)
+{
+   unsigned long val;
+   int i, pos;
+
+   for (i = 0; i < MAX_MSI_CTRLS; i++) {
+   dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_STATUS + i * 12, 4,
+   (u32 *)&val);
+   if (val) {
+   pos = 0;
+   while ((pos = find_next_bit(&val, 32, pos)) != 32) {
+   generic_handle_irq(pp->msi_irq_start
+   + (i * 32) + pos);
+   pos++;
+   }
+   }
+   dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS + i * 12, 4, val);
+   }
+}
+
+void dw_pcie_msi_init(struct pcie_port *pp)
+{
+   pp->msi_data = __get_free_pages(GFP_KERNEL, 0);
+
+   /* program the msi_data */
+   dw_pcie_wr_own_conf(pp, PCIE_MSI_ADDR_LO, 4,
+   virt_to_phys((void *)pp->msi_data));
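
Archive note: the handler in the (truncated) patch above walks each 32-bit MSI status word and dispatches one IRQ per set bit, numbered as base + controller * 32 + bit position. That numbering can be sketched in user-space C as follows. This is an assumption-labelled illustration, not the driver code: `sketch_collect_msi_irqs` is a hypothetical helper that records the IRQ numbers instead of calling generic_handle_irq(), and it uses a plain bit test rather than find_next_bit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IRQS_PER_CTRL 32u

/*
 * For every set bit in status[0..nr_ctrls-1], store
 * irq_base + ctrl * 32 + bit into out[], lowest controller and bit first.
 * Stops early once out[] is full. Returns the number of entries stored.
 */
size_t sketch_collect_msi_irqs(const uint32_t *status, size_t nr_ctrls,
			       unsigned int irq_base,
			       unsigned int *out, size_t out_len)
{
	size_t n = 0;

	assert(status != NULL && out != NULL);

	for (size_t ctrl = 0; ctrl < nr_ctrls; ctrl++) {
		uint32_t val = status[ctrl];

		for (unsigned int pos = 0; pos < SKETCH_IRQS_PER_CTRL; pos++) {
			if (!(val & (UINT32_C(1) << pos)))
				continue;
			if (n == out_len)
				return n;
			out[n++] = irq_base +
				   (unsigned int)ctrl * SKETCH_IRQS_PER_CTRL +
				   pos;
		}
	}
	return n;
}
```

With a base of 100, a status of 0x5 on controller 0 yields IRQs 100 and 102, and bit 31 on controller 1 yields IRQ 163.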

Re: [PATCH 2/2] fsl: set wakeup sources

2013-09-05 Thread Hongbo Zhang


Sorry, linux-kernel subscribers.
This was meant for internal team review; linux-kernel was CCed by
mistake. Please ignore this mail.



On 09/06/2013 02:46 PM, hongbo.zh...@freescale.com wrote:

From: Hongbo Zhang 

Some devices can work as wakeup sources and should stay powered on during
system deep sleep. This patch adds an interface for configuring the power
supply status of such devices during deep sleep.

Signed-off-by: Hongbo Zhang 
---
  arch/powerpc/boot/dts/fsl/qoriq-power.dtsi |   73 
  arch/powerpc/sysdev/fsl_rcpm.c |   43 
  2 files changed, 116 insertions(+)
  create mode 100644 arch/powerpc/boot/dts/fsl/qoriq-power.dtsi

diff --git a/arch/powerpc/boot/dts/fsl/qoriq-power.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-power.dtsi
new file mode 100644
index 000..c5c2ba0
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/qoriq-power.dtsi
@@ -0,0 +1,73 @@
+/*
+ * QorIQ Power Management device tree stub
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* IPPDEXPCR: IP Power Down EXcePtion Control Register */
+rcpm-power@e2140 {
+   compatible = "fsl,rcpm-ippdexpcr";
+   reg = <0xe2140 0x4>;
+
+   mac1_1_power: soc-power@0 {
+   fsl,ippdexpcr-mask = <0x80000000>;
+   };
+   mac1_2_power: soc-power@1 {
+   fsl,ippdexpcr-mask = <0x40000000>;
+   };
+   mac1_3_power: soc-power@2 {
+   fsl,ippdexpcr-mask = <0x20000000>;
+   };
+   mac1_4_power: soc-power@3 {
+   fsl,ippdexpcr-mask = <0x10000000>;
+   };
+   mac1_5_power: soc-power@4 {
+   fsl,ippdexpcr-mask = <0x08000000>;
+   };
+   sdhc_power: soc-power@24 {
+   fsl,ippdexpcr-mask = <0x00000080>;
+   };
+   gpio_power: soc-power@25 {
+   fsl,ippdexpcr-mask = <0x00000040>;
+   };
+   usb1_power: soc-power@26 {
+   fsl,ippdexpcr-mask = <0x00000020>;
+   };
+   usb2_power: soc-power@27 {
+   fsl,ippdexpcr-mask = <0x00000010>;
+   };
+   fman1_power: soc-power@28 {
+   fsl,ippdexpcr-mask = <0x00000008>;
+   };
+   sap_power: soc-power@31 {
+   fsl,ippdexpcr-mask = <0x00000001>;
+   };
+};
diff --git a/arch/powerpc/sysdev/fsl_rcpm.c b/arch/powerpc/sysdev/fsl_rcpm.c
index ecf43a2..bc21aea 100644
--- a/arch/powerpc/sysdev/fsl_rcpm.c
+++ b/arch/powerpc/sysdev/fsl_rcpm.c
@@ -23,6 +23,49 @@
  struct ccsr_rcpm __iomem *rcpm1_regs;
  struct ccsr_rcpm_v2 __iomem *rcpm2_regs;
  
+/**

+ * fsl_rcpm_set_wake - enable/disable device working as wakeup source
+ * @dev: device affected
+ * @enable: true for keeping power on for this device during deep sleep
+ *  false otherwise
+ *
+ * return 0 on success, return -EINVAL if the device cannot wake up system
+ * and -ENODEV if RCPM unavailable
+ */
+int fsl_rcpm_set_wake(struct device *dev, bool enable)
+{
+   int ret = 0;
+   struct device_node *pw_np;
+   u32 pw_mask;
+
+   if (!rcpm2_regs) {
+   dev_err(dev, "%s: RCPM is unavailable\n", __func__);
+   return -ENODEV;
+   }
+
+   if (enable && !device_may_wakeup(dev))
+   return -EINVAL;
+
+

Re: [PATCH v4 3/5] clk: dt: binding for basic multiplexer clock

2013-09-05 Thread Tero Kristo


Hi,

Chiming in with my thoughts below.

On 09/05/2013 11:30 PM, Stephen Warren wrote:

On 09/05/2013 12:29 PM, Mike Turquette wrote:

On Wed, Sep 4, 2013 at 11:36 AM, Stephen Warren  wrote:

On 09/03/2013 05:22 PM, Mike Turquette wrote:

Quoting Stephen Warren (2013-08-30 14:37:46)

On 08/30/2013 02:33 PM, Mike Turquette wrote:

...

The clock _data_ seems to always have some churn to it. Moving it out to
DT reduces that churn from Linux. My concern above is not about kernel
data size.


That sounds like the opposite of what we should be doing.

It's fine for kernel code/data to change; that's a natural part of
development. Obviously, we should minimize churn, through thorough
review, domain knowledge, etc.


And with the "clock mapping" style bindings we'll end up changing both
the DT binding definition and the kernel. Not great.


What's a "clock mapping" style binding? I guess that means the style
where you have a single DT node that provides multiple clocks, rather
than one DT node per clock?

If the kernel driver changes its internal data, I don't see why that
would have any impact at all on the DT binding definition. We should be
able to use one DT binding definition with arbitrary drivers.


Yes, I'm referring to a single node providing multiple clocks. As an
example see the Exynos 5420 binding:
Documentation/devicetree/bindings/clock/exynos5420-clock.txt

The clock id's are stored as part of the binding definition resulting
in a mapping scheme that can be fragile.


The mapping shouldn't be fragile if e.g.
include/dt-bindings/clock/exynos5420.h were used to define the values.
That way, both the Exynos clock driver and Exynos DT files could both
include the header, and would always be in sync.
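
Archive note: the shared-header idea described above can be sketched in C. The point is that the ID constants are defined once and used both to index the provider's table and, in a real tree, by the DT source through the same include. Everything here is hypothetical: the `EXAMPLE_CLK_*` values and names are made up for illustration and are not the contents of any real dt-bindings header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Part 1: what would live in the shared header (for example a file under
 * include/dt-bindings/clock/). A DT source including it could then write
 * something like <&clock EXAMPLE_CLK_I2C0> instead of a bare number.
 */
#define EXAMPLE_CLK_UART0 0
#define EXAMPLE_CLK_I2C0  1
#define EXAMPLE_CLK_MMC0  2
#define EXAMPLE_CLK_COUNT 3

/* Part 2: the clock provider, indexing its table with the same macros. */
static const char *const example_clk_names[EXAMPLE_CLK_COUNT] = {
	[EXAMPLE_CLK_UART0] = "uart0",
	[EXAMPLE_CLK_I2C0]  = "i2c0",
	[EXAMPLE_CLK_MMC0]  = "mmc0",
};

/* Map an ID from a consumer's clock specifier to a clock name. */
const char *example_clk_lookup(unsigned int id)
{
	if (id >= EXAMPLE_CLK_COUNT)
		return NULL;
	return example_clk_names[id];
}

/* Reverse lookup; returns the ID, or -1 if the name is unknown. */
int example_clk_id_by_name(const char *name)
{
	assert(name != NULL);

	for (unsigned int id = 0; id < EXAMPLE_CLK_COUNT; id++) {
		if (strcmp(example_clk_names[id], name) == 0)
			return (int)id;
	}
	return -1;
}
```

Because both sides use the macro names, renumbering an ID is a single edit in the header, although existing device trees built against the old numbers would still break, which is the ABI concern raised in the thread.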


There have already been
patches to fix the id's assigned in the binding, which isn't supposed
to happen because it's a stable interface.


That's definitely a real problem. The values should be stable.
Preferably, the values should be derived from some aspect of the HW, and
hence be stable.

For example, many clock IDs on Tegra are derived from the clock's bit
index within the peripheral clock enable registers. Although I must
admit we have a bit of a mess in the Tegra clocks w.r.t. mis-using clock
IDs for reset IDs and hence there are some peripheral clock IDs that
don't map 1:1 with the register, and there are other clocks which aren't
peripheral clocks that we've assigned arbitrary IDs to rather than some
HW-derived ID.
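
Archive note: a hardware-derived ID of the kind described above can be decoded arithmetically. The sketch below assumes a bank of 32-bit enable registers where ID n means bit (n mod 32) of register (n div 32). That layout is an assumption made for illustration, and `sketch_clk_id_to_pos` and `sketch_clk_enable` are hypothetical helpers, not Tegra driver functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_PER_REG 32u

/* Position of a clock's enable bit within the register bank. */
struct sketch_clk_pos {
	unsigned int reg_index;
	unsigned int bit;
};

/* Split a hardware-derived clock ID into register index and bit number. */
struct sketch_clk_pos sketch_clk_id_to_pos(unsigned int id)
{
	struct sketch_clk_pos pos = {
		id / SKETCH_BITS_PER_REG,
		id % SKETCH_BITS_PER_REG,
	};
	return pos;
}

/* Set the enable bit for clock id in an in-memory copy of the registers. */
void sketch_clk_enable(uint32_t *enable_regs, unsigned int nr_regs,
		       unsigned int id)
{
	struct sketch_clk_pos pos = sketch_clk_id_to_pos(id);

	assert(enable_regs != NULL && pos.reg_index < nr_regs);
	enable_regs[pos.reg_index] |= UINT32_C(1) << pos.bit;
}
```

Under this assumed layout, ID 70 decodes to register 2, bit 6, so the ID cannot drift away from the hardware the way an arbitrarily assigned number can.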

Alternatively, perhaps a register address unique to the clock could be used.

If new values are added, the additions should all happen in a single
tree, and hence can be co-ordinated, thus avoiding any merge-conflicts.

Even ignoring HW-derived clock IDs, people writing DT bindings simply
need to get used to bindings being an ABI, and put extra effort into
making sure the list of clocks is accurate and complete.

Finally, while it's true that a DT binding definition is an ABI, and
perhaps DT content isn't (so if there's a DT content bug it can simply
be fixed), if DT is wrong because of insufficient thought about its
content, it's still wrong, and the system doesn't work correctly.
Whether we edit a kernel clock driver or a DT file to solve a problem,
there was still a problem. Placing the data into DT doesn't make it any
less likely there will be a problem if sufficient care isn't taken when
thinking about the clock structure.


If clock phandles are
created by individual nodes in DT then the binding definition need
never be updated due to merge conflicts or renaming which plagues the
mapping scenario.


That's true.

But if we take that approach, shouldn't we just ban #clock-cells?

The only case where #clock-cells would still be legitimate would be an
array of identical clocks represented by a single node, and even then the
argument could be extended to say: just write out a node for each clock
in the array, just like if the clocks weren't in an array or were
different types.


And I'll respond to your points below but the whole "relocate the
problem to DT" argument is simply not my main point. What I want to do
is increase the usefulness of DT by allowing register-level details into
the binding which can


Can you expand upon why a DT that encodes register-level details is more
useful? I can't see why there would be any difference in usefulness.


Sure. The usefulness comes out of the fact that we do not need to
maintain data synchronization across dts and clock provider drivers.


Only the clock IDs. That's a very small amount of information. And
synchronizing the two simply means including a header file that defines
the IDs in both places. This is *exactly* why I created the
include/dt-bindings/ directory, to house such header files.


The data lives in one place and only one place. We absolutely need a
phandle to a clock in DT to link clock consumer devices to their input
clocks, so there is no question that should be in DT. Since we're
already doing that, why not do away with trying to keep d

Re: [PATCH v14 0/6] LSM: Multiple concurrent LSMs

2013-09-05 Thread Casey Schaufler

On 9/5/2013 11:48 AM, Kees Cook wrote:
> On Mon, Aug 26, 2013 at 7:29 PM, Casey Schaufler wrote:
>> On 8/6/2013 3:36 PM, Kees Cook wrote:
>>> On Tue, Aug 6, 2013 at 3:25 PM, Casey Schaufler wrote:
>>>> On 8/5/2013 11:30 PM, Kees Cook wrote:
>>>>> On Thu, Jul 25, 2013 at 11:52 PM, Casey Schaufler wrote:
>>>>>> The /proc/*/attr interfaces are given to one LSM. This can be
>>>>>> done by setting CONFIG_SECURITY_PRESENT. Additional interfaces
>>>>>> have been created in /proc/*/attr so that each LSM has its own
>>>>>> named interfaces. The name of the presenting LSM can be read from
>>>>> For me, this is one problem that was bothering me, but it was a cosmetic
>>>>> one that I'd mentioned before: I really disliked the /proc/$pid/attr
>>>>> interface being named "$lsm.$file". I feel it's important to build
>>>>> directories in attr/ for each LSM. So, I spent time to figure out a way to
>>>>> do this. This patch changes the interface to /proc/$pid/attr/$lsm/$file
>>>>> instead, which I feel has a much more appealing organizational structure.
>>>> I will confess that the reason I went with .current instead of
>>>> /current was that the former was easier to implement.
>>> Yeah, that's totally fine. It wasn't very obvious (to me) how to
>>> implement this initially, so no problem at all. I'm glad there was
>>> something more than bug fixes I could contribute to this series. :)
>> Oh dear. I'm rebasing for 3.12 and the macros don't generate compiling
>> code any longer. It seems that, among other things, readdir is no longer
>> a member of file_operations.
> Looks like f0c3b5093addc8bfe9fe3a5b01acb7ec7969eafa is what touched
> fs/proc/base.c and it should just need a few tweaks from "readdir"
> becoming "iterate", and the prototype changing.
>
> So it should just require bump the macros a little. Let's see if gmail
> eats my paste...
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 4c80ffd..f670349 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2358,17 +2358,17 @@ static const struct file_operations proc_pid_attr_operat
>  };
>
>  #define LSM_DIR_OPS(LSM) \
> -static int proc_##LSM##_attr_dir_readdir(struct file * filp, \
> -void * dirent, filldir_t filldir) \
> +static int proc_##LSM##_attr_dir_iterate(struct file * filp, \
> +struct dir_context *ctx) \
>  { \
> -   return proc_pident_readdir(filp, dirent, filldir, \
> +   return proc_pident_readdir(filp, ctx, \
>LSM##_attr_dir_stuff, \
>ARRAY_SIZE(LSM##_attr_dir_stuff)); \
>  } \
>  \
>  static const struct file_operations proc_##LSM##_attr_dir_ops = { \
> .read   = generic_read_dir, \
> -   .readdir= proc_##LSM##_attr_dir_readdir, \
> +   .iterate= proc_##LSM##_attr_dir_iterate, \
> .llseek = default_llseek, \
>  }; \
>  \
>
>
> Do you have the rest of the series already ported to 3.12?
>
> -Kees
>
Yes, but I did it last week before my holiday started, and have not updated
since. I will become active again upon my return. I hope to have the 3.12
version posted before the Security Summit.


[PATCH 2/2] fsl: set wakeup sources

2013-09-05 Thread hongbo.zhang

From: Hongbo Zhang 

Some devices can work as wakeup sources and should stay powered on during
system deep sleep. This patch adds an interface for configuring the power
supply status of such devices during deep sleep.

Signed-off-by: Hongbo Zhang 
---
 arch/powerpc/boot/dts/fsl/qoriq-power.dtsi |   73 
 arch/powerpc/sysdev/fsl_rcpm.c |   43 
 2 files changed, 116 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/fsl/qoriq-power.dtsi

diff --git a/arch/powerpc/boot/dts/fsl/qoriq-power.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-power.dtsi
new file mode 100644
index 000..c5c2ba0
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/qoriq-power.dtsi
@@ -0,0 +1,73 @@
+/*
+ * QorIQ Power Management device tree stub
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* IPPDEXPCR: IP Power Down EXcePtion Control Register */
+rcpm-power@e2140 {
+   compatible = "fsl,rcpm-ippdexpcr";
+   reg = <0xe2140 0x4>;
+
+   mac1_1_power: soc-power@0 {
+   fsl,ippdexpcr-mask = <0x80000000>;
+   };
+   mac1_2_power: soc-power@1 {
+   fsl,ippdexpcr-mask = <0x40000000>;
+   };
+   mac1_3_power: soc-power@2 {
+   fsl,ippdexpcr-mask = <0x20000000>;
+   };
+   mac1_4_power: soc-power@3 {
+   fsl,ippdexpcr-mask = <0x10000000>;
+   };
+   mac1_5_power: soc-power@4 {
+   fsl,ippdexpcr-mask = <0x08000000>;
+   };
+   sdhc_power: soc-power@24 {
+   fsl,ippdexpcr-mask = <0x00000080>;
+   };
+   gpio_power: soc-power@25 {
+   fsl,ippdexpcr-mask = <0x00000040>;
+   };
+   usb1_power: soc-power@26 {
+   fsl,ippdexpcr-mask = <0x00000020>;
+   };
+   usb2_power: soc-power@27 {
+   fsl,ippdexpcr-mask = <0x00000010>;
+   };
+   fman1_power: soc-power@28 {
+   fsl,ippdexpcr-mask = <0x00000008>;
+   };
+   sap_power: soc-power@31 {
+   fsl,ippdexpcr-mask = <0x00000001>;
+   };
+};
diff --git a/arch/powerpc/sysdev/fsl_rcpm.c b/arch/powerpc/sysdev/fsl_rcpm.c
index ecf43a2..bc21aea 100644
--- a/arch/powerpc/sysdev/fsl_rcpm.c
+++ b/arch/powerpc/sysdev/fsl_rcpm.c
@@ -23,6 +23,49 @@
 struct ccsr_rcpm __iomem *rcpm1_regs;
 struct ccsr_rcpm_v2 __iomem *rcpm2_regs;
 
+/**
+ * fsl_rcpm_set_wake - enable/disable device working as wakeup source
+ * @dev: device affected
+ * @enable: true for keeping power on for this device during deep sleep
+ *  false otherwise
+ *
+ * return 0 on success, return -EINVAL if the device cannot wake up system
+ * and -ENODEV if RCPM unavailable
+ */
+int fsl_rcpm_set_wake(struct device *dev, bool enable)
+{
+   int ret = 0;
+   struct device_node *pw_np;
+   u32 pw_mask;
+
+   if (!rcpm2_regs) {
+   dev_err(dev, "%s: RCPM is unavailable\n", __func__);
+   return -ENODEV;
+   }
+
+   if (enable && !device_may_wakeup(dev))
+   return -EINVAL;
+
+   pw_np = of_parse_phandle(dev->of_node, "fsl,rcpm-handle", 0);
+   if (!pw_np)
+   return -EINVAL;
+
+   if (of_property_read_u32(pw_np, "fsl,ippdexpcr-mask", &pw_mask)) {
+

[PATCH v2 2/4] ab8500-charger: Remove redundant break

2013-09-05 Thread Sachin Kamat

Each of the if-else blocks has a break statement.
Remove the additional one which is unreachable.

Signed-off-by: Sachin Kamat 
---
No changes since v1.
---
 drivers/power/ab8500_charger.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/power/ab8500_charger.c b/drivers/power/ab8500_charger.c
index 0d355a9..453141e 100644
--- a/drivers/power/ab8500_charger.c
+++ b/drivers/power/ab8500_charger.c
@@ -766,7 +766,6 @@ static int ab8500_charger_max_usb_curr(struct ab8500_charger *di,
ret = -ENXIO;
break;
}
-   break;
case USB_STAT_CARKIT_1:
case USB_STAT_CARKIT_2:
case USB_STAT_ACA_DOCK_CHARGER:
-- 
1.7.9.5


[PATCH v2 4/4] pm2301-charger: Staticize pm2xxx_charger_die_therm_mngt

2013-09-05 Thread Sachin Kamat

pm2xxx_charger_die_therm_mngt is used only in this file.
Make it static.

Signed-off-by: Sachin Kamat 
Acked-by: Lee Jones 
---
No changes since v1.
---
 drivers/power/pm2301_charger.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/pm2301_charger.c b/drivers/power/pm2301_charger.c
index e55d809..b871ba4 100644
--- a/drivers/power/pm2301_charger.c
+++ b/drivers/power/pm2301_charger.c
@@ -205,7 +205,7 @@ static int pm2xxx_charger_batt_therm_mngt(struct pm2xxx_charger *pm2, int val)
 }
 
 
-int pm2xxx_charger_die_therm_mngt(struct pm2xxx_charger *pm2, int val)
+static int pm2xxx_charger_die_therm_mngt(struct pm2xxx_charger *pm2, int val)
 {
queue_work(pm2->charger_wq, &pm2->check_main_thermal_prot_work);
 
-- 
1.7.9.5


[PATCH v2 1/4] ab8500-charger: Check return value of regulator_enable

2013-09-05 Thread Sachin Kamat

Check the return value of regulator_enable to silence the following
type of warnings:
drivers/power/ab8500_charger.c:1390:20: warning: ignoring return value
of ‘regulator_enable’, declared with attribute warn_unused_result
[-Wunused-result]

Signed-off-by: Sachin Kamat 
Cc: Lee Jones 
---
Compile tested.
Changes since v1:
 * converted dev_err and return to dev_warn as suggested by Lee Jones.
---
 drivers/power/ab8500_charger.c |   20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/power/ab8500_charger.c b/drivers/power/ab8500_charger.c
index a4c4a10..0d355a9 100644
--- a/drivers/power/ab8500_charger.c
+++ b/drivers/power/ab8500_charger.c
@@ -1387,8 +1387,14 @@ static int ab8500_charger_ac_en(struct ux500_charger *charger,
 * the GPADC module independant of the AB8500 chargers
 */
if (!di->vddadc_en_ac) {
-   regulator_enable(di->regu);
-   di->vddadc_en_ac = true;
+   ret = regulator_enable(di->regu);
+   if (ret) {
+   dev_warn(di->dev,
+   "Failed to enable regulator\n");
+   di->vddadc_en_ac = false;
+   } else {
+   di->vddadc_en_ac = true;
+   }
}
 
/* Check if the requested voltage or current is valid */
@@ -1556,8 +1562,14 @@ static int ab8500_charger_usb_en(struct ux500_charger *charger,
 * the GPADC module independant of the AB8500 chargers
 */
if (!di->vddadc_en_usb) {
-   regulator_enable(di->regu);
-   di->vddadc_en_usb = true;
+   ret = regulator_enable(di->regu);
+   if (ret) {
+   dev_warn(di->dev,
+   "Failed to enable regulator\n");
+   di->vddadc_en_usb = false;
+   } else {
+   di->vddadc_en_usb = true;
+   }
}
 
/* Enable USB charging */
-- 
1.7.9.5
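
The pattern applied in both hunks can be sketched outside the kernel as
follows. This is only a minimal userspace model, not the driver code:
struct model_regulator, model_regulator_enable() and
model_charger_enable_vddadc() are hypothetical stand-ins, and the
fail_enable field exists purely so the failure branch can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct regulator. */
struct model_regulator {
    bool fail_enable;   /* test knob: force the enable call to fail */
    int enable_count;   /* number of successful enables */
};

/* Like regulator_enable(), the result must not be silently dropped. */
__attribute__((warn_unused_result))
static int model_regulator_enable(struct model_regulator *regu)
{
    assert(regu != NULL);
    if (regu->fail_enable)
        return -EIO;
    regu->enable_count++;
    return 0;
}

/*
 * Mirrors the shape of the patched hunk: warn on failure and return the
 * value that the vddadc_en_* flag should take.
 */
static bool model_charger_enable_vddadc(struct model_regulator *regu)
{
    int ret = model_regulator_enable(regu);

    if (ret) {
        fprintf(stderr, "Failed to enable regulator (%d)\n", ret);
        return false;
    }
    return true;
}
```

The flag then records whether the regulator was really enabled, instead of
being set to true unconditionally.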


[PATCH v2 3/4] pm2301-charger: Check return value of regulator_enable

2013-09-05 Thread Sachin Kamat

Check the return value of regulator_enable to silence the following
warning:
drivers/power/pm2301_charger.c:725:20: warning:
ignoring return value of ‘regulator_enable’, declared with
attribute warn_unused_result [-Wunused-result]

Signed-off-by: Sachin Kamat 
Cc: Lee Jones 
---
Compile tested.
Changes since v1:
* converted dev_err and return to dev_warn as suggested by Lee Jones.
---
 drivers/power/pm2301_charger.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/power/pm2301_charger.c b/drivers/power/pm2301_charger.c
index ffa10ed..e55d809 100644
--- a/drivers/power/pm2301_charger.c
+++ b/drivers/power/pm2301_charger.c
@@ -722,8 +722,14 @@ static int pm2xxx_charger_ac_en(struct ux500_charger *charger,
 
dev_dbg(pm2->dev, "Enable AC: %dmV %dmA\n", vset, iset);
if (!pm2->vddadc_en_ac) {
-   regulator_enable(pm2->regu);
-   pm2->vddadc_en_ac = true;
+   ret = regulator_enable(pm2->regu);
+   if (ret) {
+   dev_warn(pm2->dev,
+   "Failed to enable vddadc regulator\n");
+   pm2->vddadc_en_ac = false;
+   } else {
+   pm2->vddadc_en_ac = true;
+   }
}
 
ret = pm2xxx_charging_init(pm2);
-- 
1.7.9.5


[PATCH] perf kvm: fix sample_type manipulation

2013-09-05 Thread Adrian Hunter

Manipulating the sample_type of an evsel requires
the use of:
perf_evsel__set_sample_bit()
and perf_evsel__reset_sample_bit()

Manipulating the sample type of an evlist requires
the id position to be recalculated.

Signed-off-by: Adrian Hunter 
---
 tools/perf/builtin-kvm.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 47b3540..0b7c5a9 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1165,16 +1165,16 @@ static int kvm_live_open_events(struct perf_kvm_stat *kvm)
struct perf_event_attr *attr = &pos->attr;
 
/* make sure these *are* set */
-   attr->sample_type |= PERF_SAMPLE_TID;
-   attr->sample_type |= PERF_SAMPLE_TIME;
-   attr->sample_type |= PERF_SAMPLE_CPU;
-   attr->sample_type |= PERF_SAMPLE_RAW;
+   perf_evsel__set_sample_bit(pos, TID);
+   perf_evsel__set_sample_bit(pos, TIME);
+   perf_evsel__set_sample_bit(pos, CPU);
+   perf_evsel__set_sample_bit(pos, RAW);
/* make sure these are *not*; want as small a sample as possible */
-   attr->sample_type &= ~PERF_SAMPLE_PERIOD;
-   attr->sample_type &= ~PERF_SAMPLE_IP;
-   attr->sample_type &= ~PERF_SAMPLE_CALLCHAIN;
-   attr->sample_type &= ~PERF_SAMPLE_ADDR;
-   attr->sample_type &= ~PERF_SAMPLE_READ;
+   perf_evsel__reset_sample_bit(pos, PERIOD);
+   perf_evsel__reset_sample_bit(pos, IP);
+   perf_evsel__reset_sample_bit(pos, CALLCHAIN);
+   perf_evsel__reset_sample_bit(pos, ADDR);
+   perf_evsel__reset_sample_bit(pos, READ);
attr->mmap = 0;
attr->comm = 0;
attr->task = 0;
@@ -1188,6 +1188,8 @@ static int kvm_live_open_events(struct perf_kvm_stat *kvm)
attr->disabled = 1;
}
 
+   perf_evlist__set_id_pos(evlist);
+
err = perf_evlist__open(evlist);
if (err < 0) {
printf("Couldn't create the events: %s\n", strerror(errno));
-- 
1.7.11.7
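
Why poking attr->sample_type directly is a problem can be shown with a
small model. The sketch below is a simplified assumption about the
bookkeeping involved, not the perf source: struct model_evsel, the
MODEL_SAMPLE_* bits and the model_* helpers are hypothetical, and the id
position is reduced to "number of enabled fields in front of the ID field".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample bits, laid out in sample-record order. */
enum model_sample_bit {
    MODEL_SAMPLE_IP   = 1u << 0,
    MODEL_SAMPLE_TID  = 1u << 1,
    MODEL_SAMPLE_TIME = 1u << 2,
    MODEL_SAMPLE_ID   = 1u << 3,
};

struct model_evsel {
    uint64_t sample_type;  /* which fields each sample carries */
    size_t sample_size;    /* derived: bytes occupied by those fields */
    int id_pos;            /* derived: index of the ID field, -1 if absent */
};

/* Recompute the derived ID position from sample_type. */
static void model_calc_id_pos(struct model_evsel *evsel)
{
    uint64_t below = evsel->sample_type & (MODEL_SAMPLE_ID - 1);
    int pos = 0;

    if (!(evsel->sample_type & MODEL_SAMPLE_ID)) {
        evsel->id_pos = -1;
        return;
    }
    for (; below; below &= below - 1)  /* count fields before ID */
        pos++;
    evsel->id_pos = pos;
}

/* Set a bit and keep the derived fields consistent. */
static void model_set_sample_bit(struct model_evsel *evsel, uint64_t bit)
{
    assert(evsel != NULL);
    if (!(evsel->sample_type & bit)) {
        evsel->sample_type |= bit;
        evsel->sample_size += sizeof(uint64_t);
        model_calc_id_pos(evsel);
    }
}

/* Clear a bit and keep the derived fields consistent. */
static void model_reset_sample_bit(struct model_evsel *evsel, uint64_t bit)
{
    assert(evsel != NULL);
    if (evsel->sample_type & bit) {
        evsel->sample_type &= ~bit;
        evsel->sample_size -= sizeof(uint64_t);
        model_calc_id_pos(evsel);
    }
}
```

A raw "sample_type |= bit" in this model would leave sample_size and id_pos
stale, which is the kind of inconsistency the helpers are meant to prevent.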


Re: [PATCH v2 4/4] mm/zswap: use GFP_NOIO instead of GFP_KERNEL

2013-09-05 Thread Bob Liu


On 09/06/2013 01:16 PM, Weijie Yang wrote:
> To avoid zswap store and reclaim functions called recursively,
> use GFP_NOIO instead of GFP_KERNEL
> 

The reason for using GFP_KERNEL in the writeback path is that we want to try
our best to move those pages from zswap to the real swap device.

I think it would be better to keep the GFP_KERNEL flag and find some other
way to skip zswap/zswap_frontswap_store() while zswap writeback is in
progress.

What I can think of currently is adding a mutex to zswap: take that mutex
when zswap writeback happens, and check it in zswap_frontswap_store().
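
A very rough userspace sketch of that idea follows. It is an illustration
under assumptions, not a proposed patch: an atomic flag stands in for the
mutex, all model_* names are hypothetical, and returning -EBUSY from the
store path is just one possible way to reject the page.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the proposed zswap-wide mutex. */
static atomic_bool model_writeback_active;

/* Writeback entry: returns false if another writeback already runs. */
static bool model_writeback_begin(void)
{
    bool expected = false;

    return atomic_compare_exchange_strong(&model_writeback_active,
                                          &expected, true);
}

/* Writeback exit: allow stores again. */
static void model_writeback_end(void)
{
    atomic_store(&model_writeback_active, false);
}

/*
 * Store path: refuse new pages while writeback is in progress, so the
 * page falls through to the real swap device instead of recursing.
 */
static int model_frontswap_store(void)
{
    if (atomic_load(&model_writeback_active))
        return -EBUSY;
    /* ... compress and store the page here ... */
    return 0;
}
```

With this shape the allocations in the writeback path could stay
GFP_KERNEL, because a store triggered by reclaim would be rejected early.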


> Signed-off-by: Weijie Yang 
> ---
>  mm/zswap.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index cc40e6a..3d05ed8 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -427,7 +427,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>* Get a new page to read into from swap.
>*/
>   if (!new_page) {
> - new_page = alloc_page(GFP_KERNEL);
> + new_page = alloc_page(GFP_NOIO);
>   if (!new_page)
>   break; /* Out of memory */
>   }
> @@ -435,7 +435,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   /*
>* call radix_tree_preload() while we can wait.
>*/
> - err = radix_tree_preload(GFP_KERNEL);
> + err = radix_tree_preload(GFP_NOIO);
>   if (err)
>   break;
>  
> @@ -636,7 +636,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>   }
>  
>   /* allocate entry */
> - entry = zswap_entry_cache_alloc(GFP_KERNEL);
> + entry = zswap_entry_cache_alloc(GFP_NOIO);
>   if (!entry) {
>   zswap_reject_kmemcache_fail++;
>   ret = -ENOMEM;
> 

-- 
Regards,
-Bob

Re: linux-next: problem fetching the watchdog tree

2013-09-05 Thread Wim Van Sebroeck

Hi Stephen,

> Fetching the wireless tree yesterday and today produced this error:
> 
> fatal: unable to connect to www.linux-watchdog.org:
> www.linux-watchdog.org[0: 83.149.101.17]: errno=Connection refused

Strange. I had a git zombie process, got rid of it 2 days ago and
restarted git, but apparently it didn't do anything anymore.
I just restarted it and saw a pull coming in again. So it is fixed now.

Thanks for pointing it out.
Wim.


[PATCH] VMCI: fix to pass correct device identity to free_irq()

2013-09-05 Thread Wei Yongjun

From: Wei Yongjun 

free_irq() expects the same device identity that was passed to
corresponding request_irq(), otherwise the IRQ is not freed.

Signed-off-by: Wei Yongjun 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c
index b3a2b76..c98b03b 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -649,7 +649,7 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
return 0;
 
 err_free_irq:
-   free_irq(vmci_dev->irq, &vmci_dev);
+   free_irq(vmci_dev->irq, vmci_dev);
tasklet_kill(&vmci_dev->datagram_tasklet);
tasklet_kill(&vmci_dev->bm_tasklet);
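
The effect of passing &vmci_dev instead of vmci_dev can be reproduced with
a toy model of the dev_id matching. This is a sketch under assumptions,
not the genirq implementation: model_request_irq() and model_free_irq()
are hypothetical and keep only an (irq, dev_id) table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ACTIONS 4

/* One registered handler, identified by IRQ number and dev_id cookie. */
struct model_irqaction {
    unsigned int irq;
    void *dev_id;
    bool used;
};

static struct model_irqaction model_actions[MODEL_MAX_ACTIONS];

/* Register a handler; returns 0 on success, -1 if the table is full. */
static int model_request_irq(unsigned int irq, void *dev_id)
{
    assert(dev_id != NULL);
    for (size_t i = 0; i < MODEL_MAX_ACTIONS; i++) {
        if (!model_actions[i].used) {
            model_actions[i].irq = irq;
            model_actions[i].dev_id = dev_id;
            model_actions[i].used = true;
            return 0;
        }
    }
    return -1;
}

/* Free a handler; only an exact (irq, dev_id) match is released. */
static bool model_free_irq(unsigned int irq, void *dev_id)
{
    for (size_t i = 0; i < MODEL_MAX_ACTIONS; i++) {
        if (model_actions[i].used && model_actions[i].irq == irq &&
            model_actions[i].dev_id == dev_id) {
            model_actions[i].used = false;
            return true;
        }
    }
    return false;   /* wrong cookie: the handler stays registered */
}
```

Passing the address of the local pointer variable yields a different
cookie than the pointer value used at request time, so nothing matches.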
 


Re: [PATCH 1/2] SDT markers listing by perf

2013-09-05 Thread Namhyung Kim

Hi Hemant,

On Wed, 04 Sep 2013 23:07:57 +0530, Hemant wrote:
> On 09/04/2013 12:12 PM, Namhyung Kim wrote:
>> On Tue, 03 Sep 2013 13:06:55 +0530, Hemant Kumar wrote:
>>> +   /*
>>> +* Look for Section type = SHT_NOTE, flags = no SHF_ALLOC
>>> +* and name = .note.stapsdt
>>> +*/
>>> +   scn = elf_section_by_name(elf, &ehdr, &shdr, NOTE_SCN, NULL);
>>> +   if (scn == NULL) {
>>> +   pr_err("%s section not found!\n", NOTE_SCN);
>>> +   goto out_end;
>>> +   }
>>> +
>>> +   if (!(shdr.sh_type == SHT_NOTE) || (shdr.sh_flags & SHF_ALLOC))
>>> +   goto out_end;
>>> +
>>> +   data = elf_getdata(scn, NULL);
>>> +
>>> +   /* Get the notes */
>>> +   for (offset = 0; (next = gelf_getnote(data, offset, &nhdr, &name_off,
>>> + &desc_off)) > 0; offset = next) {
>>> +   tmp = populate_note(&elf, (const char *)((long)(data->d_buf) +
>>> +(long)desc_off),
>>> +   nhdr.n_descsz, nhdr.n_type);
>> Shouldn't we check the name of note being "stapsdt" as well as version
>> (type) 3?
>
> Since, we are already fetching the section NOTE_SCN (".note.stapsdt")
> and then we check for the type being SHT_NOTE and SHF_ALLOC, is it
> required to do the same for the individual notes?

I don't know.  For now the section seems to include only SDT notes with the
name "stapsdt" and type 3.  But things can change in the future..

Thanks,
Namhyung
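
The per-note check discussed above could look roughly like this. It is a
sketch only: model_is_sdt_note() and the MODEL_* constants are
hypothetical, and it assumes the usual ELF convention that a note's name
size counts the terminating NUL byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SDT_NOTE_NAME "stapsdt"
#define MODEL_SDT_NOTE_TYPE 3

/*
 * Accept a note only if both its owner name and its type identify an
 * SDT marker. namesz is the note's name size, including the NUL.
 */
static bool model_is_sdt_note(const char *name, size_t namesz,
                              unsigned int type)
{
    assert(name != NULL);
    if (type != MODEL_SDT_NOTE_TYPE)
        return false;
    if (namesz != sizeof(MODEL_SDT_NOTE_NAME))
        return false;
    return memcmp(name, MODEL_SDT_NOTE_NAME, namesz) == 0;
}
```

In the loop over gelf_getnote() results, a note failing this test would
simply be skipped before populate_note() is called.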

[PATCH] ARM: OMAP2+: am335x-bone*: add DT for BeagleBone Black

2013-09-05 Thread Koen Kooi

The BeagleBone Black is basically a regular BeagleBone with eMMC and HDMI added,
so create a common dtsi both can use. MMC support for AM335x still isn't in, so
only the LDO change has been added.

Signed-off-by: Koen Kooi 
---
 .../{am335x-bone.dts => am335x-bone-common.dtsi}   |   3 -
 arch/arm/boot/dts/am335x-bone.dts  | 256 +
 arch/arm/boot/dts/am335x-boneblack.dts |  18 ++
 3 files changed, 19 insertions(+), 258 deletions(-)
 copy arch/arm/boot/dts/{am335x-bone.dts => am335x-bone-common.dtsi} (99%)
 create mode 100644 arch/arm/boot/dts/am335x-boneblack.dts

diff --git a/arch/arm/boot/dts/am335x-bone.dts b/arch/arm/boot/dts/am335x-bone-common.dtsi
similarity index 99%
copy from arch/arm/boot/dts/am335x-bone.dts
copy to arch/arm/boot/dts/am335x-bone-common.dtsi
index d318987..2f66ded 100644
--- a/arch/arm/boot/dts/am335x-bone.dts
+++ b/arch/arm/boot/dts/am335x-bone-common.dtsi
@@ -5,9 +5,6 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-/dts-v1/;
-
-#include "am33xx.dtsi"
 
 / {
model = "TI AM335x BeagleBone";
diff --git a/arch/arm/boot/dts/am335x-bone.dts b/arch/arm/boot/dts/am335x-bone.dts
index d318987..7993c48 100644
--- a/arch/arm/boot/dts/am335x-bone.dts
+++ b/arch/arm/boot/dts/am335x-bone.dts
@@ -8,258 +8,4 @@
 /dts-v1/;
 
 #include "am33xx.dtsi"
-
-/ {
-   model = "TI AM335x BeagleBone";
-   compatible = "ti,am335x-bone", "ti,am33xx";
-
-   cpus {
-   cpu@0 {
-   cpu0-supply = <&dcdc2_reg>;
-   };
-   };
-
-   memory {
-   device_type = "memory";
-   reg = <0x80000000 0x10000000>; /* 256 MB */
-   };
-
-   am33xx_pinmux: pinmux@44e10800 {
-   pinctrl-names = "default";
-   pinctrl-0 = <&clkout2_pin>;
-
-   user_leds_s0: user_leds_s0 {
-   pinctrl-single,pins = <
-   0x54 (PIN_OUTPUT_PULLDOWN | MUX_MODE7)  /* gpmc_a5.gpio1_21 */
-   0x58 (PIN_OUTPUT_PULLUP | MUX_MODE7)    /* gpmc_a6.gpio1_22 */
-   0x5c (PIN_OUTPUT_PULLDOWN | MUX_MODE7)  /* gpmc_a7.gpio1_23 */
-   0x60 (PIN_OUTPUT_PULLUP | MUX_MODE7)    /* gpmc_a8.gpio1_24 */
-   >;
-   };
-
-   i2c0_pins: pinmux_i2c0_pins {
-   pinctrl-single,pins = <
-   0x188 (PIN_INPUT_PULLUP | MUX_MODE0)    /* i2c0_sda.i2c0_sda */
-   0x18c (PIN_INPUT_PULLUP | MUX_MODE0)    /* i2c0_scl.i2c0_scl */
-   >;
-   };
-
-   uart0_pins: pinmux_uart0_pins {
-   pinctrl-single,pins = <
-   0x170 (PIN_INPUT_PULLUP | MUX_MODE0)    /* uart0_rxd.uart0_rxd */
-   0x174 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* uart0_txd.uart0_txd */
-   >;
-   };
-
-   clkout2_pin: pinmux_clkout2_pin {
-   pinctrl-single,pins = <
-   0x1b4 (PIN_OUTPUT_PULLDOWN | MUX_MODE3) /* xdma_event_intr1.clkout2 */
-   >;
-   };
-
-   cpsw_default: cpsw_default {
-   pinctrl-single,pins = <
-   /* Slave 1 */
-   0x110 (PIN_INPUT_PULLUP | MUX_MODE0)    /* mii1_rxerr.mii1_rxerr */
-   0x114 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txen.mii1_txen */
-   0x118 (PIN_INPUT_PULLUP | MUX_MODE0)    /* mii1_rxdv.mii1_rxdv */
-   0x11c (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txd3.mii1_txd3 */
-   0x120 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txd2.mii1_txd2 */
-   0x124 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txd1.mii1_txd1 */
-   0x128 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* mii1_txd0.mii1_txd0 */
-   0x12c (PIN_INPUT_PULLUP | MUX_MODE0)    /* mii1_txclk.mii1_txclk */
-   0x130 (PIN_INPUT_PULLUP | MUX_MODE0)    /* mii1_rxclk.mii1_rxclk */
-   0x134 (PIN_INPUT_PULLUP | MUX_MODE0)    /* mii1_rxd3.mii1_rxd3 */
-   0x138 (PIN_INPUT_PULLUP | MUX_MODE0)    /* mii1_rxd2.mii1_rxd2 */
-   0x13c (PIN_INPUT_PULLUP | MUX_MODE0)    /* mii1_rxd1.mii1_rxd1 */
-   0x140 (PIN_INPUT_PULLUP | MUX_MODE0)    /* mii1_rxd0.mii1_rxd0 */
-   >;
-   };
-
-   cpsw_sleep: cpsw_sleep {
-   pinctrl-single,pins = <
-   /* Slave 1 reset value */
-

Re: [PATCH v2 2/4] mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently

2013-09-05 Thread Bob Liu


On 09/06/2013 01:16 PM, Weijie Yang wrote:
> Consider the following scenario:
> thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
> thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
>   finished, entry x and its zbud is not freed as its refcount != 0
>   now, the swap_map[x] = 0
> thread 0: now call zswap_get_swap_cache_page
>   swapcache_prepare return -ENOENT because entry x is not used any more
>   zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
>   zswap_writeback_entry do nothing except put refcount
> Now, the memory of zswap_entry x and its zpage leak.
> 
> Modify:
> - check the refcount in fail path, free memory if it is not referenced.
> - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
> can be not only caused by nomem but also by invalidate.
> 
> Signed-off-by: Weijie Yang 

Reviewed-by: Bob Liu 

> ---
>  mm/zswap.c |   21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index cbd9578..1be7b90 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -387,7 +387,7 @@ static void zswap_free_entry(struct zswap_tree *tree, struct zswap_entry *entry)
>  enum zswap_get_swap_ret {
>   ZSWAP_SWAPCACHE_NEW,
>   ZSWAP_SWAPCACHE_EXIST,
> - ZSWAP_SWAPCACHE_NOMEM
> + ZSWAP_SWAPCACHE_FAIL,
>  };
>  
>  /*
> @@ -401,9 +401,9 @@ enum zswap_get_swap_ret {
>   * added to the swap cache, and returned in retpage.
>   *
>   * If success, the swap cache page is returned in retpage
> - * Returns 0 if page was already in the swap cache, page is not locked
> - * Returns 1 if the new page needs to be populated, page is locked
> - * Returns <0 on error
> + * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
> + * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated, page is locked
> + * Returns ZSWAP_SWAPCACHE_FAIL on error
>   */
>  static int zswap_get_swap_cache_page(swp_entry_t entry,
>   struct page **retpage)
> @@ -475,7 +475,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
>   if (new_page)
>   page_cache_release(new_page);
>   if (!found_page)
> - return ZSWAP_SWAPCACHE_NOMEM;
> + return ZSWAP_SWAPCACHE_FAIL;
>   *retpage = found_page;
>   return ZSWAP_SWAPCACHE_EXIST;
>  }
> @@ -529,11 +529,11 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>  
>   /* try to allocate swap cache page */
>   switch (zswap_get_swap_cache_page(swpentry, &page)) {
> - case ZSWAP_SWAPCACHE_NOMEM: /* no memory */
> + case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */
>   ret = -ENOMEM;
>   goto fail;
>  
> - case ZSWAP_SWAPCACHE_EXIST: /* page is unlocked */
> + case ZSWAP_SWAPCACHE_EXIST:
>   /* page is already in the swap cache, ignore for now */
>   page_cache_release(page);
>   ret = -EEXIST;
> @@ -591,7 +591,12 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>  
>  fail:
>   spin_lock(&tree->lock);
> - zswap_entry_put(entry);
> + refcount = zswap_entry_put(entry);
> + if (refcount <= 0) {
> + /* invalidate happened, consider writeback as success */
> + zswap_free_entry(tree, entry);
> + ret = 0;
> + }
>   spin_unlock(&tree->lock);
>   return ret;
>  }
> 

-- 
Regards,
-Bob
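
The race and the fix in the fail path can be walked through with a small
single-threaded model. This is a sketch under assumptions, not mm/zswap.c:
struct model_entry and the model_* functions are hypothetical, and the
two "threads" are simply called one after the other in the racy order.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zswap entry and its compressed page. */
struct model_entry {
    int refcount;
    bool freed;
};

/* Drop one reference and return the new count. */
static int model_entry_put(struct model_entry *entry)
{
    assert(entry != NULL && entry->refcount > 0);
    return --entry->refcount;
}

static void model_free_entry(struct model_entry *entry)
{
    entry->freed = true;
}

/* Invalidate: drop the tree's reference, free only if it was the last. */
static void model_invalidate(struct model_entry *entry)
{
    if (model_entry_put(entry) == 0)
        model_free_entry(entry);
}

/*
 * Writeback fail path as changed by the patch: if the put drops the
 * last reference, an invalidate happened meanwhile, so free the entry
 * here and report success.
 */
static int model_writeback_fail(struct model_entry *entry)
{
    int ret = -ENOMEM;
    int refcount = model_entry_put(entry);

    if (refcount <= 0) {
        model_free_entry(entry);
        ret = 0;
    }
    return ret;
}
```

Without the refcount check in the fail path, the first scenario in the
test below would end with refcount 0 and the entry never freed.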

Re: [PATCH v2 3/4] mm/zswap: avoid unnecessary page scanning

2013-09-05 Thread Bob Liu


On 09/06/2013 01:16 PM, Weijie Yang wrote:
> add SetPageReclaim before __swap_writepage so that page can be moved to the
> tail of the inactive list, which can avoid unnecessary page scanning as this
> page was reclaimed by swap subsystem before.
> 
> Signed-off-by: Weijie Yang 

Reviewed-by: Bob Liu 

> ---
>  mm/zswap.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 1be7b90..cc40e6a 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -556,6 +556,9 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>   SetPageUptodate(page);
>   }
>  
> + /* move it to the tail of the inactive list after end_writeback */
> + SetPageReclaim(page);
> +
>   /* start writeback */
>   __swap_writepage(page, &wbc, end_swap_bio_write);
>   page_cache_release(page);
> 

-- 
Regards,
-Bob

Re: [PATCH v2 1/4] mm/zswap: bugfix: memory leak when re-swapon

2013-09-05 Thread Bob Liu



On 09/06/2013 01:16 PM, Weijie Yang wrote:
> zswap_tree is not freed when swapoff, and it got re-kmalloc in swapon,
> so memory-leak occurs.
> 
> Modify: free memory of zswap_tree in zswap_frontswap_invalidate_area().
> 
> Signed-off-by: Weijie Yang 

Reviewed-by: Bob Liu 

> ---
>  mm/zswap.c |4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index deda2b6..cbd9578 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -816,6 +816,10 @@ static void zswap_frontswap_invalidate_area(unsigned type)
>   }
>   tree->rbroot = RB_ROOT;
>   spin_unlock(&tree->lock);
> +
> + zbud_destroy_pool(tree->pool);
> + kfree(tree);
> + zswap_trees[type] = NULL;
>  }
>  
>  static struct zbud_ops zswap_zbud_ops = {
> 

-- 
Regards,
-Bob
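
The swapoff/swapon leak and its fix reduce to the following lifetime
pattern. This is a minimal userspace sketch with hypothetical names
(model_trees, model_swapon, model_swapoff); the live-allocation counter
exists only to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_SWAPFILES 4

/* Hypothetical stand-in for the per-swap-type zswap tree. */
struct model_tree {
    int placeholder;
};

static struct model_tree *model_trees[MODEL_MAX_SWAPFILES];
static int model_live_trees;   /* allocations not yet freed */

/* swapon: allocate a fresh tree for this swap type. */
static int model_swapon(unsigned int type)
{
    struct model_tree *tree;

    assert(type < MODEL_MAX_SWAPFILES);
    tree = calloc(1, sizeof(*tree));
    if (!tree)
        return -1;
    model_trees[type] = tree;   /* overwrites whatever was there */
    model_live_trees++;
    return 0;
}

/* swapoff with the fix: free the tree and clear the slot. */
static void model_swapoff(unsigned int type)
{
    struct model_tree *tree;

    assert(type < MODEL_MAX_SWAPFILES);
    tree = model_trees[type];
    if (!tree)
        return;
    free(tree);
    model_trees[type] = NULL;
    model_live_trees--;
}
```

If model_swapoff() left the slot untouched, the next model_swapon() would
overwrite the pointer and the old allocation would be unreachable.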

Re: [PATCH 2/2] cpufreq: serialize calls to __cpufreq_governor()

2013-09-05 Thread Srivatsa S. Bhat

On 09/05/2013 06:24 AM, Stephen Boyd wrote:
> On 09/04/13 17:26, Rafael J. Wysocki wrote:
>> On Wednesday, September 04, 2013 04:50:01 PM Stephen Boyd wrote:
>>> On 09/04/13 16:55, Rafael J. Wysocki wrote:
>>>> Well, I'm not sure when Viresh is going to be back.
>>>>
>>>> Srivatsa, can you please resend this patch with a proper changelog?
>>>>
>>> I haven't had a chance to try this out yet, but I was just thinking
>>> about this patch. How is it going to work? If one task opens the file
>>> and another task is taking down the CPU wouldn't we deadlock in the
>>> CPU_DOWN notifier waiting for the kobject to be released? Task 1 will
>>> grab the kobject reference and sleep on the hotplug mutex and task 2
>>> will put the kobject and wait for the completion, but it won't happen.
>>> At least I think that's what would happen.
>> Do you mean the completion in sysfs_deactivate()?  Yes, we can deadlock
>> there.
> 
> I mean the complete in cpufreq_sysfs_release(). I don't think that will
> ever be called because the kobject is held by the task calling store()
> which is waiting on the hotplug lock to be released.
> 
>>
>> Well, I guess the Srivatsa's patch may be salvaged by making it do a "trylock"
>> version of get_online_cpus(), but then I wonder if there's no better way.
> 
> I think the real solution is to remove the kobject first if the CPU
> going down is the last user of that policy. Once the completion is done
> we can stop the governor and clean up state. For the case where there
> are still CPUs using the policy why can't we cancel that CPU's work and
> do nothing else? Even in the case where we have to move the cpufreq
> directory do we need to do a STOP/START/LIMITS sequence? I hope we can
> get away with just moving the directory and canceling that CPUs work then.
> 

Conceptually, I agree that your idea of not allowing any process to take
a new reference to the kobject while we are taking the CPU offline, is a
sound solution.

However, I am reluctant to go down that path because, handling the CPU hotplug
sequence in the suspend/resume path might get even more tricky, if we want
to implement the changes that you propose. Just recently we managed to
rework the cpufreq CPU hotplug handling to retain the sysfs file permissions
around suspend/resume, and doing that was not at all simple. Adding more
quirks and complexity to the kobject handling in that path will only make
things even more challenging, IMHO. That's the reason I'm trying to think
of ways to avoid touching that fragile code, and instead solve this problem
in some other way, without compromising on the robustness of the solution.

So here is my new proposal, as a replacement to this patch[2/2]:

We note that, at CPU_DOWN_PREPARE stage, the CPU is not yet marked offline,
whereas by the time we handle CPU_POST_DEAD, the CPU is removed from the
cpu_online_mask, and also the cpu_hotplug lock is dropped.

So, let us split up __cpu_remove_dev() into 2 parts, say:
__cpu_remove_prepare() - invoked during CPU_DOWN_PREPARE
__cpu_remove_finish()  - invoked during CPU_POST_DEAD

In the former function, we stop the governors, so that policy->governor_enabled
gets set to false, so that patch [1/2] will return -EBUSY to any subsequent
->store() requests. Also, we do everything except the kobject cleanup.

In the latter function, we do the remaining work, particularly the part
where we wait for the kobject refcount to drop to zero and the subsequent
cleanup.

And the ->store() functions will be modified to look like this:

store()
{
        get_online_cpus();

        if (!cpu_online(cpu))
                goto out;

        /* Body of the function */
out:
        put_online_cpus();
}

That way, if a task tries to write to a cpufreq file during CPU offline,
it will get blocked on get_online_cpus(), and will continue after
CPU_DEAD (since we release the lock here). Then it will notice that the cpu
is offline, and hence will return silently, thus dropping the kobject refcnt.
So, when the cpufreq core comes back at the CPU_POST_DEAD stage to cleanup
the kobject, it won't encounter any problems.
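
The ordering described above can be checked with a small state model. It
is only a sketch under assumptions: struct model_policy and the model_*
functions are hypothetical, blocking on get_online_cpus() is not modelled,
and the -ENODEV result for an offline CPU is a choice made here (the
proposal only says the function returns silently).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_policy {
    bool cpu_online;
    bool governor_enabled;
    int kobj_refs;        /* references held by in-flight store() calls */
    bool kobj_released;
};

/* CPU_DOWN_PREPARE part: stop the governor, leave the kobject alone. */
static void model_remove_prepare(struct model_policy *p)
{
    p->governor_enabled = false;
}

/* Done by the hotplug core between the two phases. */
static void model_cpu_dead(struct model_policy *p)
{
    p->cpu_online = false;
}

/*
 * CPU_POST_DEAD part: kobject cleanup. The kernel would wait for the
 * refcount to drop; the model just reports -EBUSY if it has not.
 */
static int model_remove_finish(struct model_policy *p)
{
    if (p->kobj_refs > 0)
        return -EBUSY;
    p->kobj_released = true;
    return 0;
}

/* ->store(): holds a kobject reference for the duration of the call. */
static int model_store(struct model_policy *p)
{
    int ret = 0;

    assert(p != NULL);
    p->kobj_refs++;
    if (!p->cpu_online) {
        ret = -ENODEV;    /* assumption: value not given in the proposal */
        goto out;
    }
    if (!p->governor_enabled) {
        ret = -EBUSY;     /* governor already stopped, as with patch 1/2 */
        goto out;
    }
    /* ... body of the function ... */
out:
    p->kobj_refs--;
    return ret;
}
```

Because every model_store() call drops its reference before returning,
model_remove_finish() finds no outstanding references once the CPU is dead.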

Any thoughts on this approach? I'll try to code it up and post the patch
later today.

Thank you!

Regards,
Srivatsa S. Bhat


Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-05 Thread Alexey Kardashevskiy

On 09/06/2013 04:01 PM, Gleb Natapov wrote:
> On Fri, Sep 06, 2013 at 09:38:21AM +1000, Alexey Kardashevskiy wrote:
>> On 09/06/2013 04:10 AM, Gleb Natapov wrote:
>>> On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
>>>> On 09/03/2013 08:53 PM, Gleb Natapov wrote:
>>>>> On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
>>>>>> On 09/01/2013 10:06 PM, Gleb Natapov wrote:
>>>>>>> On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
>>>>>>>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
>>>>>>>> and H_STUFF_TCE requests targeted an IOMMU TCE table without passing
>>>>>>>> them to user space which saves time on switching to user space and back.
>>>>>>>>
>>>>>>>> Both real and virtual modes are supported. The kernel tries to
>>>>>>>> handle a TCE request in the real mode, if fails it passes the request
>>>>>>>> to the virtual mode to complete the operation. If it a virtual mode
>>>>>>>> handler fails, the request is passed to user space.
>>>>>>>>
>>>>>>>> The first user of this is VFIO on POWER. Trampolines to the VFIO external
>>>>>>>> user API functions are required for this patch.
>>>>>>>>
>>>>>>>> This adds a "SPAPR TCE IOMMU" KVM device to associate a logical bus
>>>>>>>> number (LIOBN) with an VFIO IOMMU group fd and enable in-kernel handling
>>>>>>>> of map/unmap requests. The device supports a single attribute which is
>>>>>>>> a struct with LIOBN and IOMMU fd. When the attribute is set, the device
>>>>>>>> establishes the connection between KVM and VFIO.
>>>>>>>>
>>>>>>>> Tests show that this patch increases transmission speed from 220MB/s
>>>>>>>> to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
>>>>>>>>
>>>>>>>> Signed-off-by: Paul Mackerras 
>>>>>>>> Signed-off-by: Alexey Kardashevskiy 
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Changes:
>>>>>>>> v9:
>>>>>>>> * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with "SPAPR TCE IOMMU"
>>>>>>>> KVM device
>>>>>>>> * release_spapr_tce_table() is not shared between different TCE types
>>>>>>>> * reduced the patch size by moving VFIO external API
>>>>>>>> trampolines to separate patche
>>>>>>>> * moved documentation from Documentation/virtual/kvm/api.txt to
>>>>>>>> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>>>>>>>>
>>>>>>>> v8:
>>>>>>>> * fixed warnings from check_patch.pl
>>>>>>>>
>>>>>>>> 2013/07/11:
>>>>>>>> * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
>>>>>>>> for KVM_BOOK3S_64
>>>>>>>> * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
>>>>>>>> for this here but the next patch for hugepages support will use it more.
>>>>>>>>
>>>>>>>> 2013/07/06:
>>>>>>>> * added realmode arch_spin_lock to protect TCE table from races
>>>>>>>> in real and virtual modes
>>>>>>>> * POWERPC IOMMU API is changed to support real mode
>>>>>>>> * iommu_take_ownership and iommu_release_ownership are protected by
>>>>>>>> iommu_table's locks
>>>>>>>> * VFIO external user API use rewritten
>>>>>>>> * multiple small fixes
>>>>>>>>
>>>>>>>> 2013/06/27:
>>>>>>>> * tce_list page is referenced now in order to protect it from accident
>>>>>>>> invalidation during H_PUT_TCE_INDIRECT execution
>>>>>>>> * added use of the external user VFIO API
>>>>>>>>
>>>>>>>> 2013/06/05:
>>>>>>>> * changed capability number
>>>>>>>> * changed ioctl number
>>>>>>>> * update the doc article number
>>>>>>>>
>>>>>>>> 2013/05/20:
>>>>>>>> * removed get_user() from real mode handlers
>>>>>>>> * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
>>>>>>>> translated TCEs, tries realmode_get_page() on those and if it fails, it
>>>>>>>> passes control over the virtual mode handler which tries to finish
>>>>>>>> the request handling
>>>>>>>> * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
>>>>>>>> on a page
>>>>>>>> * The only reason to pass the request to user mode now is when the user mode
>>>>>>>> did not register TCE table in the kernel, in all other cases the virtual mode
>>>>>>>> handler is expected to do the job
>>>>>>>> ---
>>>>>>>>  .../virtual/kvm/devices/spapr_tce_iommu.txt|  37 +++
>>>>>>>>  arch/powerpc/include/asm/kvm_host.h|   4 +
>>>>>>>>  arch/powerpc/kvm/book3s_64_vio.c   | 310 -
>>>>>>>>  arch/powerpc/kvm/book3s_64_vio_hv.c| 122 
>>>>>>>>  arch/powerpc/kvm/powerpc.c |   1 +
>>>>>>>>  include/linux/kvm_host.h   |   1 +
>>>>>>>>  virt/kvm/kvm_main.c|   5 +
>>>>>>>>  7 files changed, 477 insertions(+), 3 deletions(-)
>>>>>>>>  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>>>>>>>>
>>>>>>>> diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_

Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-05 Thread Gleb Natapov

On Fri, Sep 06, 2013 at 09:38:21AM +1000, Alexey Kardashevskiy wrote:
> On 09/06/2013 04:10 AM, Gleb Natapov wrote:
> > On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
> >> On 09/03/2013 08:53 PM, Gleb Natapov wrote:
> >>> On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
>  On 09/01/2013 10:06 PM, Gleb Natapov wrote:
> > On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
> >> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
> >> and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
> >> them to user space, which saves time on switching to user space and back.
> >>
> >> Both real and virtual modes are supported. The kernel tries to
> >> handle a TCE request in the real mode; if that fails, it passes the request
> >> to the virtual mode to complete the operation. If the virtual mode
> >> handler fails, the request is passed to user space.
> >>
> >> The first user of this is VFIO on POWER. Trampolines to the VFIO external
> >> user API functions are required for this patch.
> >>
> >> This adds a "SPAPR TCE IOMMU" KVM device to associate a logical bus
> >> number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
> >> of map/unmap requests. The device supports a single attribute which is
> >> a struct with LIOBN and IOMMU fd. When the attribute is set, the device
> >> establishes the connection between KVM and VFIO.
> >>
> >> Tests show that this patch increases transmission speed from 220MB/s
> >> to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
> >>
> >> Signed-off-by: Paul Mackerras 
> >> Signed-off-by: Alexey Kardashevskiy 
> >>
> >> ---
> >>
> >> Changes:
> >> v9:
> >> * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with "SPAPR TCE IOMMU"
> >> KVM device
> >> * release_spapr_tce_table() is not shared between different TCE types
> >> * reduced the patch size by moving VFIO external API
> >> trampolines to a separate patch
> >> * moved documentation from Documentation/virtual/kvm/api.txt to
> >> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> >>
> >> v8:
> >> * fixed warnings from checkpatch.pl
> >>
> >> 2013/07/11:
> >> * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
> >> for KVM_BOOK3S_64
> >> * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
> >> for this here but the next patch for hugepages support will use it more.
> >>
> >> 2013/07/06:
> >> * added realmode arch_spin_lock to protect TCE table from races
> >> in real and virtual modes
> >> * POWERPC IOMMU API is changed to support real mode
> >> * iommu_take_ownership and iommu_release_ownership are protected by
> >> iommu_table's locks
> >> * VFIO external user API use rewritten
> >> * multiple small fixes
> >>
> >> 2013/06/27:
> >> * tce_list page is referenced now in order to protect it from accidental
> >> invalidation during H_PUT_TCE_INDIRECT execution
> >> * added use of the external user VFIO API
> >>
> >> 2013/06/05:
> >> * changed capability number
> >> * changed ioctl number
> >> * update the doc article number
> >>
> >> 2013/05/20:
> >> * removed get_user() from real mode handlers
> >> * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
> >> translated TCEs, tries realmode_get_page() on those and if it fails, it
> >> passes control over the virtual mode handler which tries to finish
> >> the request handling
> >> * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
> >> on a page
> >> * The only reason to pass the request to user mode now is when the user mode
> >> did not register TCE table in the kernel, in all other cases the virtual mode
> >> handler is expected to do the job
> >> ---
> >>  .../virtual/kvm/devices/spapr_tce_iommu.txt|  37 +++
> >>  arch/powerpc/include/asm/kvm_host.h|   4 +
> >>  arch/powerpc/kvm/book3s_64_vio.c   | 310 
> >> -
> >>  arch/powerpc/kvm/book3s_64_vio_hv.c| 122 
> >>  arch/powerpc/kvm/powerpc.c |   1 +
> >>  include/linux/kvm_host.h   |   1 +
> >>  virt/kvm/kvm_main.c|   5 +
> >>  7 files changed, 477 insertions(+), 3 deletions(-)
> >>  create mode 100644 
> >> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> >>
> >> diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt 
> >> b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> >> new file mode 100644
> >

[PATCH] mmc: sdhci: add support for pre_req and post_req

2013-09-05 Thread Chanho Min

This patch supports non-blocking mmc request function for the sdhci driver.
(commit: aa8b683a7d392271ed349c6ab9f36b8c313794b7)

pre_req() runs dma_map_sg() and post_req() runs dma_unmap_sg().  If pre_req()
is not called before sdhci_request(), dma_map_sg() is issued just before
the transfer starts.  Using pre_req() is optional, but if pre_req() is
issued, post_req() must be called as well.
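
The cookie handshake described above can be sketched as a user-space toy.
This is only an illustration under assumptions, not the driver code: the
toy_* names are hypothetical, toy_dma_map() stands in for dma_map_sg() and
just counts calls, and the starting cookie value of 1 is assumed so that an
unprepared request (host_cookie == 0) never matches.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures named in the patch. */
struct toy_next { int sg_count; int cookie; };
struct toy_data { int sg_len; int host_cookie; };
struct toy_host {
	struct toy_next next_data;	/* state of the prepared "next" request */
	int sg_count;			/* mapping used by the running request */
	int map_calls;			/* how often the fake mapping ran */
};

/* Stand-in for dma_map_sg(): counts the call, returns the entry count. */
static int toy_dma_map(struct toy_host *host, const struct toy_data *data)
{
	host->map_calls++;
	return data->sg_len;
}

static int toy_pre_dma_transfer(struct toy_host *host, struct toy_data *data,
				struct toy_next *next)
{
	int sg_count;

	/* A stale cookie means the preparation belongs to another request. */
	if (!next && data->host_cookie &&
	    data->host_cookie != host->next_data.cookie)
		data->host_cookie = 0;

	if (next || data->host_cookie != host->next_data.cookie) {
		/* Not prepared yet (or we are preparing now): map. */
		sg_count = toy_dma_map(host, data);
	} else {
		/* Already prepared by pre_req(): reuse the stored mapping. */
		sg_count = host->next_data.sg_count;
		host->next_data.sg_count = 0;
	}

	if (next) {
		next->sg_count = sg_count;
		/* Keep the cookie positive; 0 means "not prepared". */
		next->cookie = (next->cookie == INT_MAX) ? 1 : next->cookie + 1;
		data->host_cookie = next->cookie;
	} else {
		host->sg_count = sg_count;
	}
	return sg_count;
}

/* pre_req(): map ahead of time and tag the request with a cookie. */
static void toy_pre_req(struct toy_host *host, struct toy_data *data)
{
	toy_pre_dma_transfer(host, data, &host->next_data);
}

/* request(): map only if pre_req() did not already do it. */
static int toy_request(struct toy_host *host, struct toy_data *data)
{
	return toy_pre_dma_transfer(host, data, NULL);
}
```

With this model, a request that went through toy_pre_req() is mapped once,
while a request issued directly is mapped inside toy_request().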

benchmark results:
 ARM CA9 1GHz, UHS DDR50 mode

 Before:
 dd if=/dev/mmcblk0p15 of=/dev/null bs=64k count=1024
 67108864 bytes (64.0MB) copied, 1.188846 seconds, 53.8MB/s

 After:
 dd if=/dev/mmcblk0p15 of=/dev/null bs=64k count=1024
 67108864 bytes (64.0MB) copied, 0.993098 seconds, 64.4MB/s

Signed-off-by: Chanho Min 
---
 drivers/mmc/host/sdhci.c  |   96 +++--
 include/linux/mmc/sdhci.h |6 +++
 2 files changed, 90 insertions(+), 12 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 2ea429c..0465a9a 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -465,6 +465,42 @@ static void sdhci_set_adma_desc(u8 *desc, u32 addr, int len, unsigned cmd)
dataddr[0] = cpu_to_le32(addr);
 }
 
+static int sdhci_pre_dma_transfer(struct sdhci_host *host,
+  struct mmc_data *data,
+  struct sdhci_next *next)
+{
+   int sg_count = 0;
+
+   if (!next && data->host_cookie &&
+   data->host_cookie != host->next_data.cookie) {
+   pr_warn("[%s] invalid cookie: data->host_cookie %d"
+   " host->next_data.cookie %d\n",
+   __func__, data->host_cookie, host->next_data.cookie);
+   data->host_cookie = 0;
+   }
+
+   /* Check if next job is already prepared */
+   if (next ||
+   (!next && data->host_cookie != host->next_data.cookie)) {
+   sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg,
+   data->sg_len,
+   (data->flags & MMC_DATA_READ) ?
+   DMA_FROM_DEVICE :
+   DMA_TO_DEVICE);
+   } else {
+   sg_count = host->next_data.sg_count;
+   host->next_data.sg_count = 0;
+   }
+
+   if (next) {
+   next->sg_count = sg_count;
+   data->host_cookie = ++next->cookie < 0 ? 1 : next->cookie;
+   } else
+   host->sg_count = sg_count;
+
+   return sg_count;
+}
+
 static int sdhci_adma_table_pre(struct sdhci_host *host,
struct mmc_data *data)
 {
@@ -502,8 +538,8 @@ static int sdhci_adma_table_pre(struct sdhci_host *host,
goto fail;
BUG_ON(host->align_addr & 0x3);
 
-   host->sg_count = dma_map_sg(mmc_dev(host->mmc),
-   data->sg, data->sg_len, direction);
+   host->sg_count = sdhci_pre_dma_transfer(host, data, NULL);
+
if (host->sg_count == 0)
goto unmap_align;
 
@@ -643,9 +679,10 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
}
}
}
-
-   dma_unmap_sg(mmc_dev(host->mmc), data->sg,
-   data->sg_len, direction);
+   if (!data->host_cookie) {
+   dma_unmap_sg(mmc_dev(host->mmc), data->sg,
+   data->sg_len, direction);
+   }
 }
 
 static u8 sdhci_calc_timeout(struct sdhci_host *host, struct mmc_command *cmd)
@@ -824,12 +861,8 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
}
} else {
int sg_cnt;
+   sg_cnt = sdhci_pre_dma_transfer(host, data, NULL);
 
-   sg_cnt = dma_map_sg(mmc_dev(host->mmc),
-   data->sg, data->sg_len,
-   (data->flags & MMC_DATA_READ) ?
-   DMA_FROM_DEVICE :
-   DMA_TO_DEVICE);
if (sg_cnt == 0) {
/*
 * This only happens when someone fed
@@ -928,9 +961,12 @@ static void sdhci_finish_data(struct sdhci_host *host)
if (host->flags & SDHCI_USE_ADMA)
sdhci_adma_table_post(host, data);
else {
-   dma_unmap_sg(mmc_dev(host->mmc), data->sg,
-   data->sg_len, (data->flags & MMC_DATA_READ) ?
+   if (!data->host_cookie) {
+   dma_unmap_sg(mmc_dev(host->mmc), data->sg,
+   data->sg_len,
+   (data->flags & MMC_DATA_READ) ?
DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   }
}

[REPOST PATCH 2/4] slab: introduce helper functions to get/set free object

2013-09-05 Thread Joonsoo Kim

In the following patches, the way free objects are read from and written to
the freelist changes, so a simple cast no longer works. Therefore,
introduce helper functions.

Signed-off-by: Joonsoo Kim 

diff --git a/mm/slab.c b/mm/slab.c
index 9d4bad5..a0e49bb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2545,9 +2545,15 @@ static struct freelist *alloc_slabmgmt(struct kmem_cache *cachep,
return freelist;
 }
 
-static inline unsigned int *slab_freelist(struct page *page)
+static inline unsigned int get_free_obj(struct page *page, unsigned int idx)
 {
-   return (unsigned int *)(page->freelist);
+   return ((unsigned int *)page->freelist)[idx];
+}
+
+static inline void set_free_obj(struct page *page,
+   unsigned int idx, unsigned int val)
+{
+   ((unsigned int *)(page->freelist))[idx] = val;
 }
 
 static void cache_init_objs(struct kmem_cache *cachep,
@@ -2592,7 +2598,7 @@ static void cache_init_objs(struct kmem_cache *cachep,
if (cachep->ctor)
cachep->ctor(objp);
 #endif
-   slab_freelist(page)[i] = i;
+   set_free_obj(page, i, i);
}
 }
 
@@ -2611,7 +2617,7 @@ static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
 {
void *objp;
 
-   objp = index_to_obj(cachep, page, slab_freelist(page)[page->active]);
+   objp = index_to_obj(cachep, page, get_free_obj(page, page->active));
page->active++;
 #if DEBUG
WARN_ON(page_to_nid(virt_to_page(objp)) != nodeid);
@@ -2632,7 +2638,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
 
/* Verify double free bug */
for (i = page->active; i < cachep->num; i++) {
-   if (slab_freelist(page)[i] == objnr) {
+   if (get_free_obj(page, i) == objnr) {
printk(KERN_ERR "slab: double free detected in cache "
"'%s', objp %p\n", cachep->name, objp);
BUG();
@@ -2640,7 +2646,7 @@ static void slab_put_obj(struct kmem_cache *cachep, struct page *page,
}
 #endif
page->active--;
-   slab_freelist(page)[page->active] = objnr;
+   set_free_obj(page, page->active, objnr);
 }
 
 /*
@@ -4214,7 +4220,7 @@ static void handle_slab(unsigned long *n, struct kmem_cache *c,
 
for (j = page->active; j < c->num; j++) {
/* Skip freed item */
-   if (slab_freelist(page)[j] == i) {
+   if (get_free_obj(page, j) == i) {
active = false;
break;
}
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[REPOST PATCH 1/4] slab: factor out calculate nr objects in cache_estimate

2013-09-05 Thread Joonsoo Kim

This logic is not simple to understand, so move it into a separate function
to help readability. Additionally, the following patch can use this change
when it lets the freelist use a differently sized index depending on the
number of objects.
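
For illustration only, the estimate can be tried out in user space with a
sketch like the one below. It mirrors the logic of calculate_nr_objs() in
the diff, but TOY_ALIGN() and the toy_ prefix are stand-ins invented here,
and the slab size, object size and alignment values used to exercise it are
assumptions, not measurements.

```c
#include <stddef.h>

/* Round x up to a multiple of a (stand-in for the kernel's ALIGN()). */
#define TOY_ALIGN(x, a) ((((x) + (a) - 1) / (a)) * (a))

/*
 * Estimate how many objects fit when the slab holds the objects plus an
 * on-slab freelist with one index of idx_size bytes per object.
 */
static int toy_calculate_nr_objs(size_t slab_size, size_t buffer_size,
				 size_t idx_size, size_t align)
{
	int nr_objs;
	size_t freelist_size;

	/* Initial guess ignores the padding of the freelist. */
	nr_objs = (int)(slab_size / (buffer_size + idx_size));

	/* The guess is either right or one too large; check the padding. */
	freelist_size = slab_size - (size_t)nr_objs * buffer_size;
	if (freelist_size < TOY_ALIGN((size_t)nr_objs * idx_size, align))
		nr_objs--;

	return nr_objs;
}
```

For example, a 4096 byte slab of 32 byte objects with 4 byte indexes and an
assumed alignment of 8 gives 113 objects, while an assumed alignment of 64
triggers the correction and gives 112.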

Signed-off-by: Joonsoo Kim 

diff --git a/mm/slab.c b/mm/slab.c
index f3868fe..9d4bad5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,9 +565,31 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
return cachep->array[smp_processor_id()];
 }
 
-static size_t slab_mgmt_size(size_t nr_objs, size_t align)
+static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
+   size_t idx_size, size_t align)
 {
-   return ALIGN(nr_objs * sizeof(unsigned int), align);
+   int nr_objs;
+   size_t freelist_size;
+
+   /*
+* Ignore padding for the initial guess. The padding
+* is at most @align-1 bytes, and @buffer_size is at
+* least @align. In the worst case, this result will
+* be one greater than the number of objects that fit
+* into the memory allocation when taking the padding
+* into account.
+*/
+   nr_objs = slab_size / (buffer_size + idx_size);
+
+   /*
+* This calculated number will be either the right
+* amount, or one greater than what we want.
+*/
+   freelist_size = slab_size - nr_objs * buffer_size;
+   if (freelist_size < ALIGN(nr_objs * idx_size, align))
+   nr_objs--;
+
+   return nr_objs;
 }
 
 /*
@@ -600,28 +622,12 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
nr_objs = slab_size / buffer_size;
 
} else {
-   /*
-* Ignore padding for the initial guess. The padding
-* is at most @align-1 bytes, and @buffer_size is at
-* least @align. In the worst case, this result will
-* be one greater than the number of objects that fit
-* into the memory allocation when taking the padding
-* into account.
-*/
-   nr_objs = (slab_size) / (buffer_size + sizeof(unsigned int));
-
-   /*
-* This calculated number will be either the right
-* amount, or one greater than what we want.
-*/
-   if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
-  > slab_size)
-   nr_objs--;
-
-   mgmt_size = slab_mgmt_size(nr_objs, align);
+   nr_objs = calculate_nr_objs(slab_size, buffer_size,
+   sizeof(unsigned int), align);
+   mgmt_size = ALIGN(nr_objs * sizeof(unsigned int), align);
}
*num = nr_objs;
-   *left_over = slab_size - nr_objs*buffer_size - mgmt_size;
+   *left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
 }
 
 #if DEBUG
-- 
1.7.9.5


Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab

2013-09-05 Thread Joonsoo Kim

On Thu, Sep 05, 2013 at 02:33:56PM +, Christoph Lameter wrote:
> On Thu, 5 Sep 2013, Joonsoo Kim wrote:
> 
> > I think that all patchsets deserve to be merged, since it reduces memory
> > usage and also improves performance. :)
> 
> Could you clean things up etc and then repost the patchset? This time do
> *not* do this as a response to an earlier email but start the patchset
> with new thread id. I think some people are not seeing this patchset.

Okay. I just did that.

Thanks.

[REPOST PATCH 3/4] slab: introduce byte sized index for the freelist of a slab

2013-09-05 Thread Joonsoo Kim

Currently, the freelist of a slab consists of unsigned int sized indexes.
Most slabs have fewer than 256 objects, since the page order is restricted
to at most 1 in the default configuration. For example, consider a slab
consisting of 32 byte sized objects on two contiguous pages. In this case,
256 objects are possible and this number fits in byte sized indexes.
256 objects is the maximum possible value in the default configuration,
since 32 bytes is the minimum object size in the SLAB
(8192 / 32 = 256). Therefore, if we use a byte sized index, we can save
3 bytes for each object.

This introduces one likely branch to the functions used for setting/getting
objects to/from the freelist, but we may get more benefits from
this change.
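
To make the "one likely branch" concrete, here is a hedged user-space
sketch of what index-width-aware accessors could look like. struct toy_slab
and the toy_ functions are hypothetical and only model the idea; the real
accessors live in mm/slab.c and may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a slab: a freelist buffer and the object count. */
struct toy_slab {
	void *freelist;
	unsigned int num;
};

/* Byte indexes are usable while the object count is at most 256. */
static bool toy_can_byte_index(unsigned int nr_objs)
{
	return nr_objs <= 256;
}

/* Read entry idx, choosing the index width from the object count. */
static unsigned int toy_get_free_obj(const struct toy_slab *s, unsigned int idx)
{
	if (toy_can_byte_index(s->num))
		return ((const unsigned char *)s->freelist)[idx];
	return ((const unsigned int *)s->freelist)[idx];
}

/* Write entry idx with the same width choice. */
static void toy_set_free_obj(struct toy_slab *s, unsigned int idx,
			     unsigned int val)
{
	if (toy_can_byte_index(s->num))
		((unsigned char *)s->freelist)[idx] = (unsigned char)val;
	else
		((unsigned int *)s->freelist)[idx] = val;
}

/* Unpadded freelist size for a given object count. */
static size_t toy_freelist_bytes(unsigned int nr_objs)
{
	size_t idx_size = toy_can_byte_index(nr_objs) ?
			  sizeof(unsigned char) : sizeof(unsigned int);

	return nr_objs * idx_size;
}
```

With 256 objects the freelist needs 256 bytes in this model instead of
256 * sizeof(unsigned int), which is where the saving of 3 bytes per object
comes from on systems with a 4 byte unsigned int.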

Signed-off-by: Joonsoo Kim 

diff --git a/mm/slab.c b/mm/slab.c
index a0e49bb..bd366e5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,8 +565,16 @@ static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
return cachep->array[smp_processor_id()];
 }
 
-static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
-   size_t idx_size, size_t align)
+static inline bool can_byte_index(int nr_objs)
+{
+   if (likely(nr_objs <= (sizeof(unsigned char) << 8)))
+   return true;
+
+   return false;
+}
+
+static int __calculate_nr_objs(size_t slab_size, size_t buffer_size,
+   unsigned int idx_size, size_t align)
 {
int nr_objs;
size_t freelist_size;
@@ -592,6 +600,29 @@ static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
return nr_objs;
 }
 
+static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
+   size_t align)
+{
+   int nr_objs;
+   int byte_nr_objs;
+
+   nr_objs = __calculate_nr_objs(slab_size, buffer_size,
+   sizeof(unsigned int), align);
+   if (!can_byte_index(nr_objs))
+   return nr_objs;
+
+   byte_nr_objs = __calculate_nr_objs(slab_size, buffer_size,
+   sizeof(unsigned char), align);
+   /*
+* nr_objs can be larger when using byte index,
+* so that it cannot be indexed by byte index.
+*/
+   if (can_byte_index(byte_nr_objs))
+   return byte_nr_objs;
+   else
+   return nr_objs;
+}
+
 /*
  * Calculate the number of objects and left-over bytes for a given buffer size.
  */
@@ -618,13 +649,18 @@ static void cache_estimate(unsigned long gfporder, size_t buffer_size,
 * correct alignment when allocated.
 */
if (flags & CFLGS_OFF_SLAB) {
-   mgmt_size = 0;
nr_objs = slab_size / buffer_size;
+   mgmt_size = 0;
 
} else {
-   nr_objs = calculate_nr_objs(slab_size, buffer_size,
-   sizeof(unsigned int), align);
-   mgmt_size = ALIGN(nr_objs * sizeof(unsigned int), align);
+   nr_objs = calculate_nr_objs(slab_size, buffer_size, align);
+   if (can_byte_index(nr_objs)) {
+   mgmt_size =
+   ALIGN(nr_objs * sizeof(unsigned char), align);
+   } else {
+   mgmt_size =
+   ALIGN(nr_objs * sizeof(unsigned int), align);
+   }
}
*num = nr_objs;
*left_over = slab_size - (nr_objs * buffer_size) - mgmt_size;
@@ -2012,7 +2048,10 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 * looping condition in cache_grow().
 */
offslab_limit = size;
-   offslab_limit /= sizeof(unsigned int);
+   if (can_byte_index(num))
+   offslab_limit /= sizeof(unsigned char);
+   else
+   offslab_limit /= sizeof(unsigned int);
 
if (num > offslab_limit)
break;
@@ -2253,8 +2292,13 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
if (!cachep->num)
return -E2BIG;
 
-   freelist_size =
-   ALIGN(cachep->num * sizeof(unsigned int), cachep->align);
+   if (can_byte_index(cachep->num)) {
+   freelist_size = ALIGN(cachep->num * sizeof(unsigned char),
+   cachep->align);
+   } else {
+   freelist_size = ALIGN(cachep->num * sizeof(unsigned int),
+   cachep->align);
+   }
 
/*
 * If the slab has been placed off-slab, and we have enough space then
@@ -2267,7 +2311,10 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 
if (flags & CFLGS_OFF_SLAB) {

[REPOST PATCH 4/4] slab: make more slab management structure off the slab

2013-09-05 Thread Joonsoo Kim

Now that the freelist used for slab management has shrunk, the on-slab
management structure can waste a lot of space if the objects of the slab
are large.

Consider a slab of 128 byte sized objects. If on-slab is used, 31 objects
can be in the slab. The size of the freelist for this case would be 31 bytes,
so 97 bytes, that is, more than 75% of the object size, are wasted.

In the 64 byte sized object case, no space is wasted if we use on-slab.
So set off-slab determining constraint to 128 bytes.
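
The arithmetic above can be checked with a small sketch. It assumes a 4096
byte page, one byte per freelist index and no alignment padding, which is
how the figures in this message appear to be derived; the toy_ names are
made up for the example, and the other conditions of the real test
(!slab_early_init, SLAB_NOLEAKTRACE) are left out.

```c
#define TOY_PAGE_SIZE 4096u	/* assumed page size */

/* Objects that fit on-slab when each one also needs a 1 byte index. */
static unsigned int toy_on_slab_objs(unsigned int slab_size,
				     unsigned int obj_size)
{
	return slab_size / (obj_size + 1);
}

/* Bytes left unused after the objects and the byte sized freelist. */
static unsigned int toy_on_slab_waste(unsigned int slab_size,
				      unsigned int obj_size)
{
	unsigned int nr = toy_on_slab_objs(slab_size, obj_size);

	return slab_size - nr * obj_size - nr;
}

/* Size part of the new off-slab condition: size >= (PAGE_SIZE >> 5). */
static int toy_prefers_off_slab(unsigned int obj_size)
{
	return obj_size >= (TOY_PAGE_SIZE >> 5);
}
```

In this simplified model 128 byte objects leave 97 bytes unused and 64 byte
objects leave 1 byte, and 4096 >> 5 is 128, so 128 byte objects are the
smallest ones that move off-slab.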

Signed-off-by: Joonsoo Kim 

diff --git a/mm/slab.c b/mm/slab.c
index bd366e5..d01a2f0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2277,7 +2277,7 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
 * it too early on. Always use on-slab management when
 * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
 */
-   if ((size >= (PAGE_SIZE >> 3)) && !slab_early_init &&
+   if ((size >= (PAGE_SIZE >> 5)) && !slab_early_init &&
!(flags & SLAB_NOLEAKTRACE))
/*
 * Size is large, assume best to place the slab management obj
-- 
1.7.9.5


[REPOST PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab

2013-09-05 Thread Joonsoo Kim

* THIS IS JUST REPOSTED ACCORDING TO MAINTAINER'S REQUEST *

* Changes from original post
Correct the position of the results.
Attach more results about cache-misses and elapsed time on a hackbench test.

-
This patchset implements byte sized indexes for the freelist of a slab.

Currently, the freelist of a slab consists of unsigned int sized indexes.
Most slabs have fewer than 256 objects, so much space is wasted.
To reduce this overhead, this patchset implements byte sized indexes for
the freelist of a slab. With it, we can save 3 bytes for each object.

This introduces one likely branch to the functions used for setting/getting
objects to/from the freelist, but we may get more benefits from
this change.

Below are some numbers from 'cat /proc/slabinfo' related to my previous posting
and this patchset.


* Before *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512    527    600    512     8     1 : tunables   54   27    0 : slabdata     75     75      0
kmalloc-256    210    210    256    15     1 : tunables  120   60    0 : slabdata     14     14      0
kmalloc-192   1040   1040    192    20     1 : tunables  120   60    0 : slabdata     52     52      0
kmalloc-96     750    750    128    30     1 : tunables  120   60    0 : slabdata     25     25      0
kmalloc-64    2773   2773     64    59     1 : tunables  120   60    0 : slabdata     47     47      0
kmalloc-128    660    690    128    30     1 : tunables  120   60    0 : slabdata     23     23      0
kmalloc-32   11200  11200     32   112     1 : tunables  120   60    0 : slabdata    100    100      0
kmem_cache     197    200    192    20     1 : tunables  120   60    0 : slabdata     10     10      0

* After my previous posting(overload struct slab over struct page) *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512    525    640    512     8     1 : tunables   54   27    0 : slabdata     80     80      0
kmalloc-256    210    210    256    15     1 : tunables  120   60    0 : slabdata     14     14      0
kmalloc-192   1016   1040    192    20     1 : tunables  120   60    0 : slabdata     52     52      0
kmalloc-96     560    620    128    31     1 : tunables  120   60    0 : slabdata     20     20      0
kmalloc-64    2148   2280     64    60     1 : tunables  120   60    0 : slabdata     38     38      0
kmalloc-128    647    682    128    31     1 : tunables  120   60    0 : slabdata     22     22      0
kmalloc-32   11360  11413     32   113     1 : tunables  120   60    0 : slabdata    101    101      0
kmem_cache     197    200    192    20     1 : tunables  120   60    0 : slabdata     10     10      0

kmem_caches whose objects are 128 bytes or smaller have one more
object per slab. You can see it in the objperslab column.

We can improve further with this patchset.

* My previous posting + this patchset *
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables [snip...]
kmalloc-512    521    648    512     8     1 : tunables   54   27    0 : slabdata     81     81      0
kmalloc-256    208    208    256    16     1 : tunables  120   60    0 : slabdata     13     13      0
kmalloc-192   1029   1029    192    21     1 : tunables  120   60    0 : slabdata     49     49      0
kmalloc-96     529    589    128    31     1 : tunables  120   60    0 : slabdata     19     19      0
kmalloc-64    2142   2142     64    63     1 : tunables  120   60    0 : slabdata     34     34      0
kmalloc-128    660    682    128    31     1 : tunables  120   60    0 : slabdata     22     22      0
kmalloc-32   11716  11780     32   124     1 : tunables  120   60    0 : slabdata     95     95      0
kmem_cache     197    210    192    21     1 : tunables  120   60    0 : slabdata     10     10      0

kmem_caches whose objects are 256 bytes or smaller have
one or more additional objects per slab. In the case of kmalloc-32, we have
11 more objects, so 352 bytes (11 * 32) are saved, which is roughly a 9%
saving of memory. Of course, this percentage decreases as the number of
objects in a slab decreases.



Here are the performance results on my 4-CPU machine.

* Before *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       238,309,671 cache-misses                  ( +-  0.40% )

      12.010172090 seconds time elapsed          ( +-  0.21% )

* After my previous posting *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       229,945,138 cache-misses                  ( +-  0.23% )

      11.627897174 seconds time elapsed          ( +-  0.14% )

* My previous posting + this patchset *

 Performance counter stats for 'perf b

Re: [PATCH V4] regulator: palmas: add support for external control of rails

2013-09-05 Thread Laxman Dewangan


On Thursday 05 September 2013 09:04 PM, Mark Brown wrote:
> On Thu, Sep 05, 2013 at 08:27:24PM +0530, Laxman Dewangan wrote:
>> On Thursday 05 September 2013 08:04 PM, Lee Jones wrote:
>>> It won't go in until v3.12 now, but I have applied the patch.
>> Thanks Lee for taking care.
> If it's going to wait for v3.12 there's no point applying it to MFD as
> the dependency will be in mainline after the merge window.

Agree that it should go through the regulator tree if it is for v3.12. If
there is any issue applying the patch, I will resend it at that time after
rebasing onto that branch.


Thanks,
Laxman


[PATCH] perf tools: Free strlist in strlist__delete()

2013-09-05 Thread Namhyung Kim

From: Namhyung Kim 

It seems a strlist is never freed after being allocated.  AFAICS every
strlist is allocated dynamically, so just free it in the _delete() function.

Signed-off-by: Namhyung Kim 
---
 tools/perf/util/strlist.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/strlist.c b/tools/perf/util/strlist.c
index eabdce0a2daa..11593d899eb2 100644
--- a/tools/perf/util/strlist.c
+++ b/tools/perf/util/strlist.c
@@ -155,8 +155,10 @@ out_error:
 
 void strlist__delete(struct strlist *slist)
 {
-   if (slist != NULL)
+   if (slist != NULL) {
rblist__delete(&slist->rblist);
+   free(slist);
+   }
 }
 
 struct str_node *strlist__entry(const struct strlist *slist, unsigned int idx)
-- 
1.7.11.7


Re: [RFC PATCH v3 09/35] mm: Track the freepage migratetype of pages accurately

2013-09-05 Thread Srivatsa S. Bhat

On 09/04/2013 01:53 PM, Yasuaki Ishimatsu wrote:
> (2013/09/03 17:45), Srivatsa S. Bhat wrote:
>> On 09/03/2013 12:08 PM, Yasuaki Ishimatsu wrote:
>>> (2013/08/30 22:16), Srivatsa S. Bhat wrote:
>>>> Due to the region-wise ordering of the pages in the buddy allocator's
>>>> free lists, whenever we want to delete a free pageblock from a free list
>>>> (for ex: when moving blocks of pages from one list to the other), we need
>>>> to be able to tell the buddy allocator exactly which migratetype it belongs
>>>> to. For that purpose, we can use the page's freepage migratetype (which is
>>>> maintained in the page's ->index field).
>>>>
>>>> So, while splitting up higher order pages into smaller ones as part of buddy
>>>> operations, keep the new head pages updated with the correct freepage
>>>> migratetype information (because we depend on tracking this info accurately,
>>>> as outlined above).
>>>>
>>>> Signed-off-by: Srivatsa S. Bhat
>>>> ---
>>>>
>>>>  mm/page_alloc.c |7 +++
>>>>  1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 398b62c..b4b1275 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -947,6 +947,13 @@ static inline void expand(struct zone *zone, struct page *page,
>>>>  add_to_freelist(&page[size], &area->free_list[migratetype]);
>>>>  area->nr_free++;
>>>>  set_page_order(&page[size], high);
>>>> +
>>>> +/*
>>>> + * Freepage migratetype is tracked using the index field of the
>>>> + * first page of the block. So we need to update the new first
>>>> + * page, when changing the page order.
>>>> + */
>>>> +set_freepage_migratetype(&page[size], migratetype);
>>>>  }
>>>>  }
>>>>
>>>>

>>>
>>> Is this patch a bug fix patch?
>>> If so, I want you to split the patch from the patch-set.
>>>
>>
>> No, it's not a bug-fix. We need to take care of this only when using the
>> sorted-buddy design to maintain the freelists, which is introduced only in
>> this patchset. So mainline doesn't need this patch.
>>
>> In mainline, we can delete a page from a buddy freelist by simply calling
>> list_del() by passing a pointer to page->lru. It doesn't matter which freelist
>> the page belonged to. However, in the sorted-buddy design introduced
>> in this patchset, we also need to know which particular freelist we are
>> deleting that page from, because apart from breaking the ->lru link from
>> the linked-list, we also need to update certain other things such as the
>> region->page_block pointer etc, which are part of that particular freelist.
>> Thus, it becomes essential to know which freelist we are deleting the page
>> from. And for that, we need this patch to maintain that information accurately
>> even during buddy operations such as splitting buddy pages in expand().
> 
> I may be wrong because I do not know this part clearly.
> 
> Original code is here:
> 
> ---
> static inline void expand(struct zone *zone, struct page *page,
> int low, int high, struct free_area *area,
> int migratetype)
> {
> ...
> list_add(&page[size].lru, &area->free_list[migratetype]);
> area->nr_free++;
> set_page_order(&page[size], high);
> ---
> 
> It seems that the migratetype of the page[size] page is changed. So even
> without applying your patch, I think the migratetype of the page should be
> changed.
> 

Hmm, thinking about this a bit more, I agree with you. Although it's not a
bug-fix for mainline, it is certainly good to have, since it makes things
more consistent by tracking the freepage migratetype properly for pages
split during buddy expansion. I'll separate this patch from the series and
post it as a stand-alone patch. Thank you!

Regards,
Srivatsa S. Bhat


[PATCH v3 06/20] mm, hugetlb: return a reserved page to a reserved pool if failed

2013-09-05 Thread Joonsoo Kim

If we fail with a reserved page, just calling put_page() is not sufficient,
because put_page() invokes free_huge_page() as its last step and that function
doesn't know whether a page came from a reserved pool or not. So it doesn't do
anything related to the reserve count. This makes the reserve count lower
than it should be, because the reserve count was already decreased in
dequeue_huge_page_vma(). This patch fixes this situation.

In this patch, PagePrivate() is used for tracking the reservation.
When reserved pages are dequeued from the reserved pool, the Private flag is
set on the hugepage until it is properly mapped. When a page is returned,
a hugepage that still has the Private flag set is considered to have been
returned on an error path, so we restore one
reserve count in order to preserve the user's reserved hugepage.

Using the Private flag is safe for the hugepage, because it doesn't use the
LRU mechanism, so there is no other user of this page except us.
Therefore we can use this flag safely.
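
The bookkeeping can be pictured with the user-space toy below. It is only
a sketch: struct toy_hstate, struct toy_page and the toy_ functions are
invented for illustration, the flag is a plain bool instead of a page flag,
and clearing the flag on free is a choice made for the toy.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct hstate and struct page. */
struct toy_hstate { int resv_huge_pages; };
struct toy_page { bool private_flag; };

/* Dequeue from the reserved pool: consume one reserve and mark the page. */
static void toy_dequeue_reserved(struct toy_hstate *h, struct toy_page *page)
{
	page->private_flag = true;
	h->resv_huge_pages--;
}

/* The page got mapped successfully: the reserve is really consumed. */
static void toy_mapped_ok(struct toy_page *page)
{
	page->private_flag = false;
}

/* Free path: a page still marked came back on an error path. */
static void toy_free_huge_page(struct toy_hstate *h, struct toy_page *page)
{
	bool restore_reserve = page->private_flag;

	page->private_flag = false;	/* toy choice: reset for reuse */
	if (restore_reserve)
		h->resv_huge_pages++;
}
```

A page that is dequeued and freed without being mapped gives its reserve
back, while a page that was mapped first leaves the reserve count alone.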

Signed-off-by: Joonsoo Kim 
---
Replenishing commit message only.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6c8eec2..3f834f1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -572,6 +572,7 @@ retry_cpuset:
if (!vma_has_reserves(vma, chg))
break;
 
+   SetPagePrivate(page);
h->resv_huge_pages--;
break;
}
@@ -626,15 +627,20 @@ static void free_huge_page(struct page *page)
int nid = page_to_nid(page);
struct hugepage_subpool *spool =
(struct hugepage_subpool *)page_private(page);
+   bool restore_reserve;
 
set_page_private(page, 0);
page->mapping = NULL;
BUG_ON(page_count(page));
BUG_ON(page_mapcount(page));
+   restore_reserve = PagePrivate(page);
 
spin_lock(&hugetlb_lock);
hugetlb_cgroup_uncharge_page(hstate_index(h),
 pages_per_huge_page(h), page);
+   if (restore_reserve)
+   h->resv_huge_pages++;
+
if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
/* remove the page from active list */
list_del(&page->lru);
@@ -2616,6 +2622,8 @@ retry_avoidcopy:
spin_lock(&mm->page_table_lock);
ptep = huge_pte_offset(mm, address & huge_page_mask(h));
if (likely(pte_same(huge_ptep_get(ptep), pte))) {
+   ClearPagePrivate(new_page);
+
/* Break COW */
huge_ptep_clear_flush(vma, address, ptep);
set_huge_pte_at(mm, address, ptep,
@@ -2727,6 +2735,7 @@ retry:
goto retry;
goto out;
}
+   ClearPagePrivate(page);
 
spin_lock(&inode->i_lock);
inode->i_blocks += blocks_per_huge_page(h);
@@ -2773,8 +2782,10 @@ retry:
if (!huge_pte_none(huge_ptep_get(ptep)))
goto backout;
 
-   if (anon_rmap)
+   if (anon_rmap) {
+   ClearPagePrivate(page);
hugepage_add_new_anon_rmap(page, vma, address);
+   }
else
page_dup_rmap(page);
new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
-- 
1.7.9.5


[PATCH v3 03/20] mm, hugetlb: fix subpool accounting handling

2013-09-05 Thread Joonsoo Kim

There is a case where we attempt to allocate a hugepage with chg = 0 and
avoid_reserve = 1. Although chg = 0 means that it has a reserved hugepage,
we wouldn't use it, since avoid_reserve = 1 represents that we don't want
to allocate a hugepage from a reserved pool. This happens when the parent
process that created a MAP_PRIVATE mapping is about to perform a COW due to
a shared page count and it attempts to satisfy the allocation without using
the existing reserves.

In this case, we would not dequeue a reserved hugepage and, instead, try
to allocate a new hugepage. Therefore, we should check the subpool counter
for a new hugepage. This patch implements it.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Joonsoo Kim 
---
Replenishing commit message and adding reviewed-by tag.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 12b6581..ea1ae0a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1144,13 +1144,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
chg = vma_needs_reservation(h, vma, addr);
if (chg < 0)
return ERR_PTR(-ENOMEM);
-   if (chg)
-   if (hugepage_subpool_get_pages(spool, chg))
+   if (chg || avoid_reserve)
+   if (hugepage_subpool_get_pages(spool, 1))
return ERR_PTR(-ENOSPC);
 
ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
if (ret) {
-   hugepage_subpool_put_pages(spool, chg);
+   if (chg || avoid_reserve)
+   hugepage_subpool_put_pages(spool, 1);
return ERR_PTR(-ENOSPC);
}
spin_lock(&hugetlb_lock);
@@ -1162,7 +1163,8 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
hugetlb_cgroup_uncharge_cgroup(idx,
   pages_per_huge_page(h),
   h_cg);
-   hugepage_subpool_put_pages(spool, chg);
+   if (chg || avoid_reserve)
+   hugepage_subpool_put_pages(spool, 1);
return ERR_PTR(-ENOSPC);
}
spin_lock(&hugetlb_lock);
-- 
1.7.9.5
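
The accounting rule in this patch can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct subpool_model, subpool_get_pages(), subpool_put_pages() and alloc_model() are hypothetical stand-ins, not the kernel's hugepage_subpool code. It shows that one page is charged whenever chg != 0 or avoid_reserve is set, and that the same condition guards the uncharge on the error path.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a hugepage subpool: a simple page budget. */
struct subpool_model {
	long max_pages;
	long used_pages;
};

/* Returns 0 on success, -1 if the subpool has no room (models -ENOSPC). */
static int subpool_get_pages(struct subpool_model *sp, long delta)
{
	if (sp->used_pages + delta > sp->max_pages)
		return -1;
	sp->used_pages += delta;
	return 0;
}

static void subpool_put_pages(struct subpool_model *sp, long delta)
{
	sp->used_pages -= delta;
}

/*
 * A new page must be charged when there is no reservation (chg != 0) or
 * when a reservation exists but must not be used (avoid_reserve).
 */
static bool needs_subpool_charge(long chg, bool avoid_reserve)
{
	return chg || avoid_reserve;
}

/* Models the start of the allocation: charge, then a later step may fail. */
static int alloc_model(struct subpool_model *sp, long chg,
		       bool avoid_reserve, bool later_step_fails)
{
	bool charged = needs_subpool_charge(chg, avoid_reserve);

	if (charged && subpool_get_pages(sp, 1))
		return -1;

	if (later_step_fails) {
		/* Undo exactly what was charged above, nothing more. */
		if (charged)
			subpool_put_pages(sp, 1);
		return -1;
	}
	return 0;
}
```

In this model, an allocation with chg = 0 and avoid_reserve = 0 never touches the subpool counter, so its error path must not uncharge either; that symmetry is the point of guarding both sides with the same condition.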


[PATCH v2 1/4] mm/zswap: bugfix: memory leak when re-swapon

2013-09-05 Thread Weijie Yang

zswap_tree is not freed on swapoff, and it is kmalloc'ed again on swapon,
so memory leaks.

Fix: free the memory of zswap_tree in zswap_frontswap_invalidate_area().

Signed-off-by: Weijie Yang 
---
 mm/zswap.c |4 
 1 file changed, 4 insertions(+)

diff --git a/mm/zswap.c b/mm/zswap.c
index deda2b6..cbd9578 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -816,6 +816,10 @@ static void zswap_frontswap_invalidate_area(unsigned type)
}
tree->rbroot = RB_ROOT;
spin_unlock(&tree->lock);
+
+   zbud_destroy_pool(tree->pool);
+   kfree(tree);
+   zswap_trees[type] = NULL;
 }
 
 static struct zbud_ops zswap_zbud_ops = {
-- 
1.7.10.4
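
The leak and its fix can be modelled in userspace as follows. This is a sketch only: trees[], live_trees, model_init() and model_invalidate_area() are hypothetical names standing in for zswap_trees[], the swapon-time allocation and zswap_frontswap_invalidate_area(). The point is that the per-type tree is released and its slot cleared on swapoff, so a later swapon does not overwrite a still-allocated pointer.

```c
#include <stdlib.h>

#define MAX_TYPES 4	/* hypothetical number of swap types in this model */

struct tree_model {
	int nr_entries;
};

static struct tree_model *trees[MAX_TYPES];
static int live_trees;	/* allocations that have not been freed yet */

/* Models swapon: allocate the per-type tree. */
static int model_init(unsigned type)
{
	struct tree_model *tree = calloc(1, sizeof(*tree));

	if (!tree)
		return -1;
	trees[type] = tree;
	live_trees++;
	return 0;
}

/* Models swapoff: drop all entries, then free the tree itself. */
static void model_invalidate_area(unsigned type)
{
	struct tree_model *tree = trees[type];

	if (!tree)
		return;
	tree->nr_entries = 0;

	/* The part the patch adds: release the tree and clear the slot. */
	free(tree);
	live_trees--;
	trees[type] = NULL;
}
```

Clearing the slot also makes a second invalidate on the same type a harmless no-op in this model.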



[PATCH v2 2/4] mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently

2013-09-05 Thread Weijie Yang

Consider the following scenario:
thread 0: reclaims entry x (gets a refcount, but has not yet called
          zswap_get_swap_cache_page)
thread 1: calls zswap_frontswap_invalidate_page to invalidate entry x.
          When it finishes, entry x and its zbud are not freed, as the
          refcount != 0. Now swap_map[x] = 0.
thread 0: now calls zswap_get_swap_cache_page.
          swapcache_prepare returns -ENOENT because entry x is no longer used,
          zswap_get_swap_cache_page returns ZSWAP_SWAPCACHE_NOMEM, and
          zswap_writeback_entry does nothing except put the refcount.
Now the memory of zswap_entry x and its zpage is leaked.

Fix:
- check the refcount in the fail path and free the memory if the entry is no
  longer referenced.
- use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM, as the fail path
  can be caused not only by nomem but also by invalidate.

Signed-off-by: Weijie Yang 
---
 mm/zswap.c |   21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index cbd9578..1be7b90 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -387,7 +387,7 @@ static void zswap_free_entry(struct zswap_tree *tree, struct zswap_entry *entry)
 enum zswap_get_swap_ret {
ZSWAP_SWAPCACHE_NEW,
ZSWAP_SWAPCACHE_EXIST,
-   ZSWAP_SWAPCACHE_NOMEM
+   ZSWAP_SWAPCACHE_FAIL,
 };
 
 /*
@@ -401,9 +401,9 @@ enum zswap_get_swap_ret {
  * added to the swap cache, and returned in retpage.
  *
  * If success, the swap cache page is returned in retpage
- * Returns 0 if page was already in the swap cache, page is not locked
- * Returns 1 if the new page needs to be populated, page is locked
- * Returns <0 on error
+ * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
+ * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated, page is locked
+ * Returns ZSWAP_SWAPCACHE_FAIL on error
  */
 static int zswap_get_swap_cache_page(swp_entry_t entry,
struct page **retpage)
@@ -475,7 +475,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
if (new_page)
page_cache_release(new_page);
if (!found_page)
-   return ZSWAP_SWAPCACHE_NOMEM;
+   return ZSWAP_SWAPCACHE_FAIL;
*retpage = found_page;
return ZSWAP_SWAPCACHE_EXIST;
 }
@@ -529,11 +529,11 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
 
/* try to allocate swap cache page */
switch (zswap_get_swap_cache_page(swpentry, &page)) {
-   case ZSWAP_SWAPCACHE_NOMEM: /* no memory */
+   case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */
ret = -ENOMEM;
goto fail;
 
-   case ZSWAP_SWAPCACHE_EXIST: /* page is unlocked */
+   case ZSWAP_SWAPCACHE_EXIST:
/* page is already in the swap cache, ignore for now */
page_cache_release(page);
ret = -EEXIST;
@@ -591,7 +591,12 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
 
 fail:
spin_lock(&tree->lock);
-   zswap_entry_put(entry);
+   refcount = zswap_entry_put(entry);
+   if (refcount <= 0) {
+   /* invalidate happened, consider writeback as success */
+   zswap_free_entry(tree, entry);
+   ret = 0;
+   }
spin_unlock(&tree->lock);
return ret;
 }
-- 
1.7.10.4
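
The refcount handling described in the scenario can be sketched in userspace. The names below (struct entry_model, entry_get(), entry_put(), model_invalidate(), model_writeback_fail()) are hypothetical and only mimic the shape of the zswap code; locking is left out. The sketch assumes an entry starts with one reference, reclaim takes a second one, and whoever drops the last reference frees the entry.

```c
#include <stdbool.h>

struct entry_model {
	int refcount;
	bool freed;
};

static int entry_get(struct entry_model *e)
{
	return ++e->refcount;
}

static int entry_put(struct entry_model *e)
{
	return --e->refcount;
}

static void entry_free(struct entry_model *e)
{
	e->freed = true;
}

/* Invalidate drops the initial reference; frees only if it was the last. */
static void model_invalidate(struct entry_model *e)
{
	if (entry_put(e) <= 0)
		entry_free(e);
}

/*
 * Fail path of writeback: drop the reclaim reference. If that was the last
 * one, an invalidate raced with us, so free the entry here and report
 * success instead of leaking it.
 */
static int model_writeback_fail(struct entry_model *e, int err)
{
	if (entry_put(e) <= 0) {
		entry_free(e);
		return 0;
	}
	return err;
}
```

Without the check in the fail path, the racing case would end with refcount 0 and nobody left to free the entry, which is the leak the commit message describes.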



[PATCH v2 3/4] mm/zswap: avoid unnecessary page scanning

2013-09-05 Thread Weijie Yang

Add SetPageReclaim before __swap_writepage so that the page can be moved to the
tail of the inactive list. This avoids unnecessary page scanning, as the
page was already reclaimed by the swap subsystem.

Signed-off-by: Weijie Yang 
---
 mm/zswap.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/zswap.c b/mm/zswap.c
index 1be7b90..cc40e6a 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -556,6 +556,9 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
SetPageUptodate(page);
}
 
+   /* move it to the tail of the inactive list after end_writeback */
+   SetPageReclaim(page);
+
/* start writeback */
__swap_writepage(page, &wbc, end_swap_bio_write);
page_cache_release(page);
-- 
1.7.10.4



[PATCH v2 4/4] mm/zswap: use GFP_NOIO instead of GFP_KERNEL

2013-09-05 Thread Weijie Yang

To avoid the zswap store and reclaim functions being called recursively,
use GFP_NOIO instead of GFP_KERNEL.

Signed-off-by: Weijie Yang 
---
 mm/zswap.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index cc40e6a..3d05ed8 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -427,7 +427,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
 * Get a new page to read into from swap.
 */
if (!new_page) {
-   new_page = alloc_page(GFP_KERNEL);
+   new_page = alloc_page(GFP_NOIO);
if (!new_page)
break; /* Out of memory */
}
@@ -435,7 +435,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
/*
 * call radix_tree_preload() while we can wait.
 */
-   err = radix_tree_preload(GFP_KERNEL);
+   err = radix_tree_preload(GFP_NOIO);
if (err)
break;
 
@@ -636,7 +636,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
}
 
/* allocate entry */
-   entry = zswap_entry_cache_alloc(GFP_KERNEL);
+   entry = zswap_entry_cache_alloc(GFP_NOIO);
if (!entry) {
zswap_reject_kmemcache_fail++;
ret = -ENOMEM;
-- 
1.7.10.4
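
A toy model of the recursion this patch avoids is given below. Everything in it is an assumption made for illustration: the M_GFP_* bit values are made up, model_alloc() and model_store() are hypothetical, and the only property borrowed from the kernel is that GFP_KERNEL permits I/O during reclaim while GFP_NOIO does not. In the model, an allocation under memory pressure may swap a page out, which re-enters the store path.

```c
/* Hypothetical flag bits for this model; the kernel's real values differ. */
#define M_GFP_WAIT	0x1u
#define M_GFP_IO	0x2u
#define M_GFP_FS	0x4u
#define M_GFP_NOIO	(M_GFP_WAIT)
#define M_GFP_KERNEL	(M_GFP_WAIT | M_GFP_IO | M_GFP_FS)

static int store_depth;		/* current nesting of model_store() */
static int max_store_depth;	/* deepest nesting seen so far */

static void model_store(unsigned gfp, int memory_pressure);

/* Models an allocation: under pressure, reclaim may swap out if I/O is allowed. */
static void model_alloc(unsigned gfp, int memory_pressure)
{
	if (memory_pressure > 0 && (gfp & M_GFP_IO))
		model_store(gfp, memory_pressure - 1);
}

/* Models the store path, which itself needs to allocate memory. */
static void model_store(unsigned gfp, int memory_pressure)
{
	store_depth++;
	if (store_depth > max_store_depth)
		max_store_depth = store_depth;

	model_alloc(gfp, memory_pressure);

	store_depth--;
}
```

With the I/O bit set the store path nests once per unit of pressure; without it the nesting depth stays at one, which is the behaviour the patch is after.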



[PATCH v2 0/4] mm/zswap bugfix: memory leaks and other problems

2013-09-05 Thread Weijie Yang

This patch series fixes a few bugs in zswap, based on Linux-3.11.

v1 --> v2
- free memory in zswap_frontswap_invalidate_area (in patch 1)
- fix whitespace corruption (line wrapping)

Corresponding mail thread: https://lkml.org/lkml/2013/8/18/59

The issues fixed/optimized are:

 1. memory leaks when re-swapon
 
 2. memory leaks when invalidate and reclaim occur concurrently
 
 3. avoid unnecessary page scanning
 
 4. use GFP_NOIO instead of GFP_KERNEL to avoid the zswap store and reclaim
    functions being called recursively

Issues discussed in that mail thread but NOT fixed, as they happen rarely or
are not a big problem:

 1. a "theoretical race condition" when reclaiming a page
When a handle is allocated from zbud, zbud considers the handle to be in
valid use by the upper layer (zswap) and a candidate for reclaim. But zswap
still has to initialize it, e.g. set the swapentry and add it to the rbtree,
so there is a race condition such as:
 thread 0: obtain handle x from zbud_alloc
 thread 1: zbud_reclaim_page is called
 thread 1: callback zswap_writeback_entry to reclaim handle x
 thread 1: get swpentry from handle x (it is random value now)
 thread 1: bad thing may happen
 thread 0: initialize handle x with swapentry

 2. frontswap_map bitmap not cleared after zswap reclaim
Frontswap uses the frontswap_map bitmap to track pages in the "backend"
implementation; when zswap reclaims a page, the corresponding bitmap record
is not cleared.

 mm/zswap.c |   34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)


Re: [PATCH v3 0/2] ext4: increase mbcache scalability

2013-09-05 Thread Andreas Dilger

On 2013-09-05, at 3:49 AM, Thavatchai Makphaibulchoke wrote:
> On 09/05/2013 02:35 AM, Theodore Ts'o wrote:
>> How did you gather these results?  The mbcache is only used if you
>> are using extended attributes, and only if the extended attributes don't fit 
>> in the inode's extra space.
>> 
>> I checked aim7, and it doesn't do any extended attribute operations.
>> So why are you seeing differences?  Are you doing something like
>> deliberately using 128 byte inodes (which is not the default inode
>> size), and then enabling SELinux, or some such?
> 
> No, I did not do anything special, including changing an inode's size. I just
> used the profile data, which indicated the mb_cache module as one of the
> bottlenecks.  Please see below for perf data from one of the new_fserver runs,
> which also shows some mb_cache activity.
> 
> 
>|--3.51%-- __mb_cache_entry_find
>|  mb_cache_entry_find_first
>|  ext4_xattr_cache_find
>|  ext4_xattr_block_set
>|  ext4_xattr_set_handle
>|  ext4_initxattrs
>|  security_inode_init_security
>|  ext4_init_security

Looks like this is some large security xattr, or enough smaller
xattrs to exceed the ~120 bytes of in-inode xattr storage.  How
big is the SELinux xattr (assuming that is what it is)?

> Looks like it's a bit harder to disable mbcache than I thought.
> I ended up adding code to collect the statistics.
> 
> With selinux enabled, for new_fserver workload of aim7, there
> are a total of 0x7e054201 ext4_xattr_cache_find() calls
> that result in a hit and 0xc100 calls that are not.
> The number does not seem to favor the complete disabling of
> mbcache in this case.

This is about a 65% hit rate, which seems reasonable.

You could try a few different things here:
- disable selinux completely (boot with "selinux=0" on the kernel
  command line) and see how much faster it is
- format your ext4 filesystem with larger inodes (-I 512) and see
  if this is an improvement or not.  That depends on the size of
  the selinux xattrs and if they will fit into the extra 256 bytes
  of xattr space these larger inodes will give you.  The performance
  might also be worse, since there will be more data to read/write
  for each inode, but it would avoid seeking to the xattr blocks.

Cheers, Andreas






linux-next: build warning after merge of the ceph tree

2013-09-05 Thread Stephen Rothwell

Hi Sage,

After merging the ceph tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

In file included from fs/ceph/super.h:4:0,
 from fs/ceph/cache.c:26:
include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 ^
In file included from include/linux/kernel.h:13:0,
 from include/asm-generic/bug.h:13,
 from arch/x86/include/asm/bug.h:38,
 from include/linux/bug.h:4,
 from include/linux/thread_info.h:11,
 from include/linux/preempt.h:9,
 from include/linux/spinlock.h:50,
 from include/linux/wait.h:7,
 from include/linux/fs.h:6,
 from include/linux/fscache.h:21,
 from fs/ceph/cache.c:24:
include/linux/printk.h:206:0: note: this is the location of the previous definition
 #define pr_fmt(fmt) fmt
 ^

Probably introduced by commit cb0963fcf836 ("ceph: use fscache as a local
presisent cache").

pr_fmt needs to be defined before printk.h gets included.
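
The ordering requirement can be shown with a self-contained sketch. This is not the ceph code: MODNAME stands in for KBUILD_MODNAME, format_message() is a hypothetical helper, and the #ifndef block plays the role of the default that printk.h provides when no pr_fmt exists yet. Because the file's own definition comes first, the default is skipped and nothing is redefined.

```c
#include <stdio.h>
#include <string.h>

/* Step 1: the source file defines pr_fmt before anything else. */
#define MODNAME "ceph"	/* stand-in for KBUILD_MODNAME */
#define pr_fmt(fmt) MODNAME ": " fmt

/* Step 2: the "header" supplies a default only if none is defined yet. */
#ifndef pr_fmt
#define pr_fmt(fmt) fmt
#endif

/* Hypothetical helper: formats a message through pr_fmt. */
static int format_message(char *buf, size_t size, int nr)
{
	return snprintf(buf, size, pr_fmt("registered %d caches"), nr);
}
```

Swapping the two steps, so that the default is seen first and the file then defines pr_fmt again, is what produces the "redefined" warning quoted above.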

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [PATCH] ethernet/arc/arc_emac: optimize the Tx/Tx-reclaim paths a bit

2013-09-05 Thread David Miller

From: Vineet Gupta 
Date: Fri, 6 Sep 2013 04:24:39 +

> On 09/05/2013 11:54 PM, David Miller wrote:
>> You should keep the check in the transmit queueing code as a BUG check,
>> almost every driver has code of the form (using NIU as an example):
 ...
>> Otherwise queue management bugs are incredibly hard to diagnose.
>>
>> I'm not applying this patch.
> 
> The check is already there for the current BD. What I removed was the check
> for the next BD too (please see below). IMHO this is useless since it will be
> done in the next iteration anyway. In my tests, the next check never got hit,
> so it was a waste of cycles.
> 
> static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
> {
> if (unlikely((le32_to_cpu(*info) & OWN_MASK) == FOR_EMAC)) {
> netif_stop_queue(ndev);
> return NETDEV_TX_BUSY;
> }
> 
> ...
> *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
> 
> -   /* Get "info" of the next BD */
> -   info = &priv->txbd[*txbd_curr].info;
> -
> -   /* Check if if Tx BD ring is full - next BD is still owned by EMAC */
> -   if (unlikely((le32_to_cpu(*info) & OWN_MASK) == FOR_EMAC))
> -   netif_stop_queue(ndev);
> 
> OTOH, I do see a slight stats update issue - if the queue is stopped (but pkt
> not dropped) we are failing to increment tx_errors. But that would be a
> separate patch.

It is exactly the correct thing to do.  The driver should _NEVER_
return NETDEV_TX_BUSY under normal circumstances.  The queue should
always be stopped by the ->ndo_start_xmit() method when it fills the
queue.

Again, when ->ndo_start_xmit() is invoked, it should never see the
queue full.  When that happens it is a bug.

You are deleting exactly the correct part of this function, what it is
doing right now is precisely the correct way to manage netif queue
state.

The only valid change you can make here is to make the:

if (unlikely((le32_to_cpu(*info) & OWN_MASK) == FOR_EMAC)) {
netif_stop_queue(ndev);
return NETDEV_TX_BUSY;
}

print out an error message and increment tx_errors.
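
A minimal userspace sketch of the queue discipline described above follows. It is not the arc_emac driver: struct ring_model, model_xmit(), model_tx_done() and the XMIT_* codes are hypothetical, and TX_BD_NUM is given an arbitrary small value. The transmit path stops the queue itself as soon as the next descriptor is still owned by the hardware, so finding the current descriptor busy on entry is treated as a bug and counted in tx_errors.

```c
#include <stdbool.h>

#define TX_BD_NUM 4	/* arbitrary ring size for this model */

enum { XMIT_OK = 0, XMIT_BUSY = 1 };

struct ring_model {
	bool owned_by_hw[TX_BD_NUM];
	unsigned curr;
	bool queue_stopped;
	unsigned long tx_errors;
};

/* Models the transmit entry point. */
static int model_xmit(struct ring_model *r)
{
	/* Should never trigger when the queue is managed correctly. */
	if (r->owned_by_hw[r->curr]) {
		r->queue_stopped = true;
		r->tx_errors++;
		return XMIT_BUSY;
	}

	/* Hand the current descriptor to the hardware and advance. */
	r->owned_by_hw[r->curr] = true;
	r->curr = (r->curr + 1) % TX_BD_NUM;

	/* Ring is now full: stop the queue before the next call arrives. */
	if (r->owned_by_hw[r->curr])
		r->queue_stopped = true;

	return XMIT_OK;
}

/* Models Tx reclaim: the hardware is done with a descriptor. */
static void model_tx_done(struct ring_model *r, unsigned idx)
{
	r->owned_by_hw[idx] = false;
	r->queue_stopped = false;
}
```

In this model the busy branch is never reached as long as the caller respects queue_stopped, which matches the expectation that the busy return is an error condition rather than normal flow control.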

Re: [PATCH 7/9] KGDB/KDB: add new system NMI entry code to KDB

2013-09-05 Thread Jason Wessel

On 09/05/2013 05:50 PM, Mike Travis wrote:
> This patch adds a new "KDB_REASON" code (KDB_REASON_SYSTEM_NMI).  This
> is purely cosmetic to distinguish it from the other various reasons that
> NMI may occur and are usually after an error occurred.  Also the dumping
> of registers is not done to more closely match what is displayed when KDB
> is entered manually via the sysreq 'g' key.


This patch is not quite right.   See below.


> 
> Signed-off-by: Mike Travis 
> Reviewed-by: Dimitri Sivanich 
> Reviewed-by: Hedi Berriche 
> ---
>  include/linux/kdb.h |1 +
>  include/linux/kgdb.h|1 +
>  kernel/debug/debug_core.c   |5 +
>  kernel/debug/kdb/kdb_debugger.c |5 -
>  kernel/debug/kdb/kdb_main.c |3 +++
>  5 files changed, 14 insertions(+), 1 deletion(-)
> 
> --- linux.orig/include/linux/kdb.h
> +++ linux/include/linux/kdb.h
> @@ -109,6 +109,7 @@ typedef enum {
>   KDB_REASON_RECURSE, /* Recursive entry to kdb;
>* regs probably valid */
>   KDB_REASON_SSTEP,   /* Single Step trap. - regs valid */
> + KDB_REASON_SYSTEM_NMI,  /* In NMI due to SYSTEM cmd; regs valid */
>  } kdb_reason_t;
>  
>  extern int kdb_trap_printk;
> --- linux.orig/include/linux/kgdb.h
> +++ linux/include/linux/kgdb.h
> @@ -52,6 +52,7 @@ extern int kgdb_connected;
>  extern int kgdb_io_module_registered;
>  
>  extern atomic_t  kgdb_setting_breakpoint;
> +extern atomic_t  kgdb_system_nmi;


We don't need extra atomics.  You should add another variable to the kgdb_state,
which is processor specific in this case.

Better yet, just set the ks->err_code properly in your kgdb_nmicallin() or in
the origination call to kgdb_nmicallback() from your nmi handler (remember I
still have the question pending if we actually need kgdb_nmicallin() in the
first place).  You already did the work of adding another NMI type to the enum.
We just need to use the ks->err_code variable as well.


>  extern atomic_t  kgdb_cpu_doing_single_step;
>  
>  extern struct task_struct*kgdb_usethread;
> --- linux.orig/kernel/debug/debug_core.c
> +++ linux/kernel/debug/debug_core.c
> @@ -125,6 +125,7 @@ static atomic_t   masters_in_kgdb;
>  static atomic_t  slaves_in_kgdb;
>  static atomic_t  kgdb_break_tasklet_var;
>  atomic_t kgdb_setting_breakpoint;
> +atomic_t kgdb_system_nmi;
>  
>  struct task_struct   *kgdb_usethread;
>  struct task_struct   *kgdb_contthread;
> @@ -760,7 +761,11 @@ int kgdb_nmicallin(int cpu, int trapnr,
>  
>   /* Indicate there are slaves waiting */
>   kgdb_info[cpu].send_ready = send_ready;
> +
> + /* Use new reason code "SYSTEM_NMI" */
> + atomic_inc(&kgdb_system_nmi);
>   kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
> + atomic_dec(&kgdb_system_nmi);
>   kgdb_do_roundup = save_kgdb_do_roundup;
>   kgdb_info[cpu].send_ready = NULL;
>  
> --- linux.orig/kernel/debug/kdb/kdb_debugger.c
> +++ linux/kernel/debug/kdb/kdb_debugger.c
> @@ -69,7 +69,10 @@ int kdb_stub(struct kgdb_state *ks)
>   if (atomic_read(&kgdb_setting_breakpoint))
>   reason = KDB_REASON_KEYBOARD;
>  
> - if (in_nmi())
> + if (atomic_read(&kgdb_system_nmi))
> + reason = KDB_REASON_SYSTEM_NMI;


This would get changed to:
if (ks->err_code == KDB_REASON_SYSTEM_NMI && ks->signo == SIGTRAP)

Cheers,
Jason.

> +
> + else if (in_nmi())
>   reason = KDB_REASON_NMI;
>  
>   for (i = 0, bp = kdb_breakpoints; i < KDB_MAXBPT; i++, bp++) {
> --- linux.orig/kernel/debug/kdb/kdb_main.c
> +++ linux/kernel/debug/kdb/kdb_main.c
> @@ -1200,6 +1200,9 @@ static int kdb_local(kdb_reason_t reason
>  instruction_pointer(regs));
>   kdb_dumpregs(regs);
>   break;
> + case KDB_REASON_SYSTEM_NMI:
> + kdb_printf("due to System NonMaskable Interrupt\n");
> + break;
>   case KDB_REASON_NMI:
>   kdb_printf("due to NonMaskable Interrupt @ "
>  kdb_machreg_fmt "\n",
> 
> -- 
> 
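
The suggestion above, carrying the reason in per-call state instead of a global atomic, can be sketched in userspace as follows. The types and names here (enum model_reason, struct model_state, pick_reason()) are hypothetical stand-ins for kdb_reason_t, struct kgdb_state and the selection logic in kdb_stub(); only the shape of the check is taken from the review comment.

```c
#include <signal.h>
#include <stdbool.h>

enum model_reason {
	MODEL_REASON_OOPS = 1,
	MODEL_REASON_NMI,
	MODEL_REASON_SYSTEM_NMI,
};

/* Per-entry debugger state, passed down by the caller. */
struct model_state {
	int signo;
	int err_code;
};

/*
 * Pick the entry reason from the state handed in by the caller. No shared
 * counter is needed, so concurrent entries on other CPUs cannot confuse it.
 */
static enum model_reason pick_reason(const struct model_state *ks, bool in_nmi)
{
	if (ks->err_code == MODEL_REASON_SYSTEM_NMI && ks->signo == SIGTRAP)
		return MODEL_REASON_SYSTEM_NMI;
	if (in_nmi)
		return MODEL_REASON_NMI;
	return MODEL_REASON_OOPS;
}
```

The caller that knows it is handling a system NMI fills in err_code before entering the debugger, and every later stage reads it from the same structure.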


Re: [PATCH RESEND v3 3/7] Intel MIC Host Driver, card OS state management.

2013-09-05 Thread Greg Kroah-Hartman

On Thu, Sep 05, 2013 at 04:41:55PM -0700, Sudeep Dutt wrote:
> +What:		/sys/class/mic/mic(x)/firmware
> +Date:		August 2013
> +KernelVersion:	3.11
> +Contact: Sudeep Dutt 
> +Description:
> + When read, this sysfs entry provides the path name under
> + /lib/firmware/ where the firmware image to be booted on the
> + card can be found. The entry can be written to change the
> + firmware image location under /lib/firmware/.

I don't understand, is the path under the HOST device, or the Client
device's disk?  Why do you need to change the path on the HOST?  What's
wrong with the existing firmware path selection we have in the kernel?

thanks,

greg k-h

Re: [PATCH RESEND v3 3/7] Intel MIC Host Driver, card OS state management.

2013-09-05 Thread Greg Kroah-Hartman

On Thu, Sep 05, 2013 at 04:41:55PM -0700, Sudeep Dutt wrote:
> +What:		/sys/class/mic/mic(x)/cmdline
> +Date:		August 2013
> +KernelVersion:	3.11
> +Contact: Sudeep Dutt 
> +Description:
> + An Intel MIC device runs a Linux OS during its operation. Before
> + booting this card OS, it is possible to pass kernel command line
> + options to configure various features in it, similar to
> + self-bootable machines. When read, this entry provides
> + information about the current kernel command line options set to
> + boot the card OS. This entry can be written to change the
> + existing kernel command line options. Typically, the user would
> + want to read the current command line options, append new ones
> + or modify existing ones and then write the whole kernel command
> + line back to this entry.

Is a PAGE_SIZE value going to be big enough for your command line?  I
know some embedded systems have horribly long command lines, hopefully
this will be enough for you.

thanks,

greg k-h

Re: [PATCH RESEND v3 3/7] Intel MIC Host Driver, card OS state management.

2013-09-05 Thread Greg Kroah-Hartman

Again, very minor fixups for later (I can even do them...)

> +static DEVICE_ATTR(state, S_IRUGO|S_IWUSR, mic_show_state, mic_store_state);

DEVICE_ATTR_RW() please.

Same for the other attributes you create in this patch.

thanks,

greg k-h

Re: [PATCH v2 1/1] dcache: Translating dentry into pathname without taking rename_lock

2013-09-05 Thread Linus Torvalds

On Thu, Sep 5, 2013 at 7:01 PM, Waiman Long  wrote:
>
> I am sorry that I misunderstood what you said. I will do what you and Al
> advise me to do.

I'm sorry I shouted at you. I was getting a bit frustrated there..

Linus

Re: [PATCH RESEND v3 1/7] Intel MIC Host Driver for X100 family.

2013-09-05 Thread Greg Kroah-Hartman

On Thu, Sep 05, 2013 at 04:41:31PM -0700, Sudeep Dutt wrote:
>  drivers/misc/mic/common/mic_device.h  |  37 +++
>  drivers/misc/mic/host/mic_device.h| 109 +

Two different files, with the same name?  You are asking for trouble in
the future, getting them confused :)

Please try to pick a unique name, especially when you later do things
like:

> +#include "../common/mic_device.h"
> +#include "mic_device.h"

Which just looks odd.

Again, not a big deal, follow-on patch can fix this.

thanks,

greg k-h

Re: [PATCH RESEND v3 1/7] Intel MIC Host Driver for X100 family.

2013-09-05 Thread Greg Kroah-Hartman

Very minor nits, you can change this in a future add-on patch:

> +static DEVICE_ATTR(family, S_IRUGO, mic_show_family, NULL);

This should use DEVICE_ATTR_RO(), so that we don't have to audit the
permissions of your DEVICE_ATTR() files.

> +static DEVICE_ATTR(stepping, S_IRUGO, mic_show_stepping, NULL);

Same here.

> +static struct attribute *mic_default_attrs[] = {
> + &dev_attr_family.attr,
> + &dev_attr_stepping.attr,
> +
> + NULL
> +};
> +
> +static struct attribute_group mic_attr_group = {
> + .attrs = mic_default_attrs,
> +};
> +
> +static const struct attribute_group *__mic_attr_group[] = {
> + &mic_attr_group,
> + NULL
> +};

These last two structures can be replaced with:
ATTRIBUTE_GROUPS(mic_default);

> +void mic_sysfs_init(struct mic_device *mdev)
> +{
> + mdev->attr_group = __mic_attr_group;
> +}

This is "odd", why not just export the data structure and reference it
in the other code?  The pci core does this, and so do other busses.

Anyway, it's not a big deal, just a bit strange to me.

thanks,

greg k-h
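
How the suggested macros fit together can be sketched outside the kernel. The structures and macro bodies below are simplified userspace stand-ins modelled on the kernel's DEVICE_ATTR_RO() and ATTRIBUTE_GROUPS(), not the real definitions, and the show functions with their return strings are placeholders. The sketch shows the naming contract: DEVICE_ATTR_RO(family) expects family_show() and creates dev_attr_family with read-only permissions, and ATTRIBUTE_GROUPS(mic_default) expects mic_default_attrs[] and creates mic_default_group plus the NULL-terminated mic_default_groups[].

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures. */
struct attribute {
	const char *name;
	unsigned mode;
};

struct device_attribute {
	struct attribute attr;
	const char *(*show)(void);
};

struct attribute_group {
	struct attribute **attrs;
};

/* Simplified stand-ins modelled on the kernel macros of the same names. */
#define DEVICE_ATTR_RO(_name) \
	struct device_attribute dev_attr_##_name = \
		{ { #_name, 0444 }, _name##_show }

#define ATTRIBUTE_GROUPS(_name) \
	static const struct attribute_group _name##_group = { \
		.attrs = _name##_attrs, \
	}; \
	static const struct attribute_group *_name##_groups[] = { \
		&_name##_group, \
		NULL, \
	}

/* Placeholder show functions; the returned strings are made up. */
static const char *family_show(void) { return "family-placeholder"; }
static const char *stepping_show(void) { return "stepping-placeholder"; }

static DEVICE_ATTR_RO(family);
static DEVICE_ATTR_RO(stepping);

static struct attribute *mic_default_attrs[] = {
	&dev_attr_family.attr,
	&dev_attr_stepping.attr,
	NULL
};
ATTRIBUTE_GROUPS(mic_default);
```

With this shape the permissions are fixed by the macro rather than spelled out per attribute, and the two hand-written group structures collapse into a single line.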

Re: [PATCH 0/8] ceph: fscache support & upstream changes

2013-09-05 Thread Milosz Tanski

David,

After running this for a day on some loaded machines I ran into what
looks like an old issue with the new code. I remember you saw an issue
that manifested itself in a similar way a while back.

[13837253.462779] FS-Cache: Assertion failed
[13837253.462782] 3 == 5 is false
[13837253.462807] [ cut here ]
[13837253.462811] kernel BUG at fs/fscache/operation.c:414!
[13837253.462815] invalid opcode:  [#1] SMP
[13837253.462820] Modules linked in: cachefiles microcode auth_rpcgss
oid_registry nfsv4 nfs lockd ceph sunrpc libceph fscache raid10
raid456 async_pq async_xor async_memcpy async_raid6_recov async_tx
raid1 raid0 multipath linear btrfs raid6_pq lzo_compress xor
zlib_deflate libcrc32c
[13837253.462851] CPU: 1 PID: 1848 Comm: kworker/1:2 Not tainted
3.11.0-rc5-virtual #55
[13837253.462870] Workqueue: ceph-revalidate ceph_revalidate_work [ceph]
[13837253.462875] task: 8804251f16f0 ti: 8804047fa000 task.ti:
8804047fa000
[13837253.462879] RIP: e030:[]  []
fscache_put_operation+0x2ad/0x330 [fscache]
[13837253.462893] RSP: e02b:8804047fbd58  EFLAGS: 00010296
[13837253.462896] RAX: 000f RBX: 880424049d80 RCX:
0006
[13837253.462901] RDX: 0007 RSI: 0007 RDI:
8804047f0218
[13837253.462906] RBP: 8804047fbd68 R08:  R09:

[13837253.462910] R10: 0108 R11: 0107 R12:
8804251cf928
[13837253.462915] R13: 8804253c7370 R14:  R15:

[13837253.462923] FS:  7f5c56e43700()
GS:88044350() knlGS:
[13837253.462928] CS:  e033 DS:  ES:  CR0: 8005003b
[13837253.462932] CR2: 7fc08b7ee000 CR3: 0004259a4000 CR4:
2660
[13837253.462936] Stack:
[13837253.462939]  880424049d80 8804251cf928 8804047fbda8
a016def1
[13837253.462946]  88042b462b20 88040701c750 88040701c730
88040701c3f0
[13837253.462953]  0003  8804047fbde8
a025ba3f
[13837253.462959] Call Trace:
[13837253.462966]  []
__fscache_check_consistency+0x1a1/0x2c0 [fscache]
[13837253.462977]  [] ceph_revalidate_work+0x8f/0x120 [ceph]
[13837253.462987]  [] process_one_work+0x179/0x490
[13837253.462992]  [] worker_thread+0x11b/0x370
[13837253.462998]  [] ? manage_workers.isra.21+0x2e0/0x2e0
[13837253.463004]  [] kthread+0xc0/0xd0
[13837253.463011]  [] ? perf_trace_xen_mmu_pmd_clear+0x50/0xc0
[13837253.463017]  [] ? flush_kthread_worker+0xb0/0xb0
[13837253.463024]  [] ret_from_fork+0x7c/0xb0
[13837253.463029]  [] ? flush_kthread_worker+0xb0/0xb0
[13837253.463033] Code: 31 c0 e8 5d e6 3e e1 48 c7 c7 04 8e 17 a0 31
c0 e8 4f e6 3e e1 8b 73 40 ba 05 00 00 00 48 c7 c7 62 8e 17 a0 31 c0
e8 39 e6 3e e1 <0f> 0b 65 48 8b 34 25 80 c7 00 00 48 c7 c7 4f 8e 17 a0
48 81 c6
[13837253.463071] RIP  []
fscache_put_operation+0x2ad/0x330 [fscache]
[13837253.463079]  RSP 
[13837253.463085] ---[ end trace 2972d68e8efd961e ]---
[13837253.463130] BUG: unable to handle kernel paging request at
ffd8
[13837253.463136] IP: [] kthread_data+0x11/0x20
[13837253.463142] PGD 1a0f067 PUD 1a11067 PMD 0
[13837253.463146] Oops:  [#2] SMP
[13837253.463150] Modules linked in: cachefiles microcode auth_rpcgss
oid_registry nfsv4 nfs lockd ceph sunrpc libceph fscache raid10
raid456 async_pq async_xor async_memcpy async_raid6_recov async_tx
raid1 raid0 multipath linear btrfs raid6_pq lzo_compress xor
zlib_deflate libcrc32c
[13837253.463176] CPU: 1 PID: 1848 Comm: kworker/1:2 Tainted: G  D
 3.11.0-rc5-virtual #55
[13837253.463190] task: 8804251f16f0 ti: 8804047fa000 task.ti:
8804047fa000
[13837253.463194] RIP: e030:[]  []
kthread_data+0x11/0x20
[13837253.463201] RSP: e02b:8804047fba00  EFLAGS: 00010046
[13837253.463204] RAX:  RBX:  RCX:
81c30d00
[13837253.463209] RDX: 0001 RSI: 0001 RDI:
8804251f16f0
[13837253.463213] RBP: 8804047fba18 R08: 27bf1216 R09:

[13837253.463217] R10: 88044360cec0 R11: 000e R12:
0001
[13837253.463222] R13: 8804251f1ac8 R14: 88042c498000 R15:

[13837253.463228] FS:  7f5c56e43700()
GS:88044350() knlGS:
[13837253.463233] CS:  e033 DS:  ES:  CR0: 8005003b
[13837253.463237] CR2: 0028 CR3: 0004259a4000 CR4:
2660
[13837253.463241] Stack:
[13837253.463243]  8107c3d6 880443513fc0 0001
8804047fba98
[13837253.463249]  81568308 0003 8804251f1ce8
8804251f16f0
[13837253.463255]  8804047fbfd8 8804047fbfd8 8804047fbfd8
8804047fba78
[13837253.463261] Call Trace:
[13837253.463265]  [] ? wq_worker_sleeping+0x16/0x90
[13837253.463272]  [] __schedule+0x5c8/0x820
[13837253.463276]  [] schedule+0x29/0x70
[13837253.662186]  [] do_exit+0x6e0/0xa60
[138

Re: [PATCH 2/3] thermal: samsung: change base_common to more meaningful base_second

2013-09-05 Thread amit daniel kachhap

On Wed, Sep 4, 2013 at 9:53 AM, Naveen Krishna Chatradhi
 wrote:
> On Exynos5440 and Exynos5420 there are registers common
> across the TMU channels.
>
> To support that, we introduced a ADDRESS_MULTIPLE flag in the
> driver and the 2nd set of register base and size are provided
> in the "reg" property of the node.
>
> As per Amit's suggestion, this patch changes the base_common
> to base_second and SHARED_MEMORY to ADDRESS_MULTIPLE.
>
> Signed-off-by: Naveen Krishna Chatradhi 
The changes look good. For all the 3 patches in the series,

Acked-by: Amit Daniel Kachhap 
Reviewed-by: Amit Daniel Kachhap

Thanks,
Amit Daniel
> ---
> Changes since v2:
> Changed the flag name from SHARED_MEMORY to ADDRESS_MULTIPLE.
> https://lkml.org/lkml/2013/8/1/38
>
>  .../devicetree/bindings/thermal/exynos-thermal.txt |4 ++--
>  drivers/thermal/samsung/exynos_tmu.c   |   12 ++--
>  drivers/thermal/samsung/exynos_tmu.h   |4 ++--
>  drivers/thermal/samsung/exynos_tmu_data.c  |2 +-
>  4 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/thermal/exynos-thermal.txt 
> b/Documentation/devicetree/bindings/thermal/exynos-thermal.txt
> index 284f530..116cca0 100644
> --- a/Documentation/devicetree/bindings/thermal/exynos-thermal.txt
> +++ b/Documentation/devicetree/bindings/thermal/exynos-thermal.txt
> @@ -11,8 +11,8 @@
>  - reg : Address range of the thermal registers. For soc's which has multiple
> instances of TMU and some registers are shared across all TMU's like
> interrupt related then 2 set of register has to supplied. First set
> -   belongs to each instance of TMU and second set belongs to common TMU
> -   registers.
> +   belongs to each instance of TMU and second set belongs to second set
> +   of common TMU registers.
>  - interrupts : Should contain interrupt for thermal system
>  - clocks : The main clock for TMU device
>  - clock-names : Thermal system clock name
> diff --git a/drivers/thermal/samsung/exynos_tmu.c 
> b/drivers/thermal/samsung/exynos_tmu.c
> index d201ed8..3a55caf 100644
> --- a/drivers/thermal/samsung/exynos_tmu.c
> +++ b/drivers/thermal/samsung/exynos_tmu.c
> @@ -41,7 +41,7 @@
>   * @id: identifier of the one instance of the TMU controller.
>   * @pdata: pointer to the tmu platform/configuration data
>   * @base: base address of the single instance of the TMU controller.
> - * @base_common: base address of the common registers of the TMU controller.
> + * @base_second: base address of the common registers of the TMU controller.
>   * @irq: irq number of the TMU controller.
>   * @soc: id of the SOC type.
>   * @irq_work: pointer to the irq work structure.
> @@ -56,7 +56,7 @@ struct exynos_tmu_data {
> int id;
> struct exynos_tmu_platform_data *pdata;
> void __iomem *base;
> -   void __iomem *base_common;
> +   void __iomem *base_second;
> int irq;
> enum soc_type soc;
> struct work_struct irq_work;
> @@ -297,7 +297,7 @@ skip_calib_data:
> }
> /*Clear the PMIN in the common TMU register*/
> if (reg->tmu_pmin && !data->id)
> -   writel(0, data->base_common + reg->tmu_pmin);
> +   writel(0, data->base_second + reg->tmu_pmin);
>  out:
> clk_disable(data->clk);
> mutex_unlock(&data->lock);
> @@ -451,7 +451,7 @@ static void exynos_tmu_work(struct work_struct *work)
>
> /* Find which sensor generated this interrupt */
> if (reg->tmu_irqstatus) {
> -   val_type = readl(data->base_common + reg->tmu_irqstatus);
> +   val_type = readl(data->base_second + reg->tmu_irqstatus);
> if (!((val_type >> data->id) & 0x1))
> goto out;
> }
> @@ -582,7 +582,7 @@ static int exynos_map_dt_data(struct platform_device 
> *pdev)
>  * Check if the TMU shares some registers and then try to map the
>  * memory of common registers.
>  */
> -   if (!TMU_SUPPORTS(pdata, SHARED_MEMORY))
> +   if (!TMU_SUPPORTS(pdata, ADDRESS_MULTIPLE))
> return 0;
>
> if (of_address_to_resource(pdev->dev.of_node, 1, &res)) {
> @@ -590,7 +590,7 @@ static int exynos_map_dt_data(struct platform_device 
> *pdev)
> return -ENODEV;
> }
>
> -   data->base_common = devm_ioremap(&pdev->dev, res.start,
> +   data->base_second = devm_ioremap(&pdev->dev, res.start,
> resource_size(&res));
> if (!data->base) {
> dev_err(&pdev->dev, "Failed to ioremap memory\n");
> diff --git a/drivers/thermal/samsung/exynos_tmu.h 
> b/drivers/thermal/samsung/exynos_tmu.h
> index 7c6c34a..ebd2ec1 100644
> --- a/drivers/thermal/samsung/exynos_tmu.h
> +++ b/drivers/thermal/samsung/exynos_tmu.h
> @@ -59,7 +59,7 @@ enum soc_type {
>   * state(active/idle) can be checked.
>   * TMU_S

Re: [PATCH 5/9] KGDB/KDB: add support for external NMI handler to call KGDB/KDB.

2013-09-05 Thread Jason Wessel

On 09/05/2013 05:50 PM, Mike Travis wrote:
> This patch adds a kgdb_nmicallin() interface that can be used by
> external NMI handlers to call the KGDB/KDB handler.  The primary need
> for this is for those types of NMI interrupts where all the CPUs
> have already received the NMI signal.  Therefore no send_IPI(NMI)
> is required, and in fact it will cause a 2nd unhandled NMI to occur.
> This generates the "Dazed and Confuzed" messages.
>
> Since all the CPUs are getting the NMI at roughly the same time, it's not
> guaranteed that the first CPU that hits the NMI handler will manage to
> enter KGDB and set the dbg_master_lock before the slaves start entering.

It should have been ok to have more than one master if this was some kind of 
watchdog.  The raw spin lock for the dbg_master_lock should have ensured that 
only a single CPU is in fact the master.  If it is the case that we cannot send 
a nested IPI at this point, the UV machine type should have replaced the 
kgdb_roundup_cpus() routine with something that will work, such as looking at 
the exception type on the way in and perhaps skipping the IPI send.

Also if there is no possibility of restarting the machine from this state it 
would have been possible to simply turn off kgdb_do_roundup in the custom 
kgdb_roundup_cpus().

The patch you created looks like it will work, but it comes at the cost of 
some complexity because you are also checking on the state of 
"kgdb_info[cpu].send_ready" in some other location in the NMI handler.  It 
might be better to consider not sending a nested NMI if all the CPUs are going 
to enter anyway in the master state.

>
> The new argument "send_ready" was added for KGDB to signal the NMI handler
> to release the slave CPUs for entry into KGDB.
>
> Signed-off-by: Mike Travis 
> Reviewed-by: Dimitri Sivanich 
> Reviewed-by: Hedi Berriche 
> ---
>  include/linux/kgdb.h  |1 +
>  kernel/debug/debug_core.c |   41 +
>  kernel/debug/debug_core.h |1 +
>  3 files changed, 43 insertions(+)
>
> --- linux.orig/include/linux/kgdb.h
> +++ linux/include/linux/kgdb.h
> @@ -310,6 +310,7 @@ extern int
>  kgdb_handle_exception(int ex_vector, int signo, int err_code,
>struct pt_regs *regs);
>  extern int kgdb_nmicallback(int cpu, void *regs);
> +extern int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t 
> *snd_rdy);
>  extern void gdbstub_exit(int status);
> 
>  extern intkgdb_single_step;
> --- linux.orig/kernel/debug/debug_core.c
> +++ linux/kernel/debug/debug_core.c
> @@ -578,6 +578,10 @@ return_normal:
>  /* Signal the other CPUs to enter kgdb_wait() */
>  if ((!kgdb_single_step) && kgdb_do_roundup)
>  kgdb_roundup_cpus(flags);
> +
> +/* If optional send ready pointer, signal CPUs to proceed */
> +if (kgdb_info[cpu].send_ready)
> +atomic_set(kgdb_info[cpu].send_ready, 1);
>  #endif
> 
>  /*
> @@ -729,6 +733,43 @@ int kgdb_nmicallback(int cpu, void *regs
>  return 0;
>  }
>  #endif
> +return 1;
> +}
> +
> +int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *send_ready)
> +{
> +#ifdef CONFIG_SMP
> +if (!kgdb_io_ready(0))
> +return 1;
> +
> +if (kgdb_info[cpu].enter_kgdb == 0) {
> +struct kgdb_state kgdb_var;
> +struct kgdb_state *ks = &kgdb_var;
> +int save_kgdb_do_roundup = kgdb_do_roundup;
> +
> +memset(ks, 0, sizeof(struct kgdb_state));
> +ks->cpu= cpu;
> +ks->ex_vector= trapnr;
> +ks->signo= SIGTRAP;
> +ks->err_code= 0;
> +ks->kgdb_usethreadid= 0;
> +ks->linux_regs= regs;
> +
> +/* Do not broadcast NMI */
> +kgdb_do_roundup = 0;
> +
> +/* Indicate there are slaves waiting */
> +kgdb_info[cpu].send_ready = send_ready;
> +kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);

This is the one part of the patch I don't quite understand.  Why does the 
kgdb_nmicallin() desire to be the master core?

The circumstances under which this is called were not obvious to me.  Is it some 
kind of watchdog where you really do want to enter the debugger, or is it more to 
deal with nested slave interrupts where the round up would possibly have hung on 
this hardware?  If it is the latter, I would have thought this should be a slave 
and not the master.

Perhaps a comment in the code can clear this up?

Thanks,
Jason.
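
A minimal sketch of the single-master guarantee discussed above, assuming a
C11 atomic flag as a stand-in for dbg_master_lock (the kernel uses a raw spin
lock; try_become_master() and release_master() are hypothetical names used
only for illustration):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for dbg_master_lock; not the kernel's raw spinlock. */
static atomic_flag master_lock = ATOMIC_FLAG_INIT;

/*
 * Returns true for exactly one caller until release_master() is called.
 * Every other CPU that arrives in the meantime gets false and would
 * take the slave path instead.
 */
static bool try_become_master(void)
{
	return !atomic_flag_test_and_set_explicit(&master_lock,
						  memory_order_acquire);
}

/* Drops mastership so that a later entry can elect a new master. */
static void release_master(void)
{
	atomic_flag_clear_explicit(&master_lock, memory_order_release);
}
```

With this shape, it does not matter how many CPUs enter wanting to be master
at roughly the same time: only the first test-and-set succeeds.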

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ethernet/arc/arc_emac: optimize the Tx/Tx-reclaim paths a bit

2013-09-05 Thread Vineet Gupta

Hi David,

On 09/05/2013 11:54 PM, David Miller wrote:
> From: Vineet Gupta 
> Date: Wed, 4 Sep 2013 18:33:11 +0530
>
>> This came out of staring at code due to recent performance fix.
>>
>> * TX BD reclaim can call netif_wake_queue() once, outside the loop if
>>   one/more BDs were freed, NO need to do this each iteration.
>>
>> * TX need not look at next BD to stop the netif queue. It rather be done
>>   in the next tx call, when it actually fails as the queue seldom gets
>>   full but the check nevertheless needs to be done for each packet Tx.
>>   Profiled this under heavy traffic (big tar file cp, LMBench networking
>>   tests) and saw not a single hit to that code.
>>
>> Signed-off-by: Vineet Gupta 
> You should keep the check in the transmit queueing code as a BUG check,
> almost every driver has code of the form (using NIU as an example):
>
>   if (niu_tx_avail(rp) <= (skb_shinfo(skb)->nr_frags + 1)) {
>   netif_tx_stop_queue(txq);
>   dev_err(np->device, "%s: BUG! Tx ring full when queue 
> awake!\n", dev->name);
>   rp->tx_errors++;
>   return NETDEV_TX_BUSY;
>   }
>
> and arc_emac should too.
>
> Otherwise queue management bugs are incredibly hard to diagnose.
>
> I'm not applying this patch.

The check is already there for the current BD. What I removed was the check for
the next BD too (please see below). IMHO this is useless since it will be done
in the next iteration anyway. In my tests, the next-BD check never got hit, so
it was a waste of cycles.

static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
{
if (unlikely((le32_to_cpu(*info) & OWN_MASK) == FOR_EMAC)) {
netif_stop_queue(ndev);
return NETDEV_TX_BUSY;
}

...
*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;

-   /* Get "info" of the next BD */
-   info = &priv->txbd[*txbd_curr].info;
-
-   /* Check if if Tx BD ring is full - next BD is still owned by EMAC */
-   if (unlikely((le32_to_cpu(*info) & OWN_MASK) == FOR_EMAC))
-   netif_stop_queue(ndev);

OTOH, I do see a slight stats update issue - if the queue is stopped (but the
pkt is not dropped) we are failing to increment tx_errors. But that would be a
separate patch.

-Vineet
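
To illustrate the argument, here is a small user-space model of a descriptor
ring that only checks ownership of the current BD at transmit time. It is a
sketch under assumptions, not the arc_emac driver: the ring size, the
OWN_MASK value, struct tx_ring, ring_tx() and ring_reclaim() are all
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_BD_NUM 4u          /* hypothetical ring size, for illustration */
#define OWN_MASK  0x80000000u /* hypothetical ownership bit */
#define FOR_EMAC  OWN_MASK    /* descriptor is owned by the hardware */
#define FOR_CPU   0u          /* descriptor is free for the CPU */

struct tx_ring {
	uint32_t info[TX_BD_NUM]; /* per-descriptor status word */
	unsigned int curr;        /* next descriptor the CPU will fill */
	bool stopped;             /* models netif_stop_queue() */
};

/* Returns 0 on success, -1 if the ring is full (models NETDEV_TX_BUSY). */
static int ring_tx(struct tx_ring *r)
{
	/* Only the current BD is checked; the next BD is left for the next call. */
	if ((r->info[r->curr] & OWN_MASK) == FOR_EMAC) {
		r->stopped = true;
		return -1;
	}
	r->info[r->curr] = FOR_EMAC;
	r->curr = (r->curr + 1) % TX_BD_NUM;
	return 0;
}

/* Models the hardware finishing a descriptor and reclaim waking the queue. */
static void ring_reclaim(struct tx_ring *r, unsigned int idx)
{
	r->info[idx] = FOR_CPU;
	r->stopped = false;
}
```

In this model the queue is stopped one call later than with the look-ahead
check, at the point where a transmit actually fails, which is the trade-off
being discussed.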

Re: [PATCH] net: stmmac: fix bad merge conflict resolution

2013-09-05 Thread Stephen Rothwell

Hi all,

On Thu, 05 Sep 2013 22:58:17 -0400 (EDT) David Miller  
wrote:
>
> From: Olof Johansson 
> Date: Thu,  5 Sep 2013 18:01:41 -0700
> 
> > Merge commit 06c54055bebf919249aa1eb68312887c3cfe77b4 did a bad conflict
> > resolution accidentally leaving out a closing brace. Add it back.
> > 
> > Signed-off-by: Olof Johansson 
> > ---
> > 
> > This breaks a handful of defconfigs on ARM, so it'd be good to see it
> > applied pretty quickly. Thanks!
> 
> Looks like Linus applied this, thanks Olof.

And I cherry-picked it into linux-next for today.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



[PATCH 0/5] Squashfs: extra sanity checks and sanity check fixes

2013-09-05 Thread Phillip Lougher

Hi,

Following on from the "Squashfs: sanity check information from disk"
patch from Dan Carpenter, I have added a couple more sanity checks,
and fixed a couple of existing sanity checks (including the patch from
Dan Carpenter).

These sanity checks mainly exist to trap maliciously corrupted
filesystems either through using a deliberately modified mksquashfs,
or where the user has deliberately chosen to generate uncompressed
metadata and then corrupted it.

Normally metadata in Squashfs filesystems is compressed, which means
corruption (either accidental or malicious) is detected when
trying to decompress the metadata.  So corrupted data does not normally
get as far as the code paths in question here.

Phillip Lougher (5):
  Squashfs: fix corruption check in get_dir_index_using_name()
  Squashfs: fix corruption checks in squashfs_lookup()
  Squashfs: fix corruption checks in squashfs_readdir()
  Squashfs: add corruption check in get_dir_index_using_offset()
  Squashfs: add corruption check for type in squashfs_readdir()

 fs/squashfs/dir.c | 17 +
 fs/squashfs/namei.c   |  7 +++
 fs/squashfs/squashfs_fs.h |  5 -
 3 files changed, 20 insertions(+), 9 deletions(-)

-- 
1.8.3.2


[PATCH 2/5] Squashfs: fix corruption checks in squashfs_lookup()

2013-09-05 Thread Phillip Lougher

The dir_count and size fields when read from disk are sanity
checked for correctness.  However, the sanity checks only check the
values are not greater than expected.  As dir_count and size were
incorrectly defined as signed ints, this can lead to corrupted values
appearing as negative which are not trapped.

Signed-off-by: Phillip Lougher 
---
 fs/squashfs/namei.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/namei.c b/fs/squashfs/namei.c
index 342a5aa..67cad77 100644
--- a/fs/squashfs/namei.c
+++ b/fs/squashfs/namei.c
@@ -147,7 +147,8 @@ static struct dentry *squashfs_lookup(struct inode *dir, 
struct dentry *dentry,
struct squashfs_dir_entry *dire;
u64 block = squashfs_i(dir)->start + msblk->directory_table;
int offset = squashfs_i(dir)->offset;
-   int err, length, dir_count, size;
+   int err, length;
+   unsigned int dir_count, size;
 
TRACE("Entered squashfs_lookup [%llx:%x]\n", block, offset);
 
-- 
1.8.3.2


[PATCH 4/5] Squashfs: add corruption check in get_dir_index_using_offset()

2013-09-05 Thread Phillip Lougher

We read the size (of the name) field from disk.  This value should
be sanity checked for correctness to avoid blindly reading
huge amounts of unnecessary data from disk on corruption.

Note, here we're not actually reading the name into a buffer, but
skipping it, and so corruption doesn't cause buffer overflow; it merely
causes lots of unnecessary data to be read.

Signed-off-by: Phillip Lougher 
---
 fs/squashfs/dir.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/dir.c b/fs/squashfs/dir.c
index 1192084..bd7155b 100644
--- a/fs/squashfs/dir.c
+++ b/fs/squashfs/dir.c
@@ -54,6 +54,7 @@ static int get_dir_index_using_offset(struct super_block *sb,
 {
struct squashfs_sb_info *msblk = sb->s_fs_info;
int err, i, index, length = 0;
+   unsigned int size;
struct squashfs_dir_index dir_index;
 
TRACE("Entered get_dir_index_using_offset, i_count %d, f_pos %lld\n",
@@ -81,8 +82,14 @@ static int get_dir_index_using_offset(struct super_block *sb,
 */
break;
 
+   size = le32_to_cpu(dir_index.size) + 1;
+
+   /* size should never be larger than SQUASHFS_NAME_LEN */
+   if (size > SQUASHFS_NAME_LEN)
+   break;
+
err = squashfs_read_metadata(sb, NULL, &index_start,
-   &index_offset, le32_to_cpu(dir_index.size) + 1);
+   &index_offset, size);
if (err < 0)
break;
 
-- 
1.8.3.2


[PATCH 3/5] Squashfs: fix corruption checks in squashfs_readdir()

2013-09-05 Thread Phillip Lougher

The dir_count and size fields when read from disk are sanity
checked for correctness.  However, the sanity checks only check the
values are not greater than expected.  As dir_count and size were
incorrectly defined as signed ints, this can lead to corrupted values
appearing as negative which are not trapped.

Signed-off-by: Phillip Lougher 
---
 fs/squashfs/dir.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/squashfs/dir.c b/fs/squashfs/dir.c
index f7f527b..1192084 100644
--- a/fs/squashfs/dir.c
+++ b/fs/squashfs/dir.c
@@ -105,9 +105,8 @@ static int squashfs_readdir(struct file *file, struct 
dir_context *ctx)
struct inode *inode = file_inode(file);
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
u64 block = squashfs_i(inode)->start + msblk->directory_table;
-   int offset = squashfs_i(inode)->offset, length, dir_count, size,
-   type, err;
-   unsigned int inode_number;
+   int offset = squashfs_i(inode)->offset, length, type, err;
+   unsigned int inode_number, dir_count, size;
struct squashfs_dir_header dirh;
struct squashfs_dir_entry *dire;
 
-- 
1.8.3.2


[PATCH 5/5] Squashfs: add corruption check for type in squashfs_readdir()

2013-09-05 Thread Phillip Lougher

We read the type field from disk.  This value should be sanity
checked for correctness to avoid an out of bounds access when
reading the squashfs_filetype_table array.

Signed-off-by: Phillip Lougher 
---
 fs/squashfs/dir.c | 7 +--
 fs/squashfs/squashfs_fs.h | 5 -
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/squashfs/dir.c b/fs/squashfs/dir.c
index bd7155b..d8c2d74 100644
--- a/fs/squashfs/dir.c
+++ b/fs/squashfs/dir.c
@@ -112,8 +112,8 @@ static int squashfs_readdir(struct file *file, struct 
dir_context *ctx)
struct inode *inode = file_inode(file);
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
u64 block = squashfs_i(inode)->start + msblk->directory_table;
-   int offset = squashfs_i(inode)->offset, length, type, err;
-   unsigned int inode_number, dir_count, size;
+   int offset = squashfs_i(inode)->offset, length, err;
+   unsigned int inode_number, dir_count, size, type;
struct squashfs_dir_header dirh;
struct squashfs_dir_entry *dire;
 
@@ -206,6 +206,9 @@ static int squashfs_readdir(struct file *file, struct 
dir_context *ctx)
((short) le16_to_cpu(dire->inode_number));
type = le16_to_cpu(dire->type);
 
+   if (type > SQUASHFS_MAX_DIR_TYPE)
+   goto failed_read;
+
if (!dir_emit(ctx, dire->name, size,
inode_number,
squashfs_filetype_table[type]))
diff --git a/fs/squashfs/squashfs_fs.h b/fs/squashfs/squashfs_fs.h
index 9e2349d..4b2beda 100644
--- a/fs/squashfs/squashfs_fs.h
+++ b/fs/squashfs/squashfs_fs.h
@@ -87,7 +87,7 @@
 #define SQUASHFS_COMP_OPTS(flags)  SQUASHFS_BIT(flags, \
SQUASHFS_COMP_OPT)
 
-/* Max number of types and file types */
+/* Inode types including extended types */
 #define SQUASHFS_DIR_TYPE  1
 #define SQUASHFS_REG_TYPE  2
 #define SQUASHFS_SYMLINK_TYPE  3
@@ -103,6 +103,9 @@
 #define SQUASHFS_LFIFO_TYPE13
 #define SQUASHFS_LSOCKET_TYPE  14
 
+/* Max type value stored in directory entry */
+#define SQUASHFS_MAX_DIR_TYPE  7
+
 /* Xattr types */
 #define SQUASHFS_XATTR_USER 0
 #define SQUASHFS_XATTR_TRUSTED  1
-- 
1.8.3.2


[PATCH 1/5] Squashfs: fix corruption check in get_dir_index_using_name()

2013-09-05 Thread Phillip Lougher

Patch "Squashfs: sanity check information from disk" from
Dan Carpenter adds a missing check for corruption in the
"size" field while reading the directory index from disk.

It, however, sets err to -EINVAL; this value is not used later, and
so setting it is completely redundant.  So remove it.

Errors in reading the index are deliberately non-fatal.  If we
get an error in reading the index we just return the part of the
index we have managed to read - the index isn't essential,
just quicker.

Signed-off-by: Phillip Lougher 
---
 fs/squashfs/namei.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/squashfs/namei.c b/fs/squashfs/namei.c
index f866d42..342a5aa 100644
--- a/fs/squashfs/namei.c
+++ b/fs/squashfs/namei.c
@@ -104,10 +104,8 @@ static int get_dir_index_using_name(struct super_block *sb,
 
 
size = le32_to_cpu(index->size) + 1;
-   if (size > SQUASHFS_NAME_LEN) {
-   err = -EINVAL;
+   if (size > SQUASHFS_NAME_LEN)
break;
-   }
 
err = squashfs_read_metadata(sb, index->name, &index_start,
&index_offset, size);
-- 
1.8.3.2


Re: ftrace 'failed to modify' bug when loading reiserfs.ko

2013-09-05 Thread Dave Jones

On Thu, Sep 05, 2013 at 09:51:54PM -0400, Steven Rostedt wrote:
 > On Thu, 5 Sep 2013 21:48:59 -0400
 > Dave Jones  wrote:
 > 
 > > On Thu, Sep 05, 2013 at 09:44:55PM -0400, Steven Rostedt wrote:
 > >  > On Thu, 5 Sep 2013 21:34:55 -0400
 > >  > Dave Jones  wrote:
 > >  > 
 > >  > > On Thu, Sep 05, 2013 at 09:28:34PM -0400, Steven Rostedt wrote:
 > 
 > >  > Did you change a config option, or update your gcc?
 > > 
 > > Yeah, changed CONFIG_DEBUG_KOBJECT, which rebuilt the world.
 > 
 > Still doesn't explain why it gave you that splat there.
 > 
 > Do you still have that binary module, and can you show me what's at
 > reiserfs_init_bitmap_cache+0x0 with objdump?

I didn't, but it turns out I can recreate this. A little convoluted but..

disable DEBUG_KOBJECT_RELEASE
build, install and boot into kernel

enable DEBUG_KOBJECT_RELEASE
build kernel
install -> boom


28b0 <reiserfs_init_bitmap_cache>:

	return bh;
}

int reiserfs_init_bitmap_cache(struct super_block *sb)
{
    28b0:   e8 00 00 00 00          callq  28b5 <reiserfs_init_bitmap_cache+0x5>

    28b5:   55                      push   %rbp

/* Don't trust REISERFS_SB(sb)->s_bmap_nr, it's a u16
 * which overflows on large file systems. */
static inline __u32 reiserfs_bmap_count(struct super_block *sb)
{
	return (SB_BLOCK_COUNT(sb) - 1) / (sb->s_blocksize * 8) + 1;
    28b6:   31 d2                   xor    %edx,%edx
    28b8:   48 89 e5                mov    %rsp,%rbp
    28bb:   41 54                   push   %r12
    28bd:   53                      push   %rbx
    28be:   48 89 fb                mov    %rdi,%rbx
    28c1:   48 8b 87 50 07 00 00    mov    0x750(%rdi),%rax
    28c8:   48 8b 77 18             mov    0x18(%rdi),%rsi
    28cc:   48 8b 40 08             mov    0x8(%rax),%rax
    28d0:   48 8d 0c f5 00 00 00    lea    0x0(,%rsi,8),%rcx
    28d7:   00
    28d8:   8b 00                   mov    (%rax),%eax
    28da:   83 e8 01                sub    $0x1,%eax
    28dd:   48 f7 f1                div    %rcx


[PATCH v2] mfd: rtsx: Modify rts5249_optimize_phy

2013-09-05 Thread wei_wang

From: Wei WANG 

On some platforms, especially the Thinkpad series, rts5249 won't be
initialized properly. So we need to adjust some phy parameters to
improve compatibility.

Signed-off-by: Wei WANG 
---
 drivers/mfd/rts5249.c|   35 --
 include/linux/mfd/rtsx_pci.h |   43 ++
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/drivers/mfd/rts5249.c b/drivers/mfd/rts5249.c
index 3b835f5..7653638 100644
--- a/drivers/mfd/rts5249.c
+++ b/drivers/mfd/rts5249.c
@@ -130,13 +130,44 @@ static int rts5249_optimize_phy(struct rtsx_pcr *pcr)
 {
int err;
 
-   err = rtsx_pci_write_phy_register(pcr, PHY_REG_REV, 0xFE46);
+   err = rtsx_pci_write_phy_register(pcr, PHY_REG_REV, REG_REV_RESV |
+   RXIDLE_LATCHED | P1_EN | RXIDLE_EN | RX_PWST |
+   CLKREQ_DLY_TIMER_1_0 | STOP_CLKRD | STOP_CLKWR);
if (err < 0)
return err;
 
msleep(1);
 
-   return rtsx_pci_write_phy_register(pcr, PHY_BPCR, 0x05C0);
+   err = rtsx_pci_write_phy_register(pcr, PHY_BPCR, IBRXSEL | IBTXSEL |
+   IB_FILTER | CMIRROR_EN);
+   if (err < 0)
+   return err;
+   err = rtsx_pci_write_phy_register(pcr, PHY_PCR, FORCE_CODE |
+   OOBS_CALI_50 | OOBS_VCM_08 | OOBS_SEN_90 | RSSI_EN);
+   if (err < 0)
+   return err;
+   err = rtsx_pci_write_phy_register(pcr, PHY_RCR2, EMPHASE_EN | NADJR |
+   CDR_CP_10 | CDR_SR_2 | FREQSEL_12 | CPADJEN |
+   CDR_SC_8 | CALIB_LATE);
+   if (err < 0)
+   return err;
+   err = rtsx_pci_write_phy_register(pcr, PHY_FLD4, FLDEN_SEL | REQ_REF |
+   RXAMP_OFF | REQ_ADDA | BER_COUNT |
+   BER_TIMER | BER_CHK_EN);
+   if (err < 0)
+   return err;
+   err = rtsx_pci_write_phy_register(pcr, PHY_RDR, RXDSEL_1_9);
+   if (err < 0)
+   return err;
+   err = rtsx_pci_write_phy_register(pcr, PHY_RCR1, ADP_TIME | VCO_COARSE);
+   if (err < 0)
+   return err;
+   err = rtsx_pci_write_phy_register(pcr, PHY_FLD3, TIMER_4 | TIMER_6 |
+   RXDELINK);
+   if (err < 0)
+   return err;
+   return rtsx_pci_write_phy_register(pcr, PHY_TUNE, TUNEREF_1_0 |
+   VBGSEL_1252 | SDBUS_33 | TUNED18 | TUNED12);
 }
 
 static int rts5249_turn_on_led(struct rtsx_pcr *pcr)
diff --git a/include/linux/mfd/rtsx_pci.h b/include/linux/mfd/rtsx_pci.h
index d1382df..de20538 100644
--- a/include/linux/mfd/rtsx_pci.h
+++ b/include/linux/mfd/rtsx_pci.h
@@ -719,16 +719,41 @@
 
 /* Phy register */
 #define PHY_PCR0x00
+#define  FORCE_CODE0xB000
+#define  OOBS_CALI_50  0x0800
+#define  OOBS_VCM_08   0x0200
+#define  OOBS_SEN_90   0x0040
+#define  RSSI_EN   0x0002
 #define PHY_RCR0   0x01
 #define PHY_RCR1   0x02
+#define  ADP_TIME  0x0100
+#define  VCO_COARSE0x001F
 #define PHY_RCR2   0x03
+#define  EMPHASE_EN0x8000
+#define  NADJR 0x4000
+#define  CDR_CP_10 0x0400
+#define  CDR_SR_2  0x0100
+#define  FREQSEL_120x0040
+#define  CPADJEN   0x0020
+#define  CDR_SC_8  0x0008
+#define  CALIB_LATE0x0002
 #define PHY_RTCR   0x04
 #define PHY_RDR0x05
+#define  RXDSEL_1_90x4000
 #define PHY_TCR0   0x06
 #define PHY_TCR1   0x07
 #define PHY_TUNE   0x08
+#define  TUNEREF_1_0   0x4000
+#define  VBGSEL_1252   0x0C00
+#define  SDBUS_33  0x0200
+#define  TUNED18   0x01C0
+#define  TUNED12   0X0020
 #define PHY_IMR0x09
 #define PHY_BPCR   0x0A
+#define  IBRXSEL   0x0400
+#define  IBTXSEL   0x0100
+#define  IB_FILTER 0x0080
+#define  CMIRROR_EN0x0040
 #define PHY_BIST   0x0B
 #define PHY_RAW_L  0x0C
 #define PHY_RAW_H  0x0D
@@ -744,11 +769,29 @@
 #define PHY_BRNR2  0x17
 #define PHY_BENR   0x18
 #define PHY_REG_REV0x19
+#define  REG_REV_RESV  0xE000
+#define  RXIDLE_LATCHED0x1000
+#define  P1_EN 0x0800
+#define  RXIDLE_EN 0x0400
+#define  CLKREQ_DLY_TIMER_1_0  0x0040
+#define  STO

Re: [PATCH] perf mem: add priv level filtering support

2013-09-05 Thread Andi Kleen

> But my worry here is about consistency across tools for the single
> letter options, so perhaps if you could use:
> 
>  -U   collect only user level samples
>  -K   collect only kernel level samples

Support for this would be nice for perf stat too, to use with 
the implicit events (used by -d, soon -T etc.)

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only

[PATCH v2] mfd: rtsx: Modify rts5249_optimize_phy

2013-09-05 Thread wei_wang

From: Wei WANG 

v2:
Name the newly added register values

Wei WANG (1):
  mfd: rtsx: Modify rts5249_optimize_phy

 drivers/mfd/rts5249.c|   35 --
 include/linux/mfd/rtsx_pci.h |   43 ++
 2 files changed, 76 insertions(+), 2 deletions(-)

-- 
1.7.9.5


[PATCH 1/4] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v5

2013-09-05 Thread Andi Kleen

From: Andi Kleen 

With checkpointed counters there can be a situation where the counter
is overflowing, aborts the transaction, is set back to a non overflowing
checkpoint, and causes an interrupt. The interrupt doesn't see the overflow
because it has been checkpointed.  This is then a spurious PMI, typically with
an ugly NMI message.  It can also lead to excessive aborts.

Avoid this problem by:
- Using the full counter width for counting counters (earlier patch)
- Forbid sampling for checkpointed counters. It's not too useful anyways,
checkpointing is mainly for counting. The check is approximate
(to still handle KVM), but should catch the majority of cases.
- On a PMI always set back checkpointed counters to zero.

v2: Add unlikely. Add comment
v3: Allow large sampling periods with CP for KVM
v4: Use event_is_checkpointed. Use EOPNOTSUPP. (Stephane Eranian)
v5: Remove comment.
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 37 ++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index a45d8d4..91e3f8c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1134,6 +1134,11 @@ static void intel_pmu_enable_event(struct perf_event 
*event)
__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
 }
 
+static inline bool event_is_checkpointed(struct perf_event *event)
+{
+   return (event->hw.config & HSW_IN_TX_CHECKPOINTED) != 0;
+}
+
 /*
  * Save and restart an expired event. Called by NMI contexts,
  * so it has to be careful about preempting normal event ops:
@@ -1141,6 +1146,17 @@ static void intel_pmu_enable_event(struct perf_event 
*event)
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
x86_perf_event_update(event);
+   /*
+* For a checkpointed counter always reset back to 0.  This
+* avoids a situation where the counter overflows, aborts the
+* transaction and is then set back to shortly before the
+* overflow, and overflows and aborts again.
+*/
+   if (unlikely(event_is_checkpointed(event))) {
+   /* No race with NMIs because the counter should not be armed */
+   wrmsrl(event->hw.event_base, 0);
+   local64_set(&event->hw.prev_count, 0);
+   }
return x86_perf_event_set_period(event);
 }
 
@@ -1224,6 +1240,13 @@ again:
x86_pmu.drain_pebs(regs);
}
 
+   /*
+* To avoid spurious interrupts with perf stat always reset checkpointed
+* counters.
+*/
+   if (cpuc->events[2] && event_is_checkpointed(cpuc->events[2]))
+   status |= (1ULL << 2);
+
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
 
@@ -1689,6 +1712,20 @@ static int hsw_hw_config(struct perf_event *event)
  event->attr.precise_ip > 0))
return -EOPNOTSUPP;
 
+   if (event_is_checkpointed(event)) {
+   /*
+* Sampling of checkpointed events can cause situations where
+* the CPU constantly aborts because of a overflow, which is
+* then checkpointed back and ignored. Forbid checkpointing
+* for sampling.
+*
+* But still allow a long sampling period, so that perf stat
+* from KVM works.
+*/
+   if (event->attr.sample_period > 0 &&
+   event->attr.sample_period < 0x7fff)
+   return -EOPNOTSUPP;
+   }
return 0;
 }
 
-- 
1.8.3.1
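
The sampling restriction added to hsw_hw_config() above can be sketched in
user space as follows. This is only an illustration, not the kernel code:
check_checkpointed_period() and CP_MIN_SAMPLE_PERIOD are hypothetical names,
the threshold is copied from the hunk as posted, and the position of the
checkpointed bit (bit 33 of the config) is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the checkpointed qualifier is bit 33 of the event config. */
#define HSW_IN_TX_CHECKPOINTED (1ULL << 33)

/* Threshold taken from the hunk above; the name is hypothetical. */
#define CP_MIN_SAMPLE_PERIOD 0x7fffULL

/* Mirrors event_is_checkpointed(), but on a bare config value. */
static bool config_is_checkpointed(uint64_t config)
{
	return (config & HSW_IN_TX_CHECKPOINTED) != 0;
}

/*
 * Returns 0 if the event would be accepted and -EOPNOTSUPP if it is a
 * checkpointed event with a short, non-zero sampling period.
 * A period of 0 means pure counting and is always accepted.
 */
static int check_checkpointed_period(uint64_t config, uint64_t sample_period)
{
	if (config_is_checkpointed(config) &&
	    sample_period > 0 && sample_period < CP_MIN_SAMPLE_PERIOD)
		return -EOPNOTSUPP;
	return 0;
}
```

Long periods are still accepted so that counting from a KVM guest, which
programs a large period, keeps working, as the comment in the patch explains.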


[PATCH 2/4] perf, x86: Report TSX transaction abort cost as weight v3

2013-09-05 Thread Andi Kleen

From: Andi Kleen 

Use the existing weight reporting facility to report the transaction
abort cost, that is the number of cycles wasted in aborts.
Haswell reports this in the PEBS record.

This was in fact the original user for weight.

This is a very useful sort key to concentrate on the most
costly aborts and a good metric for TSX tuning.

v2: Add Peter's changes with minor modifications. More comments.
v3: Adjust white space.
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c | 55 +++
 1 file changed, 42 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 3065c57..d4ed99f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -182,16 +182,29 @@ struct pebs_record_nhm {
  * Same as pebs_record_nhm, with two additional fields.
  */
 struct pebs_record_hsw {
-   struct pebs_record_nhm nhm;
-   /*
-* Real IP of the event. In the Intel documentation this
-* is called eventingrip.
-*/
-   u64 real_ip;
-   /*
-* TSX tuning information field: abort cycles and abort flags.
-*/
-   u64 tsx_tuning;
+   u64 flags, ip;
+   u64 ax, bx, cx, dx;
+   u64 si, di, bp, sp;
+   u64 r8,  r9,  r10, r11;
+   u64 r12, r13, r14, r15;
+   u64 status, dla, dse, lat;
+   u64 real_ip; /* the actual eventing ip */
+   u64 tsx_tuning; /* TSX abort cycles and flags */
+};
+
+union hsw_tsx_tuning {
+   struct {
+   u32 cycles_last_block : 32,
+   hle_abort : 1,
+   rtm_abort : 1,
+   instruction_abort : 1,
+   non_instruction_abort : 1,
+   retry : 1,
+   data_conflict : 1,
+   capacity_writes   : 1,
+   capacity_reads: 1;
+   };
+   u64 value;
 };
 
 void init_debug_store_on_cpu(int cpu)
@@ -759,16 +772,26 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
return 0;
 }
 
+static inline u64 intel_hsw_weight(struct pebs_record_hsw *pebs)
+{
+   if (pebs->tsx_tuning) {
+   union hsw_tsx_tuning tsx = { .value = pebs->tsx_tuning };
+   return tsx.cycles_last_block;
+   }
+   return 0;
+}
+
 static void __intel_pmu_pebs_event(struct perf_event *event,
   struct pt_regs *iregs, void *__pebs)
 {
/*
 * We cast to pebs_record_nhm to get the load latency data
 * if extra_reg MSR_PEBS_LD_LAT_THRESHOLD used
+* We cast to the biggest PEBS record are careful not
+* to access out-of-bounds members.
 */
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-   struct pebs_record_nhm *pebs = __pebs;
-   struct pebs_record_hsw *pebs_hsw = __pebs;
+   struct pebs_record_hsw *pebs = __pebs;
struct perf_sample_data data;
struct pt_regs regs;
u64 sample_type;
@@ -827,7 +850,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
regs.sp = pebs->sp;
 
if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
-   regs.ip = pebs_hsw->real_ip;
+   regs.ip = pebs->real_ip;
regs.flags |= PERF_EFLAGS_EXACT;
} else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
regs.flags |= PERF_EFLAGS_EXACT;
@@ -838,6 +861,12 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
x86_pmu.intel_cap.pebs_format >= 1)
data.addr = pebs->dla;
 
+   /* Only set the TSX weight when no memory weight was requested. */
+   if ((event->attr.sample_type & PERF_SAMPLE_WEIGHT) &&
+   !fll &&
+   (x86_pmu.intel_cap.pebs_format >= 2))
+   data.weight = intel_hsw_weight(pebs);
+
if (has_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
 
-- 
1.8.3.1
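
For readers who want to decode the tsx_tuning field outside the kernel, here
is a small sketch using explicit shifts instead of bitfields. The helper names
are hypothetical, and the flag bit positions are an assumption: they simply
follow the field order of union hsw_tsx_tuning above, with the abort cycles in
the low 32 bits.

```c
#include <stdint.h>

/* Assumed bit positions, following the field order in union hsw_tsx_tuning. */
enum tsx_abort_flag {
	TSX_HLE_ABORT = 32,
	TSX_RTM_ABORT = 33,
	TSX_INSTRUCTION_ABORT = 34,
	TSX_NON_INSTRUCTION_ABORT = 35,
	TSX_RETRY = 36,
	TSX_DATA_CONFLICT = 37,
	TSX_CAPACITY_WRITES = 38,
	TSX_CAPACITY_READS = 39,
};

/* Abort cost in cycles: the value the patch reports as the sample weight. */
static uint32_t tsx_abort_cycles(uint64_t tsx_tuning)
{
	return (uint32_t)(tsx_tuning & 0xffffffffULL);
}

/* Returns 1 if the given abort flag is set in the raw field, 0 otherwise. */
static int tsx_flag_set(uint64_t tsx_tuning, enum tsx_abort_flag flag)
{
	return (int)((tsx_tuning >> flag) & 1);
}
```

Masks and shifts avoid depending on how a compiler lays out bitfields, which
is implementation-defined in C.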


[PATCH 3/4] perf, x86: Add Haswell TSX event aliases v6

2013-09-05 Thread Andi Kleen

From: Andi Kleen 

Add TSX event aliases, and export them from the kernel to perf.

These are used by perf stat -T and to allow
more user friendly access to events. The events are designed to
be fairly generic and may also apply to other architectures
implementing HTM.  They all cover common situations that
happen during tuning of transactional code.

For Haswell we have to separate the HLE and RTM events,
as they are separate in the PMU.

This adds the following events.

tx-start     Count start transaction (used by perf stat -T)
tx-commit    Count commit of transaction
tx-abort     Count all aborts
tx-conflict  Count aborts due to conflict with another CPU.
tx-capacity  Count capacity aborts (transaction too large)

Then matching el-* events for HLE

cycles-t     Transactional cycles (used by perf stat -T)
             * also exists on POWER8
cycles-ct    Transactional cycles committed (used by perf stat -T)
             * according to Michael Ellerman POWER8 has a cycles-transactional-committed,
             * perf stat -T handles both cases

Note that useful abort profiling often requires precise to be set,
as Haswell can only report the point inside the transaction
with precise=2.

(I had another patchkit to allow exporting precise too, but Vince
Weaver pointed out it violates the ABI, so dropped now)

For some classes of aborts, like conflicts, this is not needed,
as it makes more sense to look at the complete critical section.

This gives a clean set of generalized events to examine transaction
success and aborts. Haswell has additional events for TSX, but those are more
specialized for very specific situations.

v2: Move to new sysfs infrastructure
v3: Use own sysfs functions now
v4: Add tx/el-abort-return for better conflict sampling
v5: Different white space.
v6: Cut down events, rewrite description.
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 91e3f8c..da58663 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2074,7 +2074,34 @@ static __init void intel_nehalem_quirk(void)
 EVENT_ATTR_STR(mem-loads,  mem_ld_hsw, "event=0xcd,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-stores, mem_st_hsw, "event=0xd0,umask=0x82")
 
+/* Haswell special events */
+EVENT_ATTR_STR(tx-start,tx_start,   "event=0xc9,umask=0x1");
+EVENT_ATTR_STR(tx-commit,   tx_commit,  "event=0xc9,umask=0x2");
+EVENT_ATTR_STR(tx-abort,tx_abort,  "event=0xc9,umask=0x4");
+EVENT_ATTR_STR(tx-capacity, tx_capacity,   "event=0x54,umask=0x2");
+EVENT_ATTR_STR(tx-conflict, tx_conflict,   "event=0x54,umask=0x1");
+EVENT_ATTR_STR(el-start,el_start,   "event=0xc8,umask=0x1");
+EVENT_ATTR_STR(el-commit,   el_commit,  "event=0xc8,umask=0x2");
+EVENT_ATTR_STR(el-abort,el_abort,  "event=0xc8,umask=0x4");
+EVENT_ATTR_STR(el-capacity, el_capacity,"event=0x54,umask=0x2");
+EVENT_ATTR_STR(el-conflict, el_conflict,"event=0x54,umask=0x1");
+EVENT_ATTR_STR(cycles-t,cycles_t,   "event=0x3c,in_tx=1");
+EVENT_ATTR_STR(cycles-ct,   cycles_ct,
+   "event=0x3c,in_tx=1,in_tx_cp=1");
+
 static struct attribute *hsw_events_attrs[] = {
+   EVENT_PTR(tx_start),
+   EVENT_PTR(tx_commit),
+   EVENT_PTR(tx_abort),
+   EVENT_PTR(tx_capacity),
+   EVENT_PTR(tx_conflict),
+   EVENT_PTR(el_start),
+   EVENT_PTR(el_commit),
+   EVENT_PTR(el_abort),
+   EVENT_PTR(el_capacity),
+   EVENT_PTR(el_conflict),
+   EVENT_PTR(cycles_t),
+   EVENT_PTR(cycles_ct),
EVENT_PTR(mem_ld_hsw),
EVENT_PTR(mem_st_hsw),
NULL
-- 
1.8.3.1
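
As a rough illustration of what the alias strings above expand to, the sketch
below builds a raw config value from the event, umask, in_tx and in_tx_cp
terms. hsw_raw_config() is a hypothetical helper, and the field layout (event
in bits 0-7, umask in bits 8-15, in_tx at bit 32, in_tx_cp at bit 33) is an
assumption about the PMU format attributes rather than something stated in
this patch.

```c
#include <stdint.h>

/* Assumed layout of the raw config value; see the note above. */
#define CFG_UMASK_SHIFT 8
#define CFG_IN_TX       (1ULL << 32)
#define CFG_IN_TX_CP    (1ULL << 33)

/* Combine the terms of an alias string into one raw config value. */
static uint64_t hsw_raw_config(uint8_t event, uint8_t umask,
			       int in_tx, int in_tx_cp)
{
	uint64_t config = (uint64_t)event |
			  ((uint64_t)umask << CFG_UMASK_SHIFT);

	if (in_tx)
		config |= CFG_IN_TX;
	if (in_tx_cp)
		config |= CFG_IN_TX_CP;
	return config;
}
```

Under these assumptions tx-start ("event=0xc9,umask=0x1") becomes 0x1c9 and
cycles-ct ("event=0x3c,in_tx=1,in_tx_cp=1") becomes 0x30000003c.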


perf, x86: Add parts of the remaining haswell PMU functionality v5

2013-09-05 Thread Andi Kleen

I hope this version is ok for everyone now.

[v2: Added Peter's changes to the PEBS handler]
[v3: Addressed Arnaldo's feedback for the perf stat -T change
 and avoid conflict]
[v4: Remove XXX comment in checkpoint patch.
 Add Arnaldo's ack for tools patch]
[v5: Some white space adjustments]

Add some more TSX functionality to the basic Haswell PMU.

A lot of the infrastructure needed for these patches has
been merged earlier, so it is all quite straightforward
now.

- Add the checkpointed counter workaround.
  (Parts of this have been already merged earlier)
- Add support for reporting PEBS transaction abort cost as weight.
  This is useful to judge the cost of aborts and concentrate
  on expensive ones first.
  (Large parts of this have been already merged earlier,
  this is just adding the final few lines to the PEBS handler)
- Add TSX event aliases, needed for perf stat -T and general
  usability.
  (Infrastructure also already in)
- Add perf stat -T support to give a user-friendly high-level
  counting frontend for transactions.
  This version should also be usable for POWER8 eventually.

Not included:

Support for transaction flags and TSX LBR flags.

-Andi

[PATCH 4/4] perf, tools: Add perf stat --transaction v5

2013-09-05 Thread Andi Kleen

From: Andi Kleen 

Add support to perf stat to print the basic transactional execution statistics:
total cycles, cycles in transactions, and cycles in aborted transactions
(using the in_tx and in_tx_checkpoint qualifiers), plus transaction starts
and elision starts, to compute the average transaction length.

This is a reasonable overview over the success of the transactions.
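
The derived figures described above can be sketched like this. It is only an
illustration of the arithmetic, not the code in builtin-stat.c: struct
tx_summary and tx_summarize() are hypothetical names, and treating aborted
cycles as cycles-t minus cycles-ct is an assumption based on the description
of the two events.

```c
#include <stdint.h>

/* Hypothetical summary of the -T counters. */
struct tx_summary {
	double pct_transactional;      /* cycles-t as a share of cycles */
	double pct_aborted;            /* (cycles-t - cycles-ct) as a share of cycles */
	double cycles_per_transaction; /* cycles-t / tx-start */
	double cycles_per_elision;     /* cycles-t / el-start */
};

/* Division that yields 0 instead of dividing by zero. */
static double ratio(double num, double den)
{
	return den > 0.0 ? num / den : 0.0;
}

static struct tx_summary tx_summarize(uint64_t cycles, uint64_t cycles_in_tx,
				      uint64_t cycles_in_tx_cp,
				      uint64_t tx_start, uint64_t el_start)
{
	struct tx_summary s;
	/* Clamp at 0 so counters read in separate groups cannot go negative. */
	uint64_t aborted = cycles_in_tx > cycles_in_tx_cp ?
			   cycles_in_tx - cycles_in_tx_cp : 0;

	s.pct_transactional = ratio(100.0 * (double)cycles_in_tx, (double)cycles);
	s.pct_aborted = ratio(100.0 * (double)aborted, (double)cycles);
	s.cycles_per_transaction = ratio((double)cycles_in_tx, (double)tx_start);
	s.cycles_per_elision = ratio((double)cycles_in_tx, (double)el_start);
	return s;
}
```

The clamp matters because the subtraction is only meaningful when the events
are measured together in one group.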

Also support architectures that have a transaction aborted cycles
counter like POWER8. Since that is awkward to abstract in the kernel,
handle both cases here.

Enable with a new --transaction / -T option.

This requires measuring these events in a group, since they depend on each
other.

This is implemented by using TM sysfs events exported by the kernel

v2: Only print the extended statistics when the option is enabled.
This avoids negative output when the user specifies the -T events
in separate groups.
v3: Port to latest tree
v4: Remove merge error. Avoid linear walks for comparisons. Check
transaction_run earlier. Minor fixes.
v5: Move option to avoid conflict. Improve description.
Acked-by: Arnaldo Carvalho de Melo 
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-stat.txt |   5 ++
 tools/perf/builtin-stat.c  | 144 -
 tools/perf/util/evsel.h|   6 ++
 tools/perf/util/pmu.c  |  16 
 tools/perf/util/pmu.h  |   1 +
 5 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 2fe87fb..40bc65a 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -132,6 +132,11 @@ is a useful mode to detect imbalance between physical cores.  To enable this mod
 use --per-core in addition to -a. (system-wide).  The output includes the
 core number and the number of online logical processors on that physical processor.
 
+-T::
+--transaction::
+
+Print statistics of transactional execution if supported.
+
 EXAMPLES
 
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 352fbd7..6bd90e4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -46,6 +46,7 @@
 #include "util/util.h"
 #include "util/parse-options.h"
 #include "util/parse-events.h"
+#include "util/pmu.h"
 #include "util/event.h"
 #include "util/evlist.h"
 #include "util/evsel.h"
@@ -70,6 +71,41 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix);
 static void print_counter(struct perf_evsel *counter, char *prefix);
 static void print_aggr(char *prefix);
 
+/* Default events used for perf stat -T */
+static const char * const transaction_attrs[] = {
+   "task-clock",
+   "{"
+   "instructions,"
+   "cycles,"
+   "cpu/cycles-t/,"
+   "cpu/tx-start/,"
+   "cpu/el-start/,"
+   "cpu/cycles-ct/"
+   "}"
+};
+
+/* More limited version when the CPU does not have all events. */
+static const char * const transaction_limited_attrs[] = {
+   "task-clock",
+   "{"
+   "instructions,"
+   "cycles,"
+   "cpu/cycles-t/,"
+   "cpu/tx-start/"
+   "}"
+};
+
+/* must match transaction_attrs and the beginning limited_attrs */
+enum {
+   T_TASK_CLOCK,
+   T_INSTRUCTIONS,
+   T_CYCLES,
+   T_CYCLES_IN_TX,
+   T_TRANSACTION_START,
+   T_ELISION_START,
+   T_CYCLES_IN_TX_CP,
+};
+
 static struct perf_evlist  *evsel_list;
 
 static struct perf_target  target = {
@@ -90,6 +126,7 @@ static enum aggr_mode aggr_mode = AGGR_GLOBAL;
 static volatile pid_t child_pid = -1;
 static bool null_run = false;
 static int detailed_run = 0;
+static bool transaction_run;
 static bool big_num = true;
 static int big_num_opt = -1;
 static const char *csv_sep = NULL;
@@ -213,7 +250,10 @@ static struct stats runtime_l1_icache_stats[MAX_NR_CPUS];
 static struct stats runtime_ll_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_itlb_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_dtlb_cache_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_in_tx_stats[MAX_NR_CPUS];
 static struct stats walltime_nsecs_stats;
+static struct stats runtime_transaction_stats[MAX_NR_CPUS];
+static struct stats runtime_elision_stats[MAX_NR_CPUS];
 
 static void perf_stat__reset_stats(struct perf_evlist *evlist)
 {
@@ -235,6 +275,11 @@ static void perf_stat__reset_stats(struct perf_evlist *evlist)
memset(runtime_ll_cache_stats, 0, sizeof(runtime_ll_cache_stats));
memset(runtime_itlb_cache_stats, 0, sizeof(runtime_itlb_cache_stats));
memset(runtime_dtlb_cache_stats, 0, sizeof(runtime_dtlb_cache_stats));
+   memset(runtime_

Re: soft lockup in sysvipc code.

2013-09-05 Thread Lin Ming

On Thu, Sep 5, 2013 at 5:50 AM, Dave Jones  wrote:
> Haven't seen this before.
> Tree based on v3.11-3104-gf357a82
>
> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:25479]

I can't imagine how it could happen.
As I understand it, a "soft lockup" happens when code gets stuck
somewhere with preemption disabled.

Looking at the code, preemption is disabled at:
sysvipc_proc_next -> sysvipc_find_ipc -> ipc_lock_by_ptr

and enabled at:
sysvipc_proc_next -> ipc_unlock
or
sysvipc_proc_stop -> ipc_unlock

I didn't find anywhere the code could get stuck in that path.
I may be missing something.

Regards,
Lin Ming

> Modules linked in: sctp snd_seq_dummy fuse dlci rfcomm tun bnep hidp ipt_ULOG 
> nfnetlink can_raw can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 
> phonet af_rxrpc bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda 
> crc_ccitt rds af_key rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc 
> ax25 xfs snd_hda_codec_realtek libcrc32c snd_hda_intel snd_hda_codec 
> snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd 
> soundcore pcspkr usb_debug e1000e ptp pps_core
> irq event stamp: 1143030
> hardirqs last  enabled at (1143029): [] 
> restore_args+0x0/0x30
> hardirqs last disabled at (1143030): [] 
> apic_timer_interrupt+0x6a/0x80
> softirqs last  enabled at (1143028): [] 
> __do_softirq+0x198/0x460
> softirqs last disabled at (1143023): [] irq_exit+0x135/0x150
> CPU: 0 PID: 25479 Comm: trinity-child0 Not tainted 3.11.0+ #44
> task: 88022c013f90 ti: 88022bd8c000 task.ti: 88022bd8c000
> RIP: 0010:[]  [] 
> idr_find_slowpath+0x9b/0x150
> RSP: 0018:88022bd8dc88  EFLAGS: 0206
> RAX: 0006 RBX: 000a6c0a RCX: 0008
> RDX: 0008 RSI: 81c41040 RDI: 88022c014668
> RBP: 88022bd8dca0 R08:  R09: 
> R10: 0001 R11: 0001 R12: 88023831a290
> R13: 0001 R14: 88022bd8dbe8 R15: 8802449d
> FS:  7fcfcad2c740() GS:88024480() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fcfc84cb968 CR3: 0001de93f000 CR4: 001407f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Stack:
>  0260 2dba 81c7e258 88022bd8dcf8
>  812b1131 88022c013f90 8801d37174c0 88022bd8dd38
>  81c7e2f0 88022bd8dd38 8801e065cec8 880241d86ca8
> Call Trace:
>  [] sysvipc_find_ipc+0x61/0x300
>  [] sysvipc_proc_next+0x46/0xd0
>  [] traverse.isra.7+0xc9/0x260
>  [] ? lock_release_non_nested+0x308/0x350
>  [] seq_read+0x3e1/0x450
>  [] ? proc_reg_write+0x80/0x80
>  [] proc_reg_read+0x3d/0x80
>  [] do_loop_readv_writev+0x63/0x90
>  [] do_readv_writev+0x21d/0x240
>  [] ? local_clock+0x3f/0x50
>  [] ? context_tracking_user_exit+0x46/0x1a0
>  [] vfs_readv+0x35/0x60
>  [] SyS_preadv+0xa2/0xd0
>  [] tracesys+0xdd/0xe2
> Code: 7e 6e 41 8b 84 24 2c 08 00 00 83 eb 08 c1 e0 03 39 c3 0f 85 c1 00 00 00 
> 89 d9 44 89 e8 d3 f8 0f b6 c0 48 83 c0 04 4d 8b 64 c4 08  80 b4 d6 ff 85 
> c0 74 c4 80 3d f7 2f 9d 00 00 75 bb e8 6e b4

[PATCH] mm/mmap.c: Remove unnecessary pgoff assignment

2013-09-05 Thread Zhang Yanfei

We never access the variable pgoff later, so the assignment is
redundant. Remove it.

Signed-off-by: Zhang Yanfei 
---
 mm/mmap.c |1 - 
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index f9c97d1..db44f6a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1570,7 +1570,6 @@ munmap_back:
WARN_ON_ONCE(addr != vma->vm_start);
 
addr = vma->vm_start;
-   pgoff = vma->vm_pgoff;
vm_flags = vma->vm_flags;
} else if (vm_flags & VM_SHARED) {
if (unlikely(vm_flags & (VM_GROWSDOWN|VM_GROWSUP)))
-- 
1.7.1

Re: linux-next: back merge of Linus' tree into the vfio tree

2013-09-05 Thread Stephen Rothwell

Hi Alex,

On Thu, 05 Sep 2013 17:14:29 -0600 Alex Williamson wrote:
>
> On Fri, 2013-09-06 at 09:08 +1000, Stephen Rothwell wrote:
> > 
> > I noticed that you have back merged Linus' tree into yours.  Linus
> > usually takes a dim view of that - especially when there is no
> > explanation in the merge commit message.  i.e. you shouldn't do that
> > unless you really need to - and then you should explain why you did it.
> 
> Hmm, I was hoping that wouldn't be a problem, especially with no
> conflicts in the merge.  I did it because the first commit after the
> merge in my next tree depends on PCI changes that have already been
> merged by Linus.  Re-basing is an even bigger sin and I felt it better
> to do a merge than ask for two pulls or add an unbuild-able commit to my
> next tree.  How do you suggest that I resolve this?

See above ... you should have said all that in the merge commit message.
I guess that you should just own it now and explain it to Linus when you
ask him to pull your tree.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [PATCH 10/11] x86, mem-hotplug: Support initialize page tables from low to high.

2013-09-05 Thread Tang Chen


Hi Wanpeng,

On 09/06/2013 10:16 AM, Wanpeng Li wrote:
..

+#ifdef CONFIG_MOVABLE_NODE
+   unsigned long kernel_end;
+
+   if (movablenode_enable_srat&&
+   memblock.current_order == MEMBLOCK_ORDER_LOW_TO_HIGH) {


I think memblock.current_order == MEMBLOCK_ORDER_LOW_TO_HIGH is always
true if config MOVABLE_NODE and movablenode_enable_srat == true if PATCH
11/11 is applied.


memblock.current_order == MEMBLOCK_ORDER_LOW_TO_HIGH is true here if
MOVABLE_NODE is configured, and it will be reset after SRAT is parsed. But
movablenode_enable_srat can only be true when users specify the movablenode
boot option on the kernel command line.


You are right.

I mean the change should be:

+#ifdef CONFIG_MOVABLE_NODE
+   unsigned long kernel_end;
+
+   if (movablenode_enable_srat) {

It is unnecessary to check memblock.current_order since it is always true
if movable_node is configured and movablenode_enable_srat is true.



But I think, memblock.current_order is set outside init_mem_mapping(). And
the path in the if statement could only be run when current order is from
low to high. So I think it is safe to check it here.

I prefer to keep it at least in the next version patch-set. If others also
think it is unnecessary, I'm OK with removing the checking. :)

Thanks. :)

Re: [PATCH] net: stmmac: fix bad merge conflict resolution

2013-09-05 Thread David Miller

From: Olof Johansson 
Date: Thu,  5 Sep 2013 18:01:41 -0700

> Merge commit 06c54055bebf919249aa1eb68312887c3cfe77b4 did a bad conflict
> resolution accidentally leaving out a closing brace. Add it back.
> 
> Signed-off-by: Olof Johansson 
> ---
> 
> This breaks a handful of defconfigs on ARM, so it'd be good to see it
> applied pretty quickly. Thanks!

Looks like Linus applied this, thanks Olof.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT] Sparc

2013-09-05 Thread David Miller

From: Sergei Shtylyov 
Date: Fri, 06 Sep 2013 02:32:51 +0400

> Hello.
> 
> On 09/06/2013 12:44 AM, David Miller wrote:
> 
>> Several bug fixes (from Kirill Tkhai, Geert Uytterhoeven, and Alexey
>> Dobriyan) and some support for Fujitsu sparc64x chips (from Allen
>> Pais).
> 
>> Please pull, thanks a lot!
> 
>You meant that for 'linux-sparc', not 'linux-ide', right? :-)

Yes, sparclinux is the intended destination, and I forwarded it there
once I realized my mistake :-)

[git pull] Please pull powerpc.git next branch

2013-09-05 Thread Benjamin Herrenschmidt

Hi Linus !

Here's the powerpc batch for this merge window. Some of the highlights are:

 * A bunch of endian fixes ! We don't have full LE support yet in that
release but this contains a lot of fixes all over arch/powerpc to use the
proper accessors, call the firmware with the right endian mode, etc...

 * A few updates to our "powernv" platform (non-virtualized, the one
to run KVM on), among other things, support for bridging the P8 LPC bus
for UARTs, and some EEH fixes.
 
 * Some mpc51xx clock API cleanups in preparation for a clock API overhaul

 * A pile of cleanups of our old math emulation code, including better
support for using it to emulate optional FP instructions on embedded
chips that otherwise have a HW FPU.

 * Some infrastructure in selftest, for powerpc now, but could be generalized,
initially used by some tests for our perf instruction counting code.

 * A pile of fixes for hotplug on pseries (that was seriously bitrotting)

 * The usual slew of freescale embedded updates, new boards, 64-bit hibernation
support, e6500 core PMU support, etc...

Cheers,
Ben.

The following changes since commit d4e4ab86bcba5a72779c43dc1459f71fea3d89c8:

  Linux 3.11-rc5 (2013-08-11 18:04:20 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git next

for you to fetch changes up to 9f24b0c9ef9b6b1292579c9e2cd7ff07ddc372b7:

  powerpc: Correct FSCR bit definitions (2013-09-05 17:29:20 +1000)


Alistair Popple (4):
  powerpc: More little endian fixes for prom.c
  powerpc: More little endian fixes for setup-common.c
  powerpc: Little endian fixes for legacy_serial.c
  powerpc: Make NUMA device node code endian safe

Andy Fleming (2):
  powerpc: Add smp_generic_cpu_bootable
  powerpc: Convert platforms to smp_generic_cpu_bootable

Anton Blanchard (29):
  powerpc: Align p_toc
  powerpc: Handle unaligned ldbrx/stdbrx
  powerpc: Wrap MSR macros with parentheses
  powerpc: Remove SAVE_VSRU and REST_VSRU macros
  powerpc: Simplify logic in include/uapi/asm/elf.h
  powerpc/pseries: Simplify H_GET_TERM_CHAR
  powerpc: Fix a number of sparse warnings
  powerpc/pci: Don't use bitfield for force_32bit_msi
  powerpc: Stop using non-architected shared_proc field in lppaca
  powerpc: Make RTAS device tree accesses endian safe
  powerpc: Make cache info device tree accesses endian safe
  powerpc: Make RTAS calls endian safe
  powerpc: Make logical to real cpu mapping code endian safe
  powerpc: Add some endian annotations to time and xics code
  powerpc: Fix some endian issues in xics code
  powerpc: of_parse_dma_window should take a __be32 *dma_window
  powerpc: Make device tree accesses in cache info code endian safe
  powerpc: Make device tree accesses in HVC VIO console endian safe
  powerpc: Make device tree accesses in VIO subsystem endian safe
  powerpc: Make OF PCI device tree accesses endian safe
  powerpc: Make PCI device node device tree accesses endian safe
  powerpc: Add endian annotations to lppaca, slb_shadow and dtl_entry
  powerpc: Fix little endian lppaca, slb_shadow and dtl_entry
  powerpc: Emulate instructions in little endian mode
  powerpc: Little endian SMP IPI demux
  powerpc/pseries: Fix endian issues in H_GET_TERM_CHAR/H_PUT_TERM_CHAR
  powerpc: Fix little endian coredumps
  powerpc: Make rwlocks endian safe
  powerpc: Never handle VSX alignment exceptions from kernel

Benjamin Herrenschmidt (21):
  Merge remote-tracking branch 'scott/next' into next
  powerpc/pmac: Early debug output on screen on 64-bit macs
  powerpc: Better split CONFIG_PPC_INDIRECT_PIO and CONFIG_PPC_INDIRECT_MMIO
  powerpc/powernv: Update opal.h to add new LPC and XSCOM functions
  powerpc/powernv: Add helper to get ibm,chip-id of a node
  powerpc/powernv: Add PIO accessors for Power8 LPC bus
  powerpc: Cleanup udbg_16550 and add support for LPC PIO-only UARTs
  powerpc: Check "status" property before adding legacy ISA serial ports
  powerpc/powernv: Don't crash if there are no OPAL consoles
  powerpc/powernv: Enable detection of legacy UARTs
  Revert "powerpc/e500: Update compilation flags with core specific options"
  powerpc: Make prom_init.c endian safe
  powerpc/wsp: Fix early debug build
  Merge remote-tracking branch 'scott/next' into next
  Merge branch 'merge' into next
  powerpc/btext: Fix CONFIG_PPC_EARLY_DEBUG_BOOTX on ppc32
      powerpc: Don't Oops when accessing /proc/powerpc/lparcfg without hypervisor
  powerpc/powernv: Return secondary CPUs to firmware on kexec
  Merge branch 'merge' into next
  powerpc/pseries: Move lparcfg.c to platforms/pseries
  Merge remote-tracking branch 'agust/next' into next

Catalin Udma (2):
  powerpc/perf: increase the perf HW events to 6

Re: [PATCH v2 4/4] kernel: add support for init_array constructors

2013-09-05 Thread Rusty Russell

Frantisek Hrbata  writes:
> This adds the .init_array section as yet another section with constructors.
> This is needed because gcc could add __gcov_init calls to .init_array or .ctors
> section, depending on gcc version.
>
> v2: - reuse mod->ctors for .init_array section for modules, because gcc uses
>   .ctors or .init_array, but not both at the same time
>
> Signed-off-by: Frantisek Hrbata 

Might be nice to document which gcc version changed this, so people can
choose whether to cherry-pick this change?

Acked-by: Rusty Russell 

> ---
>  include/asm-generic/vmlinux.lds.h | 1 +
>  kernel/module.c   | 3 +++
>  2 files changed, 4 insertions(+)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 69732d2..c55d8d9 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -468,6 +468,7 @@
>  #define KERNEL_CTORS()   . = ALIGN(8);  \
>   VMLINUX_SYMBOL(__ctors_start) = .; \
>   *(.ctors)  \
> + *(.init_array) \
>   VMLINUX_SYMBOL(__ctors_end) = .;
>  #else
>  #define KERNEL_CTORS()
> diff --git a/kernel/module.c b/kernel/module.c
> index 2069158..bbbd953 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2760,6 +2760,9 @@ static void find_module_sections(struct module *mod, struct load_info *info)
>  #ifdef CONFIG_CONSTRUCTORS
>   mod->ctors = section_objs(info, ".ctors",
> sizeof(*mod->ctors), &mod->num_ctors);
> + if (!mod->ctors)
> + mod->ctors = section_objs(info, ".init_array",
> + sizeof(*mod->ctors), &mod->num_ctors);
>  #endif
>  
>  #ifdef CONFIG_TRACEPOINTS
> -- 
> 1.8.3.1
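
For context, the constructor mechanism the patch hooks into can be seen in a
small user-space sketch. This assumes GCC or Clang on an ELF target;
example_ctor() and constructor_has_run() are hypothetical names. The compiler
records a pointer to each constructor in .ctors or .init_array, depending on
the toolchain version, and the startup code calls every entry before main().

```c
/* Set by the constructor below; stays 0 if constructors are never called. */
static int ctor_ran;

/*
 * The toolchain places a pointer to this function in .ctors or
 * .init_array (which one depends on the gcc version and configuration).
 */
__attribute__((constructor))
static void example_ctor(void)
{
	ctor_ran = 1;
}

/* Returns 1 once the startup code has walked the constructor table. */
static int constructor_has_run(void)
{
	return ctor_ran;
}
```

The kernel has no C runtime to walk that table, which is why the linker
script and the module loader must collect whichever section gcc emitted.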

Re: [PATCH v2 1/1] dcache: Translating dentry into pathname without taking rename_lock

2013-09-05 Thread Waiman Long


On 09/05/2013 04:42 PM, Linus Torvalds wrote:

On Thu, Sep 5, 2013 at 1:29 PM, Waiman Long  wrote:

It is not as simple as doing a strncpy().

Yes it damn well is.

Stop the f*cking stupid arguments, and instead listen to what I say.

Here. Let me bold-face the most important part for you, so that you
don't miss it in all the other crap:

MAKE prepend() JUST USE "strncpy()" INSTEAD OF "memcpy()".

Nothing else. Seriously. Your "you can't do it because we copy
backwards" arguments are pure and utter garbage, exactly BECAUSE YOU
DON'T CHANGE ANY OF THAT. You can actually use the unreliable length
variable BUT YOU MUST STILL STOP AT A ZERO.

Get it?

You're complicating the whole thing for no good reason. I'm telling
you (and HAVE BEEN telling you multiple times) that you cannot use
"memcpy()" because the length may not be reliable, so you need to
check for zero in the middle and stop early. All your arguments have
been totally pointless, because you don't seem to see that simple and
fundamental issue. You don't change ANYTHING else. But you damn well
not do a "memcpy", you do something that stops when it hits a NUL
character.

We call that function "strncpy()". I'd actually prefer to write it out
by hand (because somebody could implement "strncpy()" as a
questionable function that accesses past the NUL as long as it's
within the 'n'), and because I think we might want to do that
word-at-a-time version of it, but for a first approximation, just do
that one-liner version.

Don't do anything else. Don't do locking. Don't do memchr. Just make
sure that you stop at a NUL character, and don't trust the length,
because the length may not match the pointer. That was always ALL
you needed to do.

   Linus

I am sorry that I misunderstood what you said. I will do what you and Al
advise me to do.


-Longman
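
The advice above can be sketched in userspace C. This is only a minimal model, assuming a simplified prepend() that works on a plain character buffer; prepend_name() and its parameters are illustrative names, not the actual fs/dcache.c code. It still reserves namelen bytes and copies into the reserved slot as before, but it stops at the first NUL instead of trusting the length:

```c
#include <stddef.h>

/*
 * Hypothetical model of prepend(): reserve namelen bytes in front of
 * *buffer and copy name into them, stopping early at a NUL byte
 * because namelen may not match the string that name points to.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int prepend_name(char **buffer, int *buflen,
			const char *name, int namelen)
{
	char *p;
	int i;

	*buflen -= namelen;
	if (*buflen < 0)
		return -1;

	p = *buffer -= namelen;
	for (i = 0; i < namelen; i++) {
		char c = name[i];

		if (c == '\0')	/* stale length: stop, do not read further */
			break;
		p[i] = c;
	}
	return 0;
}
```

When the loop stops early, the rest of the reserved slot is left untouched. In this sketch it is assumed that the caller detects the concurrent rename by some other means and retries.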

Hope and your long-term cooperation

2013-09-05 Thread jenny

Dear Manager: 

Glad to write to you. 

We are manufacturer of  Stearic acid from China,
We have Zinc stearate, calcium stearate, magnesium stearate, etc.
If you need such chemicals, please do not hesitate to contact me.

 
Best Regards,





  
Shijiazhuang Shinearly Chemicals Co.,Ltd

No. 105 Yellow River Road, Hightech Zone, Shijiazhuang City, Hebei Province, 
China

Tel:  0086-311-89809275

Fax:  
0086-311-67795015

Re: ftrace 'failed to modify' bug when loading reiserfs.ko

2013-09-05 Thread Steven Rostedt

On Thu, 5 Sep 2013 21:48:59 -0400
Dave Jones  wrote:

> On Thu, Sep 05, 2013 at 09:44:55PM -0400, Steven Rostedt wrote:
>  > On Thu, 5 Sep 2013 21:34:55 -0400
>  > Dave Jones  wrote:
>  > 
>  > > On Thu, Sep 05, 2013 at 09:28:34PM -0400, Steven Rostedt wrote:

>  > Did you change a config option, or update your gcc?
> 
> Yeah, changed CONFIG_DEBUG_KOBJECT, which rebuilt the world.

Still doesn't explain why it gave you that splat there.

Do you still have that binary module, and can you show me what's at
reiserfs_init_bitmap_cache+0x0 with objdump?

-- Steve

[for-next][PATCH 4/4] ftrace/rcu: Do not trace debug_lockdep_rcu_enabled()

2013-09-05 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

The function debug_lockdep_rcu_enabled() is part of the RCU lockdep
debugging, and is called very frequently. I found that if I enable
a lot of debugging and run the function graph tracer, this
function can cause a live lock of the system.

We don't usually trace lockdep infrastructure, no need to trace
this either.

Reviewed-by: Paul E. McKenney 
Signed-off-by: Steven Rostedt 
---
 kernel/rcupdate.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index cce6ba8..4f20c6c 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -122,7 +122,7 @@ struct lockdep_map rcu_sched_lock_map =
STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_sched", &rcu_sched_lock_key);
 EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
 
-int debug_lockdep_rcu_enabled(void)
+int notrace debug_lockdep_rcu_enabled(void)
 {
return rcu_scheduler_active && debug_locks &&
   current->lockdep_recursion == 0;
-- 
1.7.10.4



[for-next][PATCH 1/4] tracing: Make tracing_cpumask available for all instances

2013-09-05 Thread Steven Rostedt

From: Alexander Z Lam 

Allow tracer instances to disable tracing by cpu by moving
the static global tracing_cpumask into trace_array.

Link: 
http://lkml.kernel.org/r/921622317f239bfc2283cac2242647801ef584f2.1375980149.git@google.com

Cc: Vaibhav Nagarnaik 
Cc: David Sharp 
Cc: Alexander Z Lam 
Signed-off-by: Alexander Z Lam 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c |   37 -
 kernel/trace/trace.h |1 +
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 496f94d..7974ba2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3166,11 +3166,6 @@ static const struct file_operations show_traces_fops = {
 };
 
 /*
- * Only trace on a CPU if the bitmask is set:
- */
-static cpumask_var_t tracing_cpumask;
-
-/*
  * The tracer itself will not take this lock, but still we want
  * to provide a consistent cpumask to user-space:
  */
@@ -3186,11 +3181,12 @@ static ssize_t
 tracing_cpumask_read(struct file *filp, char __user *ubuf,
 size_t count, loff_t *ppos)
 {
+   struct trace_array *tr = file_inode(filp)->i_private;
int len;
 
mutex_lock(&tracing_cpumask_update_lock);
 
-   len = cpumask_scnprintf(mask_str, count, tracing_cpumask);
+   len = cpumask_scnprintf(mask_str, count, tr->tracing_cpumask);
if (count - len < 2) {
count = -EINVAL;
goto out_err;
@@ -3208,7 +3204,7 @@ static ssize_t
 tracing_cpumask_write(struct file *filp, const char __user *ubuf,
  size_t count, loff_t *ppos)
 {
-   struct trace_array *tr = filp->private_data;
+   struct trace_array *tr = file_inode(filp)->i_private;
cpumask_var_t tracing_cpumask_new;
int err, cpu;
 
@@ -3228,12 +3224,12 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf,
 * Increase/decrease the disabled counter if we are
 * about to flip a bit in the cpumask:
 */
-   if (cpumask_test_cpu(cpu, tracing_cpumask) &&
+   if (cpumask_test_cpu(cpu, tr->tracing_cpumask) &&
!cpumask_test_cpu(cpu, tracing_cpumask_new)) {
atomic_inc(&per_cpu_ptr(tr->trace_buffer.data, cpu)->disabled);
ring_buffer_record_disable_cpu(tr->trace_buffer.buffer, cpu);
}
-   if (!cpumask_test_cpu(cpu, tracing_cpumask) &&
+   if (!cpumask_test_cpu(cpu, tr->tracing_cpumask) &&
cpumask_test_cpu(cpu, tracing_cpumask_new)) {
atomic_dec(&per_cpu_ptr(tr->trace_buffer.data, cpu)->disabled);
ring_buffer_record_enable_cpu(tr->trace_buffer.buffer, cpu);
@@ -3242,7 +3238,7 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf,
arch_spin_unlock(&ftrace_max_lock);
local_irq_enable();
 
-   cpumask_copy(tracing_cpumask, tracing_cpumask_new);
+   cpumask_copy(tr->tracing_cpumask, tracing_cpumask_new);
 
mutex_unlock(&tracing_cpumask_update_lock);
free_cpumask_var(tracing_cpumask_new);
@@ -3256,9 +3252,10 @@ err_unlock:
 }
 
 static const struct file_operations tracing_cpumask_fops = {
-   .open   = tracing_open_generic,
+   .open   = tracing_open_generic_tr,
.read   = tracing_cpumask_read,
.write  = tracing_cpumask_write,
+   .release= tracing_release_generic_tr,
.llseek = generic_file_llseek,
 };
 
@@ -5938,6 +5935,11 @@ static int new_instance_create(const char *name)
if (!tr->name)
goto out_free_tr;
 
+   if (!alloc_cpumask_var(&tr->tracing_cpumask, GFP_KERNEL))
+   goto out_free_tr;
+
+   cpumask_copy(tr->tracing_cpumask, cpu_all_mask);
+
raw_spin_lock_init(&tr->start_lock);
 
tr->current_trace = &nop_trace;
@@ -5969,6 +5971,7 @@ static int new_instance_create(const char *name)
  out_free_tr:
if (tr->trace_buffer.buffer)
ring_buffer_free(tr->trace_buffer.buffer);
+   free_cpumask_var(tr->tracing_cpumask);
kfree(tr->name);
kfree(tr);
 
@@ -6098,6 +6101,9 @@ init_tracer_debugfs(struct trace_array *tr, struct dentry *d_tracer)
 {
int cpu;
 
+   trace_create_file("tracing_cpumask", 0644, d_tracer,
+ tr, &tracing_cpumask_fops);
+
trace_create_file("trace_options", 0644, d_tracer,
  tr, &tracing_iter_fops);
 
@@ -6147,9 +6153,6 @@ static __init int tracer_init_debugfs(void)
 
init_tracer_debugfs(&global_trace, d_tracer);
 
-   trace_create_file("tracing_cpumask", 0644, d_tracer,
-   &global_trace, &tracing_cpumask_fops);
-
trace_create_file("available_tracers", 0444, d_tracer,
&global_trace, &show_traces_fop
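
The idea of the patch above, one cpumask per tracer instance rather than one static global, can be modelled in plain C. This is a rough sketch only: struct trace_instance, NR_CPUS_MODEL and instance_set_cpumask() are made-up names for illustration, and a 32-bit word stands in for cpumask_var_t.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 8	/* hypothetical CPU count for the model */

/* Simplified stand-in for struct trace_array: one mask per instance. */
struct trace_instance {
	uint32_t tracing_cpumask;	/* bit n set: trace on CPU n */
	int disabled[NR_CPUS_MODEL];	/* per-CPU "disabled" counter */
};

/*
 * Model of the update loop in tracing_cpumask_write(): bump the
 * disabled counter for CPUs leaving the mask, drop it for CPUs
 * entering the mask, then install the new mask.
 */
static void instance_set_cpumask(struct trace_instance *tr, uint32_t new_mask)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		uint32_t bit = UINT32_C(1) << cpu;
		int was_set = (tr->tracing_cpumask & bit) != 0;
		int now_set = (new_mask & bit) != 0;

		if (was_set && !now_set)
			tr->disabled[cpu]++;
		if (!was_set && now_set)
			tr->disabled[cpu]--;
	}
	tr->tracing_cpumask = new_mask;
}
```

Because the mask lives in the instance, changing it for one instance leaves every other instance's mask and counters alone, which is the point of moving it into trace_array.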

[for-next][PATCH 3/4] x86-32, ftrace: Fix static ftrace when early microcode is enabled

2013-09-05 Thread Steven Rostedt

From: "H. Peter Anvin" 

Early microcode loading runs C code before paging is enabled on 32
bits.  Since ftrace puts a hook into every function, that hook needs
to be safe to execute in the pre-paging environment.  This is
currently true for dynamic ftrace but not for static ftrace.

Static ftrace is obsolescent and assumed to not be
performance-critical, so we can simply test that the stack pointer
falls within the valid range of kernel addresses.

Reported-by: Jan Kiszka 
Tested-by: Jan Kiszka 
Signed-off-by: H. Peter Anvin 
Signed-off-by: Steven Rostedt 
---
 arch/x86/kernel/entry_32.S |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 2cfbc3a..f0dcb0c 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1176,6 +1176,9 @@ ftrace_restore_flags:
 #else /* ! CONFIG_DYNAMIC_FTRACE */
 
 ENTRY(mcount)
+   cmpl $__PAGE_OFFSET, %esp
+   jb ftrace_stub  /* Paging not enabled yet? */
+
cmpl $0, function_trace_stop
jne  ftrace_stub
 
-- 
1.7.10.4
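
For readers less used to AT&T syntax, the two added instructions amount to an unsigned comparison of the stack pointer against the start of the kernel address range. A rough C rendering follows, assuming the common 3G/1G split where __PAGE_OFFSET is 0xC0000000; the macro and function names are illustrative and do not exist in the kernel.

```c
#include <stdint.h>

/* Assumed value: the default 3G/1G split on 32-bit x86. */
#define PAGE_OFFSET_MODEL UINT32_C(0xC0000000)

/*
 * C rendering of "cmpl $__PAGE_OFFSET, %esp; jb ftrace_stub":
 * returns 1 if the stack pointer is a kernel virtual address, which
 * the patch treats as "paging is enabled", and 0 otherwise.
 */
static int stack_in_kernel_range(uint32_t esp)
{
	return esp >= PAGE_OFFSET_MODEL;
}
```

When this returns 0, the assembly branches to ftrace_stub and the tracing hook does nothing.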



[for-next][PATCH 2/4] ftrace: Fix a slight race in modifying what function callback gets traced

2013-09-05 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

There's a slight race when going from a list function to a non list
function. That is, when only one callback is registered to the function
tracer, it gets called directly by the mcount trampoline. But if this
function has filters, it may be called by the wrong functions.

The list ops callback handles multiple callbacks that are registered
to ftrace, and it also checks which functions each of them should be
called for. While the transition is taking place, always use the list
function, and after all the updates are finished (only the functions
that should be traced are being traced), update the trampoline to call
the function directly.

Signed-off-by: Steven Rostedt 
---
 kernel/trace/ftrace.c |   17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a6d098c..03cf44a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1978,12 +1978,27 @@ int __weak ftrace_arch_code_modify_post_process(void)
 
 void ftrace_modify_all_code(int command)
 {
+   int update = command & FTRACE_UPDATE_TRACE_FUNC;
+
+   /*
+* If the ftrace_caller calls a ftrace_ops func directly,
+* we need to make sure that it only traces functions it
+* expects to trace. When doing the switch of functions,
+* we need to update to the ftrace_ops_list_func first
+* before the transition between old and new calls are set,
+* as the ftrace_ops_list_func will check the ops hashes
+* to make sure the ops are having the right functions
+* traced.
+*/
+   if (update)
+   ftrace_update_ftrace_func(ftrace_ops_list_func);
+
if (command & FTRACE_UPDATE_CALLS)
ftrace_replace_code(1);
else if (command & FTRACE_DISABLE_CALLS)
ftrace_replace_code(0);
 
-   if (command & FTRACE_UPDATE_TRACE_FUNC)
+   if (update && ftrace_trace_function != ftrace_ops_list_func)
ftrace_update_ftrace_func(ftrace_trace_function);
 
if (command & FTRACE_START_FUNC_RET)
-- 
1.7.10.4
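
The ordering this patch enforces can be written down as a small userspace model. It is only a sketch: the flag values are invented for the model (they are not the kernel's FTRACE_* values), and the function records step names in a string rather than patching any code.

```c
#include <string.h>

#define UPDATE_CALLS		(1 << 0)	/* model flags, not kernel values */
#define DISABLE_CALLS		(1 << 1)
#define UPDATE_TRACE_FUNC	(1 << 2)

/*
 * Append the steps ftrace_modify_all_code() would take, in order, to
 * log (size must be at least 1). direct_call is nonzero when the
 * final callback is a single ops function, not the list function.
 */
static void modify_all_code_model(int command, int direct_call,
				  char *log, size_t size)
{
	int update = command & UPDATE_TRACE_FUNC;

	log[0] = '\0';
	if (update)		/* first fall back to the safe list function */
		strncat(log, "list;", size - strlen(log) - 1);

	if (command & UPDATE_CALLS)
		strncat(log, "enable-sites;", size - strlen(log) - 1);
	else if (command & DISABLE_CALLS)
		strncat(log, "disable-sites;", size - strlen(log) - 1);

	if (update && direct_call)	/* only now switch to the direct call */
		strncat(log, "direct;", size - strlen(log) - 1);
}
```

With UPDATE_CALLS | UPDATE_TRACE_FUNC and a direct callback, the recorded order is list function first, call sites second, direct call last, which mirrors the sequence in the diff.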



[for-next][PATCH 0/4] tracing: Updated changes for 3.12

2013-09-05 Thread Steven Rostedt

I'm holding off on the rcu unsafe changes with perf and function tracing.
We'll still get bug splats with unsafe rcu usage, but we need to work
out a better solution than I was going to push for 3.12. It's too late
to get things smooth, thus we need to wait till 3.13 to get something
that is decent.

For now, root needs to be careful in how they trace functions with perf.

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
for-next

Head SHA1: a0a5a0561f63905fe94c49bc567615829f42ce1e


Alexander Z Lam (1):
  tracing: Make tracing_cpumask available for all instances

H. Peter Anvin (1):
  x86-32, ftrace: Fix static ftrace when early microcode is enabled

Steven Rostedt (Red Hat) (2):
  ftrace: Fix a slight race in modifying what function callback gets traced
  ftrace/rcu: Do not trace debug_lockdep_rcu_enabled()


 arch/x86/kernel/entry_32.S |3 +++
 kernel/rcupdate.c  |2 +-
 kernel/trace/ftrace.c  |   17 -
 kernel/trace/trace.c   |   37 -
 kernel/trace/trace.h   |1 +
 5 files changed, 41 insertions(+), 19 deletions(-)

Re: ftrace 'failed to modify' bug when loading reiserfs.ko

2013-09-05 Thread Dave Jones

On Thu, Sep 05, 2013 at 09:44:55PM -0400, Steven Rostedt wrote:
 > On Thu, 5 Sep 2013 21:34:55 -0400
 > Dave Jones  wrote:
 > 
 > > On Thu, Sep 05, 2013 at 09:28:34PM -0400, Steven Rostedt wrote:
 > >  > On Thu, 5 Sep 2013 21:19:24 -0400
 > >  > Dave Jones  wrote:
 > >  > 
 > >  > > For whatever dumb reason, when running 'make install' on a Fedora system,
 > >  > > os-prober tries to figure out what filesystems are needed by loading filesystems,
 > >  > > and seeing what sticks..  Today it blew up spectacularly when it got to
 > >  > > loading reiserfs..  System wedged entirely afterwards.
 > >  > 
 > >  > Could it be that the reiserfs module was compiled differently than the
 > >  > running kernel?
 > >  
 > > o... it was probably installing the just-built version over the same '3.11+'
 > > modules tree that was running.  This has never been a problem before though..
 > > 
 > 
 > Did you change a config option, or update your gcc?

Yeah, changed CONFIG_DEBUG_KOBJECT, which rebuilt the world.

Dave

Re: ftrace 'failed to modify' bug when loading reiserfs.ko

2013-09-05 Thread Steven Rostedt

On Thu, 5 Sep 2013 21:34:55 -0400
Dave Jones  wrote:

> On Thu, Sep 05, 2013 at 09:28:34PM -0400, Steven Rostedt wrote:
>  > On Thu, 5 Sep 2013 21:19:24 -0400
>  > Dave Jones  wrote:
>  > 
>  > > For whatever dumb reason, when running 'make install' on a Fedora system,
>  > > os-prober tries to figure out what filesystems are needed by loading filesystems,
>  > > and seeing what sticks..  Today it blew up spectacularly when it got to
>  > > loading reiserfs..  System wedged entirely afterwards.
>  > 
>  > Could it be that the reiserfs module was compiled differently than the
>  > running kernel?
>  
> o... it was probably installing the just-built version over the same '3.11+'
> modules tree that was running.  This has never been a problem before though..
> 

Did you change a config option, or update your gcc?

Although, that doesn't really explain why the location would have
something that it doesn't expect, as the mcount/fentry table is created
in the module itself.

 -- Steve

[GIT PULL] security subsystem changes for 3.12

2013-09-05 Thread James Morris

Nothing major for this kernel, just maintenance updates.

Please pull.



The following changes since commit 2e032852245b3dcfe5461d7353e34eb6da095ccf:

  Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm (2013-09-05 18:07:32 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next

Casey Schaufler (1):
  Smack: network label match fix

James Morris (2):
  Merge branch 'linus-master'; commit 'v3.11-rc2' into ra-next
  Merge branch 'smack-for-3.12' of git://git.gitorious.org/smack-next/kernel into ra-next

John Johansen (14):
  apparmor: enable users to query whether apparmor is enabled
  apparmor: add a features/policy dir to interface
  apparmor: provide base for multiple profiles to be replaced at once
  apparmor: convert profile lists to RCU based locking
  apparmor: change how profile replacement update is done
  apparmor: update how unconfined is handled
  apparmor: rework namespace free path
  apparmor: make free_profile available outside of policy.c
  apparmor: allow setting any profile into the unconfined state
  apparmor: add interface files for profiles and namespaces
  apparmor: add an optional profile attachment string for profiles
  apparmor: add the profile introspection file to interface
  apparmor: export set of capabilities supported by the apparmor module
  apparmor: add the ability to report a sha1 hash of loaded policy

Rafal Krypa (1):
  Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes

Tetsuo Handa (2):
  xattr: Constify ->name member of "struct xattr".
  apparmor: remove minimum size check for vmalloc()

Tomasz Stanislawski (2):
  security: smack: fix memleak in smk_write_rules_list()
  security: smack: add a hash table to quicken smk_find_entry()

 fs/ocfs2/xattr.h  |2 +-
 include/linux/security.h  |8 +-
 include/linux/xattr.h |2 +-
 include/uapi/linux/reiserfs_xattr.h   |2 +-
 security/apparmor/Kconfig |   12 +
 security/apparmor/Makefile|7 +-
 security/apparmor/apparmorfs.c|  636 -
 security/apparmor/capability.c|5 +
 security/apparmor/context.c   |   16 +-
 security/apparmor/crypto.c|   97 +
 security/apparmor/domain.c|   24 +-
 security/apparmor/include/apparmor.h  |6 +
 security/apparmor/include/apparmorfs.h|   40 ++
 security/apparmor/include/audit.h |1 -
 security/apparmor/include/capability.h|4 +
 security/apparmor/include/context.h   |   15 +-
 security/apparmor/include/crypto.h|   36 ++
 security/apparmor/include/policy.h|  218 +++---
 security/apparmor/include/policy_unpack.h |   21 +-
 security/apparmor/lib.c   |5 -
 security/apparmor/lsm.c   |   22 +-
 security/apparmor/policy.c|  609 
 security/apparmor/policy_unpack.c |  135 +--
 security/apparmor/procattr.c  |2 +-
 security/capability.c |2 +-
 security/integrity/evm/evm_main.c |2 +-
 security/security.c   |8 +-
 security/selinux/hooks.c  |   17 +-
 security/smack/smack.h|   13 +-
 security/smack/smack_access.c |   29 ++-
 security/smack/smack_lsm.c|   51 ++-
 security/smack/smackfs.c  |  184 -
 32 files changed, 1675 insertions(+), 556 deletions(-)
 create mode 100644 security/apparmor/crypto.c
 create mode 100644 security/apparmor/include/crypto.h

Re: [PATCH 10/11] x86, mem-hotplug: Support initialize page tables from low to high.

2013-09-05 Thread Tang Chen


Hi Wanpeng,

Thank you for reviewing. See below, please.

On 09/05/2013 09:30 PM, Wanpeng Li wrote:
..

+#ifdef CONFIG_MOVABLE_NODE
+   unsigned long kernel_end;
+
+   if (movablenode_enable_srat &&
+   memblock.current_order == MEMBLOCK_ORDER_LOW_TO_HIGH) {


I think memblock.current_order == MEMBLOCK_ORDER_LOW_TO_HIGH is always
true if config MOVABLE_NODE and movablenode_enable_srat == true if PATCH
11/11 is applied.


memblock.current_order == MEMBLOCK_ORDER_LOW_TO_HIGH is true here if
MOVABLE_NODE is configured, and it will be reset after SRAT is parsed.
But movablenode_enable_srat can only be true when users specify the
movablenode boot option on the kernel command line.


Please refer to patch 9/11.




+   kernel_end = round_up(__pa_symbol(_end), PMD_SIZE);
+
+   memory_map_from_low(kernel_end, end);
+   memory_map_from_low(ISA_END_ADDRESS, kernel_end);


Why split ISA_END_ADDRESS ~ end?


The first 5 pages for the page tables come from brk; please refer to
alloc_low_pages(). They are able to map about 2MB of memory, and this
2MB of memory will be used to store page tables for the pages mapped
next.

Here, we split [ISA_END_ADDRESS, end) into [ISA_END_ADDRESS, _end) and
[_end, end), and map [_end, end) first. This is because memory in
[ISA_END_ADDRESS, _end) may already be in use, in which case we would
not have enough memory for the upcoming page tables. We should map
[_end, end) first because this memory is very likely unused.




..


I think the variables, sorted by address, are:
ISA_END_ADDRESS ->  _end ->  real_end ->  end


Yes.




+   memory_map_from_high(ISA_END_ADDRESS, real_end);


Does this overlap with the work done between #ifdef CONFIG_MOVABLE_NODE and
#endif?



I don't think so. Looking at my code, if the work between #ifdef
CONFIG_MOVABLE_NODE and #endif is done, it will goto out, right?

Thanks.
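
The mapping order discussed in this reply can be illustrated with a small sketch. The names below (round_up_pow2(), struct range, low_to_high_order(), PMD_SIZE_MODEL) are made up for the example, and a 2MB PMD size is assumed; the real code calls memory_map_from_low() on each range instead of recording it.

```c
#include <stdint.h>

#define PMD_SIZE_MODEL	UINT64_C(0x200000)	/* assumed 2MB PMD */

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

struct range { uint64_t start, end; };

/*
 * Fill out[0..1] with the two ranges in the order the reply
 * describes: [kernel_end, end) first, [isa_end, kernel_end) second.
 */
static void low_to_high_order(uint64_t isa_end, uint64_t pa_end,
			      uint64_t end, struct range out[2])
{
	uint64_t kernel_end = round_up_pow2(pa_end, PMD_SIZE_MODEL);

	out[0].start = kernel_end;	/* likely unused memory, mapped first */
	out[0].end = end;
	out[1].start = isa_end;		/* may already be in use, mapped second */
	out[1].end = kernel_end;
}
```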


Re: [PATCH RESEND v3 0/7] Enable Drivers for Intel MIC X100 Coprocessors.

2013-09-05 Thread Joe Perches

Whitespace neatening...

Multiline statement argument alignment.
Argument wrapping.
Use kmalloc_array instead of kmalloc.

---

 drivers/misc/mic/card/mic_virtio.c  | 17 ---
 drivers/misc/mic/card/mic_x100.c|  4 +-
 drivers/misc/mic/host/mic_debugfs.c | 91 ++---
 drivers/misc/mic/host/mic_fops.c|  6 +--
 drivers/misc/mic/host/mic_intr.c| 37 ---
 drivers/misc/mic/host/mic_smpt.c| 17 +++
 drivers/misc/mic/host/mic_sysfs.c   | 18 
 drivers/misc/mic/host/mic_virtio.c  | 34 ++
 drivers/misc/mic/host/mic_x100.c| 29 ++--
 9 files changed, 122 insertions(+), 131 deletions(-)

diff --git a/drivers/misc/mic/card/mic_virtio.c b/drivers/misc/mic/card/mic_virtio.c
index 38275c1..6071aec 100644
--- a/drivers/misc/mic/card/mic_virtio.c
+++ b/drivers/misc/mic/card/mic_virtio.c
@@ -103,7 +103,7 @@ static void mic_finalize_features(struct virtio_device *vdev)
for (i = 0; i < bits; i++) {
if (test_bit(i, vdev->features))
iowrite8(ioread8(&out_features[i / 8]) | (1 << (i % 8)),
-   &out_features[i / 8]);
+&out_features[i / 8]);
}
 }
 
@@ -197,10 +197,9 @@ static void mic_notify(struct virtqueue *vq)
 static void mic_del_vq(struct virtqueue *vq, int n)
 {
struct mic_vdev *mvdev = to_micvdev(vq->vdev);
-   struct vring *vr = (struct vring *) (vq + 1);
+   struct vring *vr = (struct vring *)(vq + 1);
 
-   free_pages((unsigned long) vr->used,
-   get_order(mvdev->used_size[n]));
+   free_pages((unsigned long) vr->used, get_order(mvdev->used_size[n]));
vring_del_virtqueue(vq);
mic_card_unmap(mvdev->mdev, mvdev->vr[n]);
mvdev->vr[n] = NULL;
@@ -274,8 +273,8 @@ static struct virtqueue *mic_find_vq(struct virtio_device *vdev,
/* Allocate and reassign used ring now */
mvdev->used_size[index] = PAGE_ALIGN(sizeof(__u16) * 3 +
sizeof(struct vring_used_elem) * config.num);
-   used = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
-   get_order(mvdev->used_size[index]));
+   used = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+   get_order(mvdev->used_size[index]));
if (!used) {
err = -ENOMEM;
dev_err(mic_dev(mvdev), "%s %d err %d\n",
@@ -291,7 +290,7 @@ static struct virtqueue *mic_find_vq(struct virtio_device *vdev,
 * vring_new_virtqueue() would ensure that
 *  (&vq->vring == (struct vring *) (&vq->vq + 1));
 */
-   vr = (struct vring *) (vq + 1);
+   vr = (struct vring *)(vq + 1);
vr->used = used;
 
vq->priv = mvdev;
@@ -544,7 +543,7 @@ static void mic_scan_devices(struct mic_driver *mdrv, bool remove)
if (dev) {
if (remove)
iowrite8(MIC_VIRTIO_PARAM_DEV_REMOVE,
-   &dc->config_change);
+&dc->config_change);
put_device(dev);
mic_handle_config_change(d, i, mdrv);
ret = mic_remove_device(d, i, mdrv);
@@ -559,7 +558,7 @@ static void mic_scan_devices(struct mic_driver *mdrv, bool remove)
 
/* new device */
dev_dbg(mdrv->dev, "%s %d Adding new virtio device %p\n",
-   __func__, __LINE__, d);
+   __func__, __LINE__, d);
if (!remove)
mic_add_device(d, i, mdrv);
}
diff --git a/drivers/misc/mic/card/mic_x100.c b/drivers/misc/mic/card/mic_x100.c
index 7cb3469..e54dfcb 100644
--- a/drivers/misc/mic/card/mic_x100.c
+++ b/drivers/misc/mic/card/mic_x100.c
@@ -66,8 +66,8 @@ void mic_send_intr(struct mic_device *mdev, int doorbell)
/* Ensure that the interrupt is ordered w.r.t previous stores. */
wmb();
mic_mmio_write(mw, MIC_X100_SBOX_SDBIC0_DBREQ_BIT,
-   MIC_X100_SBOX_BASE_ADDRESS +
-   (MIC_X100_SBOX_SDBIC0 + (4 * doorbell)));
+  MIC_X100_SBOX_BASE_ADDRESS +
+  (MIC_X100_SBOX_SDBIC0 + (4 * doorbell)));
 }
 
 /**
diff --git a/drivers/misc/mic/host/mic_debugfs.c b/drivers/misc/mic/host/mic_debugfs.c
index e22fb7b..002faa5 100644
--- a/drivers/misc/mic/host/mic_debugfs.c
+++ b/drivers/misc/mic/host/mic_debugfs.c
@@ -103,7 +103,7 @@ static int mic_smpt_show(struct seq_file *s, void *pos)
unsigned long flags;
 
seq_printf(s, "MIC %-2d |%-10s| %-14s %-10s\n",
-   mdev->id, "SMPT entry", "SW DMA addr", "RefCount");
+  mdev->id, "SMPT entry", "SW DMA addr", "RefCount");
seq_puts(s, "\n");
 
if (mdev->smpt) {
@@ -111,8 +111,8 @@ static
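
On the "use kmalloc_array instead of kmalloc" item in the summary above: the benefit is that the element count and element size are multiplied with an overflow check. A userspace analogue might look like the following sketch; alloc_array() is a made-up name, and malloc() stands in for the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace analogue of kmalloc_array(): refuse the allocation when
 * n * size would overflow size_t instead of silently wrapping.
 */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

A plain malloc(n * size) with a wrapped product would return a buffer that is too small, which is the class of bug the checked form is meant to rule out.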

Re: ftrace 'failed to modify' bug when loading reiserfs.ko

2013-09-05 Thread Dave Jones

On Thu, Sep 05, 2013 at 09:28:34PM -0400, Steven Rostedt wrote:
 > On Thu, 5 Sep 2013 21:19:24 -0400
 > Dave Jones  wrote:
 > 
 > > For whatever dumb reason, when running 'make install' on a Fedora system,
 > > os-prober tries to figure out what filesystems are needed by loading filesystems,
 > > and seeing what sticks..  Today it blew up spectacularly when it got to
 > > loading reiserfs..  System wedged entirely afterwards.
 > 
 > Could it be that the reiserfs module was compiled differently than the
 > running kernel?
 
o... it was probably installing the just-built version over the same '3.11+'
modules tree that was running.  This has never been a problem before though..

Dave
 

Re: ftrace 'failed to modify' bug when loading reiserfs.ko

2013-09-05 Thread Steven Rostedt

On Thu, 5 Sep 2013 21:19:24 -0400
Dave Jones  wrote:

> For whatever dumb reason, when running 'make install' on a Fedora system,
> os-prober tries to figure out what filesystems are needed by loading filesystems,
> and seeing what sticks..  Today it blew up spectacularly when it got to
> loading reiserfs..  System wedged entirely afterwards.

Could it be that the reiserfs module was compiled differently than the
running kernel?

> 
>   Dave
> 
> [ cut here ]
> WARNING: CPU: 2 PID: 30566 at kernel/trace/ftrace.c:1694 
> ftrace_bug+0x25d/0x270()
> Modules linked in: reiserfs(+) snd_hda_codec_hdmi snd_hda_codec_realtek 
> snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm 
> snd_page_alloc xfs snd_timer libcrc32c snd e1000e ptp usb_debug pps_core 
> pcspkr soundcore
> CPU: 2 PID: 30566 Comm: modprobe Not tainted 3.11.0+ #57 
>  81a2809d 88008de19c30 817171e9 
>  88008de19c68 81053dad 0010 a02738b0
>  8802419e3518  8801ab16e100 88008de19c78
> Call Trace:
>  [] dump_stack+0x54/0x74
>  [] warn_slowpath_common+0x7d/0xa0
>  [] warn_slowpath_null+0x1a/0x20
>  [] ftrace_bug+0x25d/0x270
>  [] ftrace_process_locs+0x308/0x630
>  [] ftrace_module_notify_enter+0x3c/0x40
>  [] notifier_call_chain+0x66/0x150
>  [] __blocking_notifier_call_chain+0x67/0xc0
>  [] blocking_notifier_call_chain+0x16/0x20
>  [] load_module+0x1f7d/0x2680
>  [] ? store_uevent+0x40/0x40
>  [] ? reiserfs_xattr_register_handlers+0xf9f/0xf9f 
> [reiserfs]
>  [] ? reiserfs_xattr_register_handlers+0xf9f/0xf9f 
> [reiserfs]
>  [] SyS_finit_module+0x86/0xb0
>  [] tracesys+0xdd/0xe2
> ---[ end trace 956db59f53237fe4 ]---
> ftrace failed to modify [] 
> reiserfs_init_bitmap_cache+0x0/0x5750 [reiserfs]
>  actual: 14:00:00:00:00

Hmm, where it expected to see a call to mcount, it instead sees the
instruction:

 0x14 00 00 00 00


Can you do an objdump of that same binary, and show me what's located
at: reiserfs_init_bitmap_cache+0x0

-- Steve
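
As background for the "actual:" line in the report, here is a minimal sketch of the kind of check involved, assuming ftrace compares the bytes found at each recorded location with the bytes it expects before rewriting them. check_call_site() and INSN_SIZE are illustrative names, not kernel symbols.

```c
#include <string.h>

#define INSN_SIZE 5	/* x86 "call rel32" is 5 bytes */

/*
 * Before patching a call site, compare what is in memory with what
 * is expected there. Returns 0 if they match, -1 otherwise (the
 * "failed to modify" case, where the actual bytes get reported).
 */
static int check_call_site(const unsigned char *actual,
			   const unsigned char *expected)
{
	return memcmp(actual, expected, INSN_SIZE) == 0 ? 0 : -1;
}
```

In the report above the location held 14 00 00 00 00, so a comparison of this kind against an expected call instruction would fail.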

> [ cut here ]
> WARNING: CPU: 2 PID: 30566 at arch/x86/mm/pageattr.c:677 
> __cpa_process_fault+0x91/0xa0()
> CPA: called for zero pte. vaddr = a0249000 cpa->vaddr = 
> a0249000
> Modules linked in: reiserfs(+) snd_hda_codec_hdmi snd_hda_codec_realtek 
> snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm 
> snd_page_alloc xfs snd_timer libcrc32c snd e1000e ptp usb_debug pps_core 
> pcspkr soundcore
> CPU: 2 PID: 30566 Comm: modprobe Tainted: GW3.11.0+ #57 
>  81a0ba44 88008de19b40 817171e9 88008de19b88
>  88008de19b78 81053dad 88008de19d08 fff2
>  a0249000 880238646248 88008de19d08 88008de19bd8
> Call Trace:
>  [] dump_stack+0x54/0x74
>  [] warn_slowpath_common+0x7d/0xa0
>  [] ? reiserfs_xattr_register_handlers+0x9f9f/0x29f9f 
> [reiserfs]
>  [] warn_slowpath_fmt+0x4c/0x50
>  [] ? reiserfs_xattr_register_handlers+0x8f9f/0xf9f 
> [reiserfs]
>  [] ? reiserfs_xattr_register_handlers+0x9f9f/0x29f9f 
> [reiserfs]
>  [] ? reiserfs_xattr_register_handlers+0x9f9f/0x29f9f 
> [reiserfs]
>  [] __cpa_process_fault+0x91/0xa0
>  [] __change_page_attr_set_clr+0x392/0xab0
>  [] ? 0xa023efff
>  [] change_page_attr_set_clr+0x123/0x460
>  [] ? 0xa023efff
>  [] set_memory_ro+0x2f/0x40
>  [] ? reiserfs_xattr_register_handlers+0x9f9f/0x29f9f 
> [reiserfs]
>  [] set_section_ro_nx+0x3a/0x71
>  [] load_module+0x1f9e/0x2680
>  [] ? store_uevent+0x40/0x40
>  [] ? reiserfs_xattr_register_handlers+0xf9f/0xf9f 
> [reiserfs]
>  [] ? reiserfs_xattr_register_handlers+0xf9f/0xf9f 
> [reiserfs]
>  [] SyS_finit_module+0x86/0xb0
>  [] tracesys+0xdd/0xe2
> ---[ end trace 956db59f53237fe5 ]---
> Oops: 0003 [#1] SMP 
> Modules linked in: reiserfs snd_hda_codec_hdmi snd_hda_codec_realtek 
> snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm 
> snd_page_alloc xfs snd_timer libcrc32c snd e1000e ptp usb_debug pps_core 
> pcspkr soundcore
> CPU: 1 PID: 30571 Comm: modprobe Tainted: GW3.11.0+ #57 
> task: 8801238a ti: 8801ab314000 task.ti: 8801ab314000
> RIP: 0010:[]  [] load_module+0x161b/0x2680
> RSP: 0018:8801ab315dc0  EFLAGS: 00010202
> RAX: a009c000 RBX: 8801ab315ef8 RCX: a00c2000
> RDX: a00c2000 RSI: 0055 RDI: a00c3f98
> RBP: 8801ab315ee8 R08: a009fa68 R09: a009c000
> R10: a00c3f98 R11: 0002 R12: a02d2838
> R13: 0001 R14:  R15: a02d2820
> FS:  7f6f48b51740() GS:88024580() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: a00c2000 CR3: 0002211e9000 CR4: 001407e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Sta

[PATCH V2] arm: LLVMLinux: use static inline in ARM ftrace.h

2013-09-05 Thread behanw

From: Behan Webster 

With compilers which follow the C99 standard (like modern versions of gcc and
clang), "extern inline" does the wrong thing (emits code for an externally
linkable version of the inline function). In this case using static inline
and removing the NULL version of return_address in return_address.c does
the right thing.

Signed-off-by: Behan Webster 
---
 arch/arm/include/asm/ftrace.h| 2 +-
 arch/arm/kernel/return_address.c | 5 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/ftrace.h b/arch/arm/include/asm/ftrace.h
index f89515a..2bb8cac 100644
--- a/arch/arm/include/asm/ftrace.h
+++ b/arch/arm/include/asm/ftrace.h
@@ -45,7 +45,7 @@ void *return_address(unsigned int);
 
 #else
 
-extern inline void *return_address(unsigned int level)
+static inline void *return_address(unsigned int level)
 {
 	return NULL;
 }
diff --git a/arch/arm/kernel/return_address.c b/arch/arm/kernel/return_address.c
index fafedd8..f6aa84d 100644
--- a/arch/arm/kernel/return_address.c
+++ b/arch/arm/kernel/return_address.c
@@ -63,11 +63,6 @@ void *return_address(unsigned int level)
 #warning "TODO: return_address should use unwind tables"
 #endif
 
-void *return_address(unsigned int level)
-{
-   return NULL;
-}
-
 #endif /* if defined(CONFIG_FRAME_POINTER) && !defined(CONFIG_ARM_UNWIND) / else */
 
 EXPORT_SYMBOL_GPL(return_address);
-- 
1.8.1.2
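
For readers less familiar with the inline rules, here is a minimal standalone
sketch of the pattern the patch moves to. It is not the kernel code:
HAVE_RETURN_ADDRESS is a hypothetical stand-in for the
CONFIG_FRAME_POINTER && !CONFIG_ARM_UNWIND condition, and have_caller_info()
is an invented caller. The sketch assumes the usual reading of the two inline
models: under gnu89 rules an "extern inline" definition is used for inlining
only, while under C99 rules it also provides an external definition, so a
header stub declared that way would clash with the copy in return_address.c.
A "static inline" stub has internal linkage in both models.

```c
#include <stddef.h>

/*
 * HAVE_RETURN_ADDRESS is a hypothetical switch used only in this sketch.
 * It stands in for the kernel's
 * CONFIG_FRAME_POINTER && !CONFIG_ARM_UNWIND condition.
 */
#ifdef HAVE_RETURN_ADDRESS

/* The real implementation would live out of line in exactly one .c file. */
void *return_address(unsigned int level);

#else

/*
 * Fallback stub for configurations that cannot walk the stack.
 * "static inline" gives each translation unit its own internal-linkage
 * copy, so no external symbol is emitted under either gnu89 or C99
 * inline semantics, and no out-of-line NULL version is needed.
 */
static inline void *return_address(unsigned int level)
{
	(void)level;	/* unused in the stub */
	return NULL;
}

#endif

/* Hypothetical caller: reports whether a return address is available. */
static int have_caller_info(unsigned int level)
{
	return return_address(level) != NULL;
}
```

Built without -DHAVE_RETURN_ADDRESS, have_caller_info() always returns 0 and
the object file exports no return_address symbol.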


ftrace 'failed to modify' bug when loading reiserfs.ko

2013-09-05 Thread Dave Jones

For whatever dumb reason, when running 'make install' on a Fedora system,
os-prober tries to figure out what filesystems are needed by loading
filesystems, and seeing what sticks..  Today it blew up spectacularly when
it got to loading reiserfs..  System wedged entirely afterwards.

Dave

[ cut here ]
WARNING: CPU: 2 PID: 30566 at kernel/trace/ftrace.c:1694 ftrace_bug+0x25d/0x270()
Modules linked in: reiserfs(+) snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm 
snd_page_alloc xfs snd_timer libcrc32c snd e1000e ptp usb_debug pps_core pcspkr 
soundcore
CPU: 2 PID: 30566 Comm: modprobe Not tainted 3.11.0+ #57 
 81a2809d 88008de19c30 817171e9 
 88008de19c68 81053dad 0010 a02738b0
 8802419e3518  8801ab16e100 88008de19c78
Call Trace:
 [] dump_stack+0x54/0x74
 [] warn_slowpath_common+0x7d/0xa0
 [] warn_slowpath_null+0x1a/0x20
 [] ftrace_bug+0x25d/0x270
 [] ftrace_process_locs+0x308/0x630
 [] ftrace_module_notify_enter+0x3c/0x40
 [] notifier_call_chain+0x66/0x150
 [] __blocking_notifier_call_chain+0x67/0xc0
 [] blocking_notifier_call_chain+0x16/0x20
 [] load_module+0x1f7d/0x2680
 [] ? store_uevent+0x40/0x40
 [] ? reiserfs_xattr_register_handlers+0xf9f/0xf9f [reiserfs]
 [] ? reiserfs_xattr_register_handlers+0xf9f/0xf9f [reiserfs]
 [] SyS_finit_module+0x86/0xb0
 [] tracesys+0xdd/0xe2
---[ end trace 956db59f53237fe4 ]---
ftrace failed to modify [] reiserfs_init_bitmap_cache+0x0/0x5750 [reiserfs]
 actual: 14:00:00:00:00
[ cut here ]
WARNING: CPU: 2 PID: 30566 at arch/x86/mm/pageattr.c:677 __cpa_process_fault+0x91/0xa0()
CPA: called for zero pte. vaddr = a0249000 cpa->vaddr = a0249000
Modules linked in: reiserfs(+) snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm 
snd_page_alloc xfs snd_timer libcrc32c snd e1000e ptp usb_debug pps_core pcspkr 
soundcore
CPU: 2 PID: 30566 Comm: modprobe Tainted: G        W    3.11.0+ #57
 81a0ba44 88008de19b40 817171e9 88008de19b88
 88008de19b78 81053dad 88008de19d08 fff2
 a0249000 880238646248 88008de19d08 88008de19bd8
Call Trace:
 [] dump_stack+0x54/0x74
 [] warn_slowpath_common+0x7d/0xa0
 [] ? reiserfs_xattr_register_handlers+0x9f9f/0x29f9f [reiserfs]
 [] warn_slowpath_fmt+0x4c/0x50
 [] ? reiserfs_xattr_register_handlers+0x8f9f/0xf9f [reiserfs]
 [] ? reiserfs_xattr_register_handlers+0x9f9f/0x29f9f [reiserfs]
 [] ? reiserfs_xattr_register_handlers+0x9f9f/0x29f9f [reiserfs]
 [] __cpa_process_fault+0x91/0xa0
 [] __change_page_attr_set_clr+0x392/0xab0
 [] ? 0xa023efff
 [] change_page_attr_set_clr+0x123/0x460
 [] ? 0xa023efff
 [] set_memory_ro+0x2f/0x40
 [] ? reiserfs_xattr_register_handlers+0x9f9f/0x29f9f [reiserfs]
 [] set_section_ro_nx+0x3a/0x71
 [] load_module+0x1f9e/0x2680
 [] ? store_uevent+0x40/0x40
 [] ? reiserfs_xattr_register_handlers+0xf9f/0xf9f [reiserfs]
 [] ? reiserfs_xattr_register_handlers+0xf9f/0xf9f [reiserfs]
 [] SyS_finit_module+0x86/0xb0
 [] tracesys+0xdd/0xe2
---[ end trace 956db59f53237fe5 ]---
Oops: 0003 [#1] SMP 
Modules linked in: reiserfs snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm 
snd_page_alloc xfs snd_timer libcrc32c snd e1000e ptp usb_debug pps_core pcspkr 
soundcore
CPU: 1 PID: 30571 Comm: modprobe Tainted: G        W    3.11.0+ #57
task: 8801238a ti: 8801ab314000 task.ti: 8801ab314000
RIP: 0010:[]  [] load_module+0x161b/0x2680
RSP: 0018:8801ab315dc0  EFLAGS: 00010202
RAX: a009c000 RBX: 8801ab315ef8 RCX: a00c2000
RDX: a00c2000 RSI: 0055 RDI: a00c3f98
RBP: 8801ab315ee8 R08: a009fa68 R09: a009c000
R10: a00c3f98 R11: 0002 R12: a02d2838
R13: 0001 R14:  R15: a02d2820
FS:  7f6f48b51740() GS:88024580() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: a00c2000 CR3: 0002211e9000 CR4: 001407e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Stack:
 003fa26b 8801238a 8801ab315e48 8801238a
 a009c000 a02d2a58 a02d2838 3a80
 a009c000 a00c2000 003a94a10969 a00c3f98
Call Trace:
 [] ? xfs_setattr_nonsize+0x240/0x5d0 [xfs]
 [] ? xfs_inumbers+0x248/0x420 [xfs]
 [] ? copy_module_from_fd.isra.48+0x12a/0x190
 [] SyS_finit_module+0x86/0xb0
 [] tracesys+0xdd/0xe2
Code: 48 83 7a 38 00 78 6a 48 8b 30 44 89 ea 4c 89 d7 48 8d 14 52 4c 89 4c 24 
40 41 83 c5 01 48 8d 14 d1 48 89 4c 24 48 4c 89 54 24 58 <48> 89 32 48 8b 70 08 
48 89 72 0

Re: [PATCH v4 0/3] cleanup of gpio_pcf857x.c

2013-09-05 Thread Kuninori Morimoto


Hi

> This patch series
> - removes the irq_demux_work
> - Uses devm_request_threaded_irq
> - Call the user handler iff gpio_to_irq is done.
> 
> v1 --> v2
> Split v1 to 3 patches
> v2 --> v3
>   Remove the unnecessary dts patches.
> v3 --> v4
>   Remove gpio->irq (in patch 2)
> 
> Note: these patches were made after applying [1].
> [1] - [PATCH v5] gpio: pcf857x: Add OF support - 
> https://lkml.org/lkml/2013/8/27/70
> 
> George Cherian (3):
>   gpio: pcf857x: change to devm_request_threaded_irq
>   gpio: pcf857x: remove the irq_demux_work and gpio->irq
>   gpio: pcf857x: call the gpio user handler iff gpio_to_irq is done
> 
>  drivers/gpio/gpio-pcf857x.c | 53 ++---
>  1 file changed, 26 insertions(+), 27 deletions(-)

For all patches

Acked-by: Kuninori Morimoto 

Best regards
---
Kuninori Morimoto
