date:20160427

Re: [PATCH 6/6] intel_sgx: TODO file for the staging area

2016-04-27 Thread Jethro Beekman

On 26-04-16 04:23, Jarkko Sakkinen wrote:
> In order to write test code I would need to use the SDK at minimum to
> generate EINITTOKEN for the test enclave.

You could do this right now with the Rust tools for SGX [1]

[1] https://github.com/jethrogb/sgx-utils/

> /Jarkko

Jethro

linux-next: Tree for Apr 27

2016-04-27 Thread Stephen Rothwell

Hi all,

WARNING: I stuffed up when pushing this release out but have reuploaded
it ... sorry about that.

Changes since 20160426:

Removed tree: kbuild-pitre (commits were picked up in the kbuild tree)

The arm-soc tree lost its build failure.

The net-next tree gained conflicts against the net tree.

The drm-misc tree gained a conflict against the drm-intel tree.

I applied a supplied patch to the kbuild tree (replacing the revert).

The dt-rh tree gained a conflict against the tegra tree.

The staging tree gained a conflict against the staging.current tree.

The akpm-current tree gained a conflict against the gpio tree.

Non-merge commits (relative to Linus' tree): 6196
 5590 files changed, 234287 insertions(+), 117296 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 234 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (bcc981e9ed84 Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging fixes/master (9735a22799b9 Linux 4.6-rc2)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on 
module install)
Merging arc-current/for-curr (da67e68c0e46 Documentation: dt: arc: fix spelling 
mistakes)
Merging arm-current/fixes (ac36a881b72a ARM: 8564/1: fix cpu feature extracting 
helper)
Merging m68k-current/for-linus (7b8ba82ad4ad m68k/defconfig: Update defconfigs 
for v4.6-rc2)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached 
build errors)
Merging powerpc-fixes/fixes (4705e02498d6 powerpc: Update TM user feature bits 
in scan_features())
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (5ec712934ce1 sparc: Write up preadv2/pwritev2 syscalls.)
Merging net/master (8358b02bf67d bpf: fix double-fdput in 
replace_map_fd_with_map_ptr())
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (bcf493428840 netfilter: ebtables: Fix extension lookup 
with identical name)
Merging wireless-drivers/master (e2841ea91611 Merge tag 
'iwlwifi-for-kalle-2016-04-12_2' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (8f815cdde3e5 nl80211: check netlink protocol in socket 
release notification)
Merging sound-current/for-linus (bb03ed216370 ALSA: hda - Update BCLK also at 
hotplug for i915 HSW/BDW)
Merging pci-current/for-linus (67e658794ca1 cxgb4: Set VPD size so we can read 
both VPD structures)
Merging driver-core.current/driver-core-linus (c3b46c73264b Linux 4.6-rc4)
Merging tty.current/tty-linus (02da2d72174c Linux 4.6-rc5)
Merging usb.current/usb-linus (97b9b7dc7722 usb: musb: jz4740: fix error check 
of usb_get_phy())
Merging usb-gadget-fixes/fixes (38740a5b87d5 usb: gadget: f_fs: Fix 
use-after-free)
Merging usb-serial-fixes/usb-linus (613ac23a46e1 USB: serial: cp210x: add 
Straizona Focusers device ids)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: 
change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (431adc0aeca6 Merge tag 
'iio-fixes-for-4.6c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio 
into staging-linus)
Merging char-misc.current/char-misc-linus (c3b46c73264b Linux 4.6-rc4)
Merging input-current/for-linus (eb43335c4095 Input: atmel_mxt_ts - use 
mxt_acquire_irq in mxt_soft_reset)
Merging cr

[PATCH v7 2/2] dmaengine: Add Xilinx zynqmp dma engine driver support

2016-04-27 Thread Kedareswara rao Appana

Added the driver for zynqmp dma engine used in Zynq
UltraScale+ MPSoC. This dma controller supports memory to memory and
memory to I/O buffer transfers.

Signed-off-by: Punnaiah Choudary Kalluri 
Signed-off-by: Kedareswara rao Appana 
---
Changes for v7:
- Fixed kbuild compilation warnings.
- Fixed {src,dst}_addr_widths are supposed to be a bitmask of
  supported slave device widths as suggested by Rob.
Changes in v6:
- Removed unnecessary axcache properties from the driver
- Fixed compilation issues
Changes in v5:
- Removed in_interrupt check from the tasklet cleanup as
  suggested by the vinod/lars.
Changes in v4:
- Modified the defines to start with ZYNQMP_DMA prefix
- Changed the zynqmp_dma_alloc_chan_resources to return number of
  allocated descriptors
- Changed the zynqmp_dma_device variable name
- Released the locks before calling user callback
- freeup irq in zynqmp_dma_chan_remove function.
Changes in v3:
- Modified the zynqmp_dma_chan_is_idle function return type to bool
Changes in v2:
- Corrected the function header documentation
- Framework expects bus-width value in bytes. so, fixed it
- Removed magic numbers for bus-width

 drivers/dma/Kconfig |6 +
 drivers/dma/xilinx/zynqmp_dma.c | 1232 +++
 2 files changed, 1238 insertions(+)
 create mode 100644 drivers/dma/xilinx/zynqmp_dma.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 2846753..33dc427 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -527,6 +527,12 @@ config ZX_DMA
help
  Support the DMA engine for ZTE ZX296702 platform devices.
 
+config XILINX_ZYNQMP_DMA
+   tristate "Xilinx ZynqMP DMA Engine"
+   depends on (ARCH_ZYNQ || MICROBLAZE || ARM64)
+   select DMA_ENGINE
+   help
+ Enable support for Xilinx ZynqMP DMA controller.
 
 # driver files
 source "drivers/dma/bestcomm/Kconfig"
diff --git a/drivers/dma/xilinx/zynqmp_dma.c b/drivers/dma/xilinx/zynqmp_dma.c
new file mode 100644
index 000..9c44418
--- /dev/null
+++ b/drivers/dma/xilinx/zynqmp_dma.c
@@ -0,0 +1,1232 @@
+/*
+ * DMA driver for Xilinx ZynqMP DMA Engine
+ *
+ * Copyright (C) 2016 Xilinx, Inc. All rights reserved.
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../dmaengine.h"
+
+/* Register Offsets */
+#define ZYNQMP_DMA_ISR 0x100
+#define ZYNQMP_DMA_IMR 0x104
+#define ZYNQMP_DMA_IER 0x108
+#define ZYNQMP_DMA_IDS 0x10C
+#define ZYNQMP_DMA_CTRL0   0x110
+#define ZYNQMP_DMA_CTRL1   0x114
+#define ZYNQMP_DMA_DATA_ATTR   0x120
+#define ZYNQMP_DMA_DSCR_ATTR   0x124
+#define ZYNQMP_DMA_SRC_DSCR_WRD0   0x128
+#define ZYNQMP_DMA_SRC_DSCR_WRD1   0x12C
+#define ZYNQMP_DMA_SRC_DSCR_WRD2   0x130
+#define ZYNQMP_DMA_SRC_DSCR_WRD3   0x134
+#define ZYNQMP_DMA_DST_DSCR_WRD0   0x138
+#define ZYNQMP_DMA_DST_DSCR_WRD1   0x13C
+#define ZYNQMP_DMA_DST_DSCR_WRD2   0x140
+#define ZYNQMP_DMA_DST_DSCR_WRD3   0x144
+#define ZYNQMP_DMA_SRC_START_LSB   0x158
+#define ZYNQMP_DMA_SRC_START_MSB   0x15C
+#define ZYNQMP_DMA_DST_START_LSB   0x160
+#define ZYNQMP_DMA_DST_START_MSB   0x164
+#define ZYNQMP_DMA_RATE_CTRL   0x18C
+#define ZYNQMP_DMA_IRQ_SRC_ACCT   0x190
+#define ZYNQMP_DMA_IRQ_DST_ACCT   0x194
+#define ZYNQMP_DMA_CTRL2   0x200
+
+/* Interrupt registers bit field definitions */
+#define ZYNQMP_DMA_DONE   BIT(10)
+#define ZYNQMP_DMA_AXI_WR_DATA BIT(9)
+#define ZYNQMP_DMA_AXI_RD_DATA BIT(8)
+#define ZYNQMP_DMA_AXI_RD_DST_DSCR BIT(7)
+#define ZYNQMP_DMA_AXI_RD_SRC_DSCR BIT(6)
+#define ZYNQMP_DMA_IRQ_DST_ACCT_ERR   BIT(5)
+#define ZYNQMP_DMA_IRQ_SRC_ACCT_ERR   BIT(4)
+#define ZYNQMP_DMA_BYTE_CNT_OVRFL  BIT(3)
+#define ZYNQMP_DMA_DST_DSCR_DONE   BIT(2)
+#define ZYNQMP_DMA_INV_APB BIT(0)
+
+/* Control 0 register bit field definitions */
+#define ZYNQMP_DMA_OVR_FETCH   BIT(7)
+#define ZYNQMP_DMA_POINT_TYPE_SG   BIT(6)
+#define ZYNQMP_DMA_RATE_CTRL_EN   BIT(3)
+
+/* Control 1 register bit field definitions */
+#define ZYNQMP_DMA_SRC_ISSUE   GENMASK(4, 0)
+
+/* Data Attribute register bit field definitions */
+#define ZYNQMP_DMA_ARBURST GENMASK(27, 26)
+#define ZYNQMP_DMA_ARCACHE GENMASK(25, 22)
+#define ZYNQMP_DMA_ARCACHE_OFST   22
+#define ZYNQMP_DMA_ARQOS   GENMASK(21, 18)
+#define ZYNQMP_DMA_ARQOS_OFST  18
+#define ZYNQMP_DMA_ARLEN   GENMASK(
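
The register definitions in this patch lean on the kernel's BIT() and GENMASK() helpers. As a rough user-space illustration only, the sketch below re-creates 32-bit equivalents and shows how a field such as ARCACHE could be written into a register image. The SK_ prefixed macros and sk_set_field() are hypothetical names made up for this example and are not part of the driver; only the bit positions are taken from the patch.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel's BIT() and GENMASK() (32-bit only). */
#define SK_BIT(n)        (UINT32_C(1) << (n))
#define SK_GENMASK(h, l) ((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))

/* Bit positions copied from the patch's Data Attribute definitions. */
#define SK_ARCACHE       SK_GENMASK(25, 22)
#define SK_ARCACHE_OFST  22
#define SK_ARQOS         SK_GENMASK(21, 18)
#define SK_ARQOS_OFST    18

/*
 * Hypothetical helper: clear the field selected by 'mask' in 'reg' and
 * store 'val' there, shifted up by 'ofst'. Bits of 'val' that do not fit
 * in the field are dropped by the final mask.
 */
static uint32_t sk_set_field(uint32_t reg, uint32_t mask,
			     unsigned int ofst, uint32_t val)
{
	return (reg & ~mask) | ((val << ofst) & mask);
}
```

For instance, sk_set_field(0, SK_ARCACHE, SK_ARCACHE_OFST, 0x2) yields a register image with only bit 23 set. Keeping the mask and the offset as separate defines, as the patch does, makes this kind of read-modify-write explicit.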

[PATCH] ARM: BCM5301X: Enable SPI-NOR on dual flash devices

2016-04-27 Thread Rafał Miłecki

Commit 1b47b98acce2 ("ARM: BCM5301X: Add DT entry for SPI controller and
NOR flash") enabled SPI-NOR device on routers using serial flash only.
However there are also devices with two flash memories:
1) Small SPI attached flash used mostly for booting
2) Bigger NAND used mostly for storing firmware
On such devices we still need SPI-NOR e.g. to access NVRAM data.

Signed-off-by: Rafał Miłecki 
---
 arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts | 4 
 arch/arm/boot/dts/bcm4708-netgear-r6250.dts   | 4 
 arch/arm/boot/dts/bcm4708-netgear-r6300-v2.dts| 4 
 arch/arm/boot/dts/bcm4709-buffalo-wxr-1900dhp.dts | 4 
 arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts | 4 
 5 files changed, 20 insertions(+)

diff --git a/arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts b/arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts
index 5087aa8..9cb186e 100644
--- a/arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts
+++ b/arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts
@@ -147,3 +147,7 @@
 &usb3 {
vcc-gpio = <&chipcommon 10 GPIO_ACTIVE_LOW>;
 };
+
+&spi_nor {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/bcm4708-netgear-r6250.dts b/arch/arm/boot/dts/bcm4708-netgear-r6250.dts
index 1049ab1..8ce39d5 100644
--- a/arch/arm/boot/dts/bcm4708-netgear-r6250.dts
+++ b/arch/arm/boot/dts/bcm4708-netgear-r6250.dts
@@ -90,3 +90,7 @@
 &usb3 {
vcc-gpio = <&chipcommon 0 GPIO_ACTIVE_HIGH>;
 };
+
+&spi_nor {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/bcm4708-netgear-r6300-v2.dts b/arch/arm/boot/dts/bcm4708-netgear-r6300-v2.dts
index 3a94606..6229ef2 100644
--- a/arch/arm/boot/dts/bcm4708-netgear-r6300-v2.dts
+++ b/arch/arm/boot/dts/bcm4708-netgear-r6300-v2.dts
@@ -82,3 +82,7 @@
};
};
 };
+
+&spi_nor {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/bcm4709-buffalo-wxr-1900dhp.dts b/arch/arm/boot/dts/bcm4709-buffalo-wxr-1900dhp.dts
index 791d722..0653e7e 100644
--- a/arch/arm/boot/dts/bcm4709-buffalo-wxr-1900dhp.dts
+++ b/arch/arm/boot/dts/bcm4709-buffalo-wxr-1900dhp.dts
@@ -131,3 +131,7 @@
 &usb2 {
vcc-gpio = <&chipcommon 13 GPIO_ACTIVE_HIGH>;
 };
+
+&spi_nor {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts b/arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts
index ace38ef..50cf804 100644
--- a/arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts
+++ b/arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts
@@ -113,3 +113,7 @@
 &usb3 {
vcc-gpio = <&chipcommon 18 GPIO_ACTIVE_HIGH>;
 };
+
+&spi_nor {
+   status = "okay";
+};
-- 
1.8.4.5

[PATCH v7 1/2] Documentation: DT: dma: Add Xilinx zynqmp dma device tree binding documentation

2016-04-27 Thread Kedareswara rao Appana

Device-tree binding documentation for Xilinx zynqmp dma engine used in
Zynq UltraScale+ MPSoC.

Signed-off-by: Punnaiah Choudary Kalluri 
Signed-off-by: Kedareswara rao Appana 
---
Changes in v7:
- None.
Changes in v6:
- Removed desc-axi-cache/dst-axi-cache/src-axi-cache properties
  from the binding doc as they allow broken combinations when dma-coherent
  is set, as suggested by Rob.
- Fixed minor coding comments from Rob (lower case DT node name).
Changes in v5:
- Use dma-coherent flag for coherent transfers as suggested by rob.
- Removed unnecessary properties from binding doc as suggested by Rob.
Changes in v4:
- None
Changes in v3:
- None
Changes in v2:
- None.


 .../devicetree/bindings/dma/xilinx/zynqmp_dma.txt  | 44 ++
 1 file changed, 44 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/xilinx/zynqmp_dma.txt

diff --git a/Documentation/devicetree/bindings/dma/xilinx/zynqmp_dma.txt b/Documentation/devicetree/bindings/dma/xilinx/zynqmp_dma.txt
new file mode 100644
index 000..f0f0b54
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/xilinx/zynqmp_dma.txt
@@ -0,0 +1,44 @@
+Xilinx ZynqMP DMA engine, it does support memory to memory transfers,
+memory to device and device to memory transfers. It also has flow
+control and rate control support for slave/peripheral dma access.
+
+Required properties:
+- compatible   : Should be "xlnx,zynqmp-dma-1.0"
+- reg  : Memory map for gdma/adma module access.
+- interrupt-parent : Interrupt controller the interrupt is routed through
+- interrupts   : Should contain DMA channel interrupt.
+- xlnx,bus-width   : Axi buswidth in bits. Should contain 128 or 64
+- clock-names  : List of input clocks "clk_main", "clk_apb"
+ (see clock bindings for details)
+
+Optional properties:
+- xlnx,include-sg  : Indicates the controller to operate in simple or
+ scatter gather dma mode
+- xlnx,ratectrl: Scheduling interval in terms of clock cycles for
+ source AXI transaction
+- xlnx,overfetch   : Tells whether the channel is allowed to over
+ fetch the data
+- xlnx,src-issue   : Number of AXI outstanding transactions on source side
+- xlnx,src-burst-len   : AXI length for data read. Support only power of
+ 2 byte values.
+- xlnx,dst-burst-len   : AXI length for data write. Support only power of
+ 2 byte values.
+- dma-coherent : Present if dma operations are coherent.
+
+Example:
+
+fpd_dma_chan1: dma@fd50 {
+   compatible = "xlnx,zynqmp-dma-1.0";
+   reg = <0x0 0xFD50 0x1000>;
+   interrupt-parent = <&gic>;
+   interrupts = <0 117 4>;
+   clock-names = "clk_main", "clk_apb";
+   xlnx,bus-width = <128>;
+   xlnx,include-sg;
+   xlnx,overfetch;
+   dma-coherent;
+   xlnx,ratectrl = <0>;
+   xlnx,src-issue = <16>;
+   xlnx,src-burst-len = <4>;
+   xlnx,dst-burst-len = <4>;
+};
-- 
2.1.2

[PATCH] ASoC: atmel_ssc_dai: read DSP mode A data on rising edges of bclk

2016-04-27 Thread Peter Rosin

Signed-off-by: Peter Rosin 
---
 sound/soc/atmel/atmel_ssc_dai.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/atmel/atmel_ssc_dai.c b/sound/soc/atmel/atmel_ssc_dai.c
index 6dca95aa8a82..4ab1e2013238 100644
--- a/sound/soc/atmel/atmel_ssc_dai.c
+++ b/sound/soc/atmel/atmel_ssc_dai.c
@@ -652,7 +652,7 @@ static int atmel_ssc_hw_params(struct snd_pcm_substream *substream,
rcmr =SSC_BF(RCMR_PERIOD, ssc_p->rcmr_period)
| SSC_BF(RCMR_STTDLY, 1)
| SSC_BF(RCMR_START, SSC_START_RISING_RF)
-   | SSC_BF(RCMR_CKI, SSC_CKI_FALLING)
+   | SSC_BF(RCMR_CKI, SSC_CKI_RISING)
| SSC_BF(RCMR_CKO, SSC_CKO_NONE)
| SSC_BF(RCMR_CKS, SSC_CKS_DIV);
 
@@ -692,7 +692,7 @@ static int atmel_ssc_hw_params(struct snd_pcm_substream *substream,
rcmr =SSC_BF(RCMR_PERIOD, 0)
| SSC_BF(RCMR_STTDLY, START_DELAY)
| SSC_BF(RCMR_START, SSC_START_RISING_RF)
-   | SSC_BF(RCMR_CKI, SSC_CKI_FALLING)
+   | SSC_BF(RCMR_CKI, SSC_CKI_RISING)
| SSC_BF(RCMR_CKO, SSC_CKO_NONE)
| SSC_BF(RCMR_CKS, ssc->clk_from_rk_pin ?
   SSC_CKS_PIN : SSC_CKS_CLOCK);
-- 
2.1.4

[PATCH] media: tuner: mt2063: use lib gcd

2016-04-27 Thread zengzhaoxiu

From: Zhaoxiu Zeng 

This patch removes the local MT2063_gcd function, uses lib gcd instead

Signed-off-by: Zhaoxiu Zeng 
---
 drivers/media/tuners/mt2063.c | 30 +-
 1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/drivers/media/tuners/mt2063.c b/drivers/media/tuners/mt2063.c
index 6457ac9..7f0b9d5 100644
--- a/drivers/media/tuners/mt2063.c
+++ b/drivers/media/tuners/mt2063.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mt2063.h"
 
@@ -665,27 +666,6 @@ static u32 MT2063_ChooseFirstIF(struct MT2063_AvoidSpursData_t *pAS_Info)
 }
 
 /**
- * gcd() - Uses Euclid's algorithm
- *
- * @u, @v: Unsigned values whose GCD is desired.
- *
- * Returns THE greatest common divisor of u and v, if either value is 0,
- * the other value is returned as the result.
- */
-static u32 MT2063_gcd(u32 u, u32 v)
-{
-   u32 r;
-
-   while (v != 0) {
-   r = u % v;
-   u = v;
-   v = r;
-   }
-
-   return u;
-}
-
-/**
  * IsSpurInBand() - Checks to see if a spur will be present within the IF's
  *  bandwidth. (fIFOut +/- fIFBW, -fIFOut +/- fIFBW)
  *
@@ -731,12 +711,12 @@ static u32 IsSpurInBand(struct MT2063_AvoidSpursData_t *pAS_Info,
 ** of f_LO1, f_LO2 and the edge value.  Use the larger of this
 ** gcd-based scale factor or f_Scale.
 */
-   lo_gcd = MT2063_gcd(f_LO1, f_LO2);
-   gd_Scale = max((u32) MT2063_gcd(lo_gcd, d), f_Scale);
+   lo_gcd = gcd(f_LO1, f_LO2);
+   gd_Scale = max((u32) gcd(lo_gcd, d), f_Scale);
hgds = gd_Scale / 2;
-   gc_Scale = max((u32) MT2063_gcd(lo_gcd, c), f_Scale);
+   gc_Scale = max((u32) gcd(lo_gcd, c), f_Scale);
hgcs = gc_Scale / 2;
-   gf_Scale = max((u32) MT2063_gcd(lo_gcd, f), f_Scale);
+   gf_Scale = max((u32) gcd(lo_gcd, f), f_Scale);
hgfs = gf_Scale / 2;
 
n0 = DIV_ROUND_UP(f_LO2 - d, f_LO1 - f_LO2);
-- 
2.5.0
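
For readers who want to check that nothing changes numerically, here is a minimal user-space sketch of the same Euclid loop that the removed MT2063_gcd() used, plus the "larger of the gcd-based scale or f_Scale" pattern from IsSpurInBand(). The names sketch_gcd() and sketch_scale() are made up for this illustration. The sketch does not call the kernel's gcd() from lib/gcd.c; it only shows the arithmetic the patch relies on.

```c
#include <stdint.h>

/* Euclid's algorithm, the same loop as the removed MT2063_gcd(). */
static uint32_t sketch_gcd(uint32_t u, uint32_t v)
{
	while (v != 0) {
		uint32_t r = u % v;

		u = v;
		v = r;
	}
	/* If either input was 0, the other value ends up in u. */
	return u;
}

/*
 * Mirrors the pattern max((u32) gcd(lo_gcd, edge), f_scale) from the
 * patched hunk: use the gcd-based scale factor unless f_scale is larger.
 */
static uint32_t sketch_scale(uint32_t lo_gcd, uint32_t edge, uint32_t f_scale)
{
	uint32_t g = sketch_gcd(lo_gcd, edge);

	return g > f_scale ? g : f_scale;
}
```

The zero-input case matters here because the removed kernel-doc comment promised that behaviour, so any replacement has to return the other value when one argument is 0.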

[patch] sched: Fix smp nice induced group scheduling load distribution woes

2016-04-27 Thread Mike Galbraith

On Mon, 2016-04-25 at 11:18 +0200, Mike Galbraith wrote:
> On Sun, 2016-04-24 at 09:05 +0200, Mike Galbraith wrote:
> > On Sat, 2016-04-23 at 18:38 -0700, Brendan Gregg wrote:
> > 
> > > The bugs they found seem real, and their analysis is great
> > > (although
> > > using visualizations to find and fix scheduler bugs isn't new),
> > > and it
> > > would be good to see these fixed. However, it would also be
> > > useful to
> > > double check how widespread these issues really are. I suspect
> > > many on
> > > this list can test these patches in different environments.
> > 
> > Part of it sounded to me very much like they're meeting and
> > "fixing"
> > SMP group fairness...
> 
> Ew, NUMA boxen look like they could use a hug or two.  Add a group of
> one hog to compete with a box wide kbuild, ~lose a node.

sched: Fix smp nice induced group scheduling load distribution woes

On even a modest sized NUMA box any load that wants to scale
is essentially reduced to SCHED_IDLE class by smp nice scaling.
Limit niceness to prevent cramming a box wide load into a too
small space.  Given niceness affects latency, give the user the
option to completely disable box wide group fairness as well.

time make -j192 modules on a 4 node NUMA box..

Before:
root cgroup
real    1m6.987s      1.00

cgroup vs 1 groups of 1 hog
real    1m20.871s     1.20

cgroup vs 2 groups of 1 hog
real    1m48.803s     1.62
Each single task group receives a ~full socket because the kbuild
has become an essentially massless object that fits in practically
no space at all.  Near perfect math led directly to far from good
scaling/performance, a "Perfect is the enemy of good" poster child.

After "Let's just be nice enough instead" adjustment, single task
groups continued to sustain >99% utilization while competing with
the box sized kbuild.

cgroup vs 2 groups of 1 hog
real    1m8.151s      1.01  192/190=1.01

Good enough works better.. nearly perfectly in this case.

Signed-off-by: Mike Galbraith 
---
 kernel/sched/fair.c |   22 ++
 kernel/sched/features.h |3 +++
 2 files changed, 21 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/sched/fair.c
===
--- linux-2.6.orig/kernel/sched/fair.c
+++ linux-2.6/kernel/sched/fair.c
@@ -2464,17 +2464,28 @@ static inline long calc_tg_weight(struct
 
 static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 {
-   long tg_weight, load, shares;
+   long tg_weight, load, shares, min_shares = MIN_SHARES;
 
-   tg_weight = calc_tg_weight(tg, cfs_rq);
+   if (!sched_feat(SMP_NICE_GROUPS))
+   return tg->shares;
+
+   /*
+* Bound niceness to prevent everything that wants to scale from
+* essentially becoming SCHED_IDLE on multi/large socket boxen,
+* screwing up our ability to distribute load properly and/or
+* deliver acceptable latencies.
+*/
+   tg_weight = min_t(long, calc_tg_weight(tg, cfs_rq), sched_prio_to_weight[10]);
load = cfs_rq->load.weight;
 
shares = (tg->shares * load);
if (tg_weight)
shares /= tg_weight;
 
-   if (shares < MIN_SHARES)
-   shares = MIN_SHARES;
+   if (tg->shares > sched_prio_to_weight[20])
+   min_shares = sched_prio_to_weight[20];
+   if (shares < min_shares)
+   shares = min_shares;
if (shares > tg->shares)
shares = tg->shares;
 
@@ -2517,6 +2528,9 @@ static void update_cfs_shares(struct cfs
 #ifndef CONFIG_SMP
if (likely(se->load.weight == tg->shares))
return;
+#else
+   if (!sched_feat(SMP_NICE_GROUPS) && se->load.weight == tg->shares)
+   return;
 #endif
shares = calc_cfs_shares(cfs_rq, tg);
 
Index: linux-2.6/kernel/sched/features.h
===
--- linux-2.6.orig/kernel/sched/features.h
+++ linux-2.6/kernel/sched/features.h
@@ -69,3 +69,6 @@ SCHED_FEAT(RT_RUNTIME_SHARE, true)
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
 
+#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+SCHED_FEAT(SMP_NICE_GROUPS, true)
+#endif
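
To make the effect of the clamp easier to see, here is a small user-space model of the shares arithmetic in the patched calc_cfs_shares(). It is only a sketch under stated assumptions: the constants 9548 and 1024 are assumed to be the values of sched_prio_to_weight[10] and sched_prio_to_weight[20] in kernels of this era, MIN_SHARES is assumed to be 2, and sketch_cfs_shares() with its 'bounded' switch is a made-up name that stands in for the old and new code paths. The SMP_NICE_GROUPS early return is not modelled.

```c
/* Assumed values, see the note above; not read from a real kernel. */
#define SK_MIN_SHARES        2L
#define SK_WEIGHT_NICE_M10   9548L   /* assumed sched_prio_to_weight[10] */
#define SK_WEIGHT_NICE_0     1024L   /* assumed sched_prio_to_weight[20] */

/*
 * tg_shares: the group's configured shares
 * tg_weight: total weight of the group across all CPUs
 * load:      weight of the group's runqueue on this CPU
 * bounded:   0 models the code before the patch, 1 the code after it
 */
static long sketch_cfs_shares(long tg_shares, long tg_weight, long load,
			      int bounded)
{
	long shares, min_shares = SK_MIN_SHARES;

	/* After the patch, the divisor is capped at the nice -10 weight. */
	if (bounded && tg_weight > SK_WEIGHT_NICE_M10)
		tg_weight = SK_WEIGHT_NICE_M10;

	shares = tg_shares * load;
	if (tg_weight)
		shares /= tg_weight;

	/* After the patch, large groups never drop below nice 0 weight. */
	if (bounded && tg_shares > SK_WEIGHT_NICE_0)
		min_shares = SK_WEIGHT_NICE_0;

	if (shares < min_shares)
		shares = min_shares;
	if (shares > tg_shares)
		shares = tg_shares;

	return shares;
}
```

With tg_shares = 1024, one nice-0 task per CPU and 192 runnable tasks box wide (tg_weight = 192 * 1024, load = 1024), the unbounded path gives a per-CPU weight of 5 while the bounded path gives 109, which is the "massless object" effect described in the changelog expressed as numbers.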

Re: linux-next: Tree for Apr 27

2016-04-27 Thread Stephen Rothwell

Hi all,

On Wed, 27 Apr 2016 17:03:54 +1000 Stephen Rothwell wrote:
>
> WARNING: I stuffed up when pushing this release out but have reuploaded
> it ... sorry about that.

Just to be clear the correct SHA1s for the master branch and the
next-20160427 tag are 29fab3a5a2a1 and 7a1eedbaaf03, respectively.

-- 
Cheers,
Stephen Rothwell

Re: [PATCH] fs: fix over-zealous use of "const"

2016-04-27 Thread James Morris

On Thu, 21 Apr 2016, Kees Cook wrote:

> When I was fixing up const recommendations from checkpatch.pl, I went
> overboard. This fixes the warning (during a W=1 build):
> 
> include/linux/fs.h:2627:74: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
> static inline const char * const kernel_read_file_id_str(enum kernel_read_file_id id)
> 
> Reported-by: Andy Shevchenko 
> Signed-off-by: Kees Cook 
> ---
> This is for linux-security next
> ---

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next


-- 
James Morris

Re: [PATCH v7 1/2] Documentation: DT: dma: Add Xilinx zynqmp dma device tree binding documentation

2016-04-27 Thread Lars-Peter Clausen

On 04/27/2016 09:05 AM, Kedareswara rao Appana wrote:
[...]
> +- xlnx,include-sg: Indicates the controller to operate in simple or
> +   scatter gather dma mode
> +- xlnx,ratectrl  : Scheduling interval in terms of clock cycles for
> +   source AXI transaction
> +- xlnx,overfetch : Tells whether the channel is allowed to over
> +   fetch the data
> +- xlnx,src-issue : Number of AXI outstanding transactions on source side
> +- xlnx,src-burst-len : AXI length for data read. Support only power of
> +   2 byte values.
> +- xlnx,dst-burst-len : AXI length for data write. Support only power of

These are all software runtime configuration parameters that you'd want to
change at runtime depending on which peripheral you are targeting with a
specific DMA transfer. These really do not belong into the devicetree.

Re: [PATCH] ARM: dts: at91: VInCo: fix phy reset gpio flag

2016-04-27 Thread Nicolas Ferre

On 26/04/2016 20:27, Sergei Shtylyov wrote:
> On 04/26/2016 08:17 PM, David Miller wrote:
> 
>>> I plan to queue this patch through arm-soc for 4.7.
>>
>> Ok.
> 
> How about this patch going thru your net-next repo instead?
> I'd like to keep the kernel bisectable... if my phylib/macb patches get
> merged earlier than this one, that board would be broken...

Sergei, David,

I don't think that there is a big risk for this board to be tested in
the meantime as it's not widely deployed yet.
And as I'm aware of the issue and basically maintaining this DT file, I
think that I'll be informed if people try an unlikely arrangement of
patches on this board.

So either way, I'm okay. But I do think it's not worth thinking too much
about this case.

Bye,
-- 
Nicolas Ferre

Re: [RESEND PATCH v2 5/7] usb: xhci: plat: Remove checks for optional clock in error/remove path

2016-04-27 Thread Felipe Balbi


Hi,

Jisheng Zhang  writes:
> Dear Felipe,
>
> On Wed, 27 Apr 2016 08:33:52 +0300 Felipe Balbi wrote:
>
>> Jisheng Zhang  writes:
>> > Commit 63589e92c2d9 ("clk: Ignore error and NULL pointers passed to
>> > clk_{unprepare, disable}()") allows NULL or error pointer to be passed
>> > unconditionally.
>> >
>> > This patch is to simplify probe error and remove code paths.  
>> 
>> this seems wrong to me. xhci->clk isn't initialized to NULL, it's either
>> initialized to a valid struct clk * or some ERR_PTR() value.
>
> Commit 63589e92c2d9 could also ignore error value ;)

oh okay, thanks for that. That's, IMHO, quite dangerous ;-)

-- 
balbi
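
As a rough illustration of the point discussed in this thread, the user-space sketch below models a disable call that silently ignores both NULL and error-encoded pointers, which is the behaviour the commit message attributes to commit 63589e92c2d9. Everything here is a stand-in written for this example: struct sk_clk, sk_err_ptr(), sk_is_err_or_null() and sk_clk_disable() are hypothetical names, and the 4095 limit is an assumption modelled on the kernel's MAX_ERRNO convention.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_ERRNO 4095	/* assumed, modelled on the kernel's MAX_ERRNO */

/* Minimal stand-in for a clock handle. */
struct sk_clk {
	int enable_count;
};

/* Encode a negative errno value in a pointer, in the style of ERR_PTR(). */
static void *sk_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True for NULL and for pointers in the top SK_MAX_ERRNO addresses. */
static int sk_is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)(-(intptr_t)SK_MAX_ERRNO);
}

/* Returns 1 if the clock was really disabled, 0 if the call was ignored. */
static int sk_clk_disable(struct sk_clk *clk)
{
	if (sk_is_err_or_null(clk))
		return 0;

	clk->enable_count--;
	return 1;
}
```

With a guard like this inside the callee, a probe error path can call the disable function unconditionally even when the optional clock lookup returned an error value. The trade-off raised in the reply above is that a genuinely bogus pointer is then ignored just as quietly.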


Re: [PATCH v4 17/21] capabilities: Allow privileged user in s_user_ns to set security.* xattrs

2016-04-27 Thread James Morris

On Tue, 26 Apr 2016, Seth Forshee wrote:

> A privileged user in s_user_ns will generally have the ability to
> manipulate the backing store and insert security.* xattrs into
> the filesystem directly. Therefore the kernel must be prepared to
> handle these xattrs from unprivileged mounts, and it makes little
> sense for commoncap to prevent writing these xattrs to the
> filesystem. The capability and LSM code have already been updated
> to appropriately handle xattrs from unprivileged mounts, so it
> is safe to loosen this restriction on setting xattrs.
> 
> The exception to this logic is that writing xattrs to a mounted
> filesystem may also cause the LSM inode_post_setxattr or
> inode_setsecurity callbacks to be invoked. SELinux will deny the
> xattr update by virtue of applying mountpoint labeling to
> unprivileged userns mounts, and Smack will deny the writes for
> any user without global CAP_MAC_ADMIN, so loosening the
> capability check in commoncap is safe in this respect as well.
> 
> Signed-off-by: Seth Forshee 
> Acked-by: Serge Hallyn 


Acked-by: James Morris 


-- 
James Morris

Re: [RESEND PATCH v2 6/7] usb: xhci: plat: add generic PHY support

2016-04-27 Thread Felipe Balbi


Hi,

Jisheng Zhang  writes:
>> > +static void xhci_plat_phy_exit(struct usb_hcd *hcd)
>> > +{
>> > +  if (hcd->phy) {
>> > +  phy_power_off(hcd->phy);
>> > +  phy_exit(hcd->phy);
>> > +  } else {
>> > +  usb_phy_shutdown(hcd->usb_phy);
>> > +  }
>> > +}
>> > +
>> >  static int xhci_plat_probe(struct platform_device *pdev)
>> >  {
>> >struct device_node  *node = pdev->dev.of_node;
>> > @@ -145,6 +177,7 @@ static int xhci_plat_probe(struct platform_device *pdev)
>> >struct usb_hcd  *hcd;
>> >struct clk  *clk;
>> >struct usb_phy  *usb_phy;
>> > +  struct phy  *phy;  
>> 
>> so, one phy driver using USB PHY layer and another using generic PHY
>> layer ? Why ? I think the first thing your series should do would be to
>
> It's different platforms. E.g
> platform A may write the phy driver under usb phy layer, while platform B
> may have generic phy driver.

right, but both APIs should be supported with *two* PHYs for the time being.

> The questions are: when adding phy support to xhci-plat, the generic phy
> has existed for a long time, what's the reason to use the deprecated usb
> phy APIs.

I don't know, ask the author :-) Maybe the PHY driver was already
available on the USB PHY layer ? What we should do is push that PHY
driver to be moved over to generic PHY layer, then we can get rid of USB
PHY layer from xhci-plat.

> And per my check, only MVEBU platforms use this support? I'm not sure
> if we could remove usbphy code from xhci-plat first, then add generic phy, then
> add MVEBU xhci phy support back with the new code. So Cc mvebu maintainers

First the PHY driver(s) depending on that should be converted over.

-- 
balbi


Re: [PATCH 1/2] clk: imx: do not sleep if IRQ's are still disabled

2016-04-27 Thread Shawn Guo

On Wed, Apr 27, 2016 at 10:57:21AM +0800, Dong Aisheng wrote:
> On Wed, Apr 27, 2016 at 9:58 AM, Shawn Guo  wrote:
> > On Tue, Apr 26, 2016 at 07:27:03PM +0800, Dong Aisheng wrote:
> >> Shawn,
> >> What's your suggestion?
> >
> > I think this needs more discussion, and I just dropped Stefan's patch
> > from my tree.
> >
> > We need to firstly understand why this is happening.  The .prepare hook
> > is defined to be non-atomic context, and so that we call sleep function
> > in there.  We did everything right.  Why are we getting the warning?  If
> > I'm correct, this warning only happens on i.MX7D.  Why is that?
> >
> 
> Why Stefan's patch works (checking irqs_disabled()) is because during kernel
> time init, the irq is still not enabled. It fixes the issue indirectly.
> See:
> asmlinkage __visible void __init start_kernel(void)
> {
> /*
>  * Set up the scheduler prior starting any interrupts (such as the
>  * timer interrupt). Full topology setup happens at smp_init()
>  * time - but meanwhile we still have a functioning scheduler.
>  */
> sched_init();
> .
> time_init();
> ..
> WARN(!irqs_disabled(), "Interrupts were enabled early\n");
> early_boot_irqs_disabled = false;
> local_irq_enable();
> }
> 
> The issue can only happen when PLL enable causes a schedule during
> imx_clock_init().
> Not all PLL has this issue.
> The issue happens on MX7D pll_audio_main_clk/pll_video_main_clk
> which requires more delay time and cause usleep.
> Because clk framework does not support MX7D clock types (operation requires
> parents on), we simply enable all clocks in imx7d_clocks_init().
> 
> If you apply my patch series:
> https://lkml.org/lkml/2016/4/20/199
> the issue also goes away.

Thanks for the info.  It sounds like we are fixing the problem in
the wrong place, i.e. clk_pllv3_prepare().  The function does nothing
wrong, since the .prepare hook is defined to be one that can sleep.  If
we see sleep warning in a context calling clk_prepare(), that probably
means we shouldn't make the function call from that context.

Shawn

Re: zram: per-cpu compression streams

2016-04-27 Thread Minchan Kim

Hello Sergey,

On Tue, Apr 26, 2016 at 08:23:05PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (04/19/16 17:00), Minchan Kim wrote:
> [..]
> > I'm convinced now with your data. Super thanks!
> > However, as you know, we need data how bad it is in heavy memory pressure.
> > Maybe you can test it with fio and a background memory hogger,
> 
> it's really hard to produce stable test results when the system
> is under mem pressure.
> 
> first, I modified zram to export the re-compression number
> (put cpu stream and re-try handler allocation)
> 
> mm_stat for numjobs{1..10}. the number of re-compressions is in "< NUM>" format
> 
> 3221225472 3221225472 32212254720 322122956800 < 6421>
> 3221225472 3221225472 32212254720 322123366400 < 6998>
> 3221225472 2912157607 29528023040 29528145920   84 < 7271>
> 3221225472 2893479936 28991201280 28991365120  156 < 8260>
> 3221217280 2886040814 28990996480 28991283200   78 < 8297>
> 3221225472 2880045056 28856934400 28857180160   54 < 7794>
> 3221213184 2877431364 28837560320 28838010880  144 < 7336>
> 3221225472 2873229312 28760965120 28761333760   28 < 8699>
> 3221213184 2870728008 28716933120 28717301760   30 < 8189>
> 2899095552 2899095552 28990955520 2899136512786430 < 7485>

It would be great when we see the below ratio for each test.

1-compression : 2(re)-compression

> 
> as we can see, the number of re-compressions can vary from 6421 to 8699.
> 
> 
> the test:
> 
> -- 4 GB x86_64 box
> -- zram 3GB, lzo
> -- mem-hogger pre-faults 3GB of pages before the fio test
> -- fio test has been modified to have 11% compression ratio (to increase the
>   chances of re-compressions)

Could you test a concurrent mem hogger with fio, rather than pre-faulting
before the fio test, in the next submission?

>-- buffer_compress_percentage=11
>-- scramble_buffers=0
> 
> 
> considering buffer_compress_percentage=11, the box was under somewhat
> heavy pressure.
> 
> now, the results

Yes, even the recompression case is faster than the old one, but I'd like
to see a heavier memory pressure case and the ratio I mentioned above.

If the result is still good, please send a public patch with the numbers.
Thanks for looking into this, Sergey!

> 
> 
> fio stats
> 
>         4 streams  8 streams   per cpu
> ===
> #jobs1
> READ:   2411.4MB/s 2430.4MB/s  2440.4MB/s
> READ:   2094.8MB/s 2002.7MB/s  2034.5MB/s
> WRITE:  141571KB/s 140334KB/s  143542KB/s
> WRITE:  712025KB/s 706111KB/s  745256KB/s
> READ:   531014KB/s 525250KB/s  537547KB/s
> WRITE:  530960KB/s 525197KB/s  537492KB/s
> READ:   473577KB/s 470320KB/s  476880KB/s
> WRITE:  473645KB/s 470387KB/s  476948KB/s
> #jobs2
> READ:   7897.2MB/s 8031.4MB/s  7968.9MB/s
> READ:   6864.9MB/s 6803.2MB/s  6903.4MB/s
> WRITE:  321386KB/s 314227KB/s  313101KB/s
> WRITE:  1275.3MB/s 1245.6MB/s  1383.5MB/s
> READ:   1035.5MB/s 1021.9MB/s  1098.4MB/s
> WRITE:  1035.6MB/s 1021.1MB/s  1098.6MB/s
> READ:   972014KB/s 952321KB/s  987.66MB/s
> WRITE:  969792KB/s 950144KB/s  985.40MB/s
> #jobs3
> READ:   13260MB/s  13260MB/s   13222MB/s
> READ:   11636MB/s  11636MB/s   11755MB/s
> WRITE:  511500KB/s 507730KB/s  504959KB/s
> WRITE:  1646.1MB/s 1673.9MB/s  1755.5MB/s
> READ:   1389.5MB/s 1387.2MB/s  1479.6MB/s
> WRITE:  1387.6MB/s 1385.3MB/s  1477.4MB/s
> READ:   1286.8MB/s 1289.1MB/s  1377.3MB/s
> WRITE:  1284.8MB/s 1287.1MB/s  1374.9MB/s
> #jobs4
> READ:   19851MB/s  20244MB/s   20344MB/s
> READ:   17732MB/s  17835MB/s   18097MB/s
> WRITE:  667776KB/s 655599KB/s  693464KB/s
> WRITE:  2041.2MB/s 2072.6MB/s  2474.1MB/s
> READ:   1770.1MB/s 1781.7MB/s  2035.5MB/s
> WRITE:  1765.8MB/s 1777.3MB/s  2030.5MB/s
> READ:   1641.6MB/s 1672.4MB/s  1892.5MB/s
> WRITE:  1643.2MB/s 1674.2MB/s  1894.4MB/s
> #jobs5
> READ:   19468MB/s  1848

Re: [PATCH net-next 9/9] taskstats: use the libnl API to align nlattr on 64-bit

2016-04-27 Thread Nicolas Dichtel

Le 27/04/2016 03:14, Balbir Singh a écrit :
> 
> 
> On 23/04/16 01:31, Nicolas Dichtel wrote:
>> Goal of this patch is to use the new libnl API to align netlink attribute
>> when needed.
>> The layout of the netlink message will be a bit different after the patch,
>> because the padattr (TASKSTATS_TYPE_STATS) will be inside the nested
>> attribute instead of before it.
>>
>> Signed-off-by: Nicolas Dichtel 
> 
> The layout will change/break user space -- I've not tested the patch though..
Sigh.

I quote David:
"All userspace components using netlink should always ignore attributes
they do not recognize in dumps.

This is one of the most basic principles of netlink"
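
To make that principle concrete, here is a minimal user-space sketch in C of
a receiver that walks a buffer of type-length-value attributes and skips the
types it does not recognize. The struct attr layout, the ATTR_* constants and
the helpers put_attr() and parse_attrs() are hypothetical names chosen for
illustration; this is not the libnl or kernel netlink API, only the same idea
in miniature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header: total length (header + payload) and type. */
struct attr {
	uint16_t len;
	uint16_t type;
};

enum { ATTR_PID = 1, ATTR_STATS = 2 };	/* types this receiver understands */

struct parsed {
	uint32_t pid;
	uint32_t stats;
	unsigned int skipped;	/* number of unrecognized attributes */
};

/* Attributes start on 4-byte boundaries in this sketch. */
static size_t attr_align(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/* Append one attribute at offset off; returns the offset of the next one. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
		       const void *data, uint16_t dlen)
{
	struct attr a = { (uint16_t)(sizeof(a) + dlen), type };

	memcpy(buf + off, &a, sizeof(a));
	memcpy(buf + off + sizeof(a), data, dlen);
	return off + attr_align(a.len);
}

/* Walk the buffer; unknown types are counted and skipped, never fatal. */
static int parse_attrs(const unsigned char *buf, size_t size,
		       struct parsed *out)
{
	size_t off = 0;

	assert(buf && out);
	memset(out, 0, sizeof(*out));

	while (size - off >= sizeof(struct attr)) {
		struct attr a;
		size_t plen, step;

		memcpy(&a, buf + off, sizeof(a));
		if (a.len < sizeof(a) || a.len > size - off)
			return -1;	/* malformed, not merely unknown */
		plen = a.len - sizeof(a);

		switch (a.type) {
		case ATTR_PID:
			if (plen < sizeof(out->pid))
				return -1;
			memcpy(&out->pid, buf + off + sizeof(a), sizeof(out->pid));
			break;
		case ATTR_STATS:
			if (plen < sizeof(out->stats))
				return -1;
			memcpy(&out->stats, buf + off + sizeof(a), sizeof(out->stats));
			break;
		default:
			out->skipped++;	/* e.g. a padding attribute */
			break;
		}

		step = attr_align(a.len);
		if (step > size - off)
			step = size - off;	/* last attribute may be unpadded */
		off += step;
	}
	return 0;
}
```

A receiver written this way keeps working when a new attribute type, such as
a padding attribute, shows up between the ones it already knows.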

Do you have some pointers so I can run some tests?


Regards,
Nicolas

Re: linux-next: Tree for Apr 27

2016-04-27 Thread Sergey Senozhatsky

Hello,

commit 2f7600bc981cb0fd7ea0b92618bae32dcc778317
Author: Thierry Reding 
Date:   Tue Apr 5 17:17:34 2016 +0200

phy: core: Allow children node to be overridden

In order to more flexibly support device tree bindings, allow drivers to
override the container of the child nodes. By default the device node of
the PHY provider is assumed to be the parent for children, but bindings
may decide to add additional levels for better organization.


this does not compile on !CONFIG_OF systems

drivers/phy/phy-core.c: In function ‘__of_phy_provider_register’:
drivers/phy/phy-core.c:848:13: error: implicit declaration of function ‘of_get_next_parent’ [-Werror=implicit-function-declaration]
parent = of_get_next_parent(parent);
 ^~
drivers/phy/phy-core.c:848:11: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
parent = of_get_next_parent(parent);
   ^
  CC [M]  drivers/net/usb/usbnet.o
cc1: some warnings being treated as errors
  CC  net/ipv4/proc.o
scripts/Makefile.build:289: recipe for target 'drivers/phy/phy-core.o' failed
make[2]: *** [drivers/phy/phy-core.o] Error 1
scripts/Makefile.build:440: recipe for target 'drivers/phy' failed
make[1]: *** [drivers/phy] Error 2
make[1]: *** Waiting for unfinished jobs


-ss

Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 - LTP results

2016-04-27 Thread Andrew Pinski

On Fri, Apr 22, 2016 at 8:37 PM, Zhangjian (Bamvor)
 wrote:
> Hi, Yury
>
>
> On 2016/4/6 6:44, Yury Norov wrote:
>>
>> There are about 20 failing tests of 782 in lite scenario.
>> float_bessel
>> float_exp_log
>> float_iperb
>> float_power
>> float_trigo
>> pipeio_1
>> pipeio_3
>> pipeio_5
>> pipeio_8
>> abort01
>> clone02
>> kill11
>> mmap16
>> open12
>> pause01
>> rename11
>> rmdir02
>> umount2_01
>> umount2_02
>> umount2_03
>> utime06
>> mtest06
>>
>> The list is rough because some tests fail not every time.
>>
>> Tests abort01 and kill11 fail for lp64 too, so maybe there's
>> a reason unrelated to ilp32 itself.
>>
>> float_xxx tests fail because they call unwind() from signal context,
>> and GCC for ilp32 has problem with it, as Andrew told.
>
> Is there any progress on this issue? When we talk about unwind
> functions, do you mean the functions in libgcc?
>
> We encountered another issue(abort not segfault) which also called
> pthread_cancel(). The test code is in the attachment. Here is the
> backtrace:

Yes, this is a known issue that I was aware of.  I have a GCC patch to fix
this.  Basically REG_VALUE_IN_UNWIND_CONTEXT needs to be defined while
building libgcc to support the correct unwind information.
I will be posting a GCC patch to fix this tomorrow.  This was a bug
even in the original set of ilp32 patches.  I only finally was able to
sit down and fix it today.


Thanks,
Andrew

>
> ```
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0xf77ee330 (LWP 2958)]
> 0x0040f5bc in raise (sig=sig@entry=6)
> at ../sysdeps/unix/sysv/linux/raise.c:55
> 55  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x0040f5bc in raise (sig=sig@entry=6)
> at ../sysdeps/unix/sysv/linux/raise.c:55
> #1  0x0040f884 in abort () at abort.c:89
>
> #2  0x004073b4 in uw_update_context_1 (
> context=context@entry=0xf77ec820, fs=fs@entry=0xf77ebec8)
> at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1430
>
> #3  0x004078c0 in uw_update_context
> (context=context@entry=0xf77ec820,
> fs=fs@entry=0xf77ebec8)
>at
> /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1506
> #4  0x00407a9c in uw_advance_context (fs=0xf77ebec8,
> context=0xf77ec820)
> at
> /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1529
> #5  _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0xf77ee580,
> context=context@entry=0xf77ec820)
> at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind.inc:185
> #6  0x00408228 in _Unwind_ForcedUnwind (exc=0xf77ee580,
> stop=stop@entry=0x405440 , stop_argument=0xf77eddd8)
> at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind.inc:207
> #7  0x004055c4 in __pthread_unwind (buf=)
> at unwind.c:126
> #8  0x004050b4 in __do_cancel () at ./pthreadP.h:283
> #9  sigcancel_handler (sig=, si=,
> ctx=) at nptl-init.c:225
> ---Type  to continue, or q  to quit---
> #10 
>
> #11 0x in ?? ()
>
> #12 0x00423084 in __select (nfds=-1, readfds=,
> writefds=, exceptfds=, timeout=0x0)
> at ../sysdeps/unix/sysv/linux/generic/select.c:45
> #13 0x00400604 in TEST_TaskDelay (
> uiMillSecs=)
> at test-cancel.c:18
> #14 0x00400680 in printids (
> s=)
> at test-cancel.c:38
> #15 0x004006d0 in thr_fn (
> arg=)
> at test-cancel.c:49
> #16 0x00401b28 in start_thread (arg=0x4a3000) at
> pthread_create.c:335
> #17 0x00401b28 in start_thread (arg=0x4a3000) at
> pthread_create.c:335
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> ```
>
> The abort is raised by the following code:
> ```
> static void
> uw_update_context_1 (struct _Unwind_Context *context, _Unwind_FrameState
> *fs)
> {
> //...
>   /* Compute this frame's CFA.  */
>   switch (fs->regs.cfa_how)
> {
> case CFA_REG_OFFSET:
>   cfa = _Unwind_GetPtr (&orig_context, fs->regs.cfa_reg);
>   cfa += fs->regs.cfa_offset;
>   break;
>
> case CFA_EXP:
>   {
> const unsigned char *exp = fs->regs.cfa_exp;
> _uleb128_t len;
>
> exp = read_uleb128 (exp, &len);
> cfa = (void *) (_Unwind_Ptr)
>   execute_stack_op (exp, exp + len, &orig_context, 0);
> break;
>   }
>
> default:
>   gcc_unreachable ();
> }
>   context->cfa = cfa;
> //...
> }
> ```
>
> Any suggestion is appreciated.
>
> CC gcc mailing list. Sorry if it is off topic.
>
> Regards
>
> Bamvor
>
>
>
>
>> pipeio_x tests are very unstable and may fail randomly. I strongly
>> suspect race conditions, as they all work like a charm if pinned to
>> single CPU with taskset. Probably, race is the reason of clone02 too.
>> Though I'm not sure, is the race in kernel, glibc or test itself.
>>
>> But I know for sure that pause01 fails due to test design:
>> if (setitimer(ITIMER_REAL, &it, NULL)) /

Re: [PATCH 1/2] clk: imx: do not sleep if IRQ's are still disabled

2016-04-27 Thread Stefan Agner

On 2016-04-27 00:24, Shawn Guo wrote:
> On Wed, Apr 27, 2016 at 10:57:21AM +0800, Dong Aisheng wrote:
>> On Wed, Apr 27, 2016 at 9:58 AM, Shawn Guo  wrote:
>> > On Tue, Apr 26, 2016 at 07:27:03PM +0800, Dong Aisheng wrote:
>> >> Shawn,
>> >> What's your suggestion?
>> >
>> > I think this needs more discussion, and I just dropped Stefan's patch
>> > from my tree.
>> >
>> > We need to firstly understand why this is happening.  The .prepare hook
>> > is defined to be non-atomic context, and so that we call sleep function
>> > in there.  We did everything right.  Why are we getting the warning?  If
>> > I'm correct, this warning only happens on i.MX7D.  Why is that?
>> >
>>
>> Why Stefan's patch works (checking irqs_disabled()) is because during kernel
>> time init, the irq is still not enabled. It fixes the issue indirectly.
>> See:
>> asmlinkage __visible void __init start_kernel(void)
>> {
>> /*
>>  * Set up the scheduler prior starting any interrupts (such as the
>>  * timer interrupt). Full topology setup happens at smp_init()
>>  * time - but meanwhile we still have a functioning scheduler.
>>  */
>> sched_init();
>> .
>> time_init();
>> ..
>> WARN(!irqs_disabled(), "Interrupts were enabled early\n");
>> early_boot_irqs_disabled = false;
>> local_irq_enable();
>> }
>>
>> The issue can only happen when PLL enable causes a schedule during
>> imx_clock_init().
>> Not all PLL has this issue.
>> The issue happens on MX7D pll_audio_main_clk/pll_video_main_clk
>> which requires more delay time and cause usleep.
>> Because clk framework does not support MX7D clock types (operation requires
>> parents on), we simply enable all clocks in imx7d_clocks_init().
>>
>> If apply my this patch series:
>> https://lkml.org/lkml/2016/4/20/199
>> The issue can also be gone.
> 
> Thanks for the info.  It sounds like that we are fixing the problem in
> the wrong place, i.e. clk_pllv3_prepare().  The function does nothing
> wrong, since the .prepare hook is defined to be one that can sleep.  If
> we see sleep warning in a context calling clk_prepare(), that probably
> means we shouldn't make the function call from that context.
> 

According to the stack trace in the answer to Stephen's question, the call
comes from imx7d_clocks_init. I doubt we can avoid enabling those clocks
there...

--
Stefan

Re: [PATCH 1/2] clk: imx: do not sleep if IRQ's are still disabled

2016-04-27 Thread Stefan Agner

On 2016-04-26 19:56, Fabio Estevam wrote:
> On Tue, Apr 26, 2016 at 11:45 PM, Dong Aisheng  wrote:
> 
>>> We need to firstly understand why this is happening.  The .prepare hook
>>> is defined to be non-atomic context, and so that we call sleep function
>>> in there.  We did everything right.  Why are we getting the warning?  If
>>> I'm correct, this warning only happens on i.MX7D.  Why is that?
>>>
>>
>> This is mainly caused by during kernel early booting, there's only one init 
>> idle
>> task running.
>> See:
>> void __init sched_init(void)
>> {
>> .
>> /*
>>  * Make us the idle thread. Technically, schedule() should not be
>>  * called from this thread, however somewhere below it might be,
>>  * but because we are the idle thread, we just pick up running again
>>  * when this runqueue becomes "idle".
>>  */
>> init_idle(current, smp_processor_id());
>> ...
>> }
>>
>> And the idle sched class indicates it's not valid to schedule for idle task.
>> const struct sched_class idle_sched_class = {
>> /* .next is NULL */
>> /* no enqueue/yield_task for idle tasks */
>>
>> /* dequeue is not valid, we print a debug message there: */
>> .dequeue_task   = dequeue_task_idle,
>> ...
>>
>> }
>>
>> /*
>>  * It is not legal to sleep in the idle task - print a warning
>>  * message if some code attempts to do it:
>>  */
>> static void
>> dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
>> {
>> raw_spin_unlock_irq(&rq->lock);
>> printk(KERN_ERR "bad: scheduling from the idle thread!\n");
>> dump_stack();
>> raw_spin_lock_irq(&rq->lock);
>> }
>>
>>
>> Below is the full log of imx7d booting FYI.
> 
> This does not answer Shawn's question: why do we see this only on mx7d?

I was wondering that too. My theory is that on i.MX 6 either the clocks
enable almost instantly, so no sleep happens, or they are already turned
on by the bootloader/firmware...?

--
Stefan

Re: [PATCH 0/6] Intel Secure Guard Extensions

2016-04-27 Thread Pavel Machek

Hi!

> > > Preventing cold boot attacks is really just icing on the cake.  The
> > > real point of this is to allow you to run an "enclave".  An SGX
> > > enclave has unencrypted code but gets access to a key that only it can
> > > access.  It could use that key to unwrap your ssh private key and sign
> > > with it without ever revealing the unwrapped key.  No one, not even
> > > root, can read enclave memory once the enclave is initialized and gets
> > > access to its personalized key.  The point of the memory encryption
> > > engine is to prevent even cold boot attacks from being used to read
> > > enclave memory.
> >
> > Ok, so the attacker can still access the "other" machine, but ok, key
> > is protected.
> >
> > But... that will mean that my ssh will need to be SGX-aware, and that
> > I will not be able to switch to AMD machine in future. ... or to other
> > Intel machine for that matter, right?
> 
> That's the whole point.  You could keep an unwrapped copy of the key
> offline so you could provision another machine if needed.
> 
> >
> > What new syscalls would be needed for ssh to get all this support?
> 
> This patchset or similar, plus some user code and an enclave to use.
> 
> Sadly, on current CPUs, you also need Intel to bless the enclave.  It
> looks like new CPUs might relax that requirement.

Umm. I'm afraid my evil meter just went over "smells evil" and "bit
evil" areas straight to "certainly looks evil".

> > > Replay Protected Memory Block.  It's a device that allows someone to
> > > write to it and confirm that the write happened and the old contents
> > > is no longer available.  You could use it to implement an enclave that
> > > checks a password for your disk but only allows you to try a certain
> > > number of times.
> >
> > Ookay... I guess I can get a fake Replay Protected Memory block, which
> > will confirm that write happened and not do anything from China, but
> > ok, if you put that memory on the CPU, you raise the bar to a "rather
> > difficult" (tm) level. Nice.
> 
> It's not so easy for the RPMB to leak things.  It would be much easier
> for it to simply not provide replay protection (i.e. more or less what
> the FBI asked from Apple: keep allowing guesses even though that
> shouldn't work).

Yup.

> > But that also means that when my CPU dies, I'll no longer be able to
> > access the encrypted data.
> 
> You could implement your own escrow policy and keep a copy in the
> safe.

And then Intel would have to bless my own escrow policy, which is,
realistically, not going to happen, right?

> > And, again, it means that quite complex new kernel-user interface will
> > be needed, right?
> 
> It's actually fairly straightforward, and the kernel part doesn't care
> what you use it for (the kernel part is the same for disk encryption
> and ssh, for example, except that disk encryption would care about
> replay protection, whereas ssh wouldn't).

So we end up with parts of kernel we can not change, and where we may
not even change the compiler. That means assembly. Hey, user, you have
freedom to this code, except it will not work. That was called TiVo
before. We'd have security-relevant parts of kernel where we could not
even fix security holes without Intel.

If anything, this is reason to switch to GPLv3.

I'm sorry. This is evil.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

RE: [PATCH v7 1/2] Documentation: DT: dma: Add Xilinx zynqmp dma device tree binding documentation

2016-04-27 Thread Appana Durga Kedareswara Rao

Hi Lars,

> -----Original Message-----
> From: Lars-Peter Clausen [mailto:l...@metafoo.de]
> Sent: Wednesday, April 27, 2016 12:42 PM
> To: Appana Durga Kedareswara Rao ;
> robh...@kernel.org; pawel.m...@arm.com; mark.rutl...@arm.com;
> ijc+devicet...@hellion.org.uk; ga...@codeaurora.org; Michal Simek
> ; Soren Brinkmann ;
> vinod.k...@intel.com; dan.j.willi...@intel.com; Appana Durga Kedareswara
> Rao ; moritz.fisc...@ettus.com;
> laurent.pinch...@ideasonboard.com; l...@debethencourt.com; Anirudha
> Sarangi ; Punnaiah Choudary Kalluri
> 
> Cc: devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; dmaeng...@vger.kernel.org
> Subject: Re: [PATCH v7 1/2] Documentation: DT: dma: Add Xilinx zynqmp dma
> device tree binding documentation
> 
> On 04/27/2016 09:05 AM, Kedareswara rao Appana wrote:
> [...]
> > +- xlnx,include-sg  : Indicates the controller to operate in simple or
> > + scatter gather dma mode
> > +- xlnx,ratectrl: Scheduling interval in terms of clock cycles for
> > + source AXI transaction
> > +- xlnx,overfetch   : Tells whether the channel is allowed to over
> > + fetch the data
> > +- xlnx,src-issue   : Number of AXI outstanding transactions on source side
> > +- xlnx,src-burst-len   : AXI length for data read. Support only power of
> > + 2 byte values.
> > +- xlnx,dst-burst-len   : AXI length for data write. Support only power of
> 
> These are all software runtime configuration parameters that you'd want to
> change at runtime depending on which peripheral you are targeting with a
> specific DMA transfer. These really do not belong into the devicetree.

Do you mean having a separate config structure in the driver and handling
the above parameters through that structure?

I understand that this will work for slave DMA transfers, but what about
memory-to-memory transfers, where we have no provision for passing these
parameters?

Regards,
Kedar.

Re: [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression

2016-04-27 Thread Michal Hocko

On Wed 27-04-16 11:15:56, kernel test robot wrote:
> FYI, we noticed vm-scalability.throughput -11.8% regression with the following commit:

Could you be more specific about what the test does, please?
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 1/2] clk: imx: do not sleep if IRQ's are still disabled

2016-04-27 Thread Stefan Agner

On 2016-04-26 19:57, Dong Aisheng wrote:
> On Wed, Apr 27, 2016 at 9:58 AM, Shawn Guo  wrote:
>> On Tue, Apr 26, 2016 at 07:27:03PM +0800, Dong Aisheng wrote:
>>> Shawn,
>>> What's your suggestion?
>>
>> I think this needs more discussion, and I just dropped Stefan's patch
>> from my tree.
>>
>> We need to firstly understand why this is happening.  The .prepare hook
>> is defined to be non-atomic context, and so that we call sleep function
>> in there.  We did everything right.  Why are we getting the warning?  If
>> I'm correct, this warning only happens on i.MX7D.  Why is that?
>>
> 
> Why Stefan's patch works (checking irqs_disabled()) is because during kernel
> time init, the irq is still not enabled. It fixes the issue indirectly.
> See:
> asmlinkage __visible void __init start_kernel(void)
> {
> /*
>  * Set up the scheduler prior starting any interrupts (such as the
>  * timer interrupt). Full topology setup happens at smp_init()
>  * time - but meanwhile we still have a functioning scheduler.
>  */
> sched_init();
> .
> time_init();
> ..
> WARN(!irqs_disabled(), "Interrupts were enabled early\n");
> early_boot_irqs_disabled = false;
> local_irq_enable();
> }
> 
> The issue can only happen when PLL enable causes a schedule during
> imx_clock_init().
> Not all PLL has this issue.
> The issue happens on MX7D pll_audio_main_clk/pll_video_main_clk
> which requires more delay time and cause usleep.
> Because clk framework does not support MX7D clock types (operation requires
> parents on), we simply enable all clocks in imx7d_clocks_init().
> 
> If apply my this patch series:
> https://lkml.org/lkml/2016/4/20/199
> The issue can also be gone.

Oh ok, it does make sense to enable as few clocks as possible.

However, even if we do not enable lots of clocks at that time, and this
helps to avoid the problem for now, it could still be that
someone/something requests a clock early during boot, leading to a PLL
enable... Shouldn't we make sure that those base clocks can be enabled
even during early boot..?

--
Stefan

Re: zram: per-cpu compression streams

2016-04-27 Thread Sergey Senozhatsky

Hello,

On (04/27/16 16:29), Minchan Kim wrote:
[..]
> > the test:
> > 
> > -- 4 GB x86_64 box
> > -- zram 3GB, lzo
> > -- mem-hogger pre-faults 3GB of pages before the fio test
> > -- fio test has been modified to have 11% compression ratio (to increase the
> >   chances of re-compressions)
> 
> Could you test concurrent mem hogger with fio rather than pre-fault before fio test
> in next submit?

this test will not prove anything, unfortunately. I performed it;
and it's impossible to guarantee even remotely stable results.
the mem-hogger process can spend anywhere from 41 to 81 seconds on
pre-faulting; so I'm quite sceptical about the actual value of this test.

> > considering buffer_compress_percentage=11, the box was under somewhat
> > heavy pressure.
> > 
> > now, the results
> 
> Yeb, Even, recompression case is fater than old but want to see more heavy memory
> pressure case and the ratio I mentioned above.

I did quite heavy testing over the last 7 days, with numerous OOM kills
and OOM panics.

-ss

[PATCH v4 05/12] zsmalloc: use bit_spin_lock

2016-04-27 Thread Minchan Kim

Use the kernel's standard bit spin-lock instead of the custom one. The
custom version even has a bug: it doesn't disable preemption. We have not
hit any problem only because it has always been used inside a
preemption-disabled section, under the class->lock spinlock. So there is
no need to send this to stable.
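
For readers unfamiliar with the technique, the following is a minimal
user-space sketch in C11 of a lock kept in a single bit of a word, which is
the idea behind the pin bit in the handle. The names bit_trylock(),
bit_lock(), bit_unlock() and PIN_BIT are hypothetical stand-ins, not the
kernel helpers; the kernel's bit_spin_lock() additionally disables
preemption, which user space cannot model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PIN_BIT 0	/* hypothetical stand-in for HANDLE_PIN_BIT */

/* Try once to take the lock bit; returns true on success. */
static bool bit_trylock(int bit, atomic_ulong *word)
{
	unsigned long mask = 1UL << bit;

	/* fetch_or returns the previous value: bit already set means held. */
	return !(atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask);
}

/* Spin until the lock bit is ours. */
static void bit_lock(int bit, atomic_ulong *word)
{
	while (!bit_trylock(bit, word))
		;	/* the kernel helper also disables preemption first */
}

/* Release the lock bit; the other bits of the word are left untouched. */
static void bit_unlock(int bit, atomic_ulong *word)
{
	unsigned long mask = 1UL << bit;

	assert(atomic_load(word) & mask);	/* must currently be held */
	atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}
```

The remaining bits of the word stay available for data, which is why a
handle can carry its own lock.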

Cc: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8649d0243e6c..d75d9e4b6a4d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -864,21 +864,17 @@ static unsigned long obj_idx_to_offset(struct page *page,
 
 static inline int trypin_tag(unsigned long handle)
 {
-   unsigned long *ptr = (unsigned long *)handle;
-
-   return !test_and_set_bit_lock(HANDLE_PIN_BIT, ptr);
+   return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
 }
 
 static void pin_tag(unsigned long handle)
 {
-   while (!trypin_tag(handle));
+   bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
 }
 
 static void unpin_tag(unsigned long handle)
 {
-   unsigned long *ptr = (unsigned long *)handle;
-
-   clear_bit_unlock(HANDLE_PIN_BIT, ptr);
+   bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
 }
 
 static void reset_page(struct page *page)
-- 
1.9.1

[PATCH v4 01/12] mm: use put_page to free page instead of putback_lru_page

2016-04-27 Thread Minchan Kim

Procedure of page migration is as follows:

First of all, it should isolate a page from LRU and try to
migrate the page. If it is successful, it releases the page
for freeing. Otherwise, it should put the page back to LRU
list.

For LRU pages, we have used putback_lru_page for both freeing
and putback to LRU list. It's okay because put_page is aware of
LRU list so if it releases last refcount of the page, it removes
the page from LRU list. However, it performs unnecessary operations
(e.g., lru_cache_add, pagevec and flags operations; the cost may not be
significant, but the work is pointless) and makes it harder to support
new non-lru page migration because put_page isn't aware of non-lru
page's data structure.

To solve the problem, we can add new hook in put_page with
PageMovable flags check but it can increase overhead in
hot path and needs new locking scheme to stabilize the flag check
with put_page.

So, this patch cleans it up by separating the two semantics (i.e., put and putback).
If migration is successful, use put_page instead of putback_lru_page and
use putback_lru_page only on failure. That makes code more readable
and doesn't add overhead in put_page.

Comment from Vlastimil
"Yeah, and compaction (perhaps also other migration users) has to drain
the lru pvec... Getting rid of this stuff is worth even by itself."

Cc: Rik van Riel 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Naoya Horiguchi 
Acked-by: Vlastimil Babka 
Signed-off-by: Minchan Kim 
---
 mm/migrate.c | 64 +---
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index f8587b974cba..7880f30d1d3d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -933,6 +933,19 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
put_anon_vma(anon_vma);
unlock_page(page);
 out:
+   /*
+* If migration is successful, decrease refcount of the newpage
+* which will not free the page because new page owner increased
+* refcounter. As well, if it is LRU page, add the page to LRU
+* list in here.
+*/
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   if (unlikely(__is_movable_balloon_page(newpage)))
+   put_page(newpage);
+   else
+   putback_lru_page(newpage);
+   }
+
return rc;
 }
 
@@ -971,6 +984,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 
if (page_count(page) == 1) {
/* page was freed from under us. So we are done. */
+   ClearPageActive(page);
+   ClearPageUnevictable(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
goto out;
}
 
@@ -983,10 +1002,8 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
}
 
rc = __unmap_and_move(page, newpage, force, mode);
-   if (rc == MIGRATEPAGE_SUCCESS) {
-   put_new_page = NULL;
+   if (rc == MIGRATEPAGE_SUCCESS)
set_page_owner_migrate_reason(newpage, reason);
-   }
 
 out:
if (rc != -EAGAIN) {
@@ -999,34 +1016,33 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
-   /* Soft-offlined page shouldn't go through lru cache list */
-   if (reason == MR_MEMORY_FAILURE && rc == MIGRATEPAGE_SUCCESS) {
+   }
+
+   /*
+* If migration is successful, releases reference grabbed during
+* isolation. Otherwise, restore the page to right list unless
+* we want to retry.
+*/
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   put_page(page);
+   if (reason == MR_MEMORY_FAILURE) {
/*
-* With this release, we free successfully migrated
-* page and set PG_HWPoison on just freed page
-* intentionally. Although it's rather weird, it's how
-* HWPoison flag works at the moment.
+* Set PG_HWPoison on just freed page
+* intentionally. Although it's rather weird,
+* it's how HWPoison flag works at the moment.
 */
-   put_page(page);
if (!test_set_page_hwpoison(page))
num_poisoned_pages_inc();
-   } else
+   }
+   } else {
+   if (rc != -EAGAIN)
putback_lru_page(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
}
 
-   /*
-

[PATCH v4 02/12] mm: migrate: support non-lru movable page migration

2016-04-27 Thread Minchan Kim

We have allowed migration for only LRU pages until now and it was
enough to make high-order pages. But recently, embedded systems (e.g.,
webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
so we have seen several reports about trouble with small high-order
allocations. To fix the problem, there have been several efforts
(e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages,
reserved memory, vmalloc and so on), but if there are lots of
non-movable pages in the system, those solutions are void in the long run.

So, this patch adds a facility to turn non-movable pages into movable
ones. For the feature, this patch introduces functions related
to migration to address_space_operations as well as some page flags.

If a driver wants to make its own pages movable, it should define three functions
which are function pointers of struct address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What VM expects from the driver's isolate_page function is to return *true*
if the driver isolates the page successfully. On returning true, VM marks the page
as PG_isolated so concurrent isolation on several CPUs skips the page
for isolation. If a driver cannot isolate the page, it should return *false*.

Once a page is successfully isolated, VM uses the page.lru fields, so the driver
shouldn't expect the values in those fields to be preserved.

2. int (*migratepage) (struct address_space *mapping,
struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, VM calls the driver's migratepage with the isolated page.
The job of migratepage is to move the content of the old page to the new page
and set up the fields of struct page newpage. Keep in mind that you should
clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
migrated the oldpage successfully and return MIGRATEPAGE_SUCCESS.
If the driver cannot migrate the page at the moment, it can return -EAGAIN.
On -EAGAIN, VM will retry page migration after a short time because VM interprets
-EAGAIN as "temporary migration failure". On any error other than -EAGAIN,
VM will give up on migrating the page this time without retrying.
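
The return-code contract above can be illustrated with a small user-space
sketch in C. Everything except the meaning of -EAGAIN and
MIGRATEPAGE_SUCCESS is an assumption made for illustration: migrate_fn,
migrate_with_retry(), the two example callbacks and the MAX_RETRIES budget
are hypothetical and are not the code in mm/migrate.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MIGRATEPAGE_SUCCESS 0	/* success code, as in the kernel */
#define MAX_RETRIES 10		/* hypothetical retry budget */

/* Hypothetical stand-in for a driver's migratepage callback. */
typedef int (*migrate_fn)(void *ctx);

/*
 * Call fn until it returns something other than -EAGAIN or the retry
 * budget runs out. Any other error ends the attempt immediately.
 */
static int migrate_with_retry(migrate_fn fn, void *ctx, int *attempts)
{
	int rc = -EAGAIN;
	int tries = 0;

	assert(fn && attempts);
	while (tries < MAX_RETRIES) {
		tries++;
		rc = fn(ctx);
		if (rc != -EAGAIN)
			break;	/* success or a hard failure: stop */
	}
	*attempts = tries;
	return rc;
}

/* Example callback: temporarily busy for the first *ctx calls. */
static int busy_then_ok(void *ctx)
{
	int *busy = ctx;

	if (*busy > 0) {
		(*busy)--;
		return -EAGAIN;
	}
	return MIGRATEPAGE_SUCCESS;
}

/* Example callback: a hard failure that must not be retried. */
static int always_fail(void *ctx)
{
	(void)ctx;
	return -EBUSY;
}
```

With busy_then_ok the caller retries until the page becomes migratable;
with always_fail it gives up after the first call.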

The driver shouldn't touch the page.lru field, which VM is using, in these functions.

3. void (*putback_page)(struct page *);

If migration fails on an isolated page, VM should return the isolated page
to the driver, so VM calls the driver's putback_page with the page that
failed to migrate. In this function, the driver should put the isolated
page back into its own data structure.

4. non-lru movable page flags

There are two page flags for supporting non-lru movable page.

* PG_movable

Driver should use the below function to make page movable under page_lock.

void __SetPageMovable(struct page *page, struct address_space *mapping)

It needs an address_space argument to register the family of migration
functions that will be called by the VM. Strictly speaking, PG_movable is not
a real flag of struct page. Rather, the VM reuses the lower bits of
page->mapping to represent it.

#define PAGE_MAPPING_MOVABLE 0x2
page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so the driver shouldn't access page->mapping directly. Instead, the driver
should use page_mapping, which masks off the low two bits of page->mapping so
that it gets the right struct address_space.

For testing for a non-lru movable page, the VM supports the __PageMovable
function. However, it is not guaranteed to identify a non-lru movable page,
because the page->mapping field is unified with other variables in struct
page. Also, if the driver releases the page after isolation by the VM,
page->mapping doesn't have a stable value even though it has
PAGE_MAPPING_MOVABLE set (look at __ClearPageMovable). But __PageMovable is a
cheap way to tell whether a page is LRU or non-lru movable once the page has
been isolated, because LRU pages can never have PAGE_MAPPING_MOVABLE in
page->mapping. It is also good for just peeking to test for non-lru movable
pages before the more expensive check with lock_page during pfn scanning to
select a victim.

To reliably identify a non-lru movable page, the VM provides the PageMovable
function. Unlike __PageMovable, PageMovable validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents
page->mapping from being destroyed suddenly.

A driver using __SetPageMovable should clear the flag via __ClearPageMovable
under page_lock before releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, the VM marks an isolated
page as PG_isolated under lock_page. So if a CPU encounters a PG_isolated
non-lru movable page, it can skip it. The driver doesn't need to manipulate
the flag because the VM will set/clear it automatically. Keep in mind that if
the driver sees a PG_isolated page, it means the page has been isolated by the
VM, so it shouldn't touch the page.lru field.
PG_isolated is an alias of the PG_reclaim flag, so the driver shouldn't use
that flag for its own purposes.
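
The PG_movable encoding described above can be modelled in user space. The
following is only a sketch under stated assumptions: PAGE_MAPPING_MOVABLE is
the value given in the text, while PAGE_MAPPING_FLAGS, the struct layouts and
all model_* names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the text; everything else is a hypothetical model. */
#define PAGE_MAPPING_MOVABLE 0x2UL
/* Assumption: the low two bits of the pointer are reserved for flags. */
#define PAGE_MAPPING_FLAGS   0x3UL

struct address_space { int dummy; };	/* stand-in for the kernel structure */
struct page { uintptr_t mapping; };	/* only the field this sketch needs */

/* Tag the page as movable by storing the flag in the pointer's low bits. */
void model_set_page_movable(struct page *page, struct address_space *mapping)
{
	/* The pointer must be aligned so that the flag bits are free. */
	assert(((uintptr_t)mapping & PAGE_MAPPING_FLAGS) == 0);
	page->mapping = (uintptr_t)mapping | PAGE_MAPPING_MOVABLE;
}

/* Cheap, lockless test in the spirit of __PageMovable. */
bool model_page_movable(const struct page *page)
{
	return (page->mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_MOVABLE;
}

/* Recover the real pointer by masking off the flag bits. */
struct address_space *model_page_mapping(const struct page *page)
{
	return (struct address_space *)(page->mapping & ~PAGE_MAPPING_FLAGS);
}
```

The sketch shows why a driver that reads page->mapping directly would get a
pointer that is off by the flag bits, and why a masking helper is needed.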

Cc: Rik van Riel 
Cc: Vlastimil Babka 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Rafael Aquini 
Cc: virtualiz

Re: [PATCH] i2c: designware: do not disable adapter after transfer

2016-04-27 Thread Jarkko Nikula


On 04/25/2016 06:04 PM, Lucas De Marchi wrote:



On 04/25/2016 08:51 AM, Jarkko Nikula wrote:

[ ... ]


@@ -413,8 +416,16 @@ static void i2c_dw_xfer_init(struct dw_i2c_dev
*dev)
  struct i2c_msg *msgs = dev->msgs;
  u32 ic_con, ic_tar = 0;

-/* Disable the adapter */
-__i2c_dw_enable(dev, false);
+if (dev->enabled) {
+u32 ic_status;
+
+/* check ic_tar and ic_con can be dynamically updated */
+ic_status = dw_readl(dev, DW_IC_STATUS);
+if (ic_status & DW_IC_STATUS_ACTIVITY
+|| !(ic_status & DW_IC_STATUS_TX_EMPTY)) {
+__i2c_dw_enable(dev, false);
+}
+}


Worth double-checking this. I see bit 1 means TX FIFO not full and bit 2
is TX FIFO completely empty.


the conditions to be able to update IC_TAR dynamically are:

   - Adapter isn't doing any TX/RX operation (IC_STATUS[5] == 0) and
   - There are no entries in TX FIFO (IC_STATUS[2] == 1)

So... yeah, the condition above seems wrong. I should be reading bit 5,
not bit 1. Thanks! However:

It reads above, bit 2 instead of 1 for TX FIFO checking and then either 
bit 5 or 0 for activity checking.


I'd say it's probably better to check bit 5 instead of bit 0, even though
bit 0 is or'ed from bits 5 and 6. I don't know how possible slave support and
a slave being active will play out here, so it's best to follow the spec.
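
For reference, the two conditions listed above could be written as a single
predicate. This is only a sketch: the macro and function names are made up
for illustration, and the bit meanings are taken from this thread (bit 5 is
assumed to be the master activity bit), not from the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as quoted in this thread; the names are hypothetical. */
#define IC_STATUS_ACTIVITY	(1u << 0)	/* bit 0: OR of bits 5 and 6 */
#define IC_STATUS_TX_FIFO_EMPTY	(1u << 2)	/* bit 2: TX FIFO completely empty */
#define IC_STATUS_MST_ACTIVITY	(1u << 5)	/* bit 5: assumed master activity */

/* True if IC_TAR may be updated without disabling the adapter first. */
bool ic_tar_can_be_updated(uint32_t ic_status)
{
	return !(ic_status & IC_STATUS_MST_ACTIVITY) &&
	       (ic_status & IC_STATUS_TX_FIFO_EMPTY);
}

/* Example: an idle adapter with an empty TX FIFO can be retargeted. */
void ic_status_example(void)
{
	assert(ic_tar_can_be_updated(IC_STATUS_TX_FIFO_EMPTY));
}
```

With such a predicate the adapter would only need to be disabled when it
returns false.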


--
jarkko

[PATCH v4 09/12] zsmalloc: separate free_zspage from putback_zspage

2016-04-27 Thread Minchan Kim

Currently, putback_zspage frees the zspage under class->lock if its
fullness becomes ZS_EMPTY, but that makes it hard to implement the
locking scheme for the new zspage migration.
So this patch separates free_zspage from putback_zspage and frees the
zspage out of class->lock, in preparation for zspage migration.

Cc: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 4b5ead85c7e7..9522bca068de 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1684,14 +1684,12 @@ static struct zspage *isolate_zspage(struct size_class 
*class, bool source)
 
 /*
  * putback_zspage - add @zspage into right class's fullness list
- * @pool: target pool
  * @class: destination class
  * @zspage: target page
  *
  * Return @zspage's fullness_group
  */
-static enum fullness_group putback_zspage(struct zs_pool *pool,
-   struct size_class *class,
+static enum fullness_group putback_zspage(struct size_class *class,
struct zspage *zspage)
 {
enum fullness_group fullness;
@@ -1700,15 +1698,6 @@ static enum fullness_group putback_zspage(struct zs_pool 
*pool,
insert_zspage(class, zspage, fullness);
set_zspage_mapping(zspage, class->index, fullness);
 
-   if (fullness == ZS_EMPTY) {
-   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-   class->size, class->pages_per_zspage));
-   atomic_long_sub(class->pages_per_zspage,
-   &pool->pages_allocated);
-
-   free_zspage(pool, zspage);
-   }
-
return fullness;
 }
 
@@ -1754,23 +1743,29 @@ static void __zs_compact(struct zs_pool *pool, struct 
size_class *class)
if (!migrate_zspage(pool, class, &cc))
break;
 
-   putback_zspage(pool, class, dst_zspage);
+   putback_zspage(class, dst_zspage);
}
 
/* Stop if we couldn't find slot */
if (dst_zspage == NULL)
break;
 
-   putback_zspage(pool, class, dst_zspage);
-   if (putback_zspage(pool, class, src_zspage) == ZS_EMPTY)
+   putback_zspage(class, dst_zspage);
+   if (putback_zspage(class, src_zspage) == ZS_EMPTY) {
+   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+   class->size, class->pages_per_zspage));
+   atomic_long_sub(class->pages_per_zspage,
+   &pool->pages_allocated);
+   free_zspage(pool, src_zspage);
pool->stats.pages_compacted += class->pages_per_zspage;
+   }
spin_unlock(&class->lock);
cond_resched();
spin_lock(&class->lock);
}
 
if (src_zspage)
-   putback_zspage(pool, class, src_zspage);
+   putback_zspage(class, src_zspage);
 
spin_unlock(&class->lock);
 }
-- 
1.9.1

[PATCH v4 12/12] zram: use __GFP_MOVABLE for memory allocation

2016-04-27 Thread Minchan Kim

Zsmalloc is ready for page migration so zram can use __GFP_MOVABLE
from now on.

I ran a test to see how much it helps to create higher-order pages.
The test scenario is as follows.

KVM guest, 1G memory, ext4-formatted zram block device,

for i in `seq 1 8`;
do
dd if=/dev/vda1 of=mnt/test$i.txt bs=128M count=1 &
done

wait `pidof dd`

for i in `seq 1 2 8`;
do
rm -rf mnt/test$i.txt
done
fstrim -v mnt

echo "init"
cat /proc/buddyinfo

echo "compaction"
echo 1 > /proc/sys/vm/compact_memory
cat /proc/buddyinfo

old:

init
Node 0, zone      DMA    208    120     51     41     11      0      0      0      0      0      0
Node 0, zone    DMA32  16380  13777   9184   3805    789     54      3      0      0      0      0
compaction
Node 0, zone      DMA    132     82     40     39     16      2      1      0      0      0      0
Node 0, zone    DMA32   5219   5526   4969   3455   1831    677    139     15      0      0      0

new:

init
Node 0, zone      DMA    379    115     97     19      2      0      0      0      0      0      0
Node 0, zone    DMA32  18891  16774  10862   3947    637     21      0      0      0      0      0
compaction  1
Node 0, zone      DMA    214     66     87     29     10      3      0      0      0      0      0
Node 0, zone    DMA32   1612   3139   3154   2469   1745    990    384     94      7      0      0

As you can see, compaction made so many high-order pages. Yay!

Reviewed-by: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 drivers/block/zram/zram_drv.c | 3 ++-
 mm/zsmalloc.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 370c2f76016d..10f6ff1cf6a0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -514,7 +514,8 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, 
u64 disksize)
goto out_error;
}
 
-   meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO | __GFP_HIGHMEM);
+   meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO|__GFP_HIGHMEM
+   |__GFP_MOVABLE);
if (!meta->mem_pool) {
pr_err("Error creating memory pool\n");
goto out_error;
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 042793015ecf..d4264c916f86 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -315,7 +315,7 @@ static void destroy_cache(struct zs_pool *pool)
 static unsigned long cache_alloc_handle(struct zs_pool *pool)
 {
return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-   pool->flags & ~__GFP_HIGHMEM);
+   pool->flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
 }
 
 static void cache_free_handle(struct zs_pool *pool, unsigned long handle)
-- 
1.9.1

[PATCH v4 07/12] zsmalloc: factor page chain functionality out

2016-04-27 Thread Minchan Kim

For page migration, we need to create the page chain of a zspage dynamically,
so this patch factors that out of alloc_zspage.

Cc: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 59 +++
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8c22b0ca1df7..b08ac1ae1743 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -956,7 +956,8 @@ static void init_zspage(struct size_class *class, struct 
page *first_page)
unsigned long off = 0;
struct page *page = first_page;
 
-   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   first_page->freelist = NULL;
+   set_zspage_inuse(first_page, 0);
 
while (page) {
struct page *next_page;
@@ -992,15 +993,16 @@ static void init_zspage(struct size_class *class, struct 
page *first_page)
page = next_page;
off %= PAGE_SIZE;
}
+
+   set_freeobj(first_page, (unsigned long)location_to_obj(first_page, 0));
 }
 
-/*
- * Allocate a zspage for the given size class
- */
-static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
+static void create_page_chain(struct page *pages[], int nr_pages)
 {
-   int i, error;
-   struct page *first_page = NULL, *uninitialized_var(prev_page);
+   int i;
+   struct page *page;
+   struct page *prev_page = NULL;
+   struct page *first_page = NULL;
 
/*
 * Allocate individual pages and link them together as:
@@ -1013,20 +1015,14 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
 * (i.e. no other sub-page has this flag set) and PG_private_2 to
 * identify the last page.
 */
-   error = -ENOMEM;
-   for (i = 0; i < class->pages_per_zspage; i++) {
-   struct page *page;
-
-   page = alloc_page(flags);
-   if (!page)
-   goto cleanup;
+   for (i = 0; i < nr_pages; i++) {
+   page = pages[i];
 
INIT_LIST_HEAD(&page->lru);
-   if (i == 0) {   /* first page */
+   if (i == 0) {
SetPagePrivate(page);
set_page_private(page, 0);
first_page = page;
-   set_zspage_inuse(first_page, 0);
}
if (i == 1)
set_page_private(first_page, (unsigned long)page);
@@ -1034,22 +1030,37 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
set_page_private(page, (unsigned long)first_page);
if (i >= 2)
list_add(&page->lru, &prev_page->lru);
-   if (i == class->pages_per_zspage - 1)   /* last page */
+   if (i == nr_pages - 1)
SetPagePrivate2(page);
prev_page = page;
}
+}
 
-   init_zspage(class, first_page);
+/*
+ * Allocate a zspage for the given size class
+ */
+static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
+{
+   int i;
+   struct page *first_page = NULL;
+   struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
 
-   set_freeobj(first_page, (unsigned long)location_to_obj(first_page, 0));
-   error = 0; /* Success */
+   for (i = 0; i < class->pages_per_zspage; i++) {
+   struct page *page;
 
-cleanup:
-   if (unlikely(error) && first_page) {
-   free_zspage(first_page);
-   first_page = NULL;
+   page = alloc_page(flags);
+   if (!page) {
+   while (--i >= 0)
+   __free_page(pages[i]);
+   return NULL;
+   }
+   pages[i] = page;
}
 
+   create_page_chain(pages, class->pages_per_zspage);
+   first_page = pages[0];
+   init_zspage(class, first_page);
+
return first_page;
 }
 
-- 
1.9.1

Re: [PATCH v2 1/6] rtc: m68k: provide rtc_class_ops directly

2016-04-27 Thread Geert Uytterhoeven

Hi Arnd,

On Tue, Apr 26, 2016 at 11:52 PM, Arnd Bergmann  wrote:
> The rtc-generic driver provides an architecture specific
> wrapper on top of the generic rtc_class_ops abstraction,
> and m68k has another abstraction on top, which is a bit
> silly.
>
> This changes the m68k rtc-generic device to provide its
> rtc_class_ops directly, to reduce the number of layers
> by one.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/m68k/kernel/time.c | 24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/m68k/kernel/time.c b/arch/m68k/kernel/time.c
> index 3857737e3958..fe35890feede 100644
> --- a/arch/m68k/kernel/time.c
> +++ b/arch/m68k/kernel/time.c
> @@ -86,7 +86,24 @@ void read_persistent_clock(struct timespec *ts)
> }
>  }
>
> -#ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
> +#if defined(CONFIG_ARCH_USES_GETTIMEOFFSET) && 
> defined(CONFIG_RTC_DRV_GENERIC)

s/defined/IS_ENABLED/ for the modular case.
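
To spell out the modular case: when an option is set to =m, Kconfig defines
CONFIG_<OPTION>_MODULE rather than CONFIG_<OPTION>, so a plain defined() test
on the option evaluates to false. The sketch below uses a simplified stand-in
for IS_ENABLED() (the real macro lives in the kernel headers and is not
reproduced here); the SEEN_BY_* names are hypothetical.

```c
/* Model of the driver being configured as a module (=m). */
#define CONFIG_RTC_DRV_GENERIC_MODULE 1

/* What a plain defined(CONFIG_RTC_DRV_GENERIC) test sees. */
#if defined(CONFIG_RTC_DRV_GENERIC)
#define SEEN_BY_DEFINED 1
#else
#define SEEN_BY_DEFINED 0
#endif

/* Simplified stand-in for IS_ENABLED(): true for built-in or modular. */
#if defined(CONFIG_RTC_DRV_GENERIC) || defined(CONFIG_RTC_DRV_GENERIC_MODULE)
#define SEEN_BY_IS_ENABLED 1
#else
#define SEEN_BY_IS_ENABLED 0
#endif

/* With =m, only the IS_ENABLED-style test notices the driver. */
_Static_assert(SEEN_BY_DEFINED == 0, "defined() misses the modular case");
_Static_assert(SEEN_BY_IS_ENABLED == 1, "IS_ENABLED-style test covers =m");
```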

> @@ -95,7 +112,10 @@ static int __init rtc_init(void)
> if (!mach_hwclk)
> return -ENODEV;
>
> -   pdev = platform_device_register_simple("rtc-generic", -1, NULL, 0);
> +   /* or just call devm_rtc_device_register instead? */

I guess this comment is a bogus leftover? There's no "dev" parameter to
pass to devm_rtc_device_register() here.

> +   pdev = platform_device_register_data(NULL, "rtc-generic", -1,
> +&generic_rtc_ops,
> +sizeof(generic_rtc_ops));
> return PTR_ERR_OR_ZERO(pdev);
>  }

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[PATCH v4 11/12] zsmalloc: page migration support

2016-04-27 Thread Minchan Kim

This patch introduces run-time migration feature for zspage.

For migration, the VM uses the page.lru field, so it would be better not to
use the page.next field for our own purposes. To that end, first, we can get
the first object offset of the page via runtime calculation instead of
page->index, so we can use page->index as the link for page chaining.
In the case of a huge object, it stores the handle rather than the page chain.
To identify a huge object, we use the PG_owner_priv_1 flag.

For migration, it supports three functions

* zs_page_isolate

It isolates the zspage that contains the subpage the VM wants to migrate from
its class, so that nobody can allocate a new object from the zspage, if this
is the first isolation of any subpage of the zspage. Thus, further isolation
of other subpages cannot isolate the zspage from the class list again.

* zs_page_migrate

First of all, it takes the write side of zspage->lock to prevent other
subpages in the zspage from being migrated. Then it locks all objects in the
page the VM wants to migrate. The reason we must lock all objects in the page
is the race between zs_map_object and zs_page_migrate.

zs_map_object                           zs_page_migrate

pin_tag(handle)
obj = handle_to_obj(handle)
obj_to_location(obj, &page, &obj_idx);

                                        write_lock(&zspage->lock)
                                        if (!trypin_tag(handle))
                                                goto unpin_object

zspage = get_zspage(page);
read_lock(&zspage->lock);

If zs_page_migrate doesn't do trypin_tag, the page used by zs_map_object can
be stale, leading to a crash.

If it locks all of the objects successfully, it copies the content from the
old page to the new one and finally creates a new page chain with the new page.
If it is the last isolated page in the zspage, it puts the zspage back into
the class.

* zs_page_putback

It returns the isolated zspage to the right fullness_group list if it fails to
migrate a page.

Lastly, this patch introduces asynchronous zspage free. We need it because
page_lock is required to clear PG_movable, but unfortunately the zs_free path
must be atomic, so the approach is to try to grab page_lock with preemption
disabled. If it gets page_lock on all of the pages successfully, it can free
the zspage in that context. Otherwise, it queues the free request and frees
the zspage via a workqueue in process context.
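
The try-then-defer scheme in the last paragraph can be sketched in user space
as follows. This is a simplified model, not the patch's code: page locks are
plain booleans, the deferred path only sets a flag instead of queueing work,
and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 4

struct model_page { bool locked; };

struct model_zspage {
	struct model_page *pages[MODEL_MAX_PAGES];
	size_t nr_pages;
	bool freed;	/* freed directly in the caller's context */
	bool deferred;	/* handed to the (hypothetical) worker instead */
};

static bool model_trylock_page(struct model_page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	return true;
}

static void model_unlock_page(struct model_page *page)
{
	page->locked = false;
}

/* Returns true if the zspage was freed immediately, false if deferred. */
bool model_free_zspage(struct model_zspage *zspage)
{
	size_t i;

	assert(zspage->nr_pages <= MODEL_MAX_PAGES);

	/* Never sleep: only try to take each page lock. */
	for (i = 0; i < zspage->nr_pages; i++) {
		if (!model_trylock_page(zspage->pages[i]))
			break;
	}

	if (i < zspage->nr_pages) {
		/* Could not lock every page: drop what we took and defer. */
		while (i-- > 0)
			model_unlock_page(zspage->pages[i]);
		zspage->deferred = true;
		return false;
	}

	/* All locks held: movable state could be cleared here, then free. */
	for (i = 0; i < zspage->nr_pages; i++)
		model_unlock_page(zspage->pages[i]);
	zspage->freed = true;
	return true;
}
```

The point of the design is that the atomic path never blocks on a page lock;
contention only costs a deferral to process context.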

Cc: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 include/uapi/linux/magic.h |   1 +
 mm/zsmalloc.c  | 552 +++--
 2 files changed, 487 insertions(+), 66 deletions(-)

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index e1fbe72c39c0..93b1affe4801 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -79,5 +79,6 @@
 #define NSFS_MAGIC 0x6e736673
 #define BPF_FS_MAGIC   0xcafe4a11
 #define BALLOON_KVM_MAGIC  0x13661366
+#define ZSMALLOC_MAGIC 0x58295829
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8d82e44c4644..042793015ecf 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -17,15 +17,14 @@
  *
  * Usage of struct page fields:
  * page->private: points to zspage
- * page->index: offset of the first object starting in this page.
- * For the first page, this is always 0, so we use this field
- * to store handle for huge object.
- * page->next: links together all component pages of a zspage
+ * page->freelist: links together all component pages of a zspage
+ * For the huge page, this is always 0, so we use this field
+ * to store handle.
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
  * PG_private2: identifies the last component page
- *
+ * PG_owner_priv_1: indentifies the huge component page
  */
 
 #include 
@@ -47,6 +46,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+
+#define ZSPAGE_MAGIC   0x58
 
 /*
  * This must be power of 2 and greater than of equal to sizeof(link_free).
@@ -128,8 +131,33 @@
  *  ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
  *  (reason above)
  */
+
+/*
+ * A zspage's class index and fullness group
+ * are encoded in its (first)page->mapping
+ */
+#define FULLNESS_BITS  2
+#define CLASS_BITS 8
+#define ISOLATED_BITS  3
+#define MAGIC_VAL_BITS 8
+
+
 #define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> CLASS_BITS)
 
+struct zspage {
+   struct {
+   unsigned int fullness:FULLNESS_BITS;
+   unsigned int class:CLASS_BITS;
+   unsigned int isolated:ISOLATED_BITS;
+   unsigned int magic:MAGIC_VAL_BITS;
+   };
+   unsigned int inuse;
+   unsigned int freeobj;
+   struct page *first_page;
+   struct list_head list; /* fullness list */
+   rwlock_t lock;
+};
+
 /*
  * We do not maintain any list for completely empty or full pages
  */
@@ -161,6 +189,8 @@ struct zs_size_stat {
 static struct dentry *zs_stat_root;
 #endif
 
+static struct vfsmount *zsmalloc_mnt;
+
 /*
  * number of size_cl

[PATCH v4 10/12] zsmalloc: use freeobj for index

2016-04-27 Thread Minchan Kim

Zsmalloc stores the first free object's <PFN, obj_idx> position in
freeobj in each zspage. If we replace that position with an index counted
from first_page, page migration becomes simpler because we don't need to
fix up the other entries of the linked list when a page is migrated out.
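
As an illustration of why an index-based free list needs no fix-up, here is a
small user-space model. It is a sketch only: the slot layout, the end marker
and all model_* names are hypothetical, and "migration" is reduced to copying
the bytes to a different location.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NR_OBJS	4
#define MODEL_LIST_END	(-1)	/* hypothetical end-of-list marker */

/* A free slot stores the index of the next free slot, not a pointer. */
struct model_slot { int next; };

struct model_zspage {
	struct model_slot slots[MODEL_NR_OBJS];
	int freeobj;	/* index of the first free object */
};

void model_init(struct model_zspage *z)
{
	int i;

	for (i = 0; i < MODEL_NR_OBJS; i++)
		z->slots[i].next = (i + 1 < MODEL_NR_OBJS) ? i + 1 : MODEL_LIST_END;
	z->freeobj = 0;
}

/* Pop the first free object; returns its index or MODEL_LIST_END. */
int model_alloc(struct model_zspage *z)
{
	int idx = z->freeobj;

	if (idx != MODEL_LIST_END) {
		assert(idx >= 0 && idx < MODEL_NR_OBJS);
		z->freeobj = z->slots[idx].next;
	}
	return idx;
}

/* "Migrate" by copying the bytes elsewhere; no link has to be rewritten. */
void model_migrate(struct model_zspage *dst, const struct model_zspage *src)
{
	memcpy(dst, src, sizeof(*dst));
}
```

Had the slots stored addresses instead of indices, every link pointing into
the moved memory would have had to be corrected after the copy.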

Cc: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 139 ++
 1 file changed, 73 insertions(+), 66 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9522bca068de..8d82e44c4644 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -69,9 +69,7 @@
  * Object location (, ) is encoded as
  * as single (unsigned long) handle value.
  *
- * Note that object index  is relative to system
- * page  it is stored in, so for each sub-page belonging
- * to a zspage, obj_idx starts with 0.
+ * Note that object index  starts from 0.
  *
  * This is made more complicated by various memory models and PAE.
  */
@@ -212,10 +210,10 @@ struct size_class {
 struct link_free {
union {
/*
-* Position of next free chunk (encodes )
+* Free object index;
 * It's valid for non-allocated object
 */
-   void *next;
+   unsigned long next;
/*
 * Handle of allocated object.
 */
@@ -260,7 +258,7 @@ struct zspage {
unsigned int class:CLASS_BITS;
};
unsigned int inuse;
-   void *freeobj;
+   unsigned int freeobj;
struct page *first_page;
struct list_head list; /* fullness list */
 };
@@ -457,14 +455,14 @@ static inline void set_first_obj_offset(struct page 
*page, int offset)
page->index = offset;
 }
 
-static inline unsigned long get_freeobj(struct zspage *zspage)
+static inline unsigned int get_freeobj(struct zspage *zspage)
 {
-   return (unsigned long)zspage->freeobj;
+   return zspage->freeobj;
 }
 
-static inline void set_freeobj(struct zspage *zspage, unsigned long obj)
+static inline void set_freeobj(struct zspage *zspage, unsigned int obj)
 {
-   zspage->freeobj = (void *)obj;
+   zspage->freeobj = obj;
 }
 
 static void get_zspage_mapping(struct zspage *zspage,
@@ -810,6 +808,10 @@ static int get_pages_per_zspage(int class_size)
return max_usedpc_order;
 }
 
+static struct page *get_first_page(struct zspage *zspage)
+{
+   return zspage->first_page;
+}
 
 static struct zspage *get_zspage(struct page *page)
 {
@@ -821,37 +823,33 @@ static struct page *get_next_page(struct page *page)
return page->next;
 }
 
-/*
- * Encode  as a single handle value.
- * We use the least bit of handle for tagging.
+/**
+ * obj_to_location - get (, ) from encoded object value
+ * @page: page object resides in zspage
+ * @obj_idx: object index
  */
-static void *location_to_obj(struct page *page, unsigned long obj_idx)
+static void obj_to_location(unsigned long obj, struct page **page,
+   unsigned int *obj_idx)
 {
-   unsigned long obj;
+   obj >>= OBJ_TAG_BITS;
+   *page = pfn_to_page(obj >> OBJ_INDEX_BITS);
+   *obj_idx = (obj & OBJ_INDEX_MASK);
+}
 
-   if (!page) {
-   VM_BUG_ON(obj_idx);
-   return NULL;
-   }
+/**
+ * location_to_obj - get obj value encoded from (, )
+ * @page: page object resides in zspage
+ * @obj_idx: object index
+ */
+static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
+{
+   unsigned long obj;
 
obj = page_to_pfn(page) << OBJ_INDEX_BITS;
-   obj |= ((obj_idx) & OBJ_INDEX_MASK);
+   obj |= obj_idx & OBJ_INDEX_MASK;
obj <<= OBJ_TAG_BITS;
 
-   return (void *)obj;
-}
-
-/*
- * Decode  pair from the given object handle. We adjust the
- * decoded obj_idx back to its original value since it was adjusted in
- * location_to_obj().
- */
-static void obj_to_location(unsigned long obj, struct page **page,
-   unsigned long *obj_idx)
-{
-   obj >>= OBJ_TAG_BITS;
-   *page = pfn_to_page(obj >> OBJ_INDEX_BITS);
-   *obj_idx = (obj & OBJ_INDEX_MASK);
+   return obj;
 }
 
 static unsigned long handle_to_obj(unsigned long handle)
@@ -869,16 +867,6 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
return *(unsigned long *)obj;
 }
 
-static unsigned long obj_idx_to_offset(struct page *page,
-   unsigned long obj_idx, int class_size)
-{
-   unsigned long off;
-
-   off = get_first_obj_offset(page);
-
-   return off + obj_idx * class_size;
-}
-
 static inline int trypin_tag(unsigned long handle)
 {
return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
@@ -922,13 +910,13 @@ static void free_zspage(struct zs_pool *pool, struct 
zspage *zspage)
 /* Initialize a newly allocated zspage */
 static void init_zspage(struct size_class *class, struct zspage *zspage)
 {
+   uns

[PATCH v4 04/12] zsmalloc: keep max_object in size_class

2016-04-27 Thread Minchan Kim

Every zspage in a size_class has the same maximum number of objects, so
we can keep that number in the size_class.
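
A user-space sketch of the idea, mirroring the fullness computation in the
diff below: the per-class object count is computed once and then used for
every zspage of that class. The page size and the threshold fraction of 4
are assumptions, and the model_* names are hypothetical.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Assumption: same role as fullness_threshold_frac in zsmalloc. */
static const int model_fullness_threshold_frac = 4;

enum model_fullness { MODEL_EMPTY, MODEL_ALMOST_EMPTY, MODEL_ALMOST_FULL, MODEL_FULL };

struct model_size_class {
	int size;
	int pages_per_zspage;
	int objs_per_zspage;	/* computed once per class */
};

void model_class_init(struct model_size_class *cls, int size, int pages_per_zspage)
{
	assert(size > 0 && pages_per_zspage > 0);
	cls->size = size;
	cls->pages_per_zspage = pages_per_zspage;
	cls->objs_per_zspage = pages_per_zspage * MODEL_PAGE_SIZE / size;
}

/* Classify a zspage of this class by its number of objects in use. */
enum model_fullness model_get_fullness(const struct model_size_class *cls, int inuse)
{
	if (inuse == 0)
		return MODEL_EMPTY;
	if (inuse == cls->objs_per_zspage)
		return MODEL_FULL;
	if (inuse <= 3 * cls->objs_per_zspage / model_fullness_threshold_frac)
		return MODEL_ALMOST_EMPTY;
	return MODEL_ALMOST_FULL;
}
```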

Reviewed-by: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a0890e9003e2..8649d0243e6c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -32,8 +32,6 @@
  * page->freelist: points to the first free object in zspage.
  * Free objects are linked together using in-place
  * metadata.
- * page->objects: maximum number of objects we can store in this
- * zspage (class->zspage_order * PAGE_SIZE / class->size)
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
  * page->mapping: class index and fullness group of the zspage
@@ -211,6 +209,7 @@ struct size_class {
 * of ZS_ALIGN.
 */
int size;
+   int objs_per_zspage;
unsigned int index;
 
struct zs_size_stat stats;
@@ -627,21 +626,22 @@ static inline void zs_pool_stat_destroy(struct zs_pool 
*pool)
  * the pool (not yet implemented). This function returns fullness
  * status of the given page.
  */
-static enum fullness_group get_fullness_group(struct page *first_page)
+static enum fullness_group get_fullness_group(struct size_class *class,
+   struct page *first_page)
 {
-   int inuse, max_objects;
+   int inuse, objs_per_zspage;
enum fullness_group fg;
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
inuse = first_page->inuse;
-   max_objects = first_page->objects;
+   objs_per_zspage = class->objs_per_zspage;
 
if (inuse == 0)
fg = ZS_EMPTY;
-   else if (inuse == max_objects)
+   else if (inuse == objs_per_zspage)
fg = ZS_FULL;
-   else if (inuse <= 3 * max_objects / fullness_threshold_frac)
+   else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
fg = ZS_ALMOST_EMPTY;
else
fg = ZS_ALMOST_FULL;
@@ -728,7 +728,7 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
enum fullness_group currfg, newfg;
 
get_zspage_mapping(first_page, &class_idx, &currfg);
-   newfg = get_fullness_group(first_page);
+   newfg = get_fullness_group(class, first_page);
if (newfg == currfg)
goto out;
 
@@ -1008,9 +1008,6 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
init_zspage(class, first_page);
 
first_page->freelist = location_to_obj(first_page, 0);
-   /* Maximum number of objects we can store in this zspage */
-   first_page->objects = class->pages_per_zspage * PAGE_SIZE / class->size;
-
error = 0; /* Success */
 
 cleanup:
@@ -1238,11 +1235,11 @@ static bool can_merge(struct size_class *prev, int 
size, int pages_per_zspage)
return true;
 }
 
-static bool zspage_full(struct page *first_page)
+static bool zspage_full(struct size_class *class, struct page *first_page)
 {
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   return first_page->inuse == first_page->objects;
+   return first_page->inuse == class->objs_per_zspage;
 }
 
 unsigned long zs_get_total_pages(struct zs_pool *pool)
@@ -1628,7 +1625,7 @@ static int migrate_zspage(struct zs_pool *pool, struct 
size_class *class,
}
 
/* Stop if there is no more space */
-   if (zspage_full(d_page)) {
+   if (zspage_full(class, d_page)) {
unpin_tag(handle);
ret = -ENOMEM;
break;
@@ -1687,7 +1684,7 @@ static enum fullness_group putback_zspage(struct zs_pool 
*pool,
 {
enum fullness_group fullness;
 
-   fullness = get_fullness_group(first_page);
+   fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
 
@@ -1936,8 +1933,9 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t 
flags)
class->size = size;
class->index = i;
class->pages_per_zspage = pages_per_zspage;
-   if (pages_per_zspage == 1 &&
-   get_maxobj_per_zspage(size, pages_per_zspage) == 1)
+   class->objs_per_zspage = class->pages_per_zspage *
+   PAGE_SIZE / class->size;
+   if (pages_per_zspage == 1 && class->objs_per_zspage == 1)
class->huge = true;
spin_lock_init(&class->lock);
pool->size_class[i] = class;
-- 
1.9.1

[PATCH v4 03/12] mm: balloon: use general non-lru movable page feature

2016-04-27 Thread Minchan Kim

Now that the VM has a feature to migrate non-lru movable pages, the
balloon driver doesn't need custom migration hooks in migrate.c
and compaction.c. Instead, this patch implements the
page->mapping->a_ops->{isolate|migrate|putback} functions.

With that, we can remove the ballooning hooks from the general
migration functions and make balloon compaction simpler.

Cc: virtualizat...@lists.linux-foundation.org
Cc: Rafael Aquini 
Cc: Konstantin Khlebnikov 
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 drivers/virtio/virtio_balloon.c| 52 +++--
 include/linux/balloon_compaction.h | 51 ++---
 include/uapi/linux/magic.h |  1 +
 mm/balloon_compaction.c| 94 +++---
 mm/compaction.c|  7 ---
 mm/migrate.c   | 19 +---
 mm/vmscan.c|  2 +-
 7 files changed, 83 insertions(+), 143 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7b6d74f0c72f..04fc63b4a735 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -45,6 +46,10 @@ static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
+#ifdef CONFIG_BALLOON_COMPACTION
+static struct vfsmount *balloon_mnt;
+#endif
+
 struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -486,6 +491,24 @@ static int virtballoon_migratepage(struct balloon_dev_info 
*vb_dev_info,
 
return MIGRATEPAGE_SUCCESS;
 }
+
+static struct dentry *balloon_mount(struct file_system_type *fs_type,
+   int flags, const char *dev_name, void *data)
+{
+   static const struct dentry_operations ops = {
+   .d_dname = simple_dname,
+   };
+
+   return mount_pseudo(fs_type, "balloon-kvm:", NULL, &ops,
+   BALLOON_KVM_MAGIC);
+}
+
+static struct file_system_type balloon_fs = {
+   .name   = "balloon-kvm",
+   .mount  = balloon_mount,
+   .kill_sb= kill_anon_super,
+};
+
 #endif /* CONFIG_BALLOON_COMPACTION */
 
 static int virtballoon_probe(struct virtio_device *vdev)
@@ -515,9 +538,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->vdev = vdev;
 
balloon_devinfo_init(&vb->vb_dev_info);
-#ifdef CONFIG_BALLOON_COMPACTION
-   vb->vb_dev_info.migratepage = virtballoon_migratepage;
-#endif
 
err = init_vqs(vb);
if (err)
@@ -527,13 +547,33 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
err = register_oom_notifier(&vb->nb);
if (err < 0)
-   goto out_oom_notify;
+   goto out_del_vqs;
+
+#ifdef CONFIG_BALLOON_COMPACTION
+   balloon_mnt = kern_mount(&balloon_fs);
+   if (IS_ERR(balloon_mnt)) {
+   err = PTR_ERR(balloon_mnt);
+   unregister_oom_notifier(&vb->nb);
+   goto out_del_vqs;
+   }
+
+   vb->vb_dev_info.migratepage = virtballoon_migratepage;
+   vb->vb_dev_info.inode = alloc_anon_inode(balloon_mnt->mnt_sb);
+   if (IS_ERR(vb->vb_dev_info.inode)) {
+   err = PTR_ERR(vb->vb_dev_info.inode);
+   kern_unmount(balloon_mnt);
+   unregister_oom_notifier(&vb->nb);
+   vb->vb_dev_info.inode = NULL;
+   goto out_del_vqs;
+   }
+   vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
+#endif
 
virtio_device_ready(vdev);
 
return 0;
 
-out_oom_notify:
+out_del_vqs:
vdev->config->del_vqs(vdev);
 out_free_vb:
kfree(vb);
@@ -567,6 +607,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
cancel_work_sync(&vb->update_balloon_stats_work);
 
remove_common(vb);
+   if (vb->vb_dev_info.inode)
+   iput(vb->vb_dev_info.inode);
kfree(vb);
 }
 
diff --git a/include/linux/balloon_compaction.h 
b/include/linux/balloon_compaction.h
index 9b0a15d06a4f..79542b2698ec 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device information descriptor.
@@ -62,6 +63,7 @@ struct balloon_dev_info {
struct list_head pages; /* Pages enqueued & handled to Host */
int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
struct page *page, enum migrate_mode mode);
+   struct inode *inode;
 };
 
 extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
@@ -73,45 +75,19 @@ static inline void balloon_devinfo_init(struct 
balloon_dev_info *balloon)
spin_loc

[PATCH v4 06/12] zsmalloc: use accessor

2016-04-27 Thread Minchan Kim

An upcoming patch will change how zspage metadata is encoded, so to ease
review, this patch wraps the code that accesses the metadata in accessors.
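
As an illustration of the accessor idea, the class index and fullness group
can be packed into and unpacked from one word using the shift and mask macros
this patch introduces. The macro values are copied from the diff below; the
model_* helpers are hypothetical user-space stand-ins, not the patch's
functions.

```c
#include <assert.h>

#define FULLNESS_BITS	4
#define CLASS_BITS	28

#define FULLNESS_SHIFT	0
#define CLASS_SHIFT	(FULLNESS_SHIFT + FULLNESS_BITS)

#define FULLNESS_MASK	((1UL << FULLNESS_BITS) - 1)
#define CLASS_MASK	((1UL << CLASS_BITS) - 1)

/* Pack both fields into a single word. */
unsigned long model_pack_mapping(unsigned int class_idx, unsigned int fullness)
{
	assert(class_idx <= CLASS_MASK && fullness <= FULLNESS_MASK);
	return ((unsigned long)class_idx << CLASS_SHIFT) |
	       ((unsigned long)fullness << FULLNESS_SHIFT);
}

/* Unpack the two fields again. */
void model_unpack_mapping(unsigned long m, unsigned int *class_idx,
			  unsigned int *fullness)
{
	*fullness = (m >> FULLNESS_SHIFT) & FULLNESS_MASK;
	*class_idx = (m >> CLASS_SHIFT) & CLASS_MASK;
}
```

Because callers only go through such helpers, a later change of the encoding
touches the helpers and not every call site.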

Cc: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 82 +++
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index d75d9e4b6a4d..8c22b0ca1df7 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -267,10 +267,14 @@ struct zs_pool {
  * A zspage's class index and fullness group
  * are encoded in its (first)page->mapping
  */
-#define CLASS_IDX_BITS 28
 #define FULLNESS_BITS  4
-#define CLASS_IDX_MASK ((1 << CLASS_IDX_BITS) - 1)
-#define FULLNESS_MASK  ((1 << FULLNESS_BITS) - 1)
+#define CLASS_BITS 28
+
+#define FULLNESS_SHIFT 0
+#define CLASS_SHIFT (FULLNESS_SHIFT + FULLNESS_BITS)
+
+#define FULLNESS_MASK  ((1UL << FULLNESS_BITS) - 1)
+#define CLASS_MASK ((1UL << CLASS_BITS) - 1)
 
 struct mapping_area {
 #ifdef CONFIG_PGTABLE_MAPPING
@@ -412,6 +416,41 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
+static inline int get_zspage_inuse(struct page *first_page)
+{
+   return first_page->inuse;
+}
+
+static inline void set_zspage_inuse(struct page *first_page, int val)
+{
+   first_page->inuse = val;
+}
+
+static inline void mod_zspage_inuse(struct page *first_page, int val)
+{
+   first_page->inuse += val;
+}
+
+static inline int get_first_obj_offset(struct page *page)
+{
+   return page->index;
+}
+
+static inline void set_first_obj_offset(struct page *page, int offset)
+{
+   page->index = offset;
+}
+
+static inline unsigned long get_freeobj(struct page *first_page)
+{
+   return (unsigned long)first_page->freelist;
+}
+
+static inline void set_freeobj(struct page *first_page, unsigned long obj)
+{
+   first_page->freelist = (void *)obj;
+}
+
 static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
@@ -420,8 +459,8 @@ static void get_zspage_mapping(struct page *first_page,
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
m = (unsigned long)first_page->mapping;
-   *fullness = m & FULLNESS_MASK;
-   *class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK;
+   *fullness = (m >> FULLNESS_SHIFT) & FULLNESS_MASK;
+   *class_idx = (m >> CLASS_SHIFT) & CLASS_MASK;
 }
 
 static void set_zspage_mapping(struct page *first_page,
@@ -431,8 +470,7 @@ static void set_zspage_mapping(struct page *first_page,
unsigned long m;
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
-   (fullness & FULLNESS_MASK);
+   m = (class_idx << CLASS_SHIFT) | (fullness << FULLNESS_SHIFT);
first_page->mapping = (struct address_space *)m;
 }
 
@@ -634,7 +672,7 @@ static enum fullness_group get_fullness_group(struct size_class *class,
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   inuse = first_page->inuse;
+   inuse = get_zspage_inuse(first_page);
objs_per_zspage = class->objs_per_zspage;
 
if (inuse == 0)
@@ -680,7 +718,7 @@ static void insert_zspage(struct size_class *class,
 * empty/full. Put pages with higher ->inuse first.
 */
list_add_tail(&first_page->lru, &(*head)->lru);
-   if (first_page->inuse >= (*head)->inuse)
+   if (get_zspage_inuse(first_page) >= get_zspage_inuse(*head))
*head = first_page;
 }
 
@@ -857,7 +895,7 @@ static unsigned long obj_idx_to_offset(struct page *page,
unsigned long off = 0;
 
if (!is_first_page(page))
-   off = page->index;
+   off = get_first_obj_offset(page);
 
return off + obj_idx * class_size;
 }
@@ -892,7 +930,7 @@ static void free_zspage(struct page *first_page)
struct page *nextp, *tmp, *head_extra;
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
-   VM_BUG_ON_PAGE(first_page->inuse, first_page);
+   VM_BUG_ON_PAGE(get_zspage_inuse(first_page), first_page);
 
head_extra = (struct page *)page_private(first_page);
 
@@ -933,7 +971,7 @@ static void init_zspage(struct size_class *class, struct page *first_page)
 * head of corresponding zspage's freelist.
 */
if (page != first_page)
-   page->index = off;
+   set_first_obj_offset(page, off);
 
vaddr = kmap_atomic(page);
link = (struct link_free *)vaddr + off / sizeof(*link);
@@ -988,7 +1026,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
SetPagePrivate(page);
set_page_private(page, 0);
first_page = page;
-   first_page->inuse = 0;
+

[PATCH v4 08/12] zsmalloc: introduce zspage structure

2016-04-27 Thread Minchan Kim

We have squeezed the metadata of a zspage into the first page's descriptor.
So, to get the metadata from a subpage, we have to get the first page first.
But that makes it troublesome to implement the page migration feature of
zsmalloc, because any place that gets the first page from a subpage can
race with first page migration. IOW, the first page it got could be stale.
To prevent that, I tried several approaches, but they made the code
complicated, so finally I concluded that the metadata should be separated
from the first page. Of course, it consumes more memory. IOW, 16 bytes per
zspage on 32-bit at the moment. It means we lose 1% in the *worst case*
(40B/4096B), which I think is not a bad price for easier maintenance.

Cc: Sergey Senozhatsky 
Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 532 +++---
 1 file changed, 243 insertions(+), 289 deletions(-)
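
As a rough illustration of the layout and of the worst-case figure quoted
above, here is a userspace sketch. The struct zspage fields are copied from
the hunk below; struct page and struct list_head are stand-ins, and
demo_set_meta() and demo_overhead_bp() are hypothetical helpers that exist
only in this note.

```c
#include <stddef.h>

#define FULLNESS_BITS 2
#define CLASS_BITS    8

/* Userspace stand-ins for kernel types (assumptions for this sketch). */
struct page;
struct list_head { struct list_head *next, *prev; };

/* Field layout as introduced by the patch. */
struct zspage {
	struct {
		unsigned int fullness:FULLNESS_BITS;
		unsigned int class:CLASS_BITS;
	};
	unsigned int inuse;
	void *freeobj;
	struct page *first_page;
	struct list_head list; /* fullness list */
};

/* Hypothetical helper: store class index and fullness in the bitfields. */
static void demo_set_meta(struct zspage *z, unsigned int class_idx,
			  unsigned int fullness)
{
	z->class = class_idx;
	z->fullness = fullness;
}

/* Hypothetical helper: metadata overhead in hundredths of a percent. */
static unsigned int demo_overhead_bp(size_t meta_bytes, size_t page_bytes)
{
	return (unsigned int)(meta_bytes * 10000 / page_bytes);
}
```

With the 40B/4096B figure from the description, demo_overhead_bp(40, 4096)
gives 97, i.e. about 0.97%, which matches the "1% at worst case" estimate.
Two bits are enough for the four fullness_group values, and eight bits allow
class indexes up to 255.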

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b08ac1ae1743..4b5ead85c7e7 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -16,26 +16,11 @@
  * struct page(s) to form a zspage.
  *
  * Usage of struct page fields:
- * page->private: points to the first component (0-order) page
- * page->index (union with page->freelist): offset of the first object
- * starting in this page. For the first page, this is
- * always 0, so we use this field (aka freelist) to point
- * to the first free object in zspage.
- * page->lru: links together all component pages (except the first page)
- * of a zspage
- *
- * For _first_ page only:
- *
- * page->private: refers to the component page after the first page
- * If the page is first_page for huge object, it stores handle.
- * Look at size_class->huge.
- * page->freelist: points to the first free object in zspage.
- * Free objects are linked together using in-place
- * metadata.
- * page->lru: links together first pages of various zspages.
- * Basically forming list of zspages in a fullness group.
- * page->mapping: class index and fullness group of the zspage
- * page->inuse: the number of objects that are used in this zspage
+ * page->private: points to zspage
+ * page->index: offset of the first object starting in this page.
+ * For the first page, this is always 0, so we use this field
+ * to store handle for huge object.
+ * page->next: links together all component pages of a zspage
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -145,7 +130,7 @@
  *  ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
  *  (reason above)
  */
-#define ZS_SIZE_CLASS_DELTA (PAGE_SIZE >> 8)
+#define ZS_SIZE_CLASS_DELTA (PAGE_SIZE >> CLASS_BITS)
 
 /*
  * We do not maintain any list for completely empty or full pages
@@ -153,8 +138,6 @@
 enum fullness_group {
ZS_ALMOST_FULL,
ZS_ALMOST_EMPTY,
-   _ZS_NR_FULLNESS_GROUPS,
-
ZS_EMPTY,
ZS_FULL
 };
@@ -203,7 +186,7 @@ static const int fullness_threshold_frac = 4;
 
 struct size_class {
spinlock_t lock;
-   struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
+   struct list_head fullness_list[2];
/*
 * Size of objects stored in this class. Must be multiple
 * of ZS_ALIGN.
@@ -222,7 +205,7 @@ struct size_class {
 
 /*
  * Placed within free objects to form a singly linked list.
- * For every zspage, first_page->freelist gives head of this list.
+ * For every zspage, zspage->freeobj gives head of this list.
  *
  * This must be power of 2 and less than or equal to ZS_ALIGN
  */
@@ -245,6 +228,7 @@ struct zs_pool {
 
struct size_class **size_class;
struct kmem_cache *handle_cachep;
+   struct kmem_cache *zspage_cachep;
 
gfp_t flags;/* allocation flags used when growing pool */
atomic_long_t pages_allocated;
@@ -267,14 +251,19 @@ struct zs_pool {
  * A zspage's class index and fullness group
  * are encoded in its (first)page->mapping
  */
-#define FULLNESS_BITS  4
-#define CLASS_BITS 28
+#define FULLNESS_BITS  2
+#define CLASS_BITS 8
 
-#define FULLNESS_SHIFT 0
-#define CLASS_SHIFT (FULLNESS_SHIFT + FULLNESS_BITS)
-
-#define FULLNESS_MASK  ((1UL << FULLNESS_BITS) - 1)
-#define CLASS_MASK ((1UL << CLASS_BITS) - 1)
+struct zspage {
+   struct {
+   unsigned int fullness:FULLNESS_BITS;
+   unsigned int class:CLASS_BITS;
+   };
+   unsigned int inuse;
+   void *freeobj;
+   struct page *first_page;
+   struct list_head list; /* fullness list */
+};
 
 struct mapping_area {
 #ifdef CONFIG_PGTABLE_MAPPING
@@ -286,29 +275,55 @@ struct mapping_area {
enum zs_mapmode vm_mm; /* mapping mode */
 };
 
-static int create_handle_cache(struct zs_pool *pool)
+static int create_cache(struct zs_pool *pool)
 {
pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,

[PATCH v4 00/13] Support non-lru page migration

2016-04-27 Thread Minchan Kim

Recently, I got many reports about performance degradation in embedded
systems (Android mobile phones, webOS TVs and so on) and about fork
failing easily.

The problem was fragmentation, caused mainly by zram and GPU drivers.
Under memory pressure, their pages were spread out across all pageblocks
and could not be migrated by the current compaction algorithm, which
supports only LRU pages. In the end, compaction could not work well, so the
reclaimer shrank all of the working set pages. That made the system very
slow and even made fork, which requires order-[2 or 3] allocations, fail
easily.

Another pain point is that they cannot use CMA memory space, so when an OOM
kill happens, I can see many free pages in the CMA area, which is not
memory efficient. In our product, which has a big CMA area, it reclaims
zones too excessively to allocate GPU and zram pages although there is
lots of free space in CMA, so the system easily becomes very slow.

To solve these problems, this patchset adds a facility to migrate
non-lru pages by introducing new functions and page flags to help
migration.


struct address_space_operations {
..
..
bool (*isolate_page)(struct page *, isolate_mode_t);
void (*putback_page)(struct page *);
..
}

new page flags

PG_movable
PG_isolated

For details, please read description in "mm: migrate: support non-lru
movable page migration".
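
To make the shape of the two new callbacks concrete, here is a small
userspace mock. Only the two function-pointer signatures come from the
struct shown above; struct page, isolate_mode_t, the demo_* callbacks and
the "isolated" field are stand-ins invented for this sketch and do not
reflect the rules the real patch imposes on drivers.

```c
#include <stdbool.h>

/* Userspace stand-ins; the real types live in the kernel. */
typedef unsigned int isolate_mode_t;
struct page { bool isolated; };

/* Only the two callbacks added by this series are modelled here. */
struct address_space_operations {
	bool (*isolate_page)(struct page *, isolate_mode_t);
	void (*putback_page)(struct page *);
};

/* Hypothetical driver callback: refuse pages that are already isolated. */
static bool demo_isolate_page(struct page *page, isolate_mode_t mode)
{
	(void)mode;
	if (page->isolated)
		return false;
	page->isolated = true;
	return true;
}

/* Hypothetical driver callback: hand the page back to the driver. */
static void demo_putback_page(struct page *page)
{
	page->isolated = false;
}

static const struct address_space_operations demo_aops = {
	.isolate_page = demo_isolate_page,
	.putback_page = demo_putback_page,
};

/* Hypothetical caller: isolate, then give the page back unmigrated. */
static bool demo_try_isolate_then_putback(struct page *page)
{
	if (!demo_aops.isolate_page(page, 0))
		return false;
	demo_aops.putback_page(page);
	return true;
}
```

The point of the sketch is only the dispatch through the operations table:
the caller never needs to know which driver owns the page.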

Originally, Gioh Kim tried to support this feature, but he moved on, so
I took over the work. I took a lot of code from his work and changed it a
little. Konstantin Khlebnikov helped Gioh a lot, so he deserves much of
the credit, too.

Thanks, Gioh and Konstantin!

This patchset consists of five parts.

1. clean up migration
  mm: use put_page to free page instead of putback_lru_page

2. add non-lru page migration feature
  mm: migrate: support non-lru movable page migration

3. rework KVM memory-ballooning
  mm: balloon: use general non-lru movable page feature

4. zsmalloc refactoring for preparing page migration
  zsmalloc: keep max_object in size_class
  zsmalloc: use bit_spin_lock
  zsmalloc: use accessor
  zsmalloc: factor page chain functionality out
  zsmalloc: introduce zspage structure
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: use freeobj for index

5. zsmalloc page migration
  zsmalloc: page migration support
  zram: use __GFP_MOVABLE for memory allocation

* From v3
  * rebase on mmotm-2016-04-06-20-40
  * fix swap_info deadlock - Chulmin
  * race without page_lock - Vlastimil
  * no use page._mapcount for potential user-mapped page driver - Vlastimil
  * fix and enhance doc/description - Vlastimil
  * use page->mapping lower bits to represent PG_movable
  * make driver side's rule simple.

* From v2
  * rebase on mmotm-2016-03-29-15-54-16
  * check PageMovable before lock_page - Joonsoo
  * check PageMovable before PageIsolated checking - Joonsoo
  * add more description about rule

* From v1
  * rebase on v4.5-mmotm-2016-03-17-15-04
  * reordering patches to merge clean-up patches first
  * add Acked-by/Reviewed-by from Vlastimil and Sergey
  * use each own mount model instead of reusing anon_inode_fs - Al Viro
  * small changes - YiPing, Gioh

Cc: Vlastimil Babka 
Cc: dri-de...@lists.freedesktop.org
Cc: Hugh Dickins 
Cc: John Einar Reitan 
Cc: Jonathan Corbet 
Cc: Joonsoo Kim 
Cc: Konstantin Khlebnikov 
Cc: Mel Gorman 
Cc: Naoya Horiguchi 
Cc: Rafael Aquini 
Cc: Rik van Riel 
Cc: Sergey Senozhatsky 
Cc: virtualizat...@lists.linux-foundation.org
Cc: Gioh Kim 
Cc: Chan Gyun Jeong 
Cc: Sangseok Lee 
Cc: Kyeongdon Kim 
Cc: Chulmin Kim 

Minchan Kim (12):
  mm: use put_page to free page instead of  putback_lru_page
  mm: migrate: support non-lru movable page migration
  mm: balloon: use general non-lru movable page feature
  zsmalloc: keep max_object in size_class
  zsmalloc: use bit_spin_lock
  zsmalloc: use accessor
  zsmalloc: factor page chain functionality out
  zsmalloc: introduce zspage structure
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: use freeobj for index
  zsmalloc: page migration support
  zram: use __GFP_MOVABLE for memory allocation

 Documentation/filesystems/Locking  |4 +
 Documentation/filesystems/vfs.txt  |   11 +
 Documentation/vm/page_migration|  107 +++-
 drivers/block/zram/zram_drv.c  |3 +-
 drivers/virtio/virtio_balloon.c|   52 +-
 include/linux/balloon_compaction.h |   51 +-
 include/linux/fs.h |2 +
 include/linux/ksm.h|3 +-
 include/linux/migrate.h|5 +
 include/linux/mm.h |1 +
 include/linux/page-flags.h |   23 +-
 include/uapi/linux/magic.h |2 +
 mm/balloon_compaction.c|   94 +--
 mm/compaction.c|   40 +-
 mm/ksm.c   |4 +-
 mm/migrate.c   |  287 +++--
 mm/page_alloc.c|4 +-
 mm/util.c  |6 +-
 mm/vmscan.c

Re: [PATCH v2 0/6] simplify rtc-generic driver

2016-04-27 Thread Geert Uytterhoeven

Hi Arnd,

On Tue, Apr 26, 2016 at 11:52 PM, Arnd Bergmann  wrote:
> This is a resend of an earlier series, to clean up the rtc-generic
> driver by avoiding the dependency on the architecture specific
> include/asm/rtc.h header that after this series is only used
> for the deprecated "genrtc" driver. As I've shown in another
> series, only three architectures (m68k, powerpc, parisc)
> actually use the genrtc driver, and they all support rtc-generic
> as a replacement as well.
>
> The only missing piece appears to be the ioctl support for
> the m68k q40 machine that I'm adding in patch 2 here.

Apparently I had applied your previous version to my local tree, but I had
completely forgotten about it. So it has received quite some compile testing.

CONFIG_GEN_RTC is not enabled in any of the m68k defconfigs, so I think it's
been unused for a while.
CONFIG_RTC_DRV_GENERIC is modular, so I typically don't run-test it.
I just did that, and after fixing patch 1 to use IS_ENABLED() it worked fine
on ARAnyM.

Tested-by: Geert Uytterhoeven 

I do not have a Q40, so I couldn't test that part.

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH 1/2] mm: add PF_MEMALLOC_NOFS

2016-04-27 Thread Michal Hocko

On Wed 27-04-16 09:07:02, Dave Chinner wrote:
> On Tue, Apr 26, 2016 at 01:56:11PM +0200, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > GFP_NOFS context is used for the following 4 reasons currently
> > - to prevent from deadlocks when the lock held by the allocation
> >   context would be needed during the memory reclaim
> > - to prevent from stack overflows during the reclaim because
> >   the allocation is performed from a deep context already
> > - to prevent lockups when the allocation context depends on
> >   other reclaimers to make a forward progress indirectly
> > - just in case because this would be safe from the fs POV
> 
> - silencing lockdep false positives
> 
> > Introduce PF_MEMALLOC_NOFS task specific flag and memalloc_nofs_{save,restore}
> > API to control the scope. This is basically copying
> > memalloc_noio_{save,restore} API we have for other restricted allocation
> > context GFP_NOIO.
> > 
> > Xfs has already had a similar functionality as PF_FSTRANS so let's just
> > give it a more generic name and make it usable for others as well and
> > move the GFP_NOFS context tracking to the page allocator. Xfs has its
> > own accessor functions but let's keep them for now to reduce this patch
> > as minimum.
> 
> Can you split this into two patches? The first simply does this:
> 
> #define PF_MEMALLOC_NOFS PF_FSTRANS
> 
> and changes only the XFS code to use PF_MEMALLOC_NOFS.
> 
> The second patch can then do the rest of the mm API changes that we
> don't actually care about in XFS at all.  That way I can carry all
> the XFS changes in the XFS tree and not have to worry about when
> this stuff gets merged or conflicts with the rest of the work that
> is being done to the mm/ code and whatever tree that eventually
> lands in...

Sure, I will do that.
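
For anyone following the thread, the scoped save/restore pattern described
in the quoted patch can be sketched in userspace as follows. This is only a
model of the pattern: task_flags stands in for current->flags, the flag
value is arbitrary, and the demo_* names are hypothetical, not the proposed
kernel functions.

```c
/* Arbitrary bit for this sketch; the real value is defined by the kernel. */
#define PF_MEMALLOC_NOFS 0x1u

/* Stand-in for current->flags. */
static unsigned int task_flags;

/* Enter a NOFS scope and return the previous state of the flag. */
static unsigned int demo_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

/* Leave the scope by restoring exactly the state seen at save time. */
static void demo_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

/* True while any enclosing scope has the flag set. */
static int demo_in_nofs_scope(void)
{
	return (task_flags & PF_MEMALLOC_NOFS) != 0;
}
```

Because restore puts back the saved bit rather than clearing it, scopes nest
safely: leaving an inner scope does not end the outer one.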

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 3/3] mtd: brcmnand: respect ECC algorithm set by NAND subsystem

2016-04-27 Thread Boris Brezillon

On Mon, 25 Apr 2016 22:53:55 -0700
Brian Norris  wrote:

> From: Brian Norris 
> Date: Mon, 25 Apr 2016 20:48:02 -0700
> Subject: [PATCH] mtd: brcmnand: respect ECC algorithm set by the NAND
>  subsystem
> 
> This is more obvious than guessing based on ECC strength. It allows
> using NAND on devices with BCH-1 (e.g. D-Link DIR-885L).
> 
> This maintains DT backward compatibility by defaulting to Hamming if a
> 1-bit ECC algorithm is specified without a corresponding algorithm
> selection. i.e., to use BCH-1, you must specify:
> 
>   nand-ecc-strength = <1>;
>   nand-ecc-step-size = <512>;
>   nand-ecc-algo = "bch";
> 
> Also adds a check to ensure we haven't allowed someone to get by with SW
> ECC. If we want to support SW ECC, we need to refactor some other pieces
> of this driver.
> 
> Signed-off-by: Brian Norris 

Applied, thanks.

Boris

> ---
>  drivers/mtd/nand/brcmnand/brcmnand.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index c3331ffcaffd..b76ad7c0144f 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1925,9 +1925,31 @@ static int brcmnand_setup_dev(struct brcmnand_host *host)
>   cfg->col_adr_bytes = 2;
>   cfg->blk_adr_bytes = get_blk_adr_bytes(mtd->size, mtd->writesize);
>  
> + if (chip->ecc.mode != NAND_ECC_HW) {
> + dev_err(ctrl->dev, "only HW ECC supported; selected: %d\n",
> + chip->ecc.mode);
> + return -EINVAL;
> + }
> +
> + if (chip->ecc.algo == NAND_ECC_UNKNOWN) {
> + if (chip->ecc.strength == 1 && chip->ecc.size == 512)
> + /* Default to Hamming for 1-bit ECC, if unspecified */
> + chip->ecc.algo = NAND_ECC_HAMMING;
> + else
> + /* Otherwise, BCH */
> + chip->ecc.algo = NAND_ECC_BCH;
> + }
> +
> + if (chip->ecc.algo == NAND_ECC_HAMMING && (chip->ecc.strength != 1 ||
> +chip->ecc.size != 512)) {
> + dev_err(ctrl->dev, "invalid Hamming params: %d bits per %d bytes\n",
> + chip->ecc.strength, chip->ecc.size);
> + return -EINVAL;
> + }
> +
>   switch (chip->ecc.size) {
>   case 512:
> - if (chip->ecc.strength == 1) /* Hamming */
> + if (chip->ecc.algo == NAND_ECC_HAMMING)
>   cfg->ecc_level = 15;
>   else
>   cfg->ecc_level = chip->ecc.strength;
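
The selection logic in the quoted hunk can be read as a small pure function.
The sketch below restates it in userspace; the enum values are assumed to be
ordered UNKNOWN, HAMMING, BCH, the demo_* names are hypothetical, and -1
stands in for the -EINVAL error path.

```c
/* Assumed ordering; only the names appear in the patch. */
enum nand_ecc_algo { NAND_ECC_UNKNOWN, NAND_ECC_HAMMING, NAND_ECC_BCH };

/*
 * Resolve the ECC algorithm the way the hunk does:
 * - unspecified + 1 bit per 512 bytes defaults to Hamming,
 * - anything else unspecified defaults to BCH,
 * - Hamming with other parameters is rejected (-1 here).
 */
static int demo_resolve_ecc_algo(enum nand_ecc_algo algo, int strength, int size)
{
	if (algo == NAND_ECC_UNKNOWN)
		algo = (strength == 1 && size == 512) ?
			NAND_ECC_HAMMING : NAND_ECC_BCH;

	if (algo == NAND_ECC_HAMMING && (strength != 1 || size != 512))
		return -1;

	return algo;
}

/* ECC level for the 512-byte step case, as in the switch statement. */
static int demo_ecc_level_512(enum nand_ecc_algo algo, int strength)
{
	return algo == NAND_ECC_HAMMING ? 15 : strength;
}
```

This shows why BCH-1 needs an explicit nand-ecc-algo = "bch": without it, a
1-bit/512-byte setup resolves to Hamming for backward compatibility.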



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

Re: char: legacy RTC cleanups

2016-04-27 Thread Geert Uytterhoeven

Hi Arnd,

On Tue, Apr 26, 2016 at 11:44 PM, Arnd Bergmann  wrote:
> For the genrtc driver, rearranging the headers makes it simpler
> to use and reduces duplication. In case of alpha and mn10300,
> I've shown that the genrtc and rtc drivers are doing the same
> thing, so we don't need them both. The remaining three
> architectures (m68k, parisc, powerpc) actually all support
> the newer rtc-generic driver, so we could remove genrtc completely
> if we want to.

CONFIG_GEN_RTC is not enabled in any of the m68k defconfigs, so I think genrtc
has been unused for a while.
All defconfigs either use CONFIG_RTC_DRV_GENERIC, or enable a more specific
RTC driver.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH V2] mtd: nand: fix NULL pointer dereference in of_get_nand_ecc_algo

2016-04-27 Thread Boris Brezillon

On Tue, 26 Apr 2016 23:16:47 +0200
Rafał Miłecki  wrote:

> Our array nand_ecc_algos doesn't specify mappings for all available
> enum nand_ecc_algo values. The one missing there is NAND_ECC_UNKNOWN
> as this value is reserved for algorithm not being specified at all.
> It means we have to be careful when iterating this array and avoid
> NULL values.
> 
> Signed-off-by: Rafał Miłecki 

Changes squashed into "mtd: nand: add support for "nand-ecc-algo" DT
property".

Thanks,

Boris

> ---
> V2: Iterate from NAND_ECC_HAMMING instead of checking for NULL as preferred by
> Boris.
> ---
>  drivers/mtd/nand/nand_base.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index a5417a0..2e6ba44 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -4015,7 +4015,7 @@ static int of_get_nand_ecc_algo(struct device_node *np)
>  
>   err = of_property_read_string(np, "nand-ecc-algo", &pm);
>   if (!err) {
> - for (i = 0; i < ARRAY_SIZE(nand_ecc_algos); i++)
> + for (i = NAND_ECC_HAMMING; i < ARRAY_SIZE(nand_ecc_algos); i++)
>   if (!strcasecmp(pm, nand_ecc_algos[i]))
>   return i;
>   return -ENODEV;
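
To illustrate why starting at NAND_ECC_HAMMING avoids the NULL dereference,
here is a userspace sketch of the lookup. The enum ordering and the table
strings are assumptions made for this note, demo_lookup_ecc_algo() is a
hypothetical name, and -1 stands in for -ENODEV.

```c
#include <stddef.h>
#include <strings.h>

/* Assumed ordering: UNKNOWN is 0 and has no string in the table. */
enum nand_ecc_algo { NAND_ECC_UNKNOWN, NAND_ECC_HAMMING, NAND_ECC_BCH };

/* Designated initializers leave index NAND_ECC_UNKNOWN as NULL. */
static const char * const nand_ecc_algos[] = {
	[NAND_ECC_HAMMING] = "hamming",
	[NAND_ECC_BCH]     = "bch",
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Case-insensitive lookup that never touches the NULL slot at index 0. */
static int demo_lookup_ecc_algo(const char *name)
{
	size_t i;

	for (i = NAND_ECC_HAMMING; i < ARRAY_SIZE(nand_ecc_algos); i++)
		if (!strcasecmp(name, nand_ecc_algos[i]))
			return (int)i;

	return -1;
}
```

Starting the loop at 0 would pass nand_ecc_algos[NAND_ECC_UNKNOWN], which is
NULL, to strcasecmp(); skipping that index is the whole fix.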



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

Re: zram: per-cpu compression streams

2016-04-27 Thread Minchan Kim

On Wed, Apr 27, 2016 at 04:43:35PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (04/27/16 16:29), Minchan Kim wrote:
> [..]
> > > the test:
> > > 
> > > -- 4 GB x86_64 box
> > > -- zram 3GB, lzo
> > > -- mem-hogger pre-faults 3GB of pages before the fio test
> > > -- fio test has been modified to have 11% compression ratio (to increase the
> > >   chances of re-compressions)
> > 
> > Could you test concurrent mem hogger with fio rather than pre-fault before
> > fio test in next submit?
> 
> this test will not prove anything, unfortunately. I performed it;
> and it's impossible to guarantee even remotely stable results.
> mem-hogger process can spend on pre-fault from 41 to 81 seconds;
> so I'm quite sceptical about the actual value of this test.
> 
> > > considering buffer_compress_percentage=11, the box was under somewhat
> > > heavy pressure.
> > > 
> > > now, the results
> > 
> > Yeb. Even the recompression case is faster than the old one, but I want to
> > see a heavier memory pressure case and the ratio I mentioned above.
> 
> I did quite heavy testing over the last 7 days, with numerous OOM kills
> and OOM panics.

Okay, I think it's worth merging and seeing the result.
Please send a formal patch that includes the recompression stat. ;-)

Thanks.

[PATCH v4 4/4] ARM64: dts: rockchip: add dts file for RK3399 evaluation board

2016-04-27 Thread Jianqun Xu

This patch adds rk3399-evb.dts for the RK3399 evaluation board.
Tested on the RK3399 evb.

Signed-off-by: Jianqun Xu 
---
changes in v4:
- add google,rk3399evb-rev2 compatible (Doug, Heiko)

changes in v3:
- add more compatible (Doug)
- add model

 arch/arm64/boot/dts/rockchip/Makefile   |   1 +
 arch/arm64/boot/dts/rockchip/rk3399-evb.dts | 124 
 2 files changed, 125 insertions(+)
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-evb.dts

diff --git a/arch/arm64/boot/dts/rockchip/Makefile b/arch/arm64/boot/dts/rockchip/Makefile
index df37865..7037a16 100644
--- a/arch/arm64/boot/dts/rockchip/Makefile
+++ b/arch/arm64/boot/dts/rockchip/Makefile
@@ -1,6 +1,7 @@
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-evb-act8846.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-geekbox.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3368-r88.dtb
+dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-evb.dtb
 
 always := $(dtb-y)
 subdir-y   := $(dts-dirs)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
new file mode 100644
index 000..1a3eb14
--- /dev/null
+++ b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
@@ -0,0 +1,124 @@
+/*
+ * Copyright (c) 2016 Fuzhou Rockchip Electronics Co., Ltd
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include 
+#include "rk3399.dtsi"
+
+/ {
+   model = "Rockchip RK3399 Evaluation Board";
+   compatible = "rockchip,rk3399-evb", "rockchip,rk3399",
+"google,rk3399evb-rev2";
+
+   vdd_center: vdd-center {
+   compatible = "pwm-regulator";
+   pwms = <&pwm3 0 25000 0>;
+   regulator-name = "vdd_center";
+   regulator-min-microvolt = <80>;
+   regulator-max-microvolt = <140>;
+   regulator-always-on;
+   regulator-boot-on;
+   status = "okay";
+   };
+
+   vcc3v3_sys: vcc3v3-sys {
+   compatible = "regulator-fixed";
+   regulator-name = "vcc3v3_sys";
+   regulator-always-on;
+   regulator-boot-on;
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   };
+
+   vcc_phy: vcc-phy-regulator {
+   compatible = "regulator-fixed";
+   regulator-name = "vcc_phy";
+   regulator-always-on;
+   regulator-boot-on;
+   };
+};
+
+&pwm0 {
+   status = "okay";
+};
+
+&pwm2 {
+   status = "okay";
+};
+
+&pwm3 {
+   status = "okay";
+};
+
+&uart2 {
+   status = "okay";
+};
+
+&usb_host0_ehci {
+   status = "okay";
+};
+
+&usb_host0_ohci {
+   status = "okay";
+};
+
+&usb_host1_ehci {
+   status = "okay";
+};
+
+&usb_host1_ohci {
+   status = "okay";
+};
+
+&pinctrl {
+   pmic {
+   pmic_int_l: pmic-int-l {
+   rockchip,pins =
+   <1 21 RK_FUNC_GPIO &pcfg_pull_up>;
+   };
+
+   pmic_dvs2: pmic-dvs2 {
+   rockchi

[PATCH v4 1/4] Documentation: rockchip-dw-mshc: add description for rk3399

2016-04-27 Thread Jianqun Xu

From: Shawn Lin 

Add "rockchip,rk3399-dw-mshc", "rockchip,rk3288-dw-mshc" for
dwmmc on rk3399 platform.

Acked-by: Rob Herring 
Signed-off-by: Shawn Lin 
Signed-off-by: Jianqun Xu 
---
 Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt
index ea5614b..07184e8 100644
--- a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt
+++ b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt
@@ -15,6 +15,7 @@ Required Properties:
- "rockchip,rk3288-dw-mshc": for Rockchip RK3288
- "rockchip,rk3036-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip 
RK3036
- "rockchip,rk3368-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip 
RK3368
+   - "rockchip,rk3399-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip 
RK3399
 
 Optional Properties:
 * clocks: from common clock binding: if ciu_drive and ciu_sample are
-- 
1.9.1

[PATCH v4 2/4] ARM64: dts: rockchip: add core dtsi file for RK3399 SoCs

2016-04-27 Thread Jianqun Xu

This patch adds core dtsi file for Rockchip RK3399 SoCs.

The RK3399 has a big/little architecture, which needs a separate
node for the PMU of each microarchitecture. For now the pmu node is
missing, since the old one could not work well.

Marc is working on it with:
https://lkml.org/lkml/2016/4/11/182

and on the following branch:
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
irq/percpu-partition

And it has been tested on RK3399 evb and works well.

Tested-by: Brian Norris 
Signed-off-by: Jianqun Xu 
---
changes in v4:
- none

changes in v3:
- tested irq/percpu-partition patches and worked well (huangtao)
- tested by Brian

changes in v2:
- remove rk808 since i2c is not included; it will be upstreamed independently
- remove es8316 since i2c is not included; it will be upstreamed independently
- fix codingstyle issues

 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 1022 ++
 1 file changed, 1022 insertions(+)
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399.dtsi

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
new file mode 100644
index 000..5a8a915
--- /dev/null
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -0,0 +1,1022 @@
+/*
+ * Copyright (c) 2016 Fuzhou Rockchip Electronics Co., Ltd
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/ {
+   compatible = "rockchip,rk3399";
+
+   interrupt-parent = <&gic>;
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   aliases {
+   serial0 = &uart0;
+   serial1 = &uart1;
+   serial2 = &uart2;
+   serial3 = &uart3;
+   serial4 = &uart4;
+   };
+
+   psci {
+   compatible = "arm,psci-1.0";
+   method = "smc";
+   };
+
+   cpus {
+   #address-cells = <2>;
+   #size-cells = <0>;
+
+   cpu-map {
+   cluster0 {
+   core0 {
+   cpu = <&cpu_l0>;
+   };
+   core1 {
+   cpu = <&cpu_l1>;
+   };
+   core2 {
+   cpu = <&cpu_l2>;
+   };
+   core3 {
+   cpu = <&cpu_l3>;
+   };
+   };
+
+   cluster1 {
+   core0 {
+   cpu = <&cpu_b0>;
+   };
+   core1 {
+   cpu = <&cpu_b1>;
+   };
+   };
+   };
+
+   cpu_l0: cpu@0 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a53", "arm,armv8";
+

[PATCH v4 3/4] Documentation: devicetree: rockchip: Document rk3399-evb

2016-04-27 Thread Jianqun Xu

Use "rockchip,rk3399-evb" compatible string for Rockchip RK3399
evaluation board.

Acked-by: Rob Herring 
Signed-off-by: Jianqun Xu 
---
changes in v4:
- none

changes in v3:
- modify title (Rob)

 Documentation/devicetree/bindings/arm/rockchip.txt | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt 
b/Documentation/devicetree/bindings/arm/rockchip.txt
index 2549519..6491b56 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -101,4 +101,8 @@ Rockchip platforms device tree bindings
 
 - Rockchip RK3228 Evaluation board:
 Required root node properties:
-  - compatible = "rockchip,rk3228-evb", "rockchip,rk3228";
+ - compatible = "rockchip,rk3228-evb", "rockchip,rk3228";
+
+- Rockchip RK3399 evb:
+Required root node properties:
+  - compatible = "rockchip,rk3399-evb", "rockchip,rk3399";
-- 
1.9.1

Re: [RESEND PATCH v2 6/7] usb: xhci: plat: add generic PHY support

2016-04-27 Thread Heikki Krogerus

Hi,

On Tue, Apr 26, 2016 at 08:57:39PM +0800, Jisheng Zhang wrote:
> @@ -232,22 +265,44 @@ static int xhci_plat_probe(struct platform_device *pdev)
>   if (HCC_MAX_PSA(xhci->hcc_params) >= 4)
>   xhci->shared_hcd->can_do_streams = 1;
>  
> + hcd->phy = devm_phy_get(&pdev->dev, "usb2-phy");
> + if (IS_ERR(hcd->phy)) {
> + ret = PTR_ERR(hcd->phy);
> + if (ret == -EPROBE_DEFER)
> + goto put_usb3_hcd;
> + hcd->phy = NULL;
> + }
> +
> + phy = devm_phy_get(&pdev->dev, "usb-phy");

"usb-phy" for what I understand is the USB3 PHY right?

I was unable to find any definition for the phy names, for example in
Documentation/devicetree/bindings/usb/usb-xhci.txt, so I would say
this needs to be "usb3-phy" and the phy names need to be defined
somewhere.

Thanks,

-- 
heikki

[PATCH v4 0/4] ARM64: dts: rockchip: add support for RK3399

2016-04-27 Thread Jianqun Xu

Add dtsi file for RK3399 SoCs, and evb dts file for RK3399 evb.

To make the patch easier to review, some nodes have been removed
temporarily; after this base file has been applied, more patches will be
upstreamed independently.

Jianqun Xu (3):
  ARM64: dts: rockchip: add core dtsi file for RK3399 SoCs
  Documentation: devicetree: rockchip: Document rk3399-evb
  ARM64: dts: rockchip: add dts file for RK3399 evaluation board

Shawn Lin (1):
  Documentation: rockchip-dw-mshc: add description for rk3399

 Documentation/devicetree/bindings/arm/rockchip.txt |6 +-
 .../devicetree/bindings/mmc/rockchip-dw-mshc.txt   |1 +
 arch/arm64/boot/dts/rockchip/Makefile  |1 +
 arch/arm64/boot/dts/rockchip/rk3399-evb.dts|  124 +++
 arch/arm64/boot/dts/rockchip/rk3399.dtsi   | 1022 
 5 files changed, 1153 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-evb.dts
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399.dtsi

--
changes in v4:
- add google,rk3399evb-rev2 compatible (Doug, Heiko)

changes in v3:
- modify title for patch 3/4 (Brian)
- add more compatible for evb dts (Doug)

changes in v2:
- split into more patches. (Heiko)
- remove arm-pmu at first. (Marc, Heiko, Mark)
- remove rga, emmc, usb3, mipi, edp, pd, i2c, gpu, thermal, tsadc, saradc, 
which will upstream independently
- remove rk808 since without i2c, which will upstream independently
- remove es8316 since without i2c, which will upstream independently
- add rockchip-dw-mshc binding patch
- add rockchip,rk3399-evb binding patch
- fix codingstyle issues

1.9.1

Re: [PATCH V2 1/2] devicetree/bindings: Add binding for operator panel on FSP machines

2016-04-27 Thread Suraj Jitindar Singh

On 27/04/16 15:03, Stewart Smith wrote:
> Suraj Jitindar Singh  writes:
>> Add a binding to Documentation/devicetree/bindings/powerpc/opal
>> (oppanel-opal.txt) for the operator panel which is present on IBM
>> pseries machines with FSPs.
> It's not pseries (as that implies PowerVM / PAPR) - while here we're all
> about OPAL.

Thanks, will fix that up.

> With a slight change to the commit message,
> Acked-by: Stewart Smith 
>
>

Re: [-mmots 2016-04-25] hugetlb: error: ‘cpu_has_pse’ undeclared

2016-04-27 Thread Ingo Molnar


* Andrew Morton  wrote:

> On Tue, 26 Apr 2016 23:14:35 +0900 Sergey Senozhatsky 
>  wrote:
> 
> > Hello,
> > 
> > v4.6-rc5-mmots-2016-04-25-17-33
> > 
> > 
> > In file included from include/linux/hugetlb.h:418:0,
> >  from fs/hugetlbfs/inode.c:28:
> > fs/hugetlbfs/inode.c: In function 'init_hugetlbfs_fs':
> > ./arch/x86/include/asm/hugetlb.h:7:31: error: 'cpu_has_pse' undeclared 
> > (first use in this function)
> >  #define hugepages_supported() cpu_has_pse
> >^
> 
> hm, how did that happen.  I had some issues with cpu_has_pse a number
> of days ago but they later went away.
> 
> In my current tree I have, in arch/x86/include/asm/hugetlb.h:
> 
> #define hugepages_supported() boot_cpu_has(X86_FEATURE_PSE)
> 
> and that came in from linux-next.patch.  I wonder why your tree is
> different.
> 
> 
> 
> OK, http://ozlabs.org/~akpm/mmots/broken-out/linux-next.patch has no
> changes to arch/x86/include/asm/hugetlb.h at all.  Maybe I fat-fingered
> something.  Odd.

So I think the reason is that cpu_has_pse is gone from the x86 devel tree,
please use this instead:

  boot_cpu_has(X86_FEATURE_PSE)

Thanks,

Ingo

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-27 Thread Michal Hocko

On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > THIS PATCH IS FOR TESTING ONLY AND NOT MEANT TO HIT LINUS TREE
> > 
> > It is desirable to reduce the direct GFP_NO{FS,IO} usage at minimum and
> > prefer scope usage defined by memalloc_no{fs,io}_{save,restore} API.
> > 
> > Let's help this process and add a debugging tool to catch when an
> > explicit allocation request for GFP_NO{FS,IO} is done from the scope
> > context. The printed stacktrace should help to identify the caller
> > and evaluate whether it can be changed to use a wider context or whether
> > it is called from another potentially dangerous context which needs
> > a scope protection as well.
> 
> You're going to get a large number of these from XFS. There are call
> paths in XFs that get called both inside and outside transaction
> context, and many of them are marked with GFP_NOFS to prevent issues
> that have cropped up in the past.
> 
> Often these are to silence lockdep warnings (e.g. commit b17cb36
> ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> lockdep gets very unhappy about the same functions being called with
> different reclaim contexts. e.g.  directory block mapping might
> occur from readdir (no transaction context) or within transactions
> (create/unlink). hence paths like this are tagged with GFP_NOFS to
> stop lockdep emitting false positive warnings

I would much rather see lockdep being fixed than abusing GFP_NOFS to
work around its limitations. GFP_NOFS has real consequences for
memory reclaim. I will go and check the commit you mentioned and try
to understand why that is a problem. From what you described above,
I would like to get rid of exactly this kind of usage, which is not
really needed for the recursion protection.

Thanks!
-- 
Michal Hocko
SUSE Labs

Re: Double-Fetch bug in Linux-4.5/drivers/scsi/aacraid/commctrl.c

2016-04-27 Thread Dan Carpenter

On Wed, Apr 27, 2016 at 07:42:04AM +0200, Julia Lawall wrote:
> 
> 
> On Tue, 26 Apr 2016, Kees Cook wrote:
> 
> > On Mon, Apr 25, 2016 at 7:50 AM, Pengfei Wang  
> > wrote:
> > > Hello,
> > >
> > > I found this Double-Fetch bug in Linux-4.5/drivers/scsi/aacraid/commctrl.c
> > > when I was examining the source code.
> > 
> > Thanks for these reports! I wrote a coccinelle script to find these,
> > but it requires some manual checking. For what it's worth, it found
> > your report as well:
> > 
> > ./drivers/scsi/aacraid/commctrl.c:116:5-19: potentially dangerous
> > second copy_from_user()
> > 
> > So I should probably get this added to the coccicheck run... Maybe it
> > can get some clean up from Julia. :)
> 
> I looked a bit at the results, and didn't see anything obvious.  What is 
> the problem, exactly, and what would be a characteristic of a false 
> positive?
> 


copy_from_user(dest, src, sizeof(dest));

if (dest.extra > MAX_SIZE)
return -EINVAL;

copy_from_user(dest, src, sizeof(dest) + dest.extra);

for (i = 0; i < dest.extra; i++) {
dest.foo[i] = xxx;


We get dest.extra from the user, we verify the size, then we copy more
data from the user, but that overwrites dest.extra again.  We use
dest.extra a second time without checking that it's still <= MAX_SIZE.

regards,
dan carpenter
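
One common way to avoid the pattern described above is to fetch the
length once, validate it, and then fetch only the payload, so the
validated value is never overwritten by a second copy. The following is
a minimal userspace sketch of that idea, not the aacraid fix itself.
struct req, MAX_EXTRA, fake_copy_from_user() and fetch_request() are
hypothetical names; fake_copy_from_user() is a plain memcpy() stand-in
for the kernel's copy_from_user().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_EXTRA 16

/* Hypothetical request layout: a user-controlled length plus payload. */
struct req {
	unsigned int extra;              /* number of valid bytes in foo[] */
	unsigned char foo[MAX_EXTRA];
};

/* Stand-in for copy_from_user(); returns 0 on success. */
static int fake_copy_from_user(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/*
 * Fetch the length once, validate it, then fetch only the payload.
 * The second copy skips the header, so the checked length cannot change.
 */
static int fetch_request(struct req *dest, const unsigned char *user_buf)
{
	unsigned int extra;

	if (fake_copy_from_user(&extra, user_buf, sizeof(extra)))
		return -EFAULT;
	if (extra > MAX_EXTRA)
		return -EINVAL;

	dest->extra = extra;
	if (fake_copy_from_user(dest->foo,
				user_buf + offsetof(struct req, foo), extra))
		return -EFAULT;
	return 0;
}
```

Every later use of dest->extra then refers to the value that passed the
MAX_EXTRA check, whatever the user does to the buffer in the meantime.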

Re: [PATCH] perf/x86/amd: Explicitly define PERF_COUNT_HW_REF_CPU_CYCLES as undefined.

2016-04-27 Thread Ingo Molnar


* Adam Borowski  wrote:

> filter_events() relies on the value of 0 to remove events that are not
> applicable, like this one.
> 
> UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
> index 9 is out of range for type 'u64 [9]'
> UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
> load of address 81c021c8 with insufficient space
> for an object of type 'const u64'
> 
> Signed-off-by: Adam Borowski 
> ---
>  arch/x86/events/amd/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index 86a9bec..5fa1b8e 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -125,6 +125,7 @@ static const u64 amd_perfmon_event_map[] =
>[PERF_COUNT_HW_BRANCH_MISSES]  = 0x00c3,
>[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]= 0x00d0, /* "Decoder empty" 
> event */
>[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x00d1, /* "Dispatch stalls" 
> event */
> +  [PERF_COUNT_HW_REF_CPU_CYCLES] =  0,
>  };

Hm, I think it would be cleaner and more robust to change this (and all other 
similar, if any) arrays to [PERF_COUNT_HW_MAX] instead.

Thanks,

Ingo
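
The suggestion above relies on a C rule: when an array is declared with
an explicit size and designated initializers, every element without an
initializer is zero-filled. A minimal sketch of that behaviour follows.
The demo_* enum, table and lookup function are hypothetical stand-ins,
not the kernel's definitions, and the event codes are only illustrative.

```c
#include <stdint.h>

/* Hypothetical, shortened stand-in for the perf hardware event ids. */
enum demo_hw_id {
	DEMO_HW_CPU_CYCLES,
	DEMO_HW_INSTRUCTIONS,
	DEMO_HW_BRANCH_MISSES,
	DEMO_HW_REF_CPU_CYCLES,	/* deliberately has no entry below */
	DEMO_HW_MAX,
};

/*
 * Sized by DEMO_HW_MAX: the table always covers every id, and ids
 * without an initializer read back as 0.
 */
static const uint64_t demo_event_map[DEMO_HW_MAX] = {
	[DEMO_HW_CPU_CYCLES]	= 0x0076,
	[DEMO_HW_INSTRUCTIONS]	= 0x00c0,
	[DEMO_HW_BRANCH_MISSES]	= 0x00c3,
};

/* Returns 0 for ids that are out of range or not mapped. */
static uint64_t demo_event_lookup(unsigned int id)
{
	if (id >= DEMO_HW_MAX)
		return 0;
	return demo_event_map[id];
}
```

With an unsized array the length is set by the highest designated index,
so an id added later to the enum would index past the end of the table.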

Re: [PATCH] ARM: dts: at91: sama5d2: add slow clock to watchdog node

2016-04-27 Thread Alexandre Belloni

On 26/04/2016 at 15:11:47 +0200, Nicolas Ferre wrote :
> As the watchdog timer needs the slow clock, add it to the currently defined
> wdt node.
> 
> Reported-by: Alexandre Belloni 
> Signed-off-by: Nicolas Ferre 
Acked-by: Alexandre Belloni 

> ---
>  arch/arm/boot/dts/sama5d2.dtsi | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/boot/dts/sama5d2.dtsi b/arch/arm/boot/dts/sama5d2.dtsi
> index 43015c2ad395..84eed47ffb0a 100644
> --- a/arch/arm/boot/dts/sama5d2.dtsi
> +++ b/arch/arm/boot/dts/sama5d2.dtsi
> @@ -1050,6 +1050,7 @@
>   compatible = "atmel,sama5d4-wdt";
>   reg = <0xf8048040 0x10>;
>   interrupts = <4 IRQ_TYPE_LEVEL_HIGH 7>;
> + clocks = <&clk32k>;
>   status = "disabled";
>   };
>  
> -- 
> 2.1.3
> 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

Re: [PATCH v6 2/4] dmaengine: dw: revisit data_width property

2016-04-27 Thread Vinod Koul

On Wed, Apr 27, 2016 at 12:13:17PM +0530, Viresh Kumar wrote:
> On 25-04-16, 15:35, Andy Shevchenko wrote:
> > There are several changes are done here:
> > 
> > - Convert the property to be in bytes
> > 
> > Besides this is common practice for such property the use of a value in 
> > bytes
> > much more convenient than handling the encoded value.
> > 
> > - Rename data_width to data-width in the device tree bindings
> > 
> > - While here, replace dwc_fast_ffs() by __ffs()
> > 
> > The change leaves the support for old format as well just in case someone 
> > will
> > use a newer kernel with an old device tree blob.
> > 
> > Signed-off-by: Andy Shevchenko 
> > ---
> >  Documentation/devicetree/bindings/dma/snps-dma.txt |  6 ++--
> >  arch/arc/boot/dts/abilis_tb10x.dtsi|  2 +-
> >  arch/arm/boot/dts/spear13xx.dtsi   |  4 +--
> >  drivers/dma/dw/core.c  | 42 
> > ++
> >  drivers/dma/dw/platform.c  |  5 ++-
> >  include/linux/platform_data/dma-dw.h   |  2 +-
> >  6 files changed, 21 insertions(+), 40 deletions(-)
> > 
> > diff --git a/Documentation/devicetree/bindings/dma/snps-dma.txt 
> > b/Documentation/devicetree/bindings/dma/snps-dma.txt
> > index c99c1ff..544b9b9 100644
> > --- a/Documentation/devicetree/bindings/dma/snps-dma.txt
> > +++ b/Documentation/devicetree/bindings/dma/snps-dma.txt
> > @@ -13,8 +13,8 @@ Required properties:
> >  - chan_priority: priority of channels. 0 (default): increase from chan 
> > 0->n, 1:
> >increase from chan n->0
> >  - block_size: Maximum block size supported by the controller
> > -- data_width: Maximum data width supported by hardware per AHB master
> > -  (0 - 8bits, 1 - 16bits, ..., 5 - 256bits)
> > +- data-width: Maximum data width supported by hardware per AHB master
> > +  (in bytes, power of 2)
> >  
> >  
> >  Optional properties:
> > @@ -38,7 +38,7 @@ Example:
> > chan_allocation_order = <1>;
> > chan_priority = <1>;
> > block_size = <0xfff>;
> > -   data_width = <3 3>;
> > +   data-width = <8 8>;
> > };
> 
> You broke backward compatibility with earlier DTs.
> 
> What's backward compatibility ?
> 
> Consider that the DT from an earlier version of kernel is part of the bootrom 
> of
> a SoC. Now that bootrom should work just fine with any new kernel version. 
> i.e.
> old DT + new kernels should always work.

And this is not fixed yet!

-- 
~Vinod
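
For reference, the relationship between the two encodings discussed in
this thread can be sketched as below. The old data_width value is an
exponent (0 = 8 bits, ..., 5 = 256 bits) and the new data-width value is
a byte count that must be a power of two. Small values such as 1, 2 and
4 are valid in both encodings, so this sketch assumes the caller already
knows which property name was found. demo_data_width_bytes() and its
legacy flag are hypothetical, not part of the dw driver.

```c
#include <stdbool.h>

/*
 * Convert a data width property value to bytes.
 *   legacy == true:  old encoding, 0 = 8 bits, 1 = 16 bits, ..., 5 = 256 bits
 *   legacy == false: value is already in bytes and must be a power of two
 * Returns 0 for a value that is not valid in the chosen encoding.
 */
static unsigned int demo_data_width_bytes(unsigned int value, bool legacy)
{
	if (legacy)
		return value <= 5 ? 1u << value : 0;

	/* 256 bits is the widest bus in the old table, i.e. 32 bytes. */
	if (value == 0 || value > 32 || (value & (value - 1)))
		return 0;
	return value;
}
```

This matches the binding example in the patch, where the old
data_width = <3 3> becomes data-width = <8 8>.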

Re: Double-Fetch bug in Linux-4.5/drivers/scsi/aacraid/commctrl.c

2016-04-27 Thread Julia Lawall



On Wed, 27 Apr 2016, Dan Carpenter wrote:

> On Wed, Apr 27, 2016 at 07:42:04AM +0200, Julia Lawall wrote:
> >
> >
> > On Tue, 26 Apr 2016, Kees Cook wrote:
> >
> > > On Mon, Apr 25, 2016 at 7:50 AM, Pengfei Wang  
> > > wrote:
> > > > Hello,
> > > >
> > > > I found this Double-Fetch bug in 
> > > > Linux-4.5/drivers/scsi/aacraid/commctrl.c
> > > > when I was examining the source code.
> > >
> > > Thanks for these reports! I wrote a coccinelle script to find these,
> > > but it requires some manual checking. For what it's worth, it found
> > > your report as well:
> > >
> > > ./drivers/scsi/aacraid/commctrl.c:116:5-19: potentially dangerous
> > > second copy_from_user()
> > >
> > > So I should probably get this added to the coccicheck run... Maybe it
> > > can get some clean up from Julia. :)
> >
> > I looked a bit at the results, and didn't see anything obvious.  What is
> > the problem, exactly, and what would be a characteristic of a false
> > positive?
> >
>
>
>   copy_from_user(dest, src, sizeof(dest));
>
>   if (dest.extra > MAX_SIZE)
>   return -EINVAL;
>
>   copy_from_user(dest, src, sizeof(dest) + dest.extra);
>
>   for (i = 0; i < dest.extra; i++) {
>   dest.foo[i] = xxx;
>
>
> We get dest.extra from the user, we verify the size, then we copy more
> data from the user but that over writes dest.extra again.  We use
> dest.extra a second time without checking that it's still <= MAX_SIZE.

OK, so the problem is when data that was checked on the first copy is used
after the second copy?  It would probably be possible to get rid of a lot
of false positives with that.

thanks,
julia

Re: [PATCH] ARM: at91/defconfig: add the HDMA controller to sama5_defconfig

2016-04-27 Thread Alexandre Belloni

On 26/04/2016 at 15:47:11 +0200, Nicolas Ferre wrote :
> Selection of the HDMAC option is now needed to allow some sama5 devices
> to have the DMA driver compiled.
> 
> Signed-off-by: Nicolas Ferre 
Acked-by: Alexandre Belloni 

> ---
>  arch/arm/configs/sama5_defconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/configs/sama5_defconfig 
> b/arch/arm/configs/sama5_defconfig
> index afbda413d61a..e002b3bba5b4 100644
> --- a/arch/arm/configs/sama5_defconfig
> +++ b/arch/arm/configs/sama5_defconfig
> @@ -183,6 +183,7 @@ CONFIG_LEDS_TRIGGER_GPIO=y
>  CONFIG_RTC_CLASS=y
>  CONFIG_RTC_DRV_AT91RM9200=y
>  CONFIG_DMADEVICES=y
> +CONFIG_AT_HDMAC=y
>  CONFIG_AT_XDMAC=y
>  # CONFIG_IOMMU_SUPPORT is not set
>  CONFIG_IIO=y
> -- 
> 2.1.3
> 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[patch V2] lib: GCD: add binary GCD algorithm

2016-04-27 Thread zengzhaoxiu

From: Zhaoxiu Zeng 

Because some architectures (alpha, armv6, etc.) don't provide hardware
division, the mod operation is slow there. The binary GCD algorithm uses
only simple arithmetic operations: it replaces division with shifts,
comparisons, and subtraction.

I have compiled successfully with x86_64_defconfig and i386_defconfig.

I use the following code to test:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define swap(a, b) \
	do { \
		a ^= b; \
		b ^= a; \
		a ^= b; \
	} while (0)

/* Reference: Euclid's algorithm using the mod operation. */
unsigned long gcd0(unsigned long a, unsigned long b)
{
	unsigned long r;

	if (a < b) {
		swap(a, b);
	}

	if (b == 0)
		return a;

	while ((r = a % b) != 0) {
		a = b;
		b = r;
	}

	return b;
}

/* Binary GCD using count-trailing-zeros. */
unsigned long gcd1(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	b >>= __builtin_ctzl(b);
	for (;;) {
		a >>= __builtin_ctzl(a);
		if (a == b)
			break;

		if (a < b)
			swap(a, b);
		a -= b;
	}

	b <<= __builtin_ctzl(r);
	return b;
}

/* Binary GCD without a count-trailing-zeros instruction. */
unsigned long gcd2(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	/* Mask covering the lowest set bit of (a | b) and the bits below it. */
	r ^= (r - 1);

	while (!(b & r))
		b >>= 1;

	for (;;) {
		while (!(a & r))
			a >>= 1;
		if (a == b)
			break;

		if (a < b)
			swap(a, b);

		a -= b;
		a >>= 1;
		if (a & r)
			a += b;
	}

	return b;
}

static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
	gcd0, gcd1, gcd2,
};

#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))
#define TEST_TIMES 100
static unsigned long res[TEST_ENTRIES][TEST_TIMES];

#define TIME_T struct timespec

static inline struct timespec read_time(void)
{
	struct timespec time;
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
	return time;
}

/* Elapsed time between start and end, in nanoseconds. */
static uint64_t diff_time(struct timespec start, struct timespec end)
{
	struct timespec temp;

	if ((end.tv_nsec - start.tv_nsec) < 0) {
		/* Borrow one second (1000000000 ns) from the seconds field. */
		temp.tv_sec = end.tv_sec - start.tv_sec - 1;
		temp.tv_nsec = 1000000000UL + end.tv_nsec - start.tv_nsec;
	} else {
		temp.tv_sec = end.tv_sec - start.tv_sec;
		temp.tv_nsec = end.tv_nsec - start.tv_nsec;
	}

	return (uint64_t)temp.tv_sec * 1000000000UL + temp.tv_nsec;
}

static inline unsigned long get_rand(void)
{
	if (sizeof(long) == 8)
		return (unsigned long)rand() << 32 | rand();
	else
		return rand();
}

int main(int argc, char **argv)
{
	unsigned int seed = time(0);
	unsigned int i, j;
	TIME_T start, end;

	/* Time each implementation on the same pseudo-random inputs. */
	for (i = 0; i < TEST_ENTRIES; i++) {
		srand(seed);
		start = read_time();
		for (j = 0; j < TEST_TIMES; j++)
			res[i][j] = gcd_func[i](get_rand(), get_rand());
		end = read_time();
		printf("gcd%u: elapsed %llu\n", i,
		       (unsigned long long)diff_time(start, end));
		sleep(1);
	}

	/* Check that every implementation agrees with gcd0. */
	for (j = 0; j < TEST_TIMES; j++) {
		for (i = 1; i < TEST_ENTRIES; i++) {
			if (res[i][j] != res[0][j])
				break;
		}
		if (i < TEST_ENTRIES) {
			fprintf(stderr, "Error: ");
			for (i = 0; i < TEST_ENTRIES; i++)
				fprintf(stderr, "res%u %lu%s", i, res[i][j],
					i < TEST_ENTRIES - 1 ? ", " : "\n");
		}
	}

	return 0;
}

Compiled with "-O2", on "VirtualBox 4.

Re: [PATCH] ARM: at91/defconfig: add HLCDC driver to sama5_defconfig

2016-04-27 Thread Alexandre Belloni

On 26/04/2016 at 17:01:27 +0200, Nicolas Ferre wrote :
> Add the LCD DRM driver with all its dependencies:
> - the MFD driver
> - the backlight PWM
> - the simple panel driver
> 
> Remove the CONFIG_FB as it is not needed on any sama5 device.
> 
> Signed-off-by: Nicolas Ferre 
Acked-by: Alexandre Belloni 

> ---
>  arch/arm/configs/sama5_defconfig | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/configs/sama5_defconfig 
> b/arch/arm/configs/sama5_defconfig
> index e002b3bba5b4..f5a3966f6793 100644
> --- a/arch/arm/configs/sama5_defconfig
> +++ b/arch/arm/configs/sama5_defconfig
> @@ -133,6 +133,7 @@ CONFIG_WATCHDOG=y
>  CONFIG_AT91SAM9X_WATCHDOG=y
>  CONFIG_SAMA5D4_WATCHDOG=y
>  CONFIG_MFD_ATMEL_FLEXCOM=y
> +CONFIG_MFD_ATMEL_HLCDC=y
>  CONFIG_REGULATOR=y
>  CONFIG_REGULATOR_FIXED_VOLTAGE=y
>  CONFIG_REGULATOR_ACT8865=y
> @@ -142,11 +143,14 @@ CONFIG_V4L_PLATFORM_DRIVERS=y
>  CONFIG_SOC_CAMERA=y
>  CONFIG_VIDEO_ATMEL_ISI=y
>  CONFIG_SOC_CAMERA_OV2640=y
> -CONFIG_FB=y
> +CONFIG_DRM=y
> +CONFIG_DRM_ATMEL_HLCDC=y
> +CONFIG_DRM_PANEL_SIMPLE=y
>  CONFIG_BACKLIGHT_LCD_SUPPORT=y
> -# CONFIG_LCD_CLASS_DEVICE is not set
> +CONFIG_LCD_CLASS_DEVICE=y
>  CONFIG_BACKLIGHT_CLASS_DEVICE=y
>  # CONFIG_BACKLIGHT_GENERIC is not set
> +CONFIG_BACKLIGHT_PWM=y
>  CONFIG_FRAMEBUFFER_CONSOLE=y
>  CONFIG_SOUND=y
>  CONFIG_SND=y
> @@ -191,6 +195,7 @@ CONFIG_AT91_ADC=y
>  CONFIG_AT91_SAMA5D2_ADC=y
>  CONFIG_PWM=y
>  CONFIG_PWM_ATMEL=y
> +CONFIG_PWM_ATMEL_HLCDC_PWM=y
>  CONFIG_PWM_ATMEL_TCB=y
>  CONFIG_EXT4_FS=y
>  CONFIG_FANOTIFY=y
> -- 
> 2.1.3
> 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[patch] pinctrl: baytrail: fix some error handling in debugfs

2016-04-27 Thread Dan Carpenter

We need to unlock before continuing.  Also the continue was accidentally
left out on one error path which would lead to a NULL dereference.

Fixes: 86e3ef812fe3 ('pinctrl: baytrail: Update gpio chip operations')
Signed-off-by: Dan Carpenter 

diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c 
b/drivers/pinctrl/intel/pinctrl-baytrail.c
index 6dcf43a..55182fc 100644
--- a/drivers/pinctrl/intel/pinctrl-baytrail.c
+++ b/drivers/pinctrl/intel/pinctrl-baytrail.c
@@ -1390,6 +1390,7 @@ static void byt_gpio_dbg_show(struct seq_file *s, struct 
gpio_chip *chip)
seq_printf(s,
   "Could not retrieve pin %i conf0 reg\n",
   pin);
+   raw_spin_unlock_irqrestore(&vg->lock, flags);
continue;
}
conf0 = readl(reg);
@@ -1398,6 +1399,8 @@ static void byt_gpio_dbg_show(struct seq_file *s, struct 
gpio_chip *chip)
if (!reg) {
seq_printf(s,
   "Could not retrieve pin %i val reg\n", pin);
+   raw_spin_unlock_irqrestore(&vg->lock, flags);
+   continue;
}
val = readl(reg);
raw_spin_unlock_irqrestore(&vg->lock, flags);
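
The rule this patch enforces (every path that leaves the loop body early
must drop the lock first, and must skip the dereference) can be shown
with a small userspace sketch. Everything here is hypothetical:
demo_lock()/demo_unlock() only count nesting depth and stand in for
raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore(), and a NULL entry
stands in for a register that could not be retrieved.

```c
#include <stddef.h>

/* Nesting depth of the pretend lock; 0 means unlocked. */
static int demo_lock_depth;

static void demo_lock(void)   { demo_lock_depth++; }
static void demo_unlock(void) { demo_lock_depth--; }

/*
 * Sum the values of all registers that are present. A NULL entry is
 * skipped, but only after the lock taken for this pin was released.
 */
static unsigned int demo_dump_pins(const unsigned int *const regs[],
				   size_t npins)
{
	unsigned int sum = 0;
	size_t pin;

	for (pin = 0; pin < npins; pin++) {
		unsigned int val;

		demo_lock();
		if (!regs[pin]) {
			demo_unlock();	/* drop the lock before skipping */
			continue;	/* never dereference a missing register */
		}
		val = *regs[pin];
		demo_unlock();

		sum += val;
	}

	return sum;
}
```

If the unlock before continue were missing, demo_lock_depth would stay
above zero after the loop, which is the userspace analogue of leaving
the spinlock held.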

Re: zram: per-cpu compression streams

2016-04-27 Thread Sergey Senozhatsky

On (04/27/16 16:55), Minchan Kim wrote:
[..]
> > > Could you test concurrent mem hogger with fio rather than pre-fault 
> > > before fio test
> > > in next submit?
> > 
> > this test will not prove anything, unfortunately. I performed it;
> > and it's impossible to guarantee even remotely stable results.
> > mem-hogger process can spend on pre-fault from 41 to 81 seconds;
> > so I'm quite sceptical about the actual value of this test.
> > 
> > > > considering buffer_compress_percentage=11, the box was under somewhat
> > > > heavy pressure.
> > > > 
> > > > now, the results
> > > 
> > > Yeb, Even, recompression case is fater than old but want to see more 
> > > heavy memory
> > > pressure case and the ratio I mentioned above.
> > 
> > I did quite heavy testing over the last 7 days, with numerous OOM kills
> > and OOM panics.
> 
> Okay, I think it's worth to merge enough and see the result.
> Please send formal patch which has recompression stat. ;-)


correction: those 41-81s spikes in mem-hogger were observed under
different scenario: 10GB zram with 6GB mem-hogger on a 4GB system.

I'll do another round of tests (with parallel mem-hogger pre-fault
and 4GB/4GB zram/mem-hogger split) and collect the number that you
asked for.

thanks!

-ss

Re: [PATCH 1/1] simplified security.nscapability xattr

2016-04-27 Thread Jann Horn

On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote:
> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn  wrote:
> > Quoting Kees Cook (keesc...@chromium.org):
> >> On Fri, Apr 22, 2016 at 10:26 AM,   wrote:
> >> > From: Serge Hallyn 
> >> >
> >> > This can only be set by root in his own namespace, and will
> >> > only be respected by namespaces with that same root kuid
> >> > mapped as root, or namespaces descended from it.
> >> >
> >> > This allows a simple setxattr to work, allows tar/untar to
> >> > work, and allows us to tar in one namespace and untar in
> >> > another while preserving the capability, without risking
> >> > leaking privilege into a parent namespace.
> >>
> >> The concept seems sensible to me. Various notes below...
> >>
> >> >
> >> > Signed-off-by: Serge Hallyn 
> >> > ---
> >> >  include/linux/capability.h  |5 ++-
> >> >  include/uapi/linux/capability.h |   18 
> >> >  include/uapi/linux/xattr.h  |3 ++
> >> >  security/commoncap.c|   91 
> >> > +--
> >> >  4 files changed, 112 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/include/linux/capability.h b/include/linux/capability.h
> >> > index 00690ff..cf533ff 100644
> >> > --- a/include/linux/capability.h
> >> > +++ b/include/linux/capability.h
> >> > @@ -13,7 +13,7 @@
> >> >  #define _LINUX_CAPABILITY_H
> >> >
> >> >  #include 
> >> > -
> >> > +#include 
> >> >
> >> >  #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
> >> >  #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3
> >> > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data {
> >> > kernel_cap_t inheritable;
> >> >  };
> >> >
> >> > +#define NS_CAPS_VERSION(x) (x & 0xFF)
> >> > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF)
> >> > +
> >> >  #define _USER_CAP_HEADER_SIZE  (sizeof(struct __user_cap_header_struct))
> >> >  #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t))
> >> >
> >> > diff --git a/include/uapi/linux/capability.h 
> >> > b/include/uapi/linux/capability.h
> >> > index 12c37a1..f0b4a66 100644
> >> > --- a/include/uapi/linux/capability.h
> >> > +++ b/include/uapi/linux/capability.h
> >> > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct {
> >> >  #define VFS_CAP_U32_2   2
> >> >  #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2))
> >> >
> >> > +/* version number for security.nscapability xattrs hdr->hdr_info */
> >> > +#define VFS_NS_CAP_REVISION 1
> >> > +
> >> >  #define XATTR_CAPS_SZ   XATTR_CAPS_SZ_2
> >> >  #define VFS_CAP_U32 VFS_CAP_U32_2
> >> >  #define VFS_CAP_REVISION   VFS_CAP_REVISION_2
> >> > @@ -74,6 +77,21 @@ struct vfs_cap_data {
> >> > } data[VFS_CAP_U32];
> >> >  };
> >> >
> >> > +#define VFS_NS_CAP_EFFECTIVE0x1
> >> > +/*
> >> > + * 32-bit hdr_info contains
> >> > + * 16 leftmost: reserved
> >> > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far)
> >> > + * last 8: version
> >> > + */
> >> > +struct vfs_ns_cap_data {
> >> > +   __le32 magic_etc;
> >> > +   struct {
> >> > +   __le32 permitted;/* Little endian */
> >> > +   __le32 inheritable;  /* Little endian */
> >> > +   } data[VFS_CAP_U32];
> >> > +};
> >>
> >> This is identical to vfs_cap_data. Is there a reason not to re-use the
> >> existing one?
> >
> > Hm.  I used to have them completely different.  ATM the only difference
> > is what goes into the magic_etc, and that not very (different).  So
> > yeah it probably should be re-used.
> >
> >> > +
> >> >  #ifndef __KERNEL__
> >> >
> >> >  /*
> >> > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h
> >> > index 1590c49..67c80ab 100644
> >> > --- a/include/uapi/linux/xattr.h
> >> > +++ b/include/uapi/linux/xattr.h
> >> > @@ -68,6 +68,9 @@
> >> >  #define XATTR_CAPS_SUFFIX "capability"
> >> >  #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX
> >> >
> >> > +#define XATTR_NS_CAPS_SUFFIX "nscapability"
> >> > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX
> >> > +
> >> >  #define XATTR_POSIX_ACL_ACCESS  "posix_acl_access"
> >> >  #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX 
> >> > XATTR_POSIX_ACL_ACCESS
> >> >  #define XATTR_POSIX_ACL_DEFAULT  "posix_acl_default"
> >>
> >> Are these documented anywhere in Documentation/ or in man pages? This
> >> seems like it'd need a man-page update too.
> >
> > Yeah, if we decide we're ok with this strategy I'll update those.
> >
> >> > diff --git a/security/commoncap.c b/security/commoncap.c
> >> > index 48071ed..8f3f34a 100644
> >> > --- a/security/commoncap.c
> >> > +++ b/security/commoncap.c
> >> > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry)
> >> > if (!inode->i_op->getxattr)
> >> >return 0;
> >> >
> >> > +   error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, 
> >> > 0);
> >> > +   if (error > 0)
> >> > +   return 1;
> >> > +
> >>

Re: [PATCH V17 2/3] dmaengine: qcom_hidma: add debugfs hooks

2016-04-27 Thread Vinod Koul

On Tue, Apr 26, 2016 at 12:55:18PM -0400, Sinan Kaya wrote:
> On 4/26/2016 12:25 PM, Vinod Koul wrote:
> > On Tue, Apr 26, 2016 at 08:08:16AM -0400, ok...@codeaurora.org wrote:
> >> On 2016-04-25 23:30, Vinod Koul wrote:
> >>> On Mon, Apr 11, 2016 at 10:21:12AM -0400, Sinan Kaya wrote:
> >>>
>  +static int hidma_chan_stats(struct seq_file *s, void *unused)
>  +{
>  +struct hidma_chan *mchan = s->private;
>  +struct hidma_desc *mdesc;
>  +struct hidma_dev *dmadev = mchan->dmadev;
>  +
>  +pm_runtime_get_sync(dmadev->ddev.dev);
> >>>
> >>> debug shouldn't power up device, why do you want to do that
> >>
> >>
> >> Clocks are turned off while the hw is idle. I can’t reach hw
> >> registers without restoring power.
> > 
> > Hmm, have you thought about using regmap?
> > 
> 
> To be honest, I didn't know what regmap is but I just read some code
> and looked at how it is used. Feel free to correct me if I got it 
> wrong. 
> 
> Regmap seems to be designed for *slow* speed peripherals to improve frequent
> accesses by the SW. It looks like it is used by MFD, SPI and I2C drivers.
> 
> It seems to cache the register contents and flush/invalidate them only when
> needed.
> 
> The MMIO version seems to be assuming the presence of device-tree like CLK
> API which doesn't exist on ACPI systems and is not portable.
> 
> My reaction is that it is a lot of code with no added functionality to what
> HIDMA driver is trying to achieve. 
> 
> Given that the use case here is only for debug purposes; I think it is OK 
> to keep this runtime call here. I don't want to add any overhead into the
> existing code just to support the debug use case.  
> 
> None of my register read/writes are slow. This file will only be used to 
> troubleshoot customer issues.

$ is always faster than MMIO. This way you can give reg contents to users
without waking up hw.

Also we at Intel use regmap on ACPI systems without CLK API

-- 
~Vinod

Re: [PATCH] ARM: at91/defconfig: add PDMIC driver to sama5_defconfig

2016-04-27 Thread Alexandre Belloni

On 26/04/2016 at 17:38:08 +0200, Nicolas Ferre wrote :
> Add Pulse Density Modulation Interface Controller (PDMIC) driver compilation
> for sama5 default configuration. Is used by sama5d2 SoC for instance.
> 
> Signed-off-by: Nicolas Ferre 
Acked-by: Alexandre Belloni 

> ---
>  arch/arm/configs/sama5_defconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/configs/sama5_defconfig 
> b/arch/arm/configs/sama5_defconfig
> index f5a3966f6793..32a082888adf 100644
> --- a/arch/arm/configs/sama5_defconfig
> +++ b/arch/arm/configs/sama5_defconfig
> @@ -158,6 +158,7 @@ CONFIG_SND_SOC=y
>  CONFIG_SND_ATMEL_SOC=y
>  CONFIG_SND_ATMEL_SOC_WM8904=y
>  # CONFIG_HID_GENERIC is not set
> +CONFIG_SND_ATMEL_SOC_PDMIC=y
>  CONFIG_USB=y
>  CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
>  CONFIG_USB_EHCI_HCD=y
> -- 
> 2.1.3
> 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

Re: [BUG linux-next] Kernel panic found with linux-next-20160414

2016-04-27 Thread Hugh Dickins

On Wed, 20 Apr 2016, Shi, Yang wrote:
> On 4/20/2016 1:01 AM, Hugh Dickins wrote:
> > On Tue, 19 Apr 2016, Shi, Yang wrote:
> > > Hi folks,
> > > 
> > > When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the
> > > below
> > > kernel panic:
> > > 
> > > Unable to handle kernel paging request at virtual address
> > > ffc007846000
> > > pgd = ffc01e21d000
> > > [ffc007846000] *pgd=, *pud=
> > > Internal error: Oops: 9647 [#11] PREEMPT SMP
> > > Modules linked in: loop
> > > CPU: 7 PID: 274 Comm: systemd-journal Tainted: G  D
> > > 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9
> > > Hardware name: Freescale Layerscape 2085a RDB Board (DT)
> > > task: ffc01e3fcf80 ti: ffc01ea8c000 task.ti: ffc01ea8c000
> > > PC is at copy_page+0x38/0x120
> > > LR is at migrate_page_copy+0x604/0x1660
> > > pc : [] lr : [] pstate: 2145
> > > sp : ffc01ea8ecd0
> > > x29: ffc01ea8ecd0 x28: 
> > > x27: 1ff7b80240f8 x26: ffc018196f20
> > > x25: ffbdc01e1180 x24: ffbdc01e1180
> > > x23:  x22: ffc01e3fcf80
> > > x21: ffc00481f000 x20: ff900a31d000
> > > x19: ffbdc01207c0 x18: 0f00
> > > x17:  x16: 
> > > x15:  x14: 
> > > x13:  x12: 
> > > x11:  x10: 
> > > x9 :  x8 : 
> > > x7 :  x6 : 
> > > x5 :  x4 : 
> > > x3 :  x2 : 
> > > x1 : ffc00481f080 x0 : ffc007846000
> > > 
> > > Call trace:
> > > Exception stack(0xffc021fc2ed0 to 0xffc021fc2ff0)
> > > 2ec0:   ffbdc00887c0 ff900a31d000
> > > 2ee0: ffc021fc30f0 ff9008ff2318 2145 0025
> > > 2f00: ffbdc025a280 ffc020adc4c0 41b58ab3 ff900a085fd0
> > > 2f20: ff9008200658   ffbdc00887c0
> > > 2f40: ff900b0f1320 ffc021fc3078 41b58ab3 ff900a0864f8
> > > 2f60: ff9008210010 ffc021fb8960 ff900867bacc 1ff8043f712d
> > > 2f80: ffc021fc2fb0 ff9008210564 ffc021fc3070 ffc021fb8940
> > > 2fa0: 08221f78 ff900862f9c8 ffc021fc2fe0 ff9008215dc8
> > > 2fc0: 1ff8043f8602 ffc021fc ffc00968a000 ffc00221f080
> > > 2fe0: f9407e11d1f0 d61f02209103e210
> > > [] copy_page+0x38/0x120
> > > [] migrate_page+0x74/0x98
> > > [] nfs_migrate_page+0x58/0x80
> > > [] move_to_new_page+0x15c/0x4d8
> > > [] migrate_pages+0x7c8/0x11f0
> > > [] compact_zone+0xdfc/0x2570
> > > [] compact_zone_order+0xe0/0x170
> > > [] try_to_compact_pages+0x2e8/0x8f8
> > > [] __alloc_pages_direct_compact+0x100/0x540
> > > [] __alloc_pages_nodemask+0xc40/0x1c58
> > > [] khugepaged+0x468/0x19c8
> > > [] kthread+0x248/0x2c0
> > > [] ret_from_fork+0x10/0x40
> > > Code: d281f012 91020021 f1020252 d503201f (a8000c02)
> > > 
> > > 
> > > I did some initial investigation and found it is caused by
> > > DEBUG_PAGEALLOC
> > > and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works
> > > well.
> > > 
> > > It should be not arch specific although I got it caught on ARM64. I
> > > suspect
> > > this might be caused by Hugh's huge tmpfs patches.
> > 
> > Thanks for testing.  It might be caused by my patches, but I don't think
> > that's very likely.  This is page migraton for compaction, in the service
> > of anon THP's khugepaged; and I wonder if you were even exercising huge
> > tmpfs when running LTP here (it certainly can be done: I like to mount a
> > huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other
> > tmpfs mounts are also huge).
> 
> Some further investigation shows I got the panic even though I don't have
> tmpfs mounted with huge=1 or set shmem_huge to 2.
> 
> > 
> > There are compaction changes in linux-next too, but I don't see any
> > reason why they'd cause this.  I don't know arm64 traces enough to know
> > whether it's the source page or the destination page for the copy, but
> > it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before
> > reaching migration's copy.
> 
> The fault address is passed by x0, which is dest in the implementation of
> copy_page, so it is the destination page.
> 
> > 
> > Needs more debugging, I'm afraid: is it reproducible?
> 
> Yes, as long as I enable those two PAGEALLOC debug options, I can get the
> panic once I run ltp. But, it is not caused any specific ltp test case
> directly, the panic happens randomly during ltp is running.

Your ping on the crash in release_freepages() reminded me to take another
look at this one.  And found that I only needed to enable DEBUG_PAGEALLOC
and run LTP to get it on x86_64 too, as you suspected.

It's another of those compaction errors, in mmotm and linux-next of a
week or two

Re: + mm-zswap-use-workqueue-to-destroy-pool.patch added to -mm tree

2016-04-27 Thread Dan Streetman

On Wed, Apr 27, 2016 at 1:40 AM, Sergey Senozhatsky
 wrote:
> Hello,
>
> On (04/26/16 16:52), a...@linux-foundation.org wrote:
> [..]
>> -static void __zswap_pool_release(struct rcu_head *head)
>> +static void __zswap_pool_release(struct work_struct *work)
>>  {
>> - struct zswap_pool *pool = container_of(head, typeof(*pool), rcu_head);
>> + struct zswap_pool *pool = container_of(work, typeof(*pool), work);
>> +
>> + synchronize_rcu();
>>
>>   /* nobody should have been able to get a kref... */
>>   WARN_ON(kref_get_unless_zero(&pool->kref));
>> @@ -674,7 +676,9 @@ static void __zswap_pool_empty(struct kr
>>   WARN_ON(pool == zswap_pool_current());
>>
>>   list_del_rcu(&pool->list);
>> - call_rcu(&pool->rcu_head, __zswap_pool_release);
>> +
>> + INIT_WORK(&pool->work, __zswap_pool_release);
>> + schedule_work(&pool->work);
>>
>>   spin_unlock(&zswap_pools_lock);
>>  }
>> _
>>
>> Patches currently in -mm which might be from ddstr...@ieee.org are
>>
>> mm-zpool-use-workqueue-for-zpool_destroy.patch
>> mm-zswap-use-workqueue-to-destroy-pool.patch
>
> I think only mm-zswap-use-workqueue-to-destroy-pool.patch is
> needed.

yep, please drop mm-zpool-use-workqueue-for-zpool_destroy.patch

thanks!

>
> -ss

Re: [PATCH 0/6] Intel Secure Guard Extensions

2016-04-27 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> > What new syscalls would be needed for ssh to get all this support?
> 
> This patchset or similar, plus some user code and an enclave to use.
> 
> Sadly, on current CPUs, you also need Intel to bless the enclave.  It looks 
> like 
> new CPUs might relax that requirement.

That looks like a fundamental technical limitation in my book - to an open source
user this is essentially a very similar capability as tboot: it only allows the
execution of externally blessed static binary blobs...

I don't think we can merge any of this upstream until it's clear that the hardware
owner running open-source user-space can also freely define/start his own secure
enclaves without having to sign the enclave with any external party. I.e.
self-signed enclaves should be fundamentally supported as well.

Thanks,

Ingo

Re: [PATCH] serial: mctrl_gpio: Drop support for out1-gpios and out2-gpios

2016-04-27 Thread Richard Genoud

2016-04-22 17:10 GMT+02:00 Geert Uytterhoeven :
> The OUT1 and OUT2 pins present on some legacy UARTs are basically GPIOs.
> It doesn't make much sense to emulate GPIOs using other GPIOs, hence
> drop support for that.
>
> Signed-off-by: Geert Uytterhoeven 
> ---
>  drivers/tty/serial/serial_mctrl_gpio.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/tty/serial/serial_mctrl_gpio.c 
> b/drivers/tty/serial/serial_mctrl_gpio.c
> index 02147361eaa94747..821ffa637eb630cb 100644
> --- a/drivers/tty/serial/serial_mctrl_gpio.c
> +++ b/drivers/tty/serial/serial_mctrl_gpio.c
> @@ -43,8 +43,6 @@ static const struct {
> { "rng", TIOCM_RNG, false, },
> { "rts", TIOCM_RTS, true, },
> { "dtr", TIOCM_DTR, true, },
> -   { "out1", TIOCM_OUT1, true, },
> -   { "out2", TIOCM_OUT2, true, },
>  };
>
>  void mctrl_gpio_set(struct mctrl_gpios *gpios, unsigned int mctrl)
> --
> 1.9.1
>
Maybe I missed something, but I think you want to remove
UART_GPIO_OUT{1,2} also :
diff --git a/drivers/tty/serial/serial_mctrl_gpio.h
b/drivers/tty/serial/serial_mctrl_gpio.h
index 9716db283290..10632e72b89f 100644
--- a/drivers/tty/serial/serial_mctrl_gpio.h
+++ b/drivers/tty/serial/serial_mctrl_gpio.h
@@ -32,8 +32,6 @@ enum mctrl_gpio_idx {
 UART_GPIO_RI = UART_GPIO_RNG,
 UART_GPIO_RTS,
 UART_GPIO_DTR,
-UART_GPIO_OUT1,
-UART_GPIO_OUT2,
 UART_GPIO_MAX,
 };

Richard.

Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression

2016-04-27 Thread Huang, Ying

Michal Hocko  writes:

> On Wed 27-04-16 11:15:56, kernel test robot wrote:
>> FYI, we noticed vm-scalability.throughput -11.8% regression with the 
>> following commit:
>
> Could you be more specific what the test does please?

The sub-testcase of vm-scalability is swap-w-rand.  A RAM-emulated pmem
device is used as a swap device, and a test program allocates and writes
anonymous memory randomly to exercise the page allocation, reclaim, and
swap-in code paths.

Best Regards,
Huang, Ying

Re: [RFC v2 7/8] drm/fence: add fence timeline to drm_crtc

2016-04-27 Thread Daniel Stone

Hi,

On 26 April 2016 at 00:33, Gustavo Padovan  wrote:
> +static inline struct drm_crtc *fence_to_crtc(struct fence *fence)
> +{
> +   if (fence->ops != &drm_crtc_fence_ops)
> +   return NULL;

Since this is (currently) only used before unconditional dereferences,
maybe turn this into a BUG_ON instead of return NULL?
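
A rough userspace sketch of that idea, under stated assumptions: assert()
stands in for BUG_ON(), and the *_model types are hypothetical. How the
fence is really embedded in struct drm_crtc is not shown in the quoted
hunk, so the layout below is only a guess for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct fence_ops, struct fence and struct drm_crtc. */
struct fence_ops_model { const char *name; };
struct fence_model { const struct fence_ops_model *ops; };
struct crtc_model { int index; struct fence_model fence; };

static const struct fence_ops_model crtc_fence_ops_model = { "crtc" };

/* Same pointer arithmetic as the kernel's container_of(), without type checks. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct crtc_model *fence_to_crtc_model(struct fence_model *fence)
{
	/* Trap on a foreign fence instead of handing NULL to a caller that never checks. */
	assert(fence->ops == &crtc_fence_ops_model);
	return container_of_model(fence, struct crtc_model, fence);
}

/* Usage: a fence embedded in a crtc maps back to that same crtc. */
static int fence_roundtrip_ok(void)
{
	struct crtc_model crtc = { 7, { &crtc_fence_ops_model } };

	return fence_to_crtc_model(&crtc.fence) == &crtc;
}
```

The trade-off is that a wrong fence type then stops at the conversion
rather than as a NULL dereference somewhere later.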

Cheers,
Daniel

Re: [PATCH v6 2/4] dmaengine: dw: revisit data_width property

2016-04-27 Thread Andy Shevchenko

On Wed, Apr 27, 2016 at 9:43 AM, Viresh Kumar  wrote:
> On 25-04-16, 15:35, Andy Shevchenko wrote:
>> There are several changes are done here:
>>
>> - Convert the property to be in bytes
>>
>> Besides this is common practice for such property the use of a value in bytes
>> much more convenient than handling the encoded value.
>>
>> - Rename data_width to data-width in the device tree bindings
>>
>> - While here, replace dwc_fast_ffs() by __ffs()
>>
>> The change leaves the support for old format as well just in case someone 
>> will
>> use a newer kernel with an old device tree blob.
>>
>> Signed-off-by: Andy Shevchenko 
>> ---
>>  Documentation/devicetree/bindings/dma/snps-dma.txt |  6 ++--
>>  arch/arc/boot/dts/abilis_tb10x.dtsi|  2 +-
>>  arch/arm/boot/dts/spear13xx.dtsi   |  4 +--
>>  drivers/dma/dw/core.c  | 42 
>> ++
>>  drivers/dma/dw/platform.c  |  5 ++-
>>  include/linux/platform_data/dma-dw.h   |  2 +-
>>  6 files changed, 21 insertions(+), 40 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/dma/snps-dma.txt 
>> b/Documentation/devicetree/bindings/dma/snps-dma.txt
>> index c99c1ff..544b9b9 100644
>> --- a/Documentation/devicetree/bindings/dma/snps-dma.txt
>> +++ b/Documentation/devicetree/bindings/dma/snps-dma.txt
>> @@ -13,8 +13,8 @@ Required properties:
>>  - chan_priority: priority of channels. 0 (default): increase from chan 
>> 0->n, 1:
>>increase from chan n->0
>>  - block_size: Maximum block size supported by the controller
>> -- data_width: Maximum data width supported by hardware per AHB master
>> -  (0 - 8bits, 1 - 16bits, ..., 5 - 256bits)
>> +- data-width: Maximum data width supported by hardware per AHB master
>> +  (in bytes, power of 2)
>>
>>
>>  Optional properties:
>> @@ -38,7 +38,7 @@ Example:
>>   chan_allocation_order = <1>;
>>   chan_priority = <1>;
>>   block_size = <0xfff>;
>> - data_width = <3 3>;
>> + data-width = <8 8>;
>>   };
>
> You broke backward compatibility with earlier DTs.

How?

>
> What's backward compatibility ?
>
> Consider that the DT from an earlier version of kernel is part of the bootrom 
> of
> a SoC. Now that bootrom should work just fine with any new kernel version. 
> i.e.
> old DT + new kernels should always work.

Yes, the property name is slightly different, as is its meaning.

If the driver finds the data-width property it uses it; otherwise it falls
back to the old name and converts the value to bytes.
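
As a sketch of that fallback (not the actual driver code): the old binding
encoded the width as 0 = 8 bits, 1 = 16 bits, ..., 5 = 256 bits, so the
width in bytes is 1 << encoded. The helper names and the has_new flag are
hypothetical; a real driver would get both values from device tree
property reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Legacy data_width encoding: 0 = 8 bits, 1 = 16 bits, ..., 5 = 256 bits. */
static uint32_t legacy_width_to_bytes(uint32_t encoded)
{
	assert(encoded <= 5);
	return 1U << encoded;
}

/*
 * Prefer the new property (already in bytes) when it is present;
 * otherwise convert the legacy encoded value.
 */
static uint32_t pick_data_width(bool has_new, uint32_t new_bytes,
				uint32_t legacy_encoded)
{
	if (has_new)
		return new_bytes;
	return legacy_width_to_bytes(legacy_encoded);
}
```

With these helpers the old example value <3 3> and the new <8 8> describe
the same 64-bit masters.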

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v6 2/4] dmaengine: dw: revisit data_width property

2016-04-27 Thread Viresh Kumar

On 27-04-16, 11:25, Andy Shevchenko wrote:
> On Wed, Apr 27, 2016 at 9:43 AM, Viresh Kumar  wrote:
> > On 25-04-16, 15:35, Andy Shevchenko wrote:
> >> There are several changes are done here:
> >>
> >> - Convert the property to be in bytes
> >>
> >> Besides this is common practice for such property the use of a value in 
> >> bytes
> >> much more convenient than handling the encoded value.
> >>
> >> - Rename data_width to data-width in the device tree bindings
> >>
> >> - While here, replace dwc_fast_ffs() by __ffs()
> >>
> >> The change leaves the support for old format as well just in case someone 
> >> will
> >> use a newer kernel with an old device tree blob.
> >>
> >> Signed-off-by: Andy Shevchenko 
> >> ---
> >>  Documentation/devicetree/bindings/dma/snps-dma.txt |  6 ++--
> >>  arch/arc/boot/dts/abilis_tb10x.dtsi|  2 +-
> >>  arch/arm/boot/dts/spear13xx.dtsi   |  4 +--
> >>  drivers/dma/dw/core.c  | 42 
> >> ++
> >>  drivers/dma/dw/platform.c  |  5 ++-
> >>  include/linux/platform_data/dma-dw.h   |  2 +-
> >>  6 files changed, 21 insertions(+), 40 deletions(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/dma/snps-dma.txt 
> >> b/Documentation/devicetree/bindings/dma/snps-dma.txt
> >> index c99c1ff..544b9b9 100644
> >> --- a/Documentation/devicetree/bindings/dma/snps-dma.txt
> >> +++ b/Documentation/devicetree/bindings/dma/snps-dma.txt
> >> @@ -13,8 +13,8 @@ Required properties:
> >>  - chan_priority: priority of channels. 0 (default): increase from chan 
> >> 0->n, 1:
> >>increase from chan n->0
> >>  - block_size: Maximum block size supported by the controller
> >> -- data_width: Maximum data width supported by hardware per AHB master
> >> -  (0 - 8bits, 1 - 16bits, ..., 5 - 256bits)
> >> +- data-width: Maximum data width supported by hardware per AHB master
> >> +  (in bytes, power of 2)
> >>
> >>
> >>  Optional properties:
> >> @@ -38,7 +38,7 @@ Example:
> >>   chan_allocation_order = <1>;
> >>   chan_priority = <1>;
> >>   block_size = <0xfff>;
> >> - data_width = <3 3>;
> >> + data-width = <8 8>;
> >>   };
> >
> > You broke backward compatibility with earlier DTs.
> 
> How?
> 
> >
> > What's backward compatibility ?
> >
> > Consider that the DT from an earlier version of kernel is part of the 
> > bootrom of
> > a SoC. Now that bootrom should work just fine with any new kernel version. 
> > i.e.
> > old DT + new kernels should always work.
> 
> Yes, the property name is slightly different as meaning.
> 
> If we find data-width property driver will use it, otherwise it takes
> old name and converts variable to be in bytes in the driver.

Oh, I missed that you renamed the property and are taking care of both
properties now. Sorry about that. So, you didn't break them for sure :)

But, the DT documentation doesn't contain the old property now but the code
does. I think you are required to keep the bindings currently supported by the
kernel in there. You can mark them deprecated, but can't remove them.

-- 
viresh

Re: [PATCH 1/8] char/rtc: replace blacklist with whitelist

2016-04-27 Thread Alexandre Belloni

On 26/04/2016 at 23:44:05 +0200, Arnd Bergmann wrote :
> Every new architecture has to add itself to the growing list of those
> that do not support the legacy PC RTC driver.
> 
> This replaces the long list of architectures that don't support it
> with a shorter list of those that do.
> 
> The list is taken from those architectures that have a non-empty
> asm/mc146818rtc.h header file and were not explicitly blacklisted.
> 
> Signed-off-by: Arnd Bergmann 
Acked-by: Alexandre Belloni 

> ---
>  drivers/char/Kconfig | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
> index 3ec0766ed5e9..66b5d48f409a 100644
> --- a/drivers/char/Kconfig
> +++ b/drivers/char/Kconfig
> @@ -279,8 +279,7 @@ if RTC_LIB=n
>  
>  config RTC
>   tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)"
> - depends on !PPC && !PARISC && !IA64 && !M68K && !SPARC && !FRV \
> - && !ARM && !SUPERH && !S390 && !AVR32 && !BLACKFIN && 
> !UML
> + depends on ALPHA || (MIPS && MACH_LOONGSON64) || MN10300 || X86
>   ---help---
> If you say Y here and create a character special file /dev/rtc with
> major number 10 and minor number 135 using mknod ("man mknod"), you
> -- 
> 2.7.0
> 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

Re: char: legacy RTC cleanups

2016-04-27 Thread Arnd Bergmann

On Wednesday 27 April 2016 09:54:41 Geert Uytterhoeven wrote:
> Hi Arnd,
> 
> On Tue, Apr 26, 2016 at 11:44 PM, Arnd Bergmann  wrote:
> > For the genrtc driver, rearranging the headers makes it simpler
> > to use and reduces duplication. In case of alpha and mn10300,
> > I've shown that the genrtc and rtc drivers are doing the same
> > thing, so we don't need them both. The remaining three
> > architectures (m68k, parisc, powerpc) actually all support
> > the newer rtc-generic driver, so we could remove genrtc completely
> > if we want to.
> 
> CONFIG_GEN_RTC is not enabled in any of the m68k defconfigs, so I think genrtc
> has been unused for a while.
> All defconfigs either use CONFIG_RTC_DRV_GENERIC, or enable a more specific
> RTC driver.

Ok, good to know. I'm guessing the same is true for parisc, but there are
also very few users.

Regarding the Q40 specific ioctls, what do you think this means, is it

a) nobody uses Q40 with modern kernels,
b) nobody calls RTC_PLL_GET/RTC_PLL_SET on q40, or
c) Q40 users have their own configurations and enable GEN_RTC?

On powerpc, a quarter of the defconfigs (mostly for really old hardware)
still use GEN_RTC, but I guess we can either bulk-convert them to RTC_GENERIC,
or convert the five implementations of .get_rtc_time/.set_rtc_time
(8xx, rtas, chrp, powermac, maple) into five regular RTC class drivers.

Arnd

Re: [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards.

2016-04-27 Thread Jaehoon Chung

On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
> Hi,
> 
> I introduced the cover letter to give some background about this.
> 
> I have been investigating a problem related to at least one specific sdcard 
> when
> UHS-I is set. The card is not detected due the tuning phase reports a
> failure. Since the problem is only reproduced with a single model of a single
> brand of card, it is probably a card firmware issue, but the card works fine
> on my laptop.

I think you have analyzed many cases. Of course, the voltage switch was
successful, right?
Maybe this patch is too old, but can you remember which specific SD card
reproduces the problem?

> 
> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
> Akhtar found that this produced randomly a hung task on Peach-pi. I can 
> confirm
> that it's easy to reproduce the hung task, either, with cold boots or suspend 
> to
> ram tests.

Yep..I have already tested and checked for this.

> 
> I tried to fix both problems (the original issue and the one introduced by the
> patch) in different ways, but I ended thinking that this second proposal is 
> the
> most simple that solves both issues. So let's try to fix this by handling the
> response CRC error slightly differently when tuning command is happening.
> 
> I tested the patch on both platforms, on exynos and on rockhip. I did lots of
> tests and at the moment the patch seems to fix the rockchip issue and don't
> hung on exynos. I'll continue testing meanwhile we discuss about it.
> 
> I think the patch, at least, needs the Doug's approval (as he dig into the 
> issue
> before) and the Tested-by Alim. So will be good if you have a slot of time to
> look a bit into this.
> 
> Thanks in advance.
>  Enric
> 
> [1] https://lkml.org/lkml/2015/5/18/495
> 
> Changelog since v1:
> - Fix the issue found by Alim with exynos letting the data transfer
>   take place only when MMC_SEND_TUNING_BLOCK is issued.
> 
> Doug Anderson (1):
>   mmc: dw_mmc: Wait for data transfer after response errors.
> 
>  drivers/mmc/host/dw_mmc.c | 27 +++
>  1 file changed, 27 insertions(+)
>

[PATCH v4 1/1] powerpc/86xx: Add support for Emerson/Artesyn MVME7100

2016-04-27 Thread Alessio Igor Bogani

Add support for the Artesyn MVME7100 Single Board Computer.

The MVME7100 is a 6U form factor VME64 computer with:

- A two e600 cores Freescale MPC8641D CPU
- 2 GB of DDR2 onboard memory
- Four Gigabit Ethernets
- Five 16550 compatible UARTs
- One USB 2.0 port
- Two PCI/PCI eXpress Mezzanine Card (PMC/XMC) Slots
- A DS1375 Real Time Clock (RTC)
- 512 KB of Non-Volatile Memory (NVRAM)
- Two 64 KB EEPROMs
- 128 MB NOR and 4/8 GB NAND Flash

This patch is based on linux-4.6-rc5 and has been only boot tested.

Limitations:
This patch covers only models 171 and 173
No plans to support CPLD timers

Known issues:
All four PHYs work in polling mode

Configuration is missing for:
PCI IDSEL and PCI Interrupt definition

Support is missing for:
Cache and memory controllers (which are very similar to the 85xx ones
but right now I don't know if we can re-use their support)
Watchdog, USB, NVRAM, NOR, NAND, EEPROMs, VME, PMC/XMC and RTC

Signed-off-by: Alessio Igor Bogani 
---
This patch requires 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-April/141813.html
to be built and 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-April/141980.html to
work correctly.

v3 -> v4
Add few details into the commit message as suggested by Scott Wood
Update to v4.6-rc5
Disable the PCI and RTC nodes in the device tree because mandatory
configuration is missing, as suggested by Scott Wood
Remove a couple of checkpatch warnings

v2 -> v3
Simplify device tree using pci1 definition from the header file
as suggested by Scott Wood
Move assembly code into a separated file as suggested by Scott Wood
Increase from 2 to 5 the number of UARTs

v1 -> v2
Fix BCSR handling
Add missing @interrupt-cells in the device tree
to avoid 'of_irq_parse_pci() failed with rc=-22'
Reduce from 3 to 2 the PCI windows to avoid
'Ran out of outbound PCI ATMUs for IO resource'

 arch/powerpc/boot/Makefile   |   4 +
 arch/powerpc/boot/dts/fsl/mvme7100.dts   | 159 +++
 arch/powerpc/boot/motload-head.S |  11 ++
 arch/powerpc/boot/mvme7100.c |  60 ++
 arch/powerpc/boot/ppcboot.h  |   2 +-
 arch/powerpc/boot/wrapper|   5 +
 arch/powerpc/configs/86xx-hw.config  |   4 +-
 arch/powerpc/configs/mpc86xx_basic_defconfig |   1 +
 arch/powerpc/platforms/86xx/Kconfig  |   7 +-
 arch/powerpc/platforms/86xx/Makefile |   1 +
 arch/powerpc/platforms/86xx/mvme7100.c   | 125 +
 11 files changed, 375 insertions(+), 4 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/fsl/mvme7100.dts
 create mode 100644 arch/powerpc/boot/motload-head.S
 create mode 100644 arch/powerpc/boot/mvme7100.c
 create mode 100644 arch/powerpc/platforms/86xx/mvme7100.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 8fe78a3..963aa88 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -113,6 +113,7 @@ src-plat-$(CONFIG_EPAPR_BOOT) += epapr.c epapr-wrapper.c
 src-plat-$(CONFIG_PPC_PSERIES) += pseries-head.S
 src-plat-$(CONFIG_PPC_POWERNV) += pseries-head.S
 src-plat-$(CONFIG_PPC_IBM_CELL_BLADE) += pseries-head.S
+src-plat-$(CONFIG_MVME7100) += motload-head.S mvme7100.c
 
 src-wlib := $(sort $(src-wlib-y))
 src-plat := $(sort $(src-plat-y))
@@ -296,6 +297,9 @@ image-$(CONFIG_TQM8560) += 
cuImage.tqm8560
 image-$(CONFIG_SBC8548)+= cuImage.sbc8548
 image-$(CONFIG_KSI8560)+= cuImage.ksi8560
 
+# Board ports in arch/powerpc/platform/86xx/Kconfig
+image-$(CONFIG_MVME7100)+= dtbImage.mvme7100
+
 # Board ports in arch/powerpc/platform/embedded6xx/Kconfig
 image-$(CONFIG_STORCENTER) += cuImage.storcenter
 image-$(CONFIG_MPC7448HPC2)+= cuImage.mpc7448hpc2
diff --git a/arch/powerpc/boot/dts/fsl/mvme7100.dts 
b/arch/powerpc/boot/dts/fsl/mvme7100.dts
new file mode 100644
index 000..ee6886b
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/mvme7100.dts
@@ -0,0 +1,159 @@
+/*
+ * Device tree source for the Emerson/Artesyn MVME7100
+ *
+ * Copyright 2016 Elettra-Sincrotrone Trieste S.C.p.A.
+ *
+ * Author: Alessio Igor Bogani 
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ */
+
+/include/ "mpc8641si-pre.dtsi"
+
+/ {
+   model = "MVME7100";
+   compatible = "artesyn,MVME7100";
+
+   memory {
+   device_type = "memory";
+   reg = <0x 0x8000>;
+   };
+
+   soc: soc@f100 {
+   ranges = <0x 0xf100 0x0010>;
+
+   i2c@3000 {
+   hwmo

Re: [PATCH 17/19] dm: get rid of superfluous gfp flags

2016-04-27 Thread Michal Hocko

[Adding dm-de...@redhat.com to CC]

On Tue 26-04-16 13:20:04, Mikulas Patocka wrote:
> On Fri, 22 Apr 2016, Michal Hocko wrote:
[...]
> > copy_params seems to be called only from the ioctl context which doesn't
> > hold any locks which would lockup during the direct reclaim AFAICS. The
> > git log shows that the code has used PF_MEMALLOC before which is even
> > bigger mystery to me. Could you please clarify why this is GFP_NOIO
> > restricted context? Maybe it needed to be in the past but I do not see
> > any reason for it to be now so unless I am missing something the
> > GFP_KERNEL should be perfectly OK. Also note that GFP_NOIO wouldn't work
> > properly because there are copy_from_user calls in the same path which
> > could page fault and do GFP_KERNEL allocations anyway. I can send follow
> > up cleanups unless I am missing something subtle here.
> 
> The LVM tool calls suspend and resume ioctls on device mapper block 
> devices.
>
> When a device is suspended, any bio sent to the device is held. If the 
> resume ioctl did GFP_KERNEL allocation, the allocation could get stuck 
> trying to write some dirty cached pages to the suspended device.
> 
> The LVM tool and the dmeventd daemon use mlock to lock its address space, 
> so the copy_from_user/copy_to_user call cannot trigger a page fault.

OK, I see, thanks for the clarification! This sounds fragile to me
though. Wouldn't it be better to use the memalloc_noio_save for the
whole copy_params instead? That would force all possible allocations to
not trigger any IO. Something like the following.
---
From dbb2338bb88d2da1ff24cee59cbffd120b119e3b Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Wed, 27 Apr 2016 10:26:13 +0200
Subject: [PATCH] dm: clean up GFP_NOIO usage

copy_params uses GFP_NOIO for explicit allocation requests because this
might be called from the suspend path. To quote Mikulas:
: The LVM tool calls suspend and resume ioctls on device mapper block
: devices.
:
: When a device is suspended, any bio sent to the device is held. If the
: resume ioctl did GFP_KERNEL allocation, the allocation could get stuck
: trying to write some dirty cached pages to the suspended device.
:
: The LVM tool and the dmeventd daemon use mlock to lock its address space,
: so the copy_from_user/copy_to_user call cannot trigger a page fault.

Relying on the mlock is quite fragile and we have a better way in kernel
to enforce NOIO which is already used for the vmalloc fallback. Just use
memalloc_noio_{save,restore} around the whole copy_params function which
will force the same also to the page fault paths via copy_{from,to}_user.

While we are there we can also remove __GFP_NOMEMALLOC because copy_params
is never called from MEMALLOC context (e.g. during the reclaim).

Signed-off-by: Michal Hocko 
---
 drivers/md/dm-ioctl.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 2c7ca258c4e4..fe0b57d7573c 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1715,16 +1715,13 @@ static int copy_params(struct dm_ioctl __user *user, 
struct dm_ioctl *param_kern
 */
dmi = NULL;
if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
-   dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY 
| __GFP_NOMEMALLOC | __GFP_NOWARN);
+   dmi = kmalloc(param_kernel->data_size, GFP_KERNEL | 
__GFP_NORETRY | __GFP_NOWARN);
if (dmi)
*param_flags |= DM_PARAMS_KMALLOC;
}
 
if (!dmi) {
-   unsigned noio_flag;
-   noio_flag = memalloc_noio_save();
-   dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH 
| __GFP_HIGHMEM, PAGE_KERNEL);
-   memalloc_noio_restore(noio_flag);
+   dmi = __vmalloc(param_kernel->data_size, GFP_KERNEL | 
__GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
if (dmi)
*param_flags |= DM_PARAMS_VMALLOC;
}
@@ -1801,6 +1798,7 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user 
*user)
ioctl_fn fn = NULL;
size_t input_param_size;
struct dm_ioctl param_kernel;
+   unsigned noio_flag;
 
/* only root can play with this */
if (!capable(CAP_SYS_ADMIN))
@@ -1832,9 +1830,12 @@ static int ctl_ioctl(uint command, struct dm_ioctl 
__user *user)
}
 
/*
-* Copy the parameters into kernel space.
+* Copy the parameters into kernel space. Make sure that no IO is 
triggered
+* from the allocation paths because this might be called during the 
suspend.
 */
+   noio_flag = memalloc_noio_save();
+   r = copy_params(user, &param_kernel, ioctl_flags, &param, &param_flags);
+   memalloc_noio_restore(noio_flag);
 
if (r)
return r;
-- 
2.8.0.rc3

-- 
Michal Hocko
SUSE Labs

Re: [PATCH] mmc: dw_mmc-rockchip: add MMC_CAP_CMD23 capabilities

2016-04-27 Thread Jaehoon Chung

Hi Shawn,

I will apply this. Thanks!

Best Regards,
Jaehoon Chung

On 04/26/2016 03:53 PM, Shawn Lin wrote:
> Add MMC_CAP_CMD23 for dw_mmc-rockchip, otherwise
> failing to create rpmb partition. With it, we can
> get rpmb successfully:
> 
> mmc1: new HS200 MMC card at address 0001
> mmcblk0: mmc1:0001 DS2016 14.7 GiB
> mmcblk0boot0: mmc1:0001 DS2016 partition 1 4.00 MiB
> mmcblk0boot1: mmc1:0001 DS2016 partition 2 4.00 MiB
> mmcblk0rpmb: mmc1:0001 DS2016 partition 3 4.00 MiB
> 
> Signed-off-by: Shawn Lin 
> ---
> 
>  drivers/mmc/host/dw_mmc-rockchip.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc-rockchip.c 
> b/drivers/mmc/host/dw_mmc-rockchip.c
> index 8c20b81..2b4bcd2 100644
> --- a/drivers/mmc/host/dw_mmc-rockchip.c
> +++ b/drivers/mmc/host/dw_mmc-rockchip.c
> @@ -233,10 +233,10 @@ static int dw_mci_rockchip_init(struct dw_mci *host)
>  
>  /* Common capabilities of RK3288 SoC */
>  static unsigned long dw_mci_rk3288_dwmmc_caps[4] = {
> - MMC_CAP_ERASE,
> - MMC_CAP_ERASE,
> - MMC_CAP_ERASE,
> - MMC_CAP_ERASE,
> + MMC_CAP_ERASE | MMC_CAP_CMD23,
> + MMC_CAP_ERASE | MMC_CAP_CMD23,
> + MMC_CAP_ERASE | MMC_CAP_CMD23,
> + MMC_CAP_ERASE | MMC_CAP_CMD23,
>  };
>  
>  static const struct dw_mci_drv_data rk2928_drv_data = {
>

Re: [PATCH 2/8] char/rtc: legacy RTC is no longer supported on x86

2016-04-27 Thread Alexandre Belloni

On 26/04/2016 at 23:44:06 +0200, Arnd Bergmann wrote :
> Commit 3195ef59cb42 ("x86: Do full rtc synchronization with ntp") had
> the side-effect of unconditionally enabling the RTC_LIB symbol on x86,
> which in turn disables the selection of the CONFIG_RTC and
> CONFIG_GEN_RTC drivers that contain a two older implementations of
> the CONFIG_RTC_DRV_CMOS driver.
> 
> This removes x86 from the list.
> 
> Signed-off-by: Arnd Bergmann 
Acked-by: Alexandre Belloni 

Two down, still four drivers for the x86 RTCs...


> ---
>  drivers/char/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
> index 66b5d48f409a..9bdb629fbaae 100644
> --- a/drivers/char/Kconfig
> +++ b/drivers/char/Kconfig
> @@ -279,7 +279,7 @@ if RTC_LIB=n
>  
>  config RTC
>   tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)"
> - depends on ALPHA || (MIPS && MACH_LOONGSON64) || MN10300 || X86
> + depends on ALPHA || (MIPS && MACH_LOONGSON64) || MN10300
>   ---help---
> If you say Y here and create a character special file /dev/rtc with
> major number 10 and minor number 135 using mknod ("man mknod"), you
> @@ -328,7 +328,7 @@ config JS_RTC
>  config GEN_RTC
>   tristate "Generic /dev/rtc emulation"
>   depends on RTC!=y
> - depends on ALPHA || M68K || MN10300 || PARISC || PPC || X86
> + depends on ALPHA || M68K || MN10300 || PARISC || PPC
>   ---help---
> If you say Y here and create a character special file /dev/rtc with
> major number 10 and minor number 135 using mknod ("man mknod"), you
> -- 
> 2.7.0
> 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression

2016-04-27 Thread Michal Hocko

On Wed 27-04-16 16:20:43, Huang, Ying wrote:
> Michal Hocko  writes:
> 
> > On Wed 27-04-16 11:15:56, kernel test robot wrote:
> >> FYI, we noticed vm-scalability.throughput -11.8% regression with the 
> >> following commit:
> >
> > Could you be more specific what the test does please?
> 
> The sub-testcase of vm-scalability is swap-w-rand.  A RAM-emulated pmem
> device is used as a swap device, and a test program allocates/writes
> anonymous memory randomly to exercise the page allocation, reclaim, and
> swap-in code paths.

Can I download the test with the setup to play with this?
-- 
Michal Hocko
SUSE Labs

Re: [PATCH v6 2/4] dmaengine: dw: revisit data_width property

2016-04-27 Thread Andy Shevchenko

On Wed, 2016-04-27 at 14:00 +0530, Viresh Kumar wrote:
> On 27-04-16, 11:25, Andy Shevchenko wrote:
> > 
> > > > -- data_width: Maximum data width supported by hardware per AHB master
> > > > -  (0 - 8bits, 1 - 16bits, ..., 5 - 256bits)
> > > > +- data-width: Maximum data width supported by hardware per AHB master
> > > > +  (in bytes, power of 2)

> 
> But, the DT documentation doesn't contain the old property now but the code
> does. I think you are required to keep the bindings currently supported by
> the kernel in there. You can mark them deprecated, but can't remove them.

This point I take, indeed.

Will update series soon. Thanks for review.
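
For reference, the relation between the two encodings quoted above can be sketched in plain C. This is only an illustration under stated assumptions: the helper name dw_legacy_width_to_bytes() is made up and is not part of the driver, the old data_width values 0..5 are taken to mean 8..256 bits, and the new data-width value is taken to be the width in bytes.

```c
/*
 * Illustrative only: convert the old "data_width" encoding
 * (0 - 8bits, 1 - 16bits, ..., 5 - 256bits) into the new
 * "data-width" value (width in bytes, a power of 2).
 * The function name is hypothetical, not taken from the driver.
 */
unsigned int dw_legacy_width_to_bytes(unsigned int encoded)
{
	/* Values above 5 are outside the documented range. */
	if (encoded > 5)
		return 0;

	/* (8 << encoded) bits is (1 << encoded) bytes. */
	return 1u << encoded;
}
```

A fallback path that still accepts the deprecated property could run its value through a helper of this kind before handing it to code that expects bytes.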

-- 
Andy Shevchenko 
Intel Finland Oy

Re: [PATCH] iio: inv_mpu6050: Add support for auxiliary I2C master

2016-04-27 Thread Peter Rosin

Hi!

On 2016-04-23 23:32, Jonathan Cameron wrote:
> On 20/04/16 18:17, Crestez Dan Leonard wrote:
>> The MPU has an auxiliary I2C bus for connecting external
>> sensors. This bus has two operating modes:
>> * pass-through, which connects the primary and auxiliary busses
>> together. This is already supported via an i2c mux.
>> * I2C master mode, where the mpu60x0 acts as a master to any external
>> connected sensors. This is implemented by this patch.
>>
>> This I2C master mode also works when the MPU itself is connected via
>> SPI.
>>
>> I2C master supports up to 5 slaves. Slaves 0-3 have a common operating
>> mode while slave 4 is different. This patch implements an i2c adapter
>> using slave 4 because it has a cleaner interface and it has an
>> interrupt that signals when data from slave to master arrived.
>>
>> Signed-off-by: Crestez Dan Leonard 
> This one needs acks from:
>
> Device tree maintainer (odd binding ;)
> Peter Rosin (odd binding interacting with the mux support)
> Wolfram (it has a whole i2c master driver in here).
>
> (just thought I'd list these for the avoidance of doubt).

I spot some overlap with the questions in "[RFC] i2c: device-tree:
Handling child nodes which are not i2c devices"
http://marc.info/?l=linux-i2c&m=146073452819116&w=2

And I think I agree with Stephen Warren that an intermediate placeholder
node would make sense. I.e.

mpu6050@68 {
	compatible = "...";
	reg = <0x68>;
	...
	i2c-aux-mux {
		i2c@0 {
			#address-cells = <1>;
			#size-cells = <0>;
			reg = <0>;

			foo@44 {
				compatible = "bar";
				reg = <0x44>;
				...
			};
		};
	};
};

Or

mpu6050@68 {
	compatible = "...";
	reg = <0x68>;
	...
	i2c-aux-master {
		#address-cells = <1>;
		#size-cells = <0>;

		gazonk@44 {
			compatible = "baz";
			reg = <0x44>;
			...
		};
	};
};

depending on whether you want an aux-mux or an aux-master.

But I don't know if that intermediate i2c-aux-mux node causes any
problems?

Cheers,
Peter

Re: char: legacy RTC cleanups

2016-04-27 Thread Geert Uytterhoeven

Hi Arnd,

On Wed, Apr 27, 2016 at 10:33 AM, Arnd Bergmann  wrote:
> On Wednesday 27 April 2016 09:54:41 Geert Uytterhoeven wrote:
>> On Tue, Apr 26, 2016 at 11:44 PM, Arnd Bergmann  wrote:
>> > For the genrtc driver, rearranging the headers makes it simpler
>> > to use and reduces duplication. In case of alpha and mn10300,
>> > I've shown that the genrtc and rtc drivers are doing the same
>> > thing, so we don't need them both. The remaining three
>> > architectures (m68k, parisc, powerpc) actually all support
>> > the newer rtc-generic driver, so we could remove genrtc completely
>> > if we want to.
>>
>> CONFIG_GEN_RTC is not enabled in any of the m68k defconfigs, so I think
>> genrtc has been unused for a while.
>> All defconfigs either use CONFIG_RTC_DRV_GENERIC, or enable a more specific
>> RTC driver.
>
> Ok, good to know. I'm guessing the same is true for parisc, but there are
> also very few users.
>
> Regarding the Q40 specific ioctls, what do you think this means, is it
>
> a) nobody uses Q40 with modern kernels,
> b) nobody calls RTC_PLL_GET/RTC_PLL_SET on q40, or
> c) Q40 users have their own configurations and enable GEN_RTC?

To be honest, I have no idea. There have never been many Q40 users.
(old) http://www.linux-m68k.org/Registry/Statistics.html shows only 8.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[PATCH] usb: dwc3: usb/dwc3: fake disconnect event when turning off pullup

2016-04-27 Thread changbin . du

From: "Du, Changbin" 

The dwc3 controller can't generate a disconnect event after it is
stopped, so the gadget disconnect callback is not invoked on a soft
disconnect. Call disconnect here to work around this issue.

Note that most of the time we still see disconnect being called, because
it is invoked by dwc3_gadget_reset_interrupt(). But if the cable is
unplugged quickly after the pullup is disabled, the issue can be
observed.

Signed-off-by: Du, Changbin 
---
 drivers/usb/dwc3/gadget.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 8e4a1b1..cd73187 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1566,6 +1566,15 @@ static int dwc3_gadget_run_stop(struct dwc3 *dwc, int is_on, int suspend)
return 0;
 }
 
+static void dwc3_disconnect_gadget(struct dwc3 *dwc)
+{
+   if (dwc->gadget_driver && dwc->gadget_driver->disconnect) {
+   spin_unlock(&dwc->lock);
+   dwc->gadget_driver->disconnect(&dwc->gadget);
+   spin_lock(&dwc->lock);
+   }
+}
+
 static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
 {
struct dwc3 *dwc = gadget_to_dwc(g);
@@ -1575,6 +1584,21 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
is_on = !!is_on;
 
spin_lock_irqsave(&dwc->lock, flags);
+   /*
+* WORKAROUND: The dwc3 controller can't generate a disconnect
+* event after it is stopped, so the gadget disconnect callback
+* is not invoked on a soft disconnect. Call disconnect here to
+* work around this issue.
+* Note that most of the time we still see disconnect being
+* called, because it is invoked by dwc3_gadget_reset_interrupt().
+* But if the cable is unplugged quickly after the pullup is
+* disabled, the issue can be observed.
+*/
+   if (!is_on && (dwc->gadget.speed != USB_SPEED_UNKNOWN)) {
+   dev_dbg(dwc->dev, "fake disconnect event on pullup off\n");
+   dwc3_disconnect_gadget(dwc);
+   dwc->gadget.speed = USB_SPEED_UNKNOWN;
+   }
ret = dwc3_gadget_run_stop(dwc, is_on, false);
spin_unlock_irqrestore(&dwc->lock, flags);
 
@@ -2144,15 +2168,6 @@ static void dwc3_endpoint_interrupt(struct dwc3 *dwc,
}
 }
 
-static void dwc3_disconnect_gadget(struct dwc3 *dwc)
-{
-   if (dwc->gadget_driver && dwc->gadget_driver->disconnect) {
-   spin_unlock(&dwc->lock);
-   dwc->gadget_driver->disconnect(&dwc->gadget);
-   spin_lock(&dwc->lock);
-   }
-}
-
 static void dwc3_suspend_gadget(struct dwc3 *dwc)
 {
if (dwc->gadget_driver && dwc->gadget_driver->suspend) {
-- 
2.7.4

Re: [PATCH 3/8] char/rtc: remove empty asm/mc146818rtc.h files

2016-04-27 Thread Alexandre Belloni

On 26/04/2016 at 23:44:07 +0200, Arnd Bergmann wrote :
> Nothing on these architectures ever includes the asm/mc146818rtc.h
> file, the drivers that used to do this have been fixed long ago,
> and the remaining users are all PC-specific.
> 
> This removes the files for good.
> 
> Signed-off-by: Arnd Bergmann 
Acked-by: Alexandre Belloni 

> ---
>  arch/frv/include/asm/mc146818rtc.h    | 16 ----------------
>  arch/h8300/include/asm/mc146818rtc.h  |  9 ---------
>  arch/ia64/include/asm/mc146818rtc.h   | 10 ----------
>  arch/parisc/include/asm/mc146818rtc.h |  9 ---------
>  arch/sh/include/asm/mc146818rtc.h     |  7 -------
>  5 files changed, 51 deletions(-)
>  delete mode 100644 arch/frv/include/asm/mc146818rtc.h
>  delete mode 100644 arch/h8300/include/asm/mc146818rtc.h
>  delete mode 100644 arch/ia64/include/asm/mc146818rtc.h
>  delete mode 100644 arch/parisc/include/asm/mc146818rtc.h
>  delete mode 100644 arch/sh/include/asm/mc146818rtc.h
> 
> diff --git a/arch/frv/include/asm/mc146818rtc.h b/arch/frv/include/asm/mc146818rtc.h
> deleted file mode 100644
> index 90dfb7a633d1..
> --- a/arch/frv/include/asm/mc146818rtc.h
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -/* mc146818rtc.h: RTC defs
> - *
> - * Copyright (C) 2005 Red Hat, Inc. All Rights Reserved.
> - * Written by David Howells (dhowe...@redhat.com)
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License
> - * as published by the Free Software Foundation; either version
> - * 2 of the License, or (at your option) any later version.
> - */
> -
> -#ifndef _ASM_MC146818RTC_H
> -#define _ASM_MC146818RTC_H
> -
> -
> -#endif /* _ASM_MC146818RTC_H */
> diff --git a/arch/h8300/include/asm/mc146818rtc.h b/arch/h8300/include/asm/mc146818rtc.h
> deleted file mode 100644
> index ab9d9646d241..
> --- a/arch/h8300/include/asm/mc146818rtc.h
> +++ /dev/null
> @@ -1,9 +0,0 @@
> -/*
> - * Machine dependent access functions for RTC registers.
> - */
> -#ifndef _H8300_MC146818RTC_H
> -#define _H8300_MC146818RTC_H
> -
> -/* empty include file to satisfy the include in genrtc.c/ide-geometry.c */
> -
> -#endif /* _H8300_MC146818RTC_H */
> diff --git a/arch/ia64/include/asm/mc146818rtc.h b/arch/ia64/include/asm/mc146818rtc.h
> deleted file mode 100644
> index 407787a237ba..
> --- a/arch/ia64/include/asm/mc146818rtc.h
> +++ /dev/null
> @@ -1,10 +0,0 @@
> -#ifndef _ASM_IA64_MC146818RTC_H
> -#define _ASM_IA64_MC146818RTC_H
> -
> -/*
> - * Machine dependent access functions for RTC registers.
> - */
> -
> -/* empty include file to satisfy the include in genrtc.c */
> -
> -#endif /* _ASM_IA64_MC146818RTC_H */
> diff --git a/arch/parisc/include/asm/mc146818rtc.h b/arch/parisc/include/asm/mc146818rtc.h
> deleted file mode 100644
> index adf41631449f..
> --- a/arch/parisc/include/asm/mc146818rtc.h
> +++ /dev/null
> @@ -1,9 +0,0 @@
> -/*
> - * Machine dependent access functions for RTC registers.
> - */
> -#ifndef _ASM_MC146818RTC_H
> -#define _ASM_MC146818RTC_H
> -
> -/* empty include file to satisfy the include in genrtc.c */
> -
> -#endif /* _ASM_MC146818RTC_H */
> diff --git a/arch/sh/include/asm/mc146818rtc.h b/arch/sh/include/asm/mc146818rtc.h
> deleted file mode 100644
> index 0aee96a97330..
> --- a/arch/sh/include/asm/mc146818rtc.h
> +++ /dev/null
> @@ -1,7 +0,0 @@
> -/*
> - * Machine dependent access functions for RTC registers.
> - */
> -#ifndef _ASM_MC146818RTC_H
> -#define _ASM_MC146818RTC_H
> -
> -#endif /* _ASM_MC146818RTC_H */
> -- 
> 2.7.0
> 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[PATCH] mm/memory_failure: unify the output-prefix for printk()

2016-04-27 Thread Chen Yucong

This patch aims to replace 'MCE' that was introduced by
'commit c2200538d89d ("mm/memory-failure: fix race with
compound page split/merge")' with 'Memory failure'.[1]

[1] https://lkml.org/lkml/2016/4/18/894

Signed-off-by: Chen Yucong 
---
 mm/memory-failure.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 839aa53..2fcca6b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -894,7 +894,8 @@ int get_hwpoison_page(struct page *page)
if (head == compound_head(page))
return 1;
 
-   pr_info("MCE: %#lx cannot catch tail\n", page_to_pfn(page));
+   pr_info("Memory failure: %#lx cannot catch tail\n",
+   page_to_pfn(page));
put_page(head);
}
 
-- 
1.8.3.1
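
As a side note on keeping such a prefix consistent: in the kernel, pr_info() passes its format string through the pr_fmt() macro, so a file can define the prefix once instead of repeating it in every message. The following user-space imitation only illustrates that idea; log_fmt() and format_tail_msg() are made-up names, and whether this patch should take that route is a separate question.

```c
#include <stdio.h>

/* One place to define the prefix, in the spirit of the kernel's pr_fmt(). */
#define LOG_PREFIX "Memory failure: "
#define log_fmt(fmt) LOG_PREFIX fmt

/*
 * Hypothetical helper: formats the message from the hunk above into buf.
 * Returns what snprintf() returns (the length the full message needs).
 */
int format_tail_msg(char *buf, size_t len, unsigned long pfn)
{
	return snprintf(buf, len, log_fmt("%#lx cannot catch tail\n"), pfn);
}
```

Because adjacent string literals are concatenated at compile time, the prefix costs nothing at run time.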

Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression

2016-04-27 Thread Huang, Ying

Michal Hocko  writes:

> On Wed 27-04-16 16:20:43, Huang, Ying wrote:
>> Michal Hocko  writes:
>> 
>> > On Wed 27-04-16 11:15:56, kernel test robot wrote:
>> >> FYI, we noticed vm-scalability.throughput -11.8% regression with the 
>> >> following commit:
>> >
>> > Could you be more specific what the test does please?
>> 
>> The sub-testcase of vm-scalability is swap-w-rand.  A RAM-emulated pmem
>> device is used as a swap device, and a test program allocates/writes
>> anonymous memory randomly to exercise the page allocation, reclaim, and
>> swap-in code paths.
>
> Can I download the test with the setup to play with this?

There are reproduction steps in the original report email.

To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml  # job file is attached in this email
bin/lkp run job.yaml


The job.yaml and kconfig file are attached in the original report email.

Best Regards,
Huang, Ying

Re: [PATCH V17 2/3] dmaengine: qcom_hidma: add debugfs hooks

2016-04-27 Thread Marc Zyngier

On 27/04/16 09:15, Vinod Koul wrote:
> On Tue, Apr 26, 2016 at 12:55:18PM -0400, Sinan Kaya wrote:
>> On 4/26/2016 12:25 PM, Vinod Koul wrote:
>>> On Tue, Apr 26, 2016 at 08:08:16AM -0400, ok...@codeaurora.org wrote:
>>>> On 2016-04-25 23:30, Vinod Koul wrote:
>>>>> On Mon, Apr 11, 2016 at 10:21:12AM -0400, Sinan Kaya wrote:
>>>>>
>>>>>> +static int hidma_chan_stats(struct seq_file *s, void *unused)
>>>>>> +{
>>>>>> +struct hidma_chan *mchan = s->private;
>>>>>> +struct hidma_desc *mdesc;
>>>>>> +struct hidma_dev *dmadev = mchan->dmadev;
>>>>>> +
>>>>>> +pm_runtime_get_sync(dmadev->ddev.dev);
>>>>>
>>>>> debug shouldn't power up device, why do you want to do that
>>>>
>>>>
>>>> Clocks are turned off while the hw is idle. I can’t reach hw
>>>> registers without restoring power.
>>>
>>> Hmm, have you thought about using regmap?
>>>
>>
>> To be honest, I didn't know what regmap is but I just read some code
>> and looked at how it is used. Feel free to correct me if I got it 
>> wrong. 
>>
>> Regmap seems to be designed for *slow* speed peripherals to improve frequent
>> accesses by the SW. It looks like it is used by MFD, SPI and I2C drivers.
>>
>> It seems to cache the register contents and flush/invalidate them only when
>> needed.
>>
>> The MMIO version seems to be assuming the presence of device-tree like CLK
>> API which doesn't exist on ACPI systems and is not portable.
>>
>> My reaction is that it is a lot of code with no added functionality to what
>> HIDMA driver is trying to achieve. 
>>
>> Given that the use case here is only for debug purposes; I think it is OK 
>> to keep this runtime call here. I don't want to add any overhead into the
>> existing code just to support the debug use case.  
>>
>> None of my register read/writes are slow. This file will only be used to 
>> troubleshoot customer issues.

I'd recommend you actually run perf on a few of your MMIO accesses. I
believe the result will be eye opening. On the KVM side, we've trimmed
our MMIO access as much as possible, using a memory-based cache (similar
to regmap in concept). This has made some code paths about 40% faster.

> $ is always faster than MMIO. This way you can give reg contents to users
> without waking up hw.

Indeed. MMIO access sucks rocks, even on a very fast box. Actually, the
faster the box is, the slower MMIO feels (compared to memory).
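
To make the memory-based cache idea concrete, here is a minimal user-space sketch. It is neither regmap nor the KVM code: the hw[] array merely stands in for MMIO registers, all names are hypothetical, and it assumes the cached registers only change when software writes them (registers that the hardware updates on its own would need to bypass the cache).

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 8

/* Hypothetical device model: 'hw' stands in for real MMIO registers. */
struct cached_dev {
	uint32_t hw[NUM_REGS];      /* simulated hardware registers */
	uint32_t shadow[NUM_REGS];  /* in-memory copy of last known values */
	bool valid[NUM_REGS];       /* true once shadow[i] mirrors hw[i] */
	unsigned long hw_reads;     /* counts simulated MMIO reads */
};

/* The slow path: one simulated MMIO read. */
static uint32_t hw_read(struct cached_dev *d, unsigned int reg)
{
	d->hw_reads++;
	return d->hw[reg];
}

/* Reads hit the device once, then are served from memory. */
uint32_t cached_read(struct cached_dev *d, unsigned int reg)
{
	if (!d->valid[reg]) {
		d->shadow[reg] = hw_read(d, reg);
		d->valid[reg] = true;
	}
	return d->shadow[reg];
}

/* Write-through: update the device and the shadow copy together. */
void cached_write(struct cached_dev *d, unsigned int reg, uint32_t val)
{
	d->hw[reg] = val;
	d->shadow[reg] = val;
	d->valid[reg] = true;
}
```

With a scheme like this, a debugfs dump could report the shadow values without waking the hardware, at the price of showing the last value software saw rather than the live one.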

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH 1/2] clk: imx: do not sleep if IRQ's are still disabled

2016-04-27 Thread Dong Aisheng

On Wed, Apr 27, 2016 at 3:24 PM, Shawn Guo  wrote:
> On Wed, Apr 27, 2016 at 10:57:21AM +0800, Dong Aisheng wrote:
>> On Wed, Apr 27, 2016 at 9:58 AM, Shawn Guo  wrote:
>> > On Tue, Apr 26, 2016 at 07:27:03PM +0800, Dong Aisheng wrote:
>> >> Shawn,
>> >> What's your suggestion?
>> >
>> > I think this needs more discussion, and I just dropped Stefan's patch
>> > from my tree.
>> >
>> > We need to firstly understand why this is happening.  The .prepare hook
>> > is defined to be non-atomic context, and so that we call sleep function
>> > in there.  We did everything right.  Why are we getting the warning?  If
>> > I'm correct, this warning only happens on i.MX7D.  Why is that?
>> >
>>
>> Why Stefan's patch works (checking irqs_disabled()) is because during kernel
>> time init, the irq is still not enabled. It fixes the issue indirectly.
>> See:
>> asmlinkage __visible void __init start_kernel(void)
>> {
>> /*
>>  * Set up the scheduler prior starting any interrupts (such as the
>>  * timer interrupt). Full topology setup happens at smp_init()
>>  * time - but meanwhile we still have a functioning scheduler.
>>  */
>> sched_init();
>> .
>> time_init();
>> ..
>> WARN(!irqs_disabled(), "Interrupts were enabled early\n");
>> early_boot_irqs_disabled = false;
>> local_irq_enable();
>> }
>>
>> The issue can only happen when PLL enable causes a schedule during
>> imx_clock_init().
>> Not all PLL has this issue.
>> The issue happens on MX7D pll_audio_main_clk/pll_video_main_clk
>> which requires more delay time and cause usleep.
>> Because clk framework does not support MX7D clock types (operation requires
>> parents on), we simply enable all clocks in imx7d_clocks_init().
>>
>> If apply my this patch series:
>> https://lkml.org/lkml/2016/4/20/199
>> The issue can also be gone.
>
> Thanks for the info.  It sounds like that we are fixing the problem in
> the wrong place, i.e. clk_pllv3_prepare().  The function does nothing
> wrong, since the .prepare hook is defined to be one that can sleep.  If
> we see sleep warning in a context calling clk_prepare(), that probably
> means we shouldn't make the function call from that context.
>

Yes, I agree.
I'm trying to get rid of these calls.

Simply removing them, or delaying them to arch_init, causes the kernel to
fail to boot. I'm still checking the root cause.

> Shawn

Regards
Dong Aisheng

Re: [BUG linux-next] Kernel panic found with linux-next-20160414

2016-04-27 Thread Vlastimil Babka


On 04/27/2016 10:14 AM, Hugh Dickins wrote:

> It's rather horrible that compaction.c uses functions in page_alloc.c
> which skip doing some of the things we expect to be done: the non-debug
> preparation tends to get noticed, but the debug options overlooked.
> We can expect more problems of this kind in future: someone will add
> yet another debug prep line in page_alloc.c, and at first nobody will
> notice that it's also needed in compaction.c.

Point taken, I'll try to come up with a more maintainable solution the next
time I attempt the isolate_freepages_direct() approach. Sorry about the
trouble.

Re: [PATCH 1/2] clk: imx: do not sleep if IRQ's are still disabled

2016-04-27 Thread Dong Aisheng

On Wed, Apr 27, 2016 at 3:28 PM, Stefan Agner  wrote:
> On 2016-04-26 19:56, Fabio Estevam wrote:
>> On Tue, Apr 26, 2016 at 11:45 PM, Dong Aisheng  wrote:
>>
>>>> We need to firstly understand why this is happening.  The .prepare hook
>>>> is defined to be non-atomic context, and so that we call sleep function
>>>> in there.  We did everything right.  Why are we getting the warning?  If
>>>> I'm correct, this warning only happens on i.MX7D.  Why is that?
>>>>
>>>
>>> This is mainly because, during early kernel boot, only the init idle
>>> task is running.
>>> See:
>>> void __init sched_init(void)
>>> {
>>> .
>>> /*
>>>  * Make us the idle thread. Technically, schedule() should not be
>>>  * called from this thread, however somewhere below it might be,
>>>  * but because we are the idle thread, we just pick up running again
>>>  * when this runqueue becomes "idle".
>>>  */
>>> init_idle(current, smp_processor_id());
>>> ...
>>> }
>>>
>>> And the idle sched class indicates it's not valid to schedule for idle task.
>>> const struct sched_class idle_sched_class = {
>>> /* .next is NULL */
>>> /* no enqueue/yield_task for idle tasks */
>>>
>>> /* dequeue is not valid, we print a debug message there: */
>>> .dequeue_task   = dequeue_task_idle,
>>> ...
>>>
>>> }
>>>
>>> /*
>>>  * It is not legal to sleep in the idle task - print a warning
>>>  * message if some code attempts to do it:
>>>  */
>>> static void
>>> dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
>>> {
>>> raw_spin_unlock_irq(&rq->lock);
>>> printk(KERN_ERR "bad: scheduling from the idle thread!\n");
>>> dump_stack();
>>> raw_spin_lock_irq(&rq->lock);
>>> }
>>>
>>>
>>> Below is the full log of imx7d booting FYI.
>>
>> This does not answer Shawn's question: why do we see this only on mx7d?
>
> I was wondering that too. My theory is that either on i.MX 6 the
> clocks enable almost instantly leading to no sleep, or they are just
> bootloader/firmware on...?
>

Yes, the core PLLs are already enabled by the bootloader.
We see this issue on MX7D because we forcibly enable all clocks for it,
and the MX7D AV PLLs, whose enable path sleeps, expose the problem.

Regards
Dong Aisheng

> --
> Stefan

Re: zram: per-cpu compression streams

2016-04-27 Thread Sergey Senozhatsky


Hello,

more tests. I did only 8 streams vs per-cpu this time. the changes
to the test are:
-- mem-hogger now pre-faults pages in parallel with fio
-- mem-hogger alloc size increased from 3GB to 4GB.

the system couldn't survive the 4GB/4GB
zram(buffer_compress_percentage=11)/mem-hogger split (OOM), so I executed
the 3GB/4GB test (close to the system's OOM edge).

-- 4 GB x86_64
-- 3 GB zram lzo

first, the mm_stat.

8 streams (base kernel):

3221225472 3221225472 3221225472 0 3221229568     0   0 < 2752460/    0>
3221225472 3221225472 3221225472 0 3221233664     0   0 < 5504124/    0>
3221225472 2912157607 2952802304 0 2952826880     0  81 < 8253369/    0>
3221225472 2893479936 2899120128 0 2899136512     0 147 <11003056/    0>
3221217280 2886040814 2899103744 0 2899128320     0  26 <13748450/    0>
3221225472 2880045056 2885693440 0 2885718016     0 180 <16503120/    0>
3221213184 2877431364 2883756032 0 2883809280     0 132 <19259891/    0>
3221225472 2873229312 2876096512 0 2876133376     0  16 <22016512/    0>
3221213184 2870728008 2871693312 0 2871726080     0  24 <24768909/    0>
2899095552 2899095552 2899095552 0 2899132416 78643   0 <27523600/    0>

per-cpu:

3221225472 3221225472 3221225472 0 3221229568     0   0 < 2752460/ 8180>
3221225472 3221225472 3221225472 0 3221233664     0   0 < 5504124/10523>
3221225472 2912157607 2952802304 0 2952814592     0 117 < 8253369/ 9451>
3221225472 2893479936 2899120128 0 2899136512     0 129 <11003056/ 9395>
3221217280 2886040814 2899103744 0 2899128320     0  51 <13748450/10879>
3221225472 2880045056 2885693440 0 2885718016     0 126 <16503120/10300>
3221213184 2877431364 2883772416 0 2883801088     0 252 <19259891/10509>
3221225472 2873229312 2876100608 0 2876133376     0  14 <22016512/11081>
3221213184 2870728008 2871693312 0 2871730176     0  54 <24768909/10770>
2899095552 2899095552 2899095552 0 2899136512 78643   0 <27523600/10231>


mem-hogger pre-fault times

8 streams (base kernel):

[431] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f3f5d38a010 <+   6.031550428>
[470] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7fa29d414010 <+   5.242295692>
[514] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f4a7eac8010 <+   5.485469454>
[563] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f07da76b010 <+   5.563647658>
[619] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7ff5efc26010 <+   5.516866208>
[681] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f8fb896d010 <+   5.535275748>
[751] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7fb2ac6fa010 <+   4.594626366>
[825] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f355f9a0010 <+   5.075849029>
[905] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7feb16715010 <+   4.696363680>
[991] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f3a1b9f4010 <+   5.292365453>


per-cpu:

[413] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7fe8058f5010 <+   5.513944292>
[451] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f65fe753010 <+   4.742384977>
[494] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7fb99a05c010 <+   5.394711696>
[542] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f0d61c81010 <+   5.021011664>
[598] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f9abdeb6010 <+   5.094722019>
[660] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7fb192ae9010 <+   4.943961060>
[728] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f7313aeb010 <+   5.437872456>
[802] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f25ffdeb010 <+   5.422829590>
[881] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f60daa8e010 <+   4.806425351>
[970] single-alloc:  INFO: Allocated 0x1 bytes at address 0x7f384cf04010 <+   4.982513395>


so, pre-fault time range is somewhat big. for example, from 4.696363680 to 
6.031550428 seconds.



fio
        8 streams    per-cpu
===
#jobs1  
READ:   2507.8MB/s   2526.4MB/s
READ:   2043.1MB/s   1970.6MB/s
WRITE:  127100KB/s   139160KB/s
WRITE:  724488KB/s   733440KB/s
READ:   534624KB/s   540967KB/s
WRITE:  534569KB/s   540912KB/s
READ:   471165KB/s   477459KB/s
WRITE:  471233KB/s   477527KB/s
#jobs2

Re: [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards.

2016-04-27 Thread Enric Balletbo i Serra



On 27/04/16 10:35, Jaehoon Chung wrote:
> On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
>> Hi,
>>
>> I introduced the cover letter to give some background about this.
>>
>> I have been investigating a problem related to at least one specific sdcard 
>> when
>> UHS-I is set. The card is not detected due the tuning phase reports a
>> failure. Since the problem is only reproduced with a single model of a single
>> brand of card, it is probably a card firmware issue, but the card works fine
>> on my laptop.
> 
> I think you have analyzed many case..of course..it was successful to switch 
> voltage, right?
> Maybe this patch too old..so can you remember which specific sdcard is 
> produced?
> 

Yes, switching the voltage was successful. The specific card is a UNIREX 16GB
Class 10 SD card (compatible with UHS-1).

>>
>> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
>> Akhtar found that this produced randomly a hung task on Peach-pi. I can 
>> confirm
>> that it's easy to reproduce the hung task, either, with cold boots or 
>> suspend to
>> ram tests.
> 
> Yep..I have already tested and checked for this.
> 
>>
>> I tried to fix both problems (the original issue and the one introduced by 
>> the
>> patch) in different ways, but I ended thinking that this second proposal is 
>> the
>> most simple that solves both issues. So let's try to fix this by handling the
>> response CRC error slightly differently when tuning command is happening.
>>
>> I tested the patch on both platforms, exynos and rockchip. I did lots of
>> tests and at the moment the patch seems to fix the rockchip issue and doesn't
>> hang on exynos. I'll continue testing while we discuss it.
>>
>> I think the patch, at least, needs the Doug's approval (as he dig into the 
>> issue
>> before) and the Tested-by Alim. So will be good if you have a slot of time to
>> look a bit into this.
>>
>> Thanks in advance.
>>  Enric
>>
>> [1] https://lkml.org/lkml/2015/5/18/495
>>
>> Changelog since v1:
>> - Fix the issue found by Alim with exynos letting the data transfer
>>   take place only when MMC_SEND_TUNING_BLOCK is issued.
>>
>> Doug Anderson (1):
>>   mmc: dw_mmc: Wait for data transfer after response errors.
>>
>>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>
>

Re: [PATCH V4 10/18] coresight: tmc: getting rid of multiple read access

2016-04-27 Thread Suzuki K Poulose


On 26/04/16 23:10, Mathieu Poirier wrote:

> Allowing multiple readers to access the trace data simultaneously
> via sysfs provides no shortage of opportunities for race conditions,
> mandates that two variables be maintained (drvdata::read_count and
> drvdata::reading), makes the code complex and provides little
> advantage, if any.
>
> This patch streamlines the read process by restricting trace data
> access to a single user.  That way drvdata::read_count can
> be eliminated and race conditions (along with faulty error handling)
> in functions tmc_open() and tmc_release() eliminated.
>
> Signed-off-by: Mathieu Poirier


Nice!

Reviewed-by: Suzuki K Poulose 
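
For readers skimming the thread, the single-reader restriction can be pictured with a small user-space sketch. It is not the patch itself: trace_dev, trace_open() and trace_release() are hypothetical names, and a C11 atomic_flag is swapped in for whatever locking the driver uses around drvdata::reading.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical device state; the real driver keeps this in drvdata. */
struct trace_dev {
	atomic_flag reading;  /* set while one reader holds the device */
};

/* Returns 0 for the first opener, -EBUSY while someone else is reading. */
int trace_open(struct trace_dev *d)
{
	/* test_and_set returns the previous value: true means already taken. */
	if (atomic_flag_test_and_set(&d->reading))
		return -EBUSY;
	return 0;
}

/* Drops the claim so that the next open can succeed. */
void trace_release(struct trace_dev *d)
{
	atomic_flag_clear(&d->reading);
}
```

With a single flag there is no reader count to keep consistent, which is the simplification the commit message describes.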

Suzuki
