[PATCH v8] mtd: spi-nor: add hisilicon spi-nor flash controller driver

2016-03-10 Thread Jiancheng Xue
Add a HiSilicon SPI-NOR flash controller driver.

Signed-off-by: Binquan Peng 
Signed-off-by: Jiancheng Xue 
Acked-by: Rob Herring 
Reviewed-by: Ezequiel Garcia 
---
change log
v8:
Fixed issues pointed out by Ezequiel Garcia and Brian Norris.
Moved the DT binding file to the mtd directory.
Made the compatible string more specific.
v7:
Rebased to v4.5-rc3.
Fixed issues pointed out by Ezequiel Garcia.
v6:
Rebased to v4.5-rc2.
Fixed issues pointed out by Ezequiel Garcia.
v5:
Fixed a compile error.
v4:
Rebased to v4.5-rc1.
v3:
Added a compatible string "hisilicon,hi3519-sfc".
v2:
Fixed some compile warnings.

 .../bindings/mtd/hisilicon,fmc-spi-nor.txt |  24 +
 drivers/mtd/spi-nor/Kconfig|   7 +
 drivers/mtd/spi-nor/Makefile   |   1 +
 drivers/mtd/spi-nor/hisi-sfc.c | 500 +
 4 files changed, 532 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/mtd/hisilicon,fmc-spi-nor.txt
 create mode 100644 drivers/mtd/spi-nor/hisi-sfc.c

diff --git a/Documentation/devicetree/bindings/mtd/hisilicon,fmc-spi-nor.txt 
b/Documentation/devicetree/bindings/mtd/hisilicon,fmc-spi-nor.txt
new file mode 100644
index 000..7498152
--- /dev/null
+++ b/Documentation/devicetree/bindings/mtd/hisilicon,fmc-spi-nor.txt
@@ -0,0 +1,24 @@
+HiSilicon SPI-NOR Flash Controller
+
+Required properties:
+- compatible : Should be "hisilicon,fmc-spi-nor" and one of the following strings:
+   "hisilicon,hi3519-spi-nor"
+- #address-cells : Should be 1.
+- #size-cells : Should be 0.
+- reg : Offset and length of the register set for the controller device.
+- reg-names : Must include the following two entries: "control", "memory".
+- clocks : phandle to the SPI-NOR flash controller clock.
+
+Example:
+spi-nor-controller@1000 {
+   compatible = "hisilicon,hi3519-spi-nor", "hisilicon,fmc-spi-nor";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x1000 0x1000>, <0x1400 0x100>;
+   reg-names = "control", "memory";
+   clocks = < HI3519_FMC_CLK>;
+   spi-nor@0 {
+   compatible = "jedec,spi-nor";
+   reg = <0>;
+   };
+};
diff --git a/drivers/mtd/spi-nor/Kconfig b/drivers/mtd/spi-nor/Kconfig
index 0dc9275..120624d 100644
--- a/drivers/mtd/spi-nor/Kconfig
+++ b/drivers/mtd/spi-nor/Kconfig
@@ -37,6 +37,13 @@ config SPI_FSL_QUADSPI
  This controller does not support generic SPI. It only supports
  SPI NOR.
 
+config SPI_HISI_SFC
+   tristate "Hisilicon SPI-NOR Flash Controller(SFC)"
+   depends on ARCH_HISI || COMPILE_TEST
+   depends on HAS_IOMEM
+   help
+ This enables support for the HiSilicon SPI-NOR flash controller.
+
 config SPI_NXP_SPIFI
tristate "NXP SPI Flash Interface (SPIFI)"
depends on OF && (ARCH_LPC18XX || COMPILE_TEST)
diff --git a/drivers/mtd/spi-nor/Makefile b/drivers/mtd/spi-nor/Makefile
index 0bf3a7f8..8a6fa69 100644
--- a/drivers/mtd/spi-nor/Makefile
+++ b/drivers/mtd/spi-nor/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_MTD_SPI_NOR)  += spi-nor.o
 obj-$(CONFIG_SPI_FSL_QUADSPI)  += fsl-quadspi.o
+obj-$(CONFIG_SPI_HISI_SFC) += hisi-sfc.o
 obj-$(CONFIG_MTD_MT81xx_NOR)   += mtk-quadspi.o
 obj-$(CONFIG_SPI_NXP_SPIFI)+= nxp-spifi.o
diff --git a/drivers/mtd/spi-nor/hisi-sfc.c b/drivers/mtd/spi-nor/hisi-sfc.c
new file mode 100644
index 000..be56904
--- /dev/null
+++ b/drivers/mtd/spi-nor/hisi-sfc.c
@@ -0,0 +1,500 @@
+/*
+ * HiSilicon SPI Nor Flash Controller Driver
+ *
+ * Copyright (c) 2015-2016 HiSilicon Technologies Co., Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Hardware register offsets and field definitions */
+#define FMC_CFG                     0x00
+#define SPI_NOR_ADDR_MODE           BIT(10)
+#define FMC_GLOBAL_CFG              0x04
+#define FMC_GLOBAL_CFG_WP_ENABLE    BIT(6)
+#define FMC_SPI_TIMING_CFG          0x08
+#define TIMING_CFG_TCSH(nr)         (((nr) & 0xf) << 8)
+#define TIMING_CFG_TCSS(nr)         (((nr) & 0xf) << 4)
+#define TIMING_CFG_TSHSL(nr)        ((nr) & 0xf)
+#define CS_HOLD_TIME                0x6
+#define CS_SETUP_TIME               0x6
+#define CS_DESELECT_TIME            0xf
+#define FMC_INT                     0x18

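For illustration only, a minimal sketch of how the chip-select timing values above would typically be combined into one FMC_SPI_TIMING_CFG write. The helper name and the ioremapped "regbase" pointer are assumptions for this example, not part of the patch:

#include <linux/io.h>

/* Sketch only: program the SPI timing register from the CS_* values.
 * "regbase" is assumed to be the ioremapped "control" register region.
 */
static void hisi_sfc_init_timing_sketch(void __iomem *regbase)
{
	u32 reg = TIMING_CFG_TCSH(CS_HOLD_TIME) |
		  TIMING_CFG_TCSS(CS_SETUP_TIME) |
		  TIMING_CFG_TSHSL(CS_DESELECT_TIME);

	writel(reg, regbase + FMC_SPI_TIMING_CFG);
}
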
Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage

2016-03-10 Thread Michal Hocko
On Thu 10-03-16 15:50:13, Johannes Weiner wrote:
> When setting memory.high below usage, nothing happens until the next
> charge comes along, and then it will only reclaim its own charge and
> not the now potentially huge excess of the new memory.high. This can
> cause groups to stay in excess of their memory.high indefinitely.
> 
> To fix that, when shrinking memory.high, kick off a reclaim cycle that
> goes after the delta.

This has been the case since the knob was introduced, but I wouldn't
bother with the CC: stable # 4.0+ as this was still in experimental
mode. I guess we want to have it in 4.5 or put it into 4.5 stable.

> 
> Signed-off-by: Johannes Weiner 

Acked-by: Michal Hocko 

Thanks!

> ---
>  mm/memcontrol.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8615b066b642..f7c9b4cbdf01 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4992,6 +4992,7 @@ static ssize_t memory_high_write(struct 
> kernfs_open_file *of,
>char *buf, size_t nbytes, loff_t off)
>  {
>   struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> + unsigned long nr_pages;
>   unsigned long high;
>   int err;
>  
> @@ -5002,6 +5003,11 @@ static ssize_t memory_high_write(struct 
> kernfs_open_file *of,
>  
>   memcg->high = high;
>  
> + nr_pages = page_counter_read(&memcg->memory);
> + if (nr_pages > high)
> + try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
> +  GFP_KERNEL, true);
> +
>   memcg_wb_domain_size_changed(memcg);
>   return nbytes;
>  }
> -- 
> 2.7.2

-- 
Michal Hocko
SUSE Labs
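
To make the user-visible effect concrete: after this change, lowering memory.high on a cgroup v2 group reclaims the group's usage down toward the new limit right away instead of waiting for the next charge. A minimal standalone illustration (the cgroup path is a placeholder and the group must already exist):

#include <stdio.h>

/* Illustration only: shrink memory.high for a placeholder cgroup v2
 * group and print memory.current afterwards.
 */
int main(void)
{
	const char *grp = "/sys/fs/cgroup/test";	/* placeholder path */
	char path[256], buf[64];
	FILE *f;

	snprintf(path, sizeof(path), "%s/memory.high", grp);
	f = fopen(path, "w");
	if (!f)
		return 1;
	fputs("104857600\n", f);	/* 100 MiB, written in bytes */
	fclose(f);

	snprintf(path, sizeof(path), "%s/memory.current", grp);
	f = fopen(path, "r");
	if (!f)
		return 1;
	if (fgets(buf, sizeof(buf), f))
		printf("memory.current after shrink: %s", buf);
	fclose(f);
	return 0;
}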


Re: [PATCH 1/2] [PATCH 1/2] Introduce new macros min_lt and max_lt for comparing with larger type

2016-03-10 Thread Dave Young
Hi, Jianyu

On 03/11/16 at 03:19pm, Jianyu Zhan wrote:
> On Fri, Mar 11, 2016 at 2:21 PM,   wrote:
> > A useful use case for min_t and max_t is comparing two values where one has
> > a larger type. For example, when comparing a u64 and a u32, usually we do not
> > want to truncate the u64, so we need to use min_t or max_t with u64.
> >
> > To simplify the usage, introduce two more macros, min_lt and max_lt;
> > 'lt' means larger type.
> >
> > Signed-off-by: Dave Young 
> > ---
> >  include/linux/kernel.h |   13 +
> >  1 file changed, 13 insertions(+)
> >
> > --- linux.orig/include/linux/kernel.h
> > +++ linux/include/linux/kernel.h
> > @@ -798,6 +798,19 @@ static inline void ftrace_dump(enum ftra
> > type __max2 = (y);  \
> > __max1 > __max2 ? __max1: __max2; })
> >
> > +/*
> > + * use type of larger size in min_lt and max_lt
> > + */
> > +#define min_lt(x, y) ({\
> > +   int sx = sizeof(typeof(x)); \
> > +   int sy = sizeof(typeof(y)); \
> > +   sx > sy ? min_t(typeof(x), x, y) : min_t(typeof(y), x, y); })
> > +
> > +#define max_lt(x, y) ({\
> > +   int sx = sizeof(typeof(x)); \
> > +   int sy = sizeof(typeof(y)); \
> > +   sx > sy ? max_t(typeof(x), x, y) : max_t(typeof(y), x, y); })
> > +
> >  /**
> >   * clamp_t - return a value clamped to a given range using a given type
> >   * @type: the type of variable to use
> >
> >
> 
> No no!
> 
> The C standard defines "usual arithmetic conversions" rules[1], which
> decide the type promotion rules for binary operators.
> 
> The interfaces in this patch just bluntly override this rule to
> choose the bigger type size
> for the operation.  Most of the time it might work well, because most of the
> time the operands used in min_t()/max_t() in the Linux kernel
> have the same sign'ness and this rule works.
> 
> But if two operands have the same type size but different
> sign'ness, these interfaces will exhibit undefined behavior,
> i.e. you choose typeof(y) as the final type to use in the operation
> when they have the same type size, so it might be signed
> or unsigned, depending on the type of y.

Oops, brain dead, I obviously missed the case, it is definitely a problem.

> 
> So, in this /proc/fs/vmcore case you should rather just explicitly cast
> the operand to avoid truncation.

Sure, will resend a fix

Thanks
Dave
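
A small standalone illustration of the signedness pitfall described above, using a simplified userspace stand-in for min_t(): with equal-sized but differently signed operands, whichever type the macro picks silently decides whether the comparison is signed or unsigned.

#include <stdio.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's min_t(), for
 * demonstration only (uses GCC/clang statement expressions).
 */
#define min_t(type, x, y) ({ type __a = (x); type __b = (y); \
			     __a < __b ? __a : __b; })

int main(void)
{
	int32_t  a = -1;
	uint32_t b = 5;

	/* Picking the unsigned type: -1 converts to 0xffffffff, so the
	 * "minimum" is 5. */
	printf("as uint32_t: %u\n", min_t(uint32_t, a, b));

	/* Picking the signed type gives -1, as one might expect. */
	printf("as int32_t:  %d\n", min_t(int32_t, a, b));

	return 0;
}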


[patch] drm/amdkfd: uninitialized variable in dbgdev_wave_control_set_registers()

2016-03-10 Thread Dan Carpenter
At the end of the function we expect "status" to be zero, but it's
either -EINVAL or uninitialized.

Fixes: 788bf83db301 ('drm/amdkfd: Add wave control operation to debugger')
Signed-off-by: Dan Carpenter 

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index c34c393..d5e19b5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -513,7 +513,7 @@ static int dbgdev_wave_control_set_registers(
union SQ_CMD_BITS *in_reg_sq_cmd,
union GRBM_GFX_INDEX_BITS *in_reg_gfx_index)
 {
-   int status;
+   int status = 0;
union SQ_CMD_BITS reg_sq_cmd;
union GRBM_GFX_INDEX_BITS reg_gfx_index;
struct HsaDbgWaveMsgAMDGen2 *pMsg;


[PATCH 5/5] hwrng: exynos - Disable runtime PM on driver unbind

2016-03-10 Thread Krzysztof Kozlowski
The driver enabled runtime PM but did not revert this on removal. Re-binding
the device triggered a warning:
exynos-rng 10830400.rng: Unbalanced pm_runtime_enable!

Fixes: b329669ea0b5 ("hwrng: exynos - Add support for Exynos random number 
generator")
Signed-off-by: Krzysztof Kozlowski 
---
 drivers/char/hw_random/exynos-rng.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/char/hw_random/exynos-rng.c 
b/drivers/char/hw_random/exynos-rng.c
index 68c349bf66a0..cba1ff538c46 100644
--- a/drivers/char/hw_random/exynos-rng.c
+++ b/drivers/char/hw_random/exynos-rng.c
@@ -154,6 +154,13 @@ static int exynos_rng_probe(struct platform_device *pdev)
return ret;
 }
 
+static int exynos_rng_remove(struct platform_device *pdev)
+{
+   pm_runtime_disable(&pdev->dev);
+
+   return 0;
+}
+
 static int __maybe_unused exynos_rng_runtime_suspend(struct device *dev)
 {
struct platform_device *pdev = to_platform_device(dev);
@@ -211,6 +218,7 @@ static struct platform_driver exynos_rng_driver = {
.of_match_table = exynos_rng_dt_match,
},
.probe  = exynos_rng_probe,
+   .remove = exynos_rng_remove,
 };
 
 module_platform_driver(exynos_rng_driver);
-- 
2.5.0



[PATCH 2/5] hwrng: exynos - Runtime suspend device after init

2016-03-10 Thread Krzysztof Kozlowski
The driver uses pm_runtime_put_noidle() after initialization, so the
device might remain in an active state if the core does not read from it
(the read callback contains a regular runtime put). The put_noidle() was
probably chosen to avoid an unneeded suspend and resume cycle after the
initialization.

However, autosuspend is enabled for this purpose, so it is safe to do a
runtime put just after the initialization.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/char/hw_random/exynos-rng.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/exynos-rng.c 
b/drivers/char/hw_random/exynos-rng.c
index ada081232528..d1fd21e99368 100644
--- a/drivers/char/hw_random/exynos-rng.c
+++ b/drivers/char/hw_random/exynos-rng.c
@@ -77,7 +77,8 @@ static int exynos_init(struct hwrng *rng)
 
pm_runtime_get_sync(exynos_rng->dev);
ret = exynos_rng_configure(exynos_rng);
-   pm_runtime_put_noidle(exynos_rng->dev);
+   pm_runtime_mark_last_busy(exynos_rng->dev);
+   pm_runtime_put_autosuspend(exynos_rng->dev);
 
return ret;
 }
-- 
2.5.0



[PATCH 4/5] hwrng: exynos - Disable runtime PM on probe failure

2016-03-10 Thread Krzysztof Kozlowski
Add a proper error path (disabling runtime PM) when registering the
hwrng fails.

Fixes: b329669ea0b5 ("hwrng: exynos - Add support for Exynos random number 
generator")
Signed-off-by: Krzysztof Kozlowski 
---
 drivers/char/hw_random/exynos-rng.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/exynos-rng.c 
b/drivers/char/hw_random/exynos-rng.c
index 38b80f82ddd2..68c349bf66a0 100644
--- a/drivers/char/hw_random/exynos-rng.c
+++ b/drivers/char/hw_random/exynos-rng.c
@@ -119,6 +119,7 @@ static int exynos_rng_probe(struct platform_device *pdev)
 {
struct exynos_rng *exynos_rng;
struct resource *res;
+   int ret;
 
exynos_rng = devm_kzalloc(&pdev->dev, sizeof(struct exynos_rng),
GFP_KERNEL);
@@ -146,7 +147,11 @@ static int exynos_rng_probe(struct platform_device *pdev)
pm_runtime_use_autosuspend(&pdev->dev);
pm_runtime_enable(&pdev->dev);
 
-   return devm_hwrng_register(&pdev->dev, &exynos_rng->rng);
+   ret = devm_hwrng_register(&pdev->dev, &exynos_rng->rng);
+   if (ret)
+   pm_runtime_disable(&pdev->dev);
+
+   return ret;
 }
 
 static int __maybe_unused exynos_rng_runtime_suspend(struct device *dev)
-- 
2.5.0



[PATCH 1/5] hwrng: exynos - Hide PM functions with __maybe_unused

2016-03-10 Thread Krzysztof Kozlowski
Replace the ifdef with __maybe_unused to silence compiler warnings when
SUSPEND=n and PM=y:

drivers/char/hw_random/exynos-rng.c:166:12: warning: ‘exynos_rng_suspend’ 
defined but not used [-Wunused-function]
 static int exynos_rng_suspend(struct device *dev)
^
drivers/char/hw_random/exynos-rng.c:171:12: warning: ‘exynos_rng_resume’ 
defined but not used [-Wunused-function]
 static int exynos_rng_resume(struct device *dev)

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/char/hw_random/exynos-rng.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/char/hw_random/exynos-rng.c 
b/drivers/char/hw_random/exynos-rng.c
index 30cf4623184f..ada081232528 100644
--- a/drivers/char/hw_random/exynos-rng.c
+++ b/drivers/char/hw_random/exynos-rng.c
@@ -144,8 +144,7 @@ static int exynos_rng_probe(struct platform_device *pdev)
return devm_hwrng_register(&pdev->dev, &exynos_rng->rng);
 }
 
-#ifdef CONFIG_PM
-static int exynos_rng_runtime_suspend(struct device *dev)
+static int __maybe_unused exynos_rng_runtime_suspend(struct device *dev)
 {
struct platform_device *pdev = to_platform_device(dev);
struct exynos_rng *exynos_rng = platform_get_drvdata(pdev);
@@ -155,7 +154,7 @@ static int exynos_rng_runtime_suspend(struct device *dev)
return 0;
 }
 
-static int exynos_rng_runtime_resume(struct device *dev)
+static int __maybe_unused exynos_rng_runtime_resume(struct device *dev)
 {
struct platform_device *pdev = to_platform_device(dev);
struct exynos_rng *exynos_rng = platform_get_drvdata(pdev);
@@ -163,12 +162,12 @@ static int exynos_rng_runtime_resume(struct device *dev)
return clk_prepare_enable(exynos_rng->clk);
 }
 
-static int exynos_rng_suspend(struct device *dev)
+static int __maybe_unused exynos_rng_suspend(struct device *dev)
 {
return pm_runtime_force_suspend(dev);
 }
 
-static int exynos_rng_resume(struct device *dev)
+static int __maybe_unused exynos_rng_resume(struct device *dev)
 {
struct platform_device *pdev = to_platform_device(dev);
struct exynos_rng *exynos_rng = platform_get_drvdata(pdev);
@@ -180,7 +179,6 @@ static int exynos_rng_resume(struct device *dev)
 
return exynos_rng_configure(exynos_rng);
 }
-#endif
 
 static const struct dev_pm_ops exynos_rng_pm_ops = {
SET_SYSTEM_SLEEP_PM_OPS(exynos_rng_suspend, exynos_rng_resume)
-- 
2.5.0



[PATCH 3/5] hwrng: exynos - Fix unbalanced PM runtime put on timeout error path

2016-03-10 Thread Krzysztof Kozlowski
In case of a timeout during a read operation, the exit path lacked a
runtime PM put. This could lead to an unbalanced runtime PM usage counter,
thus leaving the device in an active state.

Fixes: d7fd6075a205 ("hwrng: exynos - Add timeout for waiting on init done")
Cc:  # v4.4+
Signed-off-by: Krzysztof Kozlowski 
---
 drivers/char/hw_random/exynos-rng.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/exynos-rng.c 
b/drivers/char/hw_random/exynos-rng.c
index d1fd21e99368..38b80f82ddd2 100644
--- a/drivers/char/hw_random/exynos-rng.c
+++ b/drivers/char/hw_random/exynos-rng.c
@@ -90,6 +90,7 @@ static int exynos_read(struct hwrng *rng, void *buf,
struct exynos_rng, rng);
u32 *data = buf;
int retry = 100;
+   int ret = 4;
 
pm_runtime_get_sync(exynos_rng->dev);
 
@@ -98,17 +99,20 @@ static int exynos_read(struct hwrng *rng, void *buf,
while (!(exynos_rng_readl(exynos_rng,
EXYNOS_PRNG_STATUS_OFFSET) & PRNG_DONE) && --retry)
cpu_relax();
-   if (!retry)
-   return -ETIMEDOUT;
+   if (!retry) {
+   ret = -ETIMEDOUT;
+   goto out;
+   }
 
exynos_rng_writel(exynos_rng, PRNG_DONE, EXYNOS_PRNG_STATUS_OFFSET);
 
*data = exynos_rng_readl(exynos_rng, EXYNOS_PRNG_OUT1_OFFSET);
 
+out:
pm_runtime_mark_last_busy(exynos_rng->dev);
pm_runtime_put_sync_autosuspend(exynos_rng->dev);
 
-   return 4;
+   return ret;
 }
 
 static int exynos_rng_probe(struct platform_device *pdev)
-- 
2.5.0
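
The pattern this series restores in each of these paths is the same: every pm_runtime_get_sync() is matched by a put on every exit path, with the last-busy timestamp updated before the autosuspend put. A generic sketch with placeholder names (my_dev, my_hw_ready and my_hw_read are assumptions, not the exynos-rng API):

#include <linux/errno.h>
#include <linux/pm_runtime.h>

/* Placeholder device type and hardware helpers for this sketch only. */
struct my_dev {
	struct device *dev;
};
extern bool my_hw_ready(struct my_dev *md);
extern u32 my_hw_read(struct my_dev *md);

/* Balanced runtime-PM I/O path with autosuspend: the usage counter is
 * dropped on the timeout path as well as on success.
 */
static int my_read_one_word(struct my_dev *md, u32 *out)
{
	int retry = 100;
	int ret = 0;

	pm_runtime_get_sync(md->dev);		/* usage count +1 */

	while (!my_hw_ready(md) && --retry)
		cpu_relax();
	if (!retry) {
		ret = -ETIMEDOUT;		/* do not return here...  */
		goto out;			/* ...or the count leaks  */
	}

	*out = my_hw_read(md);
out:
	pm_runtime_mark_last_busy(md->dev);
	pm_runtime_put_sync_autosuspend(md->dev);	/* usage count -1 */
	return ret;
}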



Re: [PATCH 2/2] [PATCH 2/2] proc-vmcore: wrong data type casting fix

2016-03-10 Thread Dave Young
Hi, Minfei

On 03/11/16 at 03:19pm, Minfei Huang wrote:
> On 03/11/16 at 02:21pm, dyo...@redhat.com wrote:
> > @@ -231,7 +231,8 @@ static ssize_t __read_vmcore(char *buffe
> >  
> > list_for_each_entry(m, &vmcore_list, list) {
> > if (*fpos < m->offset + m->size) {
> > -   tsz = min_t(size_t, m->offset + m->size - *fpos, 
> > buflen);
> > +   tsz = (size_t)min_lt(m->offset + m->size - *fpos,
> > +   buflen);
> > start = m->paddr + *fpos - m->offset;
> > tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
> > if (tmp < 0)
> > @@ -461,7 +462,7 @@ static int mmap_vmcore(struct file *file
> > if (start < m->offset + m->size) {
> > u64 paddr = 0;
> >  
> > -   tsz = min_t(size_t, m->offset + m->size - start, size);
> > +   tsz = (size_t)min_lt(m->offset + m->size - start, size);
> > paddr = m->paddr + start - m->offset;
> > if (vmcore_remap_oldmem_pfn(vma, vma->vm_start + len,
> > paddr >> PAGE_SHIFT, tsz,
> 
> Hi, Dave.
> 
> Seems the previous parameter is unsigned long long, and the latter one is
> size_t. The size of both these types doesn't change at run time, so why
> not use min_t(unsigned long long, a, b) instead?

I just want a common macro so it can benefit other users: we can simply
use it without knowing the type details and also avoid similar bugs.

Thanks
Dave


Re: [PATCH] dmaengine: pl330: Fix some race conditions in residue calculation

2016-03-10 Thread Vinod Koul
On Tue, Mar 08, 2016 at 03:50:41PM +, Jon Medhurst (Tixy) wrote:

> > The residue is requested for "a descriptor". For example if you have prepared
> > two descriptors A and B and submitted them, then you can request status and
> > residue for A and you need to calculate that for A only and not take into
> > account status of B
> 
> But, in the case of the pl330 driver, A and B may each consist of
> multiple internal/hidden descriptors. So the residue calculation has to
> sum up the residue of all these internal/hidden descriptors as well.
> This is what the current pl330_tx_status() function does, but has bugs.
> 
> I've only just managed to clearly understand all the above details
> whilst writing this email, and this confusion obviously means the code
> and any commit messages need to explain things better.

Okay, I re-looked at the code; the driver is creating a descriptor per period,
so yes, while calculating, that should be taken into account, but only for the
cyclic case and not for the rest, as it will break residue values for them.

-- 
~Vinod


CLIENT DATABASES! Tel\Viber\Whatsapp: +79133913837 Email: mgordee...@gmail.com Skype: prodawez389

2016-03-10 Thread linux-kernel@vger.kernel.org
CLIENT DATABASES!

We will gather an internet database of potential clients
for your business!
A lot! Fast! Inexpensive!
Find out more about it via
Tel: +79133913837
Viber: +79133913837
Whatsapp: +79133913837
Skype: prodawez389
Email: mgordee...@gmail.com


Re: [PATCH] Documentation: Howto: Fixed subtitles style

2016-03-10 Thread Philippe Loctaux
On Thu, Mar 10, 2016 at 09:55:25PM -0700, Jonathan Corbet wrote:
> Not on kernel.org.  From MAINTAINERS:
> 
>   T:  git git://git.lwn.net/linux.git docs-next

Alright, thanks :)

--
Philippe Loctaux 


[PATCH v1 12/19] zsmalloc: move struct zs_meta from mapping to freelist

2016-03-10 Thread Minchan Kim
To support migration from the VM, we need to have an address_space
on every page, so zsmalloc shouldn't use page->mapping. So,
this patch moves zs_meta from mapping to freelist.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e23cd3b2dd71..bfc6a048afac 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -29,7 +29,7 @@
  * Look at size_class->huge.
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
- * page->mapping: override by struct zs_meta
+ * page->freelist: override by struct zs_meta
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -419,7 +419,7 @@ static int get_zspage_inuse(struct page *first_page)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
 
return m->inuse;
 }
@@ -430,7 +430,7 @@ static void set_zspage_inuse(struct page *first_page, int 
val)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
m->inuse = val;
 }
 
@@ -440,7 +440,7 @@ static void mod_zspage_inuse(struct page *first_page, int 
val)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
m->inuse += val;
 }
 
@@ -450,7 +450,7 @@ static void set_freeobj(struct page *first_page, int idx)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
m->freeobj = idx;
 }
 
@@ -460,7 +460,7 @@ static unsigned long get_freeobj(struct page *first_page)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
return m->freeobj;
 }
 
@@ -472,7 +472,7 @@ static void get_zspage_mapping(struct page *first_page,
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
*fullness = m->fullness;
*class_idx = m->class;
 }
@@ -485,7 +485,7 @@ static void set_zspage_mapping(struct page *first_page,
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)&first_page->mapping;
+   m = (struct zs_meta *)&first_page->freelist;
m->fullness = fullness;
m->class = class_idx;
 }
@@ -941,7 +941,7 @@ static void reset_page(struct page *page)
clear_bit(PG_private, &page->flags);
clear_bit(PG_private_2, &page->flags);
set_page_private(page, 0);
-   page->mapping = NULL;
+   page->freelist = NULL;
page_mapcount_reset(page);
 }
 
@@ -1051,6 +1051,7 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
 
INIT_LIST_HEAD(&page->lru);
if (i == 0) {   /* first page */
+   page->freelist = NULL;
SetPagePrivate(page);
set_page_private(page, 0);
first_page = page;
@@ -2066,9 +2067,9 @@ static int __init zs_init(void)
 
/*
 * A zspage's a free object index, class index, fullness group,
-* inuse object count are encoded in its (first)page->mapping
+* inuse object count are encoded in its (first)page->freelist
 * so sizeof(struct zs_meta) should be less than
-* sizeof(page->mapping(i.e., unsigned long)).
+* sizeof(page->freelist(i.e., void *)).
 */
BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
 
-- 
1.9.1
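
For reference, all of the metadata being relocated here packs into a single word. A rough sketch of struct zs_meta, inferred from the accessors in the diff (the field names match get/set_freeobj, get/set_zspage_mapping and friends; the bit widths below are illustrative assumptions, not copied from the patch):

/* Illustrative only: must overlay a single (first)page->freelist word. */
struct zs_meta {
	unsigned long freeobj:11;	/* index of the first free object */
	unsigned long class:8;		/* size class index */
	unsigned long fullness:2;	/* fullness group */
	unsigned long inuse:11;		/* objects currently allocated */
};

_Static_assert(sizeof(struct zs_meta) <= sizeof(unsigned long),
	       "zs_meta must fit in page->freelist");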



[PATCH v1 04/19] mm/balloon: use general movable page feature into balloon

2016-03-10 Thread Minchan Kim
Now the VM has a feature to migrate non-LRU movable pages, so the
balloon doesn't need custom migration hooks in migrate.c
and compact.c. Instead, this patch implements the page->mapping
->{isolate|migrate|putback} functions.

With that, we can remove the ballooning hooks from the general
migration functions and make balloon compaction simple.

Cc: virtualizat...@lists.linux-foundation.org
Cc: Rafael Aquini 
Cc: Konstantin Khlebnikov 
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 drivers/virtio/virtio_balloon.c|   4 ++
 include/linux/balloon_compaction.h |  47 -
 include/linux/page-flags.h |  53 +++
 mm/balloon_compaction.c| 101 -
 mm/compaction.c|   7 ---
 mm/migrate.c   |  22 ++--
 mm/vmscan.c|   2 +-
 7 files changed, 73 insertions(+), 163 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0c3691f46575..30a1ea31bef4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -476,6 +477,7 @@ static int virtballoon_migratepage(struct balloon_dev_info 
*vb_dev_info,
 
mutex_unlock(&vb->balloon_lock);
 
+   ClearPageIsolated(page);
put_page(page); /* balloon reference */
 
return MIGRATEPAGE_SUCCESS;
@@ -509,6 +511,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
balloon_devinfo_init(&vb->vb_dev_info);
 #ifdef CONFIG_BALLOON_COMPACTION
vb->vb_dev_info.migratepage = virtballoon_migratepage;
+   vb->vb_dev_info.inode = anon_inode_new();
+   vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
 #endif
 
err = init_vqs(vb);
diff --git a/include/linux/balloon_compaction.h 
b/include/linux/balloon_compaction.h
index 9b0a15d06a4f..43a858545844 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device information descriptor.
@@ -62,6 +63,7 @@ struct balloon_dev_info {
struct list_head pages; /* Pages enqueued & handled to Host */
int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
struct page *page, enum migrate_mode mode);
+   struct inode *inode;
 };
 
 extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
@@ -73,45 +75,19 @@ static inline void balloon_devinfo_init(struct 
balloon_dev_info *balloon)
spin_lock_init(&balloon->pages_lock);
INIT_LIST_HEAD(&balloon->pages);
balloon->migratepage = NULL;
+   balloon->inode = NULL;
 }
 
 #ifdef CONFIG_BALLOON_COMPACTION
-extern bool balloon_page_isolate(struct page *page);
+extern const struct address_space_operations balloon_aops;
+extern bool balloon_page_isolate(struct page *page,
+   isolate_mode_t mode);
 extern void balloon_page_putback(struct page *page);
-extern int balloon_page_migrate(struct page *newpage,
+extern int balloon_page_migrate(struct address_space *mapping,
+   struct page *newpage,
struct page *page, enum migrate_mode mode);
 
 /*
- * __is_movable_balloon_page - helper to perform @page PageBalloon tests
- */
-static inline bool __is_movable_balloon_page(struct page *page)
-{
-   return PageBalloon(page);
-}
-
-/*
- * balloon_page_movable - test PageBalloon to identify balloon pages
- *   and PagePrivate to check that the page is not
- *   isolated and can be moved by compaction/migration.
- *
- * As we might return false positives in the case of a balloon page being just
- * released under us, this need to be re-tested later, under the page lock.
- */
-static inline bool balloon_page_movable(struct page *page)
-{
-   return PageBalloon(page) && PagePrivate(page);
-}
-
-/*
- * isolated_balloon_page - identify an isolated balloon page on private
- *compaction/migration page lists.
- */
-static inline bool isolated_balloon_page(struct page *page)
-{
-   return PageBalloon(page);
-}
-
-/*
  * balloon_page_insert - insert a page into the balloon's page list and make
  *  the page->private assignment accordingly.
  * @balloon : pointer to balloon device
@@ -123,8 +99,8 @@ static inline bool isolated_balloon_page(struct page *page)
 static inline void balloon_page_insert(struct balloon_dev_info *balloon,
   struct page *page)
 {
+   page->mapping = balloon->inode->i_mapping;
__SetPageBalloon(page);
-   SetPagePrivate(page);
set_page_private(page, (unsigned long)balloon);
list_add(&page->lru, 

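A sketch of how the isolate/migrate/putback callbacks described above could be wired through the page's address_space. The .isolate_page and .putback_page field names are assumptions, not visible in this diff; the signatures follow the declarations shown earlier in the patch:

/* Sketch only: assumed wiring of the balloon movable-page callbacks. */
static const struct address_space_operations balloon_aops_sketch = {
	.migratepage	= balloon_page_migrate,
	.isolate_page	= balloon_page_isolate,
	.putback_page	= balloon_page_putback,
};
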
[PATCH v1 08/19] zsmalloc: remove unused pool param in obj_free

2016-03-10 Thread Minchan Kim
Let's remove the unused pool param in obj_free().

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 156edf909046..b4fb11831acb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1435,8 +1435,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 }
 EXPORT_SYMBOL_GPL(zs_malloc);
 
-static void obj_free(struct zs_pool *pool, struct size_class *class,
-   unsigned long obj)
+static void obj_free(struct size_class *class, unsigned long obj)
 {
struct link_free *link;
struct page *first_page, *f_page;
@@ -1482,7 +1481,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
class = pool->size_class[class_idx];
 
spin_lock(&class->lock);
-   obj_free(pool, class, obj);
+   obj_free(class, obj);
fullness = fix_fullness_group(class, first_page);
if (fullness == ZS_EMPTY) {
zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
@@ -1645,7 +1644,7 @@ static int migrate_zspage(struct zs_pool *pool, struct 
size_class *class,
free_obj |= BIT(HANDLE_PIN_BIT);
record_obj(handle, free_obj);
unpin_tag(handle);
-   obj_free(pool, class, used_obj);
+   obj_free(class, used_obj);
}
 
/* Remember last position in this iteration */
-- 
1.9.1



[PATCH v1 15/19] zsmalloc: zs_compact refactoring

2016-03-10 Thread Minchan Kim
Currently, we rely on class->lock to prevent zspage destruction.
It was okay until now because the critical section is short, but
with run-time migration it could be long, so class->lock is not
a good approach any more.

So, this patch introduces [un]freeze_zspage functions, which
freeze the allocated objects in the zspage with a pinning tag so the
user cannot free an in-use object. With those functions, this patch
redesigns compaction.

Those functions will be used for implementing zspage runtime
migration, too.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 393 ++
 1 file changed, 257 insertions(+), 136 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 49ae6531b7ad..43ab16affa68 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -917,6 +917,13 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
return *(unsigned long *)obj;
 }
 
+static inline int testpin_tag(unsigned long handle)
+{
+   unsigned long *ptr = (unsigned long *)handle;
+
+   return test_bit(HANDLE_PIN_BIT, ptr);
+}
+
 static inline int trypin_tag(unsigned long handle)
 {
unsigned long *ptr = (unsigned long *)handle;
@@ -945,8 +952,7 @@ static void reset_page(struct page *page)
page_mapcount_reset(page);
 }
 
-static void free_zspage(struct zs_pool *pool, struct size_class *class,
-   struct page *first_page)
+static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
 
@@ -969,11 +975,6 @@ static void free_zspage(struct zs_pool *pool, struct 
size_class *class,
}
reset_page(head_extra);
__free_page(head_extra);
-
-   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-   class->size, class->pages_per_zspage));
-   atomic_long_sub(class->pages_per_zspage,
-   &pool->pages_allocated);
 }
 
 /* Initialize a newly allocated zspage */
@@ -1323,6 +1324,11 @@ static bool zspage_full(struct size_class *class, struct 
page *first_page)
return get_zspage_inuse(first_page) == class->objs_per_zspage;
 }
 
+static bool zspage_empty(struct size_class *class, struct page *first_page)
+{
+   return get_zspage_inuse(first_page) == 0;
+}
+
 unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
return atomic_long_read(&pool->pages_allocated);
@@ -1453,7 +1459,6 @@ static unsigned long obj_malloc(struct size_class *class,
set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
kunmap_atomic(vaddr);
mod_zspage_inuse(first_page, 1);
-   zs_stat_inc(class, OBJ_USED, 1);
 
obj = location_to_obj(m_page, obj);
 
@@ -1508,6 +1513,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
}
 
obj = obj_malloc(class, first_page, handle);
+   zs_stat_inc(class, OBJ_USED, 1);
/* Now move the zspage to another fullness group, if required */
fix_fullness_group(class, first_page);
record_obj(handle, obj);
@@ -1538,7 +1544,6 @@ static void obj_free(struct size_class *class, unsigned 
long obj)
kunmap_atomic(vaddr);
set_freeobj(first_page, f_objidx);
mod_zspage_inuse(first_page, -1);
-   zs_stat_dec(class, OBJ_USED, 1);
 }
 
 void zs_free(struct zs_pool *pool, unsigned long handle)
@@ -1562,10 +1567,19 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 
spin_lock(&class->lock);
obj_free(class, obj);
+   zs_stat_dec(class, OBJ_USED, 1);
fullness = fix_fullness_group(class, first_page);
-   if (fullness == ZS_EMPTY)
-   free_zspage(pool, class, first_page);
+   if (fullness == ZS_EMPTY) {
+   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+   class->size, class->pages_per_zspage));
+   spin_unlock(&class->lock);
+   atomic_long_sub(class->pages_per_zspage,
+   &pool->pages_allocated);
+   free_zspage(pool, first_page);
+   goto out;
+   }
spin_unlock(&class->lock);
+out:
unpin_tag(handle);
 
free_handle(pool, handle);
@@ -1635,127 +1649,66 @@ static void zs_object_copy(struct size_class *class, 
unsigned long dst,
kunmap_atomic(s_addr);
 }
 
-/*
- * Find alloced object in zspage from index object and
- * return handle.
- */
-static unsigned long find_alloced_obj(struct size_class *class,
-   struct page *page, int index)
+static unsigned long handle_from_obj(struct size_class *class,
+   struct page *first_page, int obj_idx)
 {
-   unsigned long head;
-   int offset = 0;
-   unsigned long handle = 0;
-   void *addr = kmap_atomic(page);
-
-   if (!is_first_page(page))
-   offset = page->index;
-   offset += class->size * index;
-
-   while 

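A conceptual sketch of the freeze step described in the changelog, reusing handle_from_obj() and trypin_tag() from this diff. This is an illustration of the idea, not the patch's freeze_zspage():

/* Illustration only: freeze a zspage by pinning every allocated
 * object's handle (HANDLE_PIN_BIT) so a concurrent zs_free() on that
 * object blocks until the zspage is unfrozen again.
 */
static void freeze_zspage_sketch(struct size_class *class,
				 struct page *first_page)
{
	int obj_idx;

	for (obj_idx = 0; obj_idx < class->objs_per_zspage; obj_idx++) {
		unsigned long handle;

		handle = handle_from_obj(class, first_page, obj_idx);
		if (handle)
			while (!trypin_tag(handle))
				cpu_relax();
	}
}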
[PATCH v1 08/19] zsmalloc: remove unused pool param in obj_free

2016-03-10 Thread Minchan Kim
Let's remove unused pool param in obj_free

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 156edf909046..b4fb11831acb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1435,8 +1435,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 }
 EXPORT_SYMBOL_GPL(zs_malloc);
 
-static void obj_free(struct zs_pool *pool, struct size_class *class,
-   unsigned long obj)
+static void obj_free(struct size_class *class, unsigned long obj)
 {
struct link_free *link;
struct page *first_page, *f_page;
@@ -1482,7 +1481,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
class = pool->size_class[class_idx];
 
spin_lock(>lock);
-   obj_free(pool, class, obj);
+   obj_free(class, obj);
fullness = fix_fullness_group(class, first_page);
if (fullness == ZS_EMPTY) {
zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
@@ -1645,7 +1644,7 @@ static int migrate_zspage(struct zs_pool *pool, struct 
size_class *class,
free_obj |= BIT(HANDLE_PIN_BIT);
record_obj(handle, free_obj);
unpin_tag(handle);
-   obj_free(pool, class, used_obj);
+   obj_free(class, used_obj);
}
 
/* Remember last position in this iteration */
-- 
1.9.1



[PATCH v1 15/19] zsmalloc: zs_compact refactoring

2016-03-10 Thread Minchan Kim
Currently, we rely on class->lock to prevent zspage destruction.
It was okay until now because the critical section is short but
with run-time migration, it could be long so class->lock is not
a good apporach any more.

So, this patch introduces [un]freeze_zspage functions which
freeze allocated objects in the zspage with pinning tag so
user cannot free using object. With those functions, this patch
redesign compaction.

Those functions will be used for implementing zspage runtime
migrations, too.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 393 ++
 1 file changed, 257 insertions(+), 136 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 49ae6531b7ad..43ab16affa68 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -917,6 +917,13 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
return *(unsigned long *)obj;
 }
 
+static inline int testpin_tag(unsigned long handle)
+{
+   unsigned long *ptr = (unsigned long *)handle;
+
+   return test_bit(HANDLE_PIN_BIT, ptr);
+}
+
 static inline int trypin_tag(unsigned long handle)
 {
unsigned long *ptr = (unsigned long *)handle;
@@ -945,8 +952,7 @@ static void reset_page(struct page *page)
page_mapcount_reset(page);
 }
 
-static void free_zspage(struct zs_pool *pool, struct size_class *class,
-   struct page *first_page)
+static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
 
@@ -969,11 +975,6 @@ static void free_zspage(struct zs_pool *pool, struct 
size_class *class,
}
reset_page(head_extra);
__free_page(head_extra);
-
-   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-   class->size, class->pages_per_zspage));
-   atomic_long_sub(class->pages_per_zspage,
-   >pages_allocated);
 }
 
 /* Initialize a newly allocated zspage */
@@ -1323,6 +1324,11 @@ static bool zspage_full(struct size_class *class, struct 
page *first_page)
return get_zspage_inuse(first_page) == class->objs_per_zspage;
 }
 
+static bool zspage_empty(struct size_class *class, struct page *first_page)
+{
+   return get_zspage_inuse(first_page) == 0;
+}
+
 unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
return atomic_long_read(>pages_allocated);
@@ -1453,7 +1459,6 @@ static unsigned long obj_malloc(struct size_class *class,
set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
kunmap_atomic(vaddr);
mod_zspage_inuse(first_page, 1);
-   zs_stat_inc(class, OBJ_USED, 1);
 
obj = location_to_obj(m_page, obj);
 
@@ -1508,6 +1513,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
}
 
obj = obj_malloc(class, first_page, handle);
+   zs_stat_inc(class, OBJ_USED, 1);
/* Now move the zspage to another fullness group, if required */
fix_fullness_group(class, first_page);
record_obj(handle, obj);
@@ -1538,7 +1544,6 @@ static void obj_free(struct size_class *class, unsigned 
long obj)
kunmap_atomic(vaddr);
set_freeobj(first_page, f_objidx);
mod_zspage_inuse(first_page, -1);
-   zs_stat_dec(class, OBJ_USED, 1);
 }
 
 void zs_free(struct zs_pool *pool, unsigned long handle)
@@ -1562,10 +1567,19 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 
spin_lock(>lock);
obj_free(class, obj);
+   zs_stat_dec(class, OBJ_USED, 1);
fullness = fix_fullness_group(class, first_page);
-   if (fullness == ZS_EMPTY)
-   free_zspage(pool, class, first_page);
+   if (fullness == ZS_EMPTY) {
+   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+   class->size, class->pages_per_zspage));
+   spin_unlock(>lock);
+   atomic_long_sub(class->pages_per_zspage,
+   >pages_allocated);
+   free_zspage(pool, first_page);
+   goto out;
+   }
spin_unlock(>lock);
+out:
unpin_tag(handle);
 
free_handle(pool, handle);
@@ -1635,127 +1649,66 @@ static void zs_object_copy(struct size_class *class, 
unsigned long dst,
kunmap_atomic(s_addr);
 }
 
-/*
- * Find alloced object in zspage from index object and
- * return handle.
- */
-static unsigned long find_alloced_obj(struct size_class *class,
-   struct page *page, int index)
+static unsigned long handle_from_obj(struct size_class *class,
+   struct page *first_page, int obj_idx)
 {
-   unsigned long head;
-   int offset = 0;
-   unsigned long handle = 0;
-   void *addr = kmap_atomic(page);
-
-   if (!is_first_page(page))
-   offset = page->index;
-   offset += class->size * index;
-
-   while (offset < 

[PATCH v1 12/19] zsmalloc: move struct zs_meta from mapping to freelist

2016-03-10 Thread Minchan Kim
To support migration from the VM, we need an address_space on every
page, so zsmalloc should not use page->mapping. This patch therefore
moves struct zs_meta from page->mapping to page->freelist.
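
(The mechanical part of the change is just re-pointing the cast in the
accessors from the address of page->mapping to the address of
page->freelist. A minimal user-space model of that accessor pattern is
below; the toy_* names and the bitfield widths are invented for
illustration, not the real FREEOBJ_BITS/CLASS_BITS values.)

struct toy_page {
        void *freelist;                 /* the struct page field being reused */
};

struct toy_zs_meta {
        unsigned long freeobj:11;       /* widths here are illustrative only */
        unsigned long class:8;
        unsigned long fullness:2;
        unsigned long inuse:11;
};

static int toy_get_inuse(struct toy_page *first_page)
{
        struct toy_zs_meta *m = (struct toy_zs_meta *)&first_page->freelist;

        return m->inuse;
}

static void toy_mod_inuse(struct toy_page *first_page, int val)
{
        struct toy_zs_meta *m = (struct toy_zs_meta *)&first_page->freelist;

        m->inuse += val;
}

/* same idea as the BUILD_BUG_ON below: the meta must fit in the field */
_Static_assert(sizeof(struct toy_zs_meta) <= sizeof(unsigned long),
               "zs_meta must fit in one word");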

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e23cd3b2dd71..bfc6a048afac 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -29,7 +29,7 @@
  * Look at size_class->huge.
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
- * page->mapping: override by struct zs_meta
+ * page->freelist: override by struct zs_meta
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -419,7 +419,7 @@ static int get_zspage_inuse(struct page *first_page)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)_page->mapping;
+   m = (struct zs_meta *)_page->freelist;
 
return m->inuse;
 }
@@ -430,7 +430,7 @@ static void set_zspage_inuse(struct page *first_page, int 
val)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)_page->mapping;
+   m = (struct zs_meta *)_page->freelist;
m->inuse = val;
 }
 
@@ -440,7 +440,7 @@ static void mod_zspage_inuse(struct page *first_page, int 
val)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)_page->mapping;
+   m = (struct zs_meta *)_page->freelist;
m->inuse += val;
 }
 
@@ -450,7 +450,7 @@ static void set_freeobj(struct page *first_page, int idx)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)_page->mapping;
+   m = (struct zs_meta *)_page->freelist;
m->freeobj = idx;
 }
 
@@ -460,7 +460,7 @@ static unsigned long get_freeobj(struct page *first_page)
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)_page->mapping;
+   m = (struct zs_meta *)_page->freelist;
return m->freeobj;
 }
 
@@ -472,7 +472,7 @@ static void get_zspage_mapping(struct page *first_page,
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)_page->mapping;
+   m = (struct zs_meta *)_page->freelist;
*fullness = m->fullness;
*class_idx = m->class;
 }
@@ -485,7 +485,7 @@ static void set_zspage_mapping(struct page *first_page,
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (struct zs_meta *)_page->mapping;
+   m = (struct zs_meta *)_page->freelist;
m->fullness = fullness;
m->class = class_idx;
 }
@@ -941,7 +941,7 @@ static void reset_page(struct page *page)
clear_bit(PG_private, >flags);
clear_bit(PG_private_2, >flags);
set_page_private(page, 0);
-   page->mapping = NULL;
+   page->freelist = NULL;
page_mapcount_reset(page);
 }
 
@@ -1051,6 +1051,7 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
 
INIT_LIST_HEAD(>lru);
if (i == 0) {   /* first page */
+   page->freelist = NULL;
SetPagePrivate(page);
set_page_private(page, 0);
first_page = page;
@@ -2066,9 +2067,9 @@ static int __init zs_init(void)
 
/*
 * A zspage's a free object index, class index, fullness group,
-* inuse object count are encoded in its (first)page->mapping
+* inuse object count are encoded in its (first)page->freelist
 * so sizeof(struct zs_meta) should be less than
-* sizeof(page->mapping(i.e., unsigned long)).
+* sizeof(page->freelist(i.e., void *)).
 */
BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
 
-- 
1.9.1



[PATCH v1 04/19] mm/balloon: use general movable page feature into balloon

2016-03-10 Thread Minchan Kim
Now that the VM has a feature to migrate non-LRU movable pages, the
balloon driver no longer needs custom migration hooks in migrate.c
and compact.c. Instead, this patch implements the
page->mapping->{isolate|migrate|putback} functions.

With that, we can remove the ballooning hooks from the generic
migration functions and make balloon compaction simple.
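
(To make the indirection concrete: instead of compaction calling
balloon-specific helpers, the VM only dereferences callbacks hung off
page->mapping. The toy_* types below are invented for illustration; the
real ones are the address_space_operations hooks this series teaches the
VM to call.)

#include <stddef.h>
#include <stdbool.h>

struct toy_page;

/* invented stand-in for the movable-page callbacks the VM invokes */
struct toy_movable_ops {
        bool (*isolate)(struct toy_page *page, int mode);
        int  (*migrate)(struct toy_page *newpage, struct toy_page *page,
                        int mode);
        void (*putback)(struct toy_page *page);
};

struct toy_mapping {
        const struct toy_movable_ops *ops;
};

struct toy_page {
        struct toy_mapping *mapping;
};

/*
 * Generic migration no longer needs to know about balloons: it just
 * calls through page->mapping, whichever driver owns the page.
 */
static int toy_migrate_movable(struct toy_page *newpage,
                               struct toy_page *page, int mode)
{
        const struct toy_movable_ops *ops = page->mapping->ops;

        if (!ops->isolate(page, mode))
                return -1;
        if (ops->migrate(newpage, page, mode) != 0) {
                ops->putback(page);
                return -1;
        }
        return 0;
}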

Cc: virtualizat...@lists.linux-foundation.org
Cc: Rafael Aquini 
Cc: Konstantin Khlebnikov 
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 drivers/virtio/virtio_balloon.c|   4 ++
 include/linux/balloon_compaction.h |  47 -
 include/linux/page-flags.h |  53 +++
 mm/balloon_compaction.c| 101 -
 mm/compaction.c|   7 ---
 mm/migrate.c   |  22 ++--
 mm/vmscan.c|   2 +-
 7 files changed, 73 insertions(+), 163 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0c3691f46575..30a1ea31bef4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -476,6 +477,7 @@ static int virtballoon_migratepage(struct balloon_dev_info 
*vb_dev_info,
 
mutex_unlock(>balloon_lock);
 
+   ClearPageIsolated(page);
put_page(page); /* balloon reference */
 
return MIGRATEPAGE_SUCCESS;
@@ -509,6 +511,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
balloon_devinfo_init(>vb_dev_info);
 #ifdef CONFIG_BALLOON_COMPACTION
vb->vb_dev_info.migratepage = virtballoon_migratepage;
+   vb->vb_dev_info.inode = anon_inode_new();
+   vb->vb_dev_info.inode->i_mapping->a_ops = _aops;
 #endif
 
err = init_vqs(vb);
diff --git a/include/linux/balloon_compaction.h 
b/include/linux/balloon_compaction.h
index 9b0a15d06a4f..43a858545844 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device information descriptor.
@@ -62,6 +63,7 @@ struct balloon_dev_info {
struct list_head pages; /* Pages enqueued & handled to Host */
int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
struct page *page, enum migrate_mode mode);
+   struct inode *inode;
 };
 
 extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
@@ -73,45 +75,19 @@ static inline void balloon_devinfo_init(struct 
balloon_dev_info *balloon)
spin_lock_init(>pages_lock);
INIT_LIST_HEAD(>pages);
balloon->migratepage = NULL;
+   balloon->inode = NULL;
 }
 
 #ifdef CONFIG_BALLOON_COMPACTION
-extern bool balloon_page_isolate(struct page *page);
+extern const struct address_space_operations balloon_aops;
+extern bool balloon_page_isolate(struct page *page,
+   isolate_mode_t mode);
 extern void balloon_page_putback(struct page *page);
-extern int balloon_page_migrate(struct page *newpage,
+extern int balloon_page_migrate(struct address_space *mapping,
+   struct page *newpage,
struct page *page, enum migrate_mode mode);
 
 /*
- * __is_movable_balloon_page - helper to perform @page PageBalloon tests
- */
-static inline bool __is_movable_balloon_page(struct page *page)
-{
-   return PageBalloon(page);
-}
-
-/*
- * balloon_page_movable - test PageBalloon to identify balloon pages
- *   and PagePrivate to check that the page is not
- *   isolated and can be moved by compaction/migration.
- *
- * As we might return false positives in the case of a balloon page being just
- * released under us, this need to be re-tested later, under the page lock.
- */
-static inline bool balloon_page_movable(struct page *page)
-{
-   return PageBalloon(page) && PagePrivate(page);
-}
-
-/*
- * isolated_balloon_page - identify an isolated balloon page on private
- *compaction/migration page lists.
- */
-static inline bool isolated_balloon_page(struct page *page)
-{
-   return PageBalloon(page);
-}
-
-/*
  * balloon_page_insert - insert a page into the balloon's page list and make
  *  the page->private assignment accordingly.
  * @balloon : pointer to balloon device
@@ -123,8 +99,8 @@ static inline bool isolated_balloon_page(struct page *page)
 static inline void balloon_page_insert(struct balloon_dev_info *balloon,
   struct page *page)
 {
+   page->mapping = balloon->inode->i_mapping;
__SetPageBalloon(page);
-   SetPagePrivate(page);
set_page_private(page, (unsigned long)balloon);
list_add(>lru, >pages);
 }
@@ -140,11 +116,10 @@ static inline void balloon_page_insert(struct 

[PATCH v1 07/19] zsmalloc: reordering function parameter

2016-03-10 Thread Minchan Kim
This patch cleans up function parameter ordering so that the
higher-level data structure comes first.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 50 ++
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 3c82011cc405..156edf909046 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -564,7 +564,7 @@ static const struct file_operations zs_stat_size_ops = {
.release= single_release,
 };
 
-static int zs_pool_stat_create(const char *name, struct zs_pool *pool)
+static int zs_pool_stat_create(struct zs_pool *pool, const char *name)
 {
struct dentry *entry;
 
@@ -604,7 +604,7 @@ static void __exit zs_stat_exit(void)
 {
 }
 
-static inline int zs_pool_stat_create(const char *name, struct zs_pool *pool)
+static inline int zs_pool_stat_create(struct zs_pool *pool, const char *name)
 {
return 0;
 }
@@ -650,8 +650,9 @@ static enum fullness_group get_fullness_group(struct page 
*first_page)
  * have. This functions inserts the given zspage into the freelist
  * identified by .
  */
-static void insert_zspage(struct page *first_page, struct size_class *class,
-   enum fullness_group fullness)
+static void insert_zspage(struct size_class *class,
+   enum fullness_group fullness,
+   struct page *first_page)
 {
struct page **head;
 
@@ -682,8 +683,9 @@ static void insert_zspage(struct page *first_page, struct 
size_class *class,
  * This function removes the given zspage from the freelist identified
  * by .
  */
-static void remove_zspage(struct page *first_page, struct size_class *class,
-   enum fullness_group fullness)
+static void remove_zspage(struct size_class *class,
+   enum fullness_group fullness,
+   struct page *first_page)
 {
struct page **head;
 
@@ -725,8 +727,8 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
if (newfg == currfg)
goto out;
 
-   remove_zspage(first_page, class, currfg);
-   insert_zspage(first_page, class, newfg);
+   remove_zspage(class, currfg, first_page);
+   insert_zspage(class, newfg, first_page);
set_zspage_mapping(first_page, class_idx, newfg);
 
 out:
@@ -910,7 +912,7 @@ static void free_zspage(struct page *first_page)
 }
 
 /* Initialize a newly allocated zspage */
-static void init_zspage(struct page *first_page, struct size_class *class)
+static void init_zspage(struct size_class *class, struct page *first_page)
 {
unsigned long off = 0;
struct page *page = first_page;
@@ -998,7 +1000,7 @@ static struct page *alloc_zspage(struct size_class *class, 
gfp_t flags)
prev_page = page;
}
 
-   init_zspage(first_page, class);
+   init_zspage(class, first_page);
 
first_page->freelist = location_to_obj(first_page, 0);
/* Maximum number of objects we can store in this zspage */
@@ -1345,8 +1347,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long 
handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
-static unsigned long obj_malloc(struct page *first_page,
-   struct size_class *class, unsigned long handle)
+static unsigned long obj_malloc(struct size_class *class,
+   struct page *first_page, unsigned long handle)
 {
unsigned long obj;
struct link_free *link;
@@ -1423,7 +1425,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
class->size, class->pages_per_zspage));
}
 
-   obj = obj_malloc(first_page, class, handle);
+   obj = obj_malloc(class, first_page, handle);
/* Now move the zspage to another fullness group, if required */
fix_fullness_group(class, first_page);
record_obj(handle, obj);
@@ -1496,8 +1498,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
-static void zs_object_copy(unsigned long dst, unsigned long src,
-   struct size_class *class)
+static void zs_object_copy(struct size_class *class, unsigned long dst,
+   unsigned long src)
 {
struct page *s_page, *d_page;
unsigned long s_objidx, d_objidx;
@@ -1563,8 +1565,8 @@ static void zs_object_copy(unsigned long dst, unsigned 
long src,
  * Find alloced object in zspage from index object and
  * return handle.
  */
-static unsigned long find_alloced_obj(struct page *page, int index,
-   struct size_class *class)
+static unsigned long find_alloced_obj(struct size_class *class,
+   struct page *page, int index)
 {
unsigned long head;
int offset = 0;
@@ -1614,7 +1616,7 @@ static int 

[PATCH v1 16/19] zsmalloc: migrate head page of zspage

2016-03-10 Thread Minchan Kim
This patch introduces a run-time migration feature for zspages.
To begin with, it supports only head page migration for easy
review (later patches will support tail page migration).

For migration, it provides three functions:

* zs_page_isolate

It isolates from its class a zspage that includes a subpage the VM
wants to migrate, so no one can allocate a new object from the
zspage. In other words, it freezes allocation.

* zs_page_migrate

First of all, it freezes the zspage to prevent zspage destruction,
so no one can free an object. Then it copies the content from the
old page to the new page and creates a new page chain with the new
page. If that was successful, it drops the refcount of the old page
to free it and puts the new zspage back into the right zsmalloc
data structures. Lastly, it unfreezes the zspage so object
allocation/free is allowed again.

* zs_page_putback

It returns an isolated zspage to the right fullness_group list
if migrating a page fails.

NOTE: A hurdle for supporting migration is zspage destruction while
migration is going on. Once a zspage is isolated, no one can
allocate an object from it, but objects can still be freed, so the
zspage could be destroyed before all of its objects are frozen to
prevent deallocation. The problem is the large window between
zs_page_isolate and freeze_zspage in zs_page_migrate, during which
the zspage could be destroyed.

An easy approach to the problem would be to freeze objects in
zs_page_isolate, but it has the drawback that no object could be
deallocated between isolation and a failed migration. Since there
is a large time gap between isolation and migration, any object
free on another CPU would spin on pin_tag, which would cause high
latency. So, this patch introduces lock_zspage, which takes the
page lock of all pages in a zspage right before freeing the zspage.
VM migration also locks the page right before calling
->migratepage, so that race no longer exists.
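
(The race fix can be summarised as: whoever destroys the zspage must own
every page lock in the chain, and migration already owns the page lock of
the page it is moving. A toy user-space model of that locking walk is
below; the toy_* names are invented and pthread mutexes stand in for
PG_locked. The real code is in the lock_zspage/free_zspage hunks that
follow.)

#include <pthread.h>
#include <stddef.h>

struct toy_page {
        pthread_mutex_t lock;           /* stands in for PG_locked */
        struct toy_page *next;          /* next component page, NULL at end */
};

/* mirror of lock_zspage(): take every page lock before destruction */
static void toy_lock_zspage(struct toy_page *first_page)
{
        struct toy_page *cursor;

        for (cursor = first_page; cursor; cursor = cursor->next)
                while (pthread_mutex_trylock(&cursor->lock))
                        ;       /* spin, like the trylock_page() loop */
}

static void toy_unlock_zspage(struct toy_page *first_page)
{
        struct toy_page *cursor;

        for (cursor = first_page; cursor; cursor = cursor->next)
                pthread_mutex_unlock(&cursor->lock);
}

Because migration holds the lock of the page it is moving across
->migratepage, a freeing path that must take all the locks cannot slip in
between isolation and freeze.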

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 291 +++---
 1 file changed, 280 insertions(+), 11 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 43ab16affa68..8eb78569 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -56,6 +56,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * This must be power of 2 and greater than of equal to sizeof(link_free).
@@ -263,6 +265,7 @@ struct zs_pool {
 #ifdef CONFIG_ZSMALLOC_STAT
struct dentry *stat_dentry;
 #endif
+   struct inode *inode;
 };
 
 struct zs_meta {
@@ -413,6 +416,29 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
+/*
+ * Indicate that whether zspage is isolated for page migration.
+ * Protected by size_class lock
+ */
+static void SetZsPageIsolate(struct page *first_page)
+{
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   SetPageUptodate(first_page);
+}
+
+static int ZsPageIsolate(struct page *first_page)
+{
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   return PageUptodate(first_page);
+}
+
+static void ClearZsPageIsolate(struct page *first_page)
+{
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   ClearPageUptodate(first_page);
+}
+
 static int get_zspage_inuse(struct page *first_page)
 {
struct zs_meta *m;
@@ -778,8 +804,11 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
if (newfg == currfg)
goto out;
 
-   remove_zspage(class, currfg, first_page);
-   insert_zspage(class, newfg, first_page);
+   /* Later, putback will insert page to right list */
+   if (!ZsPageIsolate(first_page)) {
+   remove_zspage(class, currfg, first_page);
+   insert_zspage(class, newfg, first_page);
+   }
set_zspage_mapping(first_page, class_idx, newfg);
 
 out:
@@ -945,13 +974,31 @@ static void unpin_tag(unsigned long handle)
 
 static void reset_page(struct page *page)
 {
+   __ClearPageMovable(page);
clear_bit(PG_private, >flags);
clear_bit(PG_private_2, >flags);
set_page_private(page, 0);
page->freelist = NULL;
+   page->mapping = NULL;
page_mapcount_reset(page);
 }
 
+/**
+ * lock_zspage - lock all pages in the zspage
+ * @first_page: head page of the zspage
+ *
+ * To prevent destroy during migration, zspage freeing should
+ * hold locks of all pages in a zspage
+ */
+void lock_zspage(struct page *first_page)
+{
+   struct page *cursor = first_page;
+
+   do {
+   while (!trylock_page(cursor));
+   } while ((cursor = get_next_page(cursor)) != NULL);
+}
+
 static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
@@ -959,26 +1006,31 @@ static void free_zspage(struct zs_pool *pool, struct 
page *first_page)
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
VM_BUG_ON_PAGE(get_zspage_inuse(first_page), first_page);
 
+   

[PATCH v1 14/19] zsmalloc: separate free_zspage from putback_zspage

2016-03-10 Thread Minchan Kim
Currently, putback_zspage frees the zspage under class->lock when
its fullness becomes ZS_EMPTY, but that makes it hard to implement
the locking scheme for the new zspage migration. So, this patch
separates free_zspage from putback_zspage and frees the zspage
outside class->lock, as preparation for zspage migration.
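
(The visible effect in zs_free and __zs_compact is only a reordering of
the unlock: account under class->lock, then drop the lock, then free. A
simplified user-space sketch of that ordering is below; the toy_* names
are invented and a pthread mutex stands in for class->lock.)

#include <pthread.h>
#include <stdbool.h>

struct toy_class {
        pthread_mutex_t lock;           /* stands in for class->lock */
};

static void toy_free_zspage(void) { /* placeholder for the real work */ }

/* simplified shape of zs_free() after this patch */
static void toy_zs_free(struct toy_class *class, bool becomes_empty)
{
        pthread_mutex_lock(&class->lock);
        /* obj_free(), fix_fullness_group(), stat updates happen here */
        if (becomes_empty) {
                pthread_mutex_unlock(&class->lock);
                toy_free_zspage();      /* the slow part runs unlocked */
                return;
        }
        pthread_mutex_unlock(&class->lock);
}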

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f86f8aaeb902..49ae6531b7ad 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -945,7 +945,8 @@ static void reset_page(struct page *page)
page_mapcount_reset(page);
 }
 
-static void free_zspage(struct page *first_page)
+static void free_zspage(struct zs_pool *pool, struct size_class *class,
+   struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
 
@@ -968,6 +969,11 @@ static void free_zspage(struct page *first_page)
}
reset_page(head_extra);
__free_page(head_extra);
+
+   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+   class->size, class->pages_per_zspage));
+   atomic_long_sub(class->pages_per_zspage,
+   >pages_allocated);
 }
 
 /* Initialize a newly allocated zspage */
@@ -1557,13 +1563,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
spin_lock(>lock);
obj_free(class, obj);
fullness = fix_fullness_group(class, first_page);
-   if (fullness == ZS_EMPTY) {
-   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-   class->size, class->pages_per_zspage));
-   atomic_long_sub(class->pages_per_zspage,
-   >pages_allocated);
-   free_zspage(first_page);
-   }
+   if (fullness == ZS_EMPTY)
+   free_zspage(pool, class, first_page);
spin_unlock(>lock);
unpin_tag(handle);
 
@@ -1750,7 +1751,7 @@ static struct page *isolate_target_page(struct size_class 
*class)
  * @class: destination class
  * @first_page: target page
  *
- * Return @fist_page's fullness_group
+ * Return @first_page's updated fullness_group
  */
 static enum fullness_group putback_zspage(struct zs_pool *pool,
struct size_class *class,
@@ -1762,15 +1763,6 @@ static enum fullness_group putback_zspage(struct zs_pool 
*pool,
insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
 
-   if (fullness == ZS_EMPTY) {
-   zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
-   class->size, class->pages_per_zspage));
-   atomic_long_sub(class->pages_per_zspage,
-   >pages_allocated);
-
-   free_zspage(first_page);
-   }
-
return fullness;
 }
 
@@ -1833,23 +1825,31 @@ static void __zs_compact(struct zs_pool *pool, struct 
size_class *class)
if (!migrate_zspage(pool, class, ))
break;
 
-   putback_zspage(pool, class, dst_page);
+   VM_BUG_ON_PAGE(putback_zspage(pool, class,
+   dst_page) == ZS_EMPTY, dst_page);
}
 
/* Stop if we couldn't find slot */
if (dst_page == NULL)
break;
 
-   putback_zspage(pool, class, dst_page);
-   if (putback_zspage(pool, class, src_page) == ZS_EMPTY)
+   VM_BUG_ON_PAGE(putback_zspage(pool, class,
+   dst_page) == ZS_EMPTY, dst_page);
+   if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
pool->stats.pages_compacted += class->pages_per_zspage;
-   spin_unlock(>lock);
+   spin_unlock(>lock);
+   free_zspage(pool, class, src_page);
+   } else {
+   spin_unlock(>lock);
+   }
+
cond_resched();
spin_lock(>lock);
}
 
if (src_page)
-   putback_zspage(pool, class, src_page);
+   VM_BUG_ON_PAGE(putback_zspage(pool, class,
+   src_page) == ZS_EMPTY, src_page);
 
spin_unlock(>lock);
 }
-- 
1.9.1



[PATCH v1 17/19] zsmalloc: use single linked list for page chain

2016-03-10 Thread Minchan Kim
For tail page migration, we should not use page->lru for page
chaining because the VM will use it for its own purposes, so we
need another field for chaining. A singly linked list is enough for
chaining, and using page->index of a tail page to hold the first
object offset in that page can be replaced by a run-time
calculation.

So, this patch switches the page->lru chaining to a singly linked
list squeezed into page->freelist, and introduces get_first_obj_ofs
to get the first object offset in a page.

With that, page chaining can be maintained without using page->lru.
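
(The run-time replacement for the old page->index value is plain
arithmetic on the class geometry. A stand-alone sketch of that
calculation is below; it mirrors the get_first_obj_ofs hunk that follows,
with plain integers instead of struct page, and the example numbers in
main() are made up.)

#include <stdio.h>

#define TOY_PAGE_SIZE   4096

/* offset of the first object that starts inside page number page_idx */
static int toy_first_obj_ofs(int class_size, int objs_per_zspage,
                             int pages_per_zspage, int page_idx)
{
        int pos;

        if (page_idx == 0)
                return 0;

        /* bytes of objects that lie before this page, rounded down to
         * an object start; mirrors the formula in the patch */
        pos = (((objs_per_zspage * class_size) * page_idx /
                pages_per_zspage) / class_size) * class_size;

        return (pos + class_size) % TOY_PAGE_SIZE;
}

int main(void)
{
        /* e.g. a 3-page zspage of 720-byte objects (17 objects fit) */
        printf("%d\n", toy_first_obj_ofs(720, 17, 3, 1));
        return 0;
}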

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 119 ++
 1 file changed, 78 insertions(+), 41 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8eb78569..24d8dd1fc749 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -17,10 +17,7 @@
  *
  * Usage of struct page fields:
  * page->private: points to the first component (0-order) page
- * page->index (union with page->freelist): offset of the first object
- * starting in this page.
- * page->lru: links together all component pages (except the first page)
- * of a zspage
+ * page->index (union with page->freelist): override by struct zs_meta
  *
  * For _first_ page only:
  *
@@ -269,10 +266,19 @@ struct zs_pool {
 };
 
 struct zs_meta {
-   unsigned long freeobj:FREEOBJ_BITS;
-   unsigned long class:CLASS_BITS;
-   unsigned long fullness:FULLNESS_BITS;
-   unsigned long inuse:INUSE_BITS;
+   union {
+   /* first page */
+   struct {
+   unsigned long freeobj:FREEOBJ_BITS;
+   unsigned long class:CLASS_BITS;
+   unsigned long fullness:FULLNESS_BITS;
+   unsigned long inuse:INUSE_BITS;
+   };
+   /* tail pages */
+   struct {
+   struct page *next;
+   };
+   };
 };
 
 struct mapping_area {
@@ -490,6 +496,34 @@ static unsigned long get_freeobj(struct page *first_page)
return m->freeobj;
 }
 
+static void set_next_page(struct page *page, struct page *next)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(is_first_page(page), page);
+
+   m = (struct zs_meta *)>index;
+   m->next = next;
+}
+
+static struct page *get_next_page(struct page *page)
+{
+   struct page *next;
+
+   if (is_last_page(page))
+   next = NULL;
+   else if (is_first_page(page))
+   next = (struct page *)page_private(page);
+   else {
+   struct zs_meta *m = (struct zs_meta *)>index;
+
+   VM_BUG_ON(!m->next);
+   next = m->next;
+   }
+
+   return next;
+}
+
 static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
@@ -864,18 +898,30 @@ static struct page *get_first_page(struct page *page)
return (struct page *)page_private(page);
 }
 
-static struct page *get_next_page(struct page *page)
+int get_first_obj_ofs(struct size_class *class, struct page *first_page,
+   struct page *page)
 {
-   struct page *next;
+   int pos, bound;
+   int page_idx = 0;
+   int ofs = 0;
+   struct page *cursor = first_page;
 
-   if (is_last_page(page))
-   next = NULL;
-   else if (is_first_page(page))
-   next = (struct page *)page_private(page);
-   else
-   next = list_entry(page->lru.next, struct page, lru);
+   if (first_page == page)
+   goto out;
 
-   return next;
+   while (page != cursor) {
+   page_idx++;
+   cursor = get_next_page(cursor);
+   }
+
+   bound = PAGE_SIZE * page_idx;
+   pos = (((class->objs_per_zspage * class->size) *
+   page_idx / class->pages_per_zspage) / class->size
+   ) * class->size;
+
+   ofs = (pos + class->size) % PAGE_SIZE;
+out:
+   return ofs;
 }
 
 static void objidx_to_page_and_ofs(struct size_class *class,
@@ -1001,27 +1047,25 @@ void lock_zspage(struct page *first_page)
 
 static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
-   struct page *nextp, *tmp, *head_extra;
+   struct page *nextp, *tmp;
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
VM_BUG_ON_PAGE(get_zspage_inuse(first_page), first_page);
 
lock_zspage(first_page);
-   head_extra = (struct page *)page_private(first_page);
+   nextp = (struct page *)page_private(first_page);
 
/* zspage with only 1 system page */
-   if (!head_extra)
+   if (!nextp)
goto out;
 
-   list_for_each_entry_safe(nextp, tmp, _extra->lru, lru) {
-   list_del(>lru);
-   reset_page(nextp);
-   

[PATCH v1 18/19] zsmalloc: migrate tail pages in zspage

2016-03-10 Thread Minchan Kim
This patch enables tail page migration of zspages.

At this point, I tested for zsmalloc regressions with a
micro-benchmark that does zs_malloc/map/unmap/zs_free for every
size class on every CPU (my system has 12) for 20 seconds.

It shows a 1% regression, which is really small when we consider
the benefit of this feature and the real-workload overhead (i.e.,
most overhead comes from compression).
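
(For reference, the shape of such a stress loop against the public
zsmalloc API is roughly the following. This is only an illustrative
sketch of the methodology described above, not the actual test code; the
per-CPU threading, size-class iteration and pool setup/teardown are
omitted, and toy_stress_one_size is an invented name.)

#include <linux/zsmalloc.h>
#include <linux/jiffies.h>
#include <linux/string.h>

/* hammer one object size until @end jiffies; caller created @pool */
static void toy_stress_one_size(struct zs_pool *pool, size_t size,
                                unsigned long end)
{
        while (time_before(jiffies, end)) {
                unsigned long handle = zs_malloc(pool, size);
                void *p;

                if (!handle)
                        continue;
                p = zs_map_object(pool, handle, ZS_MM_RW);
                memset(p, 0, size);
                zs_unmap_object(pool, handle);
                zs_free(pool, handle);
        }
}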

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 131 +++---
 1 file changed, 115 insertions(+), 16 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 24d8dd1fc749..b9ff698115a1 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -550,6 +550,19 @@ static void set_zspage_mapping(struct page *first_page,
m->class = class_idx;
 }
 
+static bool check_isolated_page(struct page *first_page)
+{
+   struct page *cursor;
+
+   for (cursor = first_page; cursor != NULL; cursor =
+   get_next_page(cursor)) {
+   if (PageIsolated(cursor))
+   return true;
+   }
+
+   return false;
+}
+
 /*
  * zsmalloc divides the pool into various size classes where each
  * class maintains a list of zspages where each zspage is divided
@@ -1045,6 +1058,44 @@ void lock_zspage(struct page *first_page)
} while ((cursor = get_next_page(cursor)) != NULL);
 }
 
+int trylock_zspage(struct page *first_page, struct page *locked_page)
+{
+   struct page *cursor, *fail;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   for (cursor = first_page; cursor != NULL; cursor =
+   get_next_page(cursor)) {
+   if (cursor != locked_page) {
+   if (!trylock_page(cursor)) {
+   fail = cursor;
+   goto unlock;
+   }
+   }
+   }
+
+   return 1;
+unlock:
+   for (cursor = first_page; cursor != fail; cursor =
+   get_next_page(cursor)) {
+   if (cursor != locked_page)
+   unlock_page(cursor);
+   }
+
+   return 0;
+}
+
+void unlock_zspage(struct page *first_page, struct page *locked_page)
+{
+   struct page *cursor = first_page;
+
+   for (; cursor != NULL; cursor = get_next_page(cursor)) {
+   VM_BUG_ON_PAGE(!PageLocked(cursor), cursor);
+   if (cursor != locked_page)
+   unlock_page(cursor);
+   };
+}
+
 static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
struct page *nextp, *tmp;
@@ -1083,16 +1134,17 @@ static void init_zspage(struct size_class *class, 
struct page *first_page,
first_page->freelist = NULL;
INIT_LIST_HEAD(_page->lru);
set_zspage_inuse(first_page, 0);
-   BUG_ON(!trylock_page(first_page));
-   first_page->mapping = mapping;
-   __SetPageMovable(first_page);
-   unlock_page(first_page);
 
while (page) {
struct page *next_page;
struct link_free *link;
void *vaddr;
 
+   BUG_ON(!trylock_page(page));
+   page->mapping = mapping;
+   __SetPageMovable(page);
+   unlock_page(page);
+
vaddr = kmap_atomic(page);
link = (struct link_free *)vaddr + off / sizeof(*link);
 
@@ -1845,6 +1897,7 @@ static enum fullness_group putback_zspage(struct 
size_class *class,
 
VM_BUG_ON_PAGE(!list_empty(_page->lru), first_page);
VM_BUG_ON_PAGE(ZsPageIsolate(first_page), first_page);
+   VM_BUG_ON_PAGE(check_isolated_page(first_page), first_page);
 
fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
@@ -1951,6 +2004,12 @@ static struct page *isolate_source_page(struct 
size_class *class)
if (!page)
continue;
 
+   /* To prevent race between object and page migration */
+   if (!trylock_zspage(page, NULL)) {
+   page = NULL;
+   continue;
+   }
+
remove_zspage(class, i, page);
 
inuse = get_zspage_inuse(page);
@@ -1959,6 +2018,7 @@ static struct page *isolate_source_page(struct size_class 
*class)
if (inuse != freezed) {
unfreeze_zspage(class, page, freezed);
putback_zspage(class, page);
+   unlock_zspage(page, NULL);
page = NULL;
continue;
}
@@ -1990,6 +2050,12 @@ static struct page *isolate_target_page(struct 
size_class *class)
if (!page)
continue;
 
+   /* To prevent race between object and page migration */
+   if (!trylock_zspage(page, NULL)) {
+   page = NULL;
+

[PATCH v1 10/19] zsmalloc: squeeze inuse into page->mapping

2016-03-10 Thread Minchan Kim
Currently, we store class:fullness in page->mapping.
The number of classes we can support is 255 and there are 4
fullness groups, so 8 + 2 = 10 bits is enough to represent them.
Meanwhile, 11 bits is enough to store the number of in-use objects
in a zspage.

For example, if we assume a 64K PAGE_SIZE and class_size 32, which
is the worst case, class->pages_per_zspage becomes 1, so the number
of objects in a zspage is 2048 and 11 bits is enough. The next
class is 32 + 256 (i.e., ZS_SIZE_CLASS_DELTA). With the worst case
of ZS_MAX_PAGES_PER_ZSPAGE, 64K * 4 / (32 + 256) = 910, so 11 bits
is still enough.

So, we can squeeze the in-use object count into page->mapping.
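
(The bit budget can be double-checked with trivial arithmetic; the small
stand-alone program below reproduces the two worst-case numbers quoted
above, assuming 64K pages, a 32-byte minimum class, ZS_SIZE_CLASS_DELTA =
PAGE_SIZE >> 8 and ZS_MAX_PAGES_PER_ZSPAGE = 4.)

#include <stdio.h>

int main(void)
{
        const long page_size = 64 * 1024;
        const long min_class = 32;              /* worst-case class_size */
        const long delta = page_size >> 8;      /* ZS_SIZE_CLASS_DELTA */
        const long max_pages = 4;               /* ZS_MAX_PAGES_PER_ZSPAGE */

        /* 1-page zspage of 32-byte objects */
        printf("objs in class 32:  %ld\n", page_size / min_class);
        /* worst multi-page case: 4 pages of (32 + 256)-byte objects */
        printf("objs in class %ld: %ld\n", min_class + delta,
               page_size * max_pages / (min_class + delta));
        printf("11-bit range:      %d values\n", 1 << 11);
        return 0;
}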

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 103 --
 1 file changed, 71 insertions(+), 32 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ca663c82c1fc..954e8758a78d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -34,8 +34,7 @@
  * metadata.
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
- * page->mapping: class index and fullness group of the zspage
- * page->inuse: the number of objects that are used in this zspage
+ * page->mapping: override by struct zs_meta
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -132,6 +131,13 @@
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE  PAGE_SIZE
 
+#define CLASS_BITS 8
+#define CLASS_MASK ((1 << CLASS_BITS) - 1)
+#define FULLNESS_BITS  2
+#define FULLNESS_MASK  ((1 << FULLNESS_BITS) - 1)
+#define INUSE_BITS 11
+#define INUSE_MASK ((1 << INUSE_BITS) - 1)
+
 /*
  * On systems with 4K page size, this gives 255 size classes! There is a
  * trader-off here:
@@ -145,7 +151,7 @@
  *  ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
  *  (reason above)
  */
-#define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> 8)
+#define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> CLASS_BITS)
 
 /*
  * We do not maintain any list for completely empty or full pages
@@ -155,7 +161,7 @@ enum fullness_group {
ZS_ALMOST_EMPTY,
_ZS_NR_FULLNESS_GROUPS,
 
-   ZS_EMPTY,
+   ZS_EMPTY = _ZS_NR_FULLNESS_GROUPS,
ZS_FULL
 };
 
@@ -263,14 +269,11 @@ struct zs_pool {
 #endif
 };
 
-/*
- * A zspage's class index and fullness group
- * are encoded in its (first)page->mapping
- */
-#define CLASS_IDX_BITS 28
-#define FULLNESS_BITS  4
-#define CLASS_IDX_MASK ((1 << CLASS_IDX_BITS) - 1)
-#define FULLNESS_MASK  ((1 << FULLNESS_BITS) - 1)
+struct zs_meta {
+   unsigned long class:CLASS_BITS;
+   unsigned long fullness:FULLNESS_BITS;
+   unsigned long inuse:INUSE_BITS;
+};
 
 struct mapping_area {
 #ifdef CONFIG_PGTABLE_MAPPING
@@ -413,28 +416,61 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
+static int get_zspage_inuse(struct page *first_page)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)_page->mapping;
+
+   return m->inuse;
+}
+
+static void set_zspage_inuse(struct page *first_page, int val)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)_page->mapping;
+   m->inuse = val;
+}
+
+static void mod_zspage_inuse(struct page *first_page, int val)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)_page->mapping;
+   m->inuse += val;
+}
+
 static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
 {
-   unsigned long m;
+   struct zs_meta *m;
+
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (unsigned long)first_page->mapping;
-   *fullness = m & FULLNESS_MASK;
-   *class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK;
+   m = (struct zs_meta *)_page->mapping;
+   *fullness = m->fullness;
+   *class_idx = m->class;
 }
 
 static void set_zspage_mapping(struct page *first_page,
unsigned int class_idx,
enum fullness_group fullness)
 {
-   unsigned long m;
+   struct zs_meta *m;
+
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
-   (fullness & FULLNESS_MASK);
-   first_page->mapping = (struct address_space *)m;
+   m = (struct zs_meta *)_page->mapping;
+   m->fullness = fullness;
+   m->class = class_idx;
 }
 
 /*
@@ -627,9 +663,7 @@ static enum fullness_group get_fullness_group(struct 
size_class *class,
int inuse, objs_per_zspage;
enum fullness_group fg;
 
-   

[PATCH v1 09/19] zsmalloc: keep max_object in size_class

2016-03-10 Thread Minchan Kim
Every zspage in a size_class has the same maximum number of
objects, so we can move that value into the size_class.
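
(With the maximum kept in the class, the fullness decision becomes pure
arithmetic on two integers. A stand-alone sketch of the check is below;
the toy_* names are invented and fullness_threshold_frac is assumed to be
4, as in mainline at the time.)

enum toy_fullness {
        TOY_EMPTY,
        TOY_ALMOST_EMPTY,
        TOY_ALMOST_FULL,
        TOY_FULL,
};

/* mirrors get_fullness_group() with objs_per_zspage taken from the class */
static enum toy_fullness toy_get_fullness(int inuse, int objs_per_zspage)
{
        const int fullness_threshold_frac = 4;  /* assumed mainline default */

        if (inuse == 0)
                return TOY_EMPTY;
        if (inuse == objs_per_zspage)
                return TOY_FULL;
        if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
                return TOY_ALMOST_EMPTY;
        return TOY_ALMOST_FULL;
}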

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b4fb11831acb..ca663c82c1fc 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -32,8 +32,6 @@
  * page->freelist: points to the first free object in zspage.
  * Free objects are linked together using in-place
  * metadata.
- * page->objects: maximum number of objects we can store in this
- * zspage (class->zspage_order * PAGE_SIZE / class->size)
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
  * page->mapping: class index and fullness group of the zspage
@@ -211,6 +209,7 @@ struct size_class {
 * of ZS_ALIGN.
 */
int size;
+   int objs_per_zspage;
unsigned int index;
 
struct zs_size_stat stats;
@@ -622,21 +621,22 @@ static inline void zs_pool_stat_destroy(struct zs_pool 
*pool)
  * the pool (not yet implemented). This function returns fullness
  * status of the given page.
  */
-static enum fullness_group get_fullness_group(struct page *first_page)
+static enum fullness_group get_fullness_group(struct size_class *class,
+   struct page *first_page)
 {
-   int inuse, max_objects;
+   int inuse, objs_per_zspage;
enum fullness_group fg;
 
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
inuse = first_page->inuse;
-   max_objects = first_page->objects;
+   objs_per_zspage = class->objs_per_zspage;
 
if (inuse == 0)
fg = ZS_EMPTY;
-   else if (inuse == max_objects)
+   else if (inuse == objs_per_zspage)
fg = ZS_FULL;
-   else if (inuse <= 3 * max_objects / fullness_threshold_frac)
+   else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
fg = ZS_ALMOST_EMPTY;
else
fg = ZS_ALMOST_FULL;
@@ -723,7 +723,7 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
enum fullness_group currfg, newfg;
 
get_zspage_mapping(first_page, _idx, );
-   newfg = get_fullness_group(first_page);
+   newfg = get_fullness_group(class, first_page);
if (newfg == currfg)
goto out;
 
@@ -1003,9 +1003,6 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
init_zspage(class, first_page);
 
first_page->freelist = location_to_obj(first_page, 0);
-   /* Maximum number of objects we can store in this zspage */
-   first_page->objects = class->pages_per_zspage * PAGE_SIZE / class->size;
-
error = 0; /* Success */
 
 cleanup:
@@ -1235,11 +1232,11 @@ static bool can_merge(struct size_class *prev, int 
size, int pages_per_zspage)
return true;
 }
 
-static bool zspage_full(struct page *first_page)
+static bool zspage_full(struct size_class *class, struct page *first_page)
 {
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   return first_page->inuse == first_page->objects;
+   return first_page->inuse == class->objs_per_zspage;
 }
 
 unsigned long zs_get_total_pages(struct zs_pool *pool)
@@ -1625,7 +1622,7 @@ static int migrate_zspage(struct zs_pool *pool, struct 
size_class *class,
}
 
/* Stop if there is no more space */
-   if (zspage_full(d_page)) {
+   if (zspage_full(class, d_page)) {
unpin_tag(handle);
ret = -ENOMEM;
break;
@@ -1684,7 +1681,7 @@ static enum fullness_group putback_zspage(struct zs_pool 
*pool,
 {
enum fullness_group fullness;
 
-   fullness = get_fullness_group(first_page);
+   fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
 
@@ -1933,6 +1930,8 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t 
flags)
class->size = size;
class->index = i;
class->pages_per_zspage = pages_per_zspage;
+   class->objs_per_zspage = class->pages_per_zspage *
+   PAGE_SIZE / class->size;
if (pages_per_zspage == 1 &&
get_maxobj_per_zspage(size, pages_per_zspage) == 1)
class->huge = true;
-- 
1.9.1



[PATCH v1 13/19] zsmalloc: factor page chain functionality out

2016-03-10 Thread Minchan Kim
For migration, we need to create the sub-page chain of a zspage
dynamically, so this patch factors that logic out of alloc_zspage().

As a minor refactoring, it also makes the OBJ_ALLOCATED_TAG assignment
clearer in obj_malloc() (that could be a separate patch, but it is
trivial, so it is folded into this one).
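
For orientation, a condensed sketch of how the factored-out helper is
intended to be used (this mirrors the alloc_zspage() hunk below; the
NULL checks and unwinding are omitted here):

    struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
    int i;

    for (i = 0; i < class->pages_per_zspage; i++)
            pages[i] = alloc_page(flags);           /* error handling omitted */

    create_page_chain(pages, class->pages_per_zspage);
    init_zspage(class, pages[0]);                   /* pages[0] is the first_page */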

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 78 ++-
 1 file changed, 45 insertions(+), 33 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index bfc6a048afac..f86f8aaeb902 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -977,7 +977,9 @@ static void init_zspage(struct size_class *class, struct 
page *first_page)
unsigned long off = 0;
struct page *page = first_page;
 
-   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   first_page->freelist = NULL;
+   INIT_LIST_HEAD(&first_page->lru);
+   set_zspage_inuse(first_page, 0);
 
while (page) {
struct page *next_page;
@@ -1022,13 +1024,44 @@ static void init_zspage(struct size_class *class, 
struct page *first_page)
set_freeobj(first_page, 0);
 }
 
+static void create_page_chain(struct page *pages[], int nr_pages)
+{
+   int i;
+   struct page *page;
+   struct page *prev_page = NULL;
+   struct page *first_page = NULL;
+
+   for (i = 0; i < nr_pages; i++) {
+   page = pages[i];
+
+   INIT_LIST_HEAD(&page->lru);
+   if (i == 0) {
+   SetPagePrivate(page);
+   set_page_private(page, 0);
+   first_page = page;
+   }
+
+   if (i == 1)
+   set_page_private(first_page, (unsigned long)page);
+   if (i >= 1)
+   set_page_private(page, (unsigned long)first_page);
+   if (i >= 2)
+   list_add(&page->lru, &prev_page->lru);
+   if (i == nr_pages - 1)
+   SetPagePrivate2(page);
+
+   prev_page = page;
+   }
+}
+
 /*
  * Allocate a zspage for the given size class
  */
 static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
 {
-   int i, error;
+   int i;
struct page *first_page = NULL, *uninitialized_var(prev_page);
+   struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
 
/*
 * Allocate individual pages and link them together as:
@@ -1041,43 +1074,23 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
 * (i.e. no other sub-page has this flag set) and PG_private_2 to
 * identify the last page.
 */
-   error = -ENOMEM;
for (i = 0; i < class->pages_per_zspage; i++) {
struct page *page;
 
page = alloc_page(flags);
-   if (!page)
-   goto cleanup;
-
-   INIT_LIST_HEAD(&page->lru);
-   if (i == 0) {   /* first page */
-   page->freelist = NULL;
-   SetPagePrivate(page);
-   set_page_private(page, 0);
-   first_page = page;
-   set_zspage_inuse(page, 0);
+   if (!page) {
+   while (--i >= 0)
+   __free_page(pages[i]);
+   return NULL;
}
-   if (i == 1)
-   set_page_private(first_page, (unsigned long)page);
-   if (i >= 1)
-   set_page_private(page, (unsigned long)first_page);
-   if (i >= 2)
-   list_add(&page->lru, &prev_page->lru);
-   if (i == class->pages_per_zspage - 1)   /* last page */
-   SetPagePrivate2(page);
-   prev_page = page;
+
+   pages[i] = page;
}
 
+   create_page_chain(pages, class->pages_per_zspage);
+   first_page = pages[0];
init_zspage(class, first_page);
 
-   error = 0; /* Success */
-
-cleanup:
-   if (unlikely(error) && first_page) {
-   free_zspage(first_page);
-   first_page = NULL;
-   }
-
return first_page;
 }
 
@@ -1419,7 +1432,6 @@ static unsigned long obj_malloc(struct size_class *class,
unsigned long m_offset;
void *vaddr;
 
-   handle |= OBJ_ALLOCATED_TAG;
obj = get_freeobj(first_page);
objidx_to_page_and_ofs(class, first_page, obj,
&m_page, &m_offset);
@@ -1429,10 +1441,10 @@ static unsigned long obj_malloc(struct size_class 
*class,
set_freeobj(first_page, link->next >> OBJ_ALLOCATED_TAG);
if (!class->huge)
/* record handle in the header of allocated chunk */
-   link->handle = handle;
+   link->handle = handle | OBJ_ALLOCATED_TAG;
else
/* record handle in first_page->private */
-   set_page_private(first_page, handle);

[PATCH v1 11/19] zsmalloc: squeeze freelist into page->mapping

2016-03-10 Thread Minchan Kim
Zsmalloc stores the position of the first free object of each zspage
in first_page->freelist. If we record an object index relative to
first_page instead of a location, we can squeeze it into page->mapping,
because the offset needs at most 11 bits to store.
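
As a rough sketch of what "index instead of location" means (this
mirrors the objidx_to_page_and_ofs() helper added below; names are
adapted from that hunk):

    /* Recompute the object's location from its index on demand. */
    unsigned long ofs = obj_idx * class->size;
    struct page *cursor = first_page;
    int nr_page = ofs >> PAGE_SHIFT;        /* component pages to skip */

    while (nr_page--)
            cursor = get_next_page(cursor);
    ofs &= ~PAGE_MASK;                      /* offset of the object within cursor */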

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 159 +++---
 1 file changed, 96 insertions(+), 63 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 954e8758a78d..e23cd3b2dd71 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -18,9 +18,7 @@
  * Usage of struct page fields:
  * page->private: points to the first component (0-order) page
  * page->index (union with page->freelist): offset of the first object
- * starting in this page. For the first page, this is
- * always 0, so we use this field (aka freelist) to point
- * to the first free object in zspage.
+ * starting in this page.
  * page->lru: links together all component pages (except the first page)
  * of a zspage
  *
@@ -29,9 +27,6 @@
  * page->private: refers to the component page after the first page
  * If the page is first_page for huge object, it stores handle.
  * Look at size_class->huge.
- * page->freelist: points to the first free object in zspage.
- * Free objects are linked together using in-place
- * metadata.
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
  * page->mapping: override by struct zs_meta
@@ -131,6 +126,7 @@
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE  PAGE_SIZE
 
+#define FREEOBJ_BITS 11
 #define CLASS_BITS 8
 #define CLASS_MASK ((1 << CLASS_BITS) - 1)
 #define FULLNESS_BITS  2
@@ -228,17 +224,17 @@ struct size_class {
 
 /*
  * Placed within free objects to form a singly linked list.
- * For every zspage, first_page->freelist gives head of this list.
+ * For every zspage, first_page->freeobj gives head of this list.
  *
  * This must be power of 2 and less than or equal to ZS_ALIGN
  */
 struct link_free {
union {
/*
-* Position of next free chunk (encodes <PFN, obj_idx>)
+* free object list
 * It's valid for non-allocated object
 */
-   void *next;
+   unsigned long next;
/*
 * Handle of allocated object.
 */
@@ -270,6 +266,7 @@ struct zs_pool {
 };
 
 struct zs_meta {
+   unsigned long freeobj:FREEOBJ_BITS;
unsigned long class:CLASS_BITS;
unsigned long fullness:FULLNESS_BITS;
unsigned long inuse:INUSE_BITS;
@@ -447,6 +444,26 @@ static void mod_zspage_inuse(struct page *first_page, int 
val)
m->inuse += val;
 }
 
+static void set_freeobj(struct page *first_page, int idx)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+   m->freeobj = idx;
+}
+
+static unsigned long get_freeobj(struct page *first_page)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+   return m->freeobj;
+}
+
 static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
@@ -832,30 +849,33 @@ static struct page *get_next_page(struct page *page)
return next;
 }
 
-/*
- * Encode <page, obj_idx> as a single handle value.
- * We use the least bit of handle for tagging.
- */
-static void *location_to_obj(struct page *page, unsigned long obj_idx)
+static void objidx_to_page_and_ofs(struct size_class *class,
+   struct page *first_page,
+   unsigned long obj_idx,
+   struct page **obj_page,
+   unsigned long *ofs_in_page)
 {
-   unsigned long obj;
+   int i;
+   unsigned long ofs;
+   struct page *cursor;
+   int nr_page;
 
-   if (!page) {
-   VM_BUG_ON(obj_idx);
-   return NULL;
-   }
+   ofs = obj_idx * class->size;
+   cursor = first_page;
+   nr_page = ofs >> PAGE_SHIFT;
 
-   obj = page_to_pfn(page) << OBJ_INDEX_BITS;
-   obj |= ((obj_idx) & OBJ_INDEX_MASK);
-   obj <<= OBJ_TAG_BITS;
+   *ofs_in_page = ofs & ~PAGE_MASK;
+
+   for (i = 0; i < nr_page; i++)
+   cursor = get_next_page(cursor);
 
-   return (void *)obj;
+   *obj_page = cursor;
 }
 
-/*
- * Decode <page, obj_idx> pair from the given object handle. We adjust the
- * decoded obj_idx back to its original value since it was adjusted in
- * location_to_obj().
+/**
+ 

[PATCH v1 18/19] zsmalloc: migrate tail pages in zspage

2016-03-10 Thread Minchan Kim
This patch enables migration of a zspage's tail pages.

At this point, I measured zsmalloc regression with a micro-benchmark
that does zs_malloc/map/unmap/zs_free for every size class on every
CPU (my system has 12) for 20 seconds.

It shows a 1% regression, which is really small when we consider the
benefit of this feature and the real-workload overhead (i.e., most
overhead comes from compression).
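
For reference, each iteration of the micro-benchmark boils down to the
sequence below (a sketch only; the real harness, object sizes and
timing loop are not shown in this mail):

    unsigned long handle = zs_malloc(pool, size);
    void *buf = zs_map_object(pool, handle, ZS_MM_RW);

    memset(buf, 0, size);                   /* touch the object */
    zs_unmap_object(pool, handle);
    zs_free(pool, handle);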

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 131 +++---
 1 file changed, 115 insertions(+), 16 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 24d8dd1fc749..b9ff698115a1 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -550,6 +550,19 @@ static void set_zspage_mapping(struct page *first_page,
m->class = class_idx;
 }
 
+static bool check_isolated_page(struct page *first_page)
+{
+   struct page *cursor;
+
+   for (cursor = first_page; cursor != NULL; cursor =
+   get_next_page(cursor)) {
+   if (PageIsolated(cursor))
+   return true;
+   }
+
+   return false;
+}
+
 /*
  * zsmalloc divides the pool into various size classes where each
  * class maintains a list of zspages where each zspage is divided
@@ -1045,6 +1058,44 @@ void lock_zspage(struct page *first_page)
} while ((cursor = get_next_page(cursor)) != NULL);
 }
 
+int trylock_zspage(struct page *first_page, struct page *locked_page)
+{
+   struct page *cursor, *fail;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   for (cursor = first_page; cursor != NULL; cursor =
+   get_next_page(cursor)) {
+   if (cursor != locked_page) {
+   if (!trylock_page(cursor)) {
+   fail = cursor;
+   goto unlock;
+   }
+   }
+   }
+
+   return 1;
+unlock:
+   for (cursor = first_page; cursor != fail; cursor =
+   get_next_page(cursor)) {
+   if (cursor != locked_page)
+   unlock_page(cursor);
+   }
+
+   return 0;
+}
+
+void unlock_zspage(struct page *first_page, struct page *locked_page)
+{
+   struct page *cursor = first_page;
+
+   for (; cursor != NULL; cursor = get_next_page(cursor)) {
+   VM_BUG_ON_PAGE(!PageLocked(cursor), cursor);
+   if (cursor != locked_page)
+   unlock_page(cursor);
+   };
+}
+
 static void free_zspage(struct zs_pool *pool, struct page *first_page)
 {
struct page *nextp, *tmp;
@@ -1083,16 +1134,17 @@ static void init_zspage(struct size_class *class, 
struct page *first_page,
first_page->freelist = NULL;
INIT_LIST_HEAD(&first_page->lru);
set_zspage_inuse(first_page, 0);
-   BUG_ON(!trylock_page(first_page));
-   first_page->mapping = mapping;
-   __SetPageMovable(first_page);
-   unlock_page(first_page);
 
while (page) {
struct page *next_page;
struct link_free *link;
void *vaddr;
 
+   BUG_ON(!trylock_page(page));
+   page->mapping = mapping;
+   __SetPageMovable(page);
+   unlock_page(page);
+
vaddr = kmap_atomic(page);
link = (struct link_free *)vaddr + off / sizeof(*link);
 
@@ -1845,6 +1897,7 @@ static enum fullness_group putback_zspage(struct 
size_class *class,
 
VM_BUG_ON_PAGE(!list_empty(&first_page->lru), first_page);
VM_BUG_ON_PAGE(ZsPageIsolate(first_page), first_page);
+   VM_BUG_ON_PAGE(check_isolated_page(first_page), first_page);
 
fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
@@ -1951,6 +2004,12 @@ static struct page *isolate_source_page(struct 
size_class *class)
if (!page)
continue;
 
+   /* To prevent race between object and page migration */
+   if (!trylock_zspage(page, NULL)) {
+   page = NULL;
+   continue;
+   }
+
remove_zspage(class, i, page);
 
inuse = get_zspage_inuse(page);
@@ -1959,6 +2018,7 @@ static struct page *isolate_source_page(struct size_class 
*class)
if (inuse != freezed) {
unfreeze_zspage(class, page, freezed);
putback_zspage(class, page);
+   unlock_zspage(page, NULL);
page = NULL;
continue;
}
@@ -1990,6 +2050,12 @@ static struct page *isolate_target_page(struct 
size_class *class)
if (!page)
continue;
 
+   /* To prevent race between object and page migration */
+   if (!trylock_zspage(page, NULL)) {
+   page = NULL;
+   

[PATCH v1 10/19] zsmalloc: squeeze inuse into page->mapping

2016-03-10 Thread Minchan Kim
Currently, we store class:fullness in page->mapping.
The number of classes we can support is 255 and there are 4 fullness
groups, so 8 + 2 = 10 bits are enough to represent them.
Meanwhile, 11 bits are enough to store the number of in-use objects
in a zspage.

For example, if we assume a 64K PAGE_SIZE and a class_size of 32,
which is the worst case, class->pages_per_zspage becomes 1, so the
number of objects in a zspage is 2048 and 11 bits are enough.
The next class is 32 + 256 (i.e., ZS_SIZE_CLASS_DELTA).
With the worst case of ZS_MAX_PAGES_PER_ZSPAGE, 64K * 4 /
(32 + 256) = 910, so 11 bits are still enough.

So we can squeeze the in-use object count into page->mapping.
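
The packing that results from this bit budget is the struct introduced
below, annotated here as a sanity check (8 + 2 + 11 = 21 bits, which
fits comfortably in the unsigned long backing page->mapping on both
32- and 64-bit):

    struct zs_meta {
            unsigned long class:CLASS_BITS;         /*  8 bits: up to 255 size classes */
            unsigned long fullness:FULLNESS_BITS;   /*  2 bits: 4 fullness groups      */
            unsigned long inuse:INUSE_BITS;         /* 11 bits: in-use object count    */
    };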

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 103 --
 1 file changed, 71 insertions(+), 32 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ca663c82c1fc..954e8758a78d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -34,8 +34,7 @@
  * metadata.
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
- * page->mapping: class index and fullness group of the zspage
- * page->inuse: the number of objects that are used in this zspage
+ * page->mapping: override by struct zs_meta
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -132,6 +131,13 @@
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE  PAGE_SIZE
 
+#define CLASS_BITS 8
+#define CLASS_MASK ((1 << CLASS_BITS) - 1)
+#define FULLNESS_BITS  2
+#define FULLNESS_MASK  ((1 << FULLNESS_BITS) - 1)
+#define INUSE_BITS 11
+#define INUSE_MASK ((1 << INUSE_BITS) - 1)
+
 /*
  * On systems with 4K page size, this gives 255 size classes! There is a
  * trader-off here:
@@ -145,7 +151,7 @@
  *  ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
  *  (reason above)
  */
-#define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> 8)
+#define ZS_SIZE_CLASS_DELTA(PAGE_SIZE >> CLASS_BITS)
 
 /*
  * We do not maintain any list for completely empty or full pages
@@ -155,7 +161,7 @@ enum fullness_group {
ZS_ALMOST_EMPTY,
_ZS_NR_FULLNESS_GROUPS,
 
-   ZS_EMPTY,
+   ZS_EMPTY = _ZS_NR_FULLNESS_GROUPS,
ZS_FULL
 };
 
@@ -263,14 +269,11 @@ struct zs_pool {
 #endif
 };
 
-/*
- * A zspage's class index and fullness group
- * are encoded in its (first)page->mapping
- */
-#define CLASS_IDX_BITS 28
-#define FULLNESS_BITS  4
-#define CLASS_IDX_MASK ((1 << CLASS_IDX_BITS) - 1)
-#define FULLNESS_MASK  ((1 << FULLNESS_BITS) - 1)
+struct zs_meta {
+   unsigned long class:CLASS_BITS;
+   unsigned long fullness:FULLNESS_BITS;
+   unsigned long inuse:INUSE_BITS;
+};
 
 struct mapping_area {
 #ifdef CONFIG_PGTABLE_MAPPING
@@ -413,28 +416,61 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
+static int get_zspage_inuse(struct page *first_page)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+
+   return m->inuse;
+}
+
+static void set_zspage_inuse(struct page *first_page, int val)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+   m->inuse = val;
+}
+
+static void mod_zspage_inuse(struct page *first_page, int val)
+{
+   struct zs_meta *m;
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+   m = (struct zs_meta *)&first_page->mapping;
+   m->inuse += val;
+}
+
 static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
 {
-   unsigned long m;
+   struct zs_meta *m;
+
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = (unsigned long)first_page->mapping;
-   *fullness = m & FULLNESS_MASK;
-   *class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK;
+   m = (struct zs_meta *)&first_page->mapping;
+   *fullness = m->fullness;
+   *class_idx = m->class;
 }
 
 static void set_zspage_mapping(struct page *first_page,
unsigned int class_idx,
enum fullness_group fullness)
 {
-   unsigned long m;
+   struct zs_meta *m;
+
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
-   m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
-   (fullness & FULLNESS_MASK);
-   first_page->mapping = (struct address_space *)m;
+   m = (struct zs_meta *)&first_page->mapping;
+   m->fullness = fullness;
+   m->class = class_idx;
 }
 
 /*
@@ -627,9 +663,7 @@ static enum fullness_group get_fullness_group(struct 
size_class *class,
int inuse, objs_per_zspage;
enum fullness_group fg;
 
-   


[PATCH v1 03/19] fs/anon_inodes: new interface to create new inode

2016-03-10 Thread Minchan Kim
From: Gioh Kim 

anon_inodes already has complete interfaces to create and manage many
anonymous inodes, but it has no interface to hand out a new inode.
With this, other sub-modules can create anonymous inodes without
creating and mounting their own pseudo filesystem.
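
A hedged usage sketch of the new interface (my_aops and the surrounding
driver are hypothetical; only anon_inode_new(), IS_ERR() and iput()
are taken as given):

    struct inode *inode = anon_inode_new();

    if (IS_ERR(inode))
            return PTR_ERR(inode);

    inode->i_mapping->a_ops = &my_aops;     /* hypothetical address_space_operations */
    /* ... use inode->i_mapping as the owner of the sub-module's pages ... */
    iput(inode);                            /* drop the inode when done */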

Acked-by: Rafael Aquini 
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 fs/anon_inodes.c| 6 ++
 include/linux/anon_inodes.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 80ef38c73e5a..1d51f96acdd9 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -162,6 +162,12 @@ int anon_inode_getfd(const char *name, const struct 
file_operations *fops,
 }
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
 
+struct inode *anon_inode_new(void)
+{
+   return alloc_anon_inode(anon_inode_mnt->mnt_sb);
+}
+EXPORT_SYMBOL_GPL(anon_inode_new);
+
 static int __init anon_inode_init(void)
 {
anon_inode_mnt = kern_mount(&anon_inode_fs_type);
diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
index 8013a45242fe..ddbd67f8a73f 100644
--- a/include/linux/anon_inodes.h
+++ b/include/linux/anon_inodes.h
@@ -15,6 +15,7 @@ struct file *anon_inode_getfile(const char *name,
void *priv, int flags);
 int anon_inode_getfd(const char *name, const struct file_operations *fops,
 void *priv, int flags);
+struct inode *anon_inode_new(void);
 
 #endif /* _LINUX_ANON_INODES_H */
 
-- 
1.9.1



[PATCH v1 06/19] zsmalloc: clean up many BUG_ON

2016-03-10 Thread Minchan Kim
There are many BUG_ONs in zsmalloc.c, which is not recommended, so
change them to alternatives.

The general rules are as follows (rule 4 is illustrated in the short
sketch after this list):

1. Avoid BUG_ON if possible; instead, use VM_BUG_ON or VM_BUG_ON_PAGE.
2. Use VM_BUG_ON_PAGE if we need to see struct page's fields.
3. Put those assertions in primitive functions so that higher-level
functions can rely on the assertion in the primitive.
4. Don't use an assertion if the following instruction would trigger
an Oops anyway.
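
As an illustration of rule 4 (the function name here is made up for
the example; it is not part of the patch):

    static int zspage_inuse_example(struct page *first_page)
    {
            /* No BUG_ON(!first_page) needed: the dereference below would
             * already oops on a NULL pointer, so the assertion adds nothing. */
            return first_page->inuse;
    }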

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 42 +++---
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index bb29203ec6b3..3c82011cc405 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -419,7 +419,7 @@ static void get_zspage_mapping(struct page *first_page,
enum fullness_group *fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
m = (unsigned long)first_page->mapping;
*fullness = m & FULLNESS_MASK;
@@ -431,7 +431,7 @@ static void set_zspage_mapping(struct page *first_page,
enum fullness_group fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
(fullness & FULLNESS_MASK);
@@ -626,7 +626,8 @@ static enum fullness_group get_fullness_group(struct page 
*first_page)
 {
int inuse, max_objects;
enum fullness_group fg;
-   BUG_ON(!is_first_page(first_page));
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
inuse = first_page->inuse;
max_objects = first_page->objects;
@@ -654,7 +655,7 @@ static void insert_zspage(struct page *first_page, struct 
size_class *class,
 {
struct page **head;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -686,13 +687,13 @@ static void remove_zspage(struct page *first_page, struct 
size_class *class,
 {
struct page **head;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
 
head = &class->fullness_list[fullness];
-   BUG_ON(!*head);
+   VM_BUG_ON_PAGE(!*head, first_page);
if (list_empty(&(*head)->lru))
*head = NULL;
else if (*head == first_page)
@@ -719,8 +720,6 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
int class_idx;
enum fullness_group currfg, newfg;
 
-   BUG_ON(!is_first_page(first_page));
-
get_zspage_mapping(first_page, &class_idx, &currfg);
newfg = get_fullness_group(first_page);
if (newfg == currfg)
@@ -806,7 +805,7 @@ static void *location_to_obj(struct page *page, unsigned 
long obj_idx)
unsigned long obj;
 
if (!page) {
-   BUG_ON(obj_idx);
+   VM_BUG_ON(obj_idx);
return NULL;
}
 
@@ -839,7 +838,7 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
void *obj)
 {
if (class->huge) {
-   VM_BUG_ON(!is_first_page(page));
+   VM_BUG_ON_PAGE(!is_first_page(page), page);
return page_private(page);
} else
return *(unsigned long *)obj;
@@ -889,8 +888,8 @@ static void free_zspage(struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
 
-   BUG_ON(!is_first_page(first_page));
-   BUG_ON(first_page->inuse);
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   VM_BUG_ON_PAGE(first_page->inuse, first_page);
 
head_extra = (struct page *)page_private(first_page);
 
@@ -916,7 +915,8 @@ static void init_zspage(struct page *first_page, struct 
size_class *class)
unsigned long off = 0;
struct page *page = first_page;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
while (page) {
struct page *next_page;
struct link_free *link;
@@ -1235,7 +1235,7 @@ static bool can_merge(struct size_class *prev, int size, 
int pages_per_zspage)
 
 static bool zspage_full(struct page *first_page)
 {
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
return first_page->inuse == first_page->objects;
 }
@@ -1273,14 +1273,12 @@ void *zs_map_object(struct zs_pool *pool, unsigned long 
handle,
struct page *pages[2];
void *ret;
 
-   BUG_ON(!handle);
-
/*
 * Because we use per-cpu mapping areas shared among the
 * pools/users, 

[PATCH v1 01/19] mm: use put_page to free page instead of putback_lru_page

2016-03-10 Thread Minchan Kim
The procedure of page migration is as follows:

First of all, it should isolate a page from the LRU and try to migrate
the page. If that is successful, it releases the page for freeing.
Otherwise, it should put the page back on the LRU list.

For LRU pages, we have used putback_lru_page for both freeing and
putback to the LRU list. That is okay because put_page is aware of the
LRU list, so if it releases the last refcount of the page, it removes
the page from the LRU list. However, it performs unnecessary operations
(e.g., lru_cache_add, pagevec and flags operations; not significant,
but not worth doing) and makes it harder to support new non-LRU page
migration, because put_page isn't aware of a non-LRU page's data
structure.

To solve the problem, we could add a new hook in put_page with a
PageMovable flag check, but that would increase overhead in a hot path
and would need a new locking scheme to stabilize the flag check against
put_page.

So, this patch cleans it up to separate the two semantics (i.e., put
and putback). If migration is successful, use put_page instead of
putback_lru_page, and use putback_lru_page only on failure. That makes
the code more readable and doesn't add overhead to put_page.
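
In shorthand, the control flow this patch aims for looks like the
following (a sketch of the unmap_and_move() tail only; the newpage
handling and the full version are in the diff below):

    rc = __unmap_and_move(page, newpage, force, mode);
    if (rc == MIGRATEPAGE_SUCCESS) {
            put_page(page);                 /* drop the reference taken at isolation */
    } else if (rc != -EAGAIN) {
            putback_lru_page(page);         /* restore to LRU only on real failure */
    }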

Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Naoya Horiguchi 
Signed-off-by: Minchan Kim 
---
 mm/migrate.c | 49 ++---
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 3ad0fea5c438..bf31ea9ffaf8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -907,6 +907,14 @@ static int __unmap_and_move(struct page *page, struct page 
*newpage,
put_anon_vma(anon_vma);
unlock_page(page);
 out:
+   /* If migration is scucessful, move newpage to right list */
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   if (unlikely(__is_movable_balloon_page(newpage)))
+   put_page(newpage);
+   else
+   putback_lru_page(newpage);
+   }
+
return rc;
 }
 
@@ -940,6 +948,12 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
 
if (page_count(page) == 1) {
/* page was freed from under us. So we are done. */
+   ClearPageActive(page);
+   ClearPageUnevictable(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
goto out;
}
 
@@ -952,9 +966,6 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
}
 
rc = __unmap_and_move(page, newpage, force, mode);
-   if (rc == MIGRATEPAGE_SUCCESS)
-   put_new_page = NULL;
-
 out:
if (rc != -EAGAIN) {
/*
@@ -966,28 +977,28 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
-   /* Soft-offlined page shouldn't go through lru cache list */
+   }
+
+   /*
+* If migration is successful, drop the reference grabbed during
+* isolation. Otherwise, restore the page to LRU list unless we
+* want to retry.
+*/
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   put_page(page);
if (reason == MR_MEMORY_FAILURE) {
-   put_page(page);
if (!test_set_page_hwpoison(page))
num_poisoned_pages_inc();
-   } else
+   }
+   } else {
+   if (rc != -EAGAIN)
putback_lru_page(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
}
 
-   /*
-* If migration was not successful and there's a freeing callback, use
-* it.  Otherwise, putback_lru_page() will drop the reference grabbed
-* during isolation.
-*/
-   if (put_new_page)
-   put_new_page(newpage, private);
-   else if (unlikely(__is_movable_balloon_page(newpage))) {
-   /* drop our reference, page already in the balloon */
-   put_page(newpage);
-   } else
-   putback_lru_page(newpage);
-
if (result) {
if (rc)
*result = rc;
-- 
1.9.1



[PATCH v1 02/19] mm/compaction: support non-lru movable page migration

2016-03-10 Thread Minchan Kim
We have allowed migration for LRU pages only until now, and that was
enough to make high-order pages. But recently, embedded systems (e.g.,
webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
so we have seen several reports about trouble with small high-order
allocations. To fix the problem, there have been several efforts
(e.g., enhancing the compaction algorithm, SLUB fallback to 0-order
pages, reserved memory, vmalloc and so on), but if there are lots of
non-movable pages in the system, those solutions are void in the long
run.

So, this patch adds a facility to turn non-movable pages into movable
ones. For the feature, this patch introduces migration-related
functions in address_space_operations as well as some page flags.

Basically, this patch adds two page flags and two functions related to
page migration. The flags and page->mapping stability are protected by
PG_lock.

PG_movable
PG_isolated

bool (*isolate_page) (struct page *, isolate_mode_t);
void (*putback_page) (struct page *);

The duties of a subsystem that wants to make its pages migratable are
as follows (a condensed sketch of the callbacks a subsystem might
register follows this list):

1. It should register an address_space in page->mapping and then mark
the page as PG_movable via __SetPageMovable.

2. It should mark the page as PG_isolated via SetPageIsolated if
isolation is successful, and return true.

3. If migration is successful, it should clear PG_isolated and
PG_movable of the page in preparation for freeing, then release the
reference of the page to free it.

4. If migration fails, the subsystem's putback function should clear
PG_isolated via ClearPageIsolated.
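
A condensed sketch of what a subsystem might register (my_isolate,
my_migrate and my_putback are hypothetical driver functions; only the
callback names come from this patch):

    static const struct address_space_operations my_aops = {
            .isolate_page   = my_isolate,   /* mark PG_isolated, return true on success */
            .migratepage    = my_migrate,   /* copy data, clear PG_isolated/PG_movable  */
            .putback_page   = my_putback,   /* clear PG_isolated when migration fails   */
    };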

Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: dri-de...@lists.freedesktop.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 Documentation/filesystems/Locking  |   4 +
 Documentation/filesystems/vfs.txt  |   5 ++
 fs/proc/page.c |   3 +
 include/linux/compaction.h |   8 ++
 include/linux/fs.h |   2 +
 include/linux/migrate.h|   2 +
 include/linux/page-flags.h |  29 
 include/uapi/linux/kernel-page-flags.h |   1 +
 mm/compaction.c|  14 +++-
 mm/migrate.c   | 132 +
 10 files changed, 185 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 619af9bfdcb3..0bb79560abb3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
+   bool (*isolate_page) (struct page *, isolate_mode_t);
int (*migratepage)(struct address_space *, struct page *, struct page 
*);
+   void (*putback_page) (struct page *);
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned 
long);
int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage: yes
 releasepage:   yes
 freepage:  yes
 direct_IO:
+isolate_page:  yes
 migratepage:   yes (both)
+putback_page:  yes
 launder_page:  yes
 is_partially_uptodate: yes
 error_remove_page: yes
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index b02a7d598258..4c1b6c3b4bc8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,9 +592,14 @@ struct address_space_operations {
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t 
offset);
+   /* isolate a page for migration */
+   bool (*isolate_page) (struct page *, isolate_mode_t);
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct page *, struct page *);
+   /* put the page back to right list */
+   void (*putback_page) (struct page *);
int (*launder_page) (struct page *);
+
int (*is_partially_uptodate) (struct page *, unsigned long,
unsigned long);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
diff --git a/fs/proc/page.c b/fs/proc/page.c
index b2855eea5405..b2bab774adea 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -155,6 +155,9 @@ u64 stable_page_flags(struct page *page)
if (page_is_idle(page))
u |= 1 << KPF_IDLE;
 
+   if (PageMovable(page))
+   u |= 1 << KPF_MOVABLE;
+
u |= kpf_copy_bit(k, KPF_LOCKED,PG_locked);
 
u |= kpf_copy_bit(k, 

[PATCH v1 02/19] mm/compaction: support non-lru movable page migration

2016-03-10 Thread Minchan Kim
We have allowed migration for only LRU pages until now and it was
enough to make high-order pages. But recently, embedded system(e.g.,
webOS, android) uses lots of non-movable pages(e.g., zram, GPU memory)
so we have seen several reports about troubles of small high-order
allocation. For fixing the problem, there were several efforts
(e,g,. enhance compaction algorithm, SLUB fallback to 0-order page,
reserved memory, vmalloc and so on) but if there are lots of
non-movable pages in system, their solutions are void in the long run.

So, this patch is to support facility to change non-movable pages
with movable. For the feature, this patch introduces functions related
to migration to address_space_operations as well as some page flags.

Basically, this patch supports two page-flags and two functions related
to page migration. The flag and page->mapping stability are protected
by PG_lock.

PG_movable
PG_isolated

bool (*isolate_page) (struct page *, isolate_mode_t);
void (*putback_page) (struct page *);

Duty of subsystem want to make their pages as migratable are
as follows:

1. It should register address_space to page->mapping then mark
the page as PG_movable via __SetPageMovable.

2. It should mark the page as PG_isolated via SetPageIsolated
if isolation is sucessful and return true.

3. If migration is successful, it should clear PG_isolated and
PG_movable of the page for free preparation then release the
reference of the page to free.

4. If migration fails, putback function of subsystem should
clear PG_isolated via ClearPageIsolated.

Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: dri-de...@lists.freedesktop.org
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 Documentation/filesystems/Locking  |   4 +
 Documentation/filesystems/vfs.txt  |   5 ++
 fs/proc/page.c |   3 +
 include/linux/compaction.h |   8 ++
 include/linux/fs.h |   2 +
 include/linux/migrate.h|   2 +
 include/linux/page-flags.h |  29 
 include/uapi/linux/kernel-page-flags.h |   1 +
 mm/compaction.c|  14 +++-
 mm/migrate.c   | 132 +
 10 files changed, 185 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 619af9bfdcb3..0bb79560abb3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
+   bool (*isolate_page) (struct page *, isolate_mode_t);
int (*migratepage)(struct address_space *, struct page *, struct page 
*);
+   void (*putback_page) (struct page *);
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned 
long);
int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage: yes
 releasepage:   yes
 freepage:  yes
 direct_IO:
+isolate_page:  yes
 migratepage:   yes (both)
+putback_page:  yes
 launder_page:  yes
 is_partially_uptodate: yes
 error_remove_page: yes
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index b02a7d598258..4c1b6c3b4bc8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,9 +592,14 @@ struct address_space_operations {
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t 
offset);
+   /* isolate a page for migration */
+   bool (*isolate_page) (struct page *, isolate_mode_t);
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct page *, struct page *);
+   /* put the page back to right list */
+   void (*putback_page) (struct page *);
int (*launder_page) (struct page *);
+
int (*is_partially_uptodate) (struct page *, unsigned long,
unsigned long);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
diff --git a/fs/proc/page.c b/fs/proc/page.c
index b2855eea5405..b2bab774adea 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -155,6 +155,9 @@ u64 stable_page_flags(struct page *page)
if (page_is_idle(page))
u |= 1 << KPF_IDLE;
 
+   if (PageMovable(page))
+   u |= 1 << KPF_MOVABLE;
+
u |= kpf_copy_bit(k, KPF_LOCKED,PG_locked);
 
u |= kpf_copy_bit(k, KPF_SLAB,  PG_slab);
diff --git a/include/linux/compaction.h b/include/linux/compaction.h

[PATCH v1 03/19] fs/anon_inodes: new interface to create new inode

2016-03-10 Thread Minchan Kim
From: Gioh Kim 

The anon_inodes has already complete interfaces to create manage
many anonymous inodes but don't have interface to get
new inode. Other sub-modules can create anonymous inode
without creating and mounting it's own pseudo filesystem.

Acked-by: Rafael Aquini 
Signed-off-by: Gioh Kim 
Signed-off-by: Minchan Kim 
---
 fs/anon_inodes.c| 6 ++
 include/linux/anon_inodes.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 80ef38c73e5a..1d51f96acdd9 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -162,6 +162,12 @@ int anon_inode_getfd(const char *name, const struct 
file_operations *fops,
 }
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
 
+struct inode *anon_inode_new(void)
+{
+   return alloc_anon_inode(anon_inode_mnt->mnt_sb);
+}
+EXPORT_SYMBOL_GPL(anon_inode_new);
+
 static int __init anon_inode_init(void)
 {
anon_inode_mnt = kern_mount(_inode_fs_type);
diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
index 8013a45242fe..ddbd67f8a73f 100644
--- a/include/linux/anon_inodes.h
+++ b/include/linux/anon_inodes.h
@@ -15,6 +15,7 @@ struct file *anon_inode_getfile(const char *name,
void *priv, int flags);
 int anon_inode_getfd(const char *name, const struct file_operations *fops,
 void *priv, int flags);
+struct inode *anon_inode_new(void);
 
 #endif /* _LINUX_ANON_INODES_H */
 
-- 
1.9.1



[PATCH v1 06/19] zsmalloc: clean up many BUG_ON

2016-03-10 Thread Minchan Kim
There are many BUG_ON in zsmalloc.c which is not recommened so
change them as alternatives.

Normal rule is as follows:

1. avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE
2. use VM_BUG_ON_PAGE if we need to see struct page's fields
3. use those assertion in primitive functions so higher functions
can rely on the assertion in the primitive function.
4. Don't use assertion if following instruction can trigger Oops

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 42 +++---
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index bb29203ec6b3..3c82011cc405 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -419,7 +419,7 @@ static void get_zspage_mapping(struct page *first_page,
enum fullness_group *fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
m = (unsigned long)first_page->mapping;
*fullness = m & FULLNESS_MASK;
@@ -431,7 +431,7 @@ static void set_zspage_mapping(struct page *first_page,
enum fullness_group fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
(fullness & FULLNESS_MASK);
@@ -626,7 +626,8 @@ static enum fullness_group get_fullness_group(struct page 
*first_page)
 {
int inuse, max_objects;
enum fullness_group fg;
-   BUG_ON(!is_first_page(first_page));
+
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
inuse = first_page->inuse;
max_objects = first_page->objects;
@@ -654,7 +655,7 @@ static void insert_zspage(struct page *first_page, struct 
size_class *class,
 {
struct page **head;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -686,13 +687,13 @@ static void remove_zspage(struct page *first_page, struct 
size_class *class,
 {
struct page **head;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
 
head = >fullness_list[fullness];
-   BUG_ON(!*head);
+   VM_BUG_ON_PAGE(!*head, first_page);
if (list_empty(&(*head)->lru))
*head = NULL;
else if (*head == first_page)
@@ -719,8 +720,6 @@ static enum fullness_group fix_fullness_group(struct 
size_class *class,
int class_idx;
enum fullness_group currfg, newfg;
 
-   BUG_ON(!is_first_page(first_page));
-
get_zspage_mapping(first_page, _idx, );
newfg = get_fullness_group(first_page);
if (newfg == currfg)
@@ -806,7 +805,7 @@ static void *location_to_obj(struct page *page, unsigned 
long obj_idx)
unsigned long obj;
 
if (!page) {
-   BUG_ON(obj_idx);
+   VM_BUG_ON(obj_idx);
return NULL;
}
 
@@ -839,7 +838,7 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
void *obj)
 {
if (class->huge) {
-   VM_BUG_ON(!is_first_page(page));
+   VM_BUG_ON_PAGE(!is_first_page(page), page);
return page_private(page);
} else
return *(unsigned long *)obj;
@@ -889,8 +888,8 @@ static void free_zspage(struct page *first_page)
 {
struct page *nextp, *tmp, *head_extra;
 
-   BUG_ON(!is_first_page(first_page));
-   BUG_ON(first_page->inuse);
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+   VM_BUG_ON_PAGE(first_page->inuse, first_page);
 
head_extra = (struct page *)page_private(first_page);
 
@@ -916,7 +915,8 @@ static void init_zspage(struct page *first_page, struct 
size_class *class)
unsigned long off = 0;
struct page *page = first_page;
 
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
while (page) {
struct page *next_page;
struct link_free *link;
@@ -1235,7 +1235,7 @@ static bool can_merge(struct size_class *prev, int size, 
int pages_per_zspage)
 
 static bool zspage_full(struct page *first_page)
 {
-   BUG_ON(!is_first_page(first_page));
+   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 
return first_page->inuse == first_page->objects;
 }
@@ -1273,14 +1273,12 @@ void *zs_map_object(struct zs_pool *pool, unsigned long 
handle,
struct page *pages[2];
void *ret;
 
-   BUG_ON(!handle);
-
/*
 * Because we use per-cpu mapping areas shared among the
 * pools/users, we can't allow 

[PATCH v1 01/19] mm: use put_page to free page instead of putback_lru_page

2016-03-10 Thread Minchan Kim
Procedure of page migration is as follows:

First of all, it should isolate a page from LRU and try to
migrate the page. If it is successful, it releases the page
for freeing. Otherwise, it should put the page back to LRU
list.

For LRU pages, we have used putback_lru_page for both freeing
and putback to LRU list. It's okay because put_page is aware of
LRU list so if it releases last refcount of the page, it removes
the page from LRU list. However, It makes unnecessary operations
(e.g., lru_cache_add, pagevec and flags operations. It would be
not significant but no worth to do) and harder to support new
non-lru page migration because put_page isn't aware of non-lru
page's data structure.

To solve the problem, we can add new hook in put_page with
PageMovable flags check but it can increase overhead in
hot path and needs new locking scheme to stabilize the flag check
with put_page.

So, this patch cleans it up to divide two semantic(ie, put and putback).
If migration is successful, use put_page instead of putback_lru_page and
use putback_lru_page only on failure. That makes code more readable
and doesn't add overhead in put_page.

Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Naoya Horiguchi 
Signed-off-by: Minchan Kim 
---
 mm/migrate.c | 49 ++---
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 3ad0fea5c438..bf31ea9ffaf8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -907,6 +907,14 @@ static int __unmap_and_move(struct page *page, struct page 
*newpage,
put_anon_vma(anon_vma);
unlock_page(page);
 out:
+   /* If migration is scucessful, move newpage to right list */
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   if (unlikely(__is_movable_balloon_page(newpage)))
+   put_page(newpage);
+   else
+   putback_lru_page(newpage);
+   }
+
return rc;
 }
 
@@ -940,6 +948,12 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
 
if (page_count(page) == 1) {
/* page was freed from under us. So we are done. */
+   ClearPageActive(page);
+   ClearPageUnevictable(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
goto out;
}
 
@@ -952,9 +966,6 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
}
 
rc = __unmap_and_move(page, newpage, force, mode);
-   if (rc == MIGRATEPAGE_SUCCESS)
-   put_new_page = NULL;
-
 out:
if (rc != -EAGAIN) {
/*
@@ -966,28 +977,28 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
-   /* Soft-offlined page shouldn't go through lru cache list */
+   }
+
+   /*
+* If migration is successful, drop the reference grabbed during
+* isolation. Otherwise, restore the page to LRU list unless we
+* want to retry.
+*/
+   if (rc == MIGRATEPAGE_SUCCESS) {
+   put_page(page);
if (reason == MR_MEMORY_FAILURE) {
-   put_page(page);
if (!test_set_page_hwpoison(page))
num_poisoned_pages_inc();
-   } else
+   }
+   } else {
+   if (rc != -EAGAIN)
putback_lru_page(page);
+   if (put_new_page)
+   put_new_page(newpage, private);
+   else
+   put_page(newpage);
}
 
-   /*
-* If migration was not successful and there's a freeing callback, use
-* it.  Otherwise, putback_lru_page() will drop the reference grabbed
-* during isolation.
-*/
-   if (put_new_page)
-   put_new_page(newpage, private);
-   else if (unlikely(__is_movable_balloon_page(newpage))) {
-   /* drop our reference, page already in the balloon */
-   put_page(newpage);
-   } else
-   putback_lru_page(newpage);
-
if (result) {
if (rc)
*result = rc;
-- 
1.9.1



Re: [PATCH] android: lmk: add swap pte pmd in tasksize

2016-03-10 Thread yalin wang

> On Mar 11, 2016, at 15:23, Lu Bing  wrote:
> 
> From: l00215322 
> 
> Many android devices have zram,so we should add "MM_SWAPENTS" in tasksize.
> Refer oom_kill.c,we add pte also.
> 
> Reviewed-by: Chen Feng 
> Reviewed-by: Fu Jun 
> Reviewed-by: Xu YiPing 
> Reviewed-by: Yu DongBin 
> Signed-off-by: Lu Bing 
> ---
> drivers/staging/android/lowmemorykiller.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/android/lowmemorykiller.c 
> b/drivers/staging/android/lowmemorykiller.c
> index 8b5a4a8..0817d3b 100644
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ b/drivers/staging/android/lowmemorykiller.c
> @@ -139,7 +139,9 @@ static unsigned long lowmem_scan(struct shrinker *s, 
> struct shrink_control *sc)
>   task_unlock(p);
>   continue;
>   }
> - tasksize = get_mm_rss(p->mm);
> + tasksize = get_mm_rss(p->mm) +
> + get_mm_counter(p->mm, MM_SWAPENTS) +
> + atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);
Why not introduce a mm_nr_ptes() helper function here?
It would be clearer to read.

>   task_unlock(p);
>   if (tasksize <= 0)
>   continue;
> -- 
> 1.8.3.2
> 
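
A minimal sketch of the suggested helper, mirroring mm_nr_pmds();
this is hypothetical and not an existing kernel function:

static inline unsigned long mm_nr_ptes(struct mm_struct *mm)
{
        return atomic_long_read(&mm->nr_ptes);
}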



[PATCH v1 19/19] zram: use __GFP_MOVABLE for memory allocation

2016-03-10 Thread Minchan Kim
Zsmalloc is ready for page migration so zram can use __GFP_MOVABLE
from now on.

I did a test to see how it helps to make higher-order pages.
The test scenario is as follows.

KVM guest, 1G memory, ext4-formatted zram block device,

for i in `seq 1 8`;
do
dd if=/dev/vda1 of=mnt/test$i.txt bs=128M count=1 &
done

wait `pidof dd`

for i in `seq 1 2 8`;
do
rm -rf mnt/test$i.txt
done
fstrim -v mnt

echo "init"
cat /proc/buddyinfo

echo "compaction"
echo 1 > /proc/sys/vm/compact_memory
cat /proc/buddyinfo

old:

init
Node 0, zone      DMA    208    120     51     41     11      0      0      0      0      0      0
Node 0, zone    DMA32  16380  13777   9184   3805    789     54      3      0      0      0      0
compaction
Node 0, zone      DMA    132     82     40     39     16      2      1      0      0      0      0
Node 0, zone    DMA32   5219   5526   4969   3455   1831    677    139     15      0      0      0

new:

init
Node 0, zone      DMA    379    115     97     19      2      0      0      0      0      0      0
Node 0, zone    DMA32  18891  16774  10862   3947    637     21      0      0      0      0      0
compaction
Node 0, zone      DMA    214     66     87     29     10      3      0      0      0      0      0
Node 0, zone    DMA32   1612   3139   3154   2469   1745    990    384     94      7      0      0

As you can see, compaction made so many high-order pages. Yay!

Signed-off-by: Minchan Kim 
---
 drivers/block/zram/zram_drv.c | 3 ++-
 mm/zsmalloc.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 370c2f76016d..10f6ff1cf6a0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -514,7 +514,8 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, 
u64 disksize)
goto out_error;
}
 
-   meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO | __GFP_HIGHMEM);
+   meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO|__GFP_HIGHMEM
+   |__GFP_MOVABLE);
if (!meta->mem_pool) {
pr_err("Error creating memory pool\n");
goto out_error;
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b9ff698115a1..a1f47a7b3a99 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -307,7 +307,7 @@ static void destroy_handle_cache(struct zs_pool *pool)
 static unsigned long alloc_handle(struct zs_pool *pool)
 {
return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-   pool->flags & ~__GFP_HIGHMEM);
+   pool->flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
 }
 
 static void free_handle(struct zs_pool *pool, unsigned long handle)
-- 
1.9.1



[PATCH v1 05/19] zsmalloc: use first_page rather than page

2016-03-10 Thread Minchan Kim
This patch cleans up the "struct page" function parameter.
Many zsmalloc functions expect that the page parameter is "first_page",
so use "first_page" rather than "page" for code readability.

Signed-off-by: Minchan Kim 
---
 mm/zsmalloc.c | 62 ++-
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2d7c4c11fc63..bb29203ec6b3 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -414,26 +414,28 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
-static void get_zspage_mapping(struct page *page, unsigned int *class_idx,
+static void get_zspage_mapping(struct page *first_page,
+   unsigned int *class_idx,
enum fullness_group *fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
-   m = (unsigned long)page->mapping;
+   m = (unsigned long)first_page->mapping;
*fullness = m & FULLNESS_MASK;
*class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK;
 }
 
-static void set_zspage_mapping(struct page *page, unsigned int class_idx,
+static void set_zspage_mapping(struct page *first_page,
+   unsigned int class_idx,
enum fullness_group fullness)
 {
unsigned long m;
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
(fullness & FULLNESS_MASK);
-   page->mapping = (struct address_space *)m;
+   first_page->mapping = (struct address_space *)m;
 }
 
 /*
@@ -620,14 +622,14 @@ static inline void zs_pool_stat_destroy(struct zs_pool 
*pool)
  * the pool (not yet implemented). This function returns fullness
  * status of the given page.
  */
-static enum fullness_group get_fullness_group(struct page *page)
+static enum fullness_group get_fullness_group(struct page *first_page)
 {
int inuse, max_objects;
enum fullness_group fg;
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
-   inuse = page->inuse;
-   max_objects = page->objects;
+   inuse = first_page->inuse;
+   max_objects = first_page->objects;
 
if (inuse == 0)
fg = ZS_EMPTY;
@@ -647,12 +649,12 @@ static enum fullness_group get_fullness_group(struct page 
*page)
  * have. This functions inserts the given zspage into the freelist
  * identified by <class, fullness_group>.
  */
-static void insert_zspage(struct page *page, struct size_class *class,
+static void insert_zspage(struct page *first_page, struct size_class *class,
enum fullness_group fullness)
 {
struct page **head;
 
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -662,7 +664,7 @@ static void insert_zspage(struct page *page, struct 
size_class *class,
 
head = &class->fullness_list[fullness];
if (!*head) {
-   *head = page;
+   *head = first_page;
return;
}
 
@@ -670,21 +672,21 @@ static void insert_zspage(struct page *page, struct 
size_class *class,
 * We want to see more ZS_FULL pages and less almost
 * empty/full. Put pages with higher ->inuse first.
 */
-   list_add_tail(&page->lru, &(*head)->lru);
-   if (page->inuse >= (*head)->inuse)
-   *head = page;
+   list_add_tail(&first_page->lru, &(*head)->lru);
+   if (first_page->inuse >= (*head)->inuse)
+   *head = first_page;
 }
 
 /*
  * This function removes the given zspage from the freelist identified
  * by <class, fullness_group>.
  */
-static void remove_zspage(struct page *page, struct size_class *class,
+static void remove_zspage(struct page *first_page, struct size_class *class,
enum fullness_group fullness)
 {
struct page **head;
 
-   BUG_ON(!is_first_page(page));
+   BUG_ON(!is_first_page(first_page));
 
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -693,11 +695,11 @@ static void remove_zspage(struct page *page, struct 
size_class *class,
BUG_ON(!*head);
if (list_empty(&(*head)->lru))
*head = NULL;
-   else if (*head == page)
+   else if (*head == first_page)
*head = (struct page *)list_entry((*head)->lru.next,
struct page, lru);
 
-   list_del_init(&page->lru);
+   list_del_init(&first_page->lru);
zs_stat_dec(class, fullness == ZS_ALMOST_EMPTY ?
CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1);
 }
@@ -712,21 +714,21 @@ static void remove_zspage(struct page *page, struct 
size_class *class,
  * fullness group.
  */
 static 

[PATCH v1 00/19] Support non-lru page migration

2016-03-10 Thread Minchan Kim
Recently, I have gotten many reports about performance degradation
in embedded systems (Android mobile phones, webOS TVs and so on)
and about fork failing easily.

The problem was fragmentation caused by zram and GPU driver
pages. Their pages cannot be migrated, so compaction cannot
work well either, and the reclaimer ends up shrinking all of the
working-set pages. It made the system very slow and even caused
fork to fail easily.

The other pain point is that such pages cannot work with CMA.
Most of the CMA memory space could be idle (i.e., it could be used
for movable pages unless the driver is using it), but if the driver
(e.g., zram) cannot migrate its pages, that memory space is
wasted. In our product, which has a big CMA area, zones are
reclaimed too excessively although there is lots of free space
in CMA, so the system easily became very slow.

To solve these problems, this patchset adds a facility to
migrate non-lru pages by introducing new friend functions
of migratepage in address_space_operations and new page flags.

(isolate_page, putback_page)
(PG_movable, PG_isolated)

For details, please read description in
"mm/compaction: support non-lru movable page migration".

Originally, Gioh Kim tried to support this feature, but he moved on,
so I took over the work. I took much of the code from his work and
changed it a little bit.
Thanks, Gioh!

I should also mention Konstantin Khlebnikov. He really helped Gioh
at that time, so he deserves much of the credit, too.
Thanks, Konstantin!

This patchset consists of three parts

1. add non-lru page migration feature

  mm: use put_page to free page instead of putback_lru_page
  fs/anon_inodes: new interface to create new inode
  mm/compaction: support non-lru movable page migration

2. rework KVM memory-ballooning

  mm/balloon: use general movable page feature into balloon

3. rework zsmalloc

  zsmalloc: use first_page rather than page
  zsmalloc: clean up many BUG_ON
  zsmalloc: reordering function parameter
  zsmalloc: remove unused pool param in obj_free
  zsmalloc: keep max_object in size_class
  zsmalloc: squeeze inuse into page->mapping
  zsmalloc: squeeze freelist into page->mapping
  zsmalloc: move struct zs_meta from mapping to freelist
  zsmalloc: factor page chain functionality out
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: zs_compact refactoring
  zsmalloc: migrate head page of zspage
  zsmalloc: use single linked list for page chain
  zsmalloc: migrate tail pages in zspage
  zram: use __GFP_MOVABLE for memory allocation

Gioh Kim (1):
  fs/anon_inodes: new interface to create new inode

Minchan Kim (18):
  mm: use put_page to free page instead of putback_lru_page
  mm/compaction: support non-lru movable page migration
  mm/balloon: use general movable page feature into balloon
  zsmalloc: use first_page rather than page
  zsmalloc: clean up many BUG_ON
  zsmalloc: reordering function parameter
  zsmalloc: remove unused pool param in obj_free
  zsmalloc: keep max_object in size_class
  zsmalloc: squeeze inuse into page->mapping
  zsmalloc: squeeze freelist into page->mapping
  zsmalloc: move struct zs_meta from mapping to freelist
  zsmalloc: factor page chain functionality out
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: zs_compact refactoring
  zsmalloc: migrate head page of zspage
  zsmalloc: use single linked list for page chain
  zsmalloc: migrate tail pages in zspage
  zram: use __GFP_MOVABLE for memory allocation

 Documentation/filesystems/Locking  |4 +
 Documentation/filesystems/vfs.txt  |5 +
 drivers/block/zram/zram_drv.c  |3 +-
 drivers/virtio/virtio_balloon.c|4 +
 fs/anon_inodes.c   |6 +
 fs/proc/page.c |3 +
 include/linux/anon_inodes.h|1 +
 include/linux/balloon_compaction.h |   47 +-
 include/linux/compaction.h |8 +
 include/linux/fs.h |2 +
 include/linux/migrate.h|2 +
 include/linux/page-flags.h |   42 +-
 include/uapi/linux/kernel-page-flags.h |1 +
 mm/balloon_compaction.c|  101 +--
 mm/compaction.c|   15 +-
 mm/migrate.c   |  197 +++--
 mm/vmscan.c|2 +-
 mm/zsmalloc.c  | 1295 +++-
 18 files changed, 1219 insertions(+), 519 deletions(-)

-- 
1.9.1



RE: [f2fs-dev] [PATCH 0/8] some cleanup of inline flag checking

2016-03-10 Thread Chao Yu
Hi Shawn,

> -Original Message-
> From: Shawn Lin [mailto:shawn@kernel-upstream.org]
> Sent: Friday, March 11, 2016 2:34 PM
> To: Chao Yu; 'Shawn Lin'; 'Jaegeuk Kim'
> Cc: shawn@kernel-upstream.org; linux-kernel@vger.kernel.org;
> linux-f2fs-de...@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH 0/8] some cleanup of inline flag checking
> 
> Hi Chao Yu,
> 
> On 2016/3/11 13:29, Chao Yu wrote:
> > Hi Shawn,
> >
> >> -Original Message-
> >> From: Shawn Lin [mailto:shawn@rock-chips.com]
> >> Sent: Friday, March 11, 2016 11:28 AM
> >> To: Jaegeuk Kim
> >> Cc: Shawn Lin; linux-kernel@vger.kernel.org; 
> >> linux-f2fs-de...@lists.sourceforge.net
> >> Subject: [f2fs-dev] [PATCH 0/8] some cleanup of inline flag checking
> >>
> >>
> >> This patchset is going to remove some redunant checking
> >> of inline data flag and also going to avoid some unnecessary
> >> cpu waste when doing inline stuff.
> >
> > When we are accessing inline inode, inline inode conversion can happen
> > concurrently, we should check inline flag again under inode page's lock
> > to avoid accessing the wrong inline data which may have been converted.
> >
> 
> that sounds reasonable at first glance, and it more seems like that
> mopst part of this patchset is just puting the checking in the right way.
> 
> If we need to check the inline inode under the protection of inode
> page's lock, it means any callers who calling inline API stuff is
> wasting time on checing the flag outside the API, right?

As you know, inline conversion was designed as a one-way operation, which means
an inline inode can only be converted to a normal inode, not the other way
around. So, with the original design, it is OK to handle an inode as a regular
one once we detect that it is non-inline, since it won't be converted back to
an inline one; otherwise, we should take the inode page's lock and check the
flag again.
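
A rough sketch of that locking rule, assuming the usual f2fs helpers
(not code from this patchset):

        if (f2fs_has_inline_data(inode)) {
                ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino);
                if (IS_ERR(ipage))
                        return PTR_ERR(ipage);
                if (f2fs_has_inline_data(inode)) {
                        /* still inline: safe to touch the inline area */
                } else {
                        /* raced with conversion: use the regular path */
                }
                f2fs_put_page(ipage, 1);
        }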

Thanks,

> 
> So we can just remove the redundant checking of the caller, but not
> change the behaviour of checing inline flag under page's lock?
> 
> Thanks for catching it.
> 
> > Thanks,
> >
> >>
> >> Note:
> >> Sorry for sending previous four patches in separate, let
> >> drop them and make them in this thread for better review.
> >>
> >>
> >>
> >> Shawn Lin (8):
> >>f2fs: check inline flag ahead for f2fs_write_inline_data
> >>f2fs: remove checing inline data flag for f2fs_write_data_page
> >>f2fs: check inline flag ahead for f2fs_read_inline_data
> >>f2fs: remove redundant checking of inline data flag
> >>f2fs: f2fs: check inline flag ahead for f2fs_inline_data_fiemap
> >>f2fs: remove checing inline data flag for f2fs_fiemap
> >>f2fs: remove unnecessary inline checking for f2fs_convert_inline_inode
> >>f2fs: check inline flag ahead for get_dnode_of_data
> >>
> >>   fs/f2fs/data.c   | 17 +++--
> >>   fs/f2fs/inline.c | 27 ++-
> >>   fs/f2fs/node.c   | 12 +---
> >>   3 files changed, 22 insertions(+), 34 deletions(-)
> >>
> >> --
> >> 2.3.7
> >>
> >>
> >>
> >> --
> >> Transform Data into Opportunity.
> >> Accelerate data analysis in your applications with
> >> Intel Data Analytics Acceleration Library.
> >> Click to learn more.
> >> http://pubads.g.doubleclick.net/gampad/clk?id=278785111=/4140
> >> ___
> >> Linux-f2fs-devel mailing list
> >> linux-f2fs-de...@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> >
> >
> > --
> > Transform Data into Opportunity.
> > Accelerate data analysis in your applications with
> > Intel Data Analytics Acceleration Library.
> > Click to learn more.
> > http://pubads.g.doubleclick.net/gampad/clk?id=278785111=/4140
> > ___
> > Linux-f2fs-devel mailing list
> > linux-f2fs-de...@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> >




[PATCH] android: lmk: add swap pte pmd in tasksize

2016-03-10 Thread Lu Bing
From: l00215322 

Many Android devices have zram, so we should add "MM_SWAPENTS" to tasksize.
Referring to oom_kill.c, we add the pte count as well.

Reviewed-by: Chen Feng 
Reviewed-by: Fu Jun 
Reviewed-by: Xu YiPing 
Reviewed-by: Yu DongBin 
Signed-off-by: Lu Bing 
---
 drivers/staging/android/lowmemorykiller.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/android/lowmemorykiller.c 
b/drivers/staging/android/lowmemorykiller.c
index 8b5a4a8..0817d3b 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -139,7 +139,9 @@ static unsigned long lowmem_scan(struct shrinker *s, struct 
shrink_control *sc)
task_unlock(p);
continue;
}
-   tasksize = get_mm_rss(p->mm);
+   tasksize = get_mm_rss(p->mm) +
+   get_mm_counter(p->mm, MM_SWAPENTS) +
+   atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);
task_unlock(p);
if (tasksize <= 0)
continue;
-- 
1.8.3.2



Re: [PATCH 1/2] [PATCH 1/2] Introduce new macros min_lt and max_lt for comparing with larger type

2016-03-10 Thread Jianyu Zhan
On Fri, Mar 11, 2016 at 2:21 PM,   wrote:
> A useful use case for min_t and max_t is comparing two values with larger
> type. For example comparing an u64 and an u32, usually we do not want to
> truncate the u64, so we need use min_t or max_t with u64.
>
> To simplify the usage introducing two more macros min_lt and max_lt,
> 'lt' means larger type.
>
> Signed-off-by: Dave Young 
> ---
>  include/linux/kernel.h |   13 +
>  1 file changed, 13 insertions(+)
>
> --- linux.orig/include/linux/kernel.h
> +++ linux/include/linux/kernel.h
> @@ -798,6 +798,19 @@ static inline void ftrace_dump(enum ftra
> type __max2 = (y);  \
> __max1 > __max2 ? __max1: __max2; })
>
> +/*
> + * use type of larger size in min_lt and max_lt
> + */
> +#define min_lt(x, y) ({\
> +   int sx = sizeof(typeof(x)); \
> +   int sy = sizeof(typeof(y)); \
> +   sx > sy ? min_t(typeof(x), x, y) : min_t(typeof(y), x, y); })
> +
> +#define max_lt(x, y) ({\
> +   int sx = sizeof(typeof(x)); \
> +   int sy = sizeof(typeof(y)); \
> +   sx > sy ? max_t(typeof(x), x, y) : max_t(typeof(y), x, y); })
> +
>  /**
>   * clamp_t - return a value clamped to a given range using a given type
>   * @type: the type of variable to use
>
>

No no!

The C standard defines the "usual arithmetic conversions" rules[1],
which decide the type promotion rules for binary operators.

The interfaces in this patch just bluntly override those rules to
choose the type with the bigger size for the operation. Most of the
time this might work well, because most of the time the operands used
with min_t()/max_t() in the Linux kernel have the same signedness, and
then this rule works.

But if the two operands have types of the same size but different
signedness, these interfaces give surprising, order-dependent results:
you choose typeof(y) as the final type for the operation when the
sizes are equal, so it might be signed or unsigned depending on the
type of y.

So, in this /proc/fs/vmcore case you should rather just explicitly
cast the operand to avoid truncation.


[1] 
http://www.tti.unipa.it/~ricrizzo/KS/Data/PBurden/chap4.usual.conversions.html
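
A small userspace illustration of the hazard described above; min_t()
is re-defined locally in simplified form just for the demo:

#include <stdio.h>

/* simplified stand-in for the kernel's min_t() */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

int main(void)
{
        unsigned int x = 1;
        int y = -1;

        /* same size, different signedness: the chosen type decides the result */
        printf("%d\n", min_t(int, x, y));           /* prints -1 */
        printf("%u\n", min_t(unsigned int, x, y));  /* prints 1: -1 wraps to UINT_MAX */
        return 0;
}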

Regards,
Jianyu Zhan


Re: [PATCH 2/2] [PATCH 2/2] proc-vmcore: wrong data type casting fix

2016-03-10 Thread Minfei Huang
On 03/11/16 at 02:21pm, dyo...@redhat.com wrote:
> @@ -231,7 +231,8 @@ static ssize_t __read_vmcore(char *buffe
>  
>   list_for_each_entry(m, &vmcore_list, list) {
>   if (*fpos < m->offset + m->size) {
> - tsz = min_t(size_t, m->offset + m->size - *fpos, 
> buflen);
> + tsz = (size_t)min_lt(m->offset + m->size - *fpos,
> + buflen);
>   start = m->paddr + *fpos - m->offset;
>   tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
>   if (tmp < 0)
> @@ -461,7 +462,7 @@ static int mmap_vmcore(struct file *file
>   if (start < m->offset + m->size) {
>   u64 paddr = 0;
>  
> - tsz = min_t(size_t, m->offset + m->size - start, size);
> + tsz = (size_t)min_lt(m->offset + m->size - start, size);
>   paddr = m->paddr + start - m->offset;
>   if (vmcore_remap_oldmem_pfn(vma, vma->vm_start + len,
>   paddr >> PAGE_SHIFT, tsz,

Hi, Dave.

It seems the first operand is unsigned long long, and the latter one is
size_t. The sizes of both types don't change at run time, so why not
use min_t(unsigned long long, a, b) instead?

Thanks
Minfei


[BUG] Device unbound in a resumed state

2016-03-10 Thread Krzysztof Kozlowski
Hi,


Could be related (the same?) with [0].

I have a driver (hwrng/exynos-rng) which in probe does:
pm_runtime_set_autosuspend_delay(&pdev->dev, EXYNOS_AUTOSUSPEND_DELAY);
pm_runtime_use_autosuspend(&pdev->dev);
pm_runtime_enable(&pdev->dev);

and in remove:
pm_runtime_disable(&pdev->dev)

Just before unbinding in __device_release_driver() the device is resumed
but unfortunately not suspended later. I mean the
__device_release_driver()->pm_runtime_put_sync() does not trigger
runtime suspend.

This leads to leaving the device in active state (e.g. clocks enabled).

The problem does not happen if autosuspend is removed. Also, runtime
suspend does happen after a very fast unbind-bind cycle.


Best regards,
Krzysztof

[0] PM regression with commit 5de85b9d57ab PM runtime re-init in v4.5-rc1
http://comments.gmane.org/gmane.linux.power-management.general/70690


Re: [PATCH] xhci: fix typo in babble endpoint handling comment

2016-03-10 Thread Felipe Balbi
Rajesh Bhagat  writes:

> [ text/plain ]
> The 0.95 xHCI spec says that non-control endpoints will be halted if a
> babble is detected on a transfer.  The 0.96 xHCI spec says all types of
> endpoints will be halted when a babble is detected.  Some hardware that
> claims to be 0.95 compliant halts the control endpoint anyway.
>
> Reference: http://www.spinics.net/lists/linux-usb/msg21755.html
>
> Signed-off-by: Rajesh Bhagat 

Reviewed-by: Felipe Balbi 

> ---
>  drivers/usb/host/xhci-ring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 3915657..59841a9 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1768,7 +1768,7 @@ static int xhci_requires_manual_halt_cleanup(struct 
> xhci_hcd *xhci,
>   if (trb_comp_code == COMP_TX_ERR ||
>   trb_comp_code == COMP_BABBLE ||
>   trb_comp_code == COMP_SPLIT_ERR)
> - /* The 0.96 spec says a babbling control endpoint
> + /* The 0.95 spec says a babbling control endpoint
>* is not halted. The 0.96 spec says it is.  Some HW
>* claims to be 0.95 compliant, but it halts the control
>* endpoint anyway.  Check if a babble halted the
> -- 
> 2.6.2.198.g614a2ac
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
balbi


signature.asc
Description: PGP signature


[PATCH] kcm: fix variable type

2016-03-10 Thread Andrzej Hajda
The function skb_splice_bits() can return negative values, so its result
should be assigned to a signed variable to allow correct error checking.

The problem has been detected using the
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci patch.

Signed-off-by: Andrzej Hajda 
---
 net/kcm/kcmsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 40662d73..0b68ba7 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1483,7 +1483,7 @@ static ssize_t kcm_splice_read(struct socket *sock, 
loff_t *ppos,
long timeo;
struct kcm_rx_msg *rxm;
int err = 0;
-   size_t copied;
+   ssize_t copied;
struct sk_buff *skb;
 
/* Only support splice for SOCKSEQPACKET */
-- 
1.9.1
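
For illustration, the pattern that the Coccinelle script flags (a
sketch, not the exact kcm_splice_read() code):

        size_t copied;

        copied = skb_splice_bits(/* ... */);    /* may return a negative errno */
        if (copied < 0)                         /* never true: size_t is unsigned */
                return copied;                  /* dead code until 'copied' is ssize_t */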



Re: Overlapping ioremap() calls, set_memory_*() semantics

2016-03-10 Thread Andy Lutomirski
On Mon, Mar 7, 2016 at 9:03 AM, Toshi Kani  wrote:
> Let me try to summarize...
>
> The original issue Luis brought up was that drivers written to work with
> MTRR may create a single ioremap range covering multiple cache attributes
> since MTRR can overwrite cache attribute of a certain range.  Converting
> such drivers with PAT-based ioremap interfaces, i.e. ioremap_wc() and
> ioremap_nocache(), requires a separate ioremap map for each cache
> attribute, which can be challenging as it may result in overlapping ioremap
> ranges (in his term) with different cache attributes.
>
> So, Luis asked about 'sematics of overlapping ioremap()' calls.  Hence, I
> responded that aliasing mapping itself is supported, but alias with
> different cache attribute is not.  We have checks in place to detect such
> condition.  Overlapping ioremap calls with a different cache attribute
> either fails or gets redirected to the existing cache attribute on x86.

A little off-topic, but someone reminded me recently: most recent CPUs
have self-snoop.  It's poorly documented, but on self-snooping CPUs, I
think that a lot of the aliasing issues go away.  We may be able to
optimize the code quite a bit on these CPUs.

I also wonder whether we can drop a bunch of the memtype tracking.
After all, if we have aliases of different types on a self-snooping
CPU and /dev/mem is locked down hard enough, we could maybe get away
with letting self-snoop handle all the conflicts.

(We could also make /dev/mem always do UC if it would help.)

--Andy
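
For context, the PAT-based conversion described in the quoted summary
typically becomes one mapping per cache attribute; the BAR layout and
size macros below are hypothetical:

        void __iomem *regs = ioremap_nocache(bar_base, REG_WIN_SIZE);       /* UC: registers   */
        void __iomem *fb   = ioremap_wc(bar_base + FB_OFFSET, FB_WIN_SIZE); /* WC: framebuffer */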


Re: [PATCH v5 3/9] dma-mapping: add dma_{map,unmap}_resource

2016-03-10 Thread Dan Williams
On Thu, Mar 10, 2016 at 8:05 AM, Niklas Söderlund
 wrote:
> Hi Christoph,
>
> On 2016-03-07 23:38:47 -0800, Christoph Hellwig wrote:
>> Please add some documentation on where/how this should be used.  It's
>> not a very obvious interface.
>
> Good idea, I have added the following to Documentation/DMA-API.txt and
> folded it in to this patch. Do you feel it's adequate and do you know
> anywhere else I should add documentation?
>
> diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> index 45ef3f2..248556a 100644
> --- a/Documentation/DMA-API.txt
> +++ b/Documentation/DMA-API.txt
> @@ -277,14 +277,29 @@ and  parameters are provided to do partial page 
> mapping, it is
>  recommended that you never use these unless you really know what the
>  cache width is.
>
> +dma_addr_t
> +dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size,
> +enum dma_data_direction dir, struct dma_attrs *attrs)
> +
> +Maps a MMIO region so it can be accessed by the device and returns the
> +DMA address of the memory. API should only be used to map device MMIO,
> +mapping of RAM is not permitted.
> +

I think it is confusing to use the dma_ prefix for this peer-to-peer
mmio functionality.  dma_addr_t is a device's view of host memory.
Something like bus_addr_t bus_map_resource().  Doesn't this routine
also need the source device in addition to the target device?  The
resource address is from the perspective of the host cpu, it may be a
different address space in the view of two devices relative to each
other.


[PATCH] tty: amba-pl011: Use 32-bit accesses for SBSA UART

2016-03-10 Thread Christopher Covington
Version 2 of the Server Base System Architecture (SBSAv2) describes the
Generic UART registers as 32 bits wide. At least one implementation, found
on the Qualcomm Technologies QDF2432, only supports 32 bit accesses.
SBSAv3, which describes supported access sizes in greater detail,
explicitly requires support for both 16 and 32 bit accesses to all
registers (and 8 bit accesses to some but not all). Therefore, for broad
compatibility, simply use 32 bit accessors for the SBSA UART.

Tested-by: Mark Langsdorf 
Signed-off-by: Christopher Covington 
---
Changes new in v2:
* Fixed from address
* Elaborated on forward (SBSAv3) compatibility in commit message
* Included Mark Langsdorf's Tested-by, which now covers:
QDF2432
Seattle
X-Gene 1
---
 drivers/tty/serial/amba-pl011.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index c0da0cc..ffb5eb8 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -121,6 +121,7 @@ static struct vendor_data vendor_arm = {
 
 static struct vendor_data vendor_sbsa = {
.reg_offset = pl011_std_offsets,
+   .access_32b = true,
.oversampling   = false,
.dma_threshold  = false,
.cts_event_workaround   = false,
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
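
For context, the access_32b flag added above selects the MMIO accessor width
inside the driver. A rough sketch of the idea (illustrative only, not the
actual pl011 read/write helpers):

#include <linux/io.h>
#include <linux/types.h>

/* Illustrative: pick 32-bit or 16-bit accessors from a per-vendor flag. */
static unsigned int sbsa_reg_read(void __iomem *base, unsigned int off,
				  bool access_32b)
{
	return access_32b ? readl_relaxed(base + off)
			  : readw_relaxed(base + off);
}

static void sbsa_reg_write(unsigned int val, void __iomem *base,
			   unsigned int off, bool access_32b)
{
	if (access_32b)
		writel_relaxed(val, base + off);
	else
		writew_relaxed(val, base + off);
}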



Re: [f2fs-dev] [PATCH 0/8] some cleanup of inline flag checking

2016-03-10 Thread Shawn Lin

Hi Chao Yu,

On 2016/3/11 13:29, Chao Yu wrote:

Hi Shawn,


-Original Message-
From: Shawn Lin [mailto:shawn@rock-chips.com]
Sent: Friday, March 11, 2016 11:28 AM
To: Jaegeuk Kim
Cc: Shawn Lin; linux-kernel@vger.kernel.org; 
linux-f2fs-de...@lists.sourceforge.net
Subject: [f2fs-dev] [PATCH 0/8] some cleanup of inline flag checking


This patchset removes some redundant checking of the inline data flag
and also avoids some unnecessary CPU work when doing inline
operations.


When we are accessing an inline inode, inline conversion can happen
concurrently, so we should check the inline flag again under the inode
page's lock to avoid accessing inline data that has already been converted.



That sounds reasonable at first glance, and it seems that most of this
patchset is really about putting the check in the right place.

If we need to check the inline flag under the protection of the inode
page's lock, it means any caller checking the flag outside the inline
API is just wasting time on it, right?

So we can simply remove the redundant checks in the callers, but not
change the behaviour of checking the inline flag under the page's lock?

Thanks for catching it.


Thanks,



Note:
Sorry for sending the previous four patches separately; let's
drop them and keep them in this thread for better review.



Shawn Lin (8):
   f2fs: check inline flag ahead for f2fs_write_inline_data
   f2fs: remove checing inline data flag for f2fs_write_data_page
   f2fs: check inline flag ahead for f2fs_read_inline_data
   f2fs: remove redundant checking of inline data flag
   f2fs: f2fs: check inline flag ahead for f2fs_inline_data_fiemap
   f2fs: remove checing inline data flag for f2fs_fiemap
   f2fs: remove unnecessary inline checking for f2fs_convert_inline_inode
   f2fs: check inline flag ahead for get_dnode_of_data

  fs/f2fs/data.c   | 17 +++--
  fs/f2fs/inline.c | 27 ++-
  fs/f2fs/node.c   | 12 +---
  3 files changed, 22 insertions(+), 34 deletions(-)

--
2.3.7



--
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785111=/4140
___
Linux-f2fs-devel mailing list
linux-f2fs-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel



--
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785111=/4140
___
Linux-f2fs-devel mailing list
linux-f2fs-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
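
To illustrate the locking pattern discussed in this thread, a rough sketch
(a hypothetical helper, not actual f2fs code; f2fs_has_inline_data(),
get_node_page(), read_inline_data() and friends are used as in the f2fs tree
of that era, exact signatures approximate). The inline flag is only trusted
once the inode's node page is locked:

#include <linux/fs.h>
#include <linux/err.h>
#include "f2fs.h"

static int read_inline_or_fallback(struct inode *inode, struct page *page)
{
	struct page *ipage;

	if (!f2fs_has_inline_data(inode))	/* cheap early-out, may race */
		return -EAGAIN;

	ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino);
	if (IS_ERR(ipage))
		return PTR_ERR(ipage);
	/* get_node_page() returns the page locked */

	if (!f2fs_has_inline_data(inode)) {
		/* converted in the meantime; fall back to the normal path */
		f2fs_put_page(ipage, 1);
		return -EAGAIN;
	}

	/* safe: the data cannot be converted away while ipage is locked */
	read_inline_data(page, ipage);
	f2fs_put_page(ipage, 1);
	return 0;
}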





[PATCH 1/2] [PATCH 1/2] Introduce new macros min_lt and max_lt for comparing with larger type

2016-03-10 Thread dyoung
A common use case for min_t and max_t is comparing two values of
different types. For example, when comparing a u64 and a u32 we usually
do not want to truncate the u64, so we need to use min_t or max_t with u64.

To simplify this usage, introduce two more macros, min_lt and max_lt;
'lt' stands for "larger type".

Signed-off-by: Dave Young 
---
 include/linux/kernel.h |   13 +
 1 file changed, 13 insertions(+)

--- linux.orig/include/linux/kernel.h
+++ linux/include/linux/kernel.h
@@ -798,6 +798,19 @@ static inline void ftrace_dump(enum ftra
type __max2 = (y);  \
__max1 > __max2 ? __max1: __max2; })
 
+/*
+ * use type of larger size in min_lt and max_lt
+ */
+#define min_lt(x, y) ({\
+   int sx = sizeof(typeof(x)); \
+   int sy = sizeof(typeof(y)); \
+   sx > sy ? min_t(typeof(x), x, y) : min_t(typeof(y), x, y); })
+
+#define max_lt(x, y) ({\
+   int sx = sizeof(typeof(x)); \
+   int sy = sizeof(typeof(y)); \
+   sx > sy ? max_t(typeof(x), x, y) : max_t(typeof(y), x, y); })
+
 /**
  * clamp_t - return a value clamped to a given range using a given type
  * @type: the type of variable to use
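
As a quick standalone illustration of the difference (a userspace mock-up;
min_t below is a simplified stand-in for the kernel macro, and the program
assumes GCC's typeof and statement expressions):

#include <stdio.h>
#include <stdint.h>

#define min_t(type, x, y) ({ type __x = (x); type __y = (y); __x < __y ? __x : __y; })

#define min_lt(x, y) ({ \
	int sx = sizeof(typeof(x)); \
	int sy = sizeof(typeof(y)); \
	sx > sy ? min_t(typeof(x), x, y) : min_t(typeof(y), x, y); })

int main(void)
{
	uint64_t big = 0x100000000ULL;	/* does not fit in 32 bits */
	uint32_t len = 4096;

	/* min_t with the smaller type truncates 'big' to 0 first: prints 0 */
	printf("min_t : %u\n", min_t(uint32_t, big, len));
	/* min_lt picks the larger of the two types: prints 4096 */
	printf("min_lt: %llu\n", (unsigned long long)min_lt(big, len));
	return 0;
}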




[PATCH 0/2] min_t type casting bug fixes

2016-03-10 Thread dyoung
Hi,

We found a min_t type casting issue in fs/proc/vmcore.c: it uses the
smaller type in min_t, so during i386 PAE testing a 64-bit value was
truncated, which caused vmcore saving to fail and a BUG() in the mmap case.

The first patch introduces new macros, min_lt and max_lt, which select
the larger data type of x and y. The second patch then uses min_lt in vmcore.

Any comments are appreciated.

Thanks
Dave



[PATCH 2/2] [PATCH 2/2] proc-vmcore: wrong data type casting fix

2016-03-10 Thread dyoung
On an i686 PAE-enabled machine the contiguous physical area can be
large, so the calculation below in read_vmcore() and mmap_vmcore() can
produce a value that does not fit in 32 bits:

tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);

Since size_t is only 32 bits wide on i686, min_t truncates that value
and the real size passed down is no longer correct. Suppose
m->offset + m->size - *fpos is truncated to 0 while buflen > 0; we then
get tsz = 0, which is of course not the expected result.

During our tests this caused two problems:
1) read_vmcore refuses to continue, so makedumpfile fails.
2) mmap_vmcore triggers BUG_ON() in remap_pfn_range().

Use min_lt instead so that the values are not truncated.

Signed-off-by: Baoquan He 
Signed-off-by: Dave Young 
---
 fs/proc/vmcore.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- linux.orig/fs/proc/vmcore.c
+++ linux/fs/proc/vmcore.c
@@ -231,7 +231,8 @@ static ssize_t __read_vmcore(char *buffe
 
list_for_each_entry(m, _list, list) {
if (*fpos < m->offset + m->size) {
-   tsz = min_t(size_t, m->offset + m->size - *fpos, 
buflen);
+   tsz = (size_t)min_lt(m->offset + m->size - *fpos,
+   buflen);
start = m->paddr + *fpos - m->offset;
tmp = read_from_oldmem(buffer, tsz, , userbuf);
if (tmp < 0)
@@ -461,7 +462,7 @@ static int mmap_vmcore(struct file *file
if (start < m->offset + m->size) {
u64 paddr = 0;
 
-   tsz = min_t(size_t, m->offset + m->size - start, size);
+   tsz = (size_t)min_lt(m->offset + m->size - start, size);
paddr = m->paddr + start - m->offset;
if (vmcore_remap_oldmem_pfn(vma, vma->vm_start + len,
paddr >> PAGE_SHIFT, tsz,




Re: [PATCH v5 3/5] Add Cyclomatic complexity GCC plugin

2016-03-10 Thread Masahiro Yamada
Hi Emese,


2016-03-07 8:05 GMT+09:00 Emese Revfy :
> Add a very simple plugin to demonstrate the GCC plugin infrastructure. This 
> GCC
> plugin computes the cyclomatic complexity of each function.
>
> The complexity M of a function's control flow graph is defined as:
>  M = E - N + 2P
> where
>  E = the number of edges
>  N = the number of nodes
>  P = the number of connected components (exit nodes).
>
> Signed-off-by: Emese Revfy 
> ---
>  arch/Kconfig  | 12 +++
>  scripts/Makefile.gcc-plugins  |  6 +++-
>  tools/gcc/Makefile|  4 +++
>  tools/gcc/cyc_complexity_plugin.c | 73 
> +++
>  4 files changed, 94 insertions(+), 1 deletion(-)
>  create mode 100644 tools/gcc/cyc_complexity_plugin.c
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index e090642..59bde9b 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -370,6 +370,18 @@ menuconfig GCC_PLUGINS
>   GCC plugins are loadable modules that provide extra features to the
>   compiler. They are useful for runtime instrumentation and static 
> analysis.
>
> +config GCC_PLUGIN_CYC_COMPLEXITY
> +   bool "Compute the cyclomatic complexity of a function"
> +   depends on GCC_PLUGINS
> +   help
> + The complexity M of a function's control flow graph is defined as:
> +  M = E - N + 2P
> + where
> +
> + E = the number of edges
> + N = the number of nodes
> + P = the number of connected components (exit nodes).
> +
>  config HAVE_CC_STACKPROTECTOR
> bool
> help
> diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
> index 7c85bf2..dd7b56d 100644
> --- a/scripts/Makefile.gcc-plugins
> +++ b/scripts/Makefile.gcc-plugins
> @@ -5,7 +5,11 @@ else
>  PLUGINCC := $(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-plugin.sh 
> "$(HOSTCC)" "$(HOSTCXX)" "$(CC)")
>  endif
>  ifneq ($(PLUGINCC),)
> -export PLUGINCC GCC_PLUGINS_CFLAGS GCC_PLUGINS_AFLAGS
> +ifdef CONFIG_GCC_PLUGIN_CYC_COMPLEXITY
> +GCC_PLUGIN_CYC_COMPLEXITY_CFLAGS := 
> -fplugin=$(objtree)/tools/gcc/cyc_complexity_plugin.so
> +endif
> +GCC_PLUGINS_CFLAGS := $(GCC_PLUGIN_CYC_COMPLEXITY_CFLAGS)
> +export PLUGINCC GCC_PLUGINS_CFLAGS GCC_PLUGINS_AFLAGS 
> GCC_PLUGIN_CYC_COMPLEXITY


Do you need to export "GCC_PLUGIN_CYC_COMPLEXITY"?
I do not see any reference to it.

If we expect more and more plugins in the future,
would it be better to do something like this?

gcc-plugin-$(CONFIG_GCC_PLUGIN_CYC_COMPLEXITY)  += cyc_complexity_plugin.so
gcc-plugin-$(CONFIG_GCC_PLUGIN_SANCOV)  += sancov_plugin.so
gcc-plugin-$(CONFIG_GCC_PLUGIN_FOO) += foo_plugin.so


GCC_PLUGINS_CFLAGS := $(addprefix -fplugin=$(objtree)/tools/gcc/, $(gcc-plugin-y))








-- 
Best Regards
Masahiro Yamada
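
As a small illustration of the metric the plugin reports (the exact node and
edge counts depend on how the CFG is built, but M comes out the same; the
function below is hypothetical):

/* One if/else gives two independent paths through the function.
 * With a CFG of entry -> cond -> {return -1, return 1} -> exit:
 *   E = 5 edges, N = 5 nodes, P = 1 connected component,
 *   M = E - N + 2P = 5 - 5 + 2 = 2.
 */
int sign(int x)
{
	if (x < 0)
		return -1;
	return 1;
}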


Re: [RFC] PCI: PTM Driver

2016-03-10 Thread Yong, Jonathan

On 02/29/2016 15:29, Yong, Jonathan wrote:

Hello LKML,

This is a preliminary implementation of the PTM[1] support driver; the code
is obviously hacked together and in need of refactoring. The driver has
only been tested against a virtual PCI bus.

The driver's job is to reach every PTM-capable device, set some PCI config
space bits, and then go back to sleep [2].

PTM-capable PCIe devices get a new sysfs entry that allows PTM to be
enabled if automatic PTM activation is disabled, or disabled if so desired.

Comments? Should I explain the PTM registers in more detail?
Please CC me, thanks.


Ping?
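
For readers unfamiliar with the mechanism, a rough sketch of the walk
described above (the capability ID and register offsets are assumptions
based on the PCIe PTM ECN, not code from the posted driver):

#include <linux/pci.h>

#define PTM_ECAP_ID	0x1f	/* assumed PTM extended capability ID */
#define PTM_CTRL_REG	0x08	/* assumed PTM control register offset */
#define PTM_CTRL_ENABLE	0x1	/* assumed PTM enable bit */

/* Walk all PCI devices and set the enable bit in any PTM capability
 * found; hierarchy ordering and sysfs handling are omitted here. */
static void ptm_enable_all(void)
{
	struct pci_dev *pdev = NULL;
	u32 ctrl;
	int pos;

	for_each_pci_dev(pdev) {
		pos = pci_find_ext_capability(pdev, PTM_ECAP_ID);
		if (!pos)
			continue;
		pci_read_config_dword(pdev, pos + PTM_CTRL_REG, &ctrl);
		pci_write_config_dword(pdev, pos + PTM_CTRL_REG,
				       ctrl | PTM_CTRL_ENABLE);
	}
}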


