Re: [PATCH] pinctrl: msm: Use dynamic GPIO numbering

2020-11-08 Thread Arun KS
On Thu, Nov 5, 2020 at 3:46 PM Linus Walleij  wrote:
>
> On Fri, Oct 23, 2020 at 4:21 PM Arun KS  wrote:
>
> > I'm only concerned because, after this change, using GPIO numbers
> > from user space has become a little difficult.
>
> This makes me a bit puzzled so I need to push back a bit
> here.
>
> What is this userspace and what interface is it using?
>
> We recommend using the GPIO character device with
> libgpiod for userspace applications:
> https://www.kernel.org/doc/html/latest/driver-api/gpio/using-gpio.html

Thanks, Linus. Makes sense. Basically, using the gpiochip and an offset
into it will solve my problem. Earlier, while using the sysfs interface,
there used to be a one-to-one mapping to the real GPIO numbers.
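
For anyone else hitting this, a minimal sketch of driving a line by
chip name plus offset with libgpiod (this assumes the v1.x C API;
"gpiochip0" and offset 42 are placeholders):

#include <gpiod.h>

int main(void)
{
	struct gpiod_chip *chip;
	struct gpiod_line *line;

	/* Address the line by chip + offset, not by global GPIO number. */
	chip = gpiod_chip_open_by_name("gpiochip0");
	if (!chip)
		return 1;

	line = gpiod_chip_get_line(chip, 42);	/* offset within this chip */
	if (!line || gpiod_line_request_output(line, "demo", 0) < 0) {
		gpiod_chip_close(chip);
		return 1;
	}

	gpiod_line_set_value(line, 1);
	gpiod_line_release(line);
	gpiod_chip_close(chip);
	return 0;
}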

Regards,
Arun
>
> Is there any problem with this?
>
> sysfs is deprecated for years now:
> https://www.kernel.org/doc/html/latest/admin-guide/gpio/sysfs.html
>
> Yours,
> Linus Walleij


Re: [PATCH] pinctrl: msm: Use dynamic GPIO numbering

2020-10-23 Thread Arun KS
On Mon, Jan 29, 2018 at 8:30 AM Bjorn Andersson
 wrote:
>
> The base of the TLMM gpiochip should not be statically defined as 0, fix
> this to not artificially restrict the existence of multiple pinctrl-msm
> devices.

Can someone please provide details on why this is needed for
pinctrl-msm. Is there any MSM chipset using multiple TLMM devices? I'm
only concerned because, after this change, using GPIO numbers from user
space has become a little difficult. Can we merge the patch below from
Timur to maintain the past behavior when multiple TLMM devices are not
present, which is most likely the case?

 static int base = 0;

 chip->base = base;	/* first TLMM device keeps base 0 */
 base = -1;		/* subsequent devices get dynamic numbering */

Regards,
Arun

>
> Fixes: f365be092572 ("pinctrl: Add Qualcomm TLMM driver")
> Reported-by: Timur Tabi 
> Signed-off-by: Bjorn Andersson 
> ---
>  drivers/pinctrl/qcom/pinctrl-msm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
> index 495432f3341b..95e5c5ea40af 100644
> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
> @@ -818,7 +818,7 @@ static int msm_gpio_init(struct msm_pinctrl *pctrl)
> return -EINVAL;
>
> chip = &pctrl->chip;
> -   chip->base = 0;
> +   chip->base = -1;
> chip->ngpio = ngpio;
> chip->label = dev_name(pctrl->dev);
> chip->parent = pctrl->dev;
> --
> 2.15.0
>
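
(For background: when chip->base is -1, gpiolib picks a free range of
global numbers dynamically at gpiochip_add() time, historically
scanning down from ARCH_NR_GPIOS, so the global numbers that sysfs
exposed are no longer stable across probe order; hence the chip plus
offset addressing discussed above.)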


Re: [PATCH v1] arm64: Fix size of __early_cpu_boot_status

2019-05-15 Thread Arun KS
On Wed, May 15, 2019 at 7:20 PM Will Deacon  wrote:
>
> On Wed, May 15, 2019 at 07:13:19PM +0530, Arun KS wrote:
> > __early_cpu_boot_status is of type int. Fix up the calls to
> > update_early_cpu_boot_status to use a w register.
> >
> > Signed-off-by: Arun KS 
> > Acked-by: Mark Rutland 
> > ---
> >  arch/arm64/include/asm/smp.h | 2 +-
> >  arch/arm64/kernel/head.S | 6 +++---
> >  2 files changed, 4 insertions(+), 4 deletions(-)
>
> Your original patch is now in mainline:
>
> https://git.kernel.org/linus/61cf61d81e32
>
> Is this still needed?
Thanks for pointing that out. We can ignore this patch.

Regards,
Arun
>
> Will


[PATCH v1] arm64: Fix size of __early_cpu_boot_status

2019-05-15 Thread Arun KS
__early_cpu_boot_status is of type int. Fix up the calls to
update_early_cpu_boot_status to use a w register.
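
(For context: on arm64 a store from a wN register writes 4 bytes while
one from an xN register writes 8, so with __early_cpu_boot_status now
an int, the macro's stores must go through a w register to avoid
touching the adjacent 4 bytes.)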

Signed-off-by: Arun KS 
Acked-by: Mark Rutland 
---
 arch/arm64/include/asm/smp.h | 2 +-
 arch/arm64/kernel/head.S | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 18553f3..59e80ab 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -96,7 +96,7 @@ struct secondary_data {
 };
 
 extern struct secondary_data secondary_data;
-extern long __early_cpu_boot_status;
+extern int __early_cpu_boot_status;
 extern void secondary_entry(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index fcae3f8..c7175fb 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -778,7 +778,7 @@ ENTRY(__enable_mmu)
	ubfx	x2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
	cmp	x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
	b.ne	__no_granule_support
-	update_early_cpu_boot_status 0, x2, x3
+	update_early_cpu_boot_status 0, x2, w3
	adrp	x2, idmap_pg_dir
	phys_to_ttbr x1, x1
	phys_to_ttbr x2, x2
@@ -810,7 +810,7 @@ ENTRY(__cpu_secondary_check52bitva)
	cbnz	x0, 2f
 
update_early_cpu_boot_status \
-   CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_52_BIT_VA, x0, x1
+   CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_52_BIT_VA, x0, w1
 1: wfe
wfi
b   1b
@@ -822,7 +822,7 @@ ENDPROC(__cpu_secondary_check52bitva)
 __no_granule_support:
/* Indicate that this CPU can't boot and is stuck in the kernel */
update_early_cpu_boot_status \
-   CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_NO_GRAN, x1, x2
+   CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_NO_GRAN, x1, w2
 1:
wfe
wfi
-- 
1.9.1



Re: arm64: Fix size of __early_cpu_boot_status

2019-04-30 Thread Arun KS
On Tue, Apr 30, 2019 at 4:39 PM Will Deacon  wrote:
>
> On Tue, Apr 30, 2019 at 04:05:04PM +0530, Arun KS wrote:
> > __early_cpu_boot_status is of type long. Use quad
> > assembler directive to allocate proper size.
> >
> > Signed-off-by: Arun KS 
> > ---
> >  arch/arm64/kernel/head.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index eecf792..115f332 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -684,7 +684,7 @@ ENTRY(__boot_cpu_mode)
> >   * with MMU turned off.
> >   */
> >  ENTRY(__early_cpu_boot_status)
> > - .long   0
> > + .quad   0
>
> Yikes. How did you spot this? Did we end up corrupting an adjacent variable,
> or does the alignment in the linker script save us in practice?

Right now there is no adjacent variable. But I was adding one and it
was getting corrupted.
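
To illustrate (hypothetical layout, not the actual head.S symbols):
".long" reserves only 4 bytes, so the 64-bit str performed by
update_early_cpu_boot_status spills into whatever the assembler places
next:

ENTRY(__early_cpu_boot_status)
	.long	0			// only 4 bytes reserved
ENTRY(__hypothetical_neighbour)		// the variable being added
	.long	0			// clobbered by a 64-bit store above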

Regards,
Arun
>
> Will


arm64: Fix size of __early_cpu_boot_status

2019-04-30 Thread Arun KS
__early_cpu_boot_status is of type long. Use quad
assembler directive to allocate proper size.

Signed-off-by: Arun KS 
---
 arch/arm64/kernel/head.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index eecf792..115f332 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -684,7 +684,7 @@ ENTRY(__boot_cpu_mode)
  * with MMU turned off.
  */
 ENTRY(__early_cpu_boot_status)
-   .long   0
+   .quad   0
 
.popsection
 
-- 
1.9.1



[PATCH v11] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-17 Thread Arun KS
When pages are freed at a higher order, the time spent on coalescing
pages by the buddy allocator can be reduced.  With a section size of
256MB, hot add latency of a single section improves from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.
Modify external providers of the online callback to align with the
change.

Signed-off-by: Arun KS 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 
Reviewed-by: Alexander Duyck 
---
Changes since v10:
- Fix the check that a page belongs to the HAS in hv_balloon.c.
- Removed unnecessary brackets.

Changes since v9:
- Fix the condition check in the hv_balloon driver.

Changes since v8:
- Remove return type change for online_page_callback.
- Use consistent names for external online_page providers.
- Fix onlined_pages accounting.

Changes since v7:
- Rebased to 5.0-rc1.
- Fixed onlined_pages accounting.
- Added comment for return value of online_page_callback.
- Renamed xen_bring_pgs_online to xen_online_pages.

Changes since v6:
- Rebased to 4.20
- Changelog updated.
- No improvement seen on arm64, hence dropped the removal of prefetch.

Changes since v5:
- Rebased to 4.20-rc1.
- Changelog updated.

Changes since v4:
- As suggested by Michal Hocko,
- Simplify logic in online_pages_block() by using get_order().
- Separate out removal of prefetch from __free_pages_core().

Changes since v3:
- Renamed _free_pages_boot_core -> __free_pages_core.
- Removed prefetch from __free_pages_core.
- Removed xen_online_page().

Changes since v2:
- Reuse code from __free_pages_boot_core().

Changes since v1:
- Removed prefetch().

Changes since RFC:
- Rebase.
- As suggested by Michal Hocko remove pages_per_block.
- Modified external providers of online_page_callback.

v10: https://lore.kernel.org/patchwork/patch/1032266/
v9: https://lore.kernel.org/patchwork/patch/1030806/
v8: https://lore.kernel.org/patchwork/patch/1030332/
v7: https://lore.kernel.org/patchwork/patch/1028908/
v6: https://lore.kernel.org/patchwork/patch/1007253/
v5: https://lore.kernel.org/patchwork/patch/995739/
v4: https://lore.kernel.org/patchwork/patch/995111/
v3: https://lore.kernel.org/patchwork/patch/992348/
v2: https://lore.kernel.org/patchwork/patch/991363/
v1: https://lore.kernel.org/patchwork/patch/989445/
RFC: https://lore.kernel.org/patchwork/patch/984754/
---

 drivers/hv/hv_balloon.c|  7 ---
 drivers/xen/balloon.c  | 15 ++-
 include/linux/memory_hotplug.h |  2 +-
 mm/internal.h  |  1 +
 mm/memory_hotplug.c| 37 +
 mm/page_alloc.c|  8 
 6 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef..c2cb6df 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
 }
 
-static void hv_online_page(struct page *pg)
+static void hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -780,10 +780,11 @@ static void hv_online_page(struct page *pg)
	spin_lock_irqsave(&dm_device.ha_lock, flags);
	list_for_each_entry(has, &dm_device.ha_region_list, list) {
/* The page belongs to a different HAS. */
-   if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
+   if ((pfn < has->start_pfn) ||
+   (pfn + (1UL << order) > has->end_pfn))
continue;
 
-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, 1UL << order);
break;
}
	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048..d107447 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -369,14 +369,19 @@ static enum bp_state reserve_additional_memory(void)
return BP_ECANCELED;
 }
 
-static void xen_online_page(struct page *page)
+static void xen_online_page(struct page *page, unsigned int order)
 {
-   __online_page_set_limits(page);
+   unsigned long i, size = (1 << order);
+   unsigned long start_pfn = page_to_pfn(page);
+   struct page *p;
 
+   pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn);
	mutex_lock(&balloon_mutex);
-
-   __balloon_append(page);
-
+   for (i = 0; i < size; i++) {
+   p = pfn_to_page(start_pfn + i);
+   __online_page_set_limits(p);
+   __balloon_append(p);
+   }
	mutex_unlock(&balloon_mutex);
 }
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 07da5c6..e368730 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -87,7 +87,7 @@ extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
		unsigned long *valid_start, unsigned long *valid_end);

[PATCH v10] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-15 Thread Arun KS
When pages are freed at a higher order, the time spent on coalescing
pages by the buddy allocator can be reduced.  With a section size of
256MB, hot add latency of a single section improves from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.
Modify external providers of the online callback to align with the
change.

Signed-off-by: Arun KS 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 
Reviewed-by: Alexander Duyck 
---
Changes since v9:
- Fix the condition check in the hv_balloon driver.

Changes since v8:
- Remove return type change for online_page_callback.
- Use consistent names for external online_page providers.
- Fix onlined_pages accounting.

Changes since v7:
- Rebased to 5.0-rc1.
- Fixed onlined_pages accounting.
- Added comment for return value of online_page_callback.
- Renamed xen_bring_pgs_online to xen_online_pages.

Changes since v6:
- Rebased to 4.20
- Changelog updated.
- No improvement seen on arm64, hence dropped the removal of prefetch.

Changes since v5:
- Rebased to 4.20-rc1.
- Changelog updated.

Changes since v4:
- As suggested by Michal Hocko,
- Simplify logic in online_pages_block() by using get_order().
- Separate out removal of prefetch from __free_pages_core().

Changes since v3:
- Renamed _free_pages_boot_core -> __free_pages_core.
- Removed prefetch from __free_pages_core.
- Removed xen_online_page().

Changes since v2:
- Reuse code from __free_pages_boot_core().

Changes since v1:
- Removed prefetch().

Changes since RFC:
- Rebase.
- As suggested by Michal Hocko remove pages_per_block.
- Modified external providers of online_page_callback.

v9: https://lore.kernel.org/patchwork/patch/1030806/
v8: https://lore.kernel.org/patchwork/patch/1030332/
v7: https://lore.kernel.org/patchwork/patch/1028908/
v6: https://lore.kernel.org/patchwork/patch/1007253/
v5: https://lore.kernel.org/patchwork/patch/995739/
v4: https://lore.kernel.org/patchwork/patch/995111/
v3: https://lore.kernel.org/patchwork/patch/992348/
v2: https://lore.kernel.org/patchwork/patch/991363/
v1: https://lore.kernel.org/patchwork/patch/989445/
RFC: https://lore.kernel.org/patchwork/patch/984754/
---
 drivers/hv/hv_balloon.c|  4 ++--
 drivers/xen/balloon.c  | 15 ++-
 include/linux/memory_hotplug.h |  2 +-
 mm/internal.h  |  1 +
 mm/memory_hotplug.c| 37 +
 mm/page_alloc.c|  8 
 6 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef..2ced9a7 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
 }
 
-static void hv_online_page(struct page *pg)
+static void hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -780,10 +780,11 @@ static void hv_online_page(struct page *pg)
	spin_lock_irqsave(&dm_device.ha_lock, flags);
	list_for_each_entry(has, &dm_device.ha_region_list, list) {
/* The page belongs to a different HAS. */
-   if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
+   if ((pfn < has->start_pfn) ||
+   (pfn + (1UL << order) >= has->end_pfn))
continue;
 
-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, (1UL << order));
break;
}
	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048..d107447 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -369,14 +369,19 @@ static enum bp_state reserve_additional_memory(void)
return BP_ECANCELED;
 }
 
-static void xen_online_page(struct page *page)
+static void xen_online_page(struct page *page, unsigned int order)
 {
-   __online_page_set_limits(page);
+   unsigned long i, size = (1 << order);
+   unsigned long start_pfn = page_to_pfn(page);
+   struct page *p;
 
+   pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn);
	mutex_lock(&balloon_mutex);
-
-   __balloon_append(page);
-
+   for (i = 0; i < size; i++) {
+   p = pfn_to_page(start_pfn + i);
+   __online_page_set_limits(p);
+   __balloon_append(p);
+   }
	mutex_unlock(&balloon_mutex);
 }
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 07da5c6..e368730 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -87,7 +87,7 @@ extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
unsigned long *valid_start, unsigned long *valid_end);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
 
-typedef void (*online_page_callback_t)(struct page *page);

Re: [PATCH v9] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-14 Thread Arun KS

On 2019-01-14 21:45, Alexander Duyck wrote:

On Mon, 2019-01-14 at 19:29 +0530, Arun KS wrote:

On 2019-01-10 21:53, Alexander Duyck wrote:
> On Thu, 2019-01-10 at 11:05 +0530, Arun KS wrote:
> > When pages are freed at a higher order, the time spent on coalescing
> > pages by the buddy allocator can be reduced.  With a section size of
> > 256MB, hot add latency of a single section improves from 50-60 ms to
> > less than 1 ms, hence improving the hot add latency by 60 times.
> > Modify external providers of the online callback to align with the
> > change.
> >
> > Signed-off-by: Arun KS 
> > Acked-by: Michal Hocko 
> > Reviewed-by: Oscar Salvador 
>
> So I decided to give this one last thorough review and I think I might
> have found a few more minor issues, but not anything that is
> necessarily a showstopper.
>
> Reviewed-by: Alexander Duyck 
>
> > ---
> > Changes since v8:
> > - Remove return type change for online_page_callback.
> > - Use consistent names for external online_page providers.
> > - Fix onlined_pages accounting.
> >
> > Changes since v7:
> > - Rebased to 5.0-rc1.
> > - Fixed onlined_pages accounting.
> > - Added comment for return value of online_page_callback.
> > - Renamed xen_bring_pgs_online to xen_online_pages.
> >
> > Changes since v6:
> > - Rebased to 4.20
> > - Changelog updated.
> > - No improvement seen on arm64, hence dropped the removal of prefetch.
> >
> > Changes since v5:
> > - Rebased to 4.20-rc1.
> > - Changelog updated.
> >
> > Changes since v4:
> > - As suggested by Michal Hocko,
> > - Simplify logic in online_pages_block() by using get_order().
> > - Separate out removal of prefetch from __free_pages_core().
> >
> > Changes since v3:
> > - Renamed _free_pages_boot_core -> __free_pages_core.
> > - Removed prefetch from __free_pages_core.
> > - Removed xen_online_page().
> >
> > Changes since v2:
> > - Reuse code from __free_pages_boot_core().
> >
> > Changes since v1:
> > - Removed prefetch().
> >
> > Changes since RFC:
> > - Rebase.
> > - As suggested by Michal Hocko remove pages_per_block.
> > - Modified external providers of online_page_callback.
> >
> > v8: https://lore.kernel.org/patchwork/patch/1030332/
> > v7: https://lore.kernel.org/patchwork/patch/1028908/
> > v6: https://lore.kernel.org/patchwork/patch/1007253/
> > v5: https://lore.kernel.org/patchwork/patch/995739/
> > v4: https://lore.kernel.org/patchwork/patch/995111/
> > v3: https://lore.kernel.org/patchwork/patch/992348/
> > v2: https://lore.kernel.org/patchwork/patch/991363/
> > v1: https://lore.kernel.org/patchwork/patch/989445/
> > RFC: https://lore.kernel.org/patchwork/patch/984754/
> > ---
> > ---
> >  drivers/hv/hv_balloon.c|  4 ++--
> >  drivers/xen/balloon.c  | 15 ++-
> >  include/linux/memory_hotplug.h |  2 +-
> >  mm/internal.h  |  1 +
> >  mm/memory_hotplug.c| 37 +
> >  mm/page_alloc.c|  8 
> >  6 files changed, 43 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> > index 5301fef..55d79f8 100644
> > --- a/drivers/hv/hv_balloon.c
> > +++ b/drivers/hv/hv_balloon.c
> > @@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
> >   }
> >  }
> >
> > -static void hv_online_page(struct page *pg)
> > +static void hv_online_page(struct page *pg, unsigned int order)
> >  {
> >   struct hv_hotadd_state *has;
> >   unsigned long flags;
> > @@ -783,7 +783,7 @@ static void hv_online_page(struct page *pg)
> >   if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
> >   continue;
> >
>
> I haven't followed earlier reviews, but do we know for certain the
> entire range being onlined will fit within a single hv_hotadd_state? If
> nothing else it seems like this check should be updated so that we are
> checking to verify that pfn + (1UL << order) is less than or equal to
> has->end_pfn.

Good catch. I'll change the check to,
  if ((pfn < has->start_pfn) ||
      (pfn + (1UL << order) >= has->end_pfn))
          continue;
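
(Note the boundary: end_pfn is exclusive, so a block that ends exactly
at end_pfn still belongs to the HAS, e.g. end_pfn = 0x1000, order = 0,
pfn = 0xfff gives pfn + 1 == end_pfn. The ">=" above would wrongly
skip that block; the v11 posting therefore compares with ">".)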

>
> > - hv_page_online_one(has, pg);
> > + hv_bring_pgs_online(has, pfn, (1UL << order));
> >   break;
> >   }

Re: [PATCH v9] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-14 Thread Arun KS

On 2019-01-10 21:53, Alexander Duyck wrote:

On Thu, 2019-01-10 at 11:05 +0530, Arun KS wrote:
When pages are freed at a higher order, the time spent on coalescing
pages by the buddy allocator can be reduced.  With a section size of
256MB, hot add latency of a single section improves from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.
Modify external providers of the online callback to align with the
change.

Signed-off-by: Arun KS 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 


So I decided to give this one last thorough review and I think I might
have found a few more minor issues, but not anything that is
necessarily a showstopper.

Reviewed-by: Alexander Duyck 


---
Changes since v8:
- Remove return type change for online_page_callback.
- Use consistent names for external online_page providers.
- Fix onlined_pages accounting.

Changes since v7:
- Rebased to 5.0-rc1.
- Fixed onlined_pages accounting.
- Added comment for return value of online_page_callback.
- Renamed xen_bring_pgs_online to xen_online_pages.

Changes since v6:
- Rebased to 4.20
- Changelog updated.
- No improvement seen on arm64, hence dropped the removal of prefetch.

Changes since v5:
- Rebased to 4.20-rc1.
- Changelog updated.

Changes since v4:
- As suggested by Michal Hocko,
- Simplify logic in online_pages_block() by using get_order().
- Separate out removal of prefetch from __free_pages_core().

Changes since v3:
- Renamed _free_pages_boot_core -> __free_pages_core.
- Removed prefetch from __free_pages_core.
- Removed xen_online_page().

Changes since v2:
- Reuse code from __free_pages_boot_core().

Changes since v1:
- Removed prefetch().

Changes since RFC:
- Rebase.
- As suggested by Michal Hocko remove pages_per_block.
- Modified external providers of online_page_callback.

v8: https://lore.kernel.org/patchwork/patch/1030332/
v7: https://lore.kernel.org/patchwork/patch/1028908/
v6: https://lore.kernel.org/patchwork/patch/1007253/
v5: https://lore.kernel.org/patchwork/patch/995739/
v4: https://lore.kernel.org/patchwork/patch/995111/
v3: https://lore.kernel.org/patchwork/patch/992348/
v2: https://lore.kernel.org/patchwork/patch/991363/
v1: https://lore.kernel.org/patchwork/patch/989445/
RFC: https://lore.kernel.org/patchwork/patch/984754/
---
---
 drivers/hv/hv_balloon.c|  4 ++--
 drivers/xen/balloon.c  | 15 ++-
 include/linux/memory_hotplug.h |  2 +-
 mm/internal.h  |  1 +
 mm/memory_hotplug.c| 37 +
 mm/page_alloc.c|  8 
 6 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef..55d79f8 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,

}
 }

-static void hv_online_page(struct page *pg)
+static void hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -783,7 +783,7 @@ static void hv_online_page(struct page *pg)
if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
continue;



I haven't followed earlier reviews, but do we know for certain the
entire range being onlined will fit within a single hv_hotadd_state? If
nothing else it seems like this check should be updated so that we are
checking to verify that pfn + (1UL << order) is less than or equal to
has->end_pfn.


Good catch. I'll change the check to,
 if ((pfn < has->start_pfn) ||
     (pfn + (1UL << order) >= has->end_pfn))
         continue;




-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, (1UL << order));
break;
}
	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048..d107447 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -369,14 +369,19 @@ static enum bp_state reserve_additional_memory(void)

return BP_ECANCELED;
 }

-static void xen_online_page(struct page *page)
+static void xen_online_page(struct page *page, unsigned int order)
 {
-   __online_page_set_limits(page);
+   unsigned long i, size = (1 << order);
+   unsigned long start_pfn = page_to_pfn(page);
+   struct page *p;

+	pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn);

	mutex_lock(&balloon_mutex);
-
-   __balloon_append(page);
-
+   for (i = 0; i < size; i++) {
+   p = pfn_to_page(start_pfn + i);
+   __online_page_set_limits(p);
+   __balloon_append(p);
+   }
	mutex_unlock(&balloon_mutex);
 }

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h

index 07da5c6..e368730 100644
--- a/include/linux/memory_hotplug.h

[PATCH v9] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-09 Thread Arun KS
When pages are freed at a higher order, the time spent on coalescing
pages by the buddy allocator can be reduced.  With a section size of
256MB, hot add latency of a single section improves from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.
Modify external providers of the online callback to align with the
change.

Signed-off-by: Arun KS 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 
---
Changes since v8:
- Remove return type change for online_page_callback.
- Use consistent names for external online_page providers.
- Fix onlined_pages accounting.

Changes since v7:
- Rebased to 5.0-rc1.
- Fixed onlined_pages accounting.
- Added comment for return value of online_page_callback.
- Renamed xen_bring_pgs_online to xen_online_pages.

Changes since v6:
- Rebased to 4.20
- Changelog updated.
- No improvement seen on arm64, hence dropped the removal of prefetch.

Changes since v5:
- Rebased to 4.20-rc1.
- Changelog updated.

Changes since v4:
- As suggested by Michal Hocko,
- Simplify logic in online_pages_block() by using get_order().
- Separate out removal of prefetch from __free_pages_core().

Changes since v3:
- Renamed _free_pages_boot_core -> __free_pages_core.
- Removed prefetch from __free_pages_core.
- Removed xen_online_page().

Changes since v2:
- Reuse code from __free_pages_boot_core().

Changes since v1:
- Removed prefetch().

Changes since RFC:
- Rebase.
- As suggested by Michal Hocko remove pages_per_block.
- Modified external providers of online_page_callback.

v8: https://lore.kernel.org/patchwork/patch/1030332/
v7: https://lore.kernel.org/patchwork/patch/1028908/
v6: https://lore.kernel.org/patchwork/patch/1007253/
v5: https://lore.kernel.org/patchwork/patch/995739/
v4: https://lore.kernel.org/patchwork/patch/995111/
v3: https://lore.kernel.org/patchwork/patch/992348/
v2: https://lore.kernel.org/patchwork/patch/991363/
v1: https://lore.kernel.org/patchwork/patch/989445/
RFC: https://lore.kernel.org/patchwork/patch/984754/
---
---
 drivers/hv/hv_balloon.c|  4 ++--
 drivers/xen/balloon.c  | 15 ++-
 include/linux/memory_hotplug.h |  2 +-
 mm/internal.h  |  1 +
 mm/memory_hotplug.c| 37 +
 mm/page_alloc.c|  8 
 6 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef..55d79f8 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
 }
 
-static void hv_online_page(struct page *pg)
+static void hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -783,7 +783,7 @@ static void hv_online_page(struct page *pg)
if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
continue;
 
-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, (1UL << order));
break;
}
	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048..d107447 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -369,14 +369,19 @@ static enum bp_state reserve_additional_memory(void)
return BP_ECANCELED;
 }
 
-static void xen_online_page(struct page *page)
+static void xen_online_page(struct page *page, unsigned int order)
 {
-   __online_page_set_limits(page);
+   unsigned long i, size = (1 << order);
+   unsigned long start_pfn = page_to_pfn(page);
+   struct page *p;
 
+   pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn);
	mutex_lock(&balloon_mutex);
-
-   __balloon_append(page);
-
+   for (i = 0; i < size; i++) {
+   p = pfn_to_page(start_pfn + i);
+   __online_page_set_limits(p);
+   __balloon_append(p);
+   }
	mutex_unlock(&balloon_mutex);
 }
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 07da5c6..e368730 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -87,7 +87,7 @@ extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
unsigned long *valid_start, unsigned long *valid_end);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
 
-typedef void (*online_page_callback_t)(struct page *page);
+typedef void (*online_page_callback_t)(struct page *page, unsigned int order);
 
 extern int set_online_page_callback(online_page_callback_t callback);
 extern int restore_online_page_callback(online_page_callback_t callback);
diff --git a/mm/internal.h b/mm/internal.h
index f4a7bb0..536bc2a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -163,6 +163,7 @@ static inline struct page *pageblock_pfn_to_page(unsigned long st

Re: [PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-09 Thread Arun KS

On 2019-01-10 00:26, Andrew Morton wrote:
On Wed, 09 Jan 2019 16:36:36 +0530 Arun KS  
wrote:



On 2019-01-09 16:27, Michal Hocko wrote:
> On Wed 09-01-19 16:12:48, Arun KS wrote:
> [...]
>> It will be called once per online of a section and the arg value is
>> always
>> set to 0 while entering online_pages_range.
>
> You are right that this will be the case in the simplest scenario.
> But the point is that the callback can be called several times from
> walk_system_ram_range and then your current code wouldn't work
> properly.

Thanks. Will use +=


The v8 patch
https://lore.kernel.org/lkml/1547032395-24582-1-git-send-email-aru...@codeaurora.org/T/#u

(which you apparently sent 7 minutes after typing the above) still has

 static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
		void *arg)
 {
-   unsigned long i;
unsigned long onlined_pages = *(unsigned long *)arg;
-   struct page *page;

if (PageReserved(pfn_to_page(start_pfn)))
-   for (i = 0; i < nr_pages; i++) {
-   page = pfn_to_page(start_pfn + i);
-   (*online_page_callback)(page);
-   onlined_pages++;
-   }
+   onlined_pages = online_pages_blocks(start_pfn, nr_pages);


Even then the code makes no sense.

static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
		void *arg)
{
unsigned long onlined_pages = *(unsigned long *)arg;

if (PageReserved(pfn_to_page(start_pfn)))
onlined_pages += online_pages_blocks(start_pfn, nr_pages);

online_mem_sections(start_pfn, start_pfn + nr_pages);

*(unsigned long *)arg += onlined_pages;
return 0;
}

Either the final assignment should be

*(unsigned long *)arg = onlined_pages;

or the initialization should be

unsigned long onlined_pages = 0;



This is becoming a tad tiresome and I'd prefer not to have to check up
on such things.  Can we please get this right?


Sorry about that. Will fix it.

Regards,
Arun
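
For reference, a minimal userspace sketch (illustrative names, not
kernel code) of why the callback must accumulate into *arg rather than
overwrite it when the walker can invoke it once per RAM sub-range:

#include <stdio.h>

static int online_range(unsigned long start_pfn, unsigned long nr_pages,
			void *arg)
{
	unsigned long onlined_pages = nr_pages;	/* pretend all onlined */

	*(unsigned long *)arg += onlined_pages;	/* "+=" survives repeat calls */
	return 0;
}

int main(void)
{
	unsigned long total = 0;

	/* two disjoint sub-ranges of one request, as the walker may yield */
	online_range(0x1000, 256, &total);
	online_range(0x2000, 256, &total);
	printf("%lu\n", total);	/* 512 with "+=", but 256 with "=" */
	return 0;
}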


Re: [PATCH v8] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-09 Thread Arun KS

On 2019-01-09 21:47, Alexander Duyck wrote:

On Wed, 2019-01-09 at 16:43 +0530, Arun KS wrote:
When pages are freed at a higher order, the time spent on coalescing
pages by the buddy allocator can be reduced.  With a section size of
256MB, hot add latency of a single section improves from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.
Modify external providers of the online callback to align with the
change.

Signed-off-by: Arun KS 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 
---
Changes since v7:
- Rebased to 5.0-rc1.
- Fixed onlined_pages accounting.
- Added comment for return value of online_page_callback.
- Renamed xen_bring_pgs_online to xen_online_pages.


As far as the renaming you should try to be consistent. If you aren't
going to rename generic_online_page or hv_online_page I wouldn't bother
with renaming xen_online_page. I would stick with the name
xen_online_page since it is a single high order page that you are
freeing.


Sure. I'll fix them.





Changes since v6:
- Rebased to 4.20
- Changelog updated.
- No improvement seen on arm64, hence dropped the removal of prefetch.

Changes since v5:
- Rebased to 4.20-rc1.
- Changelog updated.

Changes since v4:
- As suggested by Michal Hocko,
- Simplify logic in online_pages_block() by using get_order().
- Separate out removal of prefetch from __free_pages_core().

Changes since v3:
- Renamed _free_pages_boot_core -> __free_pages_core.
- Removed prefetch from __free_pages_core.
- Removed xen_online_page().

Changes since v2:
- Reuse code from __free_pages_boot_core().

Changes since v1:
- Removed prefetch().

Changes since RFC:
- Rebase.
- As suggested by Michal Hocko remove pages_per_block.
- Modified external providers of online_page_callback.

v7: https://lore.kernel.org/patchwork/patch/1028908/
v6: https://lore.kernel.org/patchwork/patch/1007253/
v5: https://lore.kernel.org/patchwork/patch/995739/
v4: https://lore.kernel.org/patchwork/patch/995111/
v3: https://lore.kernel.org/patchwork/patch/992348/
v2: https://lore.kernel.org/patchwork/patch/991363/
v1: https://lore.kernel.org/patchwork/patch/989445/
RFC: https://lore.kernel.org/patchwork/patch/984754/
---
 drivers/hv/hv_balloon.c|  6 +++--
 drivers/xen/balloon.c  | 21 +++--
 include/linux/memory_hotplug.h |  2 +-
 mm/internal.h  |  1 +
 mm/memory_hotplug.c| 51 
+++---

 mm/page_alloc.c|  8 +++
 6 files changed, 62 insertions(+), 27 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef..211f3fe 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,

}
 }

-static void hv_online_page(struct page *pg)
+static int hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -783,10 +783,12 @@ static void hv_online_page(struct page *pg)
if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
continue;

-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, (1UL << order));
break;
}
	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+
+   return 0;
 }



I would hold off on adding return values until you actually have code
that uses them. It will make things easier if somebody has to backport
this to a stable branch and avoid adding complexity until it is needed.

Also the patch description doesn't really explain that it is doing this
so it might be better to break it off into a separate patch so you can
call out exactly why you are adding a return value in the patch
description.

- Alex


Re: [PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-09 Thread Arun KS

On 2019-01-09 21:39, Alexander Duyck wrote:

On Wed, 2019-01-09 at 11:51 +0530, Arun KS wrote:

On 2019-01-09 03:47, Alexander Duyck wrote:
> On Fri, 2019-01-04 at 10:31 +0530, Arun KS wrote:
> > When pages are freed at a higher order, the time spent on coalescing
> > pages by the buddy allocator can be reduced.  With a section size of
> > 256MB, hot add latency of a single section improves from 50-60 ms to
> > less than 1 ms, hence improving the hot add latency by 60 times.
> > Modify external providers of the online callback to align with the
> > change.
> >
> > Signed-off-by: Arun KS 
> > Acked-by: Michal Hocko 
> > Reviewed-by: Oscar Salvador 
>
> Sorry, ended up encountering a couple more things that have me a bit
> confused.
>
> [...]
>
> > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> > index 5301fef..211f3fe 100644
> > --- a/drivers/hv/hv_balloon.c
> > +++ b/drivers/hv/hv_balloon.c
> > @@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start,
> > unsigned long size,
> >   }
> >  }
> >
> > -static void hv_online_page(struct page *pg)
> > +static int hv_online_page(struct page *pg, unsigned int order)
> >  {
> >   struct hv_hotadd_state *has;
> >   unsigned long flags;
> > @@ -783,10 +783,12 @@ static void hv_online_page(struct page *pg)
> >   if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
> >   continue;
> >
> > - hv_page_online_one(has, pg);
> > + hv_bring_pgs_online(has, pfn, (1UL << order));
> >   break;
> >   }
> >   spin_unlock_irqrestore(_device.ha_lock, flags);
> > +
> > + return 0;
> >  }
> >
> >  static int pfn_covered(unsigned long start_pfn, unsigned long
> > pfn_cnt)
>
> So the question I have is why was a return value added to these
> functions? They were previously void types and now they are int. What
> is the return value expected other than 0?

Earlier, with a void return, there was no way for arch code to deny
onlining of a particular page. By using an int as the return type, we
can implement this. On one of the boards I was using, there are some
pages which should not be onlined because they are used for other
purposes (like a secure trust zone or hypervisor).


So where is the code using that? I don't see any functions in the
kernel that are returning anything other than 0. Maybe you should hold
off on changing the return type and make that a separate patch to be
enabled when you add the new functions that can return non-zero values.

That way if someone wants to backport this they are just getting the
bits needed to enable the improved hot-plug times without adding the
extra overhead for changing the return type.


The implementation was in our downstream code. I thought this might be
useful for someone else in similar situations.
Considering the above-mentioned reasons, I'll remove the return type
change.


Regards,
Arun


[PATCH v8] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-09 Thread Arun KS
When pages are freed at a higher order, the time spent on coalescing
pages by the buddy allocator can be reduced.  With a section size of
256MB, hot add latency of a single section improves from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.
Modify external providers of the online callback to align with the
change.

Signed-off-by: Arun KS 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 
---
Changes since v7:
- Rebased to 5.0-rc1.
- Fixed onlined_pages accounting.
- Added comment for return value of online_page_callback.
- Renamed xen_bring_pgs_online to xen_online_pages.

Changes since v6:
- Rebased to 4.20
- Changelog updated.
- No improvement seen on arm64, hence dropped the removal of prefetch.

Changes since v5:
- Rebased to 4.20-rc1.
- Changelog updated.

Changes since v4:
- As suggested by Michal Hocko,
- Simplify logic in online_pages_block() by using get_order().
- Separate out removal of prefetch from __free_pages_core().

Changes since v3:
- Renamed _free_pages_boot_core -> __free_pages_core.
- Removed prefetch from __free_pages_core.
- Removed xen_online_page().

Changes since v2:
- Reuse code from __free_pages_boot_core().

Changes since v1:
- Removed prefetch().

Changes since RFC:
- Rebase.
- As suggested by Michal Hocko remove pages_per_block.
- Modified external providers of online_page_callback.

v7: https://lore.kernel.org/patchwork/patch/1028908/
v6: https://lore.kernel.org/patchwork/patch/1007253/
v5: https://lore.kernel.org/patchwork/patch/995739/
v4: https://lore.kernel.org/patchwork/patch/995111/
v3: https://lore.kernel.org/patchwork/patch/992348/
v2: https://lore.kernel.org/patchwork/patch/991363/
v1: https://lore.kernel.org/patchwork/patch/989445/
RFC: https://lore.kernel.org/patchwork/patch/984754/
---
 drivers/hv/hv_balloon.c|  6 +++--
 drivers/xen/balloon.c  | 21 +++--
 include/linux/memory_hotplug.h |  2 +-
 mm/internal.h  |  1 +
 mm/memory_hotplug.c| 51 +++---
 mm/page_alloc.c|  8 +++
 6 files changed, 62 insertions(+), 27 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef..211f3fe 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
 }
 
-static void hv_online_page(struct page *pg)
+static int hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -783,10 +783,12 @@ static void hv_online_page(struct page *pg)
if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
continue;
 
-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, (1UL << order));
break;
}
	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+
+   return 0;
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048..116a042 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -345,7 +345,7 @@ static enum bp_state reserve_additional_memory(void)
 
/*
 * add_memory_resource() will call online_pages() which in its turn
-* will call xen_online_page() callback causing deadlock if we don't
+* will call xen_online_pages() callback causing deadlock if we don't
 * release balloon_mutex here. Unlocking here is safe because the
 * callers drop the mutex before trying again.
 */
@@ -369,15 +369,22 @@ static enum bp_state reserve_additional_memory(void)
return BP_ECANCELED;
 }
 
-static void xen_online_page(struct page *page)
+static int xen_online_pages(struct page *pg, unsigned int order)
 {
-   __online_page_set_limits(page);
+   unsigned long i, size = (1 << order);
+   unsigned long start_pfn = page_to_pfn(pg);
+   struct page *p;
 
+   pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn);
	mutex_lock(&balloon_mutex);
-
-   __balloon_append(page);
-
+   for (i = 0; i < size; i++) {
+   p = pfn_to_page(start_pfn + i);
+   __online_page_set_limits(p);
+   __balloon_append(p);
+   }
	mutex_unlock(&balloon_mutex);
+
+   return 0;
 }
 
 static int xen_memory_notifier(struct notifier_block *nb, unsigned long val, void *v)
@@ -702,7 +709,7 @@ static int __init balloon_init(void)
balloon_stats.max_retry_count = RETRY_UNLIMITED;
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
-   set_online_page_callback(&xen_online_page);
+   set_online_page_callback(&xen_online_pages);
	register_memory_notifier(&xen_memory_nb);
register_sysctl_table(xen_root);
 #endif
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 07da5c6..d56bfba 100644

Re: [PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-09 Thread Arun KS

On 2019-01-09 16:27, Michal Hocko wrote:

On Wed 09-01-19 16:12:48, Arun KS wrote:
[...]
It will be called once per online of a section and the arg value is
always set to 0 while entering online_pages_range.


You are right that this will be the case in the simplest scenario.
But the point is that the callback can be called several times from
walk_system_ram_range and then your current code wouldn't work
properly.


Thanks. Will use +=

Regards,
Arun


Re: [PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-09 Thread Arun KS

On 2019-01-09 14:10, Michal Hocko wrote:

On Wed 09-01-19 13:58:50, Arun KS wrote:

On 2019-01-09 13:07, Michal Hocko wrote:
> On Wed 09-01-19 11:28:52, Arun KS wrote:
> > On 2019-01-08 23:43, Michal Hocko wrote:
> > > On Tue 08-01-19 09:56:09, Alexander Duyck wrote:
> > > > On Fri, 2019-01-04 at 10:31 +0530, Arun KS wrote:
> > > [...]
> > > > >  static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > > >	void *arg)
> > > > >  {
> > > > > -  unsigned long i;
> > > > >unsigned long onlined_pages = *(unsigned long *)arg;
> > > > > -  struct page *page;
> > > > >
> > > > >if (PageReserved(pfn_to_page(start_pfn)))
> > > > > -  for (i = 0; i < nr_pages; i++) {
> > > > > -  page = pfn_to_page(start_pfn + i);
> > > > > -  (*online_page_callback)(page);
> > > > > -  onlined_pages++;
> > > > > -  }
> > > > > +  onlined_pages = online_pages_blocks(start_pfn, nr_pages);
> > > >
> > > > Shouldn't this be a "+=" instead of an "="? It seems like you are
> > > > going
> > > > to lose your count otherwise.
> > >
> > > You are right of course. I should have noticed during the review.
> > > Thanks!
> >
> > I think we don't need to. The caller function is setting
> > onlined_pages = 0
> > before calling online_pages_range().
> > And there are no other references to online_pages_range other than from
> > online_pages().
>
> Are you missing that we accumulate onlined_pages via
>*(unsigned long *)arg = onlined_pages;
> in online_pages_range?

In my testing I didn't find any problem. To match the code being replaced
and to avoid any corner cases, it is better to use +=.
Will update the patch.


Have you checked that the number of present pages both in the zone and
the node is correct because I fail to see how that would be possible.


Yes they are showing correct values.

Previous value of cat /proc/zoneinfo,

Node 0, zone   Normal
  pages free 65492
min  300
low  375
high 450
spanned  65536
present  65536
managed  65536

Value after hotadd,

Node 0, zone   Normal
  pages free 129970
min  518
low  649
high 780
spanned  983040
present  131072
managed  131072

I added prints in the online_pages_range() function.
It will be called once per online of a section and the arg value is 
always set to 0 while entering online_pages_range.


/sys/devices/system/memory # echo online > memory16/state
[   52.956558] online_pages_range start_pfn = 10 nr_pages = 65536 arg = 0
[   52.964104] Built 1 zonelists, mobility grouping on.  Total pages: 187367

[   52.964828] Policy zone: Normal

But still I'll change to += to match with the previous code.

Regards,
Arun


Re: [PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-09 Thread Arun KS

On 2019-01-09 13:07, Michal Hocko wrote:

On Wed 09-01-19 11:28:52, Arun KS wrote:

On 2019-01-08 23:43, Michal Hocko wrote:
> On Tue 08-01-19 09:56:09, Alexander Duyck wrote:
> > On Fri, 2019-01-04 at 10:31 +0530, Arun KS wrote:
> [...]
> > >  static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > >  void *arg)
> > >  {
> > > -unsigned long i;
> > >  unsigned long onlined_pages = *(unsigned long *)arg;
> > > -struct page *page;
> > >
> > >  if (PageReserved(pfn_to_page(start_pfn)))
> > > -for (i = 0; i < nr_pages; i++) {
> > > -page = pfn_to_page(start_pfn + i);
> > > -(*online_page_callback)(page);
> > > -onlined_pages++;
> > > -}
> > > +onlined_pages = online_pages_blocks(start_pfn, nr_pages);
> >
> > Shouldn't this be a "+=" instead of an "="? It seems like you are
> > going
> > to lose your count otherwise.
>
> You are right of course. I should have noticed during the review.
> Thanks!

I think we don't need to. The caller function is setting onlined_pages = 0
before calling online_pages_range().
And there are no other references to online_pages_range other than from
online_pages().


Are you missing that we accumulate onlined_pages via
*(unsigned long *)arg = onlined_pages;
in online_pages_range?


In my testing I didn't find any problem. To match the code being
replaced and to avoid any corner cases, it is better to use +=.

Will update the patch.

Regards,
Arun


Re: [PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-08 Thread Arun KS

On 2019-01-09 03:47, Alexander Duyck wrote:

On Fri, 2019-01-04 at 10:31 +0530, Arun KS wrote:
When pages are freed at a higher order, the time spent on coalescing
pages by the buddy allocator can be reduced.  With a section size of
256MB, hot add latency of a single section improves from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.
Modify external providers of the online callback to align with the
change.

Signed-off-by: Arun KS 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 


Sorry, ended up encountering a couple more things that have me a bit
confused.

[...]


diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef..211f3fe 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,

}
 }

-static void hv_online_page(struct page *pg)
+static int hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -783,10 +783,12 @@ static void hv_online_page(struct page *pg)
if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
continue;

-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, (1UL << order));
break;
}
	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+
+   return 0;
 }

 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)


So the question I have is why was a return value added to these
functions? They were previously void types and now they are int. What
is the return value expected other than 0?


Earlier, with a void return, there was no way for arch code to deny
onlining of a particular page. By using an int as the return type, we
can implement this. On one of the boards I was using, there are some
pages which should not be onlined because they are used for other
purposes (like a secure trust zone or hypervisor).
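
A purely hypothetical, self-contained sketch of such a provider
(illustrative only; range_reserved() and the pfn-based signature are
invented stand-ins, and nothing like this was merged):

#include <stdio.h>

#define EPERM 1

/* invented stand-in for a board-specific reserved window */
static unsigned long reserved_start = 0x80000, reserved_end = 0x80400;

static int range_reserved(unsigned long pfn, unsigned long nr)
{
	return pfn < reserved_end && pfn + nr > reserved_start;
}

/* shape of an online-page provider using the int return to deny */
static int board_online_page(unsigned long pfn, unsigned int order)
{
	if (range_reserved(pfn, 1UL << order))
		return -EPERM;	/* deny onlining of this block */
	/* ... otherwise hand the block to the buddy allocator ... */
	return 0;
}

int main(void)
{
	printf("%d\n", board_online_page(0x80000, 10));	/* denied */
	printf("%d\n", board_online_page(0x90000, 10));	/* allowed: 0 */
	return 0;
}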





diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048..95f888f 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -345,8 +345,8 @@ static enum bp_state reserve_additional_memory(void)


/*
 * add_memory_resource() will call online_pages() which in its turn
-* will call xen_online_page() callback causing deadlock if we don't
-* release balloon_mutex here. Unlocking here is safe because the
+	 * will call xen_bring_pgs_online() callback causing deadlock if we
+	 * don't release balloon_mutex here. Unlocking here is safe because the

 * callers drop the mutex before trying again.
 */
	mutex_unlock(&balloon_mutex);
@@ -369,15 +369,22 @@ static enum bp_state reserve_additional_memory(void)

return BP_ECANCELED;
 }

-static void xen_online_page(struct page *page)
+static int xen_bring_pgs_online(struct page *pg, unsigned int order)


Why did we rename this function? I see it was added as a new function
in v3, however in v4 we ended up replacing it completely. So why not
just keep the same name and make it easier for us to identify that the
is the Xen version of the XXX_online_pages callback?


Point taken. Will send a patch.



[...]

+static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
+{
+   unsigned long end = start + nr_pages;
+   int order, ret, onlined_pages = 0;
+
+   while (start < end) {
+   order = min(MAX_ORDER - 1,
+   get_order(PFN_PHYS(end) - PFN_PHYS(start)));
+
+   ret = (*online_page_callback)(pfn_to_page(start), order);
+   if (!ret)
+   onlined_pages += (1UL << order);
+   else if (ret > 0)
+   onlined_pages += ret;
+


So if the ret > 0 it is supposed to represent how many pages were
onlined within a given block? What if the ret was negative? Really I am
not a fan of adding a return value to the online functions unless we
specifically document what the expected return values are supposed to
be. If we don't have any return values other than 0 there isn't much
point in having one anyway.


I'll document this.

Regards,
Arun
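
For reference on the chunking above, a minimal userspace sketch
(illustrative only; the kernel helper clamps get_order() of the
remaining size to MAX_ORDER - 1, assumed here to be 10):

#include <stdio.h>

#define MAX_ORDER 11

int main(void)
{
	unsigned long start = 0x100000;		/* first pfn of a section */
	unsigned long end = start + 65536;	/* 65536 4K pages = 256MB */

	while (start < end) {
		int order = 0;

		/* largest order that still fits in the remainder */
		while (order < MAX_ORDER - 1 &&
		       (2UL << order) <= end - start)
			order++;

		printf("free pfn 0x%lx order %d (%lu pages)\n",
		       start, order, 1UL << order);
		start += 1UL << order;
	}
	return 0;	/* prints 64 blocks of order 10 */
}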



Re: [PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-08 Thread Arun KS

On 2019-01-08 23:43, Michal Hocko wrote:

On Tue 08-01-19 09:56:09, Alexander Duyck wrote:

On Fri, 2019-01-04 at 10:31 +0530, Arun KS wrote:

[...]

>  static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>	void *arg)
>  {
> -  unsigned long i;
>unsigned long onlined_pages = *(unsigned long *)arg;
> -  struct page *page;
>
>if (PageReserved(pfn_to_page(start_pfn)))
> -  for (i = 0; i < nr_pages; i++) {
> -  page = pfn_to_page(start_pfn + i);
> -  (*online_page_callback)(page);
> -  onlined_pages++;
> -  }
> +  onlined_pages = online_pages_blocks(start_pfn, nr_pages);

Shouldn't this be a "+=" instead of an "="? It seems like you are going
to lose your count otherwise.


You are right of course. I should have noticed during the review.
Thanks!


I think we don't need to. The caller function is setting onlined_pages = 0
before calling online_pages_range().
And there are no other references to online_pages_range other than from
online_pages().


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/memory_hotplug.c?h=v5.0-rc1#n845

int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)

{
unsigned long flags;
unsigned long onlined_pages = 0;

Regards,
Arun


Re: [PATCH v6 1/2] memory_hotplug: Free pages as higher order

2019-01-03 Thread Arun KS

On 2018-11-07 11:51, Arun KS wrote:

On 2018-11-07 01:38, Michal Hocko wrote:

On Tue 06-11-18 21:01:29, Arun KS wrote:

On 2018-11-06 19:36, Michal Hocko wrote:
> On Tue 06-11-18 11:33:13, Arun KS wrote:
> > When free pages are done with higher order, time spent on
> > coalescing pages by the buddy allocator can be reduced. With a
> > section size of 256MB, hot add latency of a single section
> > shows improvement from 50-60 ms to less than 1 ms, hence
> > improving the hot add latency by 60 times. Modify external
> > providers of the online callback to align with the change.
> >
> > This patch modifies totalram_pages, zone->managed_pages and
> > totalhigh_pages outside managed_page_count_lock. A follow-up
> > series will be sent to convert these variables to atomic to
> > avoid readers potentially seeing a store tear.
>
> Is there any reason to rush this through rather than wait for counters
> conversion first?

Sure Michal.

The conversion patch, https://patchwork.kernel.org/cover/10657217/, is
currently incremental to this patch.


The ordering should be the other way around. Because as things stand,
with this patch first it is possible to introduce subtle race-prone
updates. As I've said, I am skeptical the race would matter, really,
but there is no real reason to risk that. Especially when you have the
other (first) half ready.


Makes sense. I have rebased the preparatory patch on top of -rc1.
https://patchwork.kernel.org/patch/10670787/


Hello Michal,

Please review version 7 sent,
https://lore.kernel.org/patchwork/patch/1028908/

Regards,
Arun




[PATCH v7] mm/page_alloc.c: memory_hotplug: free pages as higher order

2019-01-03 Thread Arun KS
When pages are freed at a higher order, the time spent on coalescing
pages by the buddy allocator can be reduced.  With a section size of
256MB, hot add latency of a single section improves from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.
Modify external providers of the online callback to align with the
change.

Signed-off-by: Arun KS 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 
---
Changes since v6:
- Rebased to 4.20
- Changelog updated.
- No improvement seen on arm64, hence dropped the removal of prefetch.

Changes since v5:
- Rebased to 4.20-rc1.
- Changelog updated.

Changes since v4:
- As suggested by Michal Hocko,
- Simplify logic in online_pages_block() by using get_order().
- Separate out removal of prefetch from __free_pages_core().

Changes since v3:
- Renamed _free_pages_boot_core -> __free_pages_core.
- Removed prefetch from __free_pages_core.
- Removed xen_online_page().

Changes since v2:
- Reuse code from __free_pages_boot_core().

Changes since v1:
- Removed prefetch().

Changes since RFC:
- Rebase.
- As suggested by Michal Hocko remove pages_per_block.
- Modified external providers of online_page_callback.

v6: https://lore.kernel.org/patchwork/patch/1007253/
v5: https://lore.kernel.org/patchwork/patch/995739/
v4: https://lore.kernel.org/patchwork/patch/995111/
v3: https://lore.kernel.org/patchwork/patch/992348/
v2: https://lore.kernel.org/patchwork/patch/991363/
v1: https://lore.kernel.org/patchwork/patch/989445/
RFC: https://lore.kernel.org/patchwork/patch/984754/

---
 drivers/hv/hv_balloon.c|  6 --
 drivers/xen/balloon.c  | 23 +++
 include/linux/memory_hotplug.h |  2 +-
 mm/internal.h  |  1 +
 mm/memory_hotplug.c| 42 ++
 mm/page_alloc.c|  8 
 6 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef..211f3fe 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
 }
 
-static void hv_online_page(struct page *pg)
+static int hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -783,10 +783,12 @@ static void hv_online_page(struct page *pg)
if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
continue;
 
-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, (1UL << order));
break;
}
spin_unlock_irqrestore(_device.ha_lock, flags);
+
+   return 0;
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048..95f888f 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -345,8 +345,8 @@ static enum bp_state reserve_additional_memory(void)
 
/*
 * add_memory_resource() will call online_pages() which in its turn
-* will call xen_online_page() callback causing deadlock if we don't
-* release balloon_mutex here. Unlocking here is safe because the
+* will call xen_bring_pgs_online() callback causing deadlock if we
+* don't release balloon_mutex here. Unlocking here is safe because the
 * callers drop the mutex before trying again.
 */
mutex_unlock(_mutex);
@@ -369,15 +369,22 @@ static enum bp_state reserve_additional_memory(void)
return BP_ECANCELED;
 }
 
-static void xen_online_page(struct page *page)
+static int xen_bring_pgs_online(struct page *pg, unsigned int order)
 {
-   __online_page_set_limits(page);
+   unsigned long i, size = (1 << order);
+   unsigned long start_pfn = page_to_pfn(pg);
+   struct page *p;
 
+   pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn);
mutex_lock(_mutex);
-
-   __balloon_append(page);
-
+   for (i = 0; i < size; i++) {
+   p = pfn_to_page(start_pfn + i);
+   __online_page_set_limits(p);
+   __balloon_append(p);
+   }
mutex_unlock(_mutex);
+
+   return 0;
 }
 
 static int xen_memory_notifier(struct notifier_block *nb, unsigned long val, 
void *v)
@@ -702,7 +709,7 @@ static int __init balloon_init(void)
balloon_stats.max_retry_count = RETRY_UNLIMITED;
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
-   set_online_page_callback(_online_page);
+   set_online_page_callback(_bring_pgs_online);
register_memory_notifier(_memory_nb);
register_sysctl_table(xen_root);
 #endif
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 07da5c6..d56bfba 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -87,7 +87,7 @@ extern int test
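
For reference, a self-contained userspace sketch of the splitting this
patch gives the online path -- illustrative code with made-up names
(pick_order is not a kernel function), since the mm/memory_hotplug.c
hunk above is truncated: each step frees the largest naturally aligned
power-of-two block, so a 256MB section reaches the buddy allocator as
64 top-order blocks instead of 65536 single pages.

#include <stdio.h>

#define MAX_ORDER 11	/* buddy limit: blocks of up to 2^(MAX_ORDER-1) pages */

/* largest order that keeps the block aligned at pfn and within 'left' pages */
static unsigned int pick_order(unsigned long pfn, unsigned long left)
{
	unsigned int order = 0;

	while (order + 1 < MAX_ORDER &&
	       !(pfn & ((2UL << order) - 1)) &&
	       (2UL << order) <= left)
		order++;
	return order;
}

int main(void)
{
	/* one 256MB section of 4K pages = 65536 pages */
	unsigned long pfn = 0x10000, end = pfn + 65536;

	while (pfn < end) {
		unsigned int order = pick_order(pfn, end - pfn);

		printf("online pfn 0x%lx as order %u (%lu pages)\n",
		       pfn, order, 1UL << order);
		pfn += 1UL << order;	/* 64 order-10 frees, not 65536 order-0 */
	}
	return 0;
}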

[PATCH v5 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-11-12 Thread Arun KS
totalram_pages and totalhigh_pages are made static inline functions.

Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seems better to remove the lock and convert the variables
to atomic; preventing potential store-to-read tearing comes as
a bonus.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
coccinelle script to make most of the changes,

@@
declarer name EXPORT_SYMBOL;
symbol totalram_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalram_pages);
|
- totalram_pages = e
+ totalram_pages_set(e)
|
- totalram_pages += e
+ totalram_pages_add(e)
|
- totalram_pages++
+ totalram_pages_inc()
|
- totalram_pages--
+ totalram_pages_dec()
|
- totalram_pages
+ totalram_pages()
)

@@
symbol totalhigh_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalhigh_pages);
|
- totalhigh_pages = e
+ totalhigh_pages_set(e)
|
- totalhigh_pages += e
+ totalhigh_pages_add(e)
|
- totalhigh_pages++
+ totalhigh_pages_inc()
|
- totalhigh_pages--
+ totalhigh_pages_dec()
|
- totalhigh_pages
+ totalhigh_pages()
)

Manually apply all changes to the following files,

include/linux/highmem.h
include/linux/mm.h
include/linux/swap.h
mm/highmem.c

and for mm/page_alloc.c manually apply only the below changes,

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

 /* Protect totalram_pages and zone->managed_pages */
 static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;
---

 arch/csky/mm/init.c   |  4 ++--
 arch/powerpc/platforms/pseries/cmm.c  | 10 +-
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 drivers/char/agp/backend.c|  4 ++--
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 ++--
 drivers/hv/hv_balloon.c   |  2 +-
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 ++--
 drivers/parisc/sba_iommu.c|  4 ++--
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +++---
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  2 +-
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 +--
 include/linux/mm.h| 27 +-
 include/linux/swap.h  |  1 -
 kernel/fork.c |  2 +-
 kernel/kexec_core.c   |  2 +-
 kernel/power/snapshot.c   |  2 +-
 mm/highmem.c  |  5 ++---
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  4 ++--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 20 ++-
 mm/shmem.c|  8 
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 ++--
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 ++--
 net/dccp/proto.c  |  2 +-
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  2 +-
 net/netfilter/xt_hashlimit.c  |  2 +-
 net/sctp/protocol.c   |  2 +-
 security/integrity/ima/ima_kexec.c|  2 +-
 53 files changed, 130 insertions(+), 81 deletions(-)

diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index dc07c07..66e5970 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/ini
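
For reference, a userspace analogue of the wrapper family the script
rewrites callers to use. Only the helper names visible in the script
above are taken as given; the kernel definitions (added to
include/linux/mm.h, hunk not shown here) may differ in detail.

#include <stdatomic.h>
#include <stdio.h>

static atomic_long _totalram_pages;	/* was: unsigned long totalram_pages */

static unsigned long totalram_pages(void)
{
	return (unsigned long)atomic_load(&_totalram_pages);
}

static void totalram_pages_add(long count)
{
	atomic_fetch_add(&_totalram_pages, count);
}

static void totalram_pages_inc(void)
{
	totalram_pages_add(1);
}

static void totalram_pages_dec(void)
{
	totalram_pages_add(-1);
}

static void totalram_pages_set(long val)
{
	atomic_store(&_totalram_pages, val);
}

int main(void)
{
	totalram_pages_set(1024);
	totalram_pages_add(256);
	totalram_pages_inc();
	totalram_pages_dec();
	printf("totalram_pages() = %lu\n", totalram_pages());	/* 1280 */
	return 0;
}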

[PATCH v5 2/4] mm: convert zone->managed_pages to atomic variable

2018-11-12 Thread Arun KS
totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

This patch converts zone->managed_pages. Subsequent patches will
convert totalram_pages, totalhigh_pages and eventually
managed_page_count_lock will be removed.

Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seems better to remove the lock and convert the variables
to atomic; preventing potential store-to-read tearing comes as
a bonus.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
Most of the changes are done by below coccinelle script,

@@
struct zone *z;
expression e1;
@@
(
- z->managed_pages = e1
+ atomic_long_set(>managed_pages, e1)
|
- e1->managed_pages++
+ atomic_long_inc(>managed_pages)
|
- z->managed_pages
+ zone_managed_pages(z)
)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, >managed_pages)

@@
expression z;
@@
- z.managed_pages
+ zone_managed_pages()

Then, manually apply following change,
include/linux/mmzone.h

- unsigned long managed_pages;
+ atomic_long_t managed_pages;

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(>managed_pages);
+}

---

 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 include/linux/mmzone.h|  9 +--
 lib/show_mem.c|  2 +-
 mm/memblock.c |  2 +-
 mm/page_alloc.c   | 44 +--
 mm/vmstat.c   |  4 ++--
 6 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 56412b0..c0e55bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -848,7 +848,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int 
*avail_size,
 */
pgdat = NODE_DATA(numa_node_id);
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-   mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+   mem_in_bytes += 
zone_managed_pages(>node_zones[zone_type]);
mem_in_bytes <<= PAGE_SHIFT;
 
sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 847705a..e73dc31 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -435,7 +435,7 @@ struct zone {
 * adjust_managed_page_count() should be used instead of directly
 * touching zone->managed_pages and totalram_pages.
 */
-   unsigned long   managed_pages;
+   atomic_long_t   managed_pages;
unsigned long   spanned_pages;
unsigned long   present_pages;
 
@@ -524,6 +524,11 @@ enum pgdat_flags {
PGDAT_RECLAIM_LOCKED,   /* prevents concurrent reclaim */
 };
 
+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(>managed_pages);
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
return zone->zone_start_pfn + zone->spanned_pages;
@@ -814,7 +819,7 @@ static inline bool is_dev_zone(const struct zone *zone)
  */
 static inline bool managed_zone(struct zone *zone)
 {
-   return zone->managed_pages;
+   return zone_managed_pages(zone);
 }
 
 /* Returns true if a zone has memory */
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..eefe67d 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -28,7 +28,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
continue;
 
total += zone->present_pages;
-   reserved += zone->present_pages - zone->managed_pages;
+   reserved += zone->present_pages - 
zone_managed_pages(zone);
 
if (is_highmem_idx(zoneid))
highmem += zone->present_pages;
diff --git a/mm/memblock.c b/mm/memblock.c
index 7df468c..bbd82ab 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1950,7 +1950,7 @@ void reset_node_managed_pages(pg_data_t *pgdat)
struct zone *z;
 
for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
-   z->managed_pages = 0;
+   atomic_long_set(>managed_pages, 0);
 }
 
 void __init reset_all_zones_managed_pages(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 173312b..22e6645 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1279,
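
For reference, a userspace analogue of the new read path, built only
from the hunks shown above: every reader goes through
zone_managed_pages(), so a concurrent writer can never be observed
mid-store (a torn read), which a plain unsigned long does not
guarantee on every architecture.

#include <stdatomic.h>
#include <stdio.h>

struct zone {
	atomic_long managed_pages;	/* was: unsigned long managed_pages */
};

static unsigned long zone_managed_pages(struct zone *zone)
{
	return (unsigned long)atomic_load(&zone->managed_pages);
}

int main(void)
{
	struct zone z;

	atomic_init(&z.managed_pages, 0);
	atomic_fetch_add(&z.managed_pages, 4096);	/* hot add of pages */
	printf("managed: %lu\n", zone_managed_pages(&z));
	return 0;
}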

[PATCH v5 4/4] mm: Remove managed_page_count spinlock

2018-11-12 Thread Arun KS
Now that totalram_pages and managed_pages are atomic variables, there is
no need for the managed_page_count spinlock. The lock really had a weak
consistency guarantee. It hasn't been used for anything but the update,
and no reader actually cares about all the values being updated in sync.

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
 include/linux/mmzone.h | 6 --
 mm/page_alloc.c| 5 -
 2 files changed, 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e73dc31..c71b4d9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,12 +428,6 @@ struct zone {
 * Write access to present_pages at runtime should be protected by
 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
 * present_pages should get_online_mems() to get a stable value.
-*
-* Read access to managed_pages should be safe because it's unsigned
-* long. Write access to zone->managed_pages and totalram_pages are
-* protected by managed_page_count_lock at runtime. Idealy only
-* adjust_managed_page_count() should be used instead of directly
-* touching zone->managed_pages and totalram_pages.
 */
atomic_long_t   managed_pages;
unsigned long   spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8b64cc..26c5e14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,9 +122,6 @@
 };
 EXPORT_SYMBOL(node_states);
 
-/* Protect totalram_pages and zone->managed_pages */
-static DEFINE_SPINLOCK(managed_page_count_lock);
-
 atomic_long_t _totalram_pages __read_mostly;
 EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
@@ -7065,14 +7062,12 @@ static int __init cmdline_parse_movablecore(char *p)
 
 void adjust_managed_page_count(struct page *page, long count)
 {
-   spin_lock(_page_count_lock);
atomic_long_add(count, _zone(page)->managed_pages);
totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages_add(count);
 #endif
-   spin_unlock(_page_count_lock);
 }
 EXPORT_SYMBOL(adjust_managed_page_count);
 
-- 
1.9.1
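
To make the weak-consistency point concrete, a userspace sketch (C11
atomics standing in for the kernel's atomic_long_t) of the now-unlocked
update path: each add is atomic on its own, and since no reader compares
the counters against each other, dropping the lock only removes a
lockstep guarantee nobody relied on.

#include <stdatomic.h>
#include <stdio.h>

static atomic_long managed_pages;	/* stand-ins for the kernel counters */
static atomic_long totalram;

/* unlocked analogue of adjust_managed_page_count(): each add is atomic,
 * but the pair is not updated as one unit -- which no reader needs */
static void adjust_managed_page_count(long count)
{
	atomic_fetch_add(&managed_pages, count);
	atomic_fetch_add(&totalram, count);
}

int main(void)
{
	adjust_managed_page_count(512);
	adjust_managed_page_count(-128);
	printf("managed=%ld totalram=%ld\n",
	       atomic_load(&managed_pages), atomic_load(&totalram));
	return 0;
}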



[PATCH v5 1/4] mm: reference totalram_pages and managed_pages once per function

2018-11-12 Thread Arun KS
This patch is in preparation for a later patch which converts totalram_pages
and zone->managed_pages to atomic variables. Please note that re-reading
the value might yield a different value and as such could lead to
unexpected behavior. There are no known bugs as a result of the current code
but it is better to prevent them in principle.

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
 arch/um/kernel/mem.c |  2 +-
 arch/x86/kernel/cpu/microcode/core.c |  5 +++--
 drivers/hv/hv_balloon.c  | 19 ++-
 fs/file_table.c  |  7 ---
 kernel/fork.c|  5 +++--
 kernel/kexec_core.c  |  5 +++--
 mm/page_alloc.c  |  5 +++--
 mm/shmem.c   |  3 ++-
 net/dccp/proto.c |  7 ---
 net/netfilter/nf_conntrack_core.c|  7 ---
 net/netfilter/xt_hashlimit.c |  5 +++--
 net/sctp/protocol.c  |  7 ---
 12 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 1067469..2da2096 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -52,7 +52,7 @@ void __init mem_init(void)
/* this will put all low memory onto the freelists */
memblock_free_all();
max_low_pfn = totalram_pages;
-   max_pfn = totalram_pages;
+   max_pfn = max_low_pfn;
mem_init_print_info(NULL);
kmalloc_ok = 1;
 }
diff --git a/arch/x86/kernel/cpu/microcode/core.c 
b/arch/x86/kernel/cpu/microcode/core.c
index 2637ff0..168fa27 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const 
char __user *buf,
   size_t len, loff_t *ppos)
 {
ssize_t ret = -EINVAL;
+   unsigned long nr_pages = totalram_pages;
 
-   if ((len >> PAGE_SHIFT) > totalram_pages) {
-   pr_err("too much data (max %ld pages)\n", totalram_pages);
+   if ((len >> PAGE_SHIFT) > nr_pages) {
+   pr_err("too much data (max %ld pages)\n", nr_pages);
return ret;
}
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4163151..f3e7da9 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1090,6 +1090,7 @@ static void process_info(struct hv_dynmem_device *dm, 
struct dm_info_msg *msg)
 static unsigned long compute_balloon_floor(void)
 {
unsigned long min_pages;
+   unsigned long nr_pages = totalram_pages;
 #define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
 *  max MiB -> min MiB  gradient
@@ -1102,16 +1103,16 @@ static unsigned long compute_balloon_floor(void)
 *8192   744(1/16)
 *   32768  1512(1/32)
 */
-   if (totalram_pages < MB2PAGES(128))
-   min_pages = MB2PAGES(8) + (totalram_pages >> 1);
-   else if (totalram_pages < MB2PAGES(512))
-   min_pages = MB2PAGES(40) + (totalram_pages >> 2);
-   else if (totalram_pages < MB2PAGES(2048))
-   min_pages = MB2PAGES(104) + (totalram_pages >> 3);
-   else if (totalram_pages < MB2PAGES(8192))
-   min_pages = MB2PAGES(232) + (totalram_pages >> 4);
+   if (nr_pages < MB2PAGES(128))
+   min_pages = MB2PAGES(8) + (nr_pages >> 1);
+   else if (nr_pages < MB2PAGES(512))
+   min_pages = MB2PAGES(40) + (nr_pages >> 2);
+   else if (nr_pages < MB2PAGES(2048))
+   min_pages = MB2PAGES(104) + (nr_pages >> 3);
+   else if (nr_pages < MB2PAGES(8192))
+   min_pages = MB2PAGES(232) + (nr_pages >> 4);
else
-   min_pages = MB2PAGES(488) + (totalram_pages >> 5);
+   min_pages = MB2PAGES(488) + (nr_pages >> 5);
 #undef MB2PAGES
return min_pages;
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index e49af4c..b6e9587 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -380,10 +380,11 @@ void __init files_init(void)
 void __init files_maxfiles_init(void)
 {
unsigned long n;
-   unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+   unsigned long nr_pages = totalram_pages;
+   unsigned long memreserve = (nr_pages - nr_free_pages()) * 3/2;
 
-   memreserve = min(memreserve, totalram_pages - 1);
-   n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
+   memreserve = min(memreserve, nr_pages - 1);
+   n = ((nr_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
files_stat.max_files = max_t(unsigned long, n, NR_FILE);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff..
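
The pattern applied throughout this patch, distilled into a standalone
sketch with illustrative names: snapshot the global once into a local
and derive everything from it, so a concurrent resize between two reads
cannot feed inconsistent values into the arithmetic (compare the
files_maxfiles_init() hunk above).

#include <stdio.h>

static volatile unsigned long total_pages = 100000;	/* may change at runtime */

static unsigned long compute_reserve(void)
{
	/* one read; every later use sees the same value */
	unsigned long nr_pages = total_pages;
	unsigned long memreserve = nr_pages / 2;

	/* re-reading total_pages here instead of using nr_pages could
	 * underflow if memory was unplugged between the two reads */
	return (nr_pages - memreserve) / 10;
}

int main(void)
{
	printf("reserve: %lu pages\n", compute_reserve());
	return 0;
}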

[PATCH v5 0/4] mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-12 Thread Arun KS
This series converts totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The patch was compile tested on x86 (x86_64_defconfig & i386_defconfig)
on 4.20-rc2. Memory hotplug was tested on arm64, but on an older
kernel version.

totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
It seems better to remove the lock and convert the variables
to atomic. With the change, preventing potential store-to-read
tearing comes as a bonus.

Changes in v5:
- totalram_pgs renamed to nr_pages.
https://lore.kernel.org/patchwork/patch/1011293/#1194248

Changes in v4:
- Fixed kbuild test robot error.
- Modified changelog.
- Rebased to 4.20.-rc2

Changes in v3:
- Fixed kbuild test robot errors.
- Modified changelogs to be more clear.
- EXPORT_SYMBOL for _totalram_pages and _totalhigh_pages.

Arun KS (4):
  mm: reference totalram_pages and managed_pages once per function
  mm: convert zone->managed_pages to atomic variable
  mm: convert totalram_pages and totalhigh_pages variables to atomic
  mm: Remove managed_page_count spinlock

 arch/csky/mm/init.c   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  | 10 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  4 +-
 arch/x86/kernel/cpu/microcode/core.c  |  5 +-
 drivers/char/agp/backend.c|  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
 drivers/hv/hv_balloon.c   | 19 +++
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 +-
 drivers/parisc/sba_iommu.c|  4 +-
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +--
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  7 +--
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 ++-
 include/linux/mm.h| 27 +-
 include/linux/mmzone.h| 15 +++---
 include/linux/swap.h  |  1 -
 kernel/fork.c |  5 +-
 kernel/kexec_core.c   |  5 +-
 kernel/power/snapshot.c   |  2 +-
 lib/show_mem.c|  2 +-
 mm/highmem.c  |  5 +-
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  6 +--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 72 +--
 mm/shmem.c|  7 +--
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 +-
 mm/vmstat.c   |  4 +-
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 +-
 net/dccp/proto.c  |  7 +--
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  7 +--
 net/netfilter/xt_hashlimit.c  |  5 +-
 net/sctp/protocol.c   |  7 +--
 security/integrity/ima/ima_kexec.c|  2 +-
 57 files changed, 196 insertions(+), 142 deletions(-)

-- 
1.9.1



Re: [PATCH v4 1/4] mm: reference totalram_pages and managed_pages once per function

2018-11-11 Thread Arun KS

Hello Matthew,

Thanks for reviewing.

On 2018-11-12 11:43, Matthew Wilcox wrote:
> On Mon, Nov 12, 2018 at 11:37:46AM +0530, Arun KS wrote:
> > +++ b/arch/um/kernel/mem.c
> > @@ -51,8 +51,7 @@ void __init mem_init(void)
> >
> > 	/* this will put all low memory onto the freelists */
> > 	memblock_free_all();
> > -	max_low_pfn = totalram_pages;
> > -	max_pfn = totalram_pages;
> > +	max_pfn = max_low_pfn = totalram_pages;
>
> We don't normally do "a = b = c".  How about:
>
> 	max_low_pfn = totalram_pages;
> -	max_pfn = totalram_pages;
> +	max_pfn = max_low_pfn;

Point taken. Will fix it.

> > +++ b/arch/x86/kernel/cpu/microcode/core.c
> > @@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const char __user *buf,
> > 			       size_t len, loff_t *ppos)
> >  {
> > 	ssize_t ret = -EINVAL;
> > +	unsigned long totalram_pgs = totalram_pages;
>
> Can't we use a better variable name here?  Even nr_pages would look
> better to me.

Looks better.

Regards,
Arun

> > +++ b/drivers/hv/hv_balloon.c
> > +	unsigned long totalram_pgs = totalram_pages;
>
> Ditto
>
> > +++ b/fs/file_table.c
> > +	unsigned long totalram_pgs = totalram_pages;
>
> ... throughout, I guess.






[PATCH v4 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-11-11 Thread Arun KS
totalram_pages and totalhigh_pages are made static inline functions.

Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seems better to remove the lock and convert the variables
to atomic; preventing potential store-to-read tearing comes as
a bonus.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 

---
coccinelle script to make most of the changes,

@@
declarer name EXPORT_SYMBOL;
symbol totalram_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalram_pages);
|
- totalram_pages = e
+ totalram_pages_set(e)
|
- totalram_pages += e
+ totalram_pages_add(e)
|
- totalram_pages++
+ totalram_pages_inc()
|
- totalram_pages--
+ totalram_pages_dec()
|
- totalram_pages
+ totalram_pages()
)

@@
symbol totalhigh_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalhigh_pages);
|
- totalhigh_pages = e
+ totalhigh_pages_set(e)
|
- totalhigh_pages += e
+ totalhigh_pages_add(e)
|
- totalhigh_pages++
+ totalhigh_pages_inc()
|
- totalhigh_pages--
+ totalhigh_pages_dec()
|
- totalhigh_pages
+ totalhigh_pages()
)

Manually apply all changes to the following files,

include/linux/highmem.h
include/linux/mm.h
include/linux/swap.h
mm/highmem.c

and for mm/page_alloc.c manually apply only the below changes,

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

 /* Protect totalram_pages and zone->managed_pages */
 static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;

---
---
 arch/csky/mm/init.c   |  4 ++--
 arch/powerpc/platforms/pseries/cmm.c  | 10 +-
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 drivers/char/agp/backend.c|  4 ++--
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 ++--
 drivers/hv/hv_balloon.c   |  2 +-
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 ++--
 drivers/parisc/sba_iommu.c|  4 ++--
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +++---
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  2 +-
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 +--
 include/linux/mm.h| 27 +-
 include/linux/swap.h  |  1 -
 kernel/fork.c |  2 +-
 kernel/kexec_core.c   |  2 +-
 kernel/power/snapshot.c   |  2 +-
 mm/highmem.c  |  5 ++---
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  4 ++--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 20 ++-
 mm/shmem.c|  8 
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 ++--
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 ++--
 net/dccp/proto.c  |  2 +-
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  2 +-
 net/netfilter/xt_hashlimit.c  |  2 +-
 net/sctp/protocol.c   |  2 +-
 security/integrity/ima/ima_kexec.c|  2 +-
 53 files changed, 130 insertions(+), 81 deletions(-)

diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index dc07c07..66e5970 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky

[PATCH v4 1/4] mm: reference totalram_pages and managed_pages once per function

2018-11-11 Thread Arun KS
This patch is in preparation for a later patch which converts totalram_pages
and zone->managed_pages to atomic variables. Please note that re-reading
the value might yield a different value and as such could lead to
unexpected behavior. There are no known bugs as a result of the current code
but it is better to prevent them in principle.

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
 arch/um/kernel/mem.c |  3 +--
 arch/x86/kernel/cpu/microcode/core.c |  5 +++--
 drivers/hv/hv_balloon.c  | 19 ++-
 fs/file_table.c  |  7 ---
 kernel/fork.c|  5 +++--
 kernel/kexec_core.c  |  5 +++--
 mm/page_alloc.c  |  5 +++--
 mm/shmem.c   |  3 ++-
 net/dccp/proto.c |  7 ---
 net/netfilter/nf_conntrack_core.c|  7 ---
 net/netfilter/xt_hashlimit.c |  5 +++--
 net/sctp/protocol.c  |  7 ---
 12 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 1067469..134d3fd 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -51,8 +51,7 @@ void __init mem_init(void)
 
/* this will put all low memory onto the freelists */
memblock_free_all();
-   max_low_pfn = totalram_pages;
-   max_pfn = totalram_pages;
+   max_pfn = max_low_pfn = totalram_pages;
mem_init_print_info(NULL);
kmalloc_ok = 1;
 }
diff --git a/arch/x86/kernel/cpu/microcode/core.c 
b/arch/x86/kernel/cpu/microcode/core.c
index 2637ff0..99c67ca 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const 
char __user *buf,
   size_t len, loff_t *ppos)
 {
ssize_t ret = -EINVAL;
+   unsigned long totalram_pgs = totalram_pages;
 
-   if ((len >> PAGE_SHIFT) > totalram_pages) {
-   pr_err("too much data (max %ld pages)\n", totalram_pages);
+   if ((len >> PAGE_SHIFT) > totalram_pgs) {
+   pr_err("too much data (max %ld pages)\n", totalram_pgs);
return ret;
}
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4163151..cac4945 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1090,6 +1090,7 @@ static void process_info(struct hv_dynmem_device *dm, 
struct dm_info_msg *msg)
 static unsigned long compute_balloon_floor(void)
 {
unsigned long min_pages;
+   unsigned long totalram_pgs = totalram_pages;
 #define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
 *  max MiB -> min MiB  gradient
@@ -1102,16 +1103,16 @@ static unsigned long compute_balloon_floor(void)
 *8192   744(1/16)
 *   32768  1512(1/32)
 */
-   if (totalram_pages < MB2PAGES(128))
-   min_pages = MB2PAGES(8) + (totalram_pages >> 1);
-   else if (totalram_pages < MB2PAGES(512))
-   min_pages = MB2PAGES(40) + (totalram_pages >> 2);
-   else if (totalram_pages < MB2PAGES(2048))
-   min_pages = MB2PAGES(104) + (totalram_pages >> 3);
-   else if (totalram_pages < MB2PAGES(8192))
-   min_pages = MB2PAGES(232) + (totalram_pages >> 4);
+   if (totalram_pgs < MB2PAGES(128))
+   min_pages = MB2PAGES(8) + (totalram_pgs >> 1);
+   else if (totalram_pgs < MB2PAGES(512))
+   min_pages = MB2PAGES(40) + (totalram_pgs >> 2);
+   else if (totalram_pgs < MB2PAGES(2048))
+   min_pages = MB2PAGES(104) + (totalram_pgs >> 3);
+   else if (totalram_pgs < MB2PAGES(8192))
+   min_pages = MB2PAGES(232) + (totalram_pgs >> 4);
else
-   min_pages = MB2PAGES(488) + (totalram_pages >> 5);
+   min_pages = MB2PAGES(488) + (totalram_pgs >> 5);
 #undef MB2PAGES
return min_pages;
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index e49af4c..6e3c088 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -380,10 +380,11 @@ void __init files_init(void)
 void __init files_maxfiles_init(void)
 {
unsigned long n;
-   unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+   unsigned long totalram_pgs = totalram_pages;
+   unsigned long memreserve = (totalram_pgs - nr_free_pages()) * 3/2;
 
-   memreserve = min(memreserve, totalram_pages - 1);
-   n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
+   memreserve = min(memreserve, totalram_pgs - 1);
+   n = ((totalram_pgs - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
files_stat.max_files = max_t(un

[PATCH v4 1/4] mm: reference totalram_pages and managed_pages once per function

2018-11-11 Thread Arun KS
This patch is in preparation to a later patch which converts totalram_pages
and zone->managed_pages to atomic variables. Please note that re-reading
the value might lead to a different value and as such it could lead to
unexpected behavior. There are no known bugs as a result of the current code
but it is better to prevent from them in principle.

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
 arch/um/kernel/mem.c |  3 +--
 arch/x86/kernel/cpu/microcode/core.c |  5 +++--
 drivers/hv/hv_balloon.c  | 19 ++-
 fs/file_table.c  |  7 ---
 kernel/fork.c|  5 +++--
 kernel/kexec_core.c  |  5 +++--
 mm/page_alloc.c  |  5 +++--
 mm/shmem.c   |  3 ++-
 net/dccp/proto.c |  7 ---
 net/netfilter/nf_conntrack_core.c|  7 ---
 net/netfilter/xt_hashlimit.c |  5 +++--
 net/sctp/protocol.c  |  7 ---
 12 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 1067469..134d3fd 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -51,8 +51,7 @@ void __init mem_init(void)
 
/* this will put all low memory onto the freelists */
memblock_free_all();
-   max_low_pfn = totalram_pages;
-   max_pfn = totalram_pages;
+   max_pfn = max_low_pfn = totalram_pages;
mem_init_print_info(NULL);
kmalloc_ok = 1;
 }
diff --git a/arch/x86/kernel/cpu/microcode/core.c 
b/arch/x86/kernel/cpu/microcode/core.c
index 2637ff0..99c67ca 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const 
char __user *buf,
   size_t len, loff_t *ppos)
 {
ssize_t ret = -EINVAL;
+   unsigned long totalram_pgs = totalram_pages;
 
-   if ((len >> PAGE_SHIFT) > totalram_pages) {
-   pr_err("too much data (max %ld pages)\n", totalram_pages);
+   if ((len >> PAGE_SHIFT) > totalram_pgs) {
+   pr_err("too much data (max %ld pages)\n", totalram_pgs);
return ret;
}
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4163151..cac4945 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1090,6 +1090,7 @@ static void process_info(struct hv_dynmem_device *dm, 
struct dm_info_msg *msg)
 static unsigned long compute_balloon_floor(void)
 {
unsigned long min_pages;
+   unsigned long totalram_pgs = totalram_pages;
 #define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
 *  max MiB -> min MiB  gradient
@@ -1102,16 +1103,16 @@ static unsigned long compute_balloon_floor(void)
 *8192   744(1/16)
 *   32768  1512(1/32)
 */
-   if (totalram_pages < MB2PAGES(128))
-   min_pages = MB2PAGES(8) + (totalram_pages >> 1);
-   else if (totalram_pages < MB2PAGES(512))
-   min_pages = MB2PAGES(40) + (totalram_pages >> 2);
-   else if (totalram_pages < MB2PAGES(2048))
-   min_pages = MB2PAGES(104) + (totalram_pages >> 3);
-   else if (totalram_pages < MB2PAGES(8192))
-   min_pages = MB2PAGES(232) + (totalram_pages >> 4);
+   if (totalram_pgs < MB2PAGES(128))
+   min_pages = MB2PAGES(8) + (totalram_pgs >> 1);
+   else if (totalram_pgs < MB2PAGES(512))
+   min_pages = MB2PAGES(40) + (totalram_pgs >> 2);
+   else if (totalram_pgs < MB2PAGES(2048))
+   min_pages = MB2PAGES(104) + (totalram_pgs >> 3);
+   else if (totalram_pgs < MB2PAGES(8192))
+   min_pages = MB2PAGES(232) + (totalram_pgs >> 4);
else
-   min_pages = MB2PAGES(488) + (totalram_pages >> 5);
+   min_pages = MB2PAGES(488) + (totalram_pgs >> 5);
 #undef MB2PAGES
return min_pages;
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index e49af4c..6e3c088 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -380,10 +380,11 @@ void __init files_init(void)
 void __init files_maxfiles_init(void)
 {
unsigned long n;
-   unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+   unsigned long totalram_pgs = totalram_pages;
+   unsigned long memreserve = (totalram_pgs - nr_free_pages()) * 3/2;
 
-   memreserve = min(memreserve, totalram_pages - 1);
-   n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
+   memreserve = min(memreserve, totalram_pgs - 1);
+   n = ((totalram_pgs - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
files_stat.max_files = max_t(un

[PATCH v4 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-11-11 Thread Arun KS
totalram_pages and totalhigh_pages are made static inline function.

Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed in lenght here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seemes better to remove the lock and convert variables
to atomic, with preventing poteintial store-to-read tearing as
a bonus.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 

---
coccinelle script to make most of the changes,

@@
declarer name EXPORT_SYMBOL;
symbol totalram_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalram_pages);
|
- totalram_pages = e
+ totalram_pages_set(e)
|
- totalram_pages += e
+ totalram_pages_add(e)
|
- totalram_pages++
+ totalram_pages_inc()
|
- totalram_pages--
+ totalram_pages_dec()
|
- totalram_pages
+ totalram_pages()
)

@@
symbol totalhigh_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalhigh_pages);
|
- totalhigh_pages = e
+ totalhigh_pages_set(e)
|
- totalhigh_pages += e
+ totalhigh_pages_add(e)
|
- totalhigh_pages++
+ totalhigh_pages_inc()
|
- totalhigh_pages--
+ totalhigh_pages_dec()
|
- totalhigh_pages
+ totalhigh_pages()
)

Manaually apply all changes of following files,

include/linux/highmem.h
include/linux/mm.h
include/linux/swap.h
mm/highmem.c

and for mm/page_alloc.c mannualy apply only below changes,

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

 /* Protect totalram_pages and zone->managed_pages */
 static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;

---
---
 arch/csky/mm/init.c   |  4 ++--
 arch/powerpc/platforms/pseries/cmm.c  | 10 +-
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 drivers/char/agp/backend.c|  4 ++--
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 ++--
 drivers/hv/hv_balloon.c   |  2 +-
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 ++--
 drivers/parisc/sba_iommu.c|  4 ++--
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +++---
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  2 +-
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 +--
 include/linux/mm.h| 27 +-
 include/linux/swap.h  |  1 -
 kernel/fork.c |  2 +-
 kernel/kexec_core.c   |  2 +-
 kernel/power/snapshot.c   |  2 +-
 mm/highmem.c  |  5 ++---
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  4 ++--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 20 ++-
 mm/shmem.c|  8 
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 ++--
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 ++--
 net/dccp/proto.c  |  2 +-
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  2 +-
 net/netfilter/xt_hashlimit.c  |  2 +-
 net/sctp/protocol.c   |  2 +-
 security/integrity/ima/ima_kexec.c|  2 +-
 53 files changed, 130 insertions(+), 81 deletions(-)

diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index dc07c07..66e5970 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/init.c

[PATCH v4 2/4] mm: convert zone->managed_pages to atomic variable

2018-11-11 Thread Arun KS
totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

This patch converts zone->managed_pages. Subsequent patches will
convert totalram_pages, totalhigh_pages and eventually
managed_page_count_lock will be removed.

Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seems better to remove the lock and convert the variables
to atomic, preventing potential store-to-read tearing as
a bonus.
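
To make the store-tear concern concrete, a small illustration (not part
of the patch) of the difference between a plain counter and an atomic
one:

static unsigned long plain_counter;	/* a lockless reader can observe a
					 * torn value if the compiler splits
					 * the store */
static atomic_long_t atomic_counter;	/* single-copy atomic by contract */

static void counter_writer(unsigned long n)
{
	plain_counter = n;			/* may tear */
	atomic_long_set(&atomic_counter, n);	/* never tears */
}

static unsigned long counter_reader(void)
{
	return (unsigned long)atomic_long_read(&atomic_counter);
}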

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 

---
Most of the changes are done by below coccinelle script,

@@
struct zone *z;
expression e1;
@@
(
- z->managed_pages = e1
+ atomic_long_set(&z->managed_pages, e1)
|
- e1->managed_pages++
+ atomic_long_inc(&e1->managed_pages)
|
- z->managed_pages
+ zone_managed_pages(z)
)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, &e->managed_pages)

@@
expression z;
@@
- z.managed_pages
+ zone_managed_pages(&z)

Then, manually apply the following change,
include/linux/mmzone.h

- unsigned long managed_pages;
+ atomic_long_t managed_pages;

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
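
Applied to a typical call site, the z->managed_pages branch of the first
rule turns a direct field read into the new accessor; schematically
(hypothetical caller, not part of this diff):

/* before the semantic patch */
if (zone->managed_pages)
	pages += zone->managed_pages;

/* after: the field read goes through the new accessor */
if (zone_managed_pages(zone))
	pages += zone_managed_pages(zone);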

---
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 include/linux/mmzone.h|  9 +--
 lib/show_mem.c|  2 +-
 mm/memblock.c |  2 +-
 mm/page_alloc.c   | 44 +--
 mm/vmstat.c   |  4 ++--
 6 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 56412b0..c0e55bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -848,7 +848,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int *avail_size,
 */
pgdat = NODE_DATA(numa_node_id);
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-   mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+   mem_in_bytes += zone_managed_pages(&pgdat->node_zones[zone_type]);
mem_in_bytes <<= PAGE_SHIFT;
 
sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 847705a..e73dc31 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -435,7 +435,7 @@ struct zone {
 * adjust_managed_page_count() should be used instead of directly
 * touching zone->managed_pages and totalram_pages.
 */
-   unsigned long   managed_pages;
+   atomic_long_t   managed_pages;
unsigned long   spanned_pages;
unsigned long   present_pages;
 
@@ -524,6 +524,11 @@ enum pgdat_flags {
PGDAT_RECLAIM_LOCKED,   /* prevents concurrent reclaim */
 };
 
+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
return zone->zone_start_pfn + zone->spanned_pages;
@@ -814,7 +819,7 @@ static inline bool is_dev_zone(const struct zone *zone)
  */
 static inline bool managed_zone(struct zone *zone)
 {
-   return zone->managed_pages;
+   return zone_managed_pages(zone);
 }
 
 /* Returns true if a zone has memory */
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..eefe67d 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -28,7 +28,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
continue;
 
total += zone->present_pages;
-   reserved += zone->present_pages - zone->managed_pages;
+   reserved += zone->present_pages - zone_managed_pages(zone);
 
if (is_highmem_idx(zoneid))
highmem += zone->present_pages;
diff --git a/mm/memblock.c b/mm/memblock.c
index 7df468c..bbd82ab 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1950,7 +1950,7 @@ void reset_node_managed_pages(pg_data_t *pgdat)
struct zone *z;
 
for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
-   z->managed_pages = 0;
+   atomic_long_set(&z->managed_pages, 0);
 }
 
 void __init reset_all_zones_managed_pages(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 173312b..22e6645 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ 

[PATCH v4 4/4] mm: Remove managed_page_count spinlock

2018-11-11 Thread Arun KS
Now that totalram_pages and managed_pages are atomic variables, there is
no need for the managed_page_count spinlock. The lock gave only a weak
consistency guarantee anyway: it was never used for anything but the
updates, and no reader actually cares about all the values being updated
in sync.
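
For readability, adjust_managed_page_count() as it looks after this
patch (assembled from the diff below): each update is an independent
atomic read-modify-write, so the lock only made the three counters move
together, a property no reader relied on.

void adjust_managed_page_count(struct page *page, long count)
{
	atomic_long_add(count, &page_zone(page)->managed_pages);
	totalram_pages_add(count);
#ifdef CONFIG_HIGHMEM
	if (PageHighMem(page))
		totalhigh_pages_add(count);
#endif
}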

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
 include/linux/mmzone.h | 6 --
 mm/page_alloc.c| 5 -
 2 files changed, 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e73dc31..c71b4d9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,12 +428,6 @@ struct zone {
 * Write access to present_pages at runtime should be protected by
 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
 * present_pages should get_online_mems() to get a stable value.
-*
-* Read access to managed_pages should be safe because it's unsigned
-* long. Write access to zone->managed_pages and totalram_pages are
-* protected by managed_page_count_lock at runtime. Idealy only
-* adjust_managed_page_count() should be used instead of directly
-* touching zone->managed_pages and totalram_pages.
 */
atomic_long_t   managed_pages;
unsigned long   spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8b64cc..26c5e14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,9 +122,6 @@
 };
 EXPORT_SYMBOL(node_states);
 
-/* Protect totalram_pages and zone->managed_pages */
-static DEFINE_SPINLOCK(managed_page_count_lock);
-
 atomic_long_t _totalram_pages __read_mostly;
 EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
@@ -7065,14 +7062,12 @@ static int __init cmdline_parse_movablecore(char *p)
 
 void adjust_managed_page_count(struct page *page, long count)
 {
-   spin_lock(&managed_page_count_lock);
	atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages_add(count);
 #endif
-   spin_unlock(&managed_page_count_lock);
 }
 EXPORT_SYMBOL(adjust_managed_page_count);
 
-- 
1.9.1



[PATCH v4 0/4] mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-11 Thread Arun KS
This series converts totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The patches were compile tested on x86 (x86_64_defconfig & i386_defconfig)
on 4.20-rc2. Memory hotplug was tested on arm64, though on an older
kernel version.

totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
It seems better to remove the lock and convert the variables
to atomic. With the change, preventing potential store-to-read
tearing comes as a bonus.

Changes in v4:
- Fixed kbuild test robot error.
- Modified changelog.
- Rebased to 4.20-rc2

Changes in v3:
- Fixed kbuild test robot errors.
- Modified changelogs to be clearer.
- EXPORT_SYMBOL for _totalram_pages and _totalhigh_pages.

Arun KS (4):
  mm: reference totalram_pages and managed_pages once per function
  mm: convert zone->managed_pages to atomic variable
  mm: convert totalram_pages and totalhigh_pages variables to atomic
  mm: Remove managed_page_count spinlock

 arch/csky/mm/init.c   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  | 10 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  3 +-
 arch/x86/kernel/cpu/microcode/core.c  |  5 +-
 drivers/char/agp/backend.c|  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
 drivers/hv/hv_balloon.c   | 19 +++
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 +-
 drivers/parisc/sba_iommu.c|  4 +-
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +--
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  7 +--
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 ++-
 include/linux/mm.h| 27 +-
 include/linux/mmzone.h| 15 +++---
 include/linux/swap.h  |  1 -
 kernel/fork.c |  5 +-
 kernel/kexec_core.c   |  5 +-
 kernel/power/snapshot.c   |  2 +-
 lib/show_mem.c|  2 +-
 mm/highmem.c  |  5 +-
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  6 +--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 72 +--
 mm/shmem.c|  7 +--
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 +-
 mm/vmstat.c   |  4 +-
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 +-
 net/dccp/proto.c  |  7 +--
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  7 +--
 net/netfilter/xt_hashlimit.c  |  5 +-
 net/sctp/protocol.c   |  7 +--
 security/integrity/ima/ima_kexec.c|  2 +-
 57 files changed, 195 insertions(+), 142 deletions(-)

-- 
1.9.1



Re: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

2018-11-09 Thread Arun KS

On 2018-11-08 15:44, Michal Hocko wrote:

On Thu 08-11-18 15:33:06, Arun KS wrote:

On 2018-11-08 14:04, Michal Hocko wrote:
> On Thu 08-11-18 13:53:18, Arun KS wrote:
> > Now totalram_pages and managed_pages are atomic variables. No need
> > of managed_page_count spinlock.
>
> As explained earlier. Please add a motivation here. Feel free to reuse
> wording from
> http://lkml.kernel.org/r/20181107103630.gf2...@dhcp22.suse.cz

Sure. Will add in next spin.


Andrew usually updates changelogs if you give him the full wording.
I would wait few days before resubmitting, if that is needed at all.
0day will throw a lot of random configs which can reveal some 
leftovers.


0day sent one more failure. Will fix that and resend one more version.

Regards,
Arun


Re: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

2018-11-08 Thread Arun KS

On 2018-11-08 15:44, Michal Hocko wrote:

On Thu 08-11-18 15:33:06, Arun KS wrote:

On 2018-11-08 14:04, Michal Hocko wrote:
> On Thu 08-11-18 13:53:18, Arun KS wrote:
> > Now totalram_pages and managed_pages are atomic variables. No need
> > of managed_page_count spinlock.
>
> As explained earlier. Please add a motivation here. Feel free to reuse
> wording from
> http://lkml.kernel.org/r/20181107103630.gf2...@dhcp22.suse.cz

Sure. Will add in next spin.


Andrew usually updates changelogs if you give him the full wording.
I would wait few days before resubmitting, if that is needed at all.


mm: Remove managed_page_count spinlock

Now that totalram_pages and managed_pages are atomic variables, there is
no need for the managed_page_count spinlock. The lock gave only a weak
consistency guarantee anyway: it was never used for anything but the
updates, and no reader actually cares about all the values being updated
in sync.

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 


0day will throw a lot of random configs which can reveal some 
leftovers.


Yea. Fixed a few of them during v3.

Regards,
Arun


Re: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

2018-11-08 Thread Arun KS

On 2018-11-08 14:04, Michal Hocko wrote:

On Thu 08-11-18 13:53:18, Arun KS wrote:

Now totalram_pages and managed_pages are atomic variables. No need
of managed_page_count spinlock.


As explained earlier. Please add a motivation here. Feel free to reuse
wording from 
http://lkml.kernel.org/r/20181107103630.gf2...@dhcp22.suse.cz


Sure. Will add in next spin.

Regards,
Arun




Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
 include/linux/mmzone.h | 6 --
 mm/page_alloc.c| 5 -
 2 files changed, 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e73dc31..c71b4d9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,12 +428,6 @@ struct zone {
 * Write access to present_pages at runtime should be protected by
 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
 * present_pages should get_online_mems() to get a stable value.
-*
-* Read access to managed_pages should be safe because it's unsigned
-* long. Write access to zone->managed_pages and totalram_pages are
-* protected by managed_page_count_lock at runtime. Idealy only
-* adjust_managed_page_count() should be used instead of directly
-* touching zone->managed_pages and totalram_pages.
 */
atomic_long_t   managed_pages;
unsigned long   spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8b64cc..26c5e14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,9 +122,6 @@
 };
 EXPORT_SYMBOL(node_states);

-/* Protect totalram_pages and zone->managed_pages */
-static DEFINE_SPINLOCK(managed_page_count_lock);
-
 atomic_long_t _totalram_pages __read_mostly;
 EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
@@ -7065,14 +7062,12 @@ static int __init cmdline_parse_movablecore(char *p)


 void adjust_managed_page_count(struct page *page, long count)
 {
-   spin_lock(&managed_page_count_lock);
	atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages_add(count);
 #endif
-   spin_unlock(&managed_page_count_lock);
 }
 EXPORT_SYMBOL(adjust_managed_page_count);

--
1.9.1


[PATCH v3 1/4] mm: reference totalram_pages and managed_pages once per function

2018-11-08 Thread Arun KS
This patch is in preparation for a later patch which converts totalram_pages
and zone->managed_pages to atomic variables. Please note that re-reading
the value might return a different result and as such could lead to
unexpected behavior. There are no known bugs as a result of the current code,
but it is better to prevent them in principle.
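
The pattern in miniature, taken from the microcode change below: read
the global once into a local so every use within the function sees the
same value.

/* before: the two reads may disagree if memory is hot-added between them */
if ((len >> PAGE_SHIFT) > totalram_pages)
	pr_err("too much data (max %ld pages)\n", totalram_pages);

/* after: snapshot once, use the local copy throughout */
unsigned long totalram_pgs = totalram_pages;

if ((len >> PAGE_SHIFT) > totalram_pgs)
	pr_err("too much data (max %ld pages)\n", totalram_pgs);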

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
---
 arch/um/kernel/mem.c |  3 +--
 arch/x86/kernel/cpu/microcode/core.c |  5 +++--
 drivers/hv/hv_balloon.c  | 19 ++-
 fs/file_table.c  |  7 ---
 kernel/fork.c|  5 +++--
 kernel/kexec_core.c  |  5 +++--
 mm/page_alloc.c  |  5 +++--
 mm/shmem.c   |  3 ++-
 net/dccp/proto.c |  7 ---
 net/netfilter/nf_conntrack_core.c|  7 ---
 net/netfilter/xt_hashlimit.c |  5 +++--
 net/sctp/protocol.c  |  7 ---
 12 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 1067469..134d3fd 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -51,8 +51,7 @@ void __init mem_init(void)
 
/* this will put all low memory onto the freelists */
memblock_free_all();
-   max_low_pfn = totalram_pages;
-   max_pfn = totalram_pages;
+   max_pfn = max_low_pfn = totalram_pages;
mem_init_print_info(NULL);
kmalloc_ok = 1;
 }
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 2637ff0..99c67ca 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const char __user *buf,
   size_t len, loff_t *ppos)
 {
ssize_t ret = -EINVAL;
+   unsigned long totalram_pgs = totalram_pages;
 
-   if ((len >> PAGE_SHIFT) > totalram_pages) {
-   pr_err("too much data (max %ld pages)\n", totalram_pages);
+   if ((len >> PAGE_SHIFT) > totalram_pgs) {
+   pr_err("too much data (max %ld pages)\n", totalram_pgs);
return ret;
}
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4163151..cac4945 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1090,6 +1090,7 @@ static void process_info(struct hv_dynmem_device *dm, struct dm_info_msg *msg)
 static unsigned long compute_balloon_floor(void)
 {
unsigned long min_pages;
+   unsigned long totalram_pgs = totalram_pages;
 #define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
 *  max MiB -> min MiB  gradient
@@ -1102,16 +1103,16 @@ static unsigned long compute_balloon_floor(void)
 *8192   744(1/16)
 *   32768  1512(1/32)
 */
-   if (totalram_pages < MB2PAGES(128))
-   min_pages = MB2PAGES(8) + (totalram_pages >> 1);
-   else if (totalram_pages < MB2PAGES(512))
-   min_pages = MB2PAGES(40) + (totalram_pages >> 2);
-   else if (totalram_pages < MB2PAGES(2048))
-   min_pages = MB2PAGES(104) + (totalram_pages >> 3);
-   else if (totalram_pages < MB2PAGES(8192))
-   min_pages = MB2PAGES(232) + (totalram_pages >> 4);
+   if (totalram_pgs < MB2PAGES(128))
+   min_pages = MB2PAGES(8) + (totalram_pgs >> 1);
+   else if (totalram_pgs < MB2PAGES(512))
+   min_pages = MB2PAGES(40) + (totalram_pgs >> 2);
+   else if (totalram_pgs < MB2PAGES(2048))
+   min_pages = MB2PAGES(104) + (totalram_pgs >> 3);
+   else if (totalram_pgs < MB2PAGES(8192))
+   min_pages = MB2PAGES(232) + (totalram_pgs >> 4);
else
-   min_pages = MB2PAGES(488) + (totalram_pages >> 5);
+   min_pages = MB2PAGES(488) + (totalram_pgs >> 5);
 #undef MB2PAGES
return min_pages;
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index e49af4c..6e3c088 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -380,10 +380,11 @@ void __init files_init(void)
 void __init files_maxfiles_init(void)
 {
unsigned long n;
-   unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+   unsigned long totalram_pgs = totalram_pages;
+   unsigned long memreserve = (totalram_pgs - nr_free_pages()) * 3/2;
 
-   memreserve = min(memreserve, totalram_pages - 1);
-   n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
+   memreserve = min(memreserve, totalram_pgs - 1);
+   n = ((totalram_pgs - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
files_stat.max_files = max_t(unsigned long, n, NR_FILE);

[PATCH v3 2/4] mm: convert zone->managed_pages to atomic variable

2018-11-08 Thread Arun KS
totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

This patch converts zone->managed_pages. Subsequent patches will
convert totalram_pages, totalhigh_pages and eventually
managed_page_count_lock will be removed.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 

---
Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seems better to remove the lock and convert the variables
to atomic, preventing potential store-to-read tearing as
a bonus.

Most of the changes are done by below coccinelle script,

@@
struct zone *z;
expression e1;
@@
(
- z->managed_pages = e1
+ atomic_long_set(&z->managed_pages, e1)
|
- e1->managed_pages++
+ atomic_long_inc(&e1->managed_pages)
|
- z->managed_pages
+ zone_managed_pages(z)
)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, &e->managed_pages)

@@
expression z;
@@
- z.managed_pages
+ zone_managed_pages(&z)

Then, manually apply the following change,
include/linux/mmzone.h

- unsigned long managed_pages;
+ atomic_long_t managed_pages;

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}

---
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 include/linux/mmzone.h|  9 +--
 lib/show_mem.c|  2 +-
 mm/memblock.c |  2 +-
 mm/page_alloc.c   | 44 +--
 mm/vmstat.c   |  4 ++--
 6 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 56412b0..c0e55bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -848,7 +848,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int *avail_size,
 */
pgdat = NODE_DATA(numa_node_id);
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-   mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+   mem_in_bytes += zone_managed_pages(&pgdat->node_zones[zone_type]);
mem_in_bytes <<= PAGE_SHIFT;
 
sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 847705a..e73dc31 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -435,7 +435,7 @@ struct zone {
 * adjust_managed_page_count() should be used instead of directly
 * touching zone->managed_pages and totalram_pages.
 */
-   unsigned long   managed_pages;
+   atomic_long_t   managed_pages;
unsigned long   spanned_pages;
unsigned long   present_pages;
 
@@ -524,6 +524,11 @@ enum pgdat_flags {
PGDAT_RECLAIM_LOCKED,   /* prevents concurrent reclaim */
 };
 
+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
return zone->zone_start_pfn + zone->spanned_pages;
@@ -814,7 +819,7 @@ static inline bool is_dev_zone(const struct zone *zone)
  */
 static inline bool managed_zone(struct zone *zone)
 {
-   return zone->managed_pages;
+   return zone_managed_pages(zone);
 }
 
 /* Returns true if a zone has memory */
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..eefe67d 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -28,7 +28,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
continue;
 
total += zone->present_pages;
-   reserved += zone->present_pages - zone->managed_pages;
+   reserved += zone->present_pages - zone_managed_pages(zone);
 
if (is_highmem_idx(zoneid))
highmem += zone->present_pages;
diff --git a/mm/memblock.c b/mm/memblock.c
index 7df468c..bbd82ab 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1950,7 +1950,7 @@ void reset_node_managed_pages(pg_data_t *pgdat)
struct zone *z;
 
for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
-   z->managed_pages = 0;
+   atomic_long_set(&z->managed_pages, 0);
 }
 
 void __init reset_all_zones_managed_pages(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 173312b..22e6645 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ 

[PATCH v3 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-11-08 Thread Arun KS
totalram_pages and totalhigh_pages are made static inline functions.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
coccinelle script to make most of the changes,

@@
declarer name EXPORT_SYMBOL;
symbol totalram_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalram_pages);
|
- totalram_pages = e
+ totalram_pages_set(e)
|
- totalram_pages += e
+ totalram_pages_add(e)
|
- totalram_pages++
+ totalram_pages_inc()
|
- totalram_pages--
+ totalram_pages_dec()
|
- totalram_pages
+ totalram_pages()
)

@@
symbol totalhigh_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalhigh_pages);
|
- totalhigh_pages = e
+ totalhigh_pages_set(e)
|
- totalhigh_pages += e
+ totalhigh_pages_add(e)
|
- totalhigh_pages++
+ totalhigh_pages_inc()
|
- totalhigh_pages--
+ totalhigh_pages_dec()
|
- totalhigh_pages
+ totalhigh_pages()
)

Manually apply all changes in the following files,

include/linux/highmem.h
include/linux/mm.h
include/linux/swap.h
mm/highmem.c

and for mm/page_alloc.c manually apply only the changes below,

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

 /* Protect totalram_pages and zone->managed_pages */
 static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;

---
---
 arch/csky/mm/init.c   |  4 ++--
 arch/powerpc/platforms/pseries/cmm.c  | 10 +-
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 drivers/char/agp/backend.c|  4 ++--
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 ++--
 drivers/hv/hv_balloon.c   |  2 +-
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 ++--
 drivers/parisc/sba_iommu.c|  4 ++--
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +++---
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  2 +-
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 +--
 include/linux/mm.h| 27 +-
 include/linux/swap.h  |  1 -
 kernel/fork.c |  2 +-
 kernel/kexec_core.c   |  2 +-
 kernel/power/snapshot.c   |  2 +-
 mm/highmem.c  |  5 ++---
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  4 ++--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 20 ++-
 mm/shmem.c|  8 
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 ++--
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 ++--
 net/dccp/proto.c  |  2 +-
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  2 +-
 net/netfilter/xt_hashlimit.c  |  2 +-
 security/integrity/ima/ima_kexec.c|  2 +-
 52 files changed, 129 insertions(+), 80 deletions(-)

diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index dc07c07..66e5970 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/init.c
@@ -71,7 +71,7 @@ void free_initrd_mem(unsigned long start, unsigned long end)
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));
free_page(start);
-   totalram_pages++;
+   totalram_pages_inc();
}
 }
 #endif
@@ -88,7 +88,7 @@ void free_initmem(void)

[PATCH v3 4/4] mm: Remove managed_page_count spinlock

2018-11-08 Thread Arun KS
Now totalram_pages and managed_pages are atomic variables. No need
for the managed_page_count spinlock.

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
 include/linux/mmzone.h | 6 --
 mm/page_alloc.c| 5 -
 2 files changed, 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e73dc31..c71b4d9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,12 +428,6 @@ struct zone {
 * Write access to present_pages at runtime should be protected by
 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
 * present_pages should get_online_mems() to get a stable value.
-*
-* Read access to managed_pages should be safe because it's unsigned
-* long. Write access to zone->managed_pages and totalram_pages are
-* protected by managed_page_count_lock at runtime. Idealy only
-* adjust_managed_page_count() should be used instead of directly
-* touching zone->managed_pages and totalram_pages.
 */
atomic_long_t   managed_pages;
unsigned long   spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8b64cc..26c5e14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,9 +122,6 @@
 };
 EXPORT_SYMBOL(node_states);
 
-/* Protect totalram_pages and zone->managed_pages */
-static DEFINE_SPINLOCK(managed_page_count_lock);
-
 atomic_long_t _totalram_pages __read_mostly;
 EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
@@ -7065,14 +7062,12 @@ static int __init cmdline_parse_movablecore(char *p)
 
 void adjust_managed_page_count(struct page *page, long count)
 {
-   spin_lock(&managed_page_count_lock);
	atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages_add(count);
 #endif
-   spin_unlock(&managed_page_count_lock);
 }
 EXPORT_SYMBOL(adjust_managed_page_count);
 
-- 
1.9.1



[PATCH v3 2/4] mm: convert zone->managed_pages to atomic variable

2018-11-08 Thread Arun KS
totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

This patch converts zone->managed_pages. Subsequent patches will
convert totalram_panges, totalhigh_pages and eventually
managed_page_count_lock will be removed.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 

---
Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed in lenght here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seemes better to remove the lock and convert variables
to atomic, with preventing poteintial store-to-read tearing as
a bonus.

Most of the changes are done by below coccinelle script,

@@
struct zone *z;
expression e1;
@@
(
- z->managed_pages = e1
+ atomic_long_set(>managed_pages, e1)
|
- e1->managed_pages++
+ atomic_long_inc(>managed_pages)
|
- z->managed_pages
+ zone_managed_pages(z)
)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, >managed_pages)

@@
expression z;
@@
- z.managed_pages
+ zone_managed_pages()

Then, manually apply following change,
include/linux/mmzone.h

- unsigned long managed_pages;
+ atomic_long_t managed_pages;

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(>managed_pages);
+}

---
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 include/linux/mmzone.h|  9 +--
 lib/show_mem.c|  2 +-
 mm/memblock.c |  2 +-
 mm/page_alloc.c   | 44 +--
 mm/vmstat.c   |  4 ++--
 6 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 56412b0..c0e55bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -848,7 +848,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int 
*avail_size,
 */
pgdat = NODE_DATA(numa_node_id);
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-   mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+   mem_in_bytes += 
zone_managed_pages(>node_zones[zone_type]);
mem_in_bytes <<= PAGE_SHIFT;
 
sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 847705a..e73dc31 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -435,7 +435,7 @@ struct zone {
 * adjust_managed_page_count() should be used instead of directly
 * touching zone->managed_pages and totalram_pages.
 */
-   unsigned long   managed_pages;
+   atomic_long_t   managed_pages;
unsigned long   spanned_pages;
unsigned long   present_pages;
 
@@ -524,6 +524,11 @@ enum pgdat_flags {
PGDAT_RECLAIM_LOCKED,   /* prevents concurrent reclaim */
 };
 
+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(>managed_pages);
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
return zone->zone_start_pfn + zone->spanned_pages;
@@ -814,7 +819,7 @@ static inline bool is_dev_zone(const struct zone *zone)
  */
 static inline bool managed_zone(struct zone *zone)
 {
-   return zone->managed_pages;
+   return zone_managed_pages(zone);
 }
 
 /* Returns true if a zone has memory */
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..eefe67d 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -28,7 +28,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
continue;
 
total += zone->present_pages;
-   reserved += zone->present_pages - zone->managed_pages;
+   reserved += zone->present_pages - 
zone_managed_pages(zone);
 
if (is_highmem_idx(zoneid))
highmem += zone->present_pages;
diff --git a/mm/memblock.c b/mm/memblock.c
index 7df468c..bbd82ab 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1950,7 +1950,7 @@ void reset_node_managed_pages(pg_data_t *pgdat)
struct zone *z;
 
for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
-   z->managed_pages = 0;
+   atomic_long_set(>managed_pages, 0);
 }
 
 void __init reset_all_zones_managed_pages(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 173312b..22e6645 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ 

[PATCH v3 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-11-08 Thread Arun KS
totalram_pages and totalhigh_pages are made static inline function.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
coccinelle script to make most of the changes,

@@
declarer name EXPORT_SYMBOL;
symbol totalram_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalram_pages);
|
- totalram_pages = e
+ totalram_pages_set(e)
|
- totalram_pages += e
+ totalram_pages_add(e)
|
- totalram_pages++
+ totalram_pages_inc()
|
- totalram_pages--
+ totalram_pages_dec()
|
- totalram_pages
+ totalram_pages()
)

@@
symbol totalhigh_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalhigh_pages);
|
- totalhigh_pages = e
+ totalhigh_pages_set(e)
|
- totalhigh_pages += e
+ totalhigh_pages_add(e)
|
- totalhigh_pages++
+ totalhigh_pages_inc()
|
- totalhigh_pages--
+ totalhigh_pages_dec()
|
- totalhigh_pages
+ totalhigh_pages()
)

Manaually apply all changes of following files,

include/linux/highmem.h
include/linux/mm.h
include/linux/swap.h
mm/highmem.c

and for mm/page_alloc.c mannualy apply only below changes,

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

 /* Protect totalram_pages and zone->managed_pages */
 static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;

---
---
 arch/csky/mm/init.c   |  4 ++--
 arch/powerpc/platforms/pseries/cmm.c  | 10 +-
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 drivers/char/agp/backend.c|  4 ++--
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 ++--
 drivers/hv/hv_balloon.c   |  2 +-
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 ++--
 drivers/parisc/sba_iommu.c|  4 ++--
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +++---
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  2 +-
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 +--
 include/linux/mm.h| 27 +-
 include/linux/swap.h  |  1 -
 kernel/fork.c |  2 +-
 kernel/kexec_core.c   |  2 +-
 kernel/power/snapshot.c   |  2 +-
 mm/highmem.c  |  5 ++---
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  4 ++--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 20 ++-
 mm/shmem.c|  8 
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 ++--
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 ++--
 net/dccp/proto.c  |  2 +-
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  2 +-
 net/netfilter/xt_hashlimit.c  |  2 +-
 security/integrity/ima/ima_kexec.c|  2 +-
 52 files changed, 129 insertions(+), 80 deletions(-)

diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index dc07c07..66e5970 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/init.c
@@ -71,7 +71,7 @@ void free_initrd_mem(unsigned long start, unsigned long end)
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));
free_page(start);
-   totalram_pages++;
+   totalram_pages_inc();
}
 }
 #endif
@@ -88,7 +88,7 @@ void free_initmem(v

[PATCH v3 4/4] mm: Remove managed_page_count spinlock

2018-11-08 Thread Arun KS
Now totalram_pages and managed_pages are atomic varibles. No need
of managed_page_count spinlock.

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
---
 include/linux/mmzone.h | 6 --
 mm/page_alloc.c| 5 -
 2 files changed, 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e73dc31..c71b4d9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,12 +428,6 @@ struct zone {
 * Write access to present_pages at runtime should be protected by
 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
 * present_pages should get_online_mems() to get a stable value.
-*
-* Read access to managed_pages should be safe because it's unsigned
-* long. Write access to zone->managed_pages and totalram_pages are
-* protected by managed_page_count_lock at runtime. Idealy only
-* adjust_managed_page_count() should be used instead of directly
-* touching zone->managed_pages and totalram_pages.
 */
atomic_long_t   managed_pages;
unsigned long   spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8b64cc..26c5e14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,9 +122,6 @@
 };
 EXPORT_SYMBOL(node_states);
 
-/* Protect totalram_pages and zone->managed_pages */
-static DEFINE_SPINLOCK(managed_page_count_lock);
-
 atomic_long_t _totalram_pages __read_mostly;
 EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
@@ -7065,14 +7062,12 @@ static int __init cmdline_parse_movablecore(char *p)
 
 void adjust_managed_page_count(struct page *page, long count)
 {
-   spin_lock(&managed_page_count_lock);
atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages_add(count);
 #endif
-   spin_unlock(&managed_page_count_lock);
 }
 EXPORT_SYMBOL(adjust_managed_page_count);
 
-- 
1.9.1



[PATCH v3 0/4] mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-08 Thread Arun KS
This series convert totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The patches were compile tested on x86 (x86_64_defconfig & i386_defconfig)
on 4.20-rc1, and memory hotplug was tested on arm64, but on an older
version of the kernel.

totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

Main motivation was that managed_page_count_lock handling was
complicating things. It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
It seems better to remove the lock and convert the variables
to atomic. With the change, preventing potential store-to-read
tearing comes as a bonus.
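
To illustrate the store tear being guarded against, a hypothetical
reader racing with a hot-add writer (illustrative only; whether a plain
load can really tear depends on the architecture and compiler):

        /* before: a plain load of a counter the writer is updating;
         * nothing stops the compiler from splitting or reloading it */
        unsigned long pages = totalram_pages;

        /* after: a single atomic load, never torn */
        unsigned long pages = totalram_pages(); /* atomic_long_read() */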

Changes in v3:
- Fixed kbuild test robot errors.
- Modified changelogs to be clearer.
- EXPORT_SYMBOL for _totalram_pages and _totalhigh_pages.

Arun KS (4):
  mm: reference totalram_pages and managed_pages once per function
  mm: convert zone->managed_pages to atomic variable
  mm: convert totalram_pages and totalhigh_pages variables to atomic
  mm: Remove managed_page_count spinlock

 arch/csky/mm/init.c   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  | 10 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  3 +-
 arch/x86/kernel/cpu/microcode/core.c  |  5 +-
 drivers/char/agp/backend.c|  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
 drivers/hv/hv_balloon.c   | 19 +++
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 +-
 drivers/parisc/sba_iommu.c|  4 +-
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +--
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  7 +--
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 ++-
 include/linux/mm.h| 27 +-
 include/linux/mmzone.h| 15 +++---
 include/linux/swap.h  |  1 -
 kernel/fork.c |  5 +-
 kernel/kexec_core.c   |  5 +-
 kernel/power/snapshot.c   |  2 +-
 lib/show_mem.c|  2 +-
 mm/highmem.c  |  5 +-
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  6 +--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 72 +--
 mm/shmem.c|  7 +--
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 +-
 mm/vmstat.c   |  4 +-
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 +-
 net/dccp/proto.c  |  7 +--
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  7 +--
 net/netfilter/xt_hashlimit.c  |  5 +-
 net/sctp/protocol.c   |  7 +--
 security/integrity/ima/ima_kexec.c|  2 +-
 57 files changed, 195 insertions(+), 142 deletions(-)

-- 
1.9.1



Re: [PATCH v2 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-11-07 Thread Arun KS

On 2018-11-07 14:34, Vlastimil Babka wrote:

On 11/6/18 5:21 PM, Arun KS wrote:

totalram_pages and totalhigh_pages are made static inline function.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 


Acked-by: Vlastimil Babka 

One bug (probably) below:


diff --git a/mm/highmem.c b/mm/highmem.c
index 59db322..02a9a4b 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -105,9 +105,7 @@ static inline wait_queue_head_t 
*get_pkmap_wait_queue_head(unsigned int color)

 }
 #endif

-unsigned long totalhigh_pages __read_mostly;
-EXPORT_SYMBOL(totalhigh_pages);


I think you still need to export _totalhigh_pages so that modules can
use the inline accessors.


Thanks for pointing this out. I missed that. Will do the same for
_totalram_pages.
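
Something like the below should be enough to keep the inline accessors
usable from modules (a sketch, matching the new variable name):

        atomic_long_t _totalhigh_pages __read_mostly;
        EXPORT_SYMBOL(_totalhigh_pages);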


Regards,
Arun




-
+atomic_long_t _totalhigh_pages __read_mostly;

 EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx);



Re: [PATCH v6 1/2] memory_hotplug: Free pages as higher order

2018-11-06 Thread Arun KS

On 2018-11-07 01:38, Michal Hocko wrote:

On Tue 06-11-18 21:01:29, Arun KS wrote:

On 2018-11-06 19:36, Michal Hocko wrote:
> On Tue 06-11-18 11:33:13, Arun KS wrote:
> > When pages are freed at a higher order, the time spent on
> > coalescing pages by the buddy allocator can be reduced. With a
> > section size of 256MB, the hot add latency of a single section
> > improves from 50-60 ms to less than 1 ms, hence
> > improving the hot add latency by about 60 times. Modify external
> > providers of online callback to align with the change.
> >
> > This patch modifies totalram_pages, zone->managed_pages and
> > totalhigh_pages outside managed_page_count_lock. A follow up
> > series will be send to convert these variable to atomic to
> > avoid readers potentially seeing a store tear.
>
> Is there any reason to rush this through rather than wait for counters
> conversion first?

Sure Michal.

The conversion patch, https://patchwork.kernel.org/cover/10657217/, is
currently incremental to this patch.


The ordering should be other way around. Because as things stand with
this patch first it is possible to introduce a subtle race prone
updates. As I've said I am skeptical the race would matter, really, but
there is no real reason to risk for that. Especially when you have the
other (first) half ready.


Makes sense. I have rebased the preparatory patch on top of -rc1.
https://patchwork.kernel.org/patch/10670787/

Regards,
Arun


Re: [PATCH v1 0/4]mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-06 Thread Arun KS

On 2018-11-07 05:52, Andrew Morton wrote:
On Fri, 26 Oct 2018 16:30:58 +0530 Arun KS  
wrote:



This series convert totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.


The whole point appears to be removal of managed_page_count_lock, yes?

Why?  What is the value of this patchset?  If "performance" then are any
measurements available?


Hello Andrew,

https://patchwork.kernel.org/patch/10670787/
In version 2, I added the motivation behind this conversion. Pasting the
same here:


totalram_pages, zone->managed_pages and totalhigh_pages updates are 
protected by managed_page_count_lock, but readers never care about it. 
Convert these variables to atomic to avoid readers potentially seeing a 
store tear. I don't think we have a performance improvement here.


Regards,
Arun


[PATCH v2 2/4] mm: Convert zone->managed_pages to atomic variable

2018-11-06 Thread Arun KS
totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

This patch converts zone->managed_pages. Subsequent patches will
convert totalram_pages, totalhigh_pages and eventually
managed_page_count_lock will be removed.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 

---
Most of the changes are done by below coccinelle script,

@@
struct zone *z;
expression e1;
@@
(
- z->managed_pages = e1
+ atomic_long_set(&z->managed_pages, e1)
|
- e1->managed_pages++
+ atomic_long_inc(&e1->managed_pages)
|
- z->managed_pages
+ zone_managed_pages(z)
)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, &e->managed_pages)

@@
expression z;
@@
- z.managed_pages
+ zone_managed_pages(&z)

Then, manually apply following change,
include/linux/mmzone.h

- unsigned long managed_pages;
+ atomic_long_t managed_pages;

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
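
Call sites then read through the helper and write with the atomic ops; a
short sketch of how a typical predicate changes (zone_has_memory() is a
made-up name, mirroring managed_zone() in the hunk below):

        static inline bool zone_has_memory(struct zone *zone)
        {
                /* was: return zone->managed_pages != 0; */
                return zone_managed_pages(zone) != 0;
        }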

---
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 include/linux/mmzone.h|  9 +--
 lib/show_mem.c|  2 +-
 mm/memblock.c |  2 +-
 mm/page_alloc.c   | 44 +--
 mm/vmstat.c   |  4 ++--
 6 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 56412b0..c0e55bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -848,7 +848,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int 
*avail_size,
 */
pgdat = NODE_DATA(numa_node_id);
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-   mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+   mem_in_bytes += 
zone_managed_pages(&pgdat->node_zones[zone_type]);
mem_in_bytes <<= PAGE_SHIFT;
 
sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 847705a..e73dc31 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -435,7 +435,7 @@ struct zone {
 * adjust_managed_page_count() should be used instead of directly
 * touching zone->managed_pages and totalram_pages.
 */
-   unsigned long   managed_pages;
+   atomic_long_t   managed_pages;
unsigned long   spanned_pages;
unsigned long   present_pages;
 
@@ -524,6 +524,11 @@ enum pgdat_flags {
PGDAT_RECLAIM_LOCKED,   /* prevents concurrent reclaim */
 };
 
+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
return zone->zone_start_pfn + zone->spanned_pages;
@@ -814,7 +819,7 @@ static inline bool is_dev_zone(const struct zone *zone)
  */
 static inline bool managed_zone(struct zone *zone)
 {
-   return zone->managed_pages;
+   return zone_managed_pages(zone);
 }
 
 /* Returns true if a zone has memory */
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..eefe67d 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -28,7 +28,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
continue;
 
total += zone->present_pages;
-   reserved += zone->present_pages - zone->managed_pages;
+   reserved += zone->present_pages - 
zone_managed_pages(zone);
 
if (is_highmem_idx(zoneid))
highmem += zone->present_pages;
diff --git a/mm/memblock.c b/mm/memblock.c
index 7df468c..bbd82ab 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1950,7 +1950,7 @@ void reset_node_managed_pages(pg_data_t *pgdat)
struct zone *z;
 
for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
-   z->managed_pages = 0;
+   atomic_long_set(&z->managed_pages, 0);
 }
 
 void __init reset_all_zones_managed_pages(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 173312b..22e6645 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1279,7 +1279,7 @@ static void __init __free_pages_boot_core(struct page 
*page, unsigned int order)
__ClearPageReserved(p);
set_page_count(p, 0);
 
-   page_zone(page)->managed_pages += nr_pages;
+   atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
set_page_refcounted(page);
__fre

[PATCH v2 4/4] mm: Remove managed_page_count spinlock

2018-11-06 Thread Arun KS
Now that totalram_pages and managed_pages are atomic variables, there is
no need for the managed_page_count spinlock.

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
---
 include/linux/mmzone.h | 6 --
 mm/page_alloc.c| 5 -
 2 files changed, 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e73dc31..c71b4d9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,12 +428,6 @@ struct zone {
 * Write access to present_pages at runtime should be protected by
 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
 * present_pages should get_online_mems() to get a stable value.
-*
-* Read access to managed_pages should be safe because it's unsigned
-* long. Write access to zone->managed_pages and totalram_pages are
-* protected by managed_page_count_lock at runtime. Idealy only
-* adjust_managed_page_count() should be used instead of directly
-* touching zone->managed_pages and totalram_pages.
 */
atomic_long_t   managed_pages;
unsigned long   spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2a42c3f..4d78bde 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,9 +122,6 @@
 };
 EXPORT_SYMBOL(node_states);
 
-/* Protect totalram_pages and zone->managed_pages */
-static DEFINE_SPINLOCK(managed_page_count_lock);
-
 atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;
@@ -7064,14 +7061,12 @@ static int __init cmdline_parse_movablecore(char *p)
 
 void adjust_managed_page_count(struct page *page, long count)
 {
-   spin_lock(&managed_page_count_lock);
atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages_add(count);
 #endif
-   spin_unlock(&managed_page_count_lock);
 }
 EXPORT_SYMBOL(adjust_managed_page_count);
 
-- 
1.9.1



[PATCH v2 0/4] mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-06 Thread Arun KS
This series convert totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The patches were compile tested on x86 (x86_64_defconfig & i386_defconfig)
on 4.20-rc1, and memory hotplug was tested on arm64, but on an older
version of the kernel.

Arun KS (4):
  mm: Fix multiple evaluations of totalram_pages and managed_pages
  mm: Convert zone->managed_pages to atomic variable
  mm: convert totalram_pages and totalhigh_pages variables to atomic
  mm: Remove managed_page_count spinlock

 arch/csky/mm/init.c   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  | 10 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  3 +-
 arch/x86/kernel/cpu/microcode/core.c  |  5 +-
 drivers/char/agp/backend.c|  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
 drivers/hv/hv_balloon.c   | 19 +++
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 +-
 drivers/parisc/sba_iommu.c|  4 +-
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +--
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  7 +--
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 ++-
 include/linux/mm.h| 27 +-
 include/linux/mmzone.h| 15 +++---
 include/linux/swap.h  |  1 -
 kernel/fork.c |  5 +-
 kernel/kexec_core.c   |  5 +-
 kernel/power/snapshot.c   |  2 +-
 lib/show_mem.c|  2 +-
 mm/highmem.c  |  4 +-
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  6 +--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 71 +--
 mm/shmem.c|  7 +--
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 +-
 mm/vmstat.c   |  4 +-
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 +-
 net/dccp/proto.c  |  7 +--
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  7 +--
 net/netfilter/xt_hashlimit.c  |  5 +-
 net/sctp/protocol.c   |  7 +--
 security/integrity/ima/ima_kexec.c|  2 +-
 57 files changed, 193 insertions(+), 142 deletions(-)

-- 
1.9.1



[PATCH v2 1/4] mm: Fix multiple evaluations of totalram_pages and managed_pages

2018-11-06 Thread Arun KS
This patch is in preparation for a later patch which converts totalram_pages
and zone->managed_pages to atomic variables. This patch does not introduce
any functional changes.
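
The pattern, as in the microcode driver hunk below: read totalram_pages
once into a local and use that snapshot, so both uses see the same value
even if a hot-add updates the counter in between:

        /* before: two loads that can disagree */
        if ((len >> PAGE_SHIFT) > totalram_pages)
                pr_err("too much data (max %ld pages)\n", totalram_pages);

        /* after: one load, one consistent value */
        unsigned long totalram_pgs = totalram_pages;

        if ((len >> PAGE_SHIFT) > totalram_pgs)
                pr_err("too much data (max %ld pages)\n", totalram_pgs);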

Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
---
 arch/um/kernel/mem.c |  3 +--
 arch/x86/kernel/cpu/microcode/core.c |  5 +++--
 drivers/hv/hv_balloon.c  | 19 ++-
 fs/file_table.c  |  7 ---
 kernel/fork.c|  5 +++--
 kernel/kexec_core.c  |  5 +++--
 mm/page_alloc.c  |  5 +++--
 mm/shmem.c   |  3 ++-
 net/dccp/proto.c |  7 ---
 net/netfilter/nf_conntrack_core.c|  7 ---
 net/netfilter/xt_hashlimit.c |  5 +++--
 net/sctp/protocol.c  |  7 ---
 12 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 1067469..134d3fd 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -51,8 +51,7 @@ void __init mem_init(void)
 
/* this will put all low memory onto the freelists */
memblock_free_all();
-   max_low_pfn = totalram_pages;
-   max_pfn = totalram_pages;
+   max_pfn = max_low_pfn = totalram_pages;
mem_init_print_info(NULL);
kmalloc_ok = 1;
 }
diff --git a/arch/x86/kernel/cpu/microcode/core.c 
b/arch/x86/kernel/cpu/microcode/core.c
index 2637ff0..99c67ca 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const 
char __user *buf,
   size_t len, loff_t *ppos)
 {
ssize_t ret = -EINVAL;
+   unsigned long totalram_pgs = totalram_pages;
 
-   if ((len >> PAGE_SHIFT) > totalram_pages) {
-   pr_err("too much data (max %ld pages)\n", totalram_pages);
+   if ((len >> PAGE_SHIFT) > totalram_pgs) {
+   pr_err("too much data (max %ld pages)\n", totalram_pgs);
return ret;
}
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4163151..cac4945 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1090,6 +1090,7 @@ static void process_info(struct hv_dynmem_device *dm, 
struct dm_info_msg *msg)
 static unsigned long compute_balloon_floor(void)
 {
unsigned long min_pages;
+   unsigned long totalram_pgs = totalram_pages;
 #define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
 *  max MiB -> min MiB  gradient
@@ -1102,16 +1103,16 @@ static unsigned long compute_balloon_floor(void)
 *8192   744(1/16)
 *   32768  1512(1/32)
 */
-   if (totalram_pages < MB2PAGES(128))
-   min_pages = MB2PAGES(8) + (totalram_pages >> 1);
-   else if (totalram_pages < MB2PAGES(512))
-   min_pages = MB2PAGES(40) + (totalram_pages >> 2);
-   else if (totalram_pages < MB2PAGES(2048))
-   min_pages = MB2PAGES(104) + (totalram_pages >> 3);
-   else if (totalram_pages < MB2PAGES(8192))
-   min_pages = MB2PAGES(232) + (totalram_pages >> 4);
+   if (totalram_pgs < MB2PAGES(128))
+   min_pages = MB2PAGES(8) + (totalram_pgs >> 1);
+   else if (totalram_pgs < MB2PAGES(512))
+   min_pages = MB2PAGES(40) + (totalram_pgs >> 2);
+   else if (totalram_pgs < MB2PAGES(2048))
+   min_pages = MB2PAGES(104) + (totalram_pgs >> 3);
+   else if (totalram_pgs < MB2PAGES(8192))
+   min_pages = MB2PAGES(232) + (totalram_pgs >> 4);
else
-   min_pages = MB2PAGES(488) + (totalram_pages >> 5);
+   min_pages = MB2PAGES(488) + (totalram_pgs >> 5);
 #undef MB2PAGES
return min_pages;
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index e49af4c..6e3c088 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -380,10 +380,11 @@ void __init files_init(void)
 void __init files_maxfiles_init(void)
 {
unsigned long n;
-   unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+   unsigned long totalram_pgs = totalram_pages;
+   unsigned long memreserve = (totalram_pgs - nr_free_pages()) * 3/2;
 
-   memreserve = min(memreserve, totalram_pages - 1);
-   n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
+   memreserve = min(memreserve, totalram_pgs - 1);
+   n = ((totalram_pgs - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
files_stat.max_files = max_t(unsigned long, n, NR_FILE);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff..7823f31 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -739,15 +739,16 @@ void __init __weak arch_task_cache_init(void) { }
 stati

[PATCH v2 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-11-06 Thread Arun KS
totalram_pages and totalhigh_pages are made static inline functions.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 
Reviewed-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 

---
coccinelle script to make most of the changes,

@@
declarer name EXPORT_SYMBOL;
symbol totalram_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalram_pages);
|
- totalram_pages = e
+ totalram_pages_set(e)
|
- totalram_pages += e
+ totalram_pages_add(e)
|
- totalram_pages++
+ totalram_pages_inc()
|
- totalram_pages--
+ totalram_pages_dec()
|
- totalram_pages
+ totalram_pages()
)

@@
symbol totalhigh_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalhigh_pages);
|
- totalhigh_pages = e
+ totalhigh_pages_set(e)
|
- totalhigh_pages += e
+ totalhigh_pages_add(e)
|
- totalhigh_pages++
+ totalhigh_pages_inc()
|
- totalhigh_pages--
+ totalhigh_pages_dec()
|
- totalhigh_pages
+ totalhigh_pages()
)
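
(Such a script is applied with coccinelle's spatch, for example
"spatch --sp-file convert.cocci --in-place --dir ." -- the .cocci file
name is made up here.)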

Manually apply all changes to the following files,

include/linux/highmem.h
include/linux/mm.h
include/linux/swap.h
mm/highmem.c

and for mm/page_alloc.c manually apply only the changes below,

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

 /* Protect totalram_pages and zone->managed_pages */
 static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;
---
---
 arch/csky/mm/init.c   |  4 ++--
 arch/powerpc/platforms/pseries/cmm.c  | 10 +-
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 drivers/char/agp/backend.c|  4 ++--
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 ++--
 drivers/hv/hv_balloon.c   |  2 +-
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 ++--
 drivers/parisc/sba_iommu.c|  4 ++--
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +++---
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  2 +-
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 +--
 include/linux/mm.h| 27 +-
 include/linux/swap.h  |  1 -
 kernel/fork.c |  2 +-
 kernel/kexec_core.c   |  2 +-
 kernel/power/snapshot.c   |  2 +-
 mm/highmem.c  |  4 +---
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  4 ++--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 19 +-
 mm/shmem.c|  8 
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 ++--
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 ++--
 net/dccp/proto.c  |  2 +-
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  2 +-
 net/netfilter/xt_hashlimit.c  |  2 +-
 security/integrity/ima/ima_kexec.c|  2 +-
 52 files changed, 127 insertions(+), 80 deletions(-)

diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index dc07c07..66e5970 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/init.c
@@ -71,7 +71,7 @@ void free_initrd_mem(unsigned long start, unsigned long end)
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));
free_page(start);
-   totalram_pages++;
+   totalram_pages_inc();
}
 }
 #endif
@@ -88,7 +88,7 @@ void free_initmem(v

Re: [PATCH v6 1/2] memory_hotplug: Free pages as higher order

2018-11-06 Thread Arun KS

On 2018-11-06 19:36, Michal Hocko wrote:

On Tue 06-11-18 11:33:13, Arun KS wrote:

When pages are freed at a higher order, the time spent on
coalescing pages by the buddy allocator can be reduced. With a
section size of 256MB, the hot add latency of a single section
improves from 50-60 ms to less than 1 ms, hence
improving the hot add latency by about 60 times. Modify external
providers of online callback to align with the change.

This patch modifies totalram_pages, zone->managed_pages and
totalhigh_pages outside managed_page_count_lock. A follow up
series will be send to convert these variable to atomic to
avoid readers potentially seeing a store tear.


Is there any reason to rush this through rather than wait for counters
conversion first?


Sure Michal.

The conversion patch, https://patchwork.kernel.org/cover/10657217/, is
currently incremental to this patch. I'll change the order and will wait
for the preparatory patch to settle first.


Regards,
Arun.



The patch as is looks good to me - modulo atomic counters of course. I
cannot really judge whether existing updaters do really race in 
practice

to take this riskless.

The improvement is nice of course but this is a rare operation and 50ms
vs 1ms is hardly noticeable. So I would rather wait for the preparatory
work to settle. Btw. is there anything blocking that? It seems to be
mostly automated.


Re: [PATCH v1 0/4]mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-06 Thread Arun KS

On 2018-11-06 14:07, Konstantin Khlebnikov wrote:

On 06.11.2018 11:30, Arun KS wrote:

On 2018-11-06 13:47, Konstantin Khlebnikov wrote:

On 06.11.2018 8:38, Arun KS wrote:

Any comments?


Looks good.
Except unclear motivation behind this change.
This should be in comment of one of patch.


totalram_pages, zone->managed_pages and totalhigh_pages are sometimes 
modified outside managed_page_count_lock. Hence convert these variables 
to atomic to avoid readers potentially seeing a store tear.


So, is this just a theoretical issue, or a splat from a sanitizer?
After boot, memory online/offline is strictly serialized by a
rw-semaphore.


A few instances can race with hot add. Please see below,
https://patchwork.kernel.org/patch/10627521/

Regards,
Arun





Will update the comment.

Regards,
Arun



Reviewed-by: Konstantin Khlebnikov 



Regards,
Arun

On 2018-10-26 16:30, Arun KS wrote:

This series convert totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The patches were compile tested on x86 (x86_64_defconfig & i386_defconfig)
on the tip of linux-mmotm, and memory hotplug was tested on arm64, but on
an older version of the kernel.

Arun KS (4):
  mm: Fix multiple evaluations of totalram_pages and managed_pages
  mm: Convert zone->managed_pages to atomic variable
  mm: convert totalram_pages and totalhigh_pages variables to 
atomic

  mm: Remove managed_page_count spinlock

 arch/csky/mm/init.c   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  | 10 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  3 +-
 arch/x86/kernel/cpu/microcode/core.c  |  5 +-
 drivers/char/agp/backend.c    |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
 drivers/hv/hv_balloon.c   | 19 +++
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c    |  2 +-
 drivers/parisc/ccio-dma.c |  4 +-
 drivers/parisc/sba_iommu.c    |  4 +-
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +--
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  7 +--
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c    |  2 +-
 fs/nfsd/nfscache.c    |  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c    |  2 +-
 include/linux/highmem.h   | 28 ++-
 include/linux/mm.h    | 27 +-
 include/linux/mmzone.h    | 15 +++---
 include/linux/swap.h  |  1 -
 kernel/fork.c |  5 +-
 kernel/kexec_core.c   |  5 +-
 kernel/power/snapshot.c   |  2 +-
 lib/show_mem.c    |  2 +-
 mm/highmem.c  |  4 +-
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  6 +--
 mm/memory_hotplug.c   |  4 +-
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 71 
+--

 mm/shmem.c    |  7 +--
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 +-
 mm/vmstat.c   |  4 +-
 mm/workingset.c   |  2 +-
 mm/zswap.c    |  4 +-
 net/dccp/proto.c  |  7 +--
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c    |  2 +-
 net/netfilter/nf_conntrack_core.c |  7 +--
 net/netfilter/xt_hashlimit.c  |  5 +-
 net/sctp/protocol.c   |  7 +--
 security/integrity/ima/ima_kexec.c    |  2 +-
 58 files changed, 195 insertions(+), 144 deletions(-)


Re: [PATCH v1 0/4]mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-06 Thread Arun KS

On 2018-11-06 14:07, Konstantin Khlebnikov wrote:

On 06.11.2018 11:30, Arun KS wrote:

On 2018-11-06 13:47, Konstantin Khlebnikov wrote:

On 06.11.2018 8:38, Arun KS wrote:

Any comments?


Looks good.
Except unclear motivation behind this change.
This should be in comment of one of patch.


totalram_pages, zone->managed_pages and totalhigh_pages are sometimes 
modified outside managed_page_count_lock. Hence convert these variable 
to atomic to avoid readers potentially seeing a store tear.


So, this is just theoretical issue or splat from sanitizer.
After boot memory online\offline are strictly serialized by 
rw-semaphore.


Few instances which can race with hot add. Please see below,
https://patchwork.kernel.org/patch/10627521/

Regards,
Arun





Will update the comment.

Regards,
Arun



Reviewed-by: Konstantin Khlebnikov 



Regards,
Arun

On 2018-10-26 16:30, Arun KS wrote:

This series convert totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The patch was comiple tested on x86(x86_64_defconfig & 
i386_defconfig)
on tip of linux-mmotm. And memory hotplug tested on arm64, but on 
an

older version of kernel.

Arun KS (4):
  mm: Fix multiple evaluvations of totalram_pages and managed_pages
  mm: Convert zone->managed_pages to atomic variable
  mm: convert totalram_pages and totalhigh_pages variables to 
atomic

  mm: Remove managed_page_count spinlock

 arch/csky/mm/init.c   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  | 10 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  3 +-
 arch/x86/kernel/cpu/microcode/core.c  |  5 +-
 drivers/char/agp/backend.c    |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
 drivers/hv/hv_balloon.c   | 19 +++
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c    |  2 +-
 drivers/parisc/ccio-dma.c |  4 +-
 drivers/parisc/sba_iommu.c    |  4 +-
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +--
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  7 +--
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c    |  2 +-
 fs/nfsd/nfscache.c    |  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c    |  2 +-
 include/linux/highmem.h   | 28 ++-
 include/linux/mm.h    | 27 +-
 include/linux/mmzone.h    | 15 +++---
 include/linux/swap.h  |  1 -
 kernel/fork.c |  5 +-
 kernel/kexec_core.c   |  5 +-
 kernel/power/snapshot.c   |  2 +-
 lib/show_mem.c    |  2 +-
 mm/highmem.c  |  4 +-
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  6 +--
 mm/memory_hotplug.c   |  4 +-
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 71 +--
 mm/shmem.c    |  7 +--
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 +-
 mm/vmstat.c   |  4 +-
 mm/workingset.c   |  2 +-
 mm/zswap.c    |  4 +-
 net/dccp/proto.c  |  7 +--
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c    |  2 +-
 net/netfilter/nf_conntrack_core.c |  7 +--
 net/netfilter/xt_hashlimit.c  |  5 +-
 net/sctp/protocol.c   |  7 +--
 security/integrity/ima/ima_kexec.c    |  2 +-
 58 files changed, 195 insertions(+), 144 deletions(-)


Re: [PATCH v1 0/4]mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-06 Thread Arun KS

On 2018-11-06 13:47, Konstantin Khlebnikov wrote:

On 06.11.2018 8:38, Arun KS wrote:

Any comments?


Looks good.
Except the motivation behind this change is unclear.
It should be in the comment of one of the patches.


totalram_pages, zone->managed_pages and totalhigh_pages are sometimes 
modified outside managed_page_count_lock. Hence convert these variables 
to atomic to avoid readers potentially seeing a store tear.


Will update the comment.

Regards,
Arun



Reviewed-by: Konstantin Khlebnikov 



Regards,
Arun

On 2018-10-26 16:30, Arun KS wrote:

This series converts totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The patch was compile tested on x86 (x86_64_defconfig & i386_defconfig)
on tip of linux-mmotm. And memory hotplug tested on arm64, but on an
older version of kernel.

Arun KS (4):
  mm: Fix multiple evaluvations of totalram_pages and managed_pages
  mm: Convert zone->managed_pages to atomic variable
  mm: convert totalram_pages and totalhigh_pages variables to atomic
  mm: Remove managed_page_count spinlock

 arch/csky/mm/init.c   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  | 10 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  3 +-
 arch/x86/kernel/cpu/microcode/core.c  |  5 +-
 drivers/char/agp/backend.c    |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
 drivers/hv/hv_balloon.c   | 19 +++
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c    |  2 +-
 drivers/parisc/ccio-dma.c |  4 +-
 drivers/parisc/sba_iommu.c    |  4 +-
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +--
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  7 +--
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c    |  2 +-
 fs/nfsd/nfscache.c    |  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c    |  2 +-
 include/linux/highmem.h   | 28 ++-
 include/linux/mm.h    | 27 +-
 include/linux/mmzone.h    | 15 +++---
 include/linux/swap.h  |  1 -
 kernel/fork.c |  5 +-
 kernel/kexec_core.c   |  5 +-
 kernel/power/snapshot.c   |  2 +-
 lib/show_mem.c    |  2 +-
 mm/highmem.c  |  4 +-
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  6 +--
 mm/memory_hotplug.c   |  4 +-
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 71 +--
 mm/shmem.c    |  7 +--
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 +-
 mm/vmstat.c   |  4 +-
 mm/workingset.c   |  2 +-
 mm/zswap.c    |  4 +-
 net/dccp/proto.c  |  7 +--
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c    |  2 +-
 net/netfilter/nf_conntrack_core.c |  7 +--
 net/netfilter/xt_hashlimit.c  |  5 +-
 net/sctp/protocol.c   |  7 +--
 security/integrity/ima/ima_kexec.c    |  2 +-
 58 files changed, 195 insertions(+), 144 deletions(-)


[PATCH v6 2/2] mm/page_alloc: remove software prefetching in __free_pages_core

2018-11-05 Thread Arun KS
The software prefetch hints not only increase the code footprint, they
actually make things slower rather than faster. Remove them, as
contemporary hardware doesn't need any hint.

Suggested-by: Dan Williams 
Signed-off-by: Arun KS 
---
 mm/page_alloc.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7cf503f..a1b9a6a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1270,14 +1270,10 @@ void __free_pages_core(struct page *page, unsigned int 
order)
struct page *p = page;
unsigned int loop;
 
-   prefetchw(p);
-   for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
-   prefetchw(p + 1);
+   for (loop = 0; loop < nr_pages ; loop++, p++) {
__ClearPageReserved(p);
set_page_count(p, 0);
}
-   __ClearPageReserved(p);
-   set_page_count(p, 0);
 
page_zone(page)->managed_pages += nr_pages;
set_page_refcounted(page);
-- 
1.9.1



[PATCH v6 1/2] memory_hotplug: Free pages as higher order

2018-11-05 Thread Arun KS
When free pages are done with higher order, time spent on
coalescing pages by the buddy allocator can be reduced. With
section size of 256MB, hot add latency of a single section
shows improvement from 50-60 ms to less than 1 ms, hence
improving the hot add latency by 60 times. Modify external
providers of the online callback to align with the change.

This patch modifies totalram_pages, zone->managed_pages and
totalhigh_pages outside managed_page_count_lock. A follow up
series will be sent to convert these variables to atomic to
avoid readers potentially seeing a store tear.

Signed-off-by: Arun KS 
---
Changes since v5:
- Rebased to 4.20-rc1.
- Changelog updated.

Changes since v4:
- As suggested by Michal Hocko,
- Simplify logic in online_pages_block() by using get_order().
- Separate out removal of prefetch from __free_pages_core().

Changes since v3:
- Renamed _free_pages_boot_core -> __free_pages_core.
- Removed prefetch from __free_pages_core.
- Removed xen_online_page().

Changes since v2:
- Reuse code from __free_pages_boot_core().

Changes since v1:
- Removed prefetch().

Changes since RFC:
- Rebase.
- As suggested by Michal Hocko remove pages_per_block.
- Modified external providers of online_page_callback.

v5: https://lore.kernel.org/patchwork/patch/995739/
v4: https://lore.kernel.org/patchwork/patch/995111/
v3: https://lore.kernel.org/patchwork/patch/992348/
v2: https://lore.kernel.org/patchwork/patch/991363/
v1: https://lore.kernel.org/patchwork/patch/989445/
RFC: https://lore.kernel.org/patchwork/patch/984754/
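
[Editor's illustration: the core of the change below is to hand the
online callback the largest naturally aligned blocks possible instead
of single pages. A common way to do such a walk, sketched with
hypothetical names and assuming a non-zero, block-aligned starting pfn;
this is not the patch's code:]

#include <linux/mm.h>

static void online_range_sketch(unsigned long pfn, unsigned long end,
                                int (*online)(struct page *, unsigned int))
{
        unsigned int order;

        while (pfn < end) {
                /* largest order both alignment and remaining size allow */
                order = min(MAX_ORDER - 1UL, __ffs(pfn));
                while (pfn + (1UL << order) > end)
                        order--;
                online(pfn_to_page(pfn), order);
                pfn += 1UL << order;
        }
}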

---

Signed-off-by: Arun KS 
---
 drivers/hv/hv_balloon.c|  6 --
 drivers/xen/balloon.c  | 23 +++
 include/linux/memory_hotplug.h |  2 +-
 mm/internal.h  |  1 +
 mm/memory_hotplug.c| 42 ++
 mm/page_alloc.c|  8 
 6 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4163151..5728dc4 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -771,7 +771,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned 
long size,
}
 }
 
-static void hv_online_page(struct page *pg)
+static int hv_online_page(struct page *pg, unsigned int order)
 {
struct hv_hotadd_state *has;
unsigned long flags;
@@ -783,10 +783,12 @@ static void hv_online_page(struct page *pg)
if ((pfn < has->start_pfn) || (pfn >= has->end_pfn))
continue;
 
-   hv_page_online_one(has, pg);
+   hv_bring_pgs_online(has, pfn, (1UL << order));
break;
}
spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+
+   return 0;
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index fdfc64f..1214828 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -390,8 +390,8 @@ static enum bp_state reserve_additional_memory(void)
 
/*
 * add_memory_resource() will call online_pages() which in its turn
-* will call xen_online_page() callback causing deadlock if we don't
-* release balloon_mutex here. Unlocking here is safe because the
+* will call xen_bring_pgs_online() callback causing deadlock if we
+* don't release balloon_mutex here. Unlocking here is safe because the
 * callers drop the mutex before trying again.
 */
mutex_unlock(&balloon_mutex);
@@ -414,15 +414,22 @@ static enum bp_state reserve_additional_memory(void)
return BP_ECANCELED;
 }
 
-static void xen_online_page(struct page *page)
+static int xen_bring_pgs_online(struct page *pg, unsigned int order)
 {
-   __online_page_set_limits(page);
+   unsigned long i, size = (1 << order);
+   unsigned long start_pfn = page_to_pfn(pg);
+   struct page *p;
 
+   pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn);
mutex_lock(&balloon_mutex);
-
-   __balloon_append(page);
-
+   for (i = 0; i < size; i++) {
+   p = pfn_to_page(start_pfn + i);
+   __online_page_set_limits(p);
+   __balloon_append(p);
+   }
mutex_unlock(&balloon_mutex);
+
+   return 0;
 }
 
 static int xen_memory_notifier(struct notifier_block *nb, unsigned long val, 
void *v)
@@ -747,7 +754,7 @@ static int __init balloon_init(void)
balloon_stats.max_retry_count = RETRY_UNLIMITED;
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
-   set_online_page_callback(&xen_online_page);
+   set_online_page_callback(&xen_bring_pgs_online);
register_memory_notifier(&xen_memory_nb);
register_sysctl_table(xen_root);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ffd9cd1..84e9ae2 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -87,7 +87,7 @@ ex

Re: [PATCH v1 0/4]mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

2018-11-05 Thread Arun KS

Any comments?

Regards,
Arun

On 2018-10-26 16:30, Arun KS wrote:

This series converts totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The patch was compile tested on x86 (x86_64_defconfig & i386_defconfig)
on tip of linux-mmotm. And memory hotplug tested on arm64, but on an
older version of kernel.

Arun KS (4):
  mm: Fix multiple evaluvations of totalram_pages and managed_pages
  mm: Convert zone->managed_pages to atomic variable
  mm: convert totalram_pages and totalhigh_pages variables to atomic
  mm: Remove managed_page_count spinlock

 arch/csky/mm/init.c   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  | 10 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  3 +-
 arch/x86/kernel/cpu/microcode/core.c  |  5 +-
 drivers/char/agp/backend.c|  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
 drivers/hv/hv_balloon.c   | 19 +++
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 +-
 drivers/parisc/sba_iommu.c|  4 +-
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +--
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  7 +--
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 ++-
 include/linux/mm.h| 27 +-
 include/linux/mmzone.h| 15 +++---
 include/linux/swap.h  |  1 -
 kernel/fork.c |  5 +-
 kernel/kexec_core.c   |  5 +-
 kernel/power/snapshot.c   |  2 +-
 lib/show_mem.c|  2 +-
 mm/highmem.c  |  4 +-
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  6 +--
 mm/memory_hotplug.c   |  4 +-
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 71 +--
 mm/shmem.c|  7 +--
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 +-
 mm/vmstat.c   |  4 +-
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 +-
 net/dccp/proto.c  |  7 +--
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  7 +--
 net/netfilter/xt_hashlimit.c  |  5 +-
 net/sctp/protocol.c   |  7 +--
 security/integrity/ima/ima_kexec.c|  2 +-
 58 files changed, 195 insertions(+), 144 deletions(-)


[PATCH v1 1/4] mm: Fix multiple evaluvations of totalram_pages and managed_pages

2018-10-26 Thread Arun KS
This patch is in preparation for a later patch which converts totalram_pages
and zone->managed_pages to atomic variables. It does not introduce
any functional changes.
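
[Editor's illustration: the pattern, distilled. check_len() is a
hypothetical name, but the body mirrors the microcode hunk below.]

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/printk.h>
#include <linux/types.h>

static int check_len(size_t len)
{
        /* snapshot once so the check and the message agree on one value */
        unsigned long totalram_pgs = totalram_pages;

        if ((len >> PAGE_SHIFT) > totalram_pgs) {
                pr_err("too much data (max %ld pages)\n", totalram_pgs);
                return -EINVAL;
        }
        return 0;
}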

Signed-off-by: Arun KS 
---
 arch/um/kernel/mem.c |  3 +--
 arch/x86/kernel/cpu/microcode/core.c |  5 +++--
 drivers/hv/hv_balloon.c  | 19 ++-
 fs/file_table.c  |  7 ---
 kernel/fork.c|  5 +++--
 kernel/kexec_core.c  |  5 +++--
 mm/page_alloc.c  |  5 +++--
 mm/shmem.c   |  3 ++-
 net/dccp/proto.c |  7 ---
 net/netfilter/nf_conntrack_core.c|  7 ---
 net/netfilter/xt_hashlimit.c |  5 +++--
 net/sctp/protocol.c  |  7 ---
 12 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 1067469..134d3fd 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -51,8 +51,7 @@ void __init mem_init(void)
 
/* this will put all low memory onto the freelists */
memblock_free_all();
-   max_low_pfn = totalram_pages;
-   max_pfn = totalram_pages;
+   max_pfn = max_low_pfn = totalram_pages;
mem_init_print_info(NULL);
kmalloc_ok = 1;
 }
diff --git a/arch/x86/kernel/cpu/microcode/core.c 
b/arch/x86/kernel/cpu/microcode/core.c
index 2637ff0..99c67ca 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const 
char __user *buf,
   size_t len, loff_t *ppos)
 {
ssize_t ret = -EINVAL;
+   unsigned long totalram_pgs = totalram_pages;
 
-   if ((len >> PAGE_SHIFT) > totalram_pages) {
-   pr_err("too much data (max %ld pages)\n", totalram_pages);
+   if ((len >> PAGE_SHIFT) > totalram_pgs) {
+   pr_err("too much data (max %ld pages)\n", totalram_pgs);
return ret;
}
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index c5bc0b5..2a60f9a 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1092,6 +1092,7 @@ static void process_info(struct hv_dynmem_device *dm, 
struct dm_info_msg *msg)
 static unsigned long compute_balloon_floor(void)
 {
unsigned long min_pages;
+   unsigned long totalram_pgs = totalram_pages;
 #define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
 *  max MiB -> min MiB  gradient
@@ -1104,16 +1105,16 @@ static unsigned long compute_balloon_floor(void)
 *8192   744(1/16)
 *   32768  1512(1/32)
 */
-   if (totalram_pages < MB2PAGES(128))
-   min_pages = MB2PAGES(8) + (totalram_pages >> 1);
-   else if (totalram_pages < MB2PAGES(512))
-   min_pages = MB2PAGES(40) + (totalram_pages >> 2);
-   else if (totalram_pages < MB2PAGES(2048))
-   min_pages = MB2PAGES(104) + (totalram_pages >> 3);
-   else if (totalram_pages < MB2PAGES(8192))
-   min_pages = MB2PAGES(232) + (totalram_pages >> 4);
+   if (totalram_pgs < MB2PAGES(128))
+   min_pages = MB2PAGES(8) + (totalram_pgs >> 1);
+   else if (totalram_pgs < MB2PAGES(512))
+   min_pages = MB2PAGES(40) + (totalram_pgs >> 2);
+   else if (totalram_pgs < MB2PAGES(2048))
+   min_pages = MB2PAGES(104) + (totalram_pgs >> 3);
+   else if (totalram_pgs < MB2PAGES(8192))
+   min_pages = MB2PAGES(232) + (totalram_pgs >> 4);
else
-   min_pages = MB2PAGES(488) + (totalram_pages >> 5);
+   min_pages = MB2PAGES(488) + (totalram_pgs >> 5);
 #undef MB2PAGES
return min_pages;
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index e03c8d1..5d36655 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -383,10 +383,11 @@ void __init files_init(void)
 void __init files_maxfiles_init(void)
 {
unsigned long n;
-   unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+   unsigned long totalram_pgs = totalram_pages;
+   unsigned long memreserve = (totalram_pgs - nr_free_pages()) * 3/2;
 
-   memreserve = min(memreserve, totalram_pages - 1);
-   n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
+   memreserve = min(memreserve, totalram_pgs - 1);
+   n = ((totalram_pgs - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
files_stat.max_files = max_t(unsigned long, n, NR_FILE);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index 2f78d32..63d57f7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -739,15 +739,16 @@ void __init __weak arch_task_cache_init(void) { }
 static void set_max_threads(unsigned int 

[PATCH v1 4/4] mm: Remove managed_page_count spinlock

2018-10-26 Thread Arun KS
Now totalram_pages and managed_pages are atomic variables. No need
for the managed_page_count spinlock.
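
[Editor's aside on why dropping the lock is safe, assuming no reader
needs the counters to move together: each remaining update is an
independent atomic read-modify-write, and readers never took this lock
to get a joint snapshot anyway. In essence:]

#include <linux/mm.h>
#include <linux/mmzone.h>

/* sketch of the unlocked body; helpers come from earlier in the series */
static void sketch_adjust(struct page *page, long count)
{
        atomic_long_add(count, &page_zone(page)->managed_pages);
        totalram_pages_add(count);      /* itself an atomic_long_add() */
}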

Signed-off-by: Arun KS 
---
 include/linux/mmzone.h | 6 --
 mm/page_alloc.c| 5 -
 2 files changed, 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 597b0c7..aa960f6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,12 +428,6 @@ struct zone {
 * Write access to present_pages at runtime should be protected by
 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
 * present_pages should get_online_mems() to get a stable value.
-*
-* Read access to managed_pages should be safe because it's unsigned
-* long. Write access to zone->managed_pages and totalram_pages are
-* protected by managed_page_count_lock at runtime. Idealy only
-* adjust_managed_page_count() should be used instead of directly
-* touching zone->managed_pages and totalram_pages.
 */
atomic_long_t   managed_pages;
unsigned long   spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index af832de..e29e78f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,9 +122,6 @@
 };
 EXPORT_SYMBOL(node_states);
 
-/* Protect totalram_pages and zone->managed_pages */
-static DEFINE_SPINLOCK(managed_page_count_lock);
-
 atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;
@@ -7062,14 +7059,12 @@ static int __init cmdline_parse_movablecore(char *p)
 
 void adjust_managed_page_count(struct page *page, long count)
 {
-   spin_lock(&managed_page_count_lock);
atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages_add(count);
 #endif
-   spin_unlock(&managed_page_count_lock);
 }
 EXPORT_SYMBOL(adjust_managed_page_count);
 
-- 
1.9.1



[PATCH v1 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-10-26 Thread Arun KS
totalram_pages and totalhigh_pages are made static inline functions.
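
[Editor's note: the accessor this produces is shaped like so; a sketch
of the pattern, with the authoritative hunks in include/linux/mm.h and
mm/page_alloc.c further below.]

extern atomic_long_t _totalram_pages;

static inline unsigned long totalram_pages(void)
{
        /* readers see a whole value; writers use the *_add/_inc helpers */
        return (unsigned long)atomic_long_read(&_totalram_pages);
}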

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 

---
coccinelle script to make most of the changes,

@@
declarer name EXPORT_SYMBOL;
symbol totalram_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalram_pages);
|
- totalram_pages = e
+ totalram_pages_set(e)
|
- totalram_pages += e
+ totalram_pages_add(e)
|
- totalram_pages++
+ totalram_pages_inc()
|
- totalram_pages--
+ totalram_pages_dec()
|
- totalram_pages
+ totalram_pages()
)

@@
symbol totalhigh_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalhigh_pages);
|
- totalhigh_pages = e
+ totalhigh_pages_set(e)
|
- totalhigh_pages += e
+ totalhigh_pages_add(e)
|
- totalhigh_pages++
+ totalhigh_pages_inc()
|
- totalhigh_pages--
+ totalhigh_pages_dec()
|
- totalhigh_pages
+ totalhigh_pages()
)
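
[Editor's note: a script like this is typically applied tree-wide with
something along the lines of `spatch --sp-file convert.cocci --in-place
--dir .`; the exact invocation is an assumption, see the Coccinelle
documentation.]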

Manually apply all changes to the following files,

include/linux/highmem.h
include/linux/mm.h
include/linux/swap.h
mm/highmem.c

and for mm/page_alloc.c manually apply only the below changes,

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

 /* Protect totalram_pages and zone->managed_pages */
 static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;
---
 arch/csky/mm/init.c   |  4 ++--
 arch/powerpc/platforms/pseries/cmm.c  | 10 +-
 arch/s390/mm/init.c   |  2 +-
 arch/um/kernel/mem.c  |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 drivers/char/agp/backend.c|  4 ++--
 drivers/gpu/drm/i915/i915_gem.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 ++--
 drivers/hv/hv_balloon.c   |  2 +-
 drivers/md/dm-bufio.c |  2 +-
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-stats.c |  2 +-
 drivers/media/platform/mtk-vpu/mtk_vpu.c  |  2 +-
 drivers/misc/vmw_balloon.c|  2 +-
 drivers/parisc/ccio-dma.c |  4 ++--
 drivers/parisc/sba_iommu.c|  4 ++--
 drivers/staging/android/ion/ion_system_heap.c |  2 +-
 drivers/xen/xen-selfballoon.c |  6 +++---
 fs/ceph/super.h   |  2 +-
 fs/file_table.c   |  2 +-
 fs/fuse/inode.c   |  2 +-
 fs/nfs/write.c|  2 +-
 fs/nfsd/nfscache.c|  2 +-
 fs/ntfs/malloc.h  |  2 +-
 fs/proc/base.c|  2 +-
 include/linux/highmem.h   | 28 +--
 include/linux/mm.h| 27 +-
 include/linux/swap.h  |  1 -
 kernel/fork.c |  2 +-
 kernel/kexec_core.c   |  2 +-
 kernel/power/snapshot.c   |  2 +-
 mm/highmem.c  |  4 +---
 mm/huge_memory.c  |  2 +-
 mm/kasan/quarantine.c |  2 +-
 mm/memblock.c |  4 ++--
 mm/memory_hotplug.c   |  4 ++--
 mm/mm_init.c  |  2 +-
 mm/oom_kill.c |  2 +-
 mm/page_alloc.c   | 19 +-
 mm/shmem.c|  8 
 mm/slab.c |  2 +-
 mm/swap.c |  2 +-
 mm/util.c |  2 +-
 mm/vmalloc.c  |  4 ++--
 mm/workingset.c   |  2 +-
 mm/zswap.c|  4 ++--
 net/dccp/proto.c  |  2 +-
 net/decnet/dn_route.c |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/netfilter/nf_conntrack_core.c |  2 +-
 net/netfilter/xt_hashlimit.c  |  2 +-
 security/integrity/ima/ima_kexec.c|  2 +-
 53 files changed, 129 insertions(+), 82 deletions(-)

diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index dc07c07..66e5970 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/init.c
@@ -71,7 +71,7 @@ void free_initrd_mem(unsigned long start, unsigned long end)
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));
free_page(start);
-   totalram_pages++;
+   totalram_pages_inc();
}
 }
 #endif
@@ -88,7 +88,7 @@ void free_initmem(void)
ClearPageReser

[PATCH v1 2/4] mm: Convert zone->managed_pages to atomic variable

2018-10-26 Thread Arun KS
managed_page_count_lock will be removed in subsequent patch after
totalram_pages and totalhigh_pages are converted to atomic.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 

---
Most of the changes are done by the below coccinelle script,

@@
struct zone *z;
expression e1;
@@
(
- z->managed_pages = e1
+ atomic_long_set(&z->managed_pages, e1)
|
- e1->managed_pages++
+ atomic_long_inc(&e1->managed_pages)
|
- z->managed_pages
+ zone_managed_pages(z)
)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, &e->managed_pages)

@@
expression z;
@@
- z.managed_pages
+ zone_managed_pages(&z)

Then, manually apply the following change,
include/linux/mmzone.h

- unsigned long managed_pages;
+ atomic_long_t managed_pages;

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}

---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 include/linux/mmzone.h|  9 +--
 lib/show_mem.c|  2 +-
 mm/memblock.c |  2 +-
 mm/page_alloc.c   | 44 +--
 mm/vmstat.c   |  4 ++--
 6 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 56412b0..c0e55bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -848,7 +848,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int 
*avail_size,
 */
pgdat = NODE_DATA(numa_node_id);
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-   mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+   mem_in_bytes += zone_managed_pages(&pgdat->node_zones[zone_type]);
mem_in_bytes <<= PAGE_SHIFT;
 
sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 809..597b0c7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -435,7 +435,7 @@ struct zone {
 * adjust_managed_page_count() should be used instead of directly
 * touching zone->managed_pages and totalram_pages.
 */
-   unsigned long   managed_pages;
+   atomic_long_t   managed_pages;
unsigned long   spanned_pages;
unsigned long   present_pages;
 
@@ -524,6 +524,11 @@ enum pgdat_flags {
PGDAT_RECLAIM_LOCKED,   /* prevents concurrent reclaim */
 };
 
+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
return zone->zone_start_pfn + zone->spanned_pages;
@@ -814,7 +819,7 @@ static inline bool is_dev_zone(const struct zone *zone)
  */
 static inline bool managed_zone(struct zone *zone)
 {
-   return zone->managed_pages;
+   return zone_managed_pages(zone);
 }
 
 /* Returns true if a zone has memory */
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..eefe67d 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -28,7 +28,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
continue;
 
total += zone->present_pages;
-   reserved += zone->present_pages - zone->managed_pages;
+   reserved += zone->present_pages - zone_managed_pages(zone);
 
if (is_highmem_idx(zoneid))
highmem += zone->present_pages;
diff --git a/mm/memblock.c b/mm/memblock.c
index eddcac2..14a6219 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2001,7 +2001,7 @@ void reset_node_managed_pages(pg_data_t *pgdat)
struct zone *z;
 
for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
-   z->managed_pages = 0;
+   atomic_long_set(&z->managed_pages, 0);
 }
 
 void __init reset_all_zones_managed_pages(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f045191..f077849 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1275,7 +1275,7 @@ void __free_pages_core(struct page *page, unsigned int 
order)
set_page_count(p, 0);
}
 
-   page_zone(page)->managed_pages += nr_pages;
+   atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
set_page_refcounted(page);
__free_pages(page, order);
 }
@@ -2254,7 +2254,7 @@ static void reserve_highatomic_pageblock(struct page 
*page, struct zone *zone,
 * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
 * Check is race-prone but harmless.
 */
-   max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;
+   max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;

[PATCH v1 2/4] mm: Convert zone->managed_pages to atomic variable

2018-10-26 Thread Arun KS
managed_page_count_lock will be removed in subsequent patch after
totalram_pages and totalhigh_pages are converted to atomic.

Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Signed-off-by: Arun KS 

---
Most of the changes are done by below coccinelle script,

@@
struct zone *z;
expression e1;
@@
(
- z->managed_pages = e1
+ atomic_long_set(>managed_pages, e1)
|
- e1->managed_pages++
+ atomic_long_inc(>managed_pages)
|
- z->managed_pages
+ zone_managed_pages(z)
)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, &e->managed_pages)

@@
expression z;
@@
- z.managed_pages
+ zone_managed_pages(&z)
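
(For reference, a semantic patch like this is normally applied with
spatch, e.g. "spatch --sp-file managed_pages.cocci --in-place --dir .";
the .cocci file name here is illustrative.)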

Then, apply the following change manually:
include/linux/mmzone.h

- unsigned long managed_pages;
+ atomic_long_t managed_pages;

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
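
To show the resulting access pattern (a minimal sketch, not part of the
patch; example_adjust_zone is a made-up name used only for illustration):

	/* Illustrative: what call sites look like after the conversion. */
	static void example_adjust_zone(struct zone *zone, long nr_pages)
	{
		/* Writers update the counter through the atomic_long_* API... */
		atomic_long_add(nr_pages, &zone->managed_pages);

		/* ...while readers go through the zone_managed_pages() helper. */
		pr_info("managed pages: %lu\n", zone_managed_pages(zone));
	}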

---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  2 +-
 include/linux/mmzone.h|  9 +--
 lib/show_mem.c|  2 +-
 mm/memblock.c |  2 +-
 mm/page_alloc.c   | 44 +--
 mm/vmstat.c   |  4 ++--
 6 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 56412b0..c0e55bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -848,7 +848,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int *avail_size,
 */
pgdat = NODE_DATA(numa_node_id);
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-   mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+   mem_in_bytes += zone_managed_pages(&pgdat->node_zones[zone_type]);
mem_in_bytes <<= PAGE_SHIFT;
 
sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 809..597b0c7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -435,7 +435,7 @@ struct zone {
 * adjust_managed_page_count() should be used instead of directly
 * touching zone->managed_pages and totalram_pages.
 */
-   unsigned long   managed_pages;
+   atomic_long_t   managed_pages;
unsigned long   spanned_pages;
unsigned long   present_pages;
 
@@ -524,6 +524,11 @@ enum pgdat_flags {
PGDAT_RECLAIM_LOCKED,   /* prevents concurrent reclaim */
 };
 
+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+   return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
return zone->zone_start_pfn + zone->spanned_pages;
@@ -814,7 +819,7 @@ static inline bool is_dev_zone(const struct zone *zone)
  */
 static inline bool managed_zone(struct zone *zone)
 {
-   return zone->managed_pages;
+   return zone_managed_pages(zone);
 }
 
 /* Returns true if a zone has memory */
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..eefe67d 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -28,7 +28,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
continue;
 
total += zone->present_pages;
-   reserved += zone->present_pages - zone->managed_pages;
+   reserved += zone->present_pages - zone_managed_pages(zone);
 
if (is_highmem_idx(zoneid))
highmem += zone->present_pages;
diff --git a/mm/memblock.c b/mm/memblock.c
index eddcac2..14a6219 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2001,7 +2001,7 @@ void reset_node_managed_pages(pg_data_t *pgdat)
struct zone *z;
 
for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
-   z->managed_pages = 0;
+   atomic_long_set(&z->managed_pages, 0);
 }
 
 void __init reset_all_zones_managed_pages(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f045191..f077849 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1275,7 +1275,7 @@ void __free_pages_core(struct page *page, unsigned int order)
set_page_count(p, 0);
}
 
-   page_zone(page)->managed_pages += nr_pages;
+   atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
set_page_refcounted(page);
__free_pages(page, order);
 }
@@ -2254,7 +2254,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
 * Check is race-prone but harmless.
 */
-   max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;
+   max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;

[PATCH v1 1/4] mm: Fix multiple evaluations of totalram_pages and managed_pages

2018-10-26 Thread Arun KS
This patch is in preparation for a later patch which converts totalram_pages
and zone->managed_pages to atomic variables. It does not introduce any
functional changes.

Signed-off-by: Arun KS 
---
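A minimal sketch of the pattern this patch applies (illustrative only;
check_len is a made-up name, not a function from the diff): read the
global once into a local so the bound check and the error message see the
same value:

	/* Snapshot the global once so the test and the message agree
	 * even if totalram_pages changes concurrently.
	 */
	static int check_len(size_t len)
	{
		unsigned long totalram_pgs = totalram_pages;

		if ((len >> PAGE_SHIFT) > totalram_pgs) {
			pr_err("too much data (max %ld pages)\n", totalram_pgs);
			return -EINVAL;
		}
		return 0;
	}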
 arch/um/kernel/mem.c |  3 +--
 arch/x86/kernel/cpu/microcode/core.c |  5 +++--
 drivers/hv/hv_balloon.c  | 19 ++-
 fs/file_table.c  |  7 ---
 kernel/fork.c|  5 +++--
 kernel/kexec_core.c  |  5 +++--
 mm/page_alloc.c  |  5 +++--
 mm/shmem.c   |  3 ++-
 net/dccp/proto.c |  7 ---
 net/netfilter/nf_conntrack_core.c|  7 ---
 net/netfilter/xt_hashlimit.c |  5 +++--
 net/sctp/protocol.c  |  7 ---
 12 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 1067469..134d3fd 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -51,8 +51,7 @@ void __init mem_init(void)
 
/* this will put all low memory onto the freelists */
memblock_free_all();
-   max_low_pfn = totalram_pages;
-   max_pfn = totalram_pages;
+   max_pfn = max_low_pfn = totalram_pages;
mem_init_print_info(NULL);
kmalloc_ok = 1;
 }
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 2637ff0..99c67ca 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const char __user *buf,
   size_t len, loff_t *ppos)
 {
ssize_t ret = -EINVAL;
+   unsigned long totalram_pgs = totalram_pages;
 
-   if ((len >> PAGE_SHIFT) > totalram_pages) {
-   pr_err("too much data (max %ld pages)\n", totalram_pages);
+   if ((len >> PAGE_SHIFT) > totalram_pgs) {
+   pr_err("too much data (max %ld pages)\n", totalram_pgs);
return ret;
}
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index c5bc0b5..2a60f9a 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1092,6 +1092,7 @@ static void process_info(struct hv_dynmem_device *dm, struct dm_info_msg *msg)
 static unsigned long compute_balloon_floor(void)
 {
unsigned long min_pages;
+   unsigned long totalram_pgs = totalram_pages;
 #define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
 *  max MiB -> min MiB  gradient
@@ -1104,16 +1105,16 @@ static unsigned long compute_balloon_floor(void)
 *8192   744(1/16)
 *   32768  1512(1/32)
 */
-   if (totalram_pages < MB2PAGES(128))
-   min_pages = MB2PAGES(8) + (totalram_pages >> 1);
-   else if (totalram_pages < MB2PAGES(512))
-   min_pages = MB2PAGES(40) + (totalram_pages >> 2);
-   else if (totalram_pages < MB2PAGES(2048))
-   min_pages = MB2PAGES(104) + (totalram_pages >> 3);
-   else if (totalram_pages < MB2PAGES(8192))
-   min_pages = MB2PAGES(232) + (totalram_pages >> 4);
+   if (totalram_pgs < MB2PAGES(128))
+   min_pages = MB2PAGES(8) + (totalram_pgs >> 1);
+   else if (totalram_pgs < MB2PAGES(512))
+   min_pages = MB2PAGES(40) + (totalram_pgs >> 2);
+   else if (totalram_pgs < MB2PAGES(2048))
+   min_pages = MB2PAGES(104) + (totalram_pgs >> 3);
+   else if (totalram_pgs < MB2PAGES(8192))
+   min_pages = MB2PAGES(232) + (totalram_pgs >> 4);
else
-   min_pages = MB2PAGES(488) + (totalram_pages >> 5);
+   min_pages = MB2PAGES(488) + (totalram_pgs >> 5);
 #undef MB2PAGES
return min_pages;
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index e03c8d1..5d36655 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -383,10 +383,11 @@ void __init files_init(void)
 void __init files_maxfiles_init(void)
 {
unsigned long n;
-   unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+   unsigned long totalram_pgs = totalram_pages;
+   unsigned long memreserve = (totalram_pgs - nr_free_pages()) * 3/2;
 
-   memreserve = min(memreserve, totalram_pages - 1);
-   n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
+   memreserve = min(memreserve, totalram_pgs - 1);
+   n = ((totalram_pgs - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
files_stat.max_files = max_t(unsigned long, n, NR_FILE);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index 2f78d32..63d57f7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -739,15 +739,16 @@ void __init __weak arch_task_cache_init(void) { }
 static void set_max_threads(unsigned int 
