Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section

2012-03-01 Thread Nishanth Aravamudan
On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan  wrote:
> 
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> > 
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> > 
> > This is
> > 
> > BUG_ON(limit && goal + size > limit);
> > 
> > and after some debugging, it seems that
> > 
> > goal = 0x700
> > limit = 0x800
> > 
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> > 
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> > 
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> > 
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
> 
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.
> 
> Do you think it should be backported into 3.3.x?  Earlier kernels?

Upon review, it would be good if we could get it pushed back to the
3.0.x, 3.1.x and 3.2.x kernels as well.

Thanks,
Nish

-- 
Nishanth Aravamudan 
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section

2012-03-01 Thread Mel Gorman
On Wed, Feb 29, 2012 at 10:12:33AM -0800, Nishanth Aravamudan wrote:
> 
>
> Signed-off-by: Nishanth Aravamudan 
> 

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section

2012-02-29 Thread Nishanth Aravamudan
On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan  wrote:
> 
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> > 
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> > 
> > This is
> > 
> > BUG_ON(limit && goal + size > limit);
> > 
> > and after some debugging, it seems that
> > 
> > goal = 0x700
> > limit = 0x800
> > 
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> > 
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> > 
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> > 
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
> 
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.

I think that's fair.

> Do you think it should be backported into 3.3.x?  Earlier kernels?

3.3.x seems reasonable. If I had to guess, I think this could be hit on
any kernels with this functionality -- that is, sparsemem in general?
Not sure how far back it's worth backporting.

> Also, this?

Urgh, yeah, that's way better.

Acked-by: Nishanth Aravamudan 

> --- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
> +++ a/mm/bootmem.c
> @@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
>  					    unsigned long section_nr)
>  {
>  	bootmem_data_t *bdata;
> -	unsigned long pfn, goal, limit;
> +	unsigned long pfn, goal;
>  
>  	pfn = section_nr_to_pfn(section_nr);
>  	goal = pfn << PAGE_SHIFT;
> -	limit = 0;
>  	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
>  
> -	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
> +	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
>  }
>  #endif

Thanks for all the feedback!

-Nish



Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section

2012-02-29 Thread Andrew Morton
On Wed, 29 Feb 2012 10:12:33 -0800
Nishanth Aravamudan  wrote:

> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
> 
> kernel BUG at mm/bootmem.c:483!
>
> ...
> 
> This is
> 
> BUG_ON(limit && goal + size > limit);
> 
> and after some debugging, it seems that
> 
>   goal = 0x700
>   limit = 0x800
> 
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section calls
> 
>   return alloc_bootmem_section(usemap_size() * count, section_nr);
> 
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> unconditionally remove the limit condition in alloc_bootmem_section,
> meaning allocations are allowed to cross section boundaries (necessary
> for systems of this size).
> 
> Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> guarantees section-locality, we need check_usemap_section_nr() to print
> possible cross-dependencies between node descriptors and the usemaps
> allocated through it. That makes the two loops in
> sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> bit.

The patch is a bit scary now, so I think we should merge it into
3.4-rc1 and then backport it into 3.3.1 if nothing blows up.

Do you think it should be backported into 3.3.x?  Earlier kernels?

Also, this?

--- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
+++ a/mm/bootmem.c
@@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
 					    unsigned long section_nr)
 {
 	bootmem_data_t *bdata;
-	unsigned long pfn, goal, limit;
+	unsigned long pfn, goal;
 
 	pfn = section_nr_to_pfn(section_nr);
 	goal = pfn << PAGE_SHIFT;
-	limit = 0;
 	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
 
-	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
+	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
 }
 #endif
 
_



Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section

2012-02-29 Thread Johannes Weiner
On Wed, Feb 29, 2012 at 10:12:33AM -0800, Nishanth Aravamudan wrote:
> On 28.02.2012 [15:47:32 +], Mel Gorman wrote:
> > On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > > Overcommit) on powerpc, we tripped the following:
> > > 
> > > kernel BUG at mm/bootmem.c:483!
> > > cpu 0x0: Vector: 700 (Program Check) at [c0c03940]
> > > pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c
> > > lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > sp: c0c03bc0
> > >msr: 80021032
> > >   current = 0xc0b0cce0
> > >   paca= 0xc1d8
> > > pid   = 0, comm = swapper
> > > kernel BUG at mm/bootmem.c:483!
> > > enter ? for help
> > > [c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > [c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c
> > > [c0c03e20] c0a474f4 .setup_arch+0x20c/0x294
> > > [c0c03ee0] c0a4079c .start_kernel+0xb4/0x460
> > > [c0c03f90] c0009670 .start_here_common+0x1c/0x2c
> > > 
> > > This is
> > > 
> > > BUG_ON(limit && goal + size > limit);
> > > 
> > > and after some debugging, it seems that
> > > 
> > >   goal = 0x700
> > >   limit = 0x800
> > > 
> > > and sparse_early_usemaps_alloc_node ->
> > > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
> > > 
> > >   return alloc_bootmem_section(usemap_size() * count, section_nr);
> > > 
> > > This is on a system with 8TB available via the AMS pool, and as a quirk
> > > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > > with an allocation that will fail the goal/limit constraints. In theory,
> > > we could "fall-back" to alloc_bootmem_node() in
> > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > > disable the limit check if the size of the allocation in
> > > alloc_bootmem_section exceeds the section size.
> > > 
> > > Signed-off-by: Nishanth Aravamudan 
> > > Cc: Dave Hansen 
> > > Cc: Anton Blanchard 
> > > Cc: Paul Mackerras 
> > > Cc: Ben Herrenschmidt 
> > > Cc: Robert Jennings 
> > > Cc: linux...@kvack.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > ---
> > >  include/linux/mmzone.h |    2 ++
> > >  mm/bootmem.c           |    5 ++++-
> > >  2 files changed, 6 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index 650ba2f..4176834 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
> > >   * PA_SECTION_SHIFT  physical address to/from section number
> > >   * PFN_SECTION_SHIFT pfn to/from section number
> > >   */
> > > +#define BYTES_PER_SECTION	(1UL << SECTION_SIZE_BITS)
> > > +
> > >  #define SECTIONS_SHIFT   (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
> > >  
> > >  #define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
> > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > index 668e94d..5cbbc76 100644
> > > --- a/mm/bootmem.c
> > > +++ b/mm/bootmem.c
> > > @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
> > >  
> > >  	pfn = section_nr_to_pfn(section_nr);
> > >  	goal = pfn << PAGE_SHIFT;
> > > -	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> > > +	if (size > BYTES_PER_SECTION)
> > > +		limit = 0;
> > > +	else
> > > +		limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> > 
> > As it's ok to spill the allocation over to an adjacent section, why not
> > just make limit==0 unconditionally. That would avoid defining
> > BYTES_PER_SECTION.
> 
> Something like this?
> 
> Andrew, presuming Mel & Johannes give their acks, this should
> supersede the patch you pulled into -mm.
> 
> Thanks,
> Nish
> 
> ---
> 
> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
> 
> kernel BUG at mm/bootmem.c:483!
> cpu 0x0: Vector: 700 (Program Check) at [c0c03940]
> pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c
> lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> sp: c0c03bc0
>msr: 80021032
>   current = 0xc0b0cce0
>   paca= 0xc1d8
> pid   = 0, comm = swapper
> kernel BUG at mm/bootmem.c:483!
> enter ? for help
> [c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c
> [c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c
> [c0c03e20] c0a474f4 .setup_arch+0x20c/0x294
> [c0c03ee0] c0a4079c .start_kernel+0xb4/0x460
> [c0c03f90] c0009670 .start_here_common+0x1c/0x2c
> 
> This is
> 
> BUG_ON(

[PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section

2012-02-29 Thread Nishanth Aravamudan
On 28.02.2012 [15:47:32 +], Mel Gorman wrote:
> On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> > 
> > kernel BUG at mm/bootmem.c:483!
> > cpu 0x0: Vector: 700 (Program Check) at [c0c03940]
> > pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c
> > lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> > sp: c0c03bc0
> >msr: 80021032
> >   current = 0xc0b0cce0
> >   paca= 0xc1d8
> > pid   = 0, comm = swapper
> > kernel BUG at mm/bootmem.c:483!
> > enter ? for help
> > [c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c
> > [c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c
> > [c0c03e20] c0a474f4 .setup_arch+0x20c/0x294
> > [c0c03ee0] c0a4079c .start_kernel+0xb4/0x460
> > [c0c03f90] c0009670 .start_here_common+0x1c/0x2c
> > 
> > This is
> > 
> > BUG_ON(limit && goal + size > limit);
> > 
> > and after some debugging, it seems that
> > 
> > goal = 0x700
> > limit = 0x800
> > 
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
> > 
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> > 
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > disable the limit check if the size of the allocation in
> > alloc_bootmem_section exceeds the section size.
> > 
> > Signed-off-by: Nishanth Aravamudan 
> > Cc: Dave Hansen 
> > Cc: Anton Blanchard 
> > Cc: Paul Mackerras 
> > Cc: Ben Herrenschmidt 
> > Cc: Robert Jennings 
> > Cc: linux...@kvack.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > ---
> >  include/linux/mmzone.h |    2 ++
> >  mm/bootmem.c           |    5 ++++-
> >  2 files changed, 6 insertions(+), 1 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 650ba2f..4176834 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
> >   * PA_SECTION_SHIFTphysical address to/from section number
> >   * PFN_SECTION_SHIFT   pfn to/from section number
> >   */
> > +#define BYTES_PER_SECTION  (1UL << SECTION_SIZE_BITS)
> > +
> >  #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
> >  
> >  #define PA_SECTION_SHIFT   (SECTION_SIZE_BITS)
> > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > index 668e94d..5cbbc76 100644
> > --- a/mm/bootmem.c
> > +++ b/mm/bootmem.c
> > @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
> >  
> >  	pfn = section_nr_to_pfn(section_nr);
> >  	goal = pfn << PAGE_SHIFT;
> > -	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> > +	if (size > BYTES_PER_SECTION)
> > +		limit = 0;
> > +	else
> > +		limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> 
> As it's ok to spill the allocation over to an adjacent section, why not
> just make limit==0 unconditionally. That would avoid defining
> BYTES_PER_SECTION.

Something like this?

Andrew, presuming Mel & Johannes give their acks, this should
supersede the patch you pulled into -mm.

Thanks,
Nish

---

While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
Overcommit) on powerpc, we tripped the following:

kernel BUG at mm/bootmem.c:483!
cpu 0x0: Vector: 700 (Program Check) at [c0c03940]
pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c
lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
sp: c0c03bc0
   msr: 80021032
  current = 0xc0b0cce0
  paca= 0xc1d8
pid   = 0, comm = swapper
kernel BUG at mm/bootmem.c:483!
enter ? for help
[c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c
[c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c
[c0c03e20] c0a474f4 .setup_arch+0x20c/0x294
[c0c03ee0] c0a4079c .start_kernel+0xb4/0x460
[c0c03f90] c0009670 .start_here_common+0x1c/0x2c

This is

BUG_ON(limit && goal + size > limit);

and after some debugging, it seems that

goal = 0x700
limit = 0x800

and sparse_early_usemaps_alloc_node ->
sparse_early_usemaps_alloc_pgdat_section calls

return alloc_bootmem_section(usemap_size() * count, section_nr);

This is on