RE: [PATCH v2 2/2] mm: Introduce kernelcore=reliable option
Dear Xishi,

 Thanks for reviewing.

> -----Original Message-----
> From: Xishi Qiu [mailto:qiuxi...@huawei.com]
> Sent: Wednesday, December 09, 2015 11:26 AM
> To: Izumi, Taku/泉 拓
> Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; tony.l...@intel.com;
> Kamezawa, Hiroyuki/亀澤 寛之; m...@csn.ul.ie;
> a...@linux-foundation.org; dave.han...@intel.com; m...@codeblueprint.co.uk
> Subject: Re: [PATCH v2 2/2] mm: Introduce kernelcore=reliable option
>
> On 2015/11/27 23:04, Taku Izumi wrote:
>
> > This patch extends existing "kernelcore" option and
> > introduces kernelcore=reliable option. By specifying
> > "reliable" instead of specifying the amount of memory,
> > non-reliable region will be arranged into ZONE_MOVABLE.
> >
> > v1 -> v2:
> >  - Refine so that the following case also can be
> >    handled properly:
> >
> >  Node X:  |MM--MM|
> >    (legend) M: mirrored  -: not mirrored
> >
> >  In this case, ZONE_NORMAL and ZONE_MOVABLE are
> >  arranged like below:
> >
> >  Node X:  |--|
> >           |ooxxoo| ZONE_NORMAL
> >           |ooxx| ZONE_MOVABLE
> >    (legend) o: present  x: absent
> >
> > Signed-off-by: Taku Izumi
> > ---
> >  Documentation/kernel-parameters.txt |   9 ++-
> >  mm/page_alloc.c                     | 110
> > ++--
> >  2 files changed, 112 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt
> > b/Documentation/kernel-parameters.txt
> > index f8aae63..ed44c2c8 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -1695,7 +1695,8 @@ bytes respectively. Such letter suffixes can also be
> > entirely omitted.
> >
> >  	keepinitrd	[HW,ARM]
> >
> > -	kernelcore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
> > +	kernelcore=	Format: nn[KMG] | "reliable"
> > +			[KNL,X86,IA-64,PPC] This parameter
> >  			specifies the amount of memory usable by the kernel
> >  			for non-movable allocations. The requested amount is
> >  			spread evenly throughout all nodes in the system. The
> > @@ -1711,6 +1712,12 @@ bytes respectively. Such letter suffixes can also be
> > entirely omitted.
> >  			use the HighMem zone if it exists, and the Normal
> >  			zone if it does not.
> >
> > +			Instead of specifying the amount of memory (nn[KMG]),
> > +			you can specify "reliable" option. In case "reliable"
> > +			option is specified, reliable memory is used for
> > +			non-movable allocations and remaining memory is used
> > +			for Movable pages.
> > +
> >  	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
> >  			Format: [,poll interval]
> >  			The controller # is the number of the ehci usb debug
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index acb0b4e..006a3d8 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -251,6 +251,7 @@ static unsigned long __meminitdata
> > arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> >  static unsigned long __initdata required_kernelcore;
> >  static unsigned long __initdata required_movablecore;
> >  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> > +static bool reliable_kernelcore;
> >
> >  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> >  int movable_zone;
> > @@ -4472,6 +4473,7 @@ void __meminit memmap_init_zone(unsigned long size,
> > int nid, unsigned long zone,
> >  	unsigned long pfn;
> >  	struct zone *z;
> >  	unsigned long nr_initialised = 0;
> > +	struct memblock_region *r = NULL, *tmp;
> >
> >  	if (highest_memmap_pfn < end_pfn - 1)
> >  		highest_memmap_pfn = end_pfn - 1;
> > @@ -4491,6 +4493,38 @@ void __meminit memmap_init_zone(unsigned long size,
> > int nid, unsigned long zone,
> >  			if (!update_defer_init(pgdat, pfn, end_pfn,
> >  						&nr_initialised))
> >  				break;
> > +
> > +			/*
> > +			 * if not reliable_kernelcore and ZONE_MOVABLE exists,
> > +			 * range from zone_
Re: [PATCH v2 2/2] mm: Introduce kernelcore=reliable option
On 2015/12/9 10:25, Xishi Qiu wrote:

> On 2015/11/27 23:04, Taku Izumi wrote:
>
>> This patch extends existing "kernelcore" option and
>> introduces kernelcore=reliable option. By specifying
>> "reliable" instead of specifying the amount of memory,
>> non-reliable region will be arranged into ZONE_MOVABLE.
>>
>> v1 -> v2:
>>  - Refine so that the following case also can be
>>    handled properly:
>>
>>  Node X:  |MM--MM|
>>    (legend) M: mirrored  -: not mirrored
>>
>>  In this case, ZONE_NORMAL and ZONE_MOVABLE are
>>  arranged like below:
>>
>>  Node X:  |--|
>>           |ooxxoo| ZONE_NORMAL
>>           |ooxx| ZONE_MOVABLE
>>    (legend) o: present  x: absent
>>
>> Signed-off-by: Taku Izumi
>> ---
>>  Documentation/kernel-parameters.txt |   9 ++-
>>  mm/page_alloc.c                     | 110
>> ++--
>>  2 files changed, 112 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/kernel-parameters.txt
>> b/Documentation/kernel-parameters.txt
>> index f8aae63..ed44c2c8 100644
>> --- a/Documentation/kernel-parameters.txt
>> +++ b/Documentation/kernel-parameters.txt
>> @@ -1695,7 +1695,8 @@ bytes respectively. Such letter suffixes can also be
>> entirely omitted.
>>
>>  	keepinitrd	[HW,ARM]
>>
>> -	kernelcore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
>> +	kernelcore=	Format: nn[KMG] | "reliable"
>> +			[KNL,X86,IA-64,PPC] This parameter
>>  			specifies the amount of memory usable by the kernel
>>  			for non-movable allocations. The requested amount is
>>  			spread evenly throughout all nodes in the system. The
>> @@ -1711,6 +1712,12 @@ bytes respectively. Such letter suffixes can also be
>> entirely omitted.
>>  			use the HighMem zone if it exists, and the Normal
>>  			zone if it does not.
>>
>> +			Instead of specifying the amount of memory (nn[KMG]),
>> +			you can specify "reliable" option. In case "reliable"
>> +			option is specified, reliable memory is used for
>> +			non-movable allocations and remaining memory is used
>> +			for Movable pages.
>> +
>>  	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
>>  			Format: [,poll interval]
>>  			The controller # is the number of the ehci usb debug
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index acb0b4e..006a3d8 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -251,6 +251,7 @@ static unsigned long __meminitdata
>> arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>>  static unsigned long __initdata required_kernelcore;
>>  static unsigned long __initdata required_movablecore;
>>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
>> +static bool reliable_kernelcore;
>>
>>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>>  int movable_zone;
>> @@ -4472,6 +4473,7 @@ void __meminit memmap_init_zone(unsigned long size,
>> int nid, unsigned long zone,
>>  	unsigned long pfn;
>>  	struct zone *z;
>>  	unsigned long nr_initialised = 0;
>> +	struct memblock_region *r = NULL, *tmp;
>>
>>  	if (highest_memmap_pfn < end_pfn - 1)
>>  		highest_memmap_pfn = end_pfn - 1;
>> @@ -4491,6 +4493,38 @@ void __meminit memmap_init_zone(unsigned long size,
>> int nid, unsigned long zone,
>>  			if (!update_defer_init(pgdat, pfn, end_pfn,
>>  						&nr_initialised))
>>  				break;
>> +
>> +			/*
>> +			 * if not reliable_kernelcore and ZONE_MOVABLE exists,
>> +			 * range from zone_movable_pfn[nid] to end of each node
>> +			 * should be ZONE_MOVABLE not ZONE_NORMAL. skip it.
>> +			 */
>> +			if (!reliable_kernelcore && zone_movable_pfn[nid])
>> +				if (zone == ZONE_NORMAL &&
>> +				    pfn >= zone_movable_pfn[nid])
>> +					continue;
>> +
>> +			/*
>> +			 * check given memblock attribute by firmware which
>> +			 * can affect kernel memory layout.
>> +			 * if zone==ZONE_MOVABLE but memory is mirrored,
>> +			 * it's an overlapped memmap init. skip it.
>> +			 */
>> +			if (reliable_kernelcore && zone == ZONE_MOVABLE) {
>> +				if (!r ||
>> +				    pfn >= memblock_region_memory_end_pfn(r)) {
>> +					for_each_memblock(memory, tmp)
>> +						if (pfn <
>> memblock_region_memory_end_pfn(tmp))
>> +
Re: [PATCH v2 2/2] mm: Introduce kernelcore=reliable option
On 2015/11/27 23:04, Taku Izumi wrote:

> This patch extends existing "kernelcore" option and
> introduces kernelcore=reliable option. By specifying
> "reliable" instead of specifying the amount of memory,
> non-reliable region will be arranged into ZONE_MOVABLE.
>
> v1 -> v2:
>  - Refine so that the following case also can be
>    handled properly:
>
>  Node X:  |MM--MM|
>    (legend) M: mirrored  -: not mirrored
>
>  In this case, ZONE_NORMAL and ZONE_MOVABLE are
>  arranged like below:
>
>  Node X:  |--|
>           |ooxxoo| ZONE_NORMAL
>           |ooxx| ZONE_MOVABLE
>    (legend) o: present  x: absent
>
> Signed-off-by: Taku Izumi
> ---
>  Documentation/kernel-parameters.txt |   9 ++-
>  mm/page_alloc.c                     | 110
> ++--
>  2 files changed, 112 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt
> b/Documentation/kernel-parameters.txt
> index f8aae63..ed44c2c8 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1695,7 +1695,8 @@ bytes respectively. Such letter suffixes can also be
> entirely omitted.
>
>  	keepinitrd	[HW,ARM]
>
> -	kernelcore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
> +	kernelcore=	Format: nn[KMG] | "reliable"
> +			[KNL,X86,IA-64,PPC] This parameter
>  			specifies the amount of memory usable by the kernel
>  			for non-movable allocations. The requested amount is
>  			spread evenly throughout all nodes in the system. The
> @@ -1711,6 +1712,12 @@ bytes respectively. Such letter suffixes can also be
> entirely omitted.
>  			use the HighMem zone if it exists, and the Normal
>  			zone if it does not.
>
> +			Instead of specifying the amount of memory (nn[KMG]),
> +			you can specify "reliable" option. In case "reliable"
> +			option is specified, reliable memory is used for
> +			non-movable allocations and remaining memory is used
> +			for Movable pages.
> +
>  	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
>  			Format: [,poll interval]
>  			The controller # is the number of the ehci usb debug
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index acb0b4e..006a3d8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -251,6 +251,7 @@ static unsigned long __meminitdata
> arch_zone_highest_possible_pfn[MAX_NR_ZONES];
>  static unsigned long __initdata required_kernelcore;
>  static unsigned long __initdata required_movablecore;
>  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> +static bool reliable_kernelcore;
>
>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>  int movable_zone;
> @@ -4472,6 +4473,7 @@ void __meminit memmap_init_zone(unsigned long size, int
> nid, unsigned long zone,
>  	unsigned long pfn;
>  	struct zone *z;
>  	unsigned long nr_initialised = 0;
> +	struct memblock_region *r = NULL, *tmp;
>
>  	if (highest_memmap_pfn < end_pfn - 1)
>  		highest_memmap_pfn = end_pfn - 1;
> @@ -4491,6 +4493,38 @@ void __meminit memmap_init_zone(unsigned long size,
> int nid, unsigned long zone,
>  			if (!update_defer_init(pgdat, pfn, end_pfn,
>  						&nr_initialised))
>  				break;
> +
> +			/*
> +			 * if not reliable_kernelcore and ZONE_MOVABLE exists,
> +			 * range from zone_movable_pfn[nid] to end of each node
> +			 * should be ZONE_MOVABLE not ZONE_NORMAL. skip it.
> +			 */
> +			if (!reliable_kernelcore && zone_movable_pfn[nid])
> +				if (zone == ZONE_NORMAL &&
> +				    pfn >= zone_movable_pfn[nid])
> +					continue;
> +
> +			/*
> +			 * check given memblock attribute by firmware which
> +			 * can affect kernel memory layout.
> +			 * if zone==ZONE_MOVABLE but memory is mirrored,
> +			 * it's an overlapped memmap init. skip it.
> +			 */
> +			if (reliable_kernelcore && zone == ZONE_MOVABLE) {
> +				if (!r ||
> +				    pfn >= memblock_region_memory_end_pfn(r)) {
> +					for_each_memblock(memory, tmp)
> +						if (pfn <
> memblock_region_memory_end_pfn(tmp))
> +							break;
> +
[PATCH v2 2/2] mm: Introduce kernelcore=reliable option
This patch extends existing "kernelcore" option and
introduces kernelcore=reliable option. By specifying
"reliable" instead of specifying the amount of memory,
non-reliable region will be arranged into ZONE_MOVABLE.

v1 -> v2:
 - Refine so that the following case also can be
   handled properly:

 Node X:  |MM--MM|
   (legend) M: mirrored  -: not mirrored

 In this case, ZONE_NORMAL and ZONE_MOVABLE are
 arranged like below:

 Node X:  |--|
          |ooxxoo| ZONE_NORMAL
          |ooxx| ZONE_MOVABLE
   (legend) o: present  x: absent

Signed-off-by: Taku Izumi
---
 Documentation/kernel-parameters.txt |   9 ++-
 mm/page_alloc.c                     | 110 ++--
 2 files changed, 112 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index f8aae63..ed44c2c8 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1695,7 +1695,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.

 	keepinitrd	[HW,ARM]

-	kernelcore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
+	kernelcore=	Format: nn[KMG] | "reliable"
+			[KNL,X86,IA-64,PPC] This parameter
 			specifies the amount of memory usable by the kernel
 			for non-movable allocations. The requested amount is
 			spread evenly throughout all nodes in the system. The
@@ -1711,6 +1712,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			use the HighMem zone if it exists, and the Normal
 			zone if it does not.

+			Instead of specifying the amount of memory (nn[KMG]),
+			you can specify "reliable" option. In case "reliable"
+			option is specified, reliable memory is used for
+			non-movable allocations and remaining memory is used
+			for Movable pages.
+
 	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
 			Format: [,poll interval]
 			The controller # is the number of the ehci usb debug
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index acb0b4e..006a3d8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -251,6 +251,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+static bool reliable_kernelcore;

 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -4472,6 +4473,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long pfn;
 	struct zone *z;
 	unsigned long nr_initialised = 0;
+	struct memblock_region *r = NULL, *tmp;

 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
@@ -4491,6 +4493,38 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			if (!update_defer_init(pgdat, pfn, end_pfn,
 						&nr_initialised))
 				break;
+
+			/*
+			 * if not reliable_kernelcore and ZONE_MOVABLE exists,
+			 * range from zone_movable_pfn[nid] to end of each node
+			 * should be ZONE_MOVABLE not ZONE_NORMAL. skip it.
+			 */
+			if (!reliable_kernelcore && zone_movable_pfn[nid])
+				if (zone == ZONE_NORMAL &&
+				    pfn >= zone_movable_pfn[nid])
+					continue;
+
+			/*
+			 * check given memblock attribute by firmware which
+			 * can affect kernel memory layout.
+			 * if zone==ZONE_MOVABLE but memory is mirrored,
+			 * it's an overlapped memmap init. skip it.
+			 */
+			if (reliable_kernelcore && zone == ZONE_MOVABLE) {
+				if (!r ||
+				    pfn >= memblock_region_memory_end_pfn(r)) {
+					for_each_memblock(memory, tmp)
+						if (pfn < memblock_region_memory_end_pfn(tmp))
+							break;
+					r = tmp;
+				}
+				if (pfn >= memblock_region_memory_base_pfn(r) &&
+				    m