Re: [PATCHv3 2/2] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-02 Thread Mike Rapoport
On Wed, Jan 02, 2019 at 02:47:54PM +0800, Pingfan Liu wrote:
> On Mon, Dec 31, 2018 at 4:46 PM Mike Rapoport  wrote:
> >
> > On Fri, Dec 28, 2018 at 11:00:02AM +0800, Pingfan Liu wrote:
> > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > index d494b9b..165f9c3 100644
> > > --- a/arch/x86/kernel/setup.c
> > > +++ b/arch/x86/kernel/setup.c
> > > @@ -541,15 +541,18 @@ static void __init reserve_crashkernel(void)
> > >
> > >   /* 0 means: find the address automatically */
> > >   if (crash_base <= 0) {
> > > + bool bottom_up = memblock_bottom_up();
> > > +
> > > + memblock_set_bottom_up(true);
> > >
> > >   /*
> > >* Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> > >* as old kexec-tools loads bzImage below that, unless
> > >* "crashkernel=size[KMG],high" is specified.
> > >*/
> > >   crash_base = memblock_find_in_range(CRASH_ALIGN,
> > > - high ? CRASH_ADDR_HIGH_MAX
> > > -  : CRASH_ADDR_LOW_MAX,
> > > - crash_size, CRASH_ALIGN);
> > > + (max_pfn * PAGE_SIZE), crash_size, CRASH_ALIGN);
> > > + memblock_set_bottom_up(bottom_up);
> >
> > Using bottom-up does not guarantee that the allocation won't fall into
> > removable memory; it only makes it very likely that it won't.
> >
> > I think that the 'max_pfn * PAGE_SIZE' limit should be replaced with the
> > end of the non-removable memory node.
> >
> Since MEMBLOCK_NONE is passed along memblock_find_in_range() -> ... ->
> __next_mem_range(), the following check there guarantees that
> hot-movable memory is skipped:
>
> 	if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
> 		continue;

Thanks for the clarification, I had missed that.
 
> Thanks,
> Pingfan
> 
> > > +
> > >   if (!crash_base) {
> > >   pr_info("crashkernel reservation failed - No suitable area found.\n");
> > >   return;
> > > --
> > > 2.7.4
> > >
> >
> > --
> > Sincerely yours,
> > Mike.
> >
> 

-- 
Sincerely yours,
Mike.



Re: [PATCHv3 2/2] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-01 Thread Pingfan Liu
On Mon, Dec 31, 2018 at 4:46 PM Mike Rapoport  wrote:
>
> On Fri, Dec 28, 2018 at 11:00:02AM +0800, Pingfan Liu wrote:
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index d494b9b..165f9c3 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -541,15 +541,18 @@ static void __init reserve_crashkernel(void)
> >
> >   /* 0 means: find the address automatically */
> >   if (crash_base <= 0) {
> > + bool bottom_up = memblock_bottom_up();
> > +
> > + memblock_set_bottom_up(true);
> >
> >   /*
> >* Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> >* as old kexec-tools loads bzImage below that, unless
> >* "crashkernel=size[KMG],high" is specified.
> >*/
> >   crash_base = memblock_find_in_range(CRASH_ALIGN,
> > - high ? CRASH_ADDR_HIGH_MAX
> > -  : CRASH_ADDR_LOW_MAX,
> > - crash_size, CRASH_ALIGN);
> > + (max_pfn * PAGE_SIZE), crash_size, CRASH_ALIGN);
> > + memblock_set_bottom_up(bottom_up);
>
> Using bottom-up does not guarantee that the allocation won't fall into
> removable memory; it only makes it very likely that it won't.
>
> I think that the 'max_pfn * PAGE_SIZE' limit should be replaced with the
> end of the non-removable memory node.
>
Since MEMBLOCK_NONE is passed along memblock_find_in_range() -> ... ->
__next_mem_range(), the following check there guarantees that
hot-movable memory is skipped:

	if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
		continue;

Thanks,
Pingfan

> > +
> >   if (!crash_base) {
> >   pr_info("crashkernel reservation failed - No suitable area found.\n");
> >   return;
> > --
> > 2.7.4
> >
>
> --
> Sincerely yours,
> Mike.
>


Re: [PATCHv3 2/2] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2018-12-31 Thread Mike Rapoport
On Fri, Dec 28, 2018 at 11:00:02AM +0800, Pingfan Liu wrote:
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d494b9b..165f9c3 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -541,15 +541,18 @@ static void __init reserve_crashkernel(void)
> 
>   /* 0 means: find the address automatically */
>   if (crash_base <= 0) {
> + bool bottom_up = memblock_bottom_up();
> +
> + memblock_set_bottom_up(true);
>
>   /*
>* Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>* as old kexec-tools loads bzImage below that, unless
>* "crashkernel=size[KMG],high" is specified.
>*/
>   crash_base = memblock_find_in_range(CRASH_ALIGN,
> - high ? CRASH_ADDR_HIGH_MAX
> -  : CRASH_ADDR_LOW_MAX,
> - crash_size, CRASH_ALIGN);
> + (max_pfn * PAGE_SIZE), crash_size, CRASH_ALIGN);
> + memblock_set_bottom_up(bottom_up);

Using bottom-up does not guarantee that the allocation won't fall into
removable memory; it only makes it very likely that it won't.

I think that the 'max_pfn * PAGE_SIZE' limit should be replaced with the
end of the non-removable memory node.

> +
>   if (!crash_base) {
>   pr_info("crashkernel reservation failed - No suitable area found.\n");
>   return;
> -- 
> 2.7.4
> 

-- 
Sincerely yours,
Mike.



[PATCHv3 2/2] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2018-12-27 Thread Pingfan Liu
A customer reported a bug on a high-end server with many PCIe devices,
where the kernel boots with crashkernel=384M and KASLR enabled. Even
though plenty of memory is still free below 896 MB, the search for a
crashkernel region fails intermittently. This is because, when ',high'
is not specified, the region can currently only be searched for below
896 MB. KASLR then randomly splits the memory below 896 MB into several
parts, and the crashkernel reservation must be aligned to 128 MB, so
sometimes no suitable region is left. This confuses end users:
crashkernel=X sometimes works and sometimes fails.
To make it succeed, the customer can change the kernel option to
"crashkernel=384M,high". But this leaves "crashkernel=xx@yy" very
little room to work with, even though its grammar looks more generic.
And we cannot confidently answer the customer's questions:
1) why the reservation below 896 MB fails;
2) what is wrong with the memory region below 4G;
3) why ',high' has to be added when only 384 MB is needed, not 3840 MB.

This patch simplifies the method suggested in [1]: it simply searches
bottom-up for a candidate crashkernel region. A bottom-up search should
stay more compatible with the old reservation style, i.e. it still tries
to get a region below 896 MB first, then in [896 MB, 4G], and finally
above 4G.

There is one minor compatibility issue with old kexec-tools: if the
reserved region is above 896M, old tools will fail to load bzImage. But
without this patch, old tools fail as well, because in that situation
no memory below 896M can be reserved for crashkernel.

[1]: http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
Signed-off-by: Pingfan Liu 
Cc: Tang Chen 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Michal Hocko 
Cc: Jonathan Corbet 
Cc: Yaowei Bai 
Cc: Pavel Tatashin 
Cc: Nicholas Piggin 
Cc: Naoya Horiguchi 
Cc: Daniel Vacek 
Cc: Mathieu Malaterre 
Cc: Stefan Agner 
Cc: Dave Young 
Cc: Baoquan He 
Cc: ying...@kernel.org
Cc: vgo...@redhat.com
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/setup.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d494b9b..165f9c3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -541,15 +541,18 @@ static void __init reserve_crashkernel(void)
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
+		bool bottom_up = memblock_bottom_up();
+
+		memblock_set_bottom_up(true);
 		/*
 		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
 		 * as old kexec-tools loads bzImage below that, unless
 		 * "crashkernel=size[KMG],high" is specified.
 		 */
 		crash_base = memblock_find_in_range(CRASH_ALIGN,
-						    high ? CRASH_ADDR_HIGH_MAX
-							 : CRASH_ADDR_LOW_MAX,
-						    crash_size, CRASH_ALIGN);
+				(max_pfn * PAGE_SIZE), crash_size, CRASH_ALIGN);
+		memblock_set_bottom_up(bottom_up);
+
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
-- 
2.7.4