Re: [PATCH v3] kexec: fix out of the ELF headers buffer issue in syscall kexec_file_load()

2015-10-02 Thread Ingo Molnar

* Andrew Morton  wrote:

> On Tue, 29 Sep 2015 20:58:57 +0800 "Lee, Chun-Yi"  
> wrote:
> 
> > This patch modifies fill_up_crash_elf_data() to use walk_system_ram_res()
> > instead of walk_system_ram_range() to count the maximum number of crash
> > memory ranges. That's because walk_system_ram_range() filters out small
> > memory regions that reside within a single page, but walk_system_ram_res()
> > does not.
> > 
> > The original issue is a page fault that sometimes happens on big machines
> > when preparing the ELF headers:
> > 
> > [  305.291522] BUG: unable to handle kernel paging request at c90613fc9000
> > [  305.299621] IP: [] prepare_elf64_ram_headers_callback+0x165/0x260
> > [  305.308300] PGD e32067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0
> > [  305.315393] Oops: 0002 [#1] SMP
> > [...snip]
> > [  305.420953] task: 8e1c01ced600 ti: 8e1c03ec2000 task.ti: 8e1c03ec2000
> > [  305.429292] RIP: 0010:[]  [] prepare_elf64_ram_headers_callback+0x165/0x260
> > [...snip]
> > 
> > After tracing prepare_elf64_headers() and
> > prepare_elf64_ram_headers_callback(), I found that the code uses
> > walk_system_ram_res() to fill the crash memory region information into the
> > program headers, so it includes the small memory regions that reside within
> > a single page. But when the kernel was using walk_system_ram_range() in
> > fill_up_crash_elf_data() to count the number of crash memory regions, it
> > filtered out those small regions. I printed those small memory regions, for
> > example:
> > 
> > kexec: Get nr_ram ranges. vaddr=0x880077592258 paddr=0x77592258, sz=0xdc0
> > 
> > Based on the code in walk_system_ram_range(), this memory region will be
> > filtered out:
> > 
> > pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
> > end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
> > end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
> > 
> > So the max_nr_ranges counted by the kernel doesn't include the small memory
> > regions. That causes the page fault in the later code path that prepares
> > the ELF headers.
> > 
> > This issue is not easy to reproduce on small machines that don't have many
> > CPUs, because the allocated page-aligned ELF buffer has more free space
> > to cover those small memory regions' PT_LOAD headers.
> > 
> 
> fyi, I added a cc:stable to my copy of this patch.

Note that I already have it applied, with a much improved changelog:

  e3c41e37b0f4 ("x86/kexec: Fix kexec crash in syscall kexec_file_load()")

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] kexec: fix out of the ELF headers buffer issue in syscall kexec_file_load()

2015-10-01 Thread Andrew Morton
On Tue, 29 Sep 2015 20:58:57 +0800 "Lee, Chun-Yi"  
wrote:

> This patch modifies fill_up_crash_elf_data() to use walk_system_ram_res()
> instead of walk_system_ram_range() to count the maximum number of crash
> memory ranges. That's because walk_system_ram_range() filters out small
> memory regions that reside within a single page, but walk_system_ram_res()
> does not.
> 
> The original issue is a page fault that sometimes happens on big machines
> when preparing the ELF headers:
> 
> [  305.291522] BUG: unable to handle kernel paging request at c90613fc9000
> [  305.299621] IP: [] prepare_elf64_ram_headers_callback+0x165/0x260
> [  305.308300] PGD e32067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0
> [  305.315393] Oops: 0002 [#1] SMP
> [...snip]
> [  305.420953] task: 8e1c01ced600 ti: 8e1c03ec2000 task.ti: 8e1c03ec2000
> [  305.429292] RIP: 0010:[]  [] prepare_elf64_ram_headers_callback+0x165/0x260
> [...snip]
> 
> After tracing prepare_elf64_headers() and
> prepare_elf64_ram_headers_callback(), I found that the code uses
> walk_system_ram_res() to fill the crash memory region information into the
> program headers, so it includes the small memory regions that reside within
> a single page. But when the kernel was using walk_system_ram_range() in
> fill_up_crash_elf_data() to count the number of crash memory regions, it
> filtered out those small regions. I printed those small memory regions, for
> example:
> 
> kexec: Get nr_ram ranges. vaddr=0x880077592258 paddr=0x77592258, sz=0xdc0
> 
> Based on the code in walk_system_ram_range(), this memory region will be
> filtered out:
> 
> pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
> end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
> end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
> 
> So the max_nr_ranges counted by the kernel doesn't include the small memory
> regions. That causes the page fault in the later code path that prepares
> the ELF headers.
> 
> This issue is not easy to reproduce on small machines that don't have many
> CPUs, because the allocated page-aligned ELF buffer has more free space
> to cover those small memory regions' PT_LOAD headers.
> 

fyi, I added a cc:stable to my copy of this patch.


Re: [PATCH v3] kexec: fix out of the ELF headers buffer issue in syscall kexec_file_load()

2015-09-30 Thread Minfei Huang
On 09/29/15 at 08:58pm, Lee, Chun-Yi wrote:
> This patch modifies fill_up_crash_elf_data() to use walk_system_ram_res()
> instead of walk_system_ram_range() to count the maximum number of crash
> memory ranges. That's because walk_system_ram_range() filters out small
> memory regions that reside within a single page, but walk_system_ram_res()
> does not.
> 
> The original issue is a page fault that sometimes happens on big machines
> when preparing the ELF headers:
> 
> [  305.291522] BUG: unable to handle kernel paging request at c90613fc9000
> [  305.299621] IP: [] prepare_elf64_ram_headers_callback+0x165/0x260
> [  305.308300] PGD e32067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0
> [  305.315393] Oops: 0002 [#1] SMP
> [...snip]
> [  305.420953] task: 8e1c01ced600 ti: 8e1c03ec2000 task.ti: 8e1c03ec2000
> [  305.429292] RIP: 0010:[]  [] prepare_elf64_ram_headers_callback+0x165/0x260
> [...snip]
> 
> After tracing prepare_elf64_headers() and
> prepare_elf64_ram_headers_callback(), I found that the code uses
> walk_system_ram_res() to fill the crash memory region information into the
> program headers, so it includes the small memory regions that reside within
> a single page. But when the kernel was using walk_system_ram_range() in
> fill_up_crash_elf_data() to count the number of crash memory regions, it
> filtered out those small regions. I printed those small memory regions, for
> example:
> 
> kexec: Get nr_ram ranges. vaddr=0x880077592258 paddr=0x77592258, sz=0xdc0
> 
> Based on the code in walk_system_ram_range(), this memory region will be
> filtered out:
> 
> pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
> end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
> end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
> 
> So the max_nr_ranges counted by the kernel doesn't include the small memory
> regions. That causes the page fault in the later code path that prepares
> the ELF headers.
> 
> This issue is not easy to reproduce on small machines that don't have many
> CPUs, because the allocated page-aligned ELF buffer has more free space
> to cover those small memory regions' PT_LOAD headers.
> 
> v3:
> Changed the declaration of nr_ranges to be unsigned int*
> 
> v2:
> To simplify the patch description, removed some things about CPU number to
> avoid confusing patch reviewer.
> 
> Signed-off-by: Lee, Chun-Yi 
> ---
>  arch/x86/kernel/crash.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index e068d66..74ca2fe 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -185,10 +185,9 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  }
>  
>  #ifdef CONFIG_KEXEC_FILE
> -static int get_nr_ram_ranges_callback(unsigned long start_pfn,
> - unsigned long nr_pfn, void *arg)
> +static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
>  {
> - int *nr_ranges = arg;
> + unsigned int *nr_ranges = arg;
>  
>   (*nr_ranges)++;
>   return 0;
> @@ -214,7 +213,7 @@ static void fill_up_crash_elf_data(struct crash_elf_data *ced,
>  
>   ced->image = image;
>  
> - walk_system_ram_range(0, -1, &nr_ranges,
> + walk_system_ram_res(0, -1, &nr_ranges,
>   get_nr_ram_ranges_callback);
>  
>   ced->max_nr_ranges = nr_ranges;
> -- 
> 2.1.4
> 

Reviewed-by: Minfei Huang 

> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3] kexec: fix out of the ELF headers buffer issue in syscall kexec_file_load()

2015-09-29 Thread Dave Young
On 09/29/15 at 08:58pm, Lee, Chun-Yi wrote:
> This patch modifies fill_up_crash_elf_data() to use walk_system_ram_res()
> instead of walk_system_ram_range() to count the maximum number of crash
> memory ranges. That's because walk_system_ram_range() filters out small
> memory regions that reside within a single page, but walk_system_ram_res()
> does not.
> 
> The original issue is a page fault that sometimes happens on big machines
> when preparing the ELF headers:
> 
> [  305.291522] BUG: unable to handle kernel paging request at c90613fc9000
> [  305.299621] IP: [] prepare_elf64_ram_headers_callback+0x165/0x260
> [  305.308300] PGD e32067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0
> [  305.315393] Oops: 0002 [#1] SMP
> [...snip]
> [  305.420953] task: 8e1c01ced600 ti: 8e1c03ec2000 task.ti: 8e1c03ec2000
> [  305.429292] RIP: 0010:[]  [] prepare_elf64_ram_headers_callback+0x165/0x260
> [...snip]
> 
> After tracing prepare_elf64_headers() and
> prepare_elf64_ram_headers_callback(), I found that the code uses
> walk_system_ram_res() to fill the crash memory region information into the
> program headers, so it includes the small memory regions that reside within
> a single page. But when the kernel was using walk_system_ram_range() in
> fill_up_crash_elf_data() to count the number of crash memory regions, it
> filtered out those small regions. I printed those small memory regions, for
> example:
> 
> kexec: Get nr_ram ranges. vaddr=0x880077592258 paddr=0x77592258, sz=0xdc0
> 
> Based on the code in walk_system_ram_range(), this memory region will be
> filtered out:
> 
> pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
> end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
> end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
> 
> So the max_nr_ranges counted by the kernel doesn't include the small memory
> regions. That causes the page fault in the later code path that prepares
> the ELF headers.
> 
> This issue is not easy to reproduce on small machines that don't have many
> CPUs, because the allocated page-aligned ELF buffer has more free space
> to cover those small memory regions' PT_LOAD headers.
> 
> v3:
> Changed the declaration of nr_ranges to be unsigned int*
> 
> v2:
> To simplify the patch description, removed some things about CPU number to
> avoid confusing patch reviewer.
> 
> Signed-off-by: Lee, Chun-Yi 
> ---
>  arch/x86/kernel/crash.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index e068d66..74ca2fe 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -185,10 +185,9 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  }
>  
>  #ifdef CONFIG_KEXEC_FILE
> -static int get_nr_ram_ranges_callback(unsigned long start_pfn,
> - unsigned long nr_pfn, void *arg)
> +static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
>  {
> - int *nr_ranges = arg;
> + unsigned int *nr_ranges = arg;
>  
>   (*nr_ranges)++;
>   return 0;
> @@ -214,7 +213,7 @@ static void fill_up_crash_elf_data(struct crash_elf_data *ced,
>  
>   ced->image = image;
>  
> - walk_system_ram_range(0, -1, &nr_ranges,
> + walk_system_ram_res(0, -1, &nr_ranges,
>   get_nr_ram_ranges_callback);
>  
>   ced->max_nr_ranges = nr_ranges;

Acked-by: Dave Young 

Thanks
Dave


[PATCH v3] kexec: fix out of the ELF headers buffer issue in syscall kexec_file_load()

2015-09-29 Thread Lee, Chun-Yi
This patch modifies fill_up_crash_elf_data() to use walk_system_ram_res()
instead of walk_system_ram_range() to count the maximum number of crash
memory ranges. That's because walk_system_ram_range() filters out small
memory regions that reside within a single page, but walk_system_ram_res()
does not.

The original issue is a page fault that sometimes happens on big machines
when preparing the ELF headers:

[  305.291522] BUG: unable to handle kernel paging request at c90613fc9000
[  305.299621] IP: [] prepare_elf64_ram_headers_callback+0x165/0x260
[  305.308300] PGD e32067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0
[  305.315393] Oops: 0002 [#1] SMP
[...snip]
[  305.420953] task: 8e1c01ced600 ti: 8e1c03ec2000 task.ti: 8e1c03ec2000
[  305.429292] RIP: 0010:[]  [] prepare_elf64_ram_headers_callback+0x165/0x260
[...snip]

After tracing prepare_elf64_headers() and
prepare_elf64_ram_headers_callback(), I found that the code uses
walk_system_ram_res() to fill the crash memory region information into the
program headers, so it includes the small memory regions that reside within
a single page. But when the kernel was using walk_system_ram_range() in
fill_up_crash_elf_data() to count the number of crash memory regions, it
filtered out those small regions. I printed those small memory regions, for
example:

kexec: Get nr_ram ranges. vaddr=0x880077592258 paddr=0x77592258, sz=0xdc0

Based on the code in walk_system_ram_range(), this memory region will be
filtered out:

pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
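
As a minimal user-space sketch of the check above (this is not the kernel's code): the helper name whole_pages() is hypothetical, a 4 KiB page size is assumed, and the size is taken from the sz=0xdc0 value in the log line. It rounds the start up and the end down to page frame numbers, as the calculation above does, and reports how many whole pages remain.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/*
 * Hypothetical helper: number of whole pages inside [start, start + size).
 * A result of 0 means a page-granular walk has nothing to report.
 */
static uint64_t whole_pages(uint64_t start, uint64_t size)
{
	assert(size > 0);

	uint64_t end = start + size - 1;			/* inclusive end */
	uint64_t pfn = (start + PAGE_SIZE - 1) >> PAGE_SHIFT;	/* round up */
	uint64_t end_pfn = (end + 1) >> PAGE_SHIFT;		/* round down */

	return end_pfn > pfn ? end_pfn - pfn : 0;
}
```

With paddr=0x77592258 and sz=0xdc0 both page frame numbers come out as 0x77593, so whole_pages() returns 0 and a page-granular walk would skip the region.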

So the max_nr_ranges counted by the kernel doesn't include the small memory
regions. That causes the page fault in the later code path that prepares
the ELF headers.
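
The mismatch can be illustrated with a small user-space model. Everything here is hypothetical (struct ram_region, count_page_granular(), uncounted_regions()); only the rounding mirrors the calculation shown earlier. A per-resource walk would visit every entry in the list, while a page-granular walk only counts entries that contain at least one whole page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Hypothetical stand-in for one System RAM resource. */
struct ram_region {
	uint64_t start;	/* physical start address */
	uint64_t size;	/* length in bytes, must be non-zero */
};

/* Count the regions that contain at least one whole page. */
static unsigned int count_page_granular(const struct ram_region *r, size_t n)
{
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++) {
		assert(r[i].size > 0);

		uint64_t pfn = (r[i].start + PAGE_SIZE - 1) >> PAGE_SHIFT;
		uint64_t end_pfn = (r[i].start + r[i].size) >> PAGE_SHIFT;

		if (end_pfn > pfn)
			count++;
	}
	return count;
}

/* Regions a per-resource walk visits but the page-granular count misses. */
static unsigned int uncounted_regions(const struct ram_region *r, size_t n)
{
	return (unsigned int)n - count_page_granular(r, n);
}
```

Each uncounted region still gets a PT_LOAD header when the headers are filled in, so the header buffer needs room for that many entries beyond what max_nr_ranges accounted for.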

This issue is not easy to reproduce on small machines that don't have many
CPUs, because the allocated page-aligned ELF buffer has more free space
to cover those small memory regions' PT_LOAD headers.
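
To see why page alignment can hide the problem, here is a rough sketch of how much slack a page-aligned header buffer leaves. The layout is an assumption made for illustration: one ELF64 file header (64 bytes), one 56-byte program header per CPU, two extra program headers, one program header per counted RAM range, and the total rounded up to 4096 bytes. slack_phdrs() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define EHDR_SIZE 64u	/* sizeof(Elf64_Ehdr) */
#define PHDR_SIZE 56u	/* sizeof(Elf64_Phdr) */
#define BUF_ALIGN 4096u	/* assumed alignment of the header buffer */

/* Number of extra program headers that fit before the aligned buffer ends. */
static size_t slack_phdrs(size_t nr_cpus, size_t counted_ranges)
{
	size_t nr_phdr = nr_cpus + 2 + counted_ranges;	/* assumed layout */
	size_t used = EHDR_SIZE + nr_phdr * PHDR_SIZE;
	size_t aligned = (used + BUF_ALIGN - 1) / BUF_ALIGN * BUF_ALIGN;

	assert(aligned >= used);
	return (aligned - used) / PHDR_SIZE;
}
```

Under these assumptions slack_phdrs(4, 10) is 56, so dozens of uncounted regions would still fit in the buffer, while slack_phdrs(64, 6) is 0 and the first uncounted header would land past the end of it.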

v3:
Changed the declaration of nr_ranges to be unsigned int*

v2:
To simplify the patch description, removed some things about CPU number to
avoid confusing patch reviewer.

Signed-off-by: Lee, Chun-Yi 
---
 arch/x86/kernel/crash.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e068d66..74ca2fe 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -185,10 +185,9 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_KEXEC_FILE
-static int get_nr_ram_ranges_callback(unsigned long start_pfn,
-				unsigned long nr_pfn, void *arg)
+static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
 {
-	int *nr_ranges = arg;
+	unsigned int *nr_ranges = arg;
 
 	(*nr_ranges)++;
 	return 0;
@@ -214,7 +213,7 @@ static void fill_up_crash_elf_data(struct crash_elf_data *ced,
 
 	ced->image = image;
 
-	walk_system_ram_range(0, -1, &nr_ranges,
+	walk_system_ram_res(0, -1, &nr_ranges,
 				get_nr_ram_ranges_callback);
 
 	ced->max_nr_ranges = nr_ranges;
-- 
2.1.4


