Re: [PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-06-03 Thread changhuaixin



> On Jun 2, 2020, at 1:38 AM, Josh Poimboeuf  wrote:
> 
> On Sun, May 31, 2020 at 01:26:54PM +0800, changhuaixin wrote:
>>   It turned out to be an alignment problem. If sh_size of the previous section
>>   orc_unwind is not 4-byte aligned, sh_offset of the following orc_lookup
>>   section is not 4-byte aligned either. However, the VMA of section orc_lookup
>>   is aligned to the nearest 4-byte boundary. Thus, the orc_lookup section means
>>   two different areas for the scripts/sorttable tool and the kernel.
>> 
>>   Section headers look like this when it happens:
>> 
>>   12 .orc_unwind_ip 00172124  82573b28  02573b28  01773b28  2**0
>>                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>>   13 .orc_unwind    0022b1b6  826e5c4c  026e5c4c  018e5c4c  2**0
>>                     CONTENTS, ALLOC, LOAD, READONLY, DATA
>>   14 .orc_lookup    0003003c  82910e04  02910e04  01b10e02  2**0
>>                     ALLOC
>>   15 .vvar          1000      82941000  02941000  01b41000  2**4
>>                     CONTENTS, ALLOC, LOAD, DATA
>> 
>>   The sorttable tool uses the area starting at file offset 0x01b10e02 for
>>   0x0003003c bytes, while the kernel uses the area starting at VMA 0x82910e04
>>   for 0x0003003c bytes, meaning that each entry in this table used by the kernel
>>   is actually 2 bytes behind the corresponding entry set by the sorttable
>>   tool.
>> 
>>   Any suggestions on fixing this?
> 
> The VMA and LMA are both 4-byte aligned.  The file offset alignment
> (0x01b10e02) shouldn't matter.
> 
> Actually it looks like the problem is that the section doesn't have
> CONTENTS, so it's just loaded as a BSS section (all zeros).  The section
> needs to be type SHT_PROGBITS instead of SHT_NOBITS.
> 
> $ readelf -S vmlinux |grep orc_lookup
>  [16] .orc_lookup   NOBITS   82b68418  01d68418
> 
> I tried to fix it with
> 
> diff --git a/scripts/sorttable.h b/scripts/sorttable.h
> index a36c76c17be4..76adb1fb88f8 100644
> --- a/scripts/sorttable.h
> +++ b/scripts/sorttable.h
> @@ -341,6 +341,7 @@ static int do_sort(Elf_Ehdr *ehdr,
>   param.lookup_table_size = s->sh_size;
>   param.orc_lookup_table = (unsigned int *)
>   ((void *)ehdr + s->sh_offset);
> + w(SHT_PROGBITS, &s->sh_type);
>   }
>   if (!strcmp(secstrings + idx, ".text")) {
>   param.text_size = s->sh_size;
> 
> 
> But that makes kallsyms unhappy, so I guess we need to do it from the
> linker script where .orc_lookup is created.
> 
> Linker script doesn't seem to allow manual specification of the section
> type, so this is the best I could come up with:
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index db600ef218d7..49f4f5bc6165 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -826,6 +826,8 @@
>   . += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) /\
>   LOOKUP_BLOCK_SIZE) + 1) * 4;\
>   orc_lookup_end = .; \
> + /* HACK: force SHT_PROGBITS so sorttable can edit: */   \
> + BYTE(1);\
>   }
> #else
> #define ORC_UNWIND_TABLE

Thanks! It works.




Re: [PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-06-01 Thread Josh Poimboeuf
On Sun, May 31, 2020 at 01:26:54PM +0800, changhuaixin wrote:
>It turned out to be an alignment problem. If sh_size of the previous section
>orc_unwind is not 4-byte aligned, sh_offset of the following orc_lookup
>section is not 4-byte aligned either. However, the VMA of section orc_lookup
>is aligned to the nearest 4-byte boundary. Thus, the orc_lookup section means
>two different areas for the scripts/sorttable tool and the kernel.
> 
>Section headers look like this when it happens:
> 
>12 .orc_unwind_ip 00172124  82573b28  02573b28  01773b28  2**0
>                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>13 .orc_unwind    0022b1b6  826e5c4c  026e5c4c  018e5c4c  2**0
>                  CONTENTS, ALLOC, LOAD, READONLY, DATA
>14 .orc_lookup    0003003c  82910e04  02910e04  01b10e02  2**0
>                  ALLOC
>15 .vvar          1000      82941000  02941000  01b41000  2**4
>                  CONTENTS, ALLOC, LOAD, DATA
> 
>The sorttable tool uses the area starting at file offset 0x01b10e02 for
>0x0003003c bytes, while the kernel uses the area starting at VMA 0x82910e04
>for 0x0003003c bytes, meaning that each entry in this table used by the kernel
>is actually 2 bytes behind the corresponding entry set by the sorttable
>tool.
> 
>Any suggestions on fixing this?

The VMA and LMA are both 4-byte aligned.  The file offset alignment
(0x01b10e02) shouldn't matter.
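
For reference, the 2-byte discrepancy reported above can be reproduced from the quoted section headers alone. The following is a minimal sketch, not code from sorttable or the kernel; `struct demo_section` and the `demo_*` helpers are hypothetical names used only for this illustration. It compares the VMA-minus-file-offset delta of .orc_unwind and .orc_lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record holding three columns of the section dump. */
struct demo_section {
	uint32_t size;      /* "Size" column */
	uint32_t vma;       /* "VMA" column */
	uint32_t file_off;  /* "File off" column */
};

/* Distance between a section's run-time address and its file offset. */
static uint32_t demo_vma_minus_offset(const struct demo_section *s)
{
	assert(s != NULL);
	return s->vma - s->file_off;
}

/* First address past the end of the section in memory. */
static uint32_t demo_end_vma(const struct demo_section *s)
{
	assert(s != NULL);
	return s->vma + s->size;
}

/* Values copied from the section headers quoted in this thread. */
static const struct demo_section demo_orc_unwind = {
	0x0022b1b6, 0x826e5c4c, 0x018e5c4c
};
static const struct demo_section demo_orc_lookup = {
	0x0003003c, 0x82910e04, 0x01b10e02
};
```

With these numbers, .orc_unwind ends at VMA 0x82910e02, .orc_lookup starts at the next 4-byte boundary 0x82910e04, and the two deltas come out as 0x80e00000 and 0x80e00002, which is where the reported 2 bytes come from.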

Actually it looks like the problem is that the section doesn't have
CONTENTS, so it's just loaded as a BSS section (all zeros).  The section
needs to be type SHT_PROGBITS instead of SHT_NOBITS.

$ readelf -S vmlinux |grep orc_lookup
  [16] .orc_lookup   NOBITS   82b68418  01d68418
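
As a rough illustration of why the type matters: per the ELF specification, an SHT_NOBITS section occupies no bytes in the file, so anything a build-time tool writes at its sh_offset is not part of the section's contents. The sketch below is an assumption-laden stand-in, not sorttable code; `struct demo_shdr`, the `DEMO_SHT_*` constants (which mirror the ELF values 1 and 8), and `demo_section_has_file_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Section type values as defined by the ELF specification. */
enum {
	DEMO_SHT_PROGBITS = 1,  /* contents stored in the file */
	DEMO_SHT_NOBITS   = 8   /* no file contents, zero-filled at load */
};

/* Hypothetical subset of an ELF section header. */
struct demo_shdr {
	uint32_t sh_type;
	uint32_t sh_size;
};

/*
 * Return true only if the section has bytes in the file that a
 * post-link tool could meaningfully rewrite in place.
 */
static bool demo_section_has_file_bytes(const struct demo_shdr *s)
{
	assert(s != NULL);
	return s->sh_type != DEMO_SHT_NOBITS && s->sh_size != 0;
}
```

A check like this before filling the table would flag the NOBITS case shown in the readelf output above instead of silently producing a table the kernel never sees.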

I tried to fix it with

diff --git a/scripts/sorttable.h b/scripts/sorttable.h
index a36c76c17be4..76adb1fb88f8 100644
--- a/scripts/sorttable.h
+++ b/scripts/sorttable.h
@@ -341,6 +341,7 @@ static int do_sort(Elf_Ehdr *ehdr,
param.lookup_table_size = s->sh_size;
param.orc_lookup_table = (unsigned int *)
((void *)ehdr + s->sh_offset);
+   w(SHT_PROGBITS, &s->sh_type);
}
if (!strcmp(secstrings + idx, ".text")) {
param.text_size = s->sh_size;


But that makes kallsyms unhappy, so I guess we need to do it from the
linker script where .orc_lookup is created.

Linker script doesn't seem to allow manual specification of the section
type, so this is the best I could come up with:

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index db600ef218d7..49f4f5bc6165 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -826,6 +826,8 @@
. += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) /\
LOOKUP_BLOCK_SIZE) + 1) * 4;\
orc_lookup_end = .; \
+   /* HACK: force SHT_PROGBITS so sorttable can edit: */   \
+   BYTE(1);\
}
 #else
 #define ORC_UNWIND_TABLE
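
The size expression in the linker-script hunk above reserves one 4-byte slot per block of .text plus one extra slot. Below is a minimal C sketch of that arithmetic. The 256-byte block size is an assumption (the thread does not state the value of LOOKUP_BLOCK_SIZE), and `demo_orc_lookup_bytes` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of LOOKUP_BLOCK_SIZE; not stated in this thread. */
#define DEMO_LOOKUP_BLOCK_SIZE 256u

/*
 * Mirror of the linker-script expression:
 *   ((text_size + block_size - 1) / block_size + 1) * 4
 * i.e. round the text size up to whole blocks, add one slot,
 * and multiply by the 4-byte entry size.
 */
static uint64_t demo_orc_lookup_bytes(uint64_t text_size, uint32_t block_size)
{
	assert(block_size > 0);

	uint64_t blocks = (text_size + block_size - 1) / block_size;

	return (blocks + 1) * 4;
}
```

Under the 256-byte assumption, a text size of 0xC00E00 bytes gives 0x3003c, the .orc_lookup size in the section dump quoted earlier. The real .text size is not given in the thread, so this is only a consistency check.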



Re: [PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-05-26 Thread changhuaixin
Thanks for your kind reply. Let me check.

> On May 23, 2020, at 2:28 AM, Josh Poimboeuf  wrote:
> 
> On Wed, Apr 29, 2020 at 02:46:24PM +0800, Huaixin Chang wrote:
>> Move building of fast lookup table from boot to sorttable tool. This saves us
>> 6380us boot time on Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz with cores.
>> 
>> Huaixin Chang (2):
>>  scripts/sorttable: Build orc fast lookup table via sorttable tool
>>  x86/unwind/orc: Remove unwind_init() from x86 boot
>> 
>> arch/x86/include/asm/unwind.h |  2 -
>> arch/x86/kernel/setup.c   |  2 -
>> arch/x86/kernel/unwind_orc.c  | 51 --
>> scripts/sorttable.h   | 99 ---
>> 4 files changed, 92 insertions(+), 62 deletions(-)
> 
> I tested this (rebased on tip/master), it seems to break ORC
> completely... e.g. /proc/self/stack is empty.
> 
> -- 
> Josh



Re: [PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-05-22 Thread Josh Poimboeuf
On Wed, Apr 29, 2020 at 02:46:24PM +0800, Huaixin Chang wrote:
> Move building of fast lookup table from boot to sorttable tool. This saves us
> 6380us boot time on Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz with cores.
> 
> Huaixin Chang (2):
>   scripts/sorttable: Build orc fast lookup table via sorttable tool
>   x86/unwind/orc: Remove unwind_init() from x86 boot
> 
>  arch/x86/include/asm/unwind.h |  2 -
>  arch/x86/kernel/setup.c   |  2 -
>  arch/x86/kernel/unwind_orc.c  | 51 --
>  scripts/sorttable.h   | 99 ---
>  4 files changed, 92 insertions(+), 62 deletions(-)

I tested this (rebased on tip/master), it seems to break ORC
completely... e.g. /proc/self/stack is empty.

-- 
Josh



Re: [PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-04-29 Thread Josh Poimboeuf
On Wed, Apr 29, 2020 at 11:06:58PM -0500, Josh Poimboeuf wrote:
> On Thu, Apr 30, 2020 at 10:32:17AM +0800, changhuaixin wrote:
> > 
> > 
> > > On Apr 29, 2020, at 4:49 PM, Peter Zijlstra  wrote:
> > > 
> > > On Wed, Apr 29, 2020 at 02:46:24PM +0800, Huaixin Chang wrote:
> > >> Move building of fast lookup table from boot to sorttable tool. This saves us
> > >> 6380us boot time on Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz with cores.
> > > 
> > > And what does it add to the build time?
> > 
> > It takes a little more than 7ms to build fast lookup table in
> > sorttable on the same CPU. And it is on the critical path.
> 
> Thanks, I like it.  It will help make the in-kernel unwinder even
> simpler.  And it will enable unwinding from early boot.
> 
> Maybe someday we can move all the table sorting code into objtool, once
> we have objtool running on vmlinux.o.
> 
> I'll try to review the patches soon.

BTW, another cool feature would be for sorttable to run on modules
during the module linking phase.

-- 
Josh



Re: [PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-04-29 Thread Josh Poimboeuf
On Thu, Apr 30, 2020 at 10:32:17AM +0800, changhuaixin wrote:
> 
> 
> > On Apr 29, 2020, at 4:49 PM, Peter Zijlstra  wrote:
> > 
> > On Wed, Apr 29, 2020 at 02:46:24PM +0800, Huaixin Chang wrote:
> > >> Move building of fast lookup table from boot to sorttable tool. This saves us
> >> 6380us boot time on Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz with cores.
> > 
> > And what does it add to the build time?
> 
> It takes a little more than 7ms to build fast lookup table in
> sorttable on the same CPU. And it is on the critical path.

Thanks, I like it.  It will help make the in-kernel unwinder even
simpler.  And it will enable unwinding from early boot.

Maybe someday we can move all the table sorting code into objtool, once
we have objtool running on vmlinux.o.

I'll try to review the patches soon.

-- 
Josh



Re: [PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-04-29 Thread changhuaixin



> On Apr 29, 2020, at 4:49 PM, Peter Zijlstra  wrote:
> 
> On Wed, Apr 29, 2020 at 02:46:24PM +0800, Huaixin Chang wrote:
>> Move building of fast lookup table from boot to sorttable tool. This saves us
>> 6380us boot time on Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz with cores.
> 
> And what does it add to the build time?

It takes a little more than 7ms to build fast lookup table in sorttable on the 
same CPU. And it is on the critical path.  


Re: [PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-04-29 Thread Peter Zijlstra
On Wed, Apr 29, 2020 at 02:46:24PM +0800, Huaixin Chang wrote:
> Move building of fast lookup table from boot to sorttable tool. This saves us
> 6380us boot time on Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz with cores.

And what does it add to the build time?


[PATCH 0/2] Build ORC fast lookup table in scripts/sorttable tool

2020-04-29 Thread Huaixin Chang
Move building of fast lookup table from boot to sorttable tool. This saves us
6380us boot time on Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz with cores.

Huaixin Chang (2):
  scripts/sorttable: Build orc fast lookup table via sorttable tool
  x86/unwind/orc: Remove unwind_init() from x86 boot

 arch/x86/include/asm/unwind.h |  2 -
 arch/x86/kernel/setup.c   |  2 -
 arch/x86/kernel/unwind_orc.c  | 51 --
 scripts/sorttable.h   | 99 ---
 4 files changed, 92 insertions(+), 62 deletions(-)
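
For readers unfamiliar with the fast lookup table, the idea can be sketched roughly as follows: for each fixed-size block of .text, record the index of the sorted unwind entry that covers the block's first address, so a later search only has to scan between two neighbouring slots. This is a simplified illustration under stated assumptions, not the code from these patches: it uses absolute 32-bit addresses, a 256-byte block size, and the hypothetical helpers `demo_find` and `demo_build_lookup`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed block size; a stand-in for the kernel's LOOKUP_BLOCK_SIZE. */
#define DEMO_BLOCK_SIZE 256u

/*
 * Binary search: index of the last entry in sorted ips[0..n) whose
 * value is <= ip, or 0 if ip precedes every entry.
 */
static size_t demo_find(const uint32_t *ips, size_t n, uint32_t ip)
{
	size_t lo = 0, hi = n;  /* search window is [lo, hi) */
	size_t found = 0;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ips[mid] <= ip) {
			found = mid;
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	return found;
}

/*
 * Fill lookup[0..nblocks], which must have nblocks + 1 slots.
 * lookup[b] is the index of the entry covering the first address of
 * block b; the final slot holds n as an end marker.
 */
static void demo_build_lookup(const uint32_t *ips, size_t n,
			      uint32_t text_start, size_t nblocks,
			      uint32_t *lookup)
{
	assert(n > 0);

	for (size_t b = 0; b < nblocks; b++) {
		uint32_t block_ip = text_start + (uint32_t)(b * DEMO_BLOCK_SIZE);

		lookup[b] = (uint32_t)demo_find(ips, n, block_ip);
	}
	lookup[nblocks] = (uint32_t)n;
}
```

Because the table depends only on the sorted entries and the extent of .text, both of which are fixed once vmlinux is linked, it can be computed by a build-time tool rather than at boot, which is the move this series proposes.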

-- 
2.14.4.44.g2045bb6