Re: [PATCH] make __section_nr more efficient

2016-07-21 Thread Dave Hansen
On 07/20/2016 06:55 PM, zhouchengming wrote:
> Thanks for your reply. I didn't know the compiler would optimize the loop
> away. But when I look at the assembly code of __section_nr, the loop still
> seems to be there.

Oh, well.  I guess it got broken in the last decade or so.  Your patch
looks good to me, and the fact that we ended up here means the original
approach was at least a little fragile.


Re: [PATCH] make __section_nr more efficient

2016-07-20 Thread zhouchengming

On 2016/7/21 5:36, Dave Hansen wrote:
> On 07/19/2016 09:18 PM, Zhou Chengming wrote:
> > When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr can get
> > the section number with a subtraction directly.
>
> Does this actually *do* anything?
>
> It was a long time ago, but if I remember correctly, the entire loop in
> __section_nr() goes away because root_nr==NR_SECTION_ROOTS, so
> root_nr=1, and the compiler optimizes away the entire subtraction.
>
> So this basically adds an #ifdef and gets us nothing, although it makes
> the situation much more explicit.  Perhaps the comment should say that
> this works *and* is efficient because the compiler can optimize all the
> extreme complexity away.



Thanks for your reply. I didn't know the compiler would optimize the loop
away. But when I look at the assembly code of __section_nr, the loop still
seems to be there.

My gcc version: gcc version 4.9.0 (GCC)
CONFIG_SPARSEMEM_EXTREME: disabled

Before this patch:

 <__section_nr>:
   0:   55                      push   %rbp
   1:   48 c7 c2 00 00 00 00    mov    $0x0,%rdx
                        4: R_X86_64_32S mem_section
   8:   31 c0                   xor    %eax,%eax
   a:   48 89 e5                mov    %rsp,%rbp
   d:   eb 0d                   jmp    1c <__section_nr+0x1c>
   f:   48 83 c0 01             add    $0x1,%rax
  13:   48 81 fa 00 00 00 00    cmp    $0x0,%rdx
                        16: R_X86_64_32S mem_section+0x800000
  1a:   74 26                   je     42 <__section_nr+0x42>
  1c:   48 89 d1                mov    %rdx,%rcx
  1f:   ba 10 00 00 00          mov    $0x10,%edx
  24:   48 85 c9                test   %rcx,%rcx
  27:   74 e6                   je     f <__section_nr+0xf>
  29:   48 39 cf                cmp    %rcx,%rdi
  2c:   48 8d 51 10             lea    0x10(%rcx),%rdx
  30:   72 dd                   jb     f <__section_nr+0xf>
  32:   48 39 d7                cmp    %rdx,%rdi
  35:   73 d8                   jae    f <__section_nr+0xf>
  37:   48 29 cf                sub    %rcx,%rdi
  3a:   48 c1 ff 04             sar    $0x4,%rdi
  3e:   01 f8                   add    %edi,%eax
  40:   5d                      pop    %rbp
  41:   c3                      retq
  42:   48 29 cf                sub    %rcx,%rdi
  45:   b8 00 00 08 00          mov    $0x80000,%eax
  4a:   48 c1 ff 04             sar    $0x4,%rdi
  4e:   01 f8                   add    %edi,%eax
  50:   5d                      pop    %rbp
  51:   c3                      retq
  52:   66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  59:   1f 84 00 00 00 00 00

After this patch:

 <__section_nr>:
   0:   55                      push   %rbp
   1:   48 89 f8                mov    %rdi,%rax
   4:   48 2d 00 00 00 00       sub    $0x0,%rax
                        6: R_X86_64_32S mem_section
   a:   48 89 e5                mov    %rsp,%rbp
   d:   48 c1 f8 04             sar    $0x4,%rax
  11:   5d                      pop    %rbp
  12:   c3                      retq
  13:   66 66 66 66 2e 0f 1f    data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  1a:   84 00 00 00 00 00


Thanks!





Re: [PATCH] make __section_nr more efficient

2016-07-20 Thread Dave Hansen
On 07/19/2016 09:18 PM, Zhou Chengming wrote:
> When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr can get
> the section number with a subtraction directly.

Does this actually *do* anything?

It was a long time ago, but if I remember correctly, the entire loop in
__section_nr() goes away because root_nr==NR_SECTION_ROOTS, so
root_nr=1, and the compiler optimizes away the entire subtraction.

So this basically adds an #ifdef and gets us nothing, although it makes
the situation much more explicit.  Perhaps the comment should say that
this works *and* is efficient because the compiler can optimize all the
extreme complexity away.