Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Nadav Amit
Linus Torvalds  wrote:

> On Tue, May 1, 2018 at 9:46 AM Nadav Amit  wrote:
> 
>> My bad. It’s not the new-line. Let me do some more digging.
> 
> From the gcc docs:
> 
>   Some targets require that GCC track the size of each instruction used
>   in order to generate correct code.  Because the final length of the
>   code produced by an @code{asm} statement is only known by the
>   assembler, GCC must make an estimate as to how big it will be.  It
>   does this by counting the number of instructions in the pattern of the
>   @code{asm} and multiplying that by the length of the longest
>   instruction supported by that processor.  (When working out the number
>   of instructions, it assumes that any occurrence of a newline or of
>   whatever statement separator character is supported by the assembler --
>   typically @samp{;} --- indicates the end of an instruction.)
> 
> so it probably counts newlines and semicolons to estimate the size.

Thanks. I probably did not have enough coffee.

I’ll work on it.

Regards,
Nadav

Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Nadav Amit
Linus Torvalds  wrote:

> On Tue, May 1, 2018 at 9:46 AM Nadav Amit  wrote:
> 
>> My bad. It’s not the new-line. Let me do some more digging.
> 
> From the gcc docs:
> 
>   Some targets require that GCC track the size of each instruction used
>   in order to generate correct code.  Because the final length of the
>   code produced by an @code{asm} statement is only known by the
>   assembler, GCC must make an estimate as to how big it will be.  It
>   does this by counting the number of instructions in the pattern of the
>   @code{asm} and multiplying that by the length of the longest
>   instruction supported by that processor.  (When working out the number
>   of instructions, it assumes that any occurrence of a newline or of
>   whatever statement separator character is supported by the assembler --
>   typically @samp{;} --- indicates the end of an instruction.)
> 
> so it probably counts newlines and semicolons to estimate the size.

Thanks. I probably did not have enough coffee.

I’ll work on it.

Regards,
Nadav

Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 9:46 AM Nadav Amit  wrote:

> My bad. It’s not the new-line. Let me do some more digging.

 From the gcc docs:

   Some targets require that GCC track the size of each instruction used
   in order to generate correct code.  Because the final length of the
   code produced by an @code{asm} statement is only known by the
   assembler, GCC must make an estimate as to how big it will be.  It
   does this by counting the number of instructions in the pattern of the
   @code{asm} and multiplying that by the length of the longest
   instruction supported by that processor.  (When working out the number
   of instructions, it assumes that any occurrence of a newline or of
   whatever statement separator character is supported by the assembler --
   typically @samp{;} --- indicates the end of an instruction.)

so it probably counts newlines and semicolons to estimate the size.

 Linus


Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 9:46 AM Nadav Amit  wrote:

> My bad. It’s not the new-line. Let me do some more digging.

 From the gcc docs:

   Some targets require that GCC track the size of each instruction used
   in order to generate correct code.  Because the final length of the
   code produced by an @code{asm} statement is only known by the
   assembler, GCC must make an estimate as to how big it will be.  It
   does this by counting the number of instructions in the pattern of the
   @code{asm} and multiplying that by the length of the longest
   instruction supported by that processor.  (When working out the number
   of instructions, it assumes that any occurrence of a newline or of
   whatever statement separator character is supported by the assembler --
   typically @samp{;} --- indicates the end of an instruction.)

so it probably counts newlines and semicolons to estimate the size.

 Linus


Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 9:39 AM Nadav Amit  wrote:

> Would it be reasonable to remove new-lines in such cases?

Yeah, that may be fine for some of our (already illegible) section crud.

 Linus


Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 9:39 AM Nadav Amit  wrote:

> Would it be reasonable to remove new-lines in such cases?

Yeah, that may be fine for some of our (already illegible) section crud.

 Linus


Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Nadav Amit
Nadav Amit  wrote:

> Linus Torvalds  wrote:
> 
>> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf  wrote:
>> 
>>> But if I remove the section completely by removing the
>>> pushsection/popsection, then copy_overflow() gets inlined.
>> 
>>> So GCC's inlining decisions are somehow influenced by the existence of
>>> some random empty section.  This definitely seems like a GCC bug to me.
>> 
>> I think gcc uses the size of the string to approximate the size of an
>> inline asm.
>> 
>> So I don't think it's the "empty section" that makes gcc do this, I think
>> it's literally "our inline asms _look_ big”.
> 
> I didn’t think about that.
> 
> Playing with the code a bit more, it seems that it is actually related to
> the number of “new-lines” in the inline assembly. Removing 4 new-lines from
> _BUG_FLAGS (those that can be removed without breaking assembly) eliminated
> most of the non-inlined versions of copy_overflow().
> 
> Would it be reasonable to remove new-lines in such cases?

My bad. It’s not the new-line. Let me do some more digging.

Nadav



Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Nadav Amit
Nadav Amit  wrote:

> Linus Torvalds  wrote:
> 
>> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf  wrote:
>> 
>>> But if I remove the section completely by removing the
>>> pushsection/popsection, then copy_overflow() gets inlined.
>> 
>>> So GCC's inlining decisions are somehow influenced by the existence of
>>> some random empty section.  This definitely seems like a GCC bug to me.
>> 
>> I think gcc uses the size of the string to approximate the size of an
>> inline asm.
>> 
>> So I don't think it's the "empty section" that makes gcc do this, I think
>> it's literally "our inline asms _look_ big”.
> 
> I didn’t think about that.
> 
> Playing with the code a bit more, it seems that it is actually related to
> the number of “new-lines” in the inline assembly. Removing 4 new-lines from
> _BUG_FLAGS (those that can be removed without breaking assembly) eliminated
> most of the non-inlined versions of copy_overflow().
> 
> Would it be reasonable to remove new-lines in such cases?

My bad. It’s not the new-line. Let me do some more digging.

Nadav



Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Nadav Amit
Linus Torvalds  wrote:

> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf  wrote:
> 
>> But if I remove the section completely by removing the
>> pushsection/popsection, then copy_overflow() gets inlined.
> 
>> So GCC's inlining decisions are somehow influenced by the existence of
>> some random empty section.  This definitely seems like a GCC bug to me.
> 
> I think gcc uses the size of the string to approximate the size of an
> inline asm.
> 
> So I don't think it's the "empty section" that makes gcc do this, I think
> it's literally "our inline asms _look_ big”.

I didn’t think about that.

Playing with the code a bit more, it seems that it is actually related to
the number of “new-lines” in the inline assembly. Removing 4 new-lines from
_BUG_FLAGS (those that can be removed without breaking assembly) eliminated
most of the non-inlined versions of copy_overflow().

Would it be reasonable to remove new-lines in such cases?

Regards,
Nadav

Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Nadav Amit
Linus Torvalds  wrote:

> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf  wrote:
> 
>> But if I remove the section completely by removing the
>> pushsection/popsection, then copy_overflow() gets inlined.
> 
>> So GCC's inlining decisions are somehow influenced by the existence of
>> some random empty section.  This definitely seems like a GCC bug to me.
> 
> I think gcc uses the size of the string to approximate the size of an
> inline asm.
> 
> So I don't think it's the "empty section" that makes gcc do this, I think
> it's literally "our inline asms _look_ big”.

I didn’t think about that.

Playing with the code a bit more, it seems that it is actually related to
the number of “new-lines” in the inline assembly. Removing 4 new-lines from
_BUG_FLAGS (those that can be removed without breaking assembly) eliminated
most of the non-inlined versions of copy_overflow().

Would it be reasonable to remove new-lines in such cases?

Regards,
Nadav

Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf  wrote:

> But if I remove the section completely by removing the
> pushsection/popsection, then copy_overflow() gets inlined.

> So GCC's inlining decisions are somehow influenced by the existence of
> some random empty section.  This definitely seems like a GCC bug to me.

I think gcc uses the size of the string to approximate the size of an
inline asm.

So I don't think it's the "empty section" that makes gcc do this, I think
it's literally "our inline asms _look_ big".

Linus "does this section directive make me look fat?"
Torvalds


Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf  wrote:

> But if I remove the section completely by removing the
> pushsection/popsection, then copy_overflow() gets inlined.

> So GCC's inlining decisions are somehow influenced by the existence of
> some random empty section.  This definitely seems like a GCC bug to me.

I think gcc uses the size of the string to approximate the size of an
inline asm.

So I don't think it's the "empty section" that makes gcc do this, I think
it's literally "our inline asms _look_ big".

Linus "does this section directive make me look fat?"
Torvalds


Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Josh Poimboeuf
On Tue, May 01, 2018 at 06:50:14AM +, Nadav Amit wrote:
> When gcc considers the size of a function for inlining decisions, it
> apparently considers *all* sections. Since the kernel extensively uses
> sections for things other than code (e.g., exception-table, bug-table), the
> optimality of these decisions seem questionable to me.
> 
> The objtool’s sections may be the most extreme case, as these sections are
> discarded, while their size still appears to be considered by the inlining
> heuristics. It may be beneficial not to consider (some) the other sections
> as well, as they do not affect code-caching but only increase the kernel
> size.
> 
> To illustrate the issue, consider the function copy_overflow():
> 
>0x819315e0 <+0>:   push   %rbp
>0x819315e1 <+1>:   mov%rsi,%rdx
>0x819315e4 <+4>:   mov%edi,%esi
>0x819315e6 <+6>:   mov$0x820bc4b8,%rdi
>0x819315ed <+13>:  mov%rsp,%rbp
>0x819315f0 <+16>:  callq  0x81089b70 <__warn_printk>
>0x819315f5 <+21>:  ud2
>0x819315f7 <+23>:  pop%rbp
>0x819315f8 <+24>:  retq   
> 
> This function seems to me as a great candidate for inlining. Yet, in my 4.16
> build (using gcc 7.2), I get 38 non-inlined instances of this function in
> vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
> non-inlined instances to 35. Removing, in addition, the data which is saved
> in the __bug_table makes all the instances of the function to be inlined.
> 
> Obviously this certain function can be set as __always_inline, but the inline
> heuristics seems to me as wrongfully biased. 
> 
> What do you think?
> 
> Is there a way to make gcc to ignore sections for its inlining heuristics?

Good find.

Playing around with one of the affected files (crypto/af_alg.o), if I
make the .discard.reachable section empty by removing the text reference
from the annotate_reachable() macro, then copy_overflow() still isn't
inlined.

But if I remove the section completely by removing the
pushsection/popsection, then copy_overflow() gets inlined.

So GCC's inlining decisions are somehow influenced by the existence of
some random empty section.  This definitely seems like a GCC bug to me.

-- 
Josh


Re: Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Josh Poimboeuf
On Tue, May 01, 2018 at 06:50:14AM +, Nadav Amit wrote:
> When gcc considers the size of a function for inlining decisions, it
> apparently considers *all* sections. Since the kernel extensively uses
> sections for things other than code (e.g., exception-table, bug-table), the
> optimality of these decisions seem questionable to me.
> 
> The objtool’s sections may be the most extreme case, as these sections are
> discarded, while their size still appears to be considered by the inlining
> heuristics. It may be beneficial not to consider (some) the other sections
> as well, as they do not affect code-caching but only increase the kernel
> size.
> 
> To illustrate the issue, consider the function copy_overflow():
> 
>0x819315e0 <+0>:   push   %rbp
>0x819315e1 <+1>:   mov%rsi,%rdx
>0x819315e4 <+4>:   mov%edi,%esi
>0x819315e6 <+6>:   mov$0x820bc4b8,%rdi
>0x819315ed <+13>:  mov%rsp,%rbp
>0x819315f0 <+16>:  callq  0x81089b70 <__warn_printk>
>0x819315f5 <+21>:  ud2
>0x819315f7 <+23>:  pop%rbp
>0x819315f8 <+24>:  retq   
> 
> This function seems to me as a great candidate for inlining. Yet, in my 4.16
> build (using gcc 7.2), I get 38 non-inlined instances of this function in
> vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
> non-inlined instances to 35. Removing, in addition, the data which is saved
> in the __bug_table makes all the instances of the function to be inlined.
> 
> Obviously this certain function can be set as __always_inline, but the inline
> heuristics seems to me as wrongfully biased. 
> 
> What do you think?
> 
> Is there a way to make gcc to ignore sections for its inlining heuristics?

Good find.

Playing around with one of the affected files (crypto/af_alg.o), if I
make the .discard.reachable section empty by removing the text reference
from the annotate_reachable() macro, then copy_overflow() still isn't
inlined.

But if I remove the section completely by removing the
pushsection/popsection, then copy_overflow() gets inlined.

So GCC's inlining decisions are somehow influenced by the existence of
some random empty section.  This definitely seems like a GCC bug to me.

-- 
Josh


Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Nadav Amit
When gcc considers the size of a function for inlining decisions, it
apparently considers *all* sections. Since the kernel extensively uses
sections for things other than code (e.g., exception-table, bug-table), the
optimality of these decisions seem questionable to me.

The objtool’s sections may be the most extreme case, as these sections are
discarded, while their size still appears to be considered by the inlining
heuristics. It may be beneficial not to consider (some) the other sections
as well, as they do not affect code-caching but only increase the kernel
size.

To illustrate the issue, consider the function copy_overflow():

   0x819315e0 <+0>: push   %rbp
   0x819315e1 <+1>: mov%rsi,%rdx
   0x819315e4 <+4>: mov%edi,%esi
   0x819315e6 <+6>: mov$0x820bc4b8,%rdi
   0x819315ed <+13>:mov%rsp,%rbp
   0x819315f0 <+16>:callq  0x81089b70 <__warn_printk>
   0x819315f5 <+21>:ud2
   0x819315f7 <+23>:pop%rbp
   0x819315f8 <+24>:retq   

This function seems to me as a great candidate for inlining. Yet, in my 4.16
build (using gcc 7.2), I get 38 non-inlined instances of this function in
vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
non-inlined instances to 35. Removing, in addition, the data which is saved
in the __bug_table makes all the instances of the function to be inlined.

Obviously this certain function can be set as __always_inline, but the inline
heuristics seems to me as wrongfully biased. 

What do you think?

Is there a way to make gcc to ignore sections for its inlining heuristics?

Thanks,
Nadav

Suboptimal inline heuristics due to non-code sections

2018-05-01 Thread Nadav Amit
When gcc considers the size of a function for inlining decisions, it
apparently considers *all* sections. Since the kernel extensively uses
sections for things other than code (e.g., exception-table, bug-table), the
optimality of these decisions seem questionable to me.

The objtool’s sections may be the most extreme case, as these sections are
discarded, while their size still appears to be considered by the inlining
heuristics. It may be beneficial not to consider (some) the other sections
as well, as they do not affect code-caching but only increase the kernel
size.

To illustrate the issue, consider the function copy_overflow():

   0x819315e0 <+0>: push   %rbp
   0x819315e1 <+1>: mov%rsi,%rdx
   0x819315e4 <+4>: mov%edi,%esi
   0x819315e6 <+6>: mov$0x820bc4b8,%rdi
   0x819315ed <+13>:mov%rsp,%rbp
   0x819315f0 <+16>:callq  0x81089b70 <__warn_printk>
   0x819315f5 <+21>:ud2
   0x819315f7 <+23>:pop%rbp
   0x819315f8 <+24>:retq   

This function seems to me as a great candidate for inlining. Yet, in my 4.16
build (using gcc 7.2), I get 38 non-inlined instances of this function in
vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
non-inlined instances to 35. Removing, in addition, the data which is saved
in the __bug_table makes all the instances of the function to be inlined.

Obviously this certain function can be set as __always_inline, but the inline
heuristics seems to me as wrongfully biased. 

What do you think?

Is there a way to make gcc to ignore sections for its inlining heuristics?

Thanks,
Nadav