Re: Suboptimal inline heuristics due to non-code sections
Linus Torvaldswrote: > On Tue, May 1, 2018 at 9:46 AM Nadav Amit wrote: > >> My bad. It’s not the new-line. Let me do some more digging. > > From the gcc docs: > > Some targets require that GCC track the size of each instruction used > in order to generate correct code. Because the final length of the > code produced by an @code{asm} statement is only known by the > assembler, GCC must make an estimate as to how big it will be. It > does this by counting the number of instructions in the pattern of the > @code{asm} and multiplying that by the length of the longest > instruction supported by that processor. (When working out the number > of instructions, it assumes that any occurrence of a newline or of > whatever statement separator character is supported by the assembler -- > typically @samp{;} --- indicates the end of an instruction.) > > so it probably counts newlines and semicolons to estimate the size. Thanks. I probably did not have enough coffee. I’ll work on it. Regards, Nadav
Re: Suboptimal inline heuristics due to non-code sections
Linus Torvalds wrote: > On Tue, May 1, 2018 at 9:46 AM Nadav Amit wrote: > >> My bad. It’s not the new-line. Let me do some more digging. > > From the gcc docs: > > Some targets require that GCC track the size of each instruction used > in order to generate correct code. Because the final length of the > code produced by an @code{asm} statement is only known by the > assembler, GCC must make an estimate as to how big it will be. It > does this by counting the number of instructions in the pattern of the > @code{asm} and multiplying that by the length of the longest > instruction supported by that processor. (When working out the number > of instructions, it assumes that any occurrence of a newline or of > whatever statement separator character is supported by the assembler -- > typically @samp{;} --- indicates the end of an instruction.) > > so it probably counts newlines and semicolons to estimate the size. Thanks. I probably did not have enough coffee. I’ll work on it. Regards, Nadav
Re: Suboptimal inline heuristics due to non-code sections
On Tue, May 1, 2018 at 9:46 AM Nadav Amitwrote: > My bad. It’s not the new-line. Let me do some more digging. From the gcc docs: Some targets require that GCC track the size of each instruction used in order to generate correct code. Because the final length of the code produced by an @code{asm} statement is only known by the assembler, GCC must make an estimate as to how big it will be. It does this by counting the number of instructions in the pattern of the @code{asm} and multiplying that by the length of the longest instruction supported by that processor. (When working out the number of instructions, it assumes that any occurrence of a newline or of whatever statement separator character is supported by the assembler -- typically @samp{;} --- indicates the end of an instruction.) so it probably counts newlines and semicolons to estimate the size. Linus
Re: Suboptimal inline heuristics due to non-code sections
On Tue, May 1, 2018 at 9:46 AM Nadav Amit wrote: > My bad. It’s not the new-line. Let me do some more digging. From the gcc docs: Some targets require that GCC track the size of each instruction used in order to generate correct code. Because the final length of the code produced by an @code{asm} statement is only known by the assembler, GCC must make an estimate as to how big it will be. It does this by counting the number of instructions in the pattern of the @code{asm} and multiplying that by the length of the longest instruction supported by that processor. (When working out the number of instructions, it assumes that any occurrence of a newline or of whatever statement separator character is supported by the assembler -- typically @samp{;} --- indicates the end of an instruction.) so it probably counts newlines and semicolons to estimate the size. Linus
Re: Suboptimal inline heuristics due to non-code sections
On Tue, May 1, 2018 at 9:39 AM Nadav Amitwrote: > Would it be reasonable to remove new-lines in such cases? Yeah, that may be fine for some of our (already illegible) section crud. Linus
Re: Suboptimal inline heuristics due to non-code sections
On Tue, May 1, 2018 at 9:39 AM Nadav Amit wrote: > Would it be reasonable to remove new-lines in such cases? Yeah, that may be fine for some of our (already illegible) section crud. Linus
Re: Suboptimal inline heuristics due to non-code sections
Nadav Amitwrote: > Linus Torvalds wrote: > >> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf wrote: >> >>> But if I remove the section completely by removing the >>> pushsection/popsection, then copy_overflow() gets inlined. >> >>> So GCC's inlining decisions are somehow influenced by the existence of >>> some random empty section. This definitely seems like a GCC bug to me. >> >> I think gcc uses the size of the string to approximate the size of an >> inline asm. >> >> So I don't think it's the "empty section" that makes gcc do this, I think >> it's literally "our inline asms _look_ big”. > > I didn’t think about that. > > Playing with the code a bit more, it seems that it is actually related to > the number of “new-lines” in the inline assembly. Removing 4 new-lines from > _BUG_FLAGS (those that can be removed without breaking assembly) eliminated > most of the non-inlined versions of copy_overflow(). > > Would it be reasonable to remove new-lines in such cases? My bad. It’s not the new-line. Let me do some more digging. Nadav
Re: Suboptimal inline heuristics due to non-code sections
Nadav Amit wrote: > Linus Torvalds wrote: > >> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf wrote: >> >>> But if I remove the section completely by removing the >>> pushsection/popsection, then copy_overflow() gets inlined. >> >>> So GCC's inlining decisions are somehow influenced by the existence of >>> some random empty section. This definitely seems like a GCC bug to me. >> >> I think gcc uses the size of the string to approximate the size of an >> inline asm. >> >> So I don't think it's the "empty section" that makes gcc do this, I think >> it's literally "our inline asms _look_ big”. > > I didn’t think about that. > > Playing with the code a bit more, it seems that it is actually related to > the number of “new-lines” in the inline assembly. Removing 4 new-lines from > _BUG_FLAGS (those that can be removed without breaking assembly) eliminated > most of the non-inlined versions of copy_overflow(). > > Would it be reasonable to remove new-lines in such cases? My bad. It’s not the new-line. Let me do some more digging. Nadav
Re: Suboptimal inline heuristics due to non-code sections
Linus Torvaldswrote: > On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf wrote: > >> But if I remove the section completely by removing the >> pushsection/popsection, then copy_overflow() gets inlined. > >> So GCC's inlining decisions are somehow influenced by the existence of >> some random empty section. This definitely seems like a GCC bug to me. > > I think gcc uses the size of the string to approximate the size of an > inline asm. > > So I don't think it's the "empty section" that makes gcc do this, I think > it's literally "our inline asms _look_ big”. I didn’t think about that. Playing with the code a bit more, it seems that it is actually related to the number of “new-lines” in the inline assembly. Removing 4 new-lines from _BUG_FLAGS (those that can be removed without breaking assembly) eliminated most of the non-inlined versions of copy_overflow(). Would it be reasonable to remove new-lines in such cases? Regards, Nadav
Re: Suboptimal inline heuristics due to non-code sections
Linus Torvalds wrote: > On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf wrote: > >> But if I remove the section completely by removing the >> pushsection/popsection, then copy_overflow() gets inlined. > >> So GCC's inlining decisions are somehow influenced by the existence of >> some random empty section. This definitely seems like a GCC bug to me. > > I think gcc uses the size of the string to approximate the size of an > inline asm. > > So I don't think it's the "empty section" that makes gcc do this, I think > it's literally "our inline asms _look_ big”. I didn’t think about that. Playing with the code a bit more, it seems that it is actually related to the number of “new-lines” in the inline assembly. Removing 4 new-lines from _BUG_FLAGS (those that can be removed without breaking assembly) eliminated most of the non-inlined versions of copy_overflow(). Would it be reasonable to remove new-lines in such cases? Regards, Nadav
Re: Suboptimal inline heuristics due to non-code sections
On Tue, May 1, 2018 at 6:40 AM Josh Poimboeufwrote: > But if I remove the section completely by removing the > pushsection/popsection, then copy_overflow() gets inlined. > So GCC's inlining decisions are somehow influenced by the existence of > some random empty section. This definitely seems like a GCC bug to me. I think gcc uses the size of the string to approximate the size of an inline asm. So I don't think it's the "empty section" that makes gcc do this, I think it's literally "our inline asms _look_ big". Linus "does this section directive make me look fat?" Torvalds
Re: Suboptimal inline heuristics due to non-code sections
On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf wrote: > But if I remove the section completely by removing the > pushsection/popsection, then copy_overflow() gets inlined. > So GCC's inlining decisions are somehow influenced by the existence of > some random empty section. This definitely seems like a GCC bug to me. I think gcc uses the size of the string to approximate the size of an inline asm. So I don't think it's the "empty section" that makes gcc do this, I think it's literally "our inline asms _look_ big". Linus "does this section directive make me look fat?" Torvalds
Re: Suboptimal inline heuristics due to non-code sections
On Tue, May 01, 2018 at 06:50:14AM +, Nadav Amit wrote: > When gcc considers the size of a function for inlining decisions, it > apparently considers *all* sections. Since the kernel extensively uses > sections for things other than code (e.g., exception-table, bug-table), the > optimality of these decisions seem questionable to me. > > The objtool’s sections may be the most extreme case, as these sections are > discarded, while their size still appears to be considered by the inlining > heuristics. It may be beneficial not to consider (some) the other sections > as well, as they do not affect code-caching but only increase the kernel > size. > > To illustrate the issue, consider the function copy_overflow(): > >0x819315e0 <+0>: push %rbp >0x819315e1 <+1>: mov%rsi,%rdx >0x819315e4 <+4>: mov%edi,%esi >0x819315e6 <+6>: mov$0x820bc4b8,%rdi >0x819315ed <+13>: mov%rsp,%rbp >0x819315f0 <+16>: callq 0x81089b70 <__warn_printk> >0x819315f5 <+21>: ud2 >0x819315f7 <+23>: pop%rbp >0x819315f8 <+24>: retq > > This function seems to me as a great candidate for inlining. Yet, in my 4.16 > build (using gcc 7.2), I get 38 non-inlined instances of this function in > vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number > non-inlined instances to 35. Removing, in addition, the data which is saved > in the __bug_table makes all the instances of the function to be inlined. > > Obviously this certain function can be set as __always_inline, but the inline > heuristics seems to me as wrongfully biased. > > What do you think? > > Is there a way to make gcc to ignore sections for its inlining heuristics? Good find. Playing around with one of the affected files (crypto/af_alg.o), if I make the .discard.reachable section empty by removing the text reference from the annotate_reachable() macro, then copy_overflow() still isn't inlined. But if I remove the section completely by removing the pushsection/popsection, then copy_overflow() gets inlined. So GCC's inlining decisions are somehow influenced by the existence of some random empty section. This definitely seems like a GCC bug to me. -- Josh
Re: Suboptimal inline heuristics due to non-code sections
On Tue, May 01, 2018 at 06:50:14AM +, Nadav Amit wrote: > When gcc considers the size of a function for inlining decisions, it > apparently considers *all* sections. Since the kernel extensively uses > sections for things other than code (e.g., exception-table, bug-table), the > optimality of these decisions seem questionable to me. > > The objtool’s sections may be the most extreme case, as these sections are > discarded, while their size still appears to be considered by the inlining > heuristics. It may be beneficial not to consider (some) the other sections > as well, as they do not affect code-caching but only increase the kernel > size. > > To illustrate the issue, consider the function copy_overflow(): > >0x819315e0 <+0>: push %rbp >0x819315e1 <+1>: mov%rsi,%rdx >0x819315e4 <+4>: mov%edi,%esi >0x819315e6 <+6>: mov$0x820bc4b8,%rdi >0x819315ed <+13>: mov%rsp,%rbp >0x819315f0 <+16>: callq 0x81089b70 <__warn_printk> >0x819315f5 <+21>: ud2 >0x819315f7 <+23>: pop%rbp >0x819315f8 <+24>: retq > > This function seems to me as a great candidate for inlining. Yet, in my 4.16 > build (using gcc 7.2), I get 38 non-inlined instances of this function in > vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number > non-inlined instances to 35. Removing, in addition, the data which is saved > in the __bug_table makes all the instances of the function to be inlined. > > Obviously this certain function can be set as __always_inline, but the inline > heuristics seems to me as wrongfully biased. > > What do you think? > > Is there a way to make gcc to ignore sections for its inlining heuristics? Good find. Playing around with one of the affected files (crypto/af_alg.o), if I make the .discard.reachable section empty by removing the text reference from the annotate_reachable() macro, then copy_overflow() still isn't inlined. But if I remove the section completely by removing the pushsection/popsection, then copy_overflow() gets inlined. So GCC's inlining decisions are somehow influenced by the existence of some random empty section. This definitely seems like a GCC bug to me. -- Josh
Suboptimal inline heuristics due to non-code sections
When gcc considers the size of a function for inlining decisions, it apparently considers *all* sections. Since the kernel extensively uses sections for things other than code (e.g., exception-table, bug-table), the optimality of these decisions seem questionable to me. The objtool’s sections may be the most extreme case, as these sections are discarded, while their size still appears to be considered by the inlining heuristics. It may be beneficial not to consider (some) the other sections as well, as they do not affect code-caching but only increase the kernel size. To illustrate the issue, consider the function copy_overflow(): 0x819315e0 <+0>: push %rbp 0x819315e1 <+1>: mov%rsi,%rdx 0x819315e4 <+4>: mov%edi,%esi 0x819315e6 <+6>: mov$0x820bc4b8,%rdi 0x819315ed <+13>:mov%rsp,%rbp 0x819315f0 <+16>:callq 0x81089b70 <__warn_printk> 0x819315f5 <+21>:ud2 0x819315f7 <+23>:pop%rbp 0x819315f8 <+24>:retq This function seems to me as a great candidate for inlining. Yet, in my 4.16 build (using gcc 7.2), I get 38 non-inlined instances of this function in vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number non-inlined instances to 35. Removing, in addition, the data which is saved in the __bug_table makes all the instances of the function to be inlined. Obviously this certain function can be set as __always_inline, but the inline heuristics seems to me as wrongfully biased. What do you think? Is there a way to make gcc to ignore sections for its inlining heuristics? Thanks, Nadav
Suboptimal inline heuristics due to non-code sections
When gcc considers the size of a function for inlining decisions, it apparently considers *all* sections. Since the kernel extensively uses sections for things other than code (e.g., exception-table, bug-table), the optimality of these decisions seem questionable to me. The objtool’s sections may be the most extreme case, as these sections are discarded, while their size still appears to be considered by the inlining heuristics. It may be beneficial not to consider (some) the other sections as well, as they do not affect code-caching but only increase the kernel size. To illustrate the issue, consider the function copy_overflow(): 0x819315e0 <+0>: push %rbp 0x819315e1 <+1>: mov%rsi,%rdx 0x819315e4 <+4>: mov%edi,%esi 0x819315e6 <+6>: mov$0x820bc4b8,%rdi 0x819315ed <+13>:mov%rsp,%rbp 0x819315f0 <+16>:callq 0x81089b70 <__warn_printk> 0x819315f5 <+21>:ud2 0x819315f7 <+23>:pop%rbp 0x819315f8 <+24>:retq This function seems to me as a great candidate for inlining. Yet, in my 4.16 build (using gcc 7.2), I get 38 non-inlined instances of this function in vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number non-inlined instances to 35. Removing, in addition, the data which is saved in the __bug_table makes all the instances of the function to be inlined. Obviously this certain function can be set as __always_inline, but the inline heuristics seems to me as wrongfully biased. What do you think? Is there a way to make gcc to ignore sections for its inlining heuristics? Thanks, Nadav