Re: Inlining can be _very_bad...
On Fri, Mar 30, 2007 at 12:01:11AM +0200, J.A. Magallón wrote:
> On Thu, 29 Mar 2007 19:52:54 +0200, Adrian Bunk <[EMAIL PROTECTED]> wrote:
>
> > On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> > > Hi all...
> > >
> > > I post this here as it can be of direct interest for kernel development
> > > (as I recall many discussions about inlining yes or no...).
> > >
> > > Testing other problems, I finally got this issue: the same short
> > > and stupid loop lasted from 3 to 5 times more if it was in main() than
> > > if it was in an out-of-line function. The same (bad thing) happens if
> > > the function is inlined.
> > >...
> > > It looks like it is updating the stack on each iteration... This is
> > > -march=opteron code, the -march=pentium4 is similar. Same behaviour
> > > with gcc3 and gcc4.
> > >
> > > tst.c and Makefile attached.
> > >
> > > Nice, isn't it? Please, probe where is my fault...
> >
> > The only fault is to post this issue here instead of the gcc Bugzilla.
>
> Sorry, my intention was just something like 'take a look at your
> reduction-like code, perhaps it's slow', something like checksum
> functions in tcp or raid that are inlined expecting to be faster
> and in fact they are slower...

Unless a function that has more than 1 caller is very tiny or reduces at
compile time to a very tiny rest, it's not expected that inlining is
faster on current CPUs.

But most times that's already only up to the compiler - e.g. current gcc
versions already automatically inline all static functions with only
1 caller.

> > In your example the compiler should produce code not slower than with
> > the out-of-line version when inlining. If it doesn't the bug in the
> > compiler resulting in this should be fixed.
>
> That's what I expected, but...
> Going to gcc bugzilla...

Thanks.

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
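Adrian's remark about static functions with a single caller can be
illustrated with a small sketch. The names fold16() and checksum() are
hypothetical and are not taken from the kernel or from tst.c. The
assumption is that an optimizing gcc build will typically fold the static
helper into its only caller without any inline keyword; whether it
actually does so depends on the compiler version and options.

```c
#include <stddef.h>

/*
 * Hypothetical helper: static, exactly one caller, no "inline" keyword.
 * The decision to inline it is left entirely to the compiler.
 */
static unsigned int fold16(unsigned int sum)
{
	/* Fold the upper 16 bits into the lower 16 bits, twice. */
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return sum;
}

/* The only caller of fold16(): a simple byte-wise reduction. */
unsigned int checksum(const unsigned char *buf, size_t len)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += buf[i];

	return fold16(sum);
}
```

Inspecting the assembler output (gcc -O2 -S) shows whether fold16()
survives as a separate symbol in a given build.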
Re: Inlining can be _very_bad...
On Thu, 29 Mar 2007 19:52:54 +0200, Adrian Bunk <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> > Hi all...
> >
> > I post this here as it can be of direct interest for kernel development
> > (as I recall many discussions about inlining yes or no...).
> >
> > Testing other problems, I finally got this issue: the same short
> > and stupid loop lasted from 3 to 5 times more if it was in main() than
> > if it was in an out-of-line function. The same (bad thing) happens if
> > the function is inlined.
> >...
> > It looks like it is updating the stack on each iteration... This is
> > -march=opteron code, the -march=pentium4 is similar. Same behaviour
> > with gcc3 and gcc4.
> >
> > tst.c and Makefile attached.
> >
> > Nice, isn't it? Please, probe where is my fault...
>
> The only fault is to post this issue here instead of the gcc Bugzilla.
>

Sorry, my intention was just something like 'take a look at your
reduction-like code, perhaps it's slow', something like checksum
functions in tcp or raid that are inlined expecting to be faster
and in fact they are slower...

> In your example the compiler should produce code not slower than with
> the out-of-line version when inlining. If it doesn't the bug in the
> compiler resulting in this should be fixed.
>

That's what I expected, but...
Going to gcc bugzilla...

--
J.A. Magallon                           \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam06 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #2 SMP PREEMPT
Re: Inlining can be _very_bad...
On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> Hi all...
>
> I post this here as it can be of direct interest for kernel development
> (as I recall many discussions about inlining yes or no...).
>
> Testing other problems, I finally got this issue: the same short
> and stupid loop lasted from 3 to 5 times more if it was in main() than
> if it was in an out-of-line function. The same (bad thing) happens if
> the function is inlined.
>...
> It looks like it is updating the stack on each iteration... This is
> -march=opteron code, the -march=pentium4 is similar. Same behaviour
> with gcc3 and gcc4.
>
> tst.c and Makefile attached.
>
> Nice, isn't it? Please, probe where is my fault...

The only fault is to post this issue here instead of the gcc Bugzilla.

In your example the compiler should produce code not slower than with
the out-of-line version when inlining. If it doesn't the bug in the
compiler resulting in this should be fixed.

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
Re: Inlining can be _very_bad...
On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> It looks like it is updating the stack on each iteration... This is
> -march=opteron code, the -march=pentium4 is similar. Same behaviour
> with gcc3 and gcc4.
>
> tst.c and Makefile attached.
>
> Nice, isn't it? Please, probe where is my fault...

Yes, gcc sucks in its handling of large return values, news at 11. I have
several outstanding bugs on cases where gcc could keep things in registers
but doesn't. That said, it tends to do much better on plain integer code,
as that is what it gets tuned for.

Do NOT propagate the blanket myth that inlining is a bad thing. It is very
useful for small functions where the overhead associated with call/ret
sequences and register clobbers overshadows the work being done. The
call/ret updates alone can make a big difference when there are lots of
other (more useful) memory transactions to complete. Take a look at things
like the notifier hooks for an example of something that does far too
little work per function call and should really be inlined.

		-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.
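As a rough illustration of the kind of function Ben describes, here is a
minimal sketch of a helper that does a single increment per call. The type
hit_counter and both function names are hypothetical and are not the
kernel's notifier code. The point is only that the body is smaller than
the call/ret sequence and register saves a real call would likely need.

```c
#include <stddef.h>

/* Hypothetical example type, not taken from the kernel. */
struct hit_counter {
	unsigned long hits;
};

/*
 * One increment of work per call. Marked static inline so the compiler
 * is free to expand it at the call site instead of emitting call/ret.
 */
static inline void hit_counter_add(struct hit_counter *c)
{
	c->hits++;
}

/* Caller with a hot loop; returns how many entries of v equal key. */
unsigned long count_matches(const int *v, size_t n, int key)
{
	struct hit_counter c = { 0 };
	size_t i;

	for (i = 0; i < n; i++)
		if (v[i] == key)
			hit_counter_add(&c);

	return c.hits;
}
```

Whether the inlined form is actually faster still has to be measured, as
the original report in this thread shows.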
Inlining can be _very_bad...
Hi all...

I post this here as it can be of direct interest for kernel development
(as I recall many discussions about inlining yes or no...).

Testing other problems, I finally got this issue: the same short
and stupid loop lasted from 3 to 5 times more if it was in main() than
if it was in an out-of-line function. The same (bad thing) happens if
the function is inlined.

The basic code is like this:

    float data[];

    [inline] double one()
    {
        double sum;
        sum = 0;
        for (i=0; i<SIZE; i++)
            sum += data[i];
        return sum;
    }

    int main()
    {
        gettimeofday(&tv0,0);
        for (i=0; i<SIZE; i++)
            s0 += data[i];
        gettimeofday(&tv1,0);
        printf("T0: %6.2f ms\n",elap(tv0,tv1));

        gettimeofday(&tv0,0);
        s1 = one();
        gettimeofday(&tv1,0);
        printf("T1: %6.2f ms\n",elap(tv0,tv1));
    }

The times if one() is not inlined (emt64, 2.33GHz):

    apolo:~/e4> tst
    T0: 1145.12 ms
    S0: 268435456.00
    T1: 457.19 ms
    S1: 268435456.00

With one() inlined:

    apolo:~/e4> tst
    T0: 1200.52 ms
    S0: 268435456.00
    T1: 1200.14 ms
    S1: 268435456.00

Looking at the assembler, the non-inlined version does:

    .L2:
        cvtss2sd    (%rdx,%rax,4), %xmm0
        incq        %rax
        cmpq        $268435456, %rax
        addsd       %xmm0, %xmm1
        jne         .L2

and the inlined

    .L13:
        cvtss2sd    (%rdx,%rax,4), %xmm0
        incq        %rax
        cmpq        $268435456, %rax
        addsd       8(%rsp), %xmm0
        movsd       %xmm0, 8(%rsp)
        jne         .L13

It looks like it is updating the stack on each iteration... This is
-march=opteron code, the -march=pentium4 is similar. Same behaviour
with gcc3 and gcc4.

tst.c and Makefile attached.

Nice, isn't it? Please, probe where is my fault...

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam06 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP PREEMPT

Makefile
Description: Binary data

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    #define SIZE 256*1024*1024

    #define elap(t0,t1) \
        ((1000*t1.tv_sec+0.001*t1.tv_usec) - (1000*t0.tv_sec+0.001*t0.tv_usec))

    double one();

    float *data;

    #ifdef INLINE
    inline
    #endif
    double one()
    {
        int i;
        double sum;

        sum = 0;
        asm("#FBGN");
        for (i=0; i<SIZE; i++)
            sum += data[i];
        asm("#FEND");

        return sum;
    }

    int main(int argc,char** argv)
    {
        struct timeval tv0,tv1;
        double s0,s1;
        int i;

        data = malloc(SIZE*sizeof(float));
        for (i=0; i<SIZE; i++)
            data[i] = 1;

        gettimeofday(&tv0,0);
        s0 = 0;
        asm("#MBGN");
        for (i=0; i<SIZE; i++)
            s0 += data[i];
        asm("#MEND");
        gettimeofday(&tv1,0);
        printf("T0: %6.2f ms\n",elap(tv0,tv1));
        printf("S0: %0.2lf\n",s0);

        gettimeofday(&tv0,0);
        s1 = one();
        gettimeofday(&tv1,0);
        printf("T1: %6.2f ms\n",elap(tv0,tv1));
        printf("S1: %0.2lf\n",s1);

        free(data);
        return 0;
    }
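For anyone who wants to compare the two code paths without rebuilding
with and without -DINLINE, the following is a minimal sketch that pins
the inlining decision per function. It is not part of the attached tst.c.
The function names are hypothetical, and the noinline and always_inline
attributes are GCC extensions, so this assumes gcc or a compatible
compiler.

```c
#include <stddef.h>

/*
 * Hypothetical names. noinline forces a real call, so this loop is
 * always compiled as a standalone function body.
 */
__attribute__((noinline))
double sum_out_of_line(const float *data, size_t n)
{
	double sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += data[i];	/* float is widened to double before the add */

	return sum;
}

/*
 * always_inline forces the same loop to be expanded into every caller,
 * which is the case the report above found to be slower.
 */
static inline __attribute__((always_inline))
double sum_inlined(const float *data, size_t n)
{
	double sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += data[i];

	return sum;
}
```

Calling both from the same main() and compiling with gcc -O2 -S makes it
possible to compare the two inner loops side by side, and to check
whether the accumulator is kept in a register or written back to the
stack on each iteration.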