Re: Inlining can be _very_bad...

2007-03-29 Thread Adrian Bunk
On Fri, Mar 30, 2007 at 12:01:11AM +0200, J.A. Magallón wrote:
> On Thu, 29 Mar 2007 19:52:54 +0200, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> > > Hi all...
> > > 
> > > I post this here as it can be of direct interest for kernel development
> > > (as I recall many discussions about inlining yes or no...).
> > > 
> > > Testing other problems, I finally got this issue: the same short
> > > and stupid loop lasted from 3 to 5 times more if it was in main() than
> > > if it was in an out-of-line function. The same (bad thing) happens if
> > > the function is inlined.
> > >...
> > > It looks like it is updating the stack on each iteration... This is
> > > -march=opteron code; the -march=pentium4 output is similar. Same
> > > behaviour with gcc3 and gcc4.
> > > 
> > > tst.c and Makefile attached.
> > > 
> > > Nice, isn't it ? Please, probe where is my fault...
> > 
> > The only fault is to post this issue here instead of the gcc Bugzilla.
> 
> Sorry, my intention was just something like 'take a look at your
> reduction-like code, perhaps it's slow', something like checksum
> functions in tcp or raid that are inlined expecting to be faster
> and in fact they are slower...

Unless a function that has more than 1 caller is very tiny or reduces at 
compile time to a very tiny remainder, inlining is not expected to be 
faster on current CPUs.

But most of the time that is already up to the compiler anyway - e.g. 
current gcc versions automatically inline all static functions with only 
1 caller.
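
To make the single-caller case concrete, here is a minimal sketch. The names sum_floats, sum_floats_noinline, total and total_noinline are made up for illustration and do not come from tst.c. The first helper is static with one caller, so the compiler is free to inline it; the second uses GCC's noinline attribute to stay out of line, which is useful when you want to compare both variants. Whether inlining actually happens depends on the optimization level and the gcc version.

```c
#include <stddef.h>

/* Hypothetical helper with exactly one caller. Because it is static,
 * the compiler may inline it and drop the out-of-line copy. */
static double sum_floats(const float *data, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += data[i];
	return sum;
}

/* Same loop, but the GCC noinline attribute forces a real call. */
static __attribute__((noinline)) double sum_floats_noinline(const float *data,
							    size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += data[i];
	return sum;
}

/* The only caller of sum_floats(). */
double total(const float *data, size_t n)
{
	return sum_floats(data, n);
}

/* The only caller of sum_floats_noinline(). */
double total_noinline(const float *data, size_t n)
{
	return sum_floats_noinline(data, n);
}
```

Comparing the output of gcc -S for total() and total_noinline() shows whether the helper was folded into its caller.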

> > In your example the compiler should produce code not slower than with 
> > the out-of-line version when inlining. If it doesn't, the bug in the 
> > compiler resulting in this should be fixed.
> 
> That's what I expected, but...
> Going to gcc bugzilla...

Thanks.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Inlining can be _very_bad...

2007-03-29 Thread J.A. Magallón
On Thu, 29 Mar 2007 19:52:54 +0200, Adrian Bunk <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> > Hi all...
> > 
> > I post this here as it can be of direct interest for kernel development
> > (as I recall many discussions about inlining yes or no...).
> > 
> > Testing other problems, I finally got this issue: the same short
> > and stupid loop lasted from 3 to 5 times more if it was in main() than
> > if it was in an out-of-line function. The same (bad thing) happens if
> > the function is inlined.
> >...
> > It looks like it is updating the stack on each iteration... This is
> > -march=opteron code; the -march=pentium4 output is similar. Same
> > behaviour with gcc3 and gcc4.
> > 
> > tst.c and Makefile attached.
> > 
> > Nice, isn't it ? Please, probe where is my fault...
> 
> The only fault is to post this issue here instead of the gcc Bugzilla.
> 

Sorry, my intention was just something like 'take a look at your
reduction-like code, perhaps it's slow', something like checksum
functions in tcp or raid that are inlined expecting to be faster
and in fact they are slower...

> In your example the compiler should produce code not slower than with 
> the out-of-line version when inlining. If it doesn't, the bug in the 
> compiler resulting in this should be fixed.
> 

That's what I expected, but...
Going to gcc bugzilla...

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam06 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #2 SMP 
PREEMPT


Re: Inlining can be _very_bad...

2007-03-29 Thread Adrian Bunk
On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> Hi all...
> 
> I post this here as it can be of direct interest for kernel development
> (as I recall many discussions about inlining yes or no...).
> 
> Testing other problems, I finally got this issue: the same short
> and stupid loop lasted from 3 to 5 times more if it was in main() than
> if it was in an out-of-line function. The same (bad thing) happens if
> the function is inlined.
>...
> It looks like it is updating the stack on each iteration... This is
> -march=opteron code; the -march=pentium4 output is similar. Same
> behaviour with gcc3 and gcc4.
> 
> tst.c and Makefile attached.
> 
> Nice, isn't it ? Please, probe where is my fault...

The only fault is to post this issue here instead of the gcc Bugzilla.

In your example the compiler should produce code not slower than with 
the out-of-line version when inlining. If it doesn't, the bug in the 
compiler resulting in this should be fixed.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: Inlining can be _very_bad...

2007-03-28 Thread Benjamin LaHaise
On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> It looks like it is updating the stack on each iteration... This is
> -march=opteron code; the -march=pentium4 output is similar. Same
> behaviour with gcc3 and gcc4.
> 
> tst.c and Makefile attached.
> 
> Nice, isn't it ? Please, probe where is my fault...

Yes, gcc sucks in its handling of large return values, news at 11.  I have 
several outstanding bugs on cases where gcc could keep things in registers 
but doesn't.

That said, it tends to do much better on plain integer code, as that is 
what it gets tuned for.  Do NOT propagate the blanket myth that inlining is 
a bad thing.  It is very useful for small functions where the overhead 
associated with call/ret sequences and register clobbers overshadows the 
work being done.  The call/ret updates alone can make a big difference when 
there are lots of other (more useful) memory transactions to complete.  Take 
a look at things like the notifier hooks for an example of something that 
does far too little work per function call and should really be inlined.
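
As a rough illustration of that point, here is a sketch of the usual fast-path/slow-path split. The struct hook, run_hooks, run_hooks_slow and count_hook names are hypothetical and are not the kernel's notifier API. The inline wrapper handles the common "nothing registered" case with a single compare at the call site, and only pays for a call when there is real work to do.

```c
#include <stddef.h>

/* Hypothetical hook list node, for illustration only. */
struct hook {
	void (*fn)(void *arg);
	struct hook *next;
};

/* Out-of-line slow path: walk the list and call every hook. */
void run_hooks_slow(struct hook *head, void *arg)
{
	for (; head != NULL; head = head->next)
		head->fn(arg);
}

/* Inline fast path: an empty list costs one compare at the call
 * site instead of a call/ret pair plus register clobbers. */
static inline void run_hooks(struct hook *head, void *arg)
{
	if (head != NULL)
		run_hooks_slow(head, arg);
}

/* Example hook: bump the integer counter that arg points to. */
static void count_hook(void *arg)
{
	++*(int *)arg;
}
```

With two count_hook entries chained together, run_hooks(&first, &n) bumps n twice, while run_hooks(NULL, &n) never leaves the caller.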

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.


Inlining can be _very_bad...

2007-03-28 Thread J.A. Magallón
Hi all...

I post this here as it can be of direct interest for kernel development
(as I recall many discussions about inlining yes or no...).

Testing other problems, I finally got this issue: the same short
and stupid loop lasted from 3 to 5 times more if it was in main() than
if it was in an out-of-line function. The same (bad thing) happens if
the function is inlined.

The basic code is like this:

float   data[];

[inline] double one()
{
	double sum;
	sum = 0;
	for (i=0; i<SIZE; i++) sum += data[i];
	return sum;
}

int main()
{
	gettimeofday(&tv0,0);
	for (i=0; i<SIZE; i++)
		s0 += data[i];
	gettimeofday(&tv1,0);
	printf("T0: %6.2f ms\n",elap(tv0,tv1));
	gettimeofday(&tv0,0);
	s1 = one();
	gettimeofday(&tv1,0);
	printf("T1: %6.2f ms\n",elap(tv0,tv1));
}

The times if one() is not inlined (EM64T, 2.33GHz):

apolo:~/e4> tst
T0: 1145.12 ms
S0: 268435456.00
T1: 457.19 ms
S1: 268435456.00

With one() inlined:

apolo:~/e4> tst
T0: 1200.52 ms
S0: 268435456.00
T1: 1200.14 ms
S1: 268435456.00

Looking at the assembler, the non-inlined version does:

.L2:
	cvtss2sd	(%rdx,%rax,4), %xmm0
	incq	%rax
	cmpq	$268435456, %rax
	addsd	%xmm0, %xmm1
	jne	.L2

and the inlined

.L13:
	cvtss2sd	(%rdx,%rax,4), %xmm0
	incq	%rax
	cmpq	$268435456, %rax
	addsd	8(%rsp), %xmm0
	movsd	%xmm0, 8(%rsp)
	jne	.L13

It looks like it is updating the stack on each iteration... This is
-march=opteron code; the -march=pentium4 output is similar. Same
behaviour with gcc3 and gcc4.
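
To see what that extra load/store pair costs in isolation, one possible sketch is below. It does not reproduce the gcc behaviour itself. Instead it uses a volatile accumulator to force a memory round trip on every iteration, much like the addsd/movsd pair on 8(%rsp). The function names sum_in_register and sum_through_memory are made up for this example.

```c
#include <stddef.h>

/* Ordinary local accumulator: the compiler is free to keep it in a
 * register for the whole loop. */
double sum_in_register(const float *data, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += data[i];
	return sum;
}

/* volatile forces a load and a store of the accumulator on every
 * iteration, so the running sum goes through memory each time. */
double sum_through_memory(const float *data, size_t n)
{
	volatile double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += data[i];
	return sum;
}
```

Both functions return the same value; timing them over a large array gives a feel for how much the per-iteration store can matter.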

tst.c and Makefile attached.

Nice, isn't it ? Please, probe where is my fault...

--
J.A. Magallon jamagallon()ono!com \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam06 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP 
PREEMPT


Makefile
Description: Binary data
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define SIZE 256*1024*1024

#define elap(t0,t1) \
	((1000*t1.tv_sec+0.001*t1.tv_usec) - (1000*t0.tv_sec+0.001*t0.tv_usec))

double  one();

float	*data;

#ifdef INLINE
inline
#endif
double one()
{
	int i;
	double sum;

	sum = 0;
	asm("#FBGN");
	for (i=0; i<SIZE; i++)
		sum += data[i];
	asm("#FEND");

	return sum;
}

int main(int argc,char** argv)
{
	struct timeval	tv0,tv1;
	double			s0,s1;
	int				i;

	data = malloc(SIZE*sizeof(float));
	for (i=0; i<SIZE; i++)
		data[i] = 1;

	gettimeofday(&tv0,0);
	s0 = 0;
	asm("#MBGN");
	for (i=0; i<SIZE; i++)
		s0 += data[i];
	asm("#MEND");
	gettimeofday(&tv1,0);
	printf("T0: %6.2f ms\n",elap(tv0,tv1));
	printf("S0: %0.2lf\n",s0);

	gettimeofday(&tv0,0);
		s1 = one();
	gettimeofday(&tv1,0);
	printf("T1: %6.2f ms\n",elap(tv0,tv1));
	printf("S1: %0.2lf\n",s1);

	free(data);

	return 0;
}


